Every time a scientist runs an experiment, or a social scientist does a survey, or a humanities scholar analyzes a text, they generate data. Science runs on data – without it, we wouldn’t have the James Webb Space Telescope’s stunning images, disease-preventing vaccines or an evolutionary tree that traces the lineages of all life.
All this scholarship generates an enormous amount of data – so how do researchers keep track of it? And how do they make sure that it’s accessible for use by both humans and machines?
To improve and advance science, scientists need to be able to reproduce others’ findings or combine data from multiple sources to learn something new.
Any kind of sharing requires management. If your neighbor needs to borrow a tool or an ingredient, you have to know whether you have it and where you keep it. Research data might be on a graduate student’s laptop, buried in a professor’s USB collection or saved more permanently within an online data repository.
I’m an information scientist who studies other scientists. More precisely, I study how scientists think about research data and the ways that they interact with their own data and data from others. I also teach students how to manage their own or others’ data in ways that advance knowledge.
Research data management
Research data management is an area of scholarship that focuses on data discovery and reuse. As a field, it encompasses research data services, resources and cyberinfrastructure. For example, one type of infrastructure, the data repository, gives researchers a place to deposit their data for long-term storage so that others can find it. In short, research data management covers the data’s life cycle from cradle to grave to reincarnation in the next study.
Proper research data management also lets scientists reuse data that already exists rather than collecting it again, which saves time and resources.
As science becomes increasingly politicized, many national and international science organizations have raised their standards for accountability and transparency. Federal agencies such as the National Institutes of Health, along with other major research funders, now prioritize research data management and require researchers to have a data management plan before they can receive any funds.
Scientists and data managers can work together to redesign the systems scientists use, making data easier to discover and preserve. In particular, integrating AI into these systems can make research data more accessible and reusable.
Artificially intelligent data management
Many of these new standards for research data management also stem from an increased use of AI, including machine learning, across data-driven fields. AI makes it…