Why sharing research data is important
Sharing data is necessary within the research team, which can be complicated by far-reaching collaborations and types of data requiring secure storage. On a global level, sharing data increases data use by encouraging transparency, enabling reproducibility of results, and informing the larger scientific community.
In 2016, the FAIR Guiding Principles were published in the journal Scientific Data. These guidelines aim to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets, and they are important to follow when sharing research data products.
While sharing can be as simple as making data accessible to anyone who needs it, publishing is a more deliberate act of preparing a data set for transparency and public accessibility.
Sharing data and publishing research results have become more than a professional expectation. Along with other federal funding agencies, the NIH now requires applicants to describe how data will be shared in a data management and sharing plan.
The 2023 NIH Data Management and Sharing Policy
Previously, the NIH only required grants with $500,000 per year or more in direct costs to provide a brief explanation of how and when data resulting from the grant would be shared.
The 2023 NIH policy is entirely new. Beginning January 25, 2023, ALL grant applications or renewals that generate Scientific Data must include a robust and detailed plan for managing and sharing data during the entire funded period. This includes information on data storage, access policies/procedures, preservation, metadata standards, distribution approaches, and more. You must provide this information in a data management and sharing plan (DMSP). The DMSP is similar to what other funders call a data management plan (DMP).
The DMSP will be assessed by NIH Program Staff (though peer reviewers will be able to comment on the proposed data management budget). The Institute, Center, or Office (ICO)-approved plan becomes a Term and Condition of the Notice of Award.
More information about the 2023 NIH Data Management and Sharing Policy can be found in the ASU Library LibGuides.
Publishing Research Data
Many funding agencies and academic journals now require that the data associated with your research ultimately be made publicly accessible when the research is complete. Publication of your data is expected either upon publication of your article or within a reasonable time after the research concludes.
Data products may include, but are not limited to:
- raw data collected during fieldwork (pre-processing)
- models (code, batch files, etc.) used to process these data
- spatial data files
- classified or processed remotely sensed images
- supporting documentation (additional notes, photos, etc.)
- interviews and codebooks
- laboratory analysis outputs (text files, .csv)
Research Data Repositories
There are many data repositories currently serving the research community, and it is worth checking with your funding source to see whether it has requirements for where the data are published and archived.
ASU Research Data Repository
Researchers can make datasets accessible and discoverable in ASU’s Research Data Repository, provided by the ASU Library. This repository allows ASU-affiliated researchers to share, store, preserve, cite, and explore research data. Research datasets can be directly downloaded, referenced through metadata, or analyzed via third-party applications.
- Datasets may be persistently cited
- Curation and preservation of research datasets and supplementary output
The Research Data Repository User Guide provides further information and instructions on how to get started and what you will need to do to prepare your data for publishing.
The repository supports the publication/reuse phase of the research data lifecycle and complements ASU’s KEEP institutional repository to present a consolidated picture of ASU’s scholarly activities.
Submit a request with the ASU Library to share and connect your publications and research data or for other questions related to scholarly publishing.
Completed metadata should be uploaded along with the data files to your repository of choice.
The receiving repository will assign a Digital Object Identifier (DOI) as part of the publication process. A DOI is a persistent identifier which will allow you to link to your data as well as see where it has been used and cited by others.
If you need to archive and publish data that fall outside of the ASU Library’s collection scope, a disciplinary repository may be an alternative solution. You can identify a suitable disciplinary repository via the Registry of Research Repositories, which hosts a searchable database of data repositories.
Disciplinary research data repositories
In many cases, you will need to work with a subject-specific or disciplinary repository due to funder or publisher mandates. It’s also a good idea to have your data where your community will find it. You can identify a suitable disciplinary repository via the Registry of Research Repositories, which hosts a searchable database of data repositories. See the Library’s guide on Publishing Research Datasets in Disciplinary Repositories for more tips and information.
Data Publishing Considerations
Classes of data
Most datasets generated by ASU research projects are unrestricted data. These data are considered shareable without restrictions, having been produced solely from local research activities or analysis using other publicly available data.
These data should be documented and made freely available within a reasonable time after collection. This timetable may be dictated by the funding agency or a journal if they relate to a submitted peer-reviewed manuscript.
Restricted data are exceptional datasets that are available only with permission from the PI/investigator(s) or limited to an approved population. These are rare, and the justification for restrictions should be documented by the lead PI and project team. Examples of restricted data include:
- Datasets that contain Personally Identifiable Information (PII), although an anonymized subset of data may be published without restriction.
- Datasets in which some or all of the data are subject to copyright restrictions imposed by non-ASU institutions.
- Datasets in which some or all of the data are subject to licensing restrictions such as purchased satellite data.
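As a rough illustration of preparing a publishable subset from a dataset that contains PII, the sketch below drops direct identifier columns from a set of records. The records, the field names, and the `PII_FIELDS` set are all hypothetical. Removing direct identifiers is only a first step and may not be sufficient on its own, since combinations of the remaining fields can sometimes still identify a person.

```python
# Hypothetical interview records; every name and value here is made up.
records = [
    {"name": "Participant One", "email": "p1@example.org",
     "age_group": "30-39", "response": "agree"},
    {"name": "Participant Two", "email": "p2@example.org",
     "age_group": "40-49", "response": "disagree"},
]

# Assumed set of direct identifiers for this illustrative dataset.
PII_FIELDS = {"name", "email"}


def drop_pii(record, pii_fields):
    """Return a copy of the record without the listed identifier fields."""
    return {key: value for key, value in record.items() if key not in pii_fields}


# The subset that could be considered for publication after further review.
public_subset = [drop_pii(record, PII_FIELDS) for record in records]

for row in public_subset:
    print(row)
```

The original records stay untouched, so the restricted version can still be archived under its access conditions while the reduced copy is reviewed for release.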
Preparing Data for Publication and Sharing
Metadata is simply the data describing the data. It provides the information necessary to evaluate the quality of the data and to make use of it in future research activities. In an ideal world, metadata would always be prepared in a machine-readable format. Some tools exist to assist with that process, but they may be associated with a particular metadata standard or type of data. If that is not an option, you can still document your data in simple metadata forms.
Create metadata that contains information about the project, study subject, and time period. This is the type of information that might be common to your overall research study. This information potentially describes the data package (dataset) you will publish even though this data package may consist of multiple items.
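One possible way to keep project-level metadata machine-readable is a small JSON record. The sketch below is only an illustration: the field names, the values, and the file names are hypothetical and do not follow any particular metadata standard, so check what your chosen repository expects before adopting a layout like this.

```python
import json

# Hypothetical project-level metadata; all field names and values are illustrative.
project_metadata = {
    "title": "Example stream temperature study",
    "description": "Water temperature readings from two sampling sites.",
    "study_subject": "stream water temperature",
    "time_period": {"start": "2021-06-01", "end": "2021-09-30"},
    # The data package may consist of multiple items.
    "items": ["site_readings.csv", "site_readings_dictionary.csv"],
}


def to_machine_readable(record):
    """Serialize the metadata record as indented JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)


metadata_json = to_machine_readable(project_metadata)
print(metadata_json)
```

The resulting text can be saved next to the data files so that the description travels with the data package.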
Each type of data below will have a unique set of metadata that provides the description of the fields contained in each.
By default, we recommend that tabular data are published in .csv format. Using this highly portable data format ensures that the data are accessible to users without the need for specific software environments. These types of data products include spreadsheets, databases, and machine outputs such as sensor data.
Metadata for tabular data is often another table or data dictionary that describes each field and any corresponding units for analysis.
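A data dictionary for a tabular file can itself be written as a .csv file. The sketch below uses only the Python standard library. The sensor readings, the column names, and the units are hypothetical, and the final check that every column is documented is just one simple convention, not a repository requirement.

```python
import csv
import io

# Hypothetical sensor readings; column names and values are illustrative only.
rows = [
    {"site_id": "A1", "timestamp": "2021-06-01T12:00:00", "water_temp": 18.4},
    {"site_id": "A2", "timestamp": "2021-06-01T12:00:00", "water_temp": 19.1},
]

# One data-dictionary row per field in the data table.
data_dictionary = [
    {"field": "site_id", "description": "Sampling site identifier", "unit": ""},
    {"field": "timestamp", "description": "Time of reading, ISO 8601", "unit": ""},
    {"field": "water_temp", "description": "Water temperature", "unit": "degrees Celsius"},
]


def to_csv_text(records):
    """Write a list of dicts as CSV text, using the first record's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


data_csv = to_csv_text(rows)
dictionary_csv = to_csv_text(data_dictionary)

# Check that every column in the data table is described in the dictionary.
documented = {entry["field"] for entry in data_dictionary}
undocumented = [name for name in rows[0] if name not in documented]

print(data_csv)
print(dictionary_csv)
print("Undocumented fields:", undocumented)
```

Writing the text to files instead of printing it gives you a data file and a matching dictionary file that can be uploaded together.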
Spatial data are normally published in a proprietary format, the general standard being the shapefile and layer file formats used by ESRI products. These are readily imported into other GIS products such as QGIS.
ArcCatalog allows you to enter metadata for each layer in Federal Geographic Data Committee (FGDC) or ISO (recommended) format.
Imagery (whether camera or remotely sensed) can be published in one of the many common image file formats (e.g., .TIFF, .JPG, .PNG, or .IMG).
Each image file essentially contains tabular data with discrete values. The data table form can be used to describe this type of file.