28 Oct 2021
by Richa

Data management planning

Our aim is to make our research results (publications, methodology, data analysis, etc.) and the process behind them publicly accessible and usable. To do that, we need a plan for carefully documenting and organizing our data. This document lists the key aspects of data management and the practices that matter for each.

Data types and sources

Our research employs many low- and high-throughput methodologies, including quantitative microscopy, proteomics, and sequencing. All lab members must keep good notes. All raw, processed, and analysed data should be cross-referenced with experimental notes, documented, organized in logical file structures, version controlled, and backed up on hard drives and in online repositories.
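One way to check that a backup really matches the working copy is to compare file checksums. The sketch below is only an illustration, not a lab-mandated tool: the function names (`sha256_of_file`, `build_manifest`, `find_mismatches`) are hypothetical, and it assumes the original and the backup are both reachable as local folders.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(original: Path, backup: Path) -> list[str]:
    """List files that are missing from the backup or whose contents differ."""
    source = build_manifest(original)
    copy = build_manifest(backup)
    return sorted(
        name for name, checksum in source.items() if copy.get(name) != checksum
    )
```

Calling `find_mismatches(Path("project"), Path("/mnt/backup/project"))` (both paths are placeholders) returns an empty list when every file in the working copy has an identical counterpart in the backup.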

Formats and standards

Whenever possible, we should strive to use non-proprietary file formats for sharing and archiving to maximize the potential for reuse and longevity of the data. When analyzing raw and processed data from other researchers, all analysis pipelines and metadata should be well documented with appropriate ‘readme files’ so that the analysis is understandable and usable by others.
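As a rough illustration of a 'readme file' in a non-proprietary format, the sketch below writes a plain-text README.txt into a dataset folder. The function name `write_readme`, the layout of the file, and the example file names are all assumptions made for this example, not a lab standard.

```python
from __future__ import annotations

from pathlib import Path


def write_readme(
    folder: Path, title: str, description: str, files: dict[str, str]
) -> Path:
    """Write a plain-text README.txt that describes the files in a folder."""
    lines = [title, "=" * len(title), "", description, "", "Files:"]
    for name, purpose in sorted(files.items()):
        lines.append(f"  {name}: {purpose}")
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Hypothetical dataset; replace the names with those of a real analysis.
    write_readme(
        Path("."),
        "Example imaging dataset",
        "Quantification of raw microscopy images; see lab notebook for details.",
        {
            "counts.csv": "per-cell intensity table",
            "analysis.py": "script that produces counts.csv",
        },
    )
```

Plain text keeps the description readable without any particular software, which is the same reasoning behind preferring open formats for the data itself.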

Roles and responsibilities

It is the primary responsibility of each trainee to ensure their data is well documented and regularly updated. Working data, metadata, and presentation summaries should be uploaded to the lab drive regularly. When working collaboratively, lab members should agree up front on how the working data will be accessed, managed, and shared. Published datasets will be made available in online repositories with persistent identifiers such as DOIs or accession numbers. Some archiving options include Gene Expression Omnibus, PRIDE archive, GitHub, Dryad, Zenodo, and Figshare.

Best practices

Cornell’s Data Management Planning & Services provides a comprehensive list of best practices for data acquisition, storage, and dissemination.