Research is becoming increasingly reliant on computational workflows, such as combining diverse datasets, processing big data, running computational models, and organizing information across geographically dispersed and interdisciplinary teams. Because the scientific process is anchored in reproducibility and peer review, funding organizations and journals have adjusted their strategies to accommodate the new landscape. It is now common for proposals to include data management plans and for journals to require data packages. All of this means that proper data management skills are necessary to conduct good research. However, many researchers are not familiar or comfortable with such computational workflows. To better understand these issues, we discussed the barriers to proper data management in the sciences and how to overcome them.
Our discussion began with potential barriers to proper data management:
- Researchers lack formal training
- Data management is not part of the established culture in the laboratory
- Steep learning curve; certain baseline skills are required that older generations may lack
- A lack of perceived value, resulting in little to no buy-in
- Researchers already need to be master communicators, statisticians, educators, and now data managers – it’s too much
We then discussed ways that CAF can assist in fostering a culture of good data management:
- Offer training such as Data Carpentry and CAF-specific workflows
- Provide plug-and-play solutions for those not interested in using their own systems
- Provide best practices for those who have some established systems
- Better communicate the value of good data management practices
By the end of the discussion, it was clear what CAF needs to do. We agreed that we should let established PIs off the hook and only ask them to appreciate the value of data management by understanding the complexity and time required to do it properly. Students, technicians, and post-docs, those wonderful researchers in the trenches, need to know how to put data management into practice. To this end, CAF plans to offer a workshop at the beginning of every Fall semester to interested PIs and their researchers. We will split the workshop into two components: the first will provide theory and brief overviews of proper data management, and the second will get into the weeds to put theory into practice. Because Data Carpentry and others already offer general skill sets, we will focus on CAF-specific tooling and guidelines. This will also allow us to get feedback from collaborators so that we can better adapt our methods to help them. Shortly after the workshop, the CAF data manager will follow up with collaborators on their own turf once they have had a chance to apply what was discussed during training. Data management is complicated, so this follow-up will allow new researchers to ask questions and fine-tune their craft. Finally, to foster a culture of good data practices, the CAF data manager will periodically visit collaborating labs for an open discussion, and the labs will be invited to join us during our DataCAFe talks.