Data sharing

Data sharing is the practice of making data used for scholarly research available to other investigators. Data sharing makes replication possible, and replication has a long history in science. The motto of the Royal Society is "Nullius in verba", translated as "Take no man's word for it."[1] Many funding agencies, institutions, and publication venues have policies on data sharing because transparency and openness are considered by many to be part of the scientific method.

A number of funding agencies and science journals require authors of peer-reviewed papers to share any supplemental information (raw data, statistical methods, or source code) necessary to understand, develop, or reproduce published research. However, a great deal of scientific research is not subject to data sharing requirements, and many of these policies have liberal exceptions. In the absence of a binding requirement, data sharing is at the discretion of the scientists themselves. In certain situations, agencies and institutions prohibit or severely limit data sharing to protect proprietary interests, national security, or the confidentiality of research subjects, patients, and victims. Data sharing (especially of photographs and graphic descriptions of animal research) may also be restricted to protect institutions and scientists from misuse of the data for political purposes by animal rights extremists.

Data and methods may be requested from an author years after publication. To encourage data sharing and to prevent the loss or corruption of data, a number of funding agencies and journals have established policies on data archiving. Access to publicly archived data is a recent development in the history of science, made possible by technological advances in communications and information technology.

Despite policies on data sharing and archiving, data withholding still happens. Authors may fail to archive data, or may archive only a portion of it. Failure to archive data is not, by itself, data withholding; withholding occurs when a researcher requests additional information and the author refuses to provide it.[2] Authors who withhold data in this way risk losing the trust of the scientific community.[3]

U.S. government policies

Federal law

On August 9, 2007, President Bush signed the America COMPETES Act (the "America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science Act"), which requires civilian federal agencies to provide guidelines, policies, and procedures to facilitate and optimize the open exchange of data and research between agencies, the public, and policymakers. See Section 1009.[4]

NIH data sharing policy

‘The National Institutes of Health (NIH) Grants Policy Statement defines “data” as “recorded information, regardless of the form or medium on which it may be recorded, and includes writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data.”’
— Council on Governmental Relations[5]

The NIH Final Statement on Sharing Research Data says:

‘NIH reaffirms its support for the concept of data sharing. We believe that data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health. The NIH endorses the sharing of final research data to serve these and other important scientific goals. The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.

‘NIH recognizes that the investigators who collect the data have a legitimate interest in benefiting from their investment of time and effort. We have therefore revised our definition of “the timely release and sharing” to be no later than the acceptance for publication of the main findings from the final data set. NIH continues to expect that the initial investigators may benefit from first and continuing use but not from prolonged exclusive use.’

NSF Policy from Grant General Conditions

36. Sharing of Findings, Data, and Other Research Products

a. NSF …expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages awardees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.

b. Adjustments and, where essential, exceptions may be allowed to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate legitimate interests of investigators.
— “National Science Foundation: Grant General Conditions (GC-1)”, April 1, 2001 (p. 17).

Journal policies

The American Naturalist

The American Naturalist requires authors to deposit the data associated with accepted papers in a public archive. For gene sequence data and phylogenetic trees, deposition in GenBank or TreeBASE, respectively, is required. There are many possible archives that may suit a particular data set, including the Dryad repository for ecological and evolutionary biology data. All accession numbers for GenBank, TreeBASE, and Dryad must be included in accepted manuscripts before they go to Production. If the data is deposited somewhere else, please provide a link. If the data is culled from published literature, please deposit the collated data in Dryad for the convenience of your readers. Any impediments to data sharing should be brought to the attention of the editors at the time of submission so that appropriate arrangements can be worked out. For more, see the editorial on data.

Journal of Heredity

The primary data underlying the conclusions of an article are critical to the verifiability and transparency of the scientific enterprise, and should be preserved in usable form for decades in the future. For this reason, Journal of Heredity requires that newly reported nucleotide or amino acid sequences, and structural coordinates, be submitted to appropriate public databases (e.g., GenBank; the EMBL Nucleotide Sequence Database; DNA Database of Japan; the Protein Data Bank; and Swiss-Prot). Accession numbers must be included in the final version of the manuscript. For other forms of data (e.g., microsatellite genotypes, linkage maps, images), the Journal endorses the principles of the Joint Data Archiving Policy (JDAP) in encouraging all authors to archive primary datasets in an appropriate public archive, such as Dryad, TreeBASE, or the Knowledge Network for Biocomplexity. Authors are encouraged to make data publicly available at time of publication or, if the technology of the archive allows, opt to embargo access to the data for a period up to a year after publication. The American Genetic Association also recognizes the vast investment of individual researchers in generating and curating large datasets. Consequently, we recommend that this investment be respected in secondary analyses or meta-analyses in a gracious collaborative spirit.

Molecular Ecology

Molecular Ecology expects that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, Gene Expression Omnibus, TreeBASE, Dryad, the Knowledge Network for Biocomplexity, your own institutional or funder repository, or as Supporting Information on the Molecular Ecology web site. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

Nature

After publication, readers who encounter a persistent refusal by the authors to comply with these guidelines should contact the chief editor of the Nature journal concerned, with “materials complaint” and publication reference of the article as part of the subject line. In cases where editors are unable to resolve a complaint, the journal reserves the right to refer the correspondence to the author's funding institution and/or to publish a statement of formal correction, linked to the publication, that readers have been unable to obtain necessary materials or reagents to replicate the findings.
— “Availability of Data and Materials: The Policy of Nature Magazine”.

Royal Society Publishing

"As a condition of acceptance authors agree to honour any reasonable request by other researchers for materials, methods, or data necessary to verify the conclusion of the article. Supplementary data up to 10Mb is placed on the Society's website free of charge and is publicly accessible. Large datasets must be deposited in a recognised public domain database by the author prior to submission. The accession number should be provided for inclusion in the published article."

Office of Research Integrity

Allegations of misconduct in medical research carry severe consequences. The United States Department of Health and Human Services established an office to oversee investigations of allegations of misconduct, including data withholding. Its website defines the office's mission:

“The Office of Research Integrity (ORI) promotes integrity in biomedical and behavioral research supported by the U.S. Public Health Service (PHS) at about 4,000 institutions worldwide. ORI monitors institutional investigations of research misconduct and facilitates the responsible conduct of research (RCR) through educational, preventive, and regulatory activities.”

Ideals in data sharing

Some research organizations feel particularly strongly about data sharing. Stanford University's WaveLab has a philosophy of reproducible research that calls for disclosing all algorithms and source code necessary to reproduce the research. In a paper titled "WaveLab and Reproducible Research," the authors describe some of the problems they encountered in trying to reproduce their own research after a period of time; in many cases it was so difficult that they abandoned the effort. These experiences convinced them of the importance of disclosing source code.[7] The philosophy is described as follows:

The idea is: An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.[8]


The Data Observation Network for Earth (DataONE) and the Data Conservancy[9] are projects supported by the National Science Foundation to encourage and facilitate data sharing among research scientists and to better support meta-analysis. In the environmental sciences, the research community is recognizing that major scientific advances involving the integration of knowledge within and across fields will require researchers to overcome not only the technological barriers to data sharing but also the historically entrenched institutional and sociological barriers.[10] Dr. Richard J. Hodes, director of the National Institute on Aging, has stated, “the old model in which researchers jealously guarded their data is no longer applicable.”[11]


The Alliance for Taxpayer Access is a group of organizations that support open access to government-sponsored research. The group has issued a "Statement of Principles" explaining why it believes open access is important.[12] It also lists a number of international public access policies.[13]

International policies

Data sharing problems

Academic genetics

Withholding of data has become so commonplace in academic genetics that researchers at Massachusetts General Hospital published a journal article on the subject. The study found that “Because they were denied access to data, 28% of geneticists reported that they had been unable to confirm published research.”[14]

Scientists in training

A study of scientists in training indicated that many had already experienced data withholding.[15] This study has given rise to fears that the next generation of scientists will not abide by established practices.

Differing approaches in different fields

Requirements for data sharing are more commonly imposed by institutions, funding agencies, and publication venues in the medical and biological sciences than in the physical sciences. Requirements vary widely regarding whether data must be shared at all, with whom the data must be shared, and who must bear the expense of data sharing.

Funding agencies such as the NIH and NSF tend to require greater sharing of data, but even these requirements tend to acknowledge the concerns of patient confidentiality, costs incurred in sharing data, and the legitimacy of the request. Private interests and public agencies with national security interests (defense and law enforcement) often discourage sharing of data and methods through non-disclosure agreements.

References

  1. ^ History of the Royal Society webpage, accessed June 13, 2011.
  2. ^ Savage CJ, Vickers AJ (2009). "Empirical Study of Data Sharing by Authors Publishing in PLoS Journals". PLoS ONE 4 (9): e7078. doi:10.1371/journal.pone.0007078.
  3. ^ "Publication and Openness", chapter from On Being a Scientist: Responsible Conduct in Research, National Academy of Sciences.
  4. ^ "America COMPETES Act".
  5. ^ "Access to and Retention of Research Data: Rights and Responsibilities", p. 5. Council on Governmental Relations, March 2006.
  6. ^ "NIH Data Sharing Policy".
  7. ^ Buckheit JB, Donoho DL. "WaveLab and Reproducible Research".
  8. ^ WaveLab850 website.
  9. ^ [1]
  10. ^ Reichman OJ, Jones MB, Schildhauer MP (2011). "Challenges and Opportunities of Open Data in Ecology". Science 331 (6018): 703–705. doi:10.1126/science.1197962.
  11. ^ New York Times article about the value of shared data to Alzheimer's research.
  12. ^ The Alliance for Taxpayer Access website.
  13. ^ Worldwide momentum for public access to publicly funded research.
  14. ^ Campbell EG, Clarridge BR, Gokhale M, et al. (2002). "Data withholding in academic genetics: evidence from a national survey". JAMA 287 (4): 473–80. doi:10.1001/jama.287.4.473. PMID 11798369. http://jama.ama-assn.org/cgi/pmidlookup?view=long&pmid=11798369.
  15. ^ Vogeli C, Yucel R, Bendavid E, et al. (February 2006). "Data withholding and the next generation of scientists: results of a national survey". Acad Med 81 (2): 128–36. doi:10.1097/00001888-200602000-00007. PMID 16436573. http://meta.wkhealth.com/pt/pt-core/template-journal/lwwgateway/media/landingpage.htm?issn=1040-2446&volume=81&issue=2&spage=128.

Literature

Committee on Issues in the Transborder Flow of Scientific Data, National Research Council (1997). Bits of Power: Issues in Global Access to Scientific Data. Washington, D.C: National Academy Press. ISBN 0-309-05635-7. http://www.nap.edu/books/0309056357/html/index.html.  — discusses the international exchange of data in the natural sciences.

Wikimedia Foundation. 2010.
