Category Archives: Data Repository

NASA Launches Public Portal for Research Results

NASA-funded research data can now be accessed through PubSpace, the agency's recently unveiled public web portal. Starting in 2016, all authors of peer-reviewed papers resulting from NASA-funded research will be required to deposit copies of their publications and associated data within one year of publication.

This archive of original science results will be available online without a fee. “Making our research data easier to access will greatly magnify the impact of our research,” said NASA Chief Scientist Ellen Stofan. “As scientists and engineers, we work by building upon a foundation laid by others.”

 

Syracuse University rolls out Qualitative Data Repository (QDR)

The Center for Qualitative and Multi-Method Inquiry at Syracuse University has created a repository for qualitative data. The QDR “selects, ingests, curates, archives, manages, durably preserves, and provides access to digital data used in qualitative and multi-method social inquiry”. Right now the repository is in beta, but there are several pilot projects that users can view. This fall, the QDR hopes to begin offering institutional memberships to colleges, universities and other organizations. Institutional members will be able to access some services that are unavailable to other registered users.

For more information about QDR: https://qdr.syr.edu/

Most raw data from old scientific studies is gone

A recent study in Current Biology found that 90% of the data from papers published more than 20 years ago were inaccessible. The researchers tried to contact the authors of 516 biological studies published between 1991 and 2011 and asked them for their raw data. Some data were stored on obsolete media (such as 3.5-inch floppy disks), some could not be located, some email addresses were no longer active, and some authors simply did not respond. In total, the researchers were able to track down data for only 23% of the published studies.

The results from this study underscore how important it is for researchers to create data management plans, and for funding agencies, journals, and universities to provide resources to help preserve data and provide access to it.

April Highlight on Data Repositories: KNB

This month, we’d like to highlight the Knowledge Network for Biocomplexity (KNB) and its data management application, Morpho: http://knb.ecoinformatics.org/

What is KNB?

KNB is a project of the NSF Knowledge and Distributed Intelligence (KDI) Program, sponsored by the National Center for Ecological Analysis and Synthesis, Texas Tech University, the Long Term Ecological Research Network, and the San Diego Supercomputer Center. This network of federated institutions is dedicated to sharing data to promote the access, discovery, and analysis of data that have been made available for community use and re-use.

What is Morpho?

Morpho is a free data management application that helps researchers create and edit ecological metadata, search and query their own metadata collections, view others’ data and data collections on the KNB, and share and upload their data to the KNB. Morpho uses the Ecological Metadata Language (EML) specification, which was developed by and for the ecology discipline to describe the extremely diverse data the discipline produces.

How does deposit work?

All depositors must register for an account. To register a data set with KNB, you may use Morpho or a web form to supply the necessary metadata. A guide for completing the web form is available here. The EML fields used by KNB greatly improve the reusability and maintainability of the data when filled out as completely as possible.
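To give a feel for what EML metadata looks like, here is a minimal sketch that builds a skeleton EML record with Python's standard library. It assumes the EML 2.1.1 namespace and fills only a dataset title, a creator, and a contact; the package ID and names are hypothetical, and the output has not been validated against the EML schema. A record submitted through Morpho or the KNB web form would normally carry far more detail.

```python
import xml.etree.ElementTree as ET

# Assumption: EML 2.1.1 namespace; newer EML versions use a different URI.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)


def _add_person(parent, role, surname):
    """Append a <creator> or <contact> element holding an individual's surname."""
    person = ET.SubElement(parent, role)
    name = ET.SubElement(person, "individualName")
    ET.SubElement(name, "surName").text = surname


def build_eml(package_id, title, creator_surname, contact_surname, system="knb"):
    """Return a skeleton EML document as a string (hypothetical minimal example)."""
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": system})
    # Child elements in EML are unqualified, so they carry no namespace prefix.
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    _add_person(dataset, "creator", creator_surname)
    _add_person(dataset, "contact", contact_surname)
    return ET.tostring(root, encoding="unicode")


print(build_eml("example.1.1", "Stream temperatures, 2010-2012", "Smith", "Jones"))
```

Filling in more of the EML fields (coverage, methods, attribute definitions) is what makes a data set easier to reuse, as noted above.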

All of the information provided comes from: http://knb.ecoinformatics.org

March Highlight on Data Repositories: figshare

This month, we’d like to highlight figshare: http://figshare.com/

What is figshare?

figshare is an open data repository founded in January 2011 by Mark Hahnel and supported by Digital Science. Its mission is to make data more easily discoverable, citable, and shareable. figshare has partnered with the open access publisher Public Library of Science (PLOS) to host supplemental data for PLOS journals and to provide widgets that display data inline alongside articles.

How does deposit work?

All depositors must register for an account. Researchers may upload data in any format and are given unlimited space for data made freely available on the site, plus 1 GB of storage for private data. Public-facing datasets are issued a DOI and licensed under CC0 (no rights reserved). All other objects, including posters, papers, and media, are also issued a DOI but are licensed under CC-BY. Version control is supported, so you can update both your public and your private data.
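The deposit rules above can be summarized in two small helpers. This is a hypothetical sketch, not part of any figshare API: it encodes the licensing rule (CC0 for public datasets, CC-BY for other public objects) and the private storage allowance, assuming "1 GB" means 1024**3 bytes.

```python
# Hypothetical helpers encoding the figshare rules described above.
# Assumption: "1 GB" of private storage is read as 1024**3 bytes.
PRIVATE_QUOTA_BYTES = 1024**3


def public_license(item_type):
    """License applied to a public item: CC0 for datasets, CC-BY for everything else."""
    return "CC0" if item_type == "dataset" else "CC-BY"


def fits_private_quota(used_bytes, new_file_bytes, quota=PRIVATE_QUOTA_BYTES):
    """True if a new private file still fits within the private storage allowance."""
    return used_bytes + new_file_bytes <= quota


print(public_license("dataset"), public_license("poster"))
print(fits_private_quota(used_bytes=900 * 1024**2, new_file_bytes=200 * 1024**2))
```

Public data need no quota check, since space for freely available data is unlimited.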

How persistent is the repository?

figshare has partnered with the CLOCKSS Archive to preserve all public content. If figshare goes out of business or its servers suffer a catastrophic failure, CLOCKSS will trigger the release of all materials through the University of Edinburgh and Stanford University.

All of the information provided comes from: http://figshare.com/ and http://www.clockss.org/clockss/Home.

November Highlight on Data Repositories: ICPSR

This month, we’d like to highlight the Inter-university Consortium for Political and Social Research (ICPSR): http://www.icpsr.umich.edu/

What is ICPSR?

ICPSR was founded in 1962 at the University of Michigan and now exists as a unit within the Institute for Social Research. It is the world’s largest social science data archive, with over 7,000 data collections and 500,000 individual data files that can be browsed by topic or searched. As of fall 2012, it has over 700 member institutions, including the University of Texas at Austin. Data are contributed by individual researchers, government agencies, and research organizations. ICPSR maintains a citation database of data-related literature to facilitate literature searches and the study of data as intellectual output. It is an international leader in data management and digital preservation, dedicated to ensuring the long-term usability of data.

How does deposit work?

Deposits must include all the data and documentation necessary to read and interpret the data collection. For researchers interested in depositing their data, ICPSR’s Guide to Social Science Data Preparation and Archiving describes best practices for preparing data to be shared. ICPSR offers a secure electronic deposit form for researchers to upload and describe their data. More information about deposit is available here.

Once data are submitted, data processors review them for confidentiality issues, convert the documentation to electronic PDF/A format, generate multiple data formats for dissemination and preservation, create documentation compliant with the Data Documentation Initiative (DDI), create a descriptive metadata record, and assign the dataset a Digital Object Identifier (DOI). After deposit, usage of the dataset can be tracked through Utilization Reports.

What can be deposited?

Up to 2 GB can be uploaded through the secure online deposit form. Preferred file formats are as follows:

  • Quantitative data files: SPSS, SAS, Stata
  • Qualitative data files: ASCII, RTF
  • Audio files: AIFF, WAV
  • Video files: MPEG4, JPEG2000
  • Documentation: ASCII, DDI-XML, Microsoft Word (PDF is acceptable)
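Before uploading, a depositor could check files against the list above. The sketch below is a hypothetical pre-deposit checker, not an ICPSR tool: the mapping from format names to file extensions is an assumption (for example, .dta for Stata and .sav/.por for SPSS), and it treats the 2 GB limit as applying to the whole submission, read as 2 × 1024**3 bytes.

```python
from pathlib import Path

# Assumption: the 2 GB form limit applies to the total upload, in binary gigabytes.
MAX_UPLOAD_BYTES = 2 * 1024**3

# Hypothetical extension mapping for the preferred formats listed above.
PREFERRED = {
    "quantitative": {".sav", ".por", ".sas7bdat", ".xpt", ".dta"},  # SPSS, SAS, Stata
    "qualitative": {".txt", ".rtf"},                                # ASCII, RTF
    "audio": {".aif", ".aiff", ".wav"},                             # AIFF, WAV
    "video": {".mp4", ".mj2", ".jp2"},                              # MPEG4, JPEG2000
    "documentation": {".txt", ".xml", ".doc", ".docx", ".pdf"},     # ASCII, DDI-XML, Word, PDF
}


def check_deposit(files):
    """files: list of (filename, category, size_in_bytes). Returns a list of problems found."""
    problems = []
    total = 0
    for name, category, size in files:
        total += size
        ext = Path(name).suffix.lower()
        if ext not in PREFERRED.get(category, set()):
            problems.append(f"{name}: {ext or 'no extension'} is not a preferred {category} format")
    if total > MAX_UPLOAD_BYTES:
        problems.append(f"total size {total} bytes exceeds the 2 GB form limit")
    return problems


print(check_deposit([("survey.dta", "quantitative", 10_000), ("notes.docx", "qualitative", 500)]))
```

An empty list means every file matches a preferred format and the submission fits under the limit; non-preferred formats may still be accepted, so the warnings are advisory.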

How does ICPSR manage sensitive data and confidentiality?

ICPSR offers several deposit options for sensitive data.

For traditional restricted data, researchers who want to use the data must belong to a member institution and fill out an application describing their research. ICPSR staff review these requests to ensure that all security requirements have been met, and the data are then sent by mail on a CD/DVD. For an additional layer of security, ICPSR can send information to an external body for review if necessary.

Highly sensitive data can be restricted to on-site analysis only, at the University of Michigan’s Perry Building enclave. Investigators who wish to use materials restricted in this way must sign an Application for Use of the ICPSR Data Enclave and Confidentiality Agreement, together with an official of their home institution. At the enclave, only the provided computer can be used, and materials are reviewed for disclosure risk before they leave. All analysis output is evaluated by an ICPSR unit manager and, after approval, sent to the researcher on a CD/DVD.

For simple analysis of sensitive data, ICPSR offers the Survey Documentation and Analysis statistical package that can evaluate output for disclosure risk before displaying it to the end user. More information is available here.
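The post does not describe how the Survey Documentation and Analysis package judges disclosure risk. As a hedged illustration of the general idea only, the sketch below applies a common generic rule, small-cell suppression, to a frequency table: any category with fewer than a threshold number of respondents is hidden. The threshold of 5 is an arbitrary assumption, and this is not SDA's actual method.

```python
from collections import Counter


def frequency_table_with_suppression(values, min_cell=5):
    """Tabulate values, hiding any count below min_cell (a generic small-cell rule).

    min_cell=5 is an illustrative assumption, not an ICPSR or SDA setting.
    """
    counts = Counter(values)
    return {category: (n if n >= min_cell else "suppressed")
            for category, n in sorted(counts.items())}


responses = ["urban"] * 40 + ["suburban"] * 25 + ["rural"] * 3
print(frequency_table_with_suppression(responses))
```

Suppressing small cells matters because a count of one or two in a narrow category can point to an identifiable person.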

If necessary, ICPSR will preserve data under a delayed dissemination model: the data are held until a predetermined release date and distributed normally after that date.

ICPSR is working on a virtual data enclave that will permit remote access to and analysis of sensitive data, which researchers will not be able to download, copy, or paste. ICPSR staff will evaluate the analysis output before it is released. This virtual data enclave is not yet operational.

More information on confidentiality is available here.

All of the information provided comes from http://www.icpsr.umich.edu/ and Johnson, W. G. (2008). The ICPSR and social science research. Behavioral & Social Sciences Librarian, 140-157. doi:10.1080/01639260802385200.

August Highlight on Data Repositories: Crystallography Open Database

This month we’d like to highlight the Crystallography Open Database: http://www.crystallography.net/

What is Crystallography Open Database (COD)?

COD is an open-access database, started in 2003, of crystallographic structures of small molecules and of compounds with small to medium-sized unit cells, including organic, inorganic, and metal-organic compounds and minerals. As of August 2012, the database contains over 200,000 structures. All structures in the database are in the public domain.

How does deposit work?

All newly published structures in peer-reviewed chemical and crystallographic journals are automatically included. Researchers are also invited to submit their unpublished data as a personal communication via the website. Each structure receives a unique seven-digit identifier called a COD number. COD does not accept duplicate structures; the deposit software uses a simple algorithm to detect duplicates. Data in COD are stored in the Crystallographic Information File/Framework (CIF) format. COD also accepts structure-factor (Fobs) files.
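COD's duplicate-detection algorithm is not described here, so the following is only a hypothetical heuristic in the same spirit. It reads simple single-line tag/value pairs from CIF text (ignoring loops and multi-line values), then flags two structures as likely duplicates when their chemical formulas match and their unit-cell parameters agree within assumed tolerances. It also checks that an identifier looks like a seven-digit COD number. A real check would need to handle different cell settings, temperatures, and full CIF syntax.

```python
import re

CELL_TAGS = ("_cell_length_a", "_cell_length_b", "_cell_length_c",
             "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")


def is_cod_number(s):
    """True if s looks like a seven-digit COD number."""
    return re.fullmatch(r"\d{7}", s) is not None


def read_simple_cif(text):
    """Collect single-line '_tag value' pairs; loops and multi-line values are ignored."""
    data = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            data[parts[0]] = parts[1].strip().strip("'\"")
    return data


def _number(value):
    # CIF numbers may carry a standard uncertainty in parentheses, e.g. 5.4307(2).
    return float(value.split("(")[0])


def looks_like_duplicate(cif_a, cif_b, length_tol=0.01, angle_tol=0.1):
    """Hypothetical heuristic: same formula and cell parameters within tolerance."""
    a, b = read_simple_cif(cif_a), read_simple_cif(cif_b)
    if a.get("_chemical_formula_sum") != b.get("_chemical_formula_sum"):
        return False
    for tag in CELL_TAGS:
        if tag not in a or tag not in b:
            return False
        tol = angle_tol if "angle" in tag else length_tol
        if abs(_number(a[tag]) - _number(b[tag])) > tol:
            return False
    return True
```

The tolerances (0.01 Å for lengths, 0.1° for angles) are illustrative assumptions, not COD settings.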

Reliability of the database

COD servers reside on mirrored disks that are backed up nightly at four locations: Vilnius, Granada, Caen, and Portland. Regular backup copies of the entire collection are made on DVD and stored offline. COD users also have the option to download the entire COD repository.

All the information provided comes from: http://www.crystallography.net/ and Grazulis, S., Daskevic, A., Merkys, A., Chateigner, D., Lutterotti, L., Quiros, M., Serebryanaya, N.R., Moeck, P., Downs, R.T., & Le Bail, A. (2012). Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Research, 40, D420-D427. Retrieved from http://nar.oxfordjournals.org/content/40/D1/D420.abstract.

July Highlight on Data Repositories: Dryad

This month we’d like to highlight the Dryad data repository: http://datadryad.org

What is Dryad?

Dryad is an international repository of the data underlying peer-reviewed articles in the basic and applied biosciences. Dryad’s goals include:

  • Preserving the underlying data reported in a paper at the time of publication
  • Assigning globally unique identifiers to datasets, making data citation easier
  • Allowing end users to perform sophisticated searches over data
  • Allowing journals and societies to pool their resources for a single, shared repository

Who manages Dryad?

Dryad is governed by a consortium of journals that collaboratively promote data archiving and ensure the sustainability of the repository. Dryad is being developed by the National Evolutionary Synthesis Center and the University of North Carolina Metadata Research Center, in coordination with a large group of journals and societies. More information about governance can be found here.

What can be deposited?

Authors may submit tables, spreadsheets, flat files, and all other kinds of data associated with their publications. Dryad accepts data in any format as long as it is associated with a primary publication.

Dryad submitters are required to place all data in the public domain using the Creative Commons Zero (CC0) waiver. This allows others to share, copy, or reuse the data. Under scientific norms, those who use the data are still obligated to cite the original creators of the data. By default, data are embargoed until the associated journal article is published. Depositing authors may choose to extend the embargo to one year after publication.

Once the deposit is complete, the depositor receives a Digital Object Identifier (DOI). This unique identifier for the data provides a persistent link between your publication and the data associated with it.
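The embargo and DOI rules described above lend themselves to a small worked example. This is a hypothetical sketch, not a Dryad tool: it computes when deposited data become public (at article publication by default, or one year later if the optional embargo is chosen) and turns a bare DOI into a link through the doi.org resolver. The example DOI is a made-up placeholder, and mapping February 29 to February 28 of the following year is an assumption.

```python
from datetime import date


def release_date(publication, one_year_embargo=False):
    """Date the data become public: at publication, or one year later if embargoed."""
    if not one_year_embargo:
        return publication
    try:
        return publication.replace(year=publication.year + 1)
    except ValueError:
        # Feb 29 has no counterpart next year; assume the embargo ends on Feb 28.
        return publication.replace(year=publication.year + 1, day=28)


def doi_url(doi):
    """Turn a bare DOI into a resolvable link through doi.org."""
    return "https://doi.org/" + doi


print(release_date(date(2012, 7, 15), one_year_embargo=True))  # 2013-07-15
print(doi_url("10.5061/dryad.example"))  # hypothetical placeholder DOI
```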

A short video about the data deposit process is available.

How sustainable is Dryad?

Data submitted to Dryad are made available for the long term, even beyond the lifespan of Dryad, through continuous backup and replication services.

The Dryad Cost Recovery Plan is based on the framework that emerged from the Dryad Board meeting of July 2011. Download the plan.

The revenue streams described are for recovery of operating costs. Research and development of new capabilities will continue to be funded through project grants, and Dryad will continue to seek support from foundations, government funding bodies, and private donors to support its core mission and reduce costs to users.

All the information provided comes from the Dryad website. To find out more about Dryad, please visit http://datadryad.org