Category Archives: Data Repository

Featured Project: Digital Rocks Portal

Digital Rocks is a data portal for fast storage and retrieval, sharing, organization, and analysis of images of varied porous microstructures. It aims to enhance research resources for modeling and predicting porous material properties in petroleum, civil, and environmental engineering, as well as geology.

The platform supports managing, preserving, visualizing, and performing basic analysis of available images of porous materials and the experiments performed on them, along with any accompanying measurements (porosity, capillary pressure, permeability, electrical, NMR and elastic properties, etc.) required both for validating modeling approaches and for upscaling and building larger (hydro)geological models.
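Porosity is the simplest of the measurements listed above to derive directly from an image. As a minimal sketch, assuming the image has already been segmented so that pore voxels carry label 1 and solid voxels label 0, porosity is just the fraction of pore voxels. The function name and labeling convention are hypothetical illustrations, not part of the Digital Rocks portal.

```python
from typing import Sequence

# A segmented volume indexed as volume[z][y][x]; each voxel holds a phase label.
Volume = Sequence[Sequence[Sequence[int]]]


def porosity(volume: Volume, pore_label: int = 1) -> float:
    """Return the fraction of voxels whose label equals pore_label."""
    total = 0
    pores = 0
    for plane in volume:
        for row in plane:
            for voxel in row:
                total += 1
                if voxel == pore_label:
                    pores += 1
    if total == 0:
        raise ValueError("volume contains no voxels")
    return pores / total


# Toy 4x4x4 solid block (label 0) with a 2x2x2 pore (label 1) in the middle.
volume = [
    [[1 if 1 <= z <= 2 and 1 <= y <= 2 and 1 <= x <= 2 else 0 for x in range(4)]
     for y in range(4)]
    for z in range(4)
]
print(porosity(volume))  # 8 pore voxels / 64 voxels = 0.125
```

Real micro-CT volumes are large, so an array library would be the natural choice in practice; plain nested lists keep this sketch dependency-free.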

Read more about the project: https://www.tacc.utexas.edu/-/digital-rock-physics-helps-scientists-understand-porous-media

NASA Launches Public Portal for Research Results

NASA-funded research data can now be accessed via the agency’s recently unveiled public web portal, PubSpace. All authors of peer-reviewed papers resulting from research funded by NASA, starting in 2016, will be required to deposit copies of publications and associated data within one year of publication.

This archive of original science results will be available online without a fee. “Making our research data easier to access will greatly magnify the impact of our research,” said NASA Chief Scientist Ellen Stofan. “As scientists and engineers, we work by building upon a foundation laid by others.”

Syracuse University rolls out Qualitative Data Repository (QDR)

The Center for Qualitative and Multi-Method Inquiry at Syracuse University has created a repository for qualitative data. The QDR “selects, ingests, curates, archives, manages, durably preserves, and provides access to digital data used in qualitative and multi-method social inquiry.” The repository is currently in beta, but users can already view several pilot projects. This fall, QDR hopes to begin offering institutional memberships to colleges, universities, and other organizations; institutional members will have access to some services that are unavailable to other registered users.

For more information about QDR: https://qdr.syr.edu/

Most raw data from old scientific studies is gone

A recent study published in Current Biology found that 90% of data from papers published more than 20 years ago were inaccessible. The researchers tried to contact the authors of 516 biological studies published between 1991 and 2011 and asked them for their raw data. Some data were stored on obsolete media (such as 3.5-inch floppy disks), some could not be located, some authors’ email addresses were no longer active, and some authors simply didn’t respond. In total, the researchers were able to track down data for only 23% of the published studies.

The results from this study underscore how important it is for researchers to create data management plans, and for funding agencies, journals, and universities to provide resources to help preserve data and provide access to it.

April Highlight on Data Repositories: KNB

This month, we’d like to highlight KNB, the Knowledge Network for Biocomplexity, and its data management application, Morpho: http://knb.ecoinformatics.org/

What is KNB?

KNB is a project of the NSF Knowledge and Distributed Intelligence Program, sponsored by the National Center for Ecological Analysis and Synthesis, Texas Tech University, the Long Term Ecological Research Network, and the San Diego Supercomputer Center. This network of federated institutions is dedicated to sharing data for community use and re-use, promoting access to, discovery of, and analysis of those data.

What is Morpho?

Morpho is a free data management application that helps researchers create and edit ecological metadata, search and query their own metadata collections, view others’ data and data collections on the KNB, and share and upload their data onto the KNB. Morpho uses the Ecological Metadata Language (EML) specification which was developed by and for the ecology discipline to describe the extremely diverse data produced by the discipline.
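To give a sense of what EML metadata looks like, here is a minimal sketch that builds a bare-bones EML document with Python’s standard library. It assumes EML 2.1.1 and includes only a dataset title, creator, and contact; the package identifier, the `system` value, and all field values are hypothetical, and a real KNB record would carry much richer metadata.

```python
import xml.etree.ElementTree as ET

# EML version assumed for this sketch.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)


def minimal_eml(package_id: str, title: str, given: str, surname: str, email: str) -> str:
    """Build a bare-bones EML document with a dataset title, creator, and contact."""
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "knb"})
    # Child elements are unqualified (no namespace), as in EML instance documents.
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # The schema expects creator before contact; here one person fills both roles.
    for role in ("creator", "contact"):
        party = ET.SubElement(dataset, role)
        name = ET.SubElement(party, "individualName")
        ET.SubElement(name, "givenName").text = given
        ET.SubElement(name, "surName").text = surname
        ET.SubElement(party, "electronicMailAddress").text = email
    return ET.tostring(root, encoding="unicode")


print(minimal_eml("example.1.1", "Soil respiration plots, 2012",
                  "Jane", "Doe", "jane.doe@example.org"))
```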

How does deposit work?

All depositors must register for an account. To register a dataset with KNB, you can supply the necessary metadata through either Morpho or a web form; a guide for completing the web form is available on the KNB website. Filling out KNB’s EML fields as completely as possible greatly improves the reusability and maintainability of the data.

All of the information provided comes from: http://knb.ecoinformatics.org

March Highlight on Data Repositories: figshare

This month, we’d like to highlight figshare: http://figshare.com/

What is figshare?

figshare is an open data repository founded in January 2011 by Mark Hahnel and supported by Digital Science. Its mission is to make data more easily discoverable, citable, and shareable. figshare has partnered with the open access publisher Public Library of Science (PLOS) to host supplemental data for PLOS journals and to provide widgets that display data inline alongside articles.

How does deposit work?

All depositors must register for an account. Researchers may upload data in any format and receive unlimited space for data made freely available on the site, plus 1 GB of storage for private data. Public datasets are issued a DOI and licensed under CC0 (no rights reserved). All other objects, including posters, papers, and media, are also issued a DOI but are licensed under CC-BY. Versioning is supported, so you can update both public and private data.

How persistent is the repository?

figshare has partnered with the CLOCKSS Archive to preserve all public content. If figshare goes out of business or its servers suffer a catastrophic failure, CLOCKSS will trigger the release of all archived materials through the University of Edinburgh and Stanford University.

All of the information provided comes from: http://figshare.com/ and http://www.clockss.org/clockss/Home.

November Highlight on Data Repositories: ICPSR

This month, we’d like to highlight the Inter-university Consortium for Political and Social Research (ICPSR): http://www.icpsr.umich.edu/

What is ICPSR?

ICPSR was founded in 1962 at the University of Michigan and now exists as a unit within the Institute for Social Research. It is the world’s largest social science data archive, with over 7,000 data collections and 500,000 individual data files that can be browsed by topic or searched. As of fall 2012, it has over 700 member institutions, including the University of Texas at Austin. Data is contributed by individual researchers, government agencies, and research organizations. ICPSR maintains a citation database of data-related literature to facilitate literature searches and the study of data as intellectual output. It is an international leader in data management and digital preservation dedicated to ensuring long-term usability of data.

How does deposit work?

Deposits must include all data and documentation necessary to read and interpret the data collection. ICPSR’s Guide to Social Science Data Preparation and Archiving describes best practices for preparing data to be shared, and a secure electronic deposit form lets researchers upload and describe their data. More information about depositing is available on the ICPSR website. Once data are submitted, data processors review them for confidentiality issues, convert documentation to electronic PDF/A format, generate multiple data formats for dissemination and preservation, create Data Documentation Initiative (DDI)-compliant documentation, create a descriptive metadata record, and assign the dataset a Digital Object Identifier (DOI). After deposit, usage of the dataset can be tracked through Utilization Reports.

What can be deposited?

The online secure deposit form accepts uploads of up to 2 GB. Preferred file formats are as follows:

  • Quantitative data files: SPSS, SAS, Stata
  • Qualitative data files: ASCII, RTF
  • Audio files: AIFF, WAV
  • Video files: MPEG4, JPEG2000
  • Documentation: ASCII, DDI-XML, Microsoft Word (PDF is acceptable)
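Before uploading, a depositor might want to check a planned deposit against the size limit and preferred formats listed above. The sketch below does this for a dictionary of file names and sizes. The extension-to-format mapping and the reading of "2 GB" as 2 GiB are assumptions made for illustration; ICPSR does not publish this exact check.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from file extensions to the preferred formats listed above.
PREFERRED_EXTENSIONS = {
    ".sav": "SPSS", ".por": "SPSS",
    ".sas7bdat": "SAS", ".xpt": "SAS",
    ".dta": "Stata",
    ".txt": "ASCII", ".rtf": "RTF",
    ".aif": "AIFF", ".aiff": "AIFF", ".wav": "WAV",
    ".mp4": "MPEG4", ".jp2": "JPEG2000", ".mj2": "JPEG2000",
    ".xml": "DDI-XML", ".doc": "Microsoft Word", ".docx": "Microsoft Word",
    ".pdf": "PDF",
}
UPLOAD_LIMIT_BYTES = 2 * 1024**3  # "2 GB", assumed here to mean 2 GiB


def check_deposit(files: Dict[str, int]) -> List[str]:
    """Return warnings for a planned deposit given {file name: size in bytes}."""
    warnings = []
    total = sum(files.values())
    if total > UPLOAD_LIMIT_BYTES:
        warnings.append(f"total size {total} bytes exceeds the 2 GB form limit")
    for name in sorted(files):
        ext = Path(name).suffix.lower()
        if ext not in PREFERRED_EXTENSIONS:
            label = ext if ext else "(none)"
            warnings.append(f"{name}: extension '{label}' is not a preferred format")
    return warnings


print(check_deposit({"survey.dta": 52_000_000, "codebook.pdf": 1_200_000,
                     "notes.pages": 40_000}))
# ["notes.pages: extension '.pages' is not a preferred format"]
```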

How does ICPSR manage sensitive data and confidentiality?

ICPSR offers several deposit options for sensitive data.

For traditional restricted data, researchers who want to use the data must belong to a member institution and fill out an application describing their research. ICPSR staff review each request to ensure all security requirements are met, and approved data are then mailed on CD/DVD. For an additional layer of security, ICPSR can send information to an external body for review if necessary.

For highly sensitive data, access can be restricted to on-site analysis at the University of Michigan’s Perry Building enclave. Investigators wishing to use materials restricted in this fashion must sign, together with an official of their home institution, an Application for Use of the ICPSR Data Enclave and Confidentiality Agreement. At the enclave, researchers may use only the computer provided, and materials are reviewed for disclosure risk before leaving. All analysis output is evaluated by an ICPSR unit manager and sent to the researcher on CD/DVD after approval.

For simple analysis of sensitive data, ICPSR offers the Survey Documentation and Analysis (SDA) statistical package, which can evaluate output for disclosure risk before displaying it to the end user. More information is available on the ICPSR website.
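The disclosure rules used by Survey Documentation and Analysis (SDA) are not described here. As a hedged illustration of the general idea only, the sketch below applies a common minimum-cell-size rule: any frequency-table cell whose count falls below a threshold is suppressed before the table is shown. The threshold of 5 and the function name are assumptions, not ICPSR or SDA settings.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def suppress_small_cells(values: Iterable[str], min_cell: int = 5) -> Dict[str, Optional[int]]:
    """Tabulate category counts, replacing any count below min_cell with None."""
    counts = Counter(values)
    return {category: (n if n >= min_cell else None)
            for category, n in sorted(counts.items())}


responses = ["employed"] * 12 + ["retired"] * 3 + ["student"] * 7
print(suppress_small_cells(responses))
# {'employed': 12, 'retired': None, 'student': 7}
```

A production system would also need to stop suppressed cells from being recovered through row and column totals (complementary suppression), which this sketch ignores.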

If necessary, ICPSR will preserve data under a delayed dissemination model: the data are held until a predetermined release date and distributed normally after that date.

ICPSR is working on a virtual data enclave that would permit remote access to and analysis of sensitive data without letting researchers download, copy, or paste it. ICPSR staff would evaluate analysis output before releasing it. This virtual data enclave is not yet operational.

More information on confidentiality is available on the ICPSR website.

All of the information provided comes from: http://www.icpsr.umich.edu/ and Johnson, W. G. (2008). The ICPSR and social science research. Behavioral & Social Sciences Librarian, 140-157. doi:10.1080/01639260802385200.

August Highlight on Data Repositories: Crystallography Open Database

This month we’d like to highlight the Crystallography Open Database: http://www.crystallography.net/

What is the Crystallography Open Database (COD)?

COD, started in 2003, is an open-access database of crystal structures of small molecules and of small to medium-sized unit cells, covering organic, inorganic, and metal-organic compounds and minerals. As of August 2012, the database contained over 200,000 structures, all of them in the public domain.

How does deposit work?

All newly published structures in peer-reviewed chemical and crystallographic journals are automatically included. Researchers are also invited to submit unpublished data as a personal communication via the website. Each structure receives a unique seven-digit identifier called a COD number. COD does not accept duplicate structures; the deposit software uses a simple algorithm to detect them. Data in COD are stored in the Crystallographic Information File/Framework (CIF) format, and COD also accepts structure-factor (Fobs) files.
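COD’s actual duplicate-detection algorithm is not described above. As a hypothetical sketch of what a "simple" check could look like, the code below reads a few single-line tags from two CIF files and treats them as likely duplicates when their chemical formulas match and all six unit-cell parameters agree within a relative tolerance. The tag names (`_chemical_formula_sum`, `_cell_length_a`, `_cell_angle_alpha`, etc.) are standard CIF data names; the matching rule, the 1% tolerance, and the helper names are assumptions, and both files are assumed to contain all six cell parameters.

```python
import re
from typing import Dict

CELL_TAGS = ("_cell_length_a", "_cell_length_b", "_cell_length_c",
             "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")


def parse_cif_tags(text: str) -> Dict[str, str]:
    """Collect single-line `_tag value` pairs; loops and multi-line text fields are not handled."""
    tags: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            # CIF data names are case-insensitive, so normalise them to lower case.
            tags[parts[0].lower()] = parts[1].strip().strip("'\"")
    return tags


def cif_number(value: str) -> float:
    """Convert a CIF number such as '5.4307(2)' to a float, dropping the uncertainty."""
    return float(re.sub(r"\(\d+\)$", "", value))


def looks_like_duplicate(cif_a: str, cif_b: str, rel_tol: float = 0.01) -> bool:
    """True if formulas match and all six cell parameters agree within rel_tol."""
    a, b = parse_cif_tags(cif_a), parse_cif_tags(cif_b)
    formula_a = sorted(a.get("_chemical_formula_sum", "").split())
    formula_b = sorted(b.get("_chemical_formula_sum", "").split())
    if not formula_a or formula_a != formula_b:
        return False
    for tag in CELL_TAGS:
        x, y = cif_number(a[tag]), cif_number(b[tag])
        if abs(x - y) > rel_tol * max(abs(x), abs(y)):
            return False
    return True


def cubic_cif(name: str, formula: str, a: str) -> str:
    """Build a tiny illustrative CIF for a cubic cell with edge length a."""
    return (f"data_{name}\n_chemical_formula_sum '{formula}'\n"
            f"_cell_length_a {a}\n_cell_length_b {a}\n_cell_length_c {a}\n"
            "_cell_angle_alpha 90\n_cell_angle_beta 90\n_cell_angle_gamma 90\n")


si_1 = cubic_cif("si_1", "Si", "5.4307(2)")
si_2 = cubic_cif("si_2", "Si", "5.431")
nacl = cubic_cif("nacl", "Cl Na", "5.640")
print(looks_like_duplicate(si_1, si_2))  # True
print(looks_like_duplicate(si_1, nacl))  # False
```

Comparing formula tokens as a sorted list makes the check independent of element order; a real system would also need to compare space groups and atomic positions.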

Reliability of the database

COD servers reside on mirrored disks that are backed up nightly at four locations: Vilnius, Granada, Caen, and Portland. Regular backup copies of the entire collection are made on DVD and stored offline. COD users also have the option to download the entire COD repository.

All the information provided comes from: http://www.crystallography.net/ and Grazulis, S., Daskevic, A., Merkys, A., Chateigner, D., Lutterotti, L., Quiros, M., Serebryanaya, N.R., Moeck, P., Downs, R.T., & Le Bail, A. (2012). Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Research, 40, D420-D427. Retrieved from http://nar.oxfordjournals.org/content/40/D1/D420.abstract.