April Highlight on Data Repositories: KNB

This month, we’d like to highlight KNB, The Knowledge Network for Biocomplexity, and their data management application Morpho: http://knb.ecoinformatics.org/

What is KNB?

KNB is an NSF Knowledge and Distributed Intelligence Program sponsored by the National Center for Ecological Analysis and Synthesis, Texas Tech University, the Long Term Ecological Research Network, and the San Diego Supercomputer Center. This network of federated institutions is dedicated to sharing their data to promote access, discovery, and analysis of data that has been made available for community use and re-use.

What is Morpho?

Morpho is a free data management application that helps researchers create and edit ecological metadata, search and query their own metadata collections, view others’ data and data collections on the KNB, and share and upload their data to the KNB. Morpho uses the Ecological Metadata Language (EML) specification, which was developed by and for the ecology discipline to describe the extremely diverse data it produces.

How does deposit work?

All depositors must register for an account. To register a data set with KNB, you may use Morpho or a web form to supply the necessary metadata. A guide for completing the web form is available here. The EML fields used by KNB greatly improve the reusability and maintainability of the data when filled out as completely as possible.
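
To give a sense of what such a metadata record contains, here is a minimal Python sketch that builds a skeletal EML-style document with the standard library. The namespace URI, the package identifier, and every field value are illustrative assumptions, and the output has not been validated against the EML schema. Morpho or the KNB web form produces the real record.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for EML 2.1.1; check which EML version your repository expects.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"


def build_eml(package_id: str, title: str, surname: str) -> str:
    """Return a skeletal EML-style record as an XML string (illustrative only)."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "knb"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # Creator and contact both point at the same hypothetical person here.
    for role in ("creator", "contact"):
        person = ET.SubElement(dataset, role)
        name = ET.SubElement(person, "individualName")
        ET.SubElement(name, "surName").text = surname
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, used only to show the shape of the record.
record = build_eml("example.1.1", "Hypothetical stream temperature survey", "Doe")
print(record)
```

A complete record would carry far more detail (abstract, coverage, methods, attribute descriptions), which is where the reusability benefits mentioned above come from.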

All of the information provided comes from: http://knb.ecoinformatics.org

Posted on April 25, 2013

March Highlight on Data Repositories: figshare

This month, we’d like to highlight figshare: http://figshare.com/

What is figshare?

figshare is an open data repository founded in January 2011 by Mark Hahnel and supported by Digital Science. Its mission is to make data more easily discoverable, citable, and shareable. figshare partners with the open access publisher Public Library of Science (PLOS) to host supplemental data for its journals and to provide widgets for viewing data inline alongside articles.

How does deposit work?

All depositors must register for an account. Researchers may upload data in any format and are given unlimited space for data made freely available on the site and 1 GB of storage for private data. Public-facing datasets are issued a DOI and licensed under CC0 (“no rights reserved”). All other objects, including posters, papers, and media, are also issued a DOI but are licensed under CC-BY. Version control is supported, so you may revise both your public and private data after upload.

How persistent is the repository?

figshare is partnered with the CLOCKSS Archive to preserve all public content. If figshare goes out of business or its servers experience catastrophic failure, CLOCKSS will trigger and release all materials through the University of Edinburgh and Stanford University.

All of the information provided comes from: http://figshare.com/ and http://www.clockss.org/clockss/Home.

Posted on March 7, 2013

TACC Training: Introduction to Scientific Visualization

February 7, 2013
9 a.m. to 5 p.m.
J.J. Pickle Research Campus
ROC 1.900
10100 Burnet Rd.
Austin, TX 78758

This is an in-person class. There will be no webcast.

On Thursday, February 7, members of the Texas Advanced Computing Center (TACC) Visualization and Data Analysis group will present the training session, Introduction to Scientific Visualization.

Users will receive instruction in using remote visualization software to visualize data sets generated on TACC compute systems. A review of the scientific visualization process will precede an overview of the visualization software available to TACC users, including the parallel visualization applications VisIt and ParaView. Labs will give students the opportunity to prepare data sets to be visualized using these applications. In addition, attendees will be introduced to the Longhorn visualization portal.

For a detailed agenda and registration, please visit the TACC Training page.

Posted on February 4, 2013

NIH Announces Public Access Policy Enforcement

On November 16, 2012, the National Institutes of Health (NIH) announced that, beginning in spring 2013 at the earliest, it will delay processing of non-competing continuation grant awards if publications arising from those awards are not in compliance with the NIH public access policy. Awards will not be processed until publications are in compliance. (NOT-OD-12-160)

What is the NIH Public Access Policy?

The NIH Public Access Policy implements Division G, Title II, Section 218 of PL 110-161 (Consolidated Appropriations Act, 2008). The law states:

“The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.”

The Public Access Policy ensures that the public has access to the published results of NIH-funded research. It requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funds to the digital archive PubMed Central. The Policy requires that these final peer-reviewed manuscripts be accessible to the public on PubMed Central to help advance science and improve human health.

Does the NIH Public Access Policy apply to my paper?

The NIH Public Access Policy applies to any peer-reviewed manuscript that is accepted for publication in a journal on or after April 7, 2008 and arises from:

  • Any direct funding from an NIH grant or cooperative agreement active in Fiscal Year 2008 or later OR
  • Any direct funding from an NIH contract signed on or after April 7, 2008 OR
  • Any direct funding from the NIH Intramural Program OR
  • An NIH employee

Principal Investigators and their institutions are responsible for ensuring that all terms and conditions of awards are met, including the submission of final peer-reviewed manuscripts that arise directly from their awards, even if they are not an author or co-author of the paper.
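
The applicability test above can be sketched as a short Python check. This is only an illustration of the rules as summarized in this post, not an official NIH tool. The `Manuscript` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

POLICY_START = date(2008, 4, 7)  # cutoff for acceptance and for contract signing


@dataclass
class Manuscript:
    """Hypothetical record of the facts needed to apply the rules above."""
    peer_reviewed: bool
    accepted_on: date
    grant_active_fy2008_or_later: bool = False  # NIH grant or cooperative agreement
    contract_signed_on: Optional[date] = None   # NIH contract, if any
    intramural_funding: bool = False            # NIH Intramural Program
    nih_employee_author: bool = False


def policy_applies(m: Manuscript) -> bool:
    """True if the manuscript passes the acceptance test and any one funding test."""
    if not m.peer_reviewed or m.accepted_on < POLICY_START:
        return False
    contract_ok = (
        m.contract_signed_on is not None and m.contract_signed_on >= POLICY_START
    )
    return (
        m.grant_active_fy2008_or_later
        or contract_ok
        or m.intramural_funding
        or m.nih_employee_author
    )


paper = Manuscript(
    peer_reviewed=True,
    accepted_on=date(2012, 11, 1),
    grant_active_fy2008_or_later=True,
)
print(policy_applies(paper))  # True
```

The acceptance date and peer-review conditions must both hold, while the four funding conditions are alternatives, which is why the function combines them with `or`.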

How are papers submitted to PubMed Central?

NIH has agreements with over 1500 journals to automatically deposit final articles into PubMed Central without author involvement. Detailed information about the submission process is available at the National Institutes of Health Public Access site.

How do I avoid delays in funding?

  • Use My NCBI’s “My Bibliography” feature to ensure all papers linked with your NIH award are in compliance.
  • When planning an NIH-funded paper, ensure that arrangements are made for the paper to be submitted to PubMed Central. If there are multiple authors, only one need submit it, though it is the Principal Investigator’s responsibility to ensure compliance.
  • To avoid miscommunication, you may wish to let publishers know that a manuscript is subject to the NIH Public Access Policy before they decide to review it.

If you have questions about NIH requirements, the Data Management team is here to help! Please email datamanagement@lib.utexas.edu.

All of the information provided comes from http://publicaccess.nih.gov/ and Rockey, Sally. National Institutes of Health, “Extramural Nexus: Improving Public Access to Research Results.” http://nexus.od.nih.gov/all/2012/11/16/improving-public-access-to-research-results/.

Posted on November 21, 2012

November Highlight on Data Repositories: ICPSR

This month, we’d like to highlight the Inter-university Consortium for Political and Social Research (ICPSR): http://www.icpsr.umich.edu/

What is ICPSR?

ICPSR was founded in 1962 at the University of Michigan and now exists as a unit within the Institute for Social Research. It is the world’s largest social science data archive, with over 7,000 data collections and 500,000 individual data files that can be browsed by topic or searched. As of fall 2012, it has over 700 member institutions, including the University of Texas at Austin. Data is contributed by individual researchers, government agencies, and research organizations. ICPSR maintains a citation database of data-related literature to facilitate literature searches and the study of data as intellectual output. It is an international leader in data management and digital preservation dedicated to ensuring long-term usability of data.

How does deposit work?

Deposits must include all data and documentation necessary to read and interpret the data collection. For researchers interested in depositing their data, ICPSR’s Guide to Social Science Data Preparation and Archiving describes best practice for preparing data to be shared. ICPSR offers a secure electronic deposit form for researchers to upload and describe their data. More information about deposit is available here. Once data are submitted, data processors review data for confidentiality issues, convert documentation to electronic, PDF/A form, generate multiple data formats for dissemination and preservation, create Data Documentation Initiative-compliant documentation, create a descriptive metadata record, and assign the dataset a Digital Object Identifier. Once deposited, dataset usage can then be tracked through Utilization Reports.

What can be deposited?

Using the online secure deposit form, up to 2 GB can be uploaded. Preferred file formats are as follows:

  • Quantitative data files: SPSS, SAS, Stata
  • Qualitative data files: ASCII, RTF
  • Audio files: AIFF, WAV
  • Video files: MPEG4, JPEG2000
  • Documentation: ASCII, DDI-XML, Microsoft Word (PDF is acceptable)
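
As a rough pre-deposit check, the list above can be turned into a small Python sketch. ICPSR names formats, not file extensions, so the extension sets here are assumptions, as are reading “2 GB” as 2 × 1024³ bytes and treating it as a limit on the whole upload. The function name `check_deposit` is hypothetical.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 2 * 1024**3  # assumes "2 GB" means 2 GiB for the whole upload

# Assumed file extensions for the preferred formats listed above.
PREFERRED_EXTENSIONS = {
    "quantitative": {".sav", ".por", ".sas7bdat", ".xpt", ".dta"},
    "qualitative": {".txt", ".rtf"},
    "audio": {".aiff", ".aif", ".wav"},
    "video": {".mp4", ".mj2"},
    "documentation": {".txt", ".xml", ".doc", ".docx", ".pdf"},
}


def check_deposit(files: dict) -> list:
    """Return warnings for a mapping of file name to size in bytes."""
    warnings = []
    known = set().union(*PREFERRED_EXTENSIONS.values())
    for name in files:
        if Path(name).suffix.lower() not in known:
            warnings.append(f"{name}: not a preferred format")
    total = sum(files.values())
    if total > MAX_UPLOAD_BYTES:
        warnings.append(f"total size {total} bytes exceeds the 2 GB form limit")
    return warnings


# Hypothetical deposit: one Stata file, one codebook, one unrecognized file.
print(check_deposit({"survey.dta": 5_000_000, "codebook.pdf": 800_000, "notes.xyz": 10}))
```

A check like this only flags likely problems. ICPSR staff still review every deposit, and non-preferred formats are not necessarily rejected.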

How does ICPSR manage sensitive data and confidentiality?

ICPSR offers several deposit options for sensitive data.

For traditional restricted data, researchers interested in using the data must belong to member institutions and fill out an application about their research. ICPSR staff then review each request to ensure all security requirements have been fulfilled, and the data are sent by mail on a CD/DVD. For an additional layer of security, ICPSR can send information to an external body for review if necessary.

For highly sensitive data, access can be restricted to on-site analysis at the University of Michigan’s Perry Building enclave. Investigators wishing to use materials restricted in this fashion must sign an Application for Use of the ICPSR Data Enclave and Confidentiality Agreement along with an official of their home institution. At the enclave, only the provided computer can be used, and materials are reviewed for disclosure risk before leaving. All analysis output is evaluated by an ICPSR unit manager and sent to the researcher on a CD/DVD after approval.

For simple analysis of sensitive data, ICPSR offers the Survey Documentation and Analysis statistical package that can evaluate output for disclosure risk before displaying it to the end user. More information is available here.

ICPSR will preserve data under a delayed dissemination model if necessary. They will preserve data until a predetermined release date and distribute normally after that date.

ICPSR is working on a virtual data enclave that will permit remote access to and analysis of sensitive data without allowing researchers to download, copy, or paste it. The analysis output will then be evaluated by ICPSR staff before being released. This virtual data enclave is not yet operational.

More information on confidentiality is available here.

All of the information provided comes from http://www.icpsr.umich.edu/ and Johnson, W. G. (2008). The ICPSR and social science research. Behavioral & Social Sciences Librarian, 140-157. doi:10.1080/01639260802385200.

Posted on November 1, 2012

TACC Training: Using Corral for Research Data Management

October 25, 2012
1 p.m. to 4 p.m.
J.J. Pickle Research Campus
ROC 1.900
10100 Burnet Rd.
Austin, TX 78758

This class will be webcast.

Corral is the research data storage resource provided by TACC as part of the UT System Research Cyberinfrastructure initiative.

In this training, current and prospective users will receive a brief overview of the system, followed by a detailed discussion of common usage modes with examples of how to perform tasks such as uploading data, changing access permissions, and using graphical client tools to manipulate data on Corral.

We will demonstrate both basic file system access and the iRODS data management system, which provides additional metadata management and data collection features for users with complex collections of data. We will also discuss important topics such as open vs. closed web access, data sharing, and the incorporation of Corral into data management plans for NSF and NIH.

The class will consist of a roughly 2-hour presentation and webcast, followed by an opportunity for one-on-one training and consultation with TACC staff.

Registration

Posted on September 27, 2012

August Highlight on Data Repositories: Crystallography Open Database

This month we’d like to highlight the Crystallography Open Database: http://www.crystallography.net/

What is Crystallography Open Database (COD)?

COD is an open access database (started in 2003) containing small molecule/small to medium sized unit cell crystallographic structures of organic, inorganic, metal-organic compounds and minerals. As of August 2012, there are over 200,000 structures in the database. All structures in the database are in the public domain.

How does deposit work?

All newly published structures in peer-reviewed chemical and crystallographic journals are automatically included. Additionally, researchers are invited to submit their unpublished data as a personal communication via the website. Each structure receives a unique seven-digit number called a COD number. COD does not accept duplicate structures, and the deposit software uses a simple algorithm to detect any duplicates. Data in COD are stored in the Crystallographic Information File/Framework (CIF) format. COD also accepts structure-factor (Fobs) files.
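
Because a COD number is simply seven digits, references to structures can be sanity-checked with a few lines of Python. This sketch checks only the format described above. It does not confirm that an entry exists in the database, and the sample values are made up.

```python
import re

# Exactly seven ASCII digits, per the description of COD numbers above.
COD_NUMBER = re.compile(r"[0-9]{7}")


def is_cod_number(value: str) -> bool:
    """Return True if value has the seven-digit shape of a COD number."""
    return COD_NUMBER.fullmatch(value.strip()) is not None


# Made-up examples; none is claimed to be a real COD entry.
for candidate in ["1234567", "123456", "12345678", "12a4567"]:
    print(candidate, is_cod_number(candidate))
```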

Reliability of the database

COD servers reside on mirrored disks that are backed up nightly at four locations: Vilnius, Granada, Caen, and Portland. Regular backup copies of the entire collection are made on DVD and stored offline. COD users also have the option to download the entire COD repository.

All the information provided comes from: http://www.crystallography.net/ and Grazulis, S., Daskevic, A., Merkys, A., Chateigner, D., Lutterotti, L., Quiros, M., Serebryanaya, N.R., Moeck, P., Downs, R.T., & Le Bail, A. (2012). Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Research, 40, D420-D427. Retrieved from http://nar.oxfordjournals.org/content/40/D1/D420.abstract.

Posted on August 22, 2012

TACC Training: Structured Data, Metadata and Provenance in the Context of Scientific Data Management Projects

August 23, 2012 (Thursday)
1 p.m. to 3 p.m. (CT)
J.J. Pickle Research Campus
10100 Burnet Road
ROC 1.469
Austin, TX 78758

This class will be webcast.

Metadata, an essential component of a data management plan, describes data from multiple perspectives and at different levels (i.e., individual data objects and whole collections), providing the documentation needed to manage collections throughout their life cycle. General and domain-specific metadata standards will be reviewed, including those for the description, preservation, and provenance documentation of scientific data collections. The workshop includes methods and tools that aid in the identification and storage of metadata for collection access, interoperability, and analysis. We will cover tools such as Relational Database Management Systems (RDBMS), XML and other structured file types, and Geographic Information Systems (GIS). The course will cover the tools’ basic use and address schema design issues such as integrity and validation, along with import and export. To get the most out of the workshop, have in mind a collection for which you need to develop metadata so you can apply the standards and issues discussed.

Registration

Please submit any questions that you may have via the TACC consulting system: http://portal.tacc.utexas.edu/consulting

Posted on August 13, 2012

July Highlight on Data Repositories: Dryad

This month we’d like to highlight the Dryad data repository: http://datadryad.org

What is Dryad?

Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Goals of Dryad include:

  • Preserve the underlying data reported in a paper at the time of publication
  • Assign globally unique identifiers to datasets, making data citation easier
  • Allow end-users to perform sophisticated searches over data
  • Allow journals and societies to pool their resources for a single, shared repository

Who manages Dryad?

Dryad is governed by a consortium of journals that collaboratively promote data archiving and ensure the sustainability of the repository. Dryad is being developed by the National Evolutionary Synthesis Center and the University of North Carolina Metadata Research Center, in coordination with a large group of Journals and Societies. More information about governance can be found here.

What can be deposited?

Authors may submit tables, spreadsheets, flat files, and all other kinds of data associated with their publications. Dryad accepts data in any format as long as it is associated with a primary publication.

Dryad submitters are required to place all data in the public domain using the Creative Commons Zero Waiver. This allows others to share, copy, or reuse the data. According to scientific norms, those who use the data are still obligated to cite the original creator of the data. By default, data are embargoed until journal article publication. Authors depositing data may choose to embargo the data for a year after publication.

Once the data deposit is complete, the depositor will receive a Digital Object Identifier, or DOI. This is a unique identifier for the data, and it provides a consistent link between your publication and the data associated with it.
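
A DOI becomes a working link when appended to the doi.org resolver address. The Python sketch below shows one way to do that. The validation pattern is a deliberately loose assumption, not the full DOI syntax, and the function name is hypothetical. The sample DOI is the one cited in the ICPSR post above, since this post does not list a Dryad DOI.

```python
import re
from urllib.parse import quote

# Loose pattern: "10.", a registrant code of 4 to 9 digits, a slash, then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI or a 'doi:' string into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if DOI_PATTERN.fullmatch(doi) is None:
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping the slash.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.1080/01639260802385200"))
# https://doi.org/10.1080/01639260802385200
```

Citing the resolver link instead of a repository page address keeps the reference stable even if the repository reorganizes its website.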

A short video about the data deposit process is available.

How sustainable is Dryad?

Data submitted to Dryad are made available for the long term, even beyond the lifespan of Dryad, through continuous backup and replication services.

The Dryad Cost Recovery Plan is based on the framework that emerged from the Dryad Board meeting of July 2011. Download the plan.

The revenue streams described are for recovery of operating costs. Research and development of new capabilities will continue to be funded through project grants, and Dryad will continue to seek support from foundations, government funding bodies, and private donors to support its core mission and reduce costs to users.

All the information provided comes from the Dryad website. To find out more about Dryad, please visit http://datadryad.org

Posted on July 11, 2012

Data Documentation Initiative (DDI) for the Data Librarian

A 2-day workshop for data librarians and archivists involved in data management and researcher support.

Monday, November 12–Tuesday, November 13, 2012
Perry-Castañeda Library
University of Texas at Austin
Austin, Texas

Workshop Fee: $150
(Please note: attendance is limited to 25 participants. Sold out!)

Workshop Description:
DDI is a metadata specification for the social and behavioral sciences. This workshop will focus on the use of DDI-Lifecycle by data librarians and archivists as a means of managing data deposited to their systems and supporting the use of DDI by their research community. The first day of the workshop will provide information in a lecture format providing an overview of the DDI model and its applications and identify specific structures within DDI that support data management, data discovery, data processing and study development. The goal is to provide a sense of the internal structure of DDI and how it relates to the data and metadata lifecycle of research data.

The second day will provide hands-on work with creating DDI content from the perspective of the librarian/archivist and from the perspective of the researcher. The first perspective focuses on transferring existing metadata into DDI and then leveraging the DDI structure to support data management and discovery. The second perspective focuses on the capture of metadata at the point of origin and use of managed metadata to support quality control and communication during the research process.

Who should attend? Data librarians and archivists involved in data management and researcher support, and social science researchers interested in metadata and data curation.

Instructor:
Wendy Thomas
Data Access Core Director, Minnesota Population Center
http://users.pop.umn.edu/~wlt/

Questions? Contact Amy Rushing at a.rushing@austin.utexas.edu

Sponsored by the University of Texas Libraries and the UT Population Research Center

Posted on June 25, 2012
