TACC Training: Structured Data, Metadata and Provenance in the Context of Scientific Data Management Projects

August 23, 2012 (Thursday)
1 p.m. to 3 p.m. (CT)
J.J. Pickle Research Campus
10100 Burnet Road
ROC 1.469
Austin, TX 78758

This class will be webcast.

Metadata, an essential component of any Data Management Plan, describes data from multiple perspectives and at different levels (e.g., individual data objects and whole collections), providing essential documentation for managing collections throughout their life cycle. General and domain-specific metadata standards will be reviewed, including those for the description, preservation, and provenance documentation of scientific data collections. The workshop includes methods and tools that aid in the identification and storage of metadata for collection access, interoperability, and analysis. We will cover tools such as Relational Database Management Systems (RDBMS), XML and other structured file types, and Geographic Information Systems (GIS). The course will cover the tools’ basic use and address schema-design issues such as integrity, validation, and import/export. To take better advantage of the workshop, have in mind a collection for which you need to develop metadata, so you can apply the standards and practices that will be discussed.
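As a sketch of the kind of structured metadata the workshop covers, the snippet below builds a minimal Dublin Core-style record using only Python's standard library. The element names come from the Dublin Core vocabulary, but the field selection and values are illustrative, not a prescribed schema.

```python
# Build a minimal Dublin Core-style metadata record for a dataset.
# Field values here are illustrative only.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the "dc:" prefix

def make_record(title, creator, date, fmt):
    """Return a simple per-dataset metadata record as an XML element."""
    record = ET.Element("record")
    for tag, value in [("title", title), ("creator", creator),
                       ("date", date), ("format", fmt)]:
        ET.SubElement(record, f"{{{DC_NS}}}{tag}").text = value
    return record

record = make_record("Example survey data", "Doe, Jane",
                     "2012-06-01", "text/csv")
print(ET.tostring(record, encoding="unicode"))
```

A record like this can be validated against an XML schema and exported or imported alongside the data it describes, which is the kind of integrity and exchange issue the course addresses.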


Please submit any questions that you may have via the TACC consulting system: http://portal.tacc.utexas.edu/consulting

July Highlight on Data Repositories: Dryad

This month we’d like to highlight the Dryad data repository: http://datadryad.org

What is Dryad?

Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Goals of Dryad include:

  • Preserve the underlying data reported in a paper at the time of publication
  • Assign globally unique identifiers to datasets, making data citation easier
  • Allow end users to perform sophisticated searches over data
  • Allow journals and societies to pool their resources in a single, shared repository

Who manages Dryad?

Dryad is governed by a consortium of journals that collaboratively promote data archiving and ensure the sustainability of the repository. Dryad is being developed by the National Evolutionary Synthesis Center and the University of North Carolina Metadata Research Center, in coordination with a large group of journals and societies. More information about governance is available on the Dryad website.

What can be deposited?

Authors may submit tables, spreadsheets, flat files, and other kinds of data associated with their publications. Dryad accepts data in any format as long as it is associated with a primary publication.

Dryad submitters are required to place all data in the public domain using the Creative Commons Zero (CC0) waiver, which allows others to share, copy, or reuse the data. In keeping with scientific norms, those who use the data are still expected to cite its original creator. By default, data are embargoed until the journal article is published; authors depositing data may choose to extend the embargo for a year after publication.

Once the data deposit is complete, the depositor will receive a Digital Object Identifier (DOI). This unique identifier provides a persistent link between your publication and the data associated with it.
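To illustrate how a DOI links a publication to its data, the sketch below builds a resolver URL and a simple data citation. The DOI string and the citation format used here are hypothetical examples, not real Dryad identifiers or a mandated citation style.

```python
# Construct a resolver URL and a simple data citation from a DOI.
# The DOI used below is hypothetical, for illustration only.
def doi_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI string."""
    return "https://doi.org/" + doi

def cite_data(creator: str, year: int, title: str, doi: str) -> str:
    """Format a simple data citation that includes the DOI link."""
    return (f"{creator} ({year}) Data from: {title}. "
            f"Dryad Digital Repository. {doi_url(doi)}")

print(cite_data("Doe J", 2012, "An example study", "10.9999/example.123"))
```

Because the DOI itself never changes, the citation remains valid even if the repository's own URLs do; the doi.org resolver always forwards to the data's current location.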

A short video about the data deposit process is available.

How sustainable is Dryad?

Data submitted to Dryad are made available for the long-term, even beyond the lifespan of Dryad, through continuous backup and replication services.

The Dryad Cost Recovery Plan is based on the framework that emerged from the Dryad Board meeting of July 2011. The plan is available for download from the Dryad website.

The revenue streams described are for recovery of operating costs. Research and development of new capabilities will continue to be funded through project grants, and Dryad will continue to seek support from foundations, government funding bodies, and private donors to support its core mission and reduce costs to users.

All the information provided comes from the Dryad website. To find out more about Dryad, please visit http://datadryad.org

TACC Training: Data Storage – Architectures and Networking

June 26, 2012
1 p.m. to 2 p.m. (CT)
J.J. Pickle Research Campus
ROC 1.603
10100 Burnet Rd.
Austin, TX 78758

This class will be webcast.

This training provides an overview of the types of storage systems available in XSEDE, at the campus level, and from commercial providers, with some brief discussion of technical characteristics of the various systems and where they fit within research data workflows. Characteristics of interest to researchers focused on HPC or on data-driven computing will be discussed, as will issues related to transfer of data between systems, management of data across systems, and the overall life cycle of research data as it relates to the details of data architecture. The importance of understanding each storage system and data element within a dynamic ecosystem of data and services will be emphasized as a step towards understanding how to construct and execute research data management plans.


Databib now available

Databib is a registry of repositories for research data. It is a collaboration between Purdue and Penn State, and the registry describes and links to hundreds of data repositories. Some repositories accept data submissions in particular disciplines, so if you’re looking for an appropriate place for your data, Databib may help you find repositories in your field. More info on Databib at their website: http://databib.org

Training: Writing a Data Management Plan: A Guide for the Perplexed

Thursday, March 29, 2012
JJ Pickle Research Campus
ROC 1.468/1.474
10100 Burnet Road
Austin, TX 78758

This class will be webcast.

New requirements from funding agencies, along with the increasing importance of digital data in the conduct of research, have led to a new and unfamiliar component of proposal writing (and reviewing): the data management plan. This session, following on the Data Management Planning and Execution session offered earlier this year, focuses on the process of writing data management plans, going beyond templates and formulas to present strategies for developing a plan for presentation to collaborators, reviewers, and institutions. We discuss the writing of a data management plan as a way to advance your research goals, treating data as a critical component of optimal research outcomes. Topics will include the importance of disciplinary norms and community collections, data sharing, planning for provenance and other metadata, the relationship between data management and analysis, and the long-term preservation of your data.


Please submit any questions that you might have via the TACC Consulting System: http://portal.tacc.utexas.edu/consulting



The DMPTool is now available to the UT research community! Developed by the University of California Curation Center and a group of major research universities, the DMPTool is designed to help researchers:

  • Create ready-to-use data management plans for specific funding agencies
  • Get step-by-step instructions and guidance for data management plans
  • Learn about resources and services available at their home institution to fulfill the data management requirements of their grants

To access the DMPTool, visit https://dmp.cdlib.org/institutional_login and choose “University of Texas at Austin” from the pull-down menu. From there you can log in with your EID and password.