August Highlight on Data Repositories: Crystallography Open Database

This month we’d like to highlight the Crystallography Open Database:

What is Crystallography Open Database (COD)?

COD is an open-access database, started in 2003, containing small-molecule and small to medium-sized unit cell crystallographic structures of organic, inorganic, and metal-organic compounds and minerals. As of August 2012, the database holds over 200,000 structures, all of which are in the public domain.

How does deposit work?

All newly published structures in peer-reviewed chemical and crystallographic journals are automatically included. Additionally, researchers are invited to submit their unpublished data as a personal communication via the website. Each structure receives a unique seven-digit number called a COD number. COD does not accept duplicate structures, and the deposit software uses a simple algorithm to detect any duplicates. Data in COD are stored in the Crystallographic Information File/Framework (CIF) format. COD also accepts structure-factor (Fobs) files.
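For readers who want a feel for what a deposit check might involve, here is a minimal Python sketch. It is an illustration under stated assumptions, not COD's deposit software: the function names, the sample data, and the tolerance value are all hypothetical, and the duplicate test (comparing unit cell parameters) is a stand-in because the source does not describe the algorithm COD actually uses. Only the seven-digit identifier format and the standard CIF cell tags (`_cell_length_a`, `_cell_angle_alpha`, and so on) are taken as given.

```python
import re

# A COD number is described as a unique seven-digit number.
COD_NUMBER = re.compile(r"[0-9]{7}")

# Standard CIF data names for the unit cell.
CELL_TAGS = (
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
)

# Made-up sample data for illustration only; not a real COD entry.
SAMPLE_CIF = """data_example
_cell_length_a 5.4307(2)
_cell_length_b 5.4307(2)
_cell_length_c 5.4307(2)
_cell_angle_alpha 90
_cell_angle_beta 90
_cell_angle_gamma 90
"""


def is_cod_number(text):
    """Return True if text consists of exactly seven ASCII digits."""
    return COD_NUMBER.fullmatch(text) is not None


def read_simple_cif_items(cif_text):
    """Collect single-line '_tag value' pairs from CIF text.

    Loops and multi-line values are ignored; a real parser handles far more.
    """
    items = {}
    for line in cif_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = parts[1].strip("'\"")
    return items


def cell_parameters(items):
    """Return the six cell parameters as floats, dropping any '(n)' uncertainty."""
    return tuple(float(items[tag].split("(")[0]) for tag in CELL_TAGS)


def looks_like_duplicate(items_a, items_b, tolerance=0.01):
    """Hypothetical duplicate test: all cell parameters agree within tolerance."""
    pairs = zip(cell_parameters(items_a), cell_parameters(items_b))
    return all(abs(x - y) <= tolerance for x, y in pairs)
```

Calling `looks_like_duplicate(read_simple_cif_items(SAMPLE_CIF), read_simple_cif_items(SAMPLE_CIF))` compares an entry with itself and so reports a match. Matching cell parameters alone would flag different compounds that happen to share a cell, so a production check would need to compare more than this.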

Reliability of the database

COD servers reside on mirrored disks that are backed up nightly at four locations: Vilnius, Granada, Caen, and Portland. Regular backup copies of the entire collection are made on DVD and stored offline. COD users also have the option to download the entire COD repository.

All the information provided comes from: and Grazulis, S., Daskevic, A., Merkys, A., Chateigner, D., Lutterotti, L., Quiros, M., Serebryanaya, N.R., Moeck, P., Downs, R.T., & Le Bail, A. (2012). Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Research, 40, D420-D427. Retrieved from

TACC Training: Structured Data, Metadata and Provenance in the Context of Scientific Data Management Projects

August 23, 2012 (Thursday)
1 p.m. to 3 p.m. (CT)
J.J. Pickle Research Campus
10100 Burnet Road
ROC 1.469
Austin, TX 78758

This class will be webcast.

Metadata, an essential component of a Data Management Plan, describes data from multiple perspectives and at different levels (i.e., individual data objects and whole collections), providing the documentation needed to manage collections throughout their life cycle. The workshop will review general and domain-specific metadata standards, including those for the description, preservation, and provenance documentation of scientific data collections. It also covers methods and tools that aid in identifying and storing metadata for collection access, interoperability, and analysis, such as Relational Database Management Systems (RDBMS), XML and other structured file types, and Geographic Information Systems (GIS). The course will cover basic use of these tools and address schema design issues such as integrity and validation, along with import and export. To get the most from the workshop, have in mind a collection for which you need to develop metadata so you can apply the standards and issues that will be discussed.


Please submit any questions that you may have via the TACC consulting system: