Metadata is data providing information about data that makes findable, trackable and (re)usable.
It can include information such as contact information, geographic locations, details about units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information and much more.
Metadata formats and standards
Metadata can take many different forms, from free text to standardized, structured, machine-readable, extensible content. It is recommended to use a standard metadata format used in your field.
Specific disciplines, repositories or data centers may guide or even dictate the content and format of metadata, possibly using a formal standard. Because creation of standardized metadata can be difficult and time consuming, another consideration when selecting a standard is the availability of tools that can help generate the metadata (e.g. Morpho allows for easy creation of EML, Nesstar for DDI data, etc.). The Digital Curation Center provides a catalog of common metadata standards, organized by discipline: http://www.dcc.ac.uk/resources/metadata-standards. Some specific examples of metadata standards, both general and domain specific are:
- Dublin Core - domain agnostic, basic and widely used metadata standard
- DDI (Data Documentation Initiative) - common standard for social, behavioral and economic sciences, including survey data
- EML (Ecological Metadata Language) - specific for ecology disciplines
- ISO 19115 and FGDC-CSDGM (Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata) - for describing geospatial information
- MINSEQE (MINimal information about high throughput SEQeuencing Experiments) - Genomics standard
- FITS (Flexible Image Transport System) - Astronomy digital file standard that includes structured, embedded metadata
- MIBBI - Minimum Information for Biological and Biomedical Investigations
Where no appropriate, formal metadata standard exists, for internal use, writing “readme” style metadata is an appropriate strategy.
If you need more information, check the following resources: