Ontologies and Validated Terms
Make the most out of your metadata by using controlled vocabulary
TL;DR: Use vocabulary that can be identified in KBase so your metadata is functional.
KBase recognizes many built-in metadata terms and can validate them when you upload Samples. Using these terms makes your data easier to compare with other data and lets KBase integrate data from sources that follow different conventions.
When Samples are uploaded into KBase via a spreadsheet, they are transformed into an internal representation. Columns within the spreadsheet are mapped and transformed based on the template used. Columns that correspond to recognized terms are validated to ensure they are properly formatted: that the value is the proper type (e.g. a string versus a number), falls within the correct range, matches an enumerated list, or appears in an ontology.
Controlled terms are useful both because they undergo this validation and because they give each value a more precise meaning and allow comparisons that account for units.
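The checks described above (type, range, enumerated list) can be sketched in a few lines of Python. This is only an illustration, not the KBase uploader: the `TermRule` class, the `validate` function, and the example terms and limits are all hypothetical. An ontology check is treated here as membership in a list of allowed identifiers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class TermRule:
    """Hypothetical rule for one controlled term (not the KBase schema)."""
    parse: Callable[[str], object]                 # e.g. float or str
    bounds: Optional[Tuple[float, float]] = None   # inclusive (low, high)
    allowed: Optional[Sequence[str]] = None        # enumerated list or ontology IDs


def validate(raw: str, rule: TermRule) -> List[str]:
    """Return the problems found in one cell; an empty list means it passed."""
    try:
        value = rule.parse(raw.strip())
    except ValueError:
        # Type check failed, so the remaining checks cannot be applied.
        return [f"cannot parse {raw!r} as {rule.parse.__name__}"]
    problems = []
    if rule.bounds is not None:
        low, high = rule.bounds
        if not low <= value <= high:
            problems.append(f"{value} is outside the range {low} to {high}")
    if rule.allowed is not None and value not in rule.allowed:
        problems.append(f"{value!r} is not one of the allowed values")
    return problems


# Example rules; the term names and limits are assumptions for illustration.
latitude = TermRule(parse=float, bounds=(-90.0, 90.0))
material = TermRule(parse=str, allowed=["soil", "water", "sediment"])

print(validate("45.5", latitude))   # passes all checks
print(validate("120", latitude))    # fails the range check
print(validate("north", latitude))  # fails the type check
print(validate("rock", material))   # fails the enumerated-list check
```

Returning a list of problems rather than raising on the first failure lets a caller report every issue in a spreadsheet in one pass.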
When the uploader encounters terms it doesn’t recognize, it stores those terms and their values in a user section of the sample. These values still serve a purpose: they can be used in analysis within that data set, and they can provide contextual information that the original uploader understands. However, unrecognized terms cannot be reliably compared across SampleSets or with other samples in the system. For example, two projects may use the same term to represent different concepts (e.g. depth below sea level versus depth below surface).
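As a rough sketch of this behavior, the snippet below splits one spreadsheet row into recognized and unrecognized terms. The `RECOGNIZED_TERMS` set, the `split_metadata` function, and the `controlled` and `user` keys are hypothetical names chosen for illustration; the real uploader's term list and internal representation may differ.

```python
# Hypothetical set of recognized terms; the real list is much longer.
RECOGNIZED_TERMS = {"latitude", "longitude", "depth"}


def split_metadata(row: dict) -> dict:
    """Separate recognized terms from everything else in one row."""
    controlled, user = {}, {}
    for column, value in row.items():
        key = column.strip().lower()
        if key in RECOGNIZED_TERMS:
            controlled[key] = value   # eligible for validation
        else:
            user[column] = value      # kept as-is, not validated
    return {"controlled": controlled, "user": user}


row = {"Latitude": "45.5", "Depth": "1.5", "plot_code": "A7"}
print(split_metadata(row))
```

Nothing is discarded in this sketch: the unrecognized `plot_code` column is preserved, but only the recognized columns can be checked and compared across data sets.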
The Ontology API supports multiple ontology systems. Currently supported systems are Gene Ontology (GO) and Environment Ontology (ENVO). You can find more information on each ontology on its home page, or view the KBase landing page for a given ontology using the links and URL format below, respectively.
The TSV file below lists the metadata terms currently supported for validation, each with a description that gives general formatting guidance. You can view the most up-to-date version of this list in the KBase Samples GitHub.
Validated Metadata TSV
- Base: meter, m
- Micron/micrometer: µm or um, 1*10^-6 m
- Millimeter: mm, 1*10^-3 m
- Centimeter: cm, 1*10^-2 m
- Kilometer: km, 1*10^3 m
- Base: second, s
- Minute: min, 60 s
- Hour: hr, 3600 s or 60 min
- Day: d, 24 hr
- Year: yr or a, 365.25 d
- Base: gram, g
- Microgram: µg, 1*10^-6 g
- Milligram: mg, 1*10^-3 g
- Kilogram: kg, 1*10^3 g
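The unit lists above amount to conversion tables to a base unit (meter, second, gram). A minimal Python sketch of how such tables allow values recorded in different units to be compared is shown below. The dictionary and function names are hypothetical, and this is not the KBase implementation; only the factors come from the lists.

```python
# Conversion factors to the base unit, taken from the lists above.
LENGTH_TO_M = {"m": 1.0, "µm": 1e-6, "um": 1e-6, "mm": 1e-3, "cm": 1e-2, "km": 1e3}
TIME_TO_S = {
    "s": 1.0,
    "min": 60.0,
    "hr": 3600.0,
    "d": 24 * 3600.0,
    "yr": 365.25 * 24 * 3600.0,
    "a": 365.25 * 24 * 3600.0,   # "a" is an alternative symbol for year
}
MASS_TO_G = {"g": 1.0, "µg": 1e-6, "mg": 1e-3, "kg": 1e3}


def to_base(value: float, unit: str, table: dict) -> float:
    """Convert a value to the base unit of the given table."""
    try:
        return value * table[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None


# Two depths recorded in different units become directly comparable.
depth_a = to_base(150, "cm", LENGTH_TO_M)
depth_b = to_base(1.5, "m", LENGTH_TO_M)
print(abs(depth_a - depth_b) < 1e-9)
```

The comparison uses a small tolerance rather than `==` because factors such as 1e-2 are not exactly representable in floating point.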