3. Model Overview
The central RDF Class in DATMM is a single Dataset, which is defined as “a discrete collection of data gathered for use in research.” The Dataset Class is required, and all other Classes in the model are generated from their relationship to a specific Dataset. For example, because a single dataset may be related to a collection of datasets within a single study, the model creates a relationship between a Dataset and the Collection to which it belongs.
The following diagram represents the relationships of DATMM Classes and properties:
Limitations
The model is intended to link to dataset informational pages or dataset collection homepages at their home sites rather than to provide direct links to datasets for download. Information about dataset usability and direct download is difficult to provide because it can differ greatly among sites, so the option to download datasets directly via DATMM was removed from the model. The most basic problems found are that 1) some datasets cannot be downloaded without providing user information at the home site, and 2) data concerning usability, such as dataset size and format, is not always readily available in the home site metadata. Rather than attempting to reconcile the different access practices of dataset home sites, the model sends users to the dataset home sites for additional information and downloading, which seemed the simplest solution.
Examples
Below are examples of parts of a DATMM dataset description that could follow one another. Note that these examples do not necessarily include all properties in each DATMM Class, but rather include those properties that might be most commonly found in dataset metadata.
1. A single dataset may be represented with DATMM as follows:
datmm:Dataset <http://id.nlm.nih.gov/dataset/0000001> ;
dct:title "Biomedical Research Concerning X" ;
dct:alternative "Potential Alternative Title for Biomedical Research Concerning X" ;
dct:description "This describes what the research and dataset are about." ;
dct:identifier "12345678AB" ;
foaf:homepage <http://somebiomedrepository/datasethomepageURL> ;
dct:issued "2024-01-01" ;
dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ;
dct:rights "No information provided; contact the repository owner" ;
. . .
2. The dataset home repository site is named and given an internal DATMM identifier; this metadata is intended to be stored and re-used with datasets from the same repository.
dct:isPartOf <http://id.nlm.nih.gov/datmm/repository/12346> ;
a datmm:Repository
dct:title "Some Biomedical Repository" ;
dct:alternative "SBR" ;
foaf:homepage <http://somebiomedrepository> ;
dct:identifier <https://wikidata.org/entity/123ABC> ;
. . .
3. A dataset will have one or more contributors, each of which is given an internal DATMM identifier; this metadata is intended to be stored and re-used, by matching on the dct:identifier, in the event a contributor is related to other datasets.
bf:contribution <http://id.nlm.nih.gov/datmm/contribution/12347> ;
a bf:Contribution
bf:role <http://www.wikidata.org/entity/Q20204892> ;
bf:agent <http://id.nlm.nih.gov/datmm/agent/56789> ;
a foaf:Agent
dct:identifier <https://orcid.org/0000-0000-0000-0000> ;
foaf:name "Doe, Jane" ;
. . .
4. A dataset will have one or more subjects, drawn from local keywords and/or established subject schemas such as Medical Subject Headings (MeSH).
dct:subject <http://id.nlm.nih.gov/datmm/concept/12348> ;
a skos:Concept
dct:identifier <http://id.nlm.nih.gov/mesh/D003920> ;
skos:inScheme "MeSH" ;
rdfs:label "Diabetes Mellitus" ;
. . .
5. A dataset may be related to one or more publications.
dct:isReferencedBy <http://id.nlm.nih.gov/datmm/documentation/12349> ;
a datmm:Documentation
foaf:homepage <http://somejournal/articleURL98765> ;
dct:title "A Journal Article About the Research and Dataset" ;
. . .
6. A dataset may be related to a grant or to funding for which the home site provides information, including the grant identifier and the grant or funding name.
schema:funding <http://id.nlm.nih.gov/datmm/grant/12357> ;
a schema:Grant
schema:identifier "HHS12345" ;
schema:name "Some Biomedical Grant" ;
. . .
7. A single dataset that is part of a collection of related datasets will also include metadata about the collection. Because DATMM focuses on individual datasets, collection metadata is referenced only within dataset metadata.
dct:isPartOf <http://id.nlm.nih.gov/datmm/collection/9101112> ;
a dct:Collection
bf:contribution <http://id.nlm.nih.gov/datmm/contribution/12347> ;
dct:description "A description of the research in the collection of datasets" ;
foaf:homepage <http://somebiomedrepository/collectionhomepageURL> ;
dct:identifier "98765COLL" ;
dct:isPartOf <http://id.nlm.nih.gov/datmm/repository/12346> ;
dct:issued "2024-01-01" ;
dct:subject <http://id.nlm.nih.gov/datmm/concept/12348> ;
dct:title "A Collection of Multiple Datasets About Something Biomedical" ;
. . .
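The examples above are abbreviated fragments. As a minimal sketch of how examples 1 through 3 might be assembled into syntactically complete Turtle, the block below describes the first dataset together with its repository, contribution, and agent, and then adds a second dataset that re-uses the stored repository and agent by pointing at the same identifiers. Several details are assumptions made for illustration and are not taken from the DATMM specification: the datmm: namespace IRI is a placeholder, the second dataset (0000002) and its contribution (12350) are hypothetical, and writing class membership as "a datmm:Dataset" is one possible reading of the first example.

```turtle
# Published namespaces for the vocabularies used in the examples.
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix bf:    <http://id.loc.gov/ontologies/bibframe/> .
# ASSUMPTION: placeholder namespace; the actual datmm: IRI is not given in this section.
@prefix datmm: <http://example.org/datmm#> .

# Dataset from example 1, linked to its repository and one contribution.
<http://id.nlm.nih.gov/dataset/0000001> a datmm:Dataset ;
    dct:title "Biomedical Research Concerning X" ;
    dct:identifier "12345678AB" ;
    foaf:homepage <http://somebiomedrepository/datasethomepageURL> ;
    dct:isPartOf <http://id.nlm.nih.gov/datmm/repository/12346> ;
    bf:contribution <http://id.nlm.nih.gov/datmm/contribution/12347> .

# Repository from example 2, described once and referenced by its identifier.
<http://id.nlm.nih.gov/datmm/repository/12346> a datmm:Repository ;
    dct:title "Some Biomedical Repository" ;
    dct:alternative "SBR" ;
    foaf:homepage <http://somebiomedrepository> ;
    dct:identifier <https://wikidata.org/entity/123ABC> .

# Contribution and agent from example 3.
<http://id.nlm.nih.gov/datmm/contribution/12347> a bf:Contribution ;
    bf:role <http://www.wikidata.org/entity/Q20204892> ;
    bf:agent <http://id.nlm.nih.gov/datmm/agent/56789> .

<http://id.nlm.nih.gov/datmm/agent/56789> a foaf:Agent ;
    dct:identifier <https://orcid.org/0000-0000-0000-0000> ;
    foaf:name "Doe, Jane" .

# HYPOTHETICAL second dataset from the same repository and contributor.
# It points at the stored repository and agent instead of describing them again.
<http://id.nlm.nih.gov/dataset/0000002> a datmm:Dataset ;
    dct:title "Hypothetical Second Dataset" ;
    dct:isPartOf <http://id.nlm.nih.gov/datmm/repository/12346> ;
    bf:contribution <http://id.nlm.nih.gov/datmm/contribution/12350> .

# HYPOTHETICAL contribution that links the second dataset to the existing agent.
<http://id.nlm.nih.gov/datmm/contribution/12350> a bf:Contribution ;
    bf:role <http://www.wikidata.org/entity/Q20204892> ;
    bf:agent <http://id.nlm.nih.gov/datmm/agent/56789> .
```

Because the repository and the agent are separate resources, the second dataset refers to them by identifier rather than repeating their titles and names. This is one way the stored metadata described in examples 2 and 3 could be re-used.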
Last Reviewed: April 3, 2024