Class: Dataset (Dataset)

Information about a specific grouping of data files

URI: include:Dataset

classDiagram
    class Dataset
    click Dataset href "../Dataset"
      Thing <|-- Dataset
        click Thing href "../Thing"
      Dataset : accessLimitations
      Dataset : accessRequirements
      Dataset : dataCategory
        Dataset --> "1..*" EnumDataCategory : dataCategory
        click EnumDataCategory href "../EnumDataCategory"
      Dataset : dataCollectionEndYear
      Dataset : dataCollectionStartYear
      Dataset : datasetDescription
      Dataset : datasetExternalId
      Dataset : datasetGlobalId
      Dataset : datasetName
      Dataset : dataType
      Dataset : dbgap
      Dataset : expectedNumberOfFiles
      Dataset : expectedNumberOfParticipants
      Dataset : experimentalPlatform
      Dataset : experimentalStrategy
      Dataset : isHarmonized
      Dataset : otherAccessAuthority
      Dataset : otherRepository
      Dataset : publication
      Dataset : studyCode
        Dataset --> "1" EnumStudyCode : studyCode
        click EnumStudyCode href "../EnumStudyCode"

Inheritance

  • Thing
      • Dataset

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| studyCode | 1, EnumStudyCode | Unique identifier for the study (generally a short acronym) | direct |
| datasetName | 1, String | Full name of the dataset, provided by contributor | direct |
| datasetDescription | 0..1, String | Brief additional notes about the dataset (1-3 sentences) that are not already... | direct |
| datasetGlobalId | 0..1, String | Unique Global ID for dataset, generated by DCC | direct |
| datasetExternalId | 0..1, String | Unique identifier or code for dataset, if provided by contributor | direct |
| expectedNumberOfParticipants | 1, Integer | Expected number of participants in this Dataset (or actual number, if data ha... | direct |
| expectedNumberOfFiles | 0..1, Integer | Expected number of files associated with this dataset, including dictionaries | direct |
| dataCollectionStartYear | 0..1, String | Year that data collection started | direct |
| dataCollectionEndYear | 0..1, String | Year that data collection ended | direct |
| dataCategory | 1..*, EnumDataCategory | General category of data in Dataset; pipe-separated if multiple | direct |
| dataType | *, String | Specific type of data contained in Dataset; pipe-separated if multiple (e.g. ... | direct |
| experimentalStrategy | *, String | Experimental method used to obtain data in Dataset; pipe-separated if multipl... | direct |
| experimentalPlatform | *, String | Specific platform used to perform experiment; pipe-separated if multiple (e.g. ... | direct |
| publication | *, Uri | URL for publication(s) describing the Dataset's rationale and methodology (Pu... | direct |
| accessLimitations | 0..1, String | Data access limitations, as defined in the GA4GH Data Use Ontology (DUO; can ... | direct |
| accessRequirements | 0..1, String | Data access requirements, as defined in the GA4GH Data Use Ontology (DUO; can... | direct |
| dbgap | *, String | dbGaP "phs" accession code(s) required to access the files in this Dataset, i... | direct |
| otherRepository | 0..1, Uri | URL if dataset is already deposited in a public repository other than dbGaP (... | direct |
| otherAccessAuthority | 0..1, String | Email or URL for dataset's Access Authority, if not dbGaP | direct |
| isHarmonized | 0..1, Boolean | For omics datasets, is this Dataset already harmonized and available in the I... | direct |
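
The cardinalities in the slot table can be read as a record type. The following is a minimal Python sketch of that reading, not code generated by LinkML: the enum-ranged slots (studyCode, dataCategory) are kept as plain strings because the permissible values are not listed on this page, Uri slots are also treated as strings, and the values in `example` are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    """Illustrative mirror of the slot table; not generated by LinkML."""

    # Cardinality 1 or 1..*: must be supplied.
    studyCode: str  # EnumStudyCode in the schema; plain str here (assumption)
    datasetName: str
    expectedNumberOfParticipants: int
    dataCategory: List[str]  # 1..*; EnumDataCategory values kept as plain str

    # Cardinality 0..1: optional, single-valued.
    datasetDescription: Optional[str] = None
    datasetGlobalId: Optional[str] = None
    datasetExternalId: Optional[str] = None
    expectedNumberOfFiles: Optional[int] = None
    dataCollectionStartYear: Optional[str] = None  # range is String, not Integer
    dataCollectionEndYear: Optional[str] = None
    accessLimitations: Optional[str] = None
    accessRequirements: Optional[str] = None
    otherRepository: Optional[str] = None  # Uri in the schema
    otherAccessAuthority: Optional[str] = None
    isHarmonized: Optional[bool] = None

    # Cardinality *: optional, multivalued.
    dataType: List[str] = field(default_factory=list)
    experimentalStrategy: List[str] = field(default_factory=list)
    experimentalPlatform: List[str] = field(default_factory=list)
    publication: List[str] = field(default_factory=list)  # Uri in the schema
    dbgap: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # 1..* means the list itself is required and must not be empty.
        if not self.dataCategory:
            raise ValueError("dataCategory requires at least one value (1..*)")


# Hypothetical values, for illustration only.
example = Dataset(
    studyCode="EXAMPLE",
    datasetName="Example dataset",
    expectedNumberOfParticipants=100,
    dataCategory=["Example category"],
)
```

Only the four slots with cardinality 1 or 1..* are positional; everything else defaults to "absent", which matches the 0..1 and * rows of the table.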

Identifier and Mapping Information

Annotations

property value
required False

Schema Source

  • from schema: https://w3id.org/include

Mappings

Mapping Type Mapped Value
self include:Dataset
native include:Dataset

LinkML Source

Direct

name: Dataset
definition_uri: include:Dataset
annotations:
  required:
    tag: required
    value: 'False'
description: Information about a specific grouping of data files
title: Dataset
from_schema: https://w3id.org/include
is_a: Thing
slots:
- studyCode
- datasetName
- datasetDescription
- datasetGlobalId
- datasetExternalId
- expectedNumberOfParticipants
- expectedNumberOfFiles
- dataCollectionStartYear
- dataCollectionEndYear
- dataCategory
- dataType
- experimentalStrategy
- experimentalPlatform
- publication
- accessLimitations
- accessRequirements
- dbgap
- otherRepository
- otherAccessAuthority
- isHarmonized
slot_usage:
  dataCategory:
    name: dataCategory
    description: General category of data in Dataset; pipe-separated if multiple
    multivalued: true
  dbgap:
    name: dbgap
    description: dbGaP "phs" accession code(s) required to access the files in this
      Dataset, if applicable (pipe-separated if multiple)
  publication:
    name: publication
    description: URL for publication(s) describing the Dataset's rationale and methodology
      (PubMed Central preferred but not required; pipe-separated if multiple)
  expectedNumberOfParticipants:
    name: expectedNumberOfParticipants
    description: Expected number of participants in this Dataset (or actual number,
      if data has been submitted to INCLUDE DCC). If additional explanation is needed,
      please add to Dataset Description field.
  dataType:
    name: dataType
    description: Specific type of data contained in Dataset; pipe-separated if multiple
      (e.g. Preprocessed metabolite relative abundance, Absolute protein concentration,
      Aligned reads, Simple nucleotide variations, GVCF, Gene expression quantifications,
      Gene fusions, Somatic copy number variations, Somatic structural variations)
    multivalued: true
  experimentalStrategy:
    name: experimentalStrategy
    description: Experimental method used to obtain data in Dataset; pipe-separated
      if multiple (e.g. Whole genome sequencing, RNAseq, Multiplex immunoassay, Mass
      spec metabolomics)
    multivalued: true
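
Several slot descriptions above say values are "pipe-separated if multiple". As a rough sketch of what that implies for tabular input, the helper below splits such a cell into a list. The function names `split_pipe` and `normalize_row` are hypothetical, and the default slot set is only the three slots that `slot_usage` marks `multivalued: true`; the dbgap and publication descriptions also mention pipe separation, so a caller may want to pass a wider set.

```python
from typing import Any, Dict, FrozenSet, List

# Slots that slot_usage above marks as multivalued: true.
MULTIVALUED_SLOTS: FrozenSet[str] = frozenset(
    {"dataCategory", "dataType", "experimentalStrategy"}
)


def split_pipe(cell: str) -> List[str]:
    """Split a pipe-separated cell, trimming whitespace and dropping empties."""
    return [part.strip() for part in cell.split("|") if part.strip()]


def normalize_row(
    row: Dict[str, Any], multivalued: FrozenSet[str] = MULTIVALUED_SLOTS
) -> Dict[str, Any]:
    """Return a copy of row with multivalued string cells turned into lists."""
    out: Dict[str, Any] = {}
    for name, value in row.items():
        if name in multivalued and isinstance(value, str):
            out[name] = split_pipe(value)
        else:
            out[name] = value
    return out


# Example values are taken from the experimentalStrategy description above.
row = normalize_row(
    {"datasetName": "Example dataset",
     "experimentalStrategy": "Whole genome sequencing | RNAseq"}
)
```

Single-valued slots pass through unchanged, so a literal "|" inside a description or name is not split.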

Induced

name: Dataset
definition_uri: include:Dataset
annotations:
  required:
    tag: required
    value: 'False'
description: Information about a specific grouping of data files
title: Dataset
from_schema: https://w3id.org/include
is_a: Thing
slot_usage:
  dataCategory:
    name: dataCategory
    description: General category of data in Dataset; pipe-separated if multiple
    multivalued: true
  dbgap:
    name: dbgap
    description: dbGaP "phs" accession code(s) required to access the files in this
      Dataset, if applicable (pipe-separated if multiple)
  publication:
    name: publication
    description: URL for publication(s) describing the Dataset's rationale and methodology
      (PubMed Central preferred but not required; pipe-separated if multiple)
  expectedNumberOfParticipants:
    name: expectedNumberOfParticipants
    description: Expected number of participants in this Dataset (or actual number,
      if data has been submitted to INCLUDE DCC). If additional explanation is needed,
      please add to Dataset Description field.
  dataType:
    name: dataType
    description: Specific type of data contained in Dataset; pipe-separated if multiple
      (e.g. Preprocessed metabolite relative abundance, Absolute protein concentration,
      Aligned reads, Simple nucleotide variations, GVCF, Gene expression quantifications,
      Gene fusions, Somatic copy number variations, Somatic structural variations)
    multivalued: true
  experimentalStrategy:
    name: experimentalStrategy
    description: Experimental method used to obtain data in Dataset; pipe-separated
      if multiple (e.g. Whole genome sequencing, RNAseq, Multiplex immunoassay, Mass
      spec metabolomics)
    multivalued: true
attributes:
  studyCode:
    name: studyCode
    definition_uri: include:studyCode
    description: Unique identifier for the study (generally a short acronym)
    title: Study Code
    from_schema: https://w3id.org/include
    rank: 1000
    alias: studyCode
    owner: Dataset
    domain_of:
    - Biospecimen
    - DataFile
    - Participant
    - Condition
    - Study
    - Dataset
    - DatasetManifest
    range: enum_studyCode
    required: true
  datasetName:
    name: datasetName
    definition_uri: include:datasetName
    description: Full name of the dataset, provided by contributor
    title: Dataset Name
    from_schema: https://w3id.org/include
    rank: 1000
    alias: datasetName
    owner: Dataset
    domain_of:
    - Dataset
    - DatasetManifest
    range: string
    required: true
  datasetDescription:
    name: datasetDescription
    definition_uri: include:datasetDescription
    description: Brief additional notes about the dataset (1-3 sentences) that are
      not already captured in the other fields
    title: Dataset Description
    from_schema: https://w3id.org/include
    rank: 1000
    alias: datasetDescription
    owner: Dataset
    domain_of:
    - Dataset
    range: string
  datasetGlobalId:
    name: datasetGlobalId
    definition_uri: include:datasetGlobalId
    description: Unique Global ID for dataset, generated by DCC
    title: Dataset Global ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: datasetGlobalId
    owner: Dataset
    domain_of:
    - Dataset
    - DatasetManifest
    range: string
    required: false
  datasetExternalId:
    name: datasetExternalId
    definition_uri: include:datasetExternalId
    description: Unique identifier or code for dataset, if provided by contributor
    title: Dataset External ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: datasetExternalId
    owner: Dataset
    domain_of:
    - Dataset
    - DatasetManifest
    range: string
  expectedNumberOfParticipants:
    name: expectedNumberOfParticipants
    definition_uri: include:expectedNumberOfParticipants
    description: Expected number of participants in this Dataset (or actual number,
      if data has been submitted to INCLUDE DCC). If additional explanation is needed,
      please add to Dataset Description field.
    title: Expected Number of Participants
    from_schema: https://w3id.org/include
    rank: 1000
    alias: expectedNumberOfParticipants
    owner: Dataset
    domain_of:
    - Study
    - Dataset
    range: integer
    required: true
  expectedNumberOfFiles:
    name: expectedNumberOfFiles
    definition_uri: include:expectedNumberOfFiles
    description: Expected number of files associated with this dataset, including
      dictionaries. If additional explanation is needed, please add to Dataset Description
      field.
    title: Expected Number of Files
    from_schema: https://w3id.org/include
    rank: 1000
    alias: expectedNumberOfFiles
    owner: Dataset
    domain_of:
    - Dataset
    range: integer
    required: false
  dataCollectionStartYear:
    name: dataCollectionStartYear
    definition_uri: include:dataCollectionStartYear
    description: Year that data collection started
    title: Data Collection Start Year
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataCollectionStartYear
    owner: Dataset
    domain_of:
    - Dataset
    range: string
    required: false
  dataCollectionEndYear:
    name: dataCollectionEndYear
    definition_uri: include:dataCollectionEndYear
    description: Year that data collection ended
    title: Data Collection End Year
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataCollectionEndYear
    owner: Dataset
    domain_of:
    - Dataset
    range: string
    required: false
  dataCategory:
    name: dataCategory
    definition_uri: include:dataCategory
    description: General category of data in Dataset; pipe-separated if multiple
    title: Data Category
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataCategory
    owner: Dataset
    domain_of:
    - DataFile
    - Study
    - Dataset
    range: enum_dataCategory
    required: true
    multivalued: true
  dataType:
    name: dataType
    definition_uri: include:dataType
    description: Specific type of data contained in Dataset; pipe-separated if multiple
      (e.g. Preprocessed metabolite relative abundance, Absolute protein concentration,
      Aligned reads, Simple nucleotide variations, GVCF, Gene expression quantifications,
      Gene fusions, Somatic copy number variations, Somatic structural variations)
    title: Data Type
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataType
    owner: Dataset
    domain_of:
    - DataFile
    - Dataset
    range: string
    multivalued: true
  experimentalStrategy:
    name: experimentalStrategy
    definition_uri: include:experimentalStrategy
    description: Experimental method used to obtain data in Dataset; pipe-separated
      if multiple (e.g. Whole genome sequencing, RNAseq, Multiplex immunoassay, Mass
      spec metabolomics)
    title: Experimental Strategy
    from_schema: https://w3id.org/include
    rank: 1000
    alias: experimentalStrategy
    owner: Dataset
    domain_of:
    - DataFile
    - Dataset
    range: string
    multivalued: true
  experimentalPlatform:
    name: experimentalPlatform
    definition_uri: include:experimentalPlatform
    description: Specific platform used to perform experiment; pipe-separated if multiple
      (e.g. SOMAscan, MSD, Luminex, Illumina)
    title: Experimental Platform
    from_schema: https://w3id.org/include
    rank: 1000
    alias: experimentalPlatform
    owner: Dataset
    domain_of:
    - DataFile
    - Dataset
    range: string
    multivalued: true
  publication:
    name: publication
    definition_uri: include:publication
    description: URL for publication(s) describing the Dataset's rationale and methodology
      (PubMed Central preferred but not required; pipe-separated if multiple)
    title: Publication
    from_schema: https://w3id.org/include
    rank: 1000
    alias: publication
    owner: Dataset
    domain_of:
    - Study
    - Dataset
    range: uri
    multivalued: true
  accessLimitations:
    name: accessLimitations
    definition_uri: include:accessLimitations
    description: Data access limitations, as defined in the GA4GH Data Use Ontology
      (DUO; can list more than one, pipe separated)
    title: Access Limitations
    from_schema: https://w3id.org/include
    rank: 1000
    alias: accessLimitations
    owner: Dataset
    domain_of:
    - Dataset
    range: string
    required: false
  accessRequirements:
    name: accessRequirements
    definition_uri: include:accessRequirements
    description: Data access requirements, as defined in the GA4GH Data Use Ontology
      (DUO; can list more than one, pipe separated)
    title: Access Requirements
    from_schema: https://w3id.org/include
    rank: 1000
    alias: accessRequirements
    owner: Dataset
    domain_of:
    - Dataset
    range: string
    required: false
  dbgap:
    name: dbgap
    definition_uri: include:dbgap
    description: dbGaP "phs" accession code(s) required to access the files in this
      Dataset, if applicable (pipe-separated if multiple)
    title: dbGaP
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dbgap
    owner: Dataset
    domain_of:
    - Study
    - Dataset
    range: string
    multivalued: true
  otherRepository:
    name: otherRepository
    definition_uri: include:otherRepository
    description: URL if dataset is already deposited in a public repository other
      than dbGaP (e.g. LONI, Metabolomics Workbench, etc.)
    title: Other Repository
    from_schema: https://w3id.org/include
    rank: 1000
    alias: otherRepository
    owner: Dataset
    domain_of:
    - Dataset
    range: uri
  otherAccessAuthority:
    name: otherAccessAuthority
    definition_uri: include:otherAccessAuthority
    description: Email or URL for dataset's Access Authority, if not dbGaP
    title: Other Access Authority
    from_schema: https://w3id.org/include
    rank: 1000
    alias: otherAccessAuthority
    owner: Dataset
    domain_of:
    - Dataset
    range: string
  isHarmonized:
    name: isHarmonized
    definition_uri: include:isHarmonized
    description: For omics datasets, is this Dataset already harmonized and available
      in the INCLUDE Data Hub?
    title: Is Harmonized?
    from_schema: https://w3id.org/include
    rank: 1000
    alias: isHarmonized
    owner: Dataset
    domain_of:
    - Dataset
    range: boolean
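
The `range`, `required`, and `multivalued` keys in the induced attributes are enough to drive a basic structural check of a candidate record. The sketch below hand-copies those three facts per slot into a table and reports problems as strings. It is an illustration under stated assumptions, not the LinkML validator: `check_record` is a hypothetical name, enum and uri ranges are only checked as strings (permissible enum values are not listed on this page), and a missing `required` or `multivalued` key is read as false.

```python
from typing import Any, Dict, List, Tuple

# slot name -> (range, required, multivalued), copied from the induced attributes.
SLOTS: Dict[str, Tuple[str, bool, bool]] = {
    "studyCode": ("enum_studyCode", True, False),
    "datasetName": ("string", True, False),
    "datasetDescription": ("string", False, False),
    "datasetGlobalId": ("string", False, False),
    "datasetExternalId": ("string", False, False),
    "expectedNumberOfParticipants": ("integer", True, False),
    "expectedNumberOfFiles": ("integer", False, False),
    "dataCollectionStartYear": ("string", False, False),
    "dataCollectionEndYear": ("string", False, False),
    "dataCategory": ("enum_dataCategory", True, True),
    "dataType": ("string", False, True),
    "experimentalStrategy": ("string", False, True),
    "experimentalPlatform": ("string", False, True),
    "publication": ("uri", False, True),
    "accessLimitations": ("string", False, False),
    "accessRequirements": ("string", False, False),
    "dbgap": ("string", False, True),
    "otherRepository": ("uri", False, False),
    "otherAccessAuthority": ("string", False, False),
    "isHarmonized": ("boolean", False, False),
}

# Ranges with a dedicated Python type; every other range is checked as str.
PY_TYPES = {"integer": int, "boolean": bool}


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems = [f"unknown slot: {name}" for name in record if name not in SLOTS]
    for name, (rng, required, multivalued) in SLOTS.items():
        value = record.get(name)
        if value is None or value == []:
            if required:
                problems.append(f"missing required slot: {name}")
            continue
        if multivalued != isinstance(value, list):
            shape = "a list" if multivalued else "a single value"
            problems.append(f"{name}: expected {shape}")
            continue
        expected = PY_TYPES.get(rng, str)
        for item in value if multivalued else [value]:
            # bool is a subclass of int in Python, so reject it for integer slots.
            wrong_bool = expected is int and isinstance(item, bool)
            if wrong_bool or not isinstance(item, expected):
                problems.append(f"{name}: expected {rng}, got {type(item).__name__}")
    return problems
```

For instance, `check_record({})` reports the four required slots (studyCode, datasetName, expectedNumberOfParticipants, dataCategory), in the order they appear in the listing.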