
Class: Data File (DataFile)

Metadata about Data Files

URI: include:DataFile

Class diagram: DataFile is a subclass of Thing. Its slots studyCode, dataAccess and dataCategory each take exactly one value from EnumStudyCode, EnumDataAccess and EnumDataCategory, respectively; all of its slots are listed under Slots.

Inheritance

  • Thing
      • DataFile

Slots

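| Name | Cardinality | Range | Description | Inheritance |
|---|---|---|---|---|
| studyCode | 1 | EnumStudyCode | Unique identifier for the study (generally a short acronym) | direct |
| participantGlobalId | 1 | String | Unique INCLUDE global identifier for the participant, assigned by DCC | direct |
| participantExternalId | 1 | String | Unique, de-identified identifier for the participant, assigned by data contributor. External IDs must be two steps removed from personal information in the study records. | direct |
| sampleGlobalId | 1 | String | INCLUDE global identifier for sample, assigned by DCC | direct |
| sampleExternalId | 1 | String | Unique identifier for sample, assigned by data contributor. A sample is a unique biological material; two samples with two different IDs are biologically distinct. | direct |
| fileName | 1 | String | Name of file, assigned by data contributor | direct |
| fileGlobalId | 1 | String | INCLUDE global file identifier, assigned by DCC | direct |
| fileS3Location | 1 | String | S3 bucket location of file; also serves as dewrangle descriptor | direct |
| fileUploadLocation | 0..1 | String | Where source file was uploaded, if not directly to an S3 bucket (e.g. Synapse) | direct |
| drsUri | 1 | Uriorcurie | Data Repository Services (DRS) API Uniform Resource Identifier | direct |
| fileHash | 0..1 | String | MD5 hash of this file for validation (if known) | direct |
| dataAccess | 1 | EnumDataAccess | Type of access control on this file, determined by DCC | direct |
| dataCategory | 1 | EnumDataCategory | General category of data in file (e.g. Clinical, Genomics, Proteomics, Metabolomics, Immune profiling, Transcriptomics) | direct |
| dataType | 0..1 | String | Specific type of data contained in file (e.g. Preprocessed metabolite relative abundance, Absolute protein concentration, Aligned reads, Simple nucleotide variations, GVCF, Gene expression quantifications, Gene fusions, Somatic copy number variations, Somatic structural variations) | direct |
| experimentalStrategy | 0..* | String | Experimental method used to obtain data in file (e.g. Whole genome sequencing, RNAseq, Multiplex immunoassay, Mass spec metabolomics) | direct |
| experimentalPlatform | 0..* | String | Specific platform used to perform experiment; pipe-separated if multiple (e.g. SOMAscan, MSD, Luminex, Illumina) | direct |
| fileFormat | 1 | String | Format of file (e.g. tsv, cram, gvcf, vcf, maf, txt, pdf, html, png) | direct |
| fileSize | 0..1 | Integer | Size of file, if known (mainly important if large) | direct |
| fileSizeUnit | 0..1 | String | Unit of file size | direct |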
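As a rough illustration of the cardinalities above, the DataFile slots can be mirrored in a plain Python dataclass: fields without defaults correspond to required (1) slots, Optional fields to 0..1 slots, and list fields to the multivalued 0..* slots. This is a hand-written sketch, not code generated from the schema (LinkML's own generators produce their own classes). Enum-valued slots are treated as plain strings, and the `from_row` helper, its flat manifest layout, and its splitting of pipe-separated cells (as described for experimentalPlatform) are hypothetical.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Dict, List, Optional


def split_pipe(value: Optional[str]) -> List[str]:
    """Split a pipe-separated cell such as "SOMAscan|Luminex" into a list."""
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


@dataclass
class DataFile:
    # Required slots (cardinality 1); enum-valued slots are plain strings here
    studyCode: str
    participantGlobalId: str
    participantExternalId: str
    sampleGlobalId: str
    sampleExternalId: str
    fileName: str
    fileGlobalId: str
    fileS3Location: str
    drsUri: str
    dataAccess: str
    dataCategory: str
    fileFormat: str
    # Optional slots (cardinality 0..1)
    fileUploadLocation: Optional[str] = None
    fileHash: Optional[str] = None
    dataType: Optional[str] = None
    fileSize: Optional[int] = None
    fileSizeUnit: Optional[str] = None
    # Multivalued slots (cardinality 0..*)
    experimentalStrategy: List[str] = field(default_factory=list)
    experimentalPlatform: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A field with no default corresponds to a required (cardinality 1) slot
        required = [f.name for f in fields(self)
                    if f.default is MISSING and f.default_factory is MISSING]
        missing = [name for name in required if not getattr(self, name)]
        if missing:
            raise ValueError(f"missing required slots: {missing}")
        if self.fileSize is not None and self.fileSize < 0:
            raise ValueError("fileSize must be non-negative")


def from_row(row: Dict[str, str]) -> DataFile:
    """Build a DataFile from one flat manifest row (hypothetical TSV layout)."""
    # Treat empty cells as absent values
    data = {key: (value if value != "" else None) for key, value in row.items()}
    # Multivalued slots arrive as pipe-separated strings in this assumed layout
    for key in ("experimentalStrategy", "experimentalPlatform"):
        data[key] = split_pipe(data.get(key))
    if data.get("fileSize") is not None:
        data["fileSize"] = int(data["fileSize"])
    return DataFile(**data)
```

A row whose experimentalPlatform cell is "SOMAscan|Luminex" yields the list ["SOMAscan", "Luminex"], while an empty required cell such as fileName raises ValueError.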

Identifier and Mapping Information

Annotations

| Property | Value |
|---|---|
| required | True |
| requires_component | Study,Participant,Biospecimen |
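The requires_component annotation stores its value as a single comma-separated string. One plausible reading, assumed here rather than stated by the schema, is that a submission containing DataFile records must also contain each listed component. A minimal check under that assumption (the function names are hypothetical):

```python
from typing import Iterable, List

# Value of the requires_component annotation on DataFile
REQUIRES_COMPONENT = "Study,Participant,Biospecimen"


def parse_components(annotation: str) -> List[str]:
    """Split the comma-separated annotation value into component names."""
    return [name.strip() for name in annotation.split(",") if name.strip()]


def missing_components(submitted: Iterable[str],
                       annotation: str = REQUIRES_COMPONENT) -> List[str]:
    """Return required components absent from a submission, in annotation order."""
    present = set(submitted)
    return [name for name in parse_components(annotation) if name not in present]
```

For example, `missing_components(["Study", "Participant", "DataFile"])` returns `["Biospecimen"]`.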

Schema Source

  • from schema: https://w3id.org/include

Mappings

| Mapping Type | Mapped Value |
|---|---|
| self | include:DataFile |
| native | include:DataFile |
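Both mappings use the CURIE include:DataFile, and the drsUri slot has range Uriorcurie, so such values may arrive either as CURIEs or as full URIs. The sketch below normalizes both forms to an absolute URI. The prefix map, in particular expanding `include` to `https://w3id.org/include/`, is an assumption; in practice it should come from the schema's own prefix declarations.

```python
from typing import Dict

# Assumed prefix map; the expansion of "include" is not stated on this page
PREFIXES: Dict[str, str] = {"include": "https://w3id.org/include/"}


def expand_uriorcurie(value: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Return an absolute URI for a value that is either a full URI or a CURIE."""
    if "://" in value:
        # Already an absolute URI (e.g. a drs:// URI); return it unchanged
        return value
    prefix, sep, local = value.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {value!r}")
    return prefixes[prefix] + local
```

Under the assumed map, `expand_uriorcurie("include:DataFile")` gives `https://w3id.org/include/DataFile`.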

LinkML Source

Direct

name: DataFile
definition_uri: include:DataFile
annotations:
  required:
    tag: required
    value: 'True'
  requires_component:
    tag: requires_component
    value: Study,Participant,Biospecimen
description: Metadata about Data Files
title: Data File
from_schema: https://w3id.org/include
is_a: Thing
slots:
- studyCode
- participantGlobalId
- participantExternalId
- sampleGlobalId
- sampleExternalId
- fileName
- fileGlobalId
- fileS3Location
- fileUploadLocation
- drsUri
- fileHash
- dataAccess
- dataCategory
- dataType
- experimentalStrategy
- experimentalPlatform
- fileFormat
- fileSize
- fileSizeUnit
slot_usage:
  dataCategory:
    name: dataCategory
    description: General category of data in file (e.g. Clinical, Genomics, Proteomics,
      Metabolomics, Immune profiling, Transcriptomics)
  dataType:
    name: dataType
    description: Specific type of data contained in file (e.g. Preprocessed metabolite
      relative abundance, Absolute protein concentration, Aligned reads, Simple nucleotide
      variations, GVCF, Gene expression quantifications, Gene fusions, Somatic copy
      number variations, Somatic structural variations)
  experimentalStrategy:
    name: experimentalStrategy
    description: Experimental method used to obtain data in file (e.g. Whole genome
      sequencing, RNAseq, Multiplex immunoassay, Mass spec metabolomics)
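The Induced view below results from combining each slot's schema-wide definition with the class-level slot_usage overrides shown above. LinkML performs this internally (for example via `SchemaView.induced_slot` in linkml-runtime); the following is only a simplified sketch of the idea using a shallow, key-by-key override. The function name is hypothetical, and the global description used for dataCategory is a placeholder, not the schema's actual text.

```python
from copy import deepcopy
from typing import Any, Dict, List

Slot = Dict[str, Any]


def induce_attributes(class_name: str, slot_names: List[str],
                      global_slots: Dict[str, Slot],
                      slot_usage: Dict[str, Slot]) -> Dict[str, Slot]:
    """Merge each global slot definition with the class's slot_usage overrides."""
    induced: Dict[str, Slot] = {}
    for name in slot_names:
        # Copy so the shared global definition is never mutated
        attr = deepcopy(global_slots.get(name, {}))
        attr.update(slot_usage.get(name, {}))  # class-level values take precedence
        attr["name"] = name
        attr["owner"] = class_name             # induced attributes record their owner
        induced[name] = attr
    return induced


# Placeholder global definition; the generic description text is hypothetical
GLOBAL_SLOTS: Dict[str, Slot] = {
    "dataCategory": {"description": "General category of data",
                     "range": "enum_dataCategory", "required": True},
}
# slot_usage override taken from the Direct source above
SLOT_USAGE: Dict[str, Slot] = {
    "dataCategory": {"description": "General category of data in file (e.g. Clinical, "
                                    "Genomics, Proteomics, Metabolomics, Immune profiling, "
                                    "Transcriptomics)"},
}

attrs = induce_attributes("DataFile", ["dataCategory"], GLOBAL_SLOTS, SLOT_USAGE)
```

The merged entry keeps `range` and `required` from the global definition but takes its description from slot_usage, which matches the shape of the dataCategory attribute in the Induced listing.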

Induced

name: DataFile
definition_uri: include:DataFile
annotations:
  required:
    tag: required
    value: 'True'
  requires_component:
    tag: requires_component
    value: Study,Participant,Biospecimen
description: Metadata about Data Files
title: Data File
from_schema: https://w3id.org/include
is_a: Thing
slot_usage:
  dataCategory:
    name: dataCategory
    description: General category of data in file (e.g. Clinical, Genomics, Proteomics,
      Metabolomics, Immune profiling, Transcriptomics)
  dataType:
    name: dataType
    description: Specific type of data contained in file (e.g. Preprocessed metabolite
      relative abundance, Absolute protein concentration, Aligned reads, Simple nucleotide
      variations, GVCF, Gene expression quantifications, Gene fusions, Somatic copy
      number variations, Somatic structural variations)
  experimentalStrategy:
    name: experimentalStrategy
    description: Experimental method used to obtain data in file (e.g. Whole genome
      sequencing, RNAseq, Multiplex immunoassay, Mass spec metabolomics)
attributes:
  studyCode:
    name: studyCode
    definition_uri: include:studyCode
    description: Unique identifier for the study (generally a short acronym)
    title: Study Code
    from_schema: https://w3id.org/include
    rank: 1000
    alias: studyCode
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    - Participant
    - Condition
    - Study
    - Dataset
    - DatasetManifest
    range: enum_studyCode
    required: true
  participantGlobalId:
    name: participantGlobalId
    definition_uri: include:participantGlobalId
    description: Unique INCLUDE global identifier for the participant, assigned by
      DCC
    title: Participant Global ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: participantGlobalId
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    - Participant
    - Condition
    range: string
    required: true
  participantExternalId:
    name: participantExternalId
    definition_uri: include:participantExternalId
    description: Unique, de-identified identifier for the participant, assigned by
      data contributor. External IDs must be two steps removed from personal information
      in the study records.
    title: Participant External ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: participantExternalId
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    - Participant
    - Condition
    range: string
    required: true
  sampleGlobalId:
    name: sampleGlobalId
    definition_uri: include:sampleGlobalId
    description: INCLUDE global identifier for sample, assigned by DCC
    title: Sample Global ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: sampleGlobalId
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    range: string
    required: true
  sampleExternalId:
    name: sampleExternalId
    definition_uri: include:sampleExternalId
    description: Unique identifier for sample, assigned by data contributor. A sample
      is a unique biological material; two samples with two different IDs are biologically
      distinct.
    title: Sample External ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: sampleExternalId
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    range: string
    required: true
  fileName:
    name: fileName
    definition_uri: include:fileName
    description: Name of file, assigned by data contributor
    title: File Name
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileName
    owner: DataFile
    domain_of:
    - DataFile
    - DatasetManifest
    range: string
    required: true
  fileGlobalId:
    name: fileGlobalId
    definition_uri: include:fileGlobalId
    description: INCLUDE global file identifier, assigned by DCC
    title: File Global ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileGlobalId
    owner: DataFile
    domain_of:
    - DataFile
    - DatasetManifest
    range: string
    required: true
  fileS3Location:
    name: fileS3Location
    definition_uri: include:fileS3Location
    description: S3 bucket location of file; also serves as dewrangle descriptor
    title: File S3 Location
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileS3Location
    owner: DataFile
    domain_of:
    - DataFile
    range: string
    required: true
  fileUploadLocation:
    name: fileUploadLocation
    definition_uri: include:fileUploadLocation
    description: Where source file was uploaded, if not directly to an S3 bucket (e.g.
      Synapse)
    title: File Upload Location
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileUploadLocation
    owner: DataFile
    domain_of:
    - DataFile
    range: string
  drsUri:
    name: drsUri
    definition_uri: include:drsUri
    description: Data Repository Services API Uniform Resource Identifier
    title: DRS URI
    from_schema: https://w3id.org/include
    rank: 1000
    alias: drsUri
    owner: DataFile
    domain_of:
    - DataFile
    range: uriorcurie
    required: true
  fileHash:
    name: fileHash
    definition_uri: include:fileHash
    description: md5 hash of this file for validation (if known)
    title: File Hash
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileHash
    owner: DataFile
    domain_of:
    - DataFile
    range: string
  dataAccess:
    name: dataAccess
    definition_uri: include:dataAccess
    description: Type of access control on this file, determined by DCC
    title: Data Access
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataAccess
    owner: DataFile
    domain_of:
    - DataFile
    range: enum_dataAccess
    required: true
  dataCategory:
    name: dataCategory
    definition_uri: include:dataCategory
    description: General category of data in file (e.g. Clinical, Genomics, Proteomics,
      Metabolomics, Immune profiling, Transcriptomics)
    title: Data Category
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataCategory
    owner: DataFile
    domain_of:
    - DataFile
    - Study
    - Dataset
    range: enum_dataCategory
    required: true
  dataType:
    name: dataType
    definition_uri: include:dataType
    description: Specific type of data contained in file (e.g. Preprocessed metabolite
      relative abundance, Absolute protein concentration, Aligned reads, Simple nucleotide
      variations, GVCF, Gene expression quantifications, Gene fusions, Somatic copy
      number variations, Somatic structural variations)
    title: Data Type
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataType
    owner: DataFile
    domain_of:
    - DataFile
    - Dataset
    range: string
  experimentalStrategy:
    name: experimentalStrategy
    definition_uri: include:experimentalStrategy
    description: Experimental method used to obtain data in file (e.g. Whole genome
      sequencing, RNAseq, Multiplex immunoassay, Mass spec metabolomics)
    title: Experimental Strategy
    from_schema: https://w3id.org/include
    rank: 1000
    alias: experimentalStrategy
    owner: DataFile
    domain_of:
    - DataFile
    - Dataset
    range: string
    multivalued: true
  experimentalPlatform:
    name: experimentalPlatform
    definition_uri: include:experimentalPlatform
    description: Specific platform used to perform experiment; pipe-separated if multiple
      (e.g. SOMAscan, MSD, Luminex, Illumina)
    title: Experimental Platform
    from_schema: https://w3id.org/include
    rank: 1000
    alias: experimentalPlatform
    owner: DataFile
    domain_of:
    - DataFile
    - Dataset
    range: string
    multivalued: true
  fileFormat:
    name: fileFormat
    definition_uri: include:fileFormat
    description: Format of file (e.g. tsv, cram, gvcf, vcf, maf, txt, pdf, html, png)
    title: File Format
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileFormat
    owner: DataFile
    domain_of:
    - DataFile
    range: string
    required: true
  fileSize:
    name: fileSize
    definition_uri: include:fileSize
    description: Size of file, if known (mainly important if large)
    title: File Size
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileSize
    owner: DataFile
    domain_of:
    - DataFile
    range: integer
  fileSizeUnit:
    name: fileSizeUnit
    definition_uri: include:fileSizeUnit
    description: Unit of file size
    title: File Size Unit
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileSizeUnit
    owner: DataFile
    domain_of:
    - DataFile
    range: string
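fileHash is documented as an MD5 hash for validation, and fileSize as an integer paired with a separate fileSizeUnit. A minimal verification sketch using Python's hashlib follows. The function names are hypothetical, and because the schema does not fix a vocabulary for fileSizeUnit, the size is only compared when the unit is assumed to be the literal "B" (bytes).

```python
import hashlib
import os
from typing import BinaryIO, List, Optional


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, read in 1 MiB chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, file_hash: Optional[str] = None,
                file_size: Optional[int] = None,
                file_size_unit: Optional[str] = None) -> List[str]:
    """Check a local copy against fileHash/fileSize metadata; return a list of problems."""
    problems: List[str] = []
    if file_hash:
        with open(path, "rb") as fh:
            actual_hash = md5_of_stream(fh)
        # hexdigest() is lowercase, so normalize the recorded hash before comparing
        if actual_hash != file_hash.strip().lower():
            problems.append(f"fileHash mismatch: expected {file_hash}, got {actual_hash}")
    # Exact comparison only makes sense for byte counts; "B" is an assumed unit label
    if file_size is not None and file_size_unit == "B":
        actual_size = os.path.getsize(path)
        if actual_size != file_size:
            problems.append(f"fileSize mismatch: expected {file_size} B, got {actual_size} B")
    return problems
```

An empty result means the local file matches whatever metadata was supplied; since fileHash is optional (0..1), files without a recorded hash are simply not hash-checked.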