Class: Data File (DataFile)
Metadata about Data Files
URI: include:DataFile
classDiagram
class DataFile
click DataFile href "../DataFile"
Thing <|-- DataFile
click Thing href "../Thing"
DataFile : dataAccess
DataFile --> "1" EnumDataAccess : dataAccess
click EnumDataAccess href "../EnumDataAccess"
DataFile : dataCategory
DataFile --> "1" EnumDataCategory : dataCategory
click EnumDataCategory href "../EnumDataCategory"
DataFile : dataType
DataFile : drsUri
DataFile : experimentalPlatform
DataFile : experimentalStrategy
DataFile : fileFormat
DataFile : fileGlobalId
DataFile : fileHash
DataFile : fileName
DataFile : fileS3Location
DataFile : fileSize
DataFile : fileSizeUnit
DataFile : fileUploadLocation
DataFile : participantExternalId
DataFile : participantGlobalId
DataFile : sampleExternalId
DataFile : sampleGlobalId
DataFile : studyCode
DataFile --> "1" EnumStudyCode : studyCode
click EnumStudyCode href "../EnumStudyCode"
Inheritance
- Thing
    - DataFile
Slots
Name | Cardinality and Range | Description | Inheritance |
---|---|---|---|
studyCode | 1 EnumStudyCode | Unique identifier for the study (generally a short acronym) | direct |
participantGlobalId | 1 String | Unique INCLUDE global identifier for the participant, assigned by DCC | direct |
participantExternalId | 1 String | Unique, de-identified identifier for the participant, assigned by data contributor | direct |
sampleGlobalId | 1 String | INCLUDE global identifier for sample, assigned by DCC | direct |
sampleExternalId | 1 String | Unique identifier for sample, assigned by data contributor | direct |
fileName | 1 String | Name of file, assigned by data contributor | direct |
fileGlobalId | 1 String | INCLUDE global file identifier, assigned by DCC | direct |
fileS3Location | 1 String | S3 bucket location of file; also serves as dewrangle descriptor | direct |
fileUploadLocation | 0..1 String | Where source file was uploaded, if not directly to an S3 bucket (e.g. Synapse) | direct |
drsUri | 1 Uriorcurie | Data Repository Services API Uniform Resource Identifier | direct |
fileHash | 0..1 String | md5 hash of this file for validation (if known) | direct |
dataAccess | 1 EnumDataAccess | Type of access control on this file, determined by DCC | direct |
dataCategory | 1 EnumDataCategory | General category of data in file (e.g. Clinical, Genomics, Proteomics, Metabolomics, Immune profiling, Transcriptomics) | direct |
dataType | 0..1 String | Specific type of data contained in file (e.g. Preprocessed metabolite relative abundance, Absolute protein concentration, Aligned reads, Simple nucleotide variations, GVCF, Gene expression quantifications, Gene fusions, Somatic copy number variations, Somatic structural variations) | direct |
experimentalStrategy | * String | Experimental method used to obtain data in file (e.g. Whole genome sequencing, RNAseq, Multiplex immunoassay, Mass spec metabolomics) | direct |
experimentalPlatform | * String | Specific platform used to perform experiment; pipe-separated if multiple (e.g. SOMAscan, MSD, Luminex, Illumina) | direct |
fileFormat | 1 String | Format of file (e.g. tsv, cram, gvcf, vcf, maf, txt, pdf, html, png) | direct |
fileSize | 0..1 Integer | Size of file, if known (mainly important if large) | direct |
fileSizeUnit | 0..1 String | Unit of file size | direct |
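The slot table can be read as a set of cardinality constraints on a single DataFile record. The following is a minimal sketch that assumes a record is held as a plain Python dict keyed by slot name. `validate_data_file` and `normalize_multivalued` are hypothetical helpers and are not part of LinkML or any INCLUDE tooling. The table describes experimentalPlatform as "pipe-separated if multiple" but also gives it cardinality `*`, so the sketch accepts either form. Enum membership (EnumStudyCode and the others) is not checked, because the permissible values are not listed on this page.

```python
from typing import Any

# Slots with cardinality 1 in the slot table (required, single-valued).
REQUIRED = [
    "studyCode", "participantGlobalId", "participantExternalId",
    "sampleGlobalId", "sampleExternalId", "fileName", "fileGlobalId",
    "fileS3Location", "drsUri", "dataAccess", "dataCategory", "fileFormat",
]
# Slots with cardinality * (multivalued).
MULTIVALUED = ["experimentalStrategy", "experimentalPlatform"]


def normalize_multivalued(value: Any) -> list[str]:
    """Turn None, a pipe-separated string, or a list into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [part.strip() for part in value.split("|") if part.strip()]
    return [str(item) for item in value]


def validate_data_file(record: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return a normalized copy of the record and a list of problems found."""
    normalized = dict(record)
    problems = []
    for slot in REQUIRED:
        value = normalized.get(slot)
        if value is None or value == "":
            problems.append(f"missing required slot: {slot}")
        elif isinstance(value, list):
            problems.append(f"single-valued slot given a list: {slot}")
    for slot in MULTIVALUED:
        if slot in normalized:
            normalized[slot] = normalize_multivalued(normalized[slot])
    size = normalized.get("fileSize")
    # bool is a subclass of int in Python, so reject it explicitly.
    if size is not None and (isinstance(size, bool) or not isinstance(size, int)):
        problems.append("fileSize must be an integer")
    return normalized, problems


# Hypothetical record: every value is a placeholder, not real INCLUDE data.
record = {slot: "example" for slot in REQUIRED}
record["experimentalPlatform"] = "Illumina|Luminex"
normalized, problems = validate_data_file(record)
print(problems)                            # []
print(normalized["experimentalPlatform"])  # ['Illumina', 'Luminex']
```

In practice, the LinkML validator run against the published schema is the authoritative check. This sketch only shows how the Cardinality and Range column translates into record-level rules.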
Identifier and Mapping Information
Annotations
property | value |
---|---|
required | True |
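The LinkML source for this class also carries a `requires_component` annotation with the value `Study,Participant,Biospecimen`. This page does not say how submission tooling enforces that annotation. One plausible reading, assumed here, is that every DataFile must reference a study, participant and biospecimen that also appear in the same submission. The mapping from components to slots (`LINK_SLOT`) and the helper `missing_components` are hypothetical.

```python
from typing import Any

# Value of the requires_component annotation in the LinkML source.
REQUIRES_COMPONENT = "Study,Participant,Biospecimen"

# Assumed link from each component to the DataFile slot that references it.
LINK_SLOT = {
    "Study": "studyCode",
    "Participant": "participantGlobalId",
    "Biospecimen": "sampleGlobalId",
}


def missing_components(data_file: dict[str, Any],
                       known_ids: dict[str, set[str]]) -> list[str]:
    """Return the required components whose referenced ID is not in known_ids."""
    missing = []
    for component in REQUIRES_COMPONENT.split(","):
        slot = LINK_SLOT[component]
        # A missing slot yields None, which is never in the known ID set.
        if data_file.get(slot) not in known_ids.get(component, set()):
            missing.append(component)
    return missing


# Hypothetical identifiers, for illustration only.
known = {"Study": {"STUDY1"}, "Participant": {"p1"}, "Biospecimen": {"s1"}}
print(missing_components({"studyCode": "STUDY1", "participantGlobalId": "p1",
                          "sampleGlobalId": "s2"}, known))  # ['Biospecimen']
```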
Schema Source
- from schema: https://w3id.org/include
Mappings
Mapping Type | Mapped Value |
---|---|
self | include:DataFile |
native | include:DataFile |
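Both mapped values are CURIEs (compact URIs) in the `include` prefix. The drsUri slot also has range uriorcurie, so it may hold either a CURIE or a full URI. The following is a minimal sketch of CURIE expansion. It assumes the `include` prefix expands to `https://w3id.org/include/`. The page gives the schema id as https://w3id.org/include but does not show the prefix declaration. `expand_curie` is a hypothetical helper.

```python
# Assumed prefix map: the exact expansion of "include" is not shown on this page.
PREFIXES = {"include": "https://w3id.org/include/"}


def expand_curie(value: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' using prefixes; pass absolute URIs through."""
    prefix, sep, local = value.partition(":")
    if not sep:
        raise ValueError(f"not a URI or CURIE: {value!r}")
    if prefix in prefixes:
        return prefixes[prefix] + local
    if local.startswith("//"):
        # Scheme-based URI such as https://... or drs://..., already absolute.
        return value
    raise KeyError(f"unknown prefix: {prefix!r}")


print(expand_curie("include:DataFile"))  # https://w3id.org/include/DataFile
```

Treating any value whose local part starts with `//` as an absolute URI is a simplification. A production resolver would use the schema's declared prefixes and a proper URI parser.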
LinkML Source
Direct
name: DataFile
definition_uri: include:DataFile
annotations:
  required:
    tag: required
    value: 'True'
  requires_component:
    tag: requires_component
    value: Study,Participant,Biospecimen
description: Metadata about Data Files
title: Data File
from_schema: https://w3id.org/include
is_a: Thing
slots:
- studyCode
- participantGlobalId
- participantExternalId
- sampleGlobalId
- sampleExternalId
- fileName
- fileGlobalId
- fileS3Location
- fileUploadLocation
- drsUri
- fileHash
- dataAccess
- dataCategory
- dataType
- experimentalStrategy
- experimentalPlatform
- fileFormat
- fileSize
- fileSizeUnit
slot_usage:
  dataCategory:
    name: dataCategory
    description: General category of data in file (e.g. Clinical, Genomics, Proteomics,
      Metabolomics, Immune profiling, Transcriptomics)
  dataType:
    name: dataType
    description: Specific type of data contained in file (e.g. Preprocessed metabolite
      relative abundance, Absolute protein concentration, Aligned reads, Simple nucleotide
      variations, GVCF, Gene expression quantifications, Gene fusions, Somatic copy
      number variations, Somatic structural variations)
  experimentalStrategy:
    name: experimentalStrategy
    description: Experimental method used to obtain data in file (e.g. Whole genome
      sequencing, RNAseq, Multiplex immunoassay, Mass spec metabolomics)
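The Direct source refines three slots through `slot_usage`, and the Induced listing shows the merged result for each attribute. The following is a simplified sketch of that merge. It assumes that any key set in `slot_usage` replaces the slot's schema-level value. LinkML's own induction (for example `SchemaView.induced_slot` in linkml-runtime) also resolves inheritance, mixins and defaults. `induce_slot` is a hypothetical helper. The schema-level `dataCategory` definition below is invented for illustration, because it is not shown on this page.

```python
from typing import Any


def induce_slot(global_slot: dict[str, Any], slot_usage: dict[str, Any],
                owner: str) -> dict[str, Any]:
    """Simplified induction: keys in slot_usage override the global slot."""
    induced = dict(global_slot)  # start from the schema-level slot definition
    induced.update(slot_usage)   # class-level refinements win
    induced["owner"] = owner     # the induced attribute belongs to this class
    return induced


# Hypothetical schema-level definition (not taken from the INCLUDE schema).
global_data_category = {
    "name": "dataCategory",
    "description": "General category of data",
    "range": "enum_dataCategory",
    "required": True,
}
# slot_usage entry as given in the Direct source above.
usage = {
    "name": "dataCategory",
    "description": "General category of data in file (e.g. Clinical, Genomics, "
                   "Proteomics, Metabolomics, Immune profiling, Transcriptomics)",
}
induced = induce_slot(global_data_category, usage, owner="DataFile")
print(induced["range"], induced["owner"])  # enum_dataCategory DataFile
```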
Induced
name: DataFile
definition_uri: include:DataFile
annotations:
  required:
    tag: required
    value: 'True'
  requires_component:
    tag: requires_component
    value: Study,Participant,Biospecimen
description: Metadata about Data Files
title: Data File
from_schema: https://w3id.org/include
is_a: Thing
slot_usage:
  dataCategory:
    name: dataCategory
    description: General category of data in file (e.g. Clinical, Genomics, Proteomics,
      Metabolomics, Immune profiling, Transcriptomics)
  dataType:
    name: dataType
    description: Specific type of data contained in file (e.g. Preprocessed metabolite
      relative abundance, Absolute protein concentration, Aligned reads, Simple nucleotide
      variations, GVCF, Gene expression quantifications, Gene fusions, Somatic copy
      number variations, Somatic structural variations)
  experimentalStrategy:
    name: experimentalStrategy
    description: Experimental method used to obtain data in file (e.g. Whole genome
      sequencing, RNAseq, Multiplex immunoassay, Mass spec metabolomics)
attributes:
  studyCode:
    name: studyCode
    definition_uri: include:studyCode
    description: Unique identifier for the study (generally a short acronym)
    title: Study Code
    from_schema: https://w3id.org/include
    rank: 1000
    alias: studyCode
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    - Participant
    - Condition
    - Study
    - Dataset
    - DatasetManifest
    range: enum_studyCode
    required: true
  participantGlobalId:
    name: participantGlobalId
    definition_uri: include:participantGlobalId
    description: Unique INCLUDE global identifier for the participant, assigned by
      DCC
    title: Participant Global ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: participantGlobalId
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    - Participant
    - Condition
    range: string
    required: true
  participantExternalId:
    name: participantExternalId
    definition_uri: include:participantExternalId
    description: Unique, de-identified identifier for the participant, assigned by
      data contributor. External IDs must be two steps removed from personal information
      in the study records.
    title: Participant External ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: participantExternalId
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    - Participant
    - Condition
    range: string
    required: true
  sampleGlobalId:
    name: sampleGlobalId
    definition_uri: include:sampleGlobalId
    description: INCLUDE global identifier for sample, assigned by DCC
    title: Sample Global ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: sampleGlobalId
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    range: string
    required: true
  sampleExternalId:
    name: sampleExternalId
    definition_uri: include:sampleExternalId
    description: Unique identifier for sample, assigned by data contributor. A sample
      is a unique biological material; two samples with two different IDs are biologically
      distinct.
    title: Sample External ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: sampleExternalId
    owner: DataFile
    domain_of:
    - Biospecimen
    - DataFile
    range: string
    required: true
  fileName:
    name: fileName
    definition_uri: include:fileName
    description: Name of file, assigned by data contributor
    title: File Name
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileName
    owner: DataFile
    domain_of:
    - DataFile
    - DatasetManifest
    range: string
    required: true
  fileGlobalId:
    name: fileGlobalId
    definition_uri: include:fileGlobalId
    description: INCLUDE global file identifier, assigned by DCC
    title: File Global ID
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileGlobalId
    owner: DataFile
    domain_of:
    - DataFile
    - DatasetManifest
    range: string
    required: true
  fileS3Location:
    name: fileS3Location
    definition_uri: include:fileS3Location
    description: S3 bucket location of file; also serves as dewrangle descriptor
    title: File S3 Location
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileS3Location
    owner: DataFile
    domain_of:
    - DataFile
    range: string
    required: true
  fileUploadLocation:
    name: fileUploadLocation
    definition_uri: include:fileUploadLocation
    description: Where source file was uploaded, if not directly to an S3 bucket (e.g.
      Synapse)
    title: File Upload Location
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileUploadLocation
    owner: DataFile
    domain_of:
    - DataFile
    range: string
  drsUri:
    name: drsUri
    definition_uri: include:drsUri
    description: Data Repository Services API Uniform Resource Identifier
    title: DRS URI
    from_schema: https://w3id.org/include
    rank: 1000
    alias: drsUri
    owner: DataFile
    domain_of:
    - DataFile
    range: uriorcurie
    required: true
  fileHash:
    name: fileHash
    definition_uri: include:fileHash
    description: md5 hash of this file for validation (if known)
    title: File Hash
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileHash
    owner: DataFile
    domain_of:
    - DataFile
    range: string
  dataAccess:
    name: dataAccess
    definition_uri: include:dataAccess
    description: Type of access control on this file, determined by DCC
    title: Data Access
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataAccess
    owner: DataFile
    domain_of:
    - DataFile
    range: enum_dataAccess
    required: true
  dataCategory:
    name: dataCategory
    definition_uri: include:dataCategory
    description: General category of data in file (e.g. Clinical, Genomics, Proteomics,
      Metabolomics, Immune profiling, Transcriptomics)
    title: Data Category
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataCategory
    owner: DataFile
    domain_of:
    - DataFile
    - Study
    - Dataset
    range: enum_dataCategory
    required: true
  dataType:
    name: dataType
    definition_uri: include:dataType
    description: Specific type of data contained in file (e.g. Preprocessed metabolite
      relative abundance, Absolute protein concentration, Aligned reads, Simple nucleotide
      variations, GVCF, Gene expression quantifications, Gene fusions, Somatic copy
      number variations, Somatic structural variations)
    title: Data Type
    from_schema: https://w3id.org/include
    rank: 1000
    alias: dataType
    owner: DataFile
    domain_of:
    - DataFile
    - Dataset
    range: string
  experimentalStrategy:
    name: experimentalStrategy
    definition_uri: include:experimentalStrategy
    description: Experimental method used to obtain data in file (e.g. Whole genome
      sequencing, RNAseq, Multiplex immunoassay, Mass spec metabolomics)
    title: Experimental Strategy
    from_schema: https://w3id.org/include
    rank: 1000
    alias: experimentalStrategy
    owner: DataFile
    domain_of:
    - DataFile
    - Dataset
    range: string
    multivalued: true
  experimentalPlatform:
    name: experimentalPlatform
    definition_uri: include:experimentalPlatform
    description: Specific platform used to perform experiment; pipe-separated if multiple
      (e.g. SOMAscan, MSD, Luminex, Illumina)
    title: Experimental Platform
    from_schema: https://w3id.org/include
    rank: 1000
    alias: experimentalPlatform
    owner: DataFile
    domain_of:
    - DataFile
    - Dataset
    range: string
    multivalued: true
  fileFormat:
    name: fileFormat
    definition_uri: include:fileFormat
    description: Format of file (e.g. tsv, cram, gvcf, vcf, maf, txt, pdf, html, png)
    title: File Format
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileFormat
    owner: DataFile
    domain_of:
    - DataFile
    range: string
    required: true
  fileSize:
    name: fileSize
    definition_uri: include:fileSize
    description: Size of file, if known (mainly important if large)
    title: File Size
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileSize
    owner: DataFile
    domain_of:
    - DataFile
    range: integer
  fileSizeUnit:
    name: fileSizeUnit
    definition_uri: include:fileSizeUnit
    description: Unit of file size
    title: File Size Unit
    from_schema: https://w3id.org/include
    rank: 1000
    alias: fileSizeUnit
    owner: DataFile
    domain_of:
    - DataFile
    range: string
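The fileHash attribute is optional and is described as the md5 hash of the file "for validation". fileSize is an optional integer whose unit is recorded separately in fileSizeUnit. The following is a minimal sketch of checking a local copy of a file against those two values. It assumes the caller has already converted fileSize to bytes, because the permitted fileSizeUnit values are not listed here. `md5_of_file` and `check_file` are hypothetical helpers built on Python's standard `hashlib` and `pathlib`.

```python
import hashlib
from pathlib import Path
from typing import Optional


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path: Path, file_hash: Optional[str],
               size_in_bytes: Optional[int]) -> list[str]:
    """Compare a local file with the record's fileHash and size, when given."""
    problems = []
    # Hex digests are case-insensitive, so compare in lower case.
    if file_hash is not None and md5_of_file(path) != file_hash.strip().lower():
        problems.append("fileHash does not match")
    if size_in_bytes is not None and path.stat().st_size != size_in_bytes:
        problems.append("fileSize does not match")
    return problems
```

Because both slots are optional (0..1), each check runs only when the record supplies a value. A record without a fileHash therefore passes the hash check by default rather than failing it.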