Model Driven Architecture: Conceptual Modelling of Objects for GCP System Implementation

Introduction

 

As a starting point for GCP system implementation, it was felt that the concepts and associated domain objects to be represented in the database schema, and instantiated as data objects at various levels of the system, should be modelled before the system is prototyped. This modelling task includes compiling a glossary that captures the precise semantics of the common concepts underlying the entity classes of an associated UML domain model (documented elsewhere). The glossary could form part of the data interoperability standards used by the GCP. The associated UML model graphically represents the relationships between the conceptual entities.

Glossary

Note: concepts that have their own entry in this glossary are capitalized for emphasis where they appear in other definitions.

 

Germplasm: A biological entity (“identity”) with a distinct genetic composition. We agreed that an instance of Germplasm may exist even if no Accession number has yet been assigned to it.

 

Accession: Germplasm identified as belonging to a defined collection (e.g. a genebank collection). Accessions represent a subset of the complete space of possible Germplasm instances.

 

Passport: Descriptive information uniquely associated with a specified Germplasm identity when it is initially collected or generated. Passport data is associated with Germplasm rather than with an Accession identifier, since Germplasm entities that have not yet been assigned an Accession identifier may still have Passport data associated with them via their originating Germplasm identity. Passport includes associations with Phenotype (characterization, fixed), Taxonomy and Location data.
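
The relationship between Germplasm, Accession and Passport described above can be sketched as a small set of Python classes. This is a minimal sketch under stated assumptions, not the GCP schema: the field names (germplasm_id, accession_number, collection, characterization) and the example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Passport:
    # Data recorded when the Germplasm is first collected or generated.
    taxonomy: str
    location: str
    # Fixed characterization Phenotype values (hypothetical representation).
    characterization: Dict[str, str] = field(default_factory=dict)


@dataclass
class Germplasm:
    germplasm_id: str                    # hypothetical identifier field
    passport: Optional[Passport] = None  # attached to Germplasm, not to Accession


@dataclass
class Accession:
    accession_number: str
    collection: str       # e.g. the name of a genebank collection
    germplasm: Germplasm  # an Accession always refers to some Germplasm


# Germplasm that carries Passport data but has no Accession yet.
g = Germplasm("G-001", Passport(taxonomy="Oryza sativa", location="field site A"))

# Later, the same Germplasm is registered in a collection.
a = Accession("ACC-42", "Example Genebank", g)
```

Because the Passport hangs off the Germplasm object, it remains reachable both before and after an Accession is created for that Germplasm.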

 

Bulk (or Germplasm Group): A group of Germplasm on which an experiment is carried out.

 

Genotype: A set of molecular variants associated with the genome of a specified Germplasm identity. The term may be used in a “wide” sense (the entire genotype of a Germplasm identity) or in a narrow sense (the genotype of the Germplasm at a subset of specified loci, including a limited number of fingerprinted but anonymous loci). At a specific locus, a genotype is the set of alleles observed in the specified Germplasm at that locus. A Genotype can describe more than one genome if the Germplasm in question represents a Bulk or population, i.e. a genetic pool treated as a single genotype for experimental or practical purposes.
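
One way to picture the wide and narrow senses of Genotype is as a mapping from locus to the set of alleles observed there, with the narrow sense obtained by restricting that mapping to chosen loci. The sketch below assumes this representation; the type alias, the narrow function and the locus and allele names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

# A genotype maps a locus name to the set of alleles observed at that locus.
Genotype = Dict[str, FrozenSet[str]]


def narrow(genotype: Genotype, loci: Iterable[str]) -> Genotype:
    """Return the genotype restricted to the given loci (narrow sense)."""
    wanted = set(loci)
    return {locus: alleles for locus, alleles in genotype.items() if locus in wanted}


# Hypothetical wide-sense genotype for one Germplasm identity.
wide: Genotype = {
    "locus1": frozenset({"A", "a"}),
    "locus2": frozenset({"B"}),
    "locus3": frozenset({"C", "c"}),
}

# Narrow-sense genotype at two specified loci.
subset = narrow(wide, ["locus1", "locus3"])
```

Using a set of alleles per locus, rather than a fixed pair, leaves room for a Bulk or population in which more than two alleles are observed at one locus.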

 

Trait: An observable or measurable characteristic of Germplasm.

 

Phenotype: The specific values of Traits observed for a specified Germplasm identity. In addition to being associated with a specified Germplasm, a Phenotype may be conditional on the Time of observation (e.g. developmental stage) and on Environmental conditions (linked to a specified Location).
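
The distinction between a Trait (what is measured) and a Phenotype (the value observed for a given Germplasm, possibly conditional on time and environment) might be sketched as follows. All class and field names, and the observation values, are hypothetical illustrations rather than part of the GCP model.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Trait:
    name: str
    unit: Optional[str] = None


@dataclass(frozen=True)
class Phenotype:
    germplasm_id: str
    trait: Trait
    value: Union[float, str]
    time: Optional[str] = None         # e.g. developmental stage
    environment: Optional[str] = None  # key of an Environment record


def values_for(observations: List[Phenotype], germplasm_id: str, trait_name: str):
    """Collect all observed values of one Trait for one Germplasm."""
    return [
        o.value
        for o in observations
        if o.germplasm_id == germplasm_id and o.trait.name == trait_name
    ]


height = Trait("plant_height", unit="cm")
observations = [
    Phenotype("G-001", height, 82.5, time="flowering", environment="ENV-1"),
    Phenotype("G-001", height, 95.0, time="maturity", environment="ENV-1"),
    Phenotype("G-002", height, 70.0, time="flowering", environment="ENV-1"),
]
heights = values_for(observations, "G-001", "plant_height")
```

The same Trait can therefore yield several Phenotype records for one Germplasm, distinguished by time and environment.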

 

Environment: In the broad sense, the whole complex of climatic, edaphic (i.e. soil) and biotic factors that are external to, but act upon, Germplasm (definition adapted from Webster’s Third New International Dictionary, 1981). In the narrower sense, the conditions external to a specified Germplasm instance (i.e. in the field, greenhouse or lab in which a plant is grown) under which Phenotype data on the Germplasm is collected.

 

Marker: An experimental system that can be applied to Germplasm to detect loci in a genome. An individual Marker may be connected to multiple loci, and the system may identify many segregating loci depending on the method used. Markers can be genes or traits; traits link Phenotype information to the Locus, Map and Chromosome objects, and may need to be translated to make their connection with the Locus object explicit. If the definition of Marker is extended to include QTLs, an additional object may be needed to translate QTL data into loci, since a QTL must first be broken down into its constituent loci.

 

Chromosome: An individual genetic unit in a genome, corresponding to a distinct DNA molecule in the living cells of an organism. Observed loci on a Chromosome may be partitioned experimentally into one or more Linkage Groups. Chromosomes have biological reality independent of the existence of the Linkage Groups and Maps that describe them.

 

Locus: A specified position in a genome. A Locus may be a point or an interval (as for QTLs).
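
Since a Locus may be either a point or an interval, one possible representation is a start position with an optional end. The sketch below assumes numeric positions in some unspecified map unit; the Locus class, its fields and the example positions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locus:
    name: str
    start: float
    end: Optional[float] = None  # None means a point locus

    def __post_init__(self):
        if self.end is not None and self.end < self.start:
            raise ValueError("interval end must not precede start")

    @property
    def is_interval(self) -> bool:
        return self.end is not None and self.end != self.start

    def contains(self, position: float) -> bool:
        """True if the position falls on the point or within the interval."""
        upper = self.start if self.end is None else self.end
        return self.start <= position <= upper


point = Locus("marker1", 12.5)      # a point locus
qtl = Locus("qtl1", 10.0, 20.0)     # an interval locus, as for a QTL
```

Treating a point as a degenerate interval keeps a single Locus type, so objects that refer to loci do not need to distinguish the two cases.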

 

Allele:  A distinct variant of a locus.

 

Linkage Group: A group of two or more genetically or physically mapped loci with observed linkage. Although linkage biologically implies the physical proximity of the loci on a chromosome, knowledge about a Linkage Group may be curated without knowing the Chromosome to which it belongs.

 

Map: A set of Linkage Groups for a study. A Map may exist without reference to a Map Study, for example a consensus map formed by merging maps from multiple Map Studies to define a consensus for a species.

 

Map Study: The information used to generate a Map. A Map Study may have more than one Map, for example when different methods are used to generate maps from identical sets of data.
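
The optional relationships among Linkage Group, Map and Map Study can be sketched as below: a Linkage Group may lack a known Chromosome, a Map may lack a Map Study, and a Map Study may own several Maps. This is an illustrative sketch only; the class layout, the add_map helper and all example names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkageGroup:
    name: str
    loci: List[str]
    chromosome: Optional[str] = None  # may be unknown when curated

    def __post_init__(self):
        if len(self.loci) < 2:
            raise ValueError("a Linkage Group needs two or more loci")


@dataclass
class Map:
    name: str
    linkage_groups: List[LinkageGroup] = field(default_factory=list)
    # Back-reference excluded from repr/eq to avoid recursing through the study.
    study: Optional["MapStudy"] = field(default=None, repr=False, compare=False)


@dataclass
class MapStudy:
    name: str
    maps: List[Map] = field(default_factory=list)

    def add_map(self, m: Map) -> None:
        m.study = self
        self.maps.append(m)


lg = LinkageGroup("LG1", ["locus1", "locus2"])  # chromosome not yet known
study = MapStudy("study-2004")
study.add_map(Map("method-A map", [lg]))
study.add_map(Map("method-B map", [lg]))        # same data, different method
consensus = Map("species consensus", [lg])      # no originating Map Study
```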

 

Haplotype Block:  A set of loci whose alleles co-segregate.

 

Haplotype: A collection of alleles that co-segregate (the set of co-segregating alleles for a Haplotype Block). A Haplotype defines a region of the genome with no recombination in a set of Germplasm. It must include at least two loci.
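
The Haplotype Block and Haplotype definitions can be read as a set of loci plus one allele chosen at each of those loci. The following sketch enforces the two-locus minimum on the Haplotype and checks that the alleles cover exactly the loci of the block; the class names, fields and allele labels are hypothetical, and the coverage check is an assumption that the glossary does not state.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class HaplotypeBlock:
    loci: Tuple[str, ...]  # loci whose alleles co-segregate


@dataclass(frozen=True)
class Haplotype:
    block: HaplotypeBlock
    alleles: Dict[str, str]  # one allele per locus in the block

    def __post_init__(self):
        if len(self.alleles) < 2:
            raise ValueError("a Haplotype must include at least two loci")
        # Assumption: a Haplotype assigns an allele to every locus of its block.
        if set(self.alleles) != set(self.block.loci):
            raise ValueError("alleles must cover exactly the loci of the block")


block = HaplotypeBlock(("locus1", "locus2", "locus3"))
h1 = Haplotype(block, {"locus1": "A", "locus2": "b", "locus3": "C"})
h2 = Haplotype(block, {"locus1": "a", "locus2": "B", "locus3": "c"})
```

Several Haplotypes can share one Haplotype Block, each recording a different combination of co-segregating alleles.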

 

Fingerprint: A subclass of Genotype that captures a binary vector of states (usually bands on a gel) for a specified Germplasm. A Fingerprint might alternatively be modelled as a subtype of Haplotype.
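
A Fingerprint as a binary vector of band states could be represented as in the sketch below. It is shown as a standalone class for brevity rather than as a subclass of Genotype; the field names, the present_bands helper and the example vector are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fingerprint:
    germplasm_id: str
    bands: Tuple[int, ...]  # 1 = band present, 0 = band absent

    def __post_init__(self):
        if any(b not in (0, 1) for b in self.bands):
            raise ValueError("band states must be 0 or 1")

    def present_bands(self) -> List[int]:
        """Return the positions in the vector at which a band was observed."""
        return [i for i, b in enumerate(self.bands) if b == 1]


fp = Fingerprint("G-001", (1, 0, 1, 1, 0))
positions = fp.present_bands()
```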