HBV-GLUE Schema Extensions
The HBV-GLUE project-specific schema extensions are defined in this project build file.
These extensions expand the standard GLUE schema with new fields and relationships, making it possible to store, analyze, and interpret HBV-specific sequence data and metadata.
Below is an overview of the key extensions:
The sequence table has been extended with a range of fields that capture both GenBank-derived metadata and analysis-specific annotations. Key additions include:
- GenBank Metadata Fields: Fields like gb_gi_number, gb_primary_accession, gb_country, and gb_create_date enable rich metadata storage, providing a direct link between sequences and their original GenBank records.
- HBV-Specific Annotations: Fields such as is_hbv, species, genotype, and subgenotype store the results of automated genotyping and species recognition, ensuring consistent identification and classification of sequences.
- Collection Data: Fields like collection_year, collection_month, and earliest_collection_year support temporal analyses of HBV sequences.
- Sequence Transformations: The rotation and reverse_complement fields record transformations applied during preprocessing (e.g., rotation of circular genomes and reverse-complementing of sequences).
- Custom Attributes: Fields like patent_related and exclude_from_almt_tree allow project-specific curation of sequences.
The alignment table is extended with fields that support the phylogenetic and hierarchical organization of HBV sequences:
- clade_category and minimal_name: Capture hierarchical clade information for phylogenetic analysis.
- phylogeny: Stores phylogenetic trees directly within the database for alignment-based analyses.
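A minimal Python sketch of how clade_category and minimal_name might be used to navigate a hierarchy of alignments, for example subgenotype alignments nested under genotype alignments. The Alignment class, its parent link, the alignment names, and the category values "genotype" and "subgenotype" are illustrative assumptions. The phylogeny field is modeled as an optional serialized tree string (Newick is assumed here), which may not match the project's actual storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alignment:
    """Hypothetical in-memory view of an extended alignment row."""
    name: str
    clade_category: Optional[str]   # assumed values, e.g. "genotype", "subgenotype"
    minimal_name: Optional[str]     # assumed short clade label, e.g. "A", "A1"
    phylogeny: Optional[str] = None  # serialized tree; Newick format is an assumption
    # Tree links are excluded from repr/eq to avoid infinite recursion.
    parent: Optional["Alignment"] = field(default=None, repr=False, compare=False)
    children: List["Alignment"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "Alignment") -> "Alignment":
        """Attach a child alignment and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return 'category:minimal_name' labels from the root down to this node."""
        labels = []
        node: Optional[Alignment] = self
        while node is not None:
            if node.clade_category and node.minimal_name:
                labels.append(f"{node.clade_category}:{node.minimal_name}")
            node = node.parent
        return list(reversed(labels))


# Hypothetical hierarchy: root -> genotype A -> subgenotype A1.
root = Alignment("AL_ROOT", None, None)
genotype_a = root.add_child(Alignment("AL_A", "genotype", "A"))
sub_a1 = genotype_a.add_child(Alignment("AL_A1", "subgenotype", "A1"))
print(sub_a1.lineage())  # ['genotype:A', 'subgenotype:A1']
```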
The member_floc_note table includes the following field:
- ref_nt_coverage_pct: Tracks, for a given feature location, the percentage of reference nucleotides covered by each member sequence.
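ref_nt_coverage_pct can be understood as a simple ratio: covered reference positions within the feature location divided by the feature location's length, times 100. The sketch below assumes the feature location is given as inclusive, 1-based reference coordinates that do not wrap around the origin of the circular genome, and that the member's coverage is available as a collection of reference positions it aligns to. The function name is hypothetical, and this is not how HBV-GLUE computes or stores the value.

```python
from typing import Iterable


def ref_nt_coverage_pct(floc_start: int, floc_end: int,
                        covered_ref_positions: Iterable[int]) -> float:
    """Percentage of reference nucleotides in [floc_start, floc_end]
    (1-based, inclusive) covered by a member sequence."""
    if floc_end < floc_start:
        raise ValueError("floc_end must be >= floc_start")
    floc_length = floc_end - floc_start + 1
    # Count each reference position once, and only if it lies inside the feature.
    covered = {p for p in covered_ref_positions if floc_start <= p <= floc_end}
    return 100.0 * len(covered) / floc_length


# A member aligned to reference positions 151-250 covers half of a 101-200 feature.
print(ref_nt_coverage_pct(101, 200, range(151, 251)))  # 50.0
```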
A standardized schema extension for country and region information (m49SchemaExtension) is included, linking sequences to standardized country codes through a MANY_TO_ONE relationship: many sequences can reference the same country record, while each sequence references at most one country. This allows geographic metadata to be incorporated into analyses consistently and interoperably.
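The MANY_TO_ONE link can be pictured as a foreign key on the sequence side. The sketch below uses Python's built-in sqlite3 module. All table and column names (m49_country, m49_country_id, m49_code, display_name) and the example source and sequence identifiers are hypothetical and do not reflect the schema that GLUE actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE m49_country (
    id TEXT PRIMARY KEY,          -- hypothetical key: ISO alpha-3 code
    m49_code TEXT NOT NULL,       -- hypothetical column: numeric M49 code
    display_name TEXT NOT NULL
);
CREATE TABLE sequence (
    source_name TEXT NOT NULL,
    sequence_id TEXT NOT NULL,
    -- MANY_TO_ONE: each sequence references at most one country row
    m49_country_id TEXT REFERENCES m49_country(id),
    PRIMARY KEY (source_name, sequence_id)
);
""")
conn.execute("INSERT INTO m49_country VALUES ('VNM', '704', 'Viet Nam')")
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [("example_source", "SEQ1", "VNM"),   # two sequences share one country
     ("example_source", "SEQ2", "VNM"),
     ("example_source", "SEQ3", None)],   # country unknown: link left empty
)

# Count sequences per country by following the link from the "one" side.
rows = conn.execute("""
    SELECT c.display_name, COUNT(s.sequence_id)
    FROM m49_country c LEFT JOIN sequence s ON s.m49_country_id = c.id
    GROUP BY c.id
""").fetchall()
print(rows)  # [('Viet Nam', 2)]
```

With foreign keys enabled, inserting a sequence that references a country id absent from m49_country raises sqlite3.IntegrityError, which is the kind of consistency the standardized link is meant to provide.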