
HBV‐GLUE Schema Extensions

Robert J. Gifford edited this page Nov 27, 2024 · 4 revisions

The HBV-GLUE project-specific schema extensions are defined in this project build file.

These extensions add new fields and relationships to the standard GLUE schema so that HBV-specific sequence data and metadata can be stored, analyzed, and interpreted.

Below is an overview of the key extensions:

1. Sequence Table Extensions

The sequence table has been extended with a range of fields to capture both GenBank-derived metadata and analysis-specific annotations. Key additions include:

  • GenBank Metadata Fields: Fields like gb_gi_number, gb_primary_accession, gb_country, and gb_create_date enable rich metadata storage, providing a direct link between sequences and their original GenBank records.
  • HBV-Specific Annotations: Fields such as is_hbv, species, genotype, and subgenotype store results from automated genotyping and species recognition processes, ensuring consistent identification and classification of sequences.
  • Collection Data: Fields like collection_year, collection_month, and earliest_collection_year support temporal analyses of HBV sequences.
  • Sequence Transformations: The rotation and reverse_complement fields document transformations (e.g., rotation for circular genomes and reverse-complementing sequences) applied during preprocessing.
  • Custom Attributes: Fields like patent_related and exclude_from_almt_tree allow for project-specific curation of sequences.
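
The two sequence transformations named above can be illustrated with a short Python sketch. This is not the HBV-GLUE implementation: the function names are hypothetical, and the rotation convention (the base at index `offset` becomes the first base) is an assumption made only for illustration.

```python
# Illustrative sketch only; helper names and the rotation convention are
# assumptions, not taken from the HBV-GLUE project.

# Complement table for unambiguous bases. Any other character (e.g. N or
# IUPAC ambiguity codes) is passed through unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate(seq: str, offset: int) -> str:
    """Rotate a circular genome so the base at index `offset` comes first."""
    if not seq:
        return seq
    offset %= len(seq)  # also handles negative and oversized offsets
    return seq[offset:] + seq[:offset]


genome = "AACCGGTT"
rotated = rotate(genome, 3)      # "CGGTTAAC"
restored = rotate(rotated, -3)   # back to the original string
```

Recording the offset and the strand flag alongside each sequence, as the rotation and reverse_complement fields do, means a preprocessed sequence can always be mapped back to the original record.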

2. Alignment Table Extensions

The alignment table is extended with fields to support phylogenetic and hierarchical organization of HBV sequences:

  • clade_category and minimal_name: Capture hierarchical clade information for phylogenetic analysis.
  • phylogeny: Stores phylogenetic trees directly within the database for alignment-based analyses.

3. Member Feature Location Notes

The member_floc_note table is extended with a single field:

  • ref_nt_coverage_pct: Tracks the percentage coverage of reference nucleotides for specific feature locations in member sequences.
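
As a rough illustration of what a coverage percentage of this kind measures, the sketch below computes the share of a feature's reference positions that fall inside at least one aligned segment. The function signature, the 1-based inclusive coordinates, and the example numbers are assumptions; how GLUE itself populates ref_nt_coverage_pct is not described on this page.

```python
# Illustrative sketch only; coordinates are assumed to be 1-based and
# inclusive, and the function name simply mirrors the field name.

def ref_nt_coverage_pct(feature_start, feature_end, covered_segments):
    """Percentage of reference positions in the feature that are covered.

    covered_segments is an iterable of (start, end) reference ranges
    covered by the member sequence; overlapping ranges are counted once.
    """
    feature_len = feature_end - feature_start + 1
    if feature_len <= 0:
        raise ValueError("feature_end must be >= feature_start")
    covered = set()
    for seg_start, seg_end in covered_segments:
        lo = max(seg_start, feature_start)  # clip segment to the feature
        hi = min(seg_end, feature_end)
        covered.update(range(lo, hi + 1))   # empty when there is no overlap
    return 100.0 * len(covered) / feature_len


# Hypothetical feature spanning reference positions 101..200 (100 nt).
pct = ref_nt_coverage_pct(101, 200, [(90, 150), (140, 160), (300, 400)])
```

In this made-up example, positions 101 to 160 are covered, so `pct` is 60.0.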

4. Country Schema Integration

A standardized schema extension for country and region information (m49SchemaExtension) is included. It links sequences to standardized country codes through a MANY_TO_ONE relationship: each sequence refers to at most one country record, and a single country record can be shared by many sequences. Geographic metadata can therefore be used consistently across analyses and exchanged with other resources that use the same codes.
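
The shape of a many-to-one link can be sketched with Python's built-in sqlite3 module. The table names, column names, and sequence IDs below are hypothetical and chosen only to show the relationship; they are not the identifiers that m49SchemaExtension actually creates.

```python
# Illustrative sketch only; table and column names are assumptions, not the
# ones created by m49SchemaExtension.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the link in SQLite
conn.executescript("""
CREATE TABLE m49_country (
    id           TEXT PRIMARY KEY,
    display_name TEXT NOT NULL
);
CREATE TABLE sequence (
    sequence_id    TEXT PRIMARY KEY,
    -- many sequences may point at the same country row (MANY_TO_ONE)
    m49_country_id TEXT REFERENCES m49_country(id)
);
""")
conn.executemany(
    "INSERT INTO m49_country VALUES (?, ?)",
    [("CHN", "China"), ("GMB", "Gambia")],
)
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?)",
    [("seq1", "CHN"), ("seq2", "CHN"), ("seq3", "GMB"), ("seq4", None)],
)

# Count sequences per country by following the link.
counts = dict(conn.execute("""
    SELECT c.display_name, COUNT(*)
    FROM sequence AS s
    JOIN m49_country AS c ON s.m49_country_id = c.id
    GROUP BY c.display_name
"""))
```

Because the country name lives in one place, every sequence linked to the same record is grouped identically, and sequences without a country (seq4 here) simply drop out of the join.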