Identifying mutations on the COVID-19 sequences gathered from https://www.gisaid.org/ database.

LMSE/COVID_19_Mutations

The COVID_19_Mutation Package

This package was developed to identify mutations in the SARS-CoV-2 receptor binding domain (RBD).

[Figure: global distribution of mutations]

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the COVID-19 outbreak, which began in China in December 2019. The SARS-CoV-2 RBD plays the most important roles in viral attachment, fusion and entry, and serves as a target for the development of antibodies, entry inhibitors and vaccines (Tai W, et al. 2020).

The user may submit nucleotide sequences of any species to this package and perform mutational analysis.

Package Requirements

  • The package was developed in MATLAB R2020a.
  • To run the code, make sure the Bioinformatics Toolbox is installed in MATLAB.
  • All nucleotide sequences must be submitted in the FASTA format.
  • The user must prepare the input to this package according to their objectives.
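
As an illustration of the FASTA requirement, here is a minimal sketch of how a FASTA file breaks down into header and sequence records. It is written in Python purely for illustration; the package itself is MATLAB code, and the names `parse_fasta` and `is_nucleotide` are hypothetical and not part of the package.

```python
from typing import List, Tuple

# IUPAC nucleotide codes plus the gap character (assumption: inputs may
# contain ambiguity codes such as N).
VALID_NT = set("ACGTURYSWKMBDHVN-")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def is_nucleotide(seq: str) -> bool:
    """True if the sequence is non-empty and uses only nucleotide codes."""
    return bool(seq) and set(seq) <= VALID_NT


# Hypothetical two-record input with one wrapped sequence.
example = """>sample_1
ATGGCC
TTTAAA
>sample_2
atgncc
"""
example_records = parse_fasta(example)
```

A file that yields no records, or records for which `is_nucleotide` is false (for example, protein sequences), is not a valid nucleotide FASTA input in the sense used above.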

Steps for Preparing the Input

  • Download nucleotide sequences for any species, such as SARS-CoV-2 patient samples, available on the GISAID database.
  • Copy the dataset to the COVID_19_Mutations\Input directory.
  • Download a standard nucleotide sequence from NCBI and copy it to the COVID_19_Mutations\Input directory. (By default, the SARS-CoV-2 sequence is provided in this directory.)
  • To map mutation locations onto the protein structure, a FASTA file of the PDB structure must be provided in the COVID_19_Mutations\Input directory. (By default, the PDB file for the SARS-CoV-2 spike protein is provided in this directory.)
  • Open MATLAB and add COVID_19_Mutations\CODE\Functions to the path.
  • Run COVID_19_Mutations\CODE\main.m script.

Package Workflow/ Output

  • Input sequences are refined to remove duplicate entries and animal samples.
  • Metadata (country and date) of the submitted sequences is mined and stored.
  • The package generates a database of the submitted nucleotide sequences along with their three reading frames. (This step can take up to two days for large input files on a personal computer.)
  • This database is then saved in the COVID_19_Mutations/Output/database directory.
  • The main reading frame of the standard sequence is also generated.
  • The nucleotide sequences and their three reading frames are locally aligned to the standard sequences. (This step can take up to three days for large input files on a personal computer.)
  • Sequences that do not show any mutations are then removed. In addition, the reading frame that matches the standard amino acid sequence is selected.
  • Mutated sequences are then saved in the COVID_19_Mutations/Output directory.
  • The data is also stored in several sheets of an Excel file that can be accessed in the COVID_19_Mutations/Output directory.
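
The reading-frame and mutation steps above can be sketched as follows. This is a simplified Python illustration, not the package's MATLAB implementation: it uses the standard genetic code and replaces the local alignment step with a position-by-position comparison, which only works for gap-free sequences that start at the same residue. All function names are hypothetical.

```python
from itertools import product
from typing import List

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str, frame: int = 0) -> str:
    """Translate a nucleotide string starting at offset `frame` (0, 1 or 2)."""
    nt = nt.upper().replace("U", "T")
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    # Codons with ambiguous bases are reported as X.
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def three_frames(nt: str) -> List[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(nt, f) for f in range(3)]


def identity(ref: str, query: str) -> int:
    """Count positions where two sequences carry the same residue."""
    return sum(r == q for r, q in zip(ref, query))


def best_frame(ref_protein: str, nt: str) -> int:
    """Pick the frame whose translation agrees most with the reference."""
    frames = three_frames(nt)
    return max(range(3), key=lambda f: identity(ref_protein, frames[f]))


def substitutions(ref: str, query: str) -> List[str]:
    """List differences as labels like K3E (reference, 1-based position, query)."""
    return [f"{r}{i + 1}{q}"
            for i, (r, q) in enumerate(zip(ref, query)) if r != q]


# Hypothetical data: the sample has one extra leading base and one changed codon.
reference_nt = "ATGGCCAAAGGT"
sample_nt = "CATGGCCGAAGGT"
reference_protein = translate(reference_nt)
frame = best_frame(reference_protein, sample_nt)
sample_protein = translate(sample_nt, frame)
aa_mutations = substitutions(reference_protein, sample_protein)
```

A sample whose mutation list comes back empty corresponds to the sequences that the workflow removes. Real data with insertions, deletions or partial coverage needs an actual local alignment, as the package performs.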

Generated Excel File

The generated Excel file comprises several sheets, described below:

  1. Nucleotide Seq Sheet: Comprises the mutations identified in the submitted nucleotide sequences.
  2. Amino Acid Seq Sheet: Comprises the mutations identified in the amino acid sequences.
  3. Nucleotide Frequency Sheet: Comprises the frequency of nucleotide mutations across the submitted dataset.
  4. Amino Acid Frequency Sheet: Comprises the frequency of amino acid mutations across the submitted dataset.
  5. NT Location Frequency Sheet: Comprises the frequency of mutation locations across the nucleotide sequences.
  6. AA Location Frequency Sheet: Comprises the frequency of mutation locations across the amino acid sequences.
  7. Country Frequency Sheet: Comprises the frequency of amino acid mutations per country.
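
To make the relationship between the frequency sheets concrete, the sketch below tallies a small list of mutation records in three ways: by mutation, by location, and by country. It is a Python illustration with made-up records and hypothetical function names; the layout of the actual Excel sheets may differ.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical records: (country, mutation label such as K3E).
records: List[Tuple[str, str]] = [
    ("CountryA", "K3E"),
    ("CountryA", "K3E"),
    ("CountryB", "K3E"),
    ("CountryB", "A2V"),
]


def mutation_frequency(rows: List[Tuple[str, str]]) -> Counter:
    """How often each mutation occurs across the dataset."""
    return Counter(mutation for _, mutation in rows)


def location_frequency(rows: List[Tuple[str, str]]) -> Counter:
    """How often each sequence position is mutated, whatever the change."""
    positions = (int(re.search(r"\d+", mutation).group()) for _, mutation in rows)
    return Counter(positions)


def country_frequency(rows: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Mutation counts broken down per country."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for country, mutation in rows:
        table[country][mutation] += 1
    return dict(table)


by_mutation = mutation_frequency(records)
by_location = location_frequency(records)
by_country = country_frequency(records)
```

The same tallies apply to nucleotide and amino acid mutations alike; only the mutation labels passed in change.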

Contact Information

Should you have any questions, please do not hesitate to contact me via
