News from Hyrax Biosciences

SARS-CoV-2 sequence analysis software in simple terms.


Dr Natasha Wood 23/03/2022

SARS-CoV-2 sequence analysis software comprises next-generation sequencing (NGS) tools for analysing and interpreting the genetic material of the coronavirus that caused the COVID-19 pandemic (the virus's genome is RNA, which is typically converted to DNA before sequencing). The analysis workflow has multiple steps, and at each step a researcher or laboratory scientist can choose from an array of SARS-CoV-2 sequence analysis software to obtain the results they need.

During the analysis workflow, the sequenced DNA, in the form of gigabytes of text data files, is processed by SARS-CoV-2 sequence analysis software to produce a single consensus sequence: one line of text, a series of letters representing the four DNA bases A, C, T and G (and N's where the base is ambiguous or the sequencing reaction did not work well in that region), adding up to the total length of the SARS-CoV-2 genome, 29 903 characters.
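
To make the idea of a consensus sequence concrete, here is a minimal Python sketch of a majority vote across reads. It is only an illustration, not how Exatype or any particular tool works: it assumes the reads have already been aligned to the same coordinates and padded with `-` where they give no coverage, and the function name `call_consensus` and the `min_depth` threshold are hypothetical. Real pipelines also handle alignment, base quality, primers and insertions or deletions.

```python
from collections import Counter


def call_consensus(aligned_reads, min_depth=2):
    """Majority-vote consensus over reads that are already aligned.

    Each read is a string of equal length; '-' marks positions the read
    does not cover. Positions covered by fewer than `min_depth` reads, or
    where the two most common bases are tied, are reported as 'N'.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    # zip(*reads) walks the alignment one column (genome position) at a time.
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too little coverage to call a base
            continue
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")  # ambiguous: two bases equally common
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Toy data: four short made-up reads, not real SARS-CoV-2 sequence.
reads = [
    "ACGTAC---",
    "ACGTTCGA-",
    "-CGTACGAT",
    "--GAACG--",
]
print(call_consensus(reads))  # prints ACGTACGAN
```

The final position is covered by only one read, so it is reported as N, which mirrors how poorly sequenced regions show up in a real consensus sequence.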

This consensus sequence, representing the dominant virus in the sample, is then compared to the SARS-CoV-2 reference sequence. In the case of the COVID-19 pandemic, the reference is the first sample sequenced in Wuhan, the Wuhan-Hu-1 isolate. It is interesting to think back to the start of the pandemic, when no reference existed. Since the viral genome was unknown at the time, the sequence analysis software used during the initial genotyping research compared the first Wuhan-Hu-1 consensus sequence to many known coronaviruses and other respiratory viruses. This is how evolutionary biologists noticed that the new Wuhan-Hu-1 viral sequence clustered with the Coronaviridae, or coronavirus, family.

By comparing the sampled consensus sequence (and the many reads used to form the consensus) with the Wuhan-Hu-1 reference, SARS-CoV-2 sequence analysis software is able to identify the differences, or mismatched bases, between the sample and the reference. These differences are what the software uses to define and interpret new variants. With a genome size of 29 903 bases, there is an enormous number of possible combinations of mismatches that can form new variants. SARS-CoV-2 sequence analysis software therefore needs to be well designed for analysing large genomes. In comparison, the HIV reference sequence is only 9 719 bases, less than a third the length of the SARS-CoV-2 genome. For each virus or bacterium, sequence analysis software needs to be optimised to handle the genome features that may be specific to it.
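
The comparison step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method used by any specific product: it assumes the consensus and the reference are already aligned and of equal length (so insertions and deletions are ignored), and the function name `find_mismatches` is hypothetical. Positions reported as N are skipped, because an unknown base is not evidence of a difference.

```python
def find_mismatches(reference, consensus):
    """List substitutions between an aligned reference and consensus.

    Returns (position, reference_base, sample_base) tuples, with positions
    counted from 1. Positions where the consensus is 'N' are skipped.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")

    mismatches = []
    for position, (ref_base, sample_base) in enumerate(
        zip(reference, consensus), start=1
    ):
        if sample_base == "N":
            continue  # no reliable base call at this position
        if ref_base != sample_base:
            mismatches.append((position, ref_base, sample_base))
    return mismatches


# Toy 10-base strings for illustration only, not real genome data.
reference = "ATTAAAGGTT"
consensus = "ATTGAANGTC"

for position, ref_base, sample_base in find_mismatches(reference, consensus):
    # Common shorthand: reference base, position, sample base.
    print(f"{ref_base}{position}{sample_base}")
# prints A4G, then T10C
```

On a full 29 903-base genome the same loop applies unchanged. The harder work for real software lies in producing a trustworthy alignment and in interpreting which combinations of mismatches define a variant.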

The Exatype SARS-CoV-2 sequence analysis software has been developed to analyse data generated by any sequencing instrument, supports more than ten different assays (including ARTIC, Qiagen QIASeq, Nimagen EasySeq) and can analyse hundreds of sample data files simultaneously. If you are looking for streamlined SARS-CoV-2 sequence analysis software to simplify your analysis workflow, please get in touch or try Exatype SARS-CoV-2 yourself.

Wuhan-Hu-1 isolate: https://www.ncbi.nlm.nih.gov/nuccore/1798174254