An Alignment-Independent Approach for the Study of Viral Sequence Diversity at Any Given Rank of Taxonomy Lineage.

Chong, Li Chuin; Lim, Wei Lun; Ban, Kenneth Hon Kim; Khan, Asif M

Files

biology-10-00853.pdf (2.65 MB)

Authors

Chong, Li Chuin

Lim, Wei Lun

Ban, Kenneth Hon Kim

Khan, Asif M

Date

2021-08-30T21:00:00Z

Abstract

The study of viral diversity is imperative in understanding sequence change and its implications for intervention strategies. The widely used alignment-dependent approaches to study viral diversity are limited in their utility as sequence dissimilarity increases, particularly when expanded to the genus or higher ranks of viral species lineage. Herein, we present an alignment-independent algorithm, implemented as a tool, UNIQmin, to determine the effective viral sequence diversity at any rank of the viral taxonomy lineage. This is done by performing an exhaustive search to generate the minimal set of sequences for a given viral non-redundant sequence dataset. The minimal set comprises the smallest possible number of unique sequences required to capture the diversity inherent in the complete set of overlapping k-mers encoded by all the unique sequences in the given dataset. Such dataset compression is possible through the removal of unique sequences whose entire repertoire of overlapping k-mers can be represented by other sequences, thus rendering them redundant to the collective pool of sequence diversity. A significant reduction, namely ~44%, ~45%, and ~53%, was observed for all reported unique sequences of species Dengue virus, genus Flavivirus, and family Flaviviridae, respectively, while still capturing the entire repertoire of nonamer (9-mer) viral peptidome diversity present in the initial input dataset. The algorithm is scalable for big data, as it was applied to ~2.2 million non-redundant sequences of all reported viruses. UNIQmin is open source and publicly available on GitHub. The concept of a minimal set is generic and, thus, potentially applicable to other pathogenic microorganisms of non-viral origin, such as bacteria.
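The idea of a minimal set can be illustrated with a short Python sketch. This is not the UNIQmin implementation: it swaps the exhaustive search described in the abstract for a simple greedy set-cover heuristic, so the result covers every overlapping k-mer but is not guaranteed to be the smallest possible set. The function names `kmers` and `greedy_minimal_set` and the toy sequences are hypothetical and chosen only for illustration.

```python
def kmers(seq, k=9):
    """Return the set of overlapping k-mers in seq (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_minimal_set(sequences, k=9):
    """Greedy approximation of a minimal set (an assumption, not UNIQmin itself).

    Repeatedly picks the sequence that covers the most still-uncovered
    k-mers until every k-mer in the dataset is covered.
    """
    # Drop exact duplicates while preserving input order.
    unique = list(dict.fromkeys(sequences))
    kmer_sets = {s: kmers(s, k) for s in unique}

    # The complete k-mer repertoire that the chosen sequences must capture.
    uncovered = set()
    for ks in kmer_sets.values():
        uncovered |= ks

    chosen = []
    while uncovered:
        # Sequence contributing the largest number of uncovered k-mers.
        best = max(unique, key=lambda s: len(kmer_sets[s] & uncovered))
        chosen.append(best)
        uncovered -= kmer_sets[best]
    return chosen


# Toy, made-up sequences: the 2nd and 3rd are fully covered by the 1st.
toy = ["ABCDEFGHIJKL", "ABCDEFGHIJ", "CDEFGHIJKL", "MNOPQRSTUV"]
minimal = greedy_minimal_set(toy, k=9)
```

In this toy input, the second and third sequences contribute no 9-mer that the first does not already contain, so they are redundant to the pool of 9-mer diversity and are left out of the result.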

Keywords

UNIQmin, alignment independent, alignment-free, minimal set, proteome, sequence diversity, virus

URI

https://hdl.handle.net/20.500.12645/37950

Collections

Pubmed Ek Yayınlar
