A novel pairwise sequence alignment algorithm for similarity search in massive datasets

Research output: Contribution to journal › Article › peer-review

Abstract

Advances in sequencing technologies have produced a huge volume of data. Because pairwise sequence alignment plays an essential role in comparing sequencing data, various alignment algorithms have been developed. Among them, the basic local alignment search tool (BLAST) is currently employed in a wide range of biological applications, largely because of its low time and memory complexity. However, BLAST and other improved sequence alignment algorithms may fail to produce accurate results; therefore, more efficient algorithms can be highly advantageous. In the present study, we introduce a novel algorithm for sequence alignment (NASA) consisting of a preprocessing step and an aligning step. In the preprocessing step, the positions of residues within a given nucleotide or peptide sequence are determined, so that only informative regions need to be searched. In the aligning step, the similarity score between two sequences is calculated based on a constant number of comparisons, in linear time and memory. To evaluate NASA, a large volume of sequencing data was analyzed and the outcomes were compared with those of other algorithms. The results showed that NASA outperforms other basic algorithms in terms of elapsed time, required memory, system resource utilization, and alignment score precision. Collectively, NASA might be a promising method for retrieving similar sequences from large datasets.
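The abstract describes the two steps only at a high level. As a rough illustration of how a position index built during preprocessing can bound the number of comparisons made per residue, the Python sketch below pairs a k-mer position index with capped diagonal voting and ungapped scoring. This is not NASA and does not reproduce the authors' method: the k-mer index, the diagonal-voting heuristic, the +1/-1 scoring scheme, and the names `build_position_index`, `similarity_score`, `k`, and `max_hits` are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def build_position_index(seq: str, k: int = 3) -> dict[str, list[int]]:
    """Hypothetical preprocessing: map every k-mer of `seq` to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def similarity_score(query: str, subject: str, index: dict[str, list[int]],
                     k: int = 3, max_hits: int = 4,
                     match: int = 1, mismatch: int = -1) -> int:
    """Hypothetical aligning step: pick one diagonal by voting, then score it ungapped.

    `index` must come from build_position_index(subject, k).
    """
    votes = Counter()
    for i in range(len(query) - k + 1):
        # At most `max_hits` index lookups per query position, so this loop
        # performs O(len(query) * max_hits) work for a fixed cap.
        for j in index.get(query[i:i + k], [])[:max_hits]:
            votes[j - i] += 1  # diagonal offset = subject position - query position
    if not votes:
        return 0  # no shared k-mer: treat the pair as unrelated
    offset, _ = votes.most_common(1)[0]
    # Ungapped scoring along the winning diagonal: one comparison per residue.
    score = 0
    for i in range(len(query)):
        j = i + offset
        if 0 <= j < len(subject):
            score += match if query[i] == subject[j] else mismatch
    return score


if __name__ == "__main__":
    subject = "TTACGTACGTTT"
    index = build_position_index(subject)
    print(similarity_score("ACGTACGT", subject, index))  # → 8
```

For a fixed `k` and `max_hits`, building the index and scoring both grow linearly with sequence length, which loosely mirrors the linear time and memory behavior the abstract reports. Capping lookups trades sensitivity in highly repetitive sequences for a predictable per-residue cost.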

Original language: English
Article number: bbaf512
Journal: Briefings in Bioinformatics
Volume: 26
Issue number: 5
State: Published - Sep 1 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025. Published by Oxford University Press.

ASJC Scopus Subject Areas

  • Information Systems
  • Molecular Biology

Keywords

  • alignment algorithms
  • heuristic methods
  • massive datasets
  • pairwise sequence alignment
  • time and memory complexities
