RMIT University

An inexact-suffix-tree-based algorithm for detecting extensible patterns

journal contribution
posted on 2024-11-01, 01:16 authored by Abhijit Chattaraj, Laxmi Parida
Given an input sequence of data, a rigid pattern is a repeating sequence, possibly interspersed with don't-care characters. The data could be a sequence of characters, sets of characters, or even real values. In practice, the patterns or motifs of interest are the ones that also allow a variable number of gaps (or don't-care characters): these are patterns with spacers, termed extensible patterns. In a bioinformatics context, similar patterns have also been called flexible patterns or motifs. The extensibility is succinctly defined by a single integer parameter D >= 1, which is interpreted as the allowable space, between 1 and D characters, between two successive solid characters in a reported motif. We introduce a data structure called the inexact-suffix tree and present an algorithm based on this data structure. The algorithm has been tested primarily on biological data such as DNA and protein sequences. However, the generality of the system makes it equally applicable in other data mining, clustering, and knowledge extraction applications.
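To make the role of the parameter D concrete, the following is a minimal sketch of what it means for an extensible pattern to occur in a sequence. It is not the inexact-suffix-tree algorithm described in the abstract; it is a plain regular-expression matcher used only to illustrate the definition. The pattern notation is an assumption made for this example: "." stands for exactly one don't-care character, "-" stands for an extensible gap of between 1 and D don't-care characters, and any other character is a solid character. The function names are hypothetical.

```python
import re


def extensible_pattern_to_regex(pattern, D):
    """Translate an extensible pattern into a compiled regular expression.

    Assumed notation (illustrative only, not taken from the abstract):
      '.'  one don't-care character
      '-'  an extensible gap of 1..D don't-care characters
      anything else is a solid character that must match exactly
    """
    if D < 1:
        raise ValueError("D must be >= 1")
    parts = []
    for token in pattern:
        if token == ".":
            parts.append(".")
        elif token == "-":
            parts.append(".{1,%d}" % D)
        else:
            parts.append(re.escape(token))
    # A zero-width lookahead lets finditer report overlapping occurrences.
    return re.compile("(?=" + "".join(parts) + ")", re.DOTALL)


def find_occurrences(pattern, text, D):
    """Return every start index at which the pattern occurs in text."""
    regex = extensible_pattern_to_regex(pattern, D)
    return [m.start() for m in regex.finditer(text)]


if __name__ == "__main__":
    # With D = 2, "A-T" allows one or two characters between A and T.
    print(find_occurrences("A-T", "ACGTAGGT", 2))
    # With D = 1, exactly one character must separate A and T.
    print(find_occurrences("A-T", "ACGTAGGT", 1))
```

In this sketch, a larger D admits more occurrences of the same pattern, which is why a single integer is enough to control how flexible the reported motifs may be. A brute-force scan like this only checks one given pattern; discovering all repeating extensible patterns is the harder problem that the paper's data structure addresses.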

History

Journal

Theoretical Computer Science

Volume

335

Start page

3

End page

14

Total pages

12

Publisher

Elsevier Science

Place published

Amsterdam

Language

English

Copyright

Copyright © 2005 Elsevier B.V.

Former Identifier

2005000255

Esploro creation date

2020-06-22

Fedora creation date

2009-02-27
