Lila Kari , Department of Computer Science, University of Waterloo
Additive Methods to Overcome the Limitations of Genomic Signatures
Approaches to understanding and utilizing the mathematical properties of bioinformation take many forms, from using DNA to build lattices and polyhedra, to studies of algorithmic DNA self- assembly, and harnessing DNA strand displacement for biocomputations. This talk presents a view of the structural properties of DNA sequences from yet another angle, by proposing the use of Chaos Game Representation (CGR) of DNA sequences to measure and visualize their interrelationships.
The method we propose computes an “information distance” for each pair of Chaos Game Representations of DNA sequences, and then uses multidimensional scaling to visualize the resulting interrelationships in a three-dimensional space. The result of applying it to a collection of DNA sequences is an easily interpretable 3D Molecular Distance Map wherein sequences are represented by points in a common Euclidean space, and the spatial distance between any two points reflects the differences in their subsequence composition. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths.
Our most recent results indicate that while mitochondrial DNA signatures can reliably be used for species classification, conventional CGR signatures of nuclear genomic sequences do not always succeed to separate them by their organism of origin, especially for closely related species. To address this problem, we propose the general concept of additive DNA signature of a set of DNA sequences, which combines information from all the sequences in the set to produce its signature, and show that it successfully overcomes the limitations of conventional genomic signatures.