Asymptotic distributions for sequence alignment scores

IP.com Disclosure Number: IPCOM000200008D
Publication Date: 2010-Sep-23
Document File: 3 page(s) / 27K

Publishing Venue

The IP.com Prior Art Database

Abstract

Estimating the probability that a score-based alignment (as in BLAST) arises by chance requires two parameters of the Karlin-Altschul (K-A) distribution. In the presence of gap penalties, estimating these parameters has required extensive Monte Carlo computations. Disclosed is a very simple formula that provides these parameters, thus enabling interactive experimentation with scoring schemes, which is currently impossible in published art.

This is the abbreviated version, containing approximately 52% of the total text.

Karlin, Altschul, and others derived a (partial) distribution function for the probability of a score obtained from an alignment scoring matrix and a random alignment (1-3). This distribution function is governed by two parameters: a scale factor and a prefactor. The computation of the prefactor is numerically intensive and depends on the scoring matrix (4).
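
As a point of reference, the way the two parameters enter the distribution can be sketched in a few lines of Python. This is the standard ungapped Karlin-Altschul form, not the closed-form result of this disclosure. The function names, the symbols lam (scale factor) and K (prefactor), and the numeric values in the usage note are assumptions chosen for illustration.

```python
import math

def ka_evalue(score, m, n, lam, K):
    """Expected number of chance alignments with score >= `score`.

    Standard Karlin-Altschul form E = K * m * n * exp(-lam * score),
    where m and n are the lengths of the two sequences being compared.
    """
    return K * m * n * math.exp(-lam * score)

def ka_pvalue(score, m, n, lam, K):
    """Probability of at least one chance alignment with score >= `score`.

    P = 1 - exp(-E); expm1 keeps precision when E is very small.
    """
    return -math.expm1(-ka_evalue(score, m, n, lam, K))
```

For example, ka_pvalue(50, 300, 1_000_000, 0.3, 0.1) evaluates the chance probability of a score of 50 for a 300-residue query against a database of one million residues. The values 0.3 and 0.1 are hypothetical placeholders, not parameters taken from the disclosure.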

The first problem addressed with this invention is obtaining a closed-form solution for the prefactor, allowing an interactive scoring matrix specification.

Scoring matrices may come from multiple sources, including Point Accepted Mutations (PAM) (5), BLOcks SUbstitution Matrix (BLOSUM; note that BLOSUM specifically excluded indels) (6), Viterbi (Markov model) (7), and micro-RNA (miRNA) binding energy (8). The disclosure shows that there is a very close relationship between BLOSUM scoring matrices, implying that the alignment probability distribution associated with scores from the other scoring models will behave as if those scores were BLOSUM scores. This implies that the application of scoring matrices in this context has very specific relationships between scores and expected alignment frequencies. The solution, therefore, provides ways in which to directly relate scoring matrices to expected alignment frequencies.
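
The link between scores and expected alignment frequencies can be illustrated with the standard log-odds relation for ungapped scoring, in which the scale factor lam is the positive root of sum_ij p_i p_j exp(lam * s_ij) = 1 and the implied target frequencies are q_ij = p_i p_j exp(lam * s_ij). The sketch below is a minimal illustration of that textbook relation, not the method of this disclosure. The function names and the toy +1/-1 nucleotide matrix in the usage note are assumptions.

```python
import math

def ungapped_lambda(scores, freqs, tol=1e-12):
    """Positive root of sum_ij p_i p_j exp(lam * s_ij) = 1, found by bisection.

    Requires a negative expected score and at least one positive score,
    which guarantees that exactly one positive root exists.
    """
    n = len(freqs)
    pairs = [(freqs[i] * freqs[j], scores[i][j]) for i in range(n) for j in range(n)]
    if sum(p * s for p, s in pairs) >= 0:
        raise ValueError("expected score must be negative")
    if not any(s > 0 for _, s in pairs):
        raise ValueError("at least one score must be positive")

    def f(lam):
        return sum(p * math.exp(lam * s) for p, s in pairs) - 1.0

    # f(0) = 0 and f dips below zero just right of 0, then grows without bound.
    hi = 1.0
    while f(hi) <= 0:      # move the upper bracket past the root
        hi *= 2.0
    lo = hi
    while f(lo) > 0:       # move the lower bracket below the root
        lo /= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def target_frequencies(scores, freqs):
    """Aligned-pair frequencies q_ij = p_i p_j exp(lam * s_ij) implied by the scores."""
    lam = ungapped_lambda(scores, freqs)
    n = len(freqs)
    return [[freqs[i] * freqs[j] * math.exp(lam * scores[i][j]) for j in range(n)]
            for i in range(n)]
```

With a toy nucleotide matrix (match +1, mismatch -1, uniform background frequencies of 0.25), the equation reduces to x/4 + 3/(4x) = 1 with x = exp(lam), so lam = ln 3, and the target frequencies sum to 1.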

Affine gap penalties (9) have been employed for a long time in controlling false insertions and deletions. A second problem addressed here is that their relationship to scoring and BLOSUM matrices was not understood clearly.
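
For readers unfamiliar with the term, an affine gap penalty charges a fixed cost for opening a gap plus a cost proportional to its length. The sketch below assumes the convention in which a gap of length k costs gap_open + gap_extend * k. Other conventions charge the extension only for residues after the first, so the function name and this convention should be read as assumptions, and the sketch does not represent how the disclosure relates gap penalties to BLOSUM matrices.

```python
def affine_gap_penalty(length, gap_open, gap_extend):
    """Cost of a single gap of `length` residues under an affine model.

    Assumed convention: opening cost plus extension cost for every gapped
    residue, i.e. gap_open + gap_extend * length. A zero-length gap is free.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return gap_open + gap_extend * length
```

With illustrative costs of 11 for opening and 1 for extension, one gap of length 3 costs 14, while three separate gaps of length 1 cost 36, which is how the affine form discourages scattering many short indels.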

The invention provides a simple, easily computed way to handle gap penalties, one that places gap penalties representing insertion and deletion mutations (indels) on the same footing as the substitution mutation rates from which BLOSUM matrices are computed.

In addition, the Karlin-Altschul (K-A) distribution does not account for length dependence or score density. This is particularly important in exploring whether alignments are denser than expected by BLOSUM (which provides a standard evolutionary separation), and it matters for alignments of fairly specific lengths (for example, miRNA scoring and alignments). The solution explicitly provides ways to consider score density in evaluating significance and probabilities.

The invention provides the following novel contributions...