Bay Area Discrete Math Day, Spring 2005

Inference functions and sequence alignment

Sergi Elizalde, MSRI

Statistical models are used to solve certain problems in computational biology, such as determining what parts of the genome will be translated into proteins, or how a DNA sequence evolved into another via a series of mutations, insertions and deletions. Each answer has a certain probability depending on the model parameters. When these are given, the most probable answer, called the explanation, is obtained by solving a combinatorial optimization problem. The map sending each observation to its explanation is called an inference function.

Using some theory about lattice polytopes, I will prove that the number of inference functions of any graphical model is polynomial in the size of the model. Then I will give applications to optimal sequence alignment, and discuss some open combinatorial problems that arise.