Now that the human genome is (mostly) sequenced, how do we know when some statistical fact about that random-looking string of 3 billion A's, C's, G's and T's is significant? For example, there are strings of length 11 that appear nowhere in the sequence; does this mean anything?
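To see why a missing 11-mer might be surprising, here is a rough back-of-the-envelope check. It is not the speaker's method. It assumes a naive null model in which the genome is a uniform, independent random string over A, C, G, T, and it uses a Poisson approximation for the count of each k-mer. The function names are hypothetical. Under this model there are 4^11 = 4,194,304 possible 11-mers, each expected about 715 times in 3 billion positions, so essentially none should be absent. Real DNA is far from uniform, which is exactly why a more careful method is needed.

```python
import math
from itertools import product


def expected_missing_kmers_log10(n: int, k: int) -> float:
    """log10 of the expected number of k-mers absent from a uniform
    random ACGT string of length n (Poisson approximation; assumed model)."""
    num_kmers = 4 ** k
    positions = n - k + 1
    # Expected occurrences of any one specific k-mer under the uniform model.
    lam = positions / num_kmers
    # P(a given k-mer is absent) ~= exp(-lam); work in log10 to avoid underflow.
    return math.log10(num_kmers) - lam / math.log(10)


def missing_kmers(seq: str, k: int) -> list[str]:
    """Brute-force list of all ACGT k-mers that never occur in seq."""
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return ["".join(p) for p in product("ACGT", repeat=k)
            if "".join(p) not in seen]


if __name__ == "__main__":
    # Roughly -304: about 1e-304 missing 11-mers expected under the naive model.
    print(expected_missing_kmers_log10(3_000_000_000, 11))
    print(len(missing_kmers("ACGT", 2)))  # 16 possible 2-mers, 3 seen
```

The brute-force enumeration is only practical for small k or short sequences; it is there to make the definition of a "missing string" concrete.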
The speaker will describe an efficient combinatorial approach to problems of this sort, implemented with a group of scientists at Rockefeller University (Andy DeWan, Chad Hayes, Josephine Hoh, Jurg Ott, Tony Parrado, and Richard Sackler).