Title: Generalized Methods for Discovering Frequent Poly-Regions in DNA
Author: Panagiotis Papapetrou, Gary Benson, George Kollios
Date: October 17, 2008
Abstract: The problem of discovering frequent poly-regions (i.e., regions of high occurrence of a set of items or patterns of a given alphabet) in a sequence is studied, and three efficient approaches are proposed to solve it. The first one is entropy-based and applies a recursive segmentation technique that produces a set of candidate segments which may potentially lead to a poly-region. The key idea of the second approach is the use of a set of sliding windows over the sequence. Each sliding window covers a sequence segment and keeps a set of statistics that mainly include the number of occurrences of each item or pattern in that segment. Combining these statistics efficiently yields the complete set of poly-regions in the given sequence. The third approach applies a technique based on the majority vote, achieving linear running time with a minimal number of false negatives. After identifying the poly-regions, the sequence is converted to a sequence of labeled intervals (each one corresponding to a poly-region). An efficient algorithm for mining frequent arrangements of intervals is applied to the converted sequence to discover frequently occurring arrangements of poly-regions in different parts of DNA, including coding regions. The proposed algorithms are tested on various DNA sequences, producing results of significant biological meaning.
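
The sliding-window idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `poly_regions`, the single fixed window length, the density threshold `min_density`, and the rule for merging overlapping windows are all assumptions made for illustration. It keeps one running count per window position and reports the merged spans in which a single item is dense enough.

```python
def poly_regions(seq, item, window, min_density):
    """Return merged (start, end) spans, end exclusive, covered by at least
    one length-`window` window in which `item` makes up at least
    `min_density` of the positions. Hypothetical helper, for illustration."""
    if window <= 0 or window > len(seq):
        return []
    need = min_density * window
    # Count occurrences of `item` in the first window.
    count = sum(1 for c in seq[:window] if c == item)
    regions = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one: drop the symbol that left, add the one that entered.
            count -= seq[start - 1] == item
            count += seq[start + window - 1] == item
        if count >= need:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous span: extend it.
                regions[-1][1] = end
            else:
                regions.append([start, end])
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    print(poly_regions("CGCGAAAAAAATCGCG", "A", window=4, min_density=0.75))
```

Because the count is updated incrementally, each position is touched a constant number of times, so the scan is linear in the sequence length for one item and one window size. Handling sets of items, patterns, or several window sizes, as the abstract describes, would need additional statistics per window.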