Individual Gene Cluster Statistics in Noisy Maps
Identification of homologous chromosomal regions is important for understanding the processes that shape genome evolution, such as genome rearrangements and large-scale duplication events. If these chromosomal regions have diverged significantly, statistical tests are needed to determine whether observed similarities in gene content are due to shared history or to chance. Currently available methods are typically designed for genomic data and are appropriate for whole-genome analyses. Statistical methods are also needed for estimating significance when only a single pair of regions is under consideration. We present a new statistical method, based on generating functions, for estimating the significance of orthologous gene clusters under the null hypothesis of random gene order. Our statistic is suitable for noisy comparative maps, in which a one-to-one homology mapping cannot be established. It is also designed for testing the significance of an individual gene cluster in isolation, in situations where whole-genome data are not available. We implement our statistic in Mathematica and demonstrate its utility by applying it to the MHC homologous regions in human and fly.
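
To make the null hypothesis of random gene order concrete, the following minimal sketch computes a much simpler quantity than the generating-function statistic presented here: the hypergeometric probability that a fixed window of r genes contains at least k of m genes of interest, when those m genes are placed uniformly at random among n positions. It assumes a one-to-one homology mapping and a single fixed window, so it does not handle the noisy maps this work targets. The function name and the parameters n, m, r and k are illustrative assumptions, not notation from the paper.

```python
from math import comb


def window_tail_probability(n: int, m: int, r: int, k: int) -> float:
    """Illustrative only: P(at least k of m marked genes lie in a fixed
    window of r genes) when the m marked genes occupy uniformly random
    positions among n genes (random gene order)."""
    if not (0 <= m <= n and 0 <= r <= n):
        raise ValueError("require 0 <= m <= n and 0 <= r <= n")
    lowest = max(k, 0)
    highest = min(m, r)  # cannot observe more marked genes than m or r
    total = comb(n, r)   # number of equally likely window contents
    # Hypergeometric tail: choose i marked genes and r - i unmarked genes.
    favourable = sum(
        comb(m, i) * comb(n - m, r - i) for i in range(lowest, highest + 1)
    )
    return favourable / total
```

For example, `window_tail_probability(10, 3, 4, 3)` gives the chance that all three genes of interest fall inside one fixed window of four genes out of ten. A small value suggests that such a cluster would rarely arise from random gene order alone.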