Motivation: The expression of genes during the cell division process
has now been studied in many different species. An important goal of
these studies is to identify the set of cycling genes. To date, this has
been done independently for each species studied. Because of noise and
other data analysis issues, accurately deriving a set of cycling genes
from expression data is difficult. This is especially true for some
of the multicellular organisms, including humans.
Results: Here we present the first algorithm that combines microarray
expression data from multiple species for identifying cycling
genes. Our algorithm represents genes from multiple species as
nodes in a graph. Edges between genes represent sequence similarity.
Starting with the measured expression values for each species, we
use Belief Propagation to determine a posterior score for genes. This
posterior is used to determine a new set of cycling genes for each species.
We applied our algorithm to improve the identification of the set of
cell cycle genes in budding yeast and humans. As we show, by
incorporating sequence similarity information we were able to obtain
a more accurate set of genes compared to methods that rely on
expression data alone. Our method was especially successful for the
human dataset, indicating that it can use a high-quality dataset from
one species to overcome noise problems in another.
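
The graph-based scoring described above can be illustrated with a minimal sketch of loopy belief propagation on a binary (cycling / not cycling) pairwise model. This is not the authors' implementation. The function name `belief_propagation`, the gene identifiers, the prior values and the mapping from sequence similarity to an agreement strength `s` in (0.5, 1) are all assumptions made for illustration, and priors are assumed to lie strictly between 0 and 1.

```python
def belief_propagation(prior, edges, n_iter=100, tol=1e-9):
    """Sum-product belief propagation on a binary pairwise graph (illustrative).

    prior: dict gene -> P(cycling) estimated from expression data alone.
    edges: dict (gene_a, gene_b) -> agreement strength s in (0.5, 1); a
           hypothetical stand-in for a score derived from sequence similarity.
    Returns dict gene -> posterior P(cycling).
    """
    neighbors = {g: [] for g in prior}
    agree = {}
    for (a, b), s in edges.items():
        neighbors[a].append(b)
        neighbors[b].append(a)
        agree[(a, b)] = agree[(b, a)] = s

    # Node potentials: (not cycling, cycling).
    phi = {g: (1.0 - p, p) for g, p in prior.items()}
    # One message per directed edge, initialised to uniform.
    msg = {(a, b): (0.5, 0.5) for a in neighbors for b in neighbors[a]}

    def product_into(g, skip=None):
        # Node potential times all incoming messages, except the one from `skip`.
        b0, b1 = phi[g]
        for c in neighbors[g]:
            if c != skip:
                m0, m1 = msg[(c, g)]
                b0, b1 = b0 * m0, b1 * m1
        return b0, b1

    for _ in range(n_iter):
        new_msg = {}
        for (a, b) in msg:
            a0, a1 = product_into(a, skip=b)
            s = agree[(a, b)]
            # Marginalise over the sender's state under the edge potential
            # [[s, 1 - s], [1 - s, s]], which rewards matching labels.
            m0 = s * a0 + (1.0 - s) * a1
            m1 = (1.0 - s) * a0 + s * a1
            z = m0 + m1
            new_msg[(a, b)] = (m0 / z, m1 / z)
        delta = max(
            (abs(new_msg[k][i] - msg[k][i]) for k in msg for i in (0, 1)),
            default=0.0,
        )
        msg = new_msg  # synchronous update of all messages
        if delta < tol:
            break

    posterior = {}
    for g in prior:
        b0, b1 = product_into(g)
        posterior[g] = b1 / (b0 + b1)
    return posterior


# Hypothetical toy input: a confident yeast gene linked to a noisy human
# homolog, plus a human gene with no cross-species edge.
toy_prior = {"yeast_gene": 0.9, "human_gene": 0.4, "human_orphan": 0.4}
toy_edges = {("yeast_gene", "human_gene"): 0.8}
toy_posterior = belief_propagation(toy_prior, toy_edges)
```

In this toy input the human gene linked to a confident yeast homolog receives a higher posterior than its expression-only prior, while the unlinked gene keeps its prior. On graphs with cycles the procedure is approximate and convergence is not guaranteed, so the iteration cap `n_iter` matters.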
Availability: C implementation is available from the supporting