We have developed
an information-theory-based
method for modeling interactions of
DNA-binding proteins with their respective binding sites on DNA.
We show the
feasibility of using such a method for
scanning DNA sequences
to predict sites bound by the
Factor for Inversion Stimulation (Fis),
a pleiotropic protein that enhances
site-specific recombination,
controls DNA replication, and
regulates
transcription of a number of genes
in Escherichia coli and Salmonella typhimurium.
When scanning various DNA sequences with a
weight matrix derived from the information
analysis of 60 known Fis binding sites,
we identified Fis sites
that correlated well with published DNase I protection
experiments and other biochemical data.
Sites we predicted in many different genetic systems
had been missed by others, because the
DNA sequence there did not match the Fis consensus sequence and,
most likely, because the protected
regions overlapped with other Fis sites.
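
The scanning step described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes the commonly used individual-information weight of 2 + log2 f(b,l) bits for base b at position l, replaces the small-sample correction with a simple pseudocount (an assumption made here for brevity), and uses four made-up aligned sites rather than the 60 known Fis sites. The names `weight_matrix`, `site_information`, and `scan` are hypothetical.

```python
import math

BASES = "ACGT"

def weight_matrix(sites, pseudocount=1.0):
    """Build a per-position weight matrix in bits from aligned sites.

    Each weight is 2 + log2(frequency); the pseudocount is an assumption
    that keeps unseen bases from producing log2(0).
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        weights = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            weights[base] = 2.0 + math.log2(freq)
        matrix.append(weights)
    return matrix

def site_information(matrix, window):
    """Sum the weights of the bases actually present in the window."""
    return sum(column[base] for column, base in zip(matrix, window))

def scan(matrix, sequence, threshold=0.0):
    """Return (start, bits) for every window scoring above the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        bits = site_information(matrix, sequence[start:start + width])
        if bits > threshold:
            hits.append((start, bits))
    return hits

# Hypothetical aligned sites, for illustration only (not real Fis sites).
TOY_SITES = ["GATCA", "GATCA", "GTTCA", "GATCT"]
toy_matrix = weight_matrix(TOY_SITES)
for start, bits in scan(toy_matrix, "TTGATCATT"):
    print(f"candidate site at offset {start}: {bits:.2f} bits")
```

Because each window is scored by its total information rather than by agreement with a consensus string, a site that departs from the consensus at a few positions can still score above the threshold.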
We also created a graphical method
to show how binding proteins and other macromolecules
interact with individual bases of nucleotide sequences.
By displaying the
information at individual binding sites as letter graphics,
these ``sequence walkers'' can be stepped along raw sequence data
to visually search for binding sites.
Characters representing the sequence are either
oriented normally and placed above a line,
indicating favorable contact,
or displayed upside down and placed below the line,
indicating unfavorable contact.
The positive or negative height of each letter shows the contribution
of that base to the sequence conservation of the binding site.
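
As a rough text-only sketch of the walker idea, the snippet below reports, for each base in a window, its signed contribution in bits and whether the letter would be drawn upright above the line or inverted below it. The matrix values are invented for illustration, the function name `walker` is hypothetical, and treating a contribution of exactly zero as upright is an arbitrary choice made here.

```python
def walker(matrix, window):
    """Return (base, bits, orientation) for each position of the window."""
    letters = []
    for column, base in zip(matrix, window):
        bits = column[base]
        # Favorable contacts go above the line, unfavorable ones below it.
        orientation = "upright" if bits >= 0 else "inverted"
        letters.append((base, bits, orientation))
    return letters

# Hypothetical three-position weight matrix in bits (not derived from Fis data).
TOY_MATRIX = [
    {"A": -1.0, "C": -1.0, "G": 1.3, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": 0.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.3},
]

for base, bits, orientation in walker(TOY_MATRIX, "GCT"):
    print(f"{base} {bits:+.1f} bits {orientation}")
```

Stepping the walker along a sequence amounts to calling the function on each successive window and redrawing the letters.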
Using walkers,
we were able to quickly visualize
overlapping Fis binding sites
spaced 7 or 11 base pairs apart
in several genetic systems,
including sites spaced 11 base pairs apart
at the E. coli origin of chromosome replication.
Gel shift experiments showed that pairs of Fis sites
have two distinct binding modes,
suggesting that Fis competes with itself for binding
and therefore acts as a molecular flip-flop mechanism.
The positioning of Fis binding sites relative to one another
and to the binding sites of other proteins
appears to be key for the ability of
Fis to perform many diverse functions.
As a general sequence analysis tool, walkers
can be used
to investigate the effects of particular mutations.
With a walker, one can interactively alter the DNA sequence
to quantitatively engineer binding sites to
one's own specifications,
predict whether
a change is likely to be a polymorphism or a mutation,
and detect anomalies in sequence databases.
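
One way to make the effect of a point mutation quantitative, under the assumption that a site's information is the sum of independent per-position weights, is to take the difference between the weights of the new and the original base at the altered position. The sketch below uses an invented matrix and a hypothetical function name, `mutation_effect`; it illustrates the arithmetic only.

```python
def mutation_effect(matrix, site, position, new_base):
    """Return the change in bits when site[position] is replaced by new_base."""
    old_base = site[position]
    return matrix[position][new_base] - matrix[position][old_base]

# Hypothetical three-position weight matrix in bits (not derived from Fis data).
TOY_MATRIX = [
    {"A": -1.0, "C": -1.0, "G": 1.3, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": 0.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.3},
]

change = mutation_effect(TOY_MATRIX, "GAT", 1, "C")
print(f"A to C at position 1 changes the site by {change:+.1f} bits")
```

A large negative change points to a substitution that is likely to weaken binding, while a change near zero is more consistent with a neutral polymorphism.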