|
|
|
|
Research |

|
 |
In my research I use methods of bioinformatics and
computer modeling to address questions of genome biology related to
epigenetic gene regulation and genome packaging. How can a limited number of
proteins find their binding sites in a genome of billions of nucleotides
packaged in a tiny cell? How is gene regulation managed with the reliability of
a well-tuned machine across a myriad of epigenomes in multiple tissues,
conditions, and developmental stages?
With the development of new high-throughput techniques such as tiling
arrays and next-generation sequencing, these intriguing questions can now be
tackled.
The main focus of my research is
on the analysis of the primary structure of chromatin, which includes
nucleosome positioning, distribution of histone variants and modifications
in the genome, and regulatory protein binding.
|
|
|
Completed projects:
|
Impact of chromatin structure on sequence
variability in the human genome
(with Peter Park, Harvard Medical
School; Natalia Volfovsky, Robert M. Stephens, Advanced Biomedical
Computing Center, NCI at Frederick)
DNA sequence
variations in individual genomes give rise to different phenotypes within
the same species. One mechanism in this process is the alteration of
chromatin structure due to sequence variation that impacts gene regulation.
We compiled a high-confidence collection of human SNPs and indels based on
analysis of publicly available sequencing data and investigated whether the
DNA loci associated with stable nucleosome positions are protected against
mutations. We also addressed how sequence variation is reflected in the
occupancy profiles of nucleosomes bearing different epigenetic
modifications on a genome-wide scale. We find that indels are depleted around
nucleosome positions of all considered types, while SNPs are enriched
around the positions of bulk nucleosomes but depleted around the positions
of epigenetically modified nucleosomes. These findings indicate an
increased level of conservation for the sequences associated with epigenetically
modified nucleosomes, highlighting the complex organization of human
chromatin.
|
|
 |
Interplay of chromatin-mediated mutation
bias and selection can shape the sequence variation profile (cf. the schematic illustration in
Semple & Taylor, Science,
2009). (a) Bulk and epigenetically
modified nucleosomes are represented by blue and red ovals, respectively. Green and
orange lines represent the mutation rates of SNPs and indels, respectively, and
the black line represents the selection pressure acting on the DNA sequence. (b) The significant difference in
the indel rate inside and outside nucleosomes mainly determines the indel
density profile observed in the genome (orange), while the SNP density profile
(green) is mainly affected by selection. Our results do not exclude the
possibility that natural selection can affect the distribution of indels
and that alteration of the mutation rate affects the distribution of
SNPs. Rather, they indicate that
these mechanisms are not the major factors shaping the resulting profiles.
|
Comparative
analysis of H2A.Z nucleosome organization
in the human and yeast genomes
(with
Peter Park, Peter Kharchenko, Harvard Medical School; Robert Kingston, J.
Aaron Goldman, Massachusetts General Hospital)
Eukaryotic DNA is
wrapped around a histone protein core to constitute the fundamental
repeating units of chromatin, the nucleosomes. The affinity of the histone
core for DNA depends on the nucleotide sequence; however, it is unclear to
what extent DNA sequence determines nucleosome positioning in vivo, and if
the same rules of sequence-directed positioning apply to genomes of varying
complexity. Using the data generated by high-throughput DNA sequencing
combined with chromatin immunoprecipitation, we have identified positions
of nucleosomes containing the H2A.Z histone variant and histone H3
trimethylated at lysine 4 in human CD4+ T-cells. We find that the 10-bp
periodicity observed in nucleosomal sequences in yeast and other organisms
is not pronounced in human nucleosomal sequences. This result was confirmed
for a broader set of mononucleosomal fragments that were not selected for
any specific histone variant or modification. We also find that human H2A.Z
nucleosomes protect only about 120 bp of DNA from MNase digestion and
exhibit specific sequence preferences, suggesting a novel mechanism of
nucleosome organization for the H2A.Z variant. (Genome Res
2009.
19:
967-977)
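The periodicity analysis described above can be illustrated with a minimal sketch. It assumes a simple pipeline that is not taken from the paper: WW dinucleotides (W = A or T) are marked along each sequence, their autocorrelation is averaged over all sequences, and the period with the highest spectral power is reported. The function names and the default `max_lag` are illustrative choices.

```python
import numpy as np

def ww_indicator(seq):
    """Return 1.0 at every position where a WW dinucleotide (W = A or T) starts."""
    s = seq.upper()
    return np.array([1.0 if s[i] in "AT" and s[i + 1] in "AT" else 0.0
                     for i in range(len(s) - 1)])

def autocorrelation(x, max_lag):
    """Mean-centered autocorrelation of x for lags 1..max_lag."""
    x = x - x.mean()
    return np.array([np.mean(x[:len(x) - k] * x[k:])
                     for k in range(1, max_lag + 1)])

def dominant_period(seqs, max_lag=60):
    """Average the WW autocorrelation over sequences and locate the spectral peak.

    Returns the period (in bp) with maximal power and the full power spectrum.
    Sequences must be longer than max_lag + 1 nucleotides.
    """
    acf = np.mean([autocorrelation(ww_indicator(s), max_lag) for s in seqs], axis=0)
    acf = acf - acf.mean()                    # remove the zero-frequency offset
    power = np.abs(np.fft.rfft(acf)) ** 2     # periodogram of the autocorrelation
    freqs = np.fft.rfftfreq(len(acf), d=1.0)  # cycles per base pair
    k = 1 + int(np.argmax(power[1:]))         # skip the zero-frequency bin
    return 1.0 / freqs[k], power
```

A pronounced peak near 10 bp would correspond to the periodicity reported for yeast nucleosomal sequences. Assessing significance, as in the periodograms on this page, would additionally require a null model such as shuffled sequences, which this sketch does not include.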
|
 |
 |
|
Periodograms showing spectral density for the WW and SS
dinucleotide autocorrelation functions for human and yeast H2A.Z
nucleosomes (data from Barski et al., Cell 2007 and Albert et al., Nature
2007). The red lines represent the power spectral density for nucleosomal
sequences, and the solid and dashed blue lines represent the statistical
significance levels P = 0.001 and P = 0.05, respectively.
|
Fragment of nucleosome core particle structure
containing H2A.Z variant (PDB_ID 1f66 (Suto et al., Nat Struct Biol 2000),
H2A.Z is shown in magenta) with the superimposed major H2A histone (dark
blue) from the best resolved nucleosome structure (PDB_ID 1kx5 (Davey et
al, JMB, 2002)). Base pair at position -43 (yellow) marks the 30-bp
shortening of the protected DNA fragment.
|
|
Sequence-directed
nucleosome positioning in CpG islands
(with
Wilma Olson, Rutgers, the State University of New Jersey; Victor Zhurkin,
NCI)
Unlike
most of the genome, the CpG islands remain unmethylated and are associated
with the open, transcriptionally competent form of chromatin rather than the
closed, inactive form. Malfunctions of the gene regulatory machinery that
affect the state of chromatin in CpG islands may result in various cancers
and developmental disorders. We hypothesize that the sequence-dependent
structural properties of DNA in the CpG islands are crucial for maintaining
the open state of chromatin.
In our recent study on the role of
different degrees of freedom in
the formation of the superhelical nucleosomal trajectory
(Tolstorukov, Olson, Zhurkin et al., submitted), we developed a novel structural
approach to map potential nucleosome locations on genomic
sequences. Since there are very few (if any) direct sequence-specific
interactions between the histones and DNA bases, the affinity of the histone
core for DNA is determined primarily by the energy needed to wrap DNA on the
surface of the nucleosome. Hence, our algorithm is based on the calculation
of the deformation energy required for a duplex of a given sequence to follow
the nucleosomal DNA trajectory. The algorithm was successfully
tested on a set of sequences for which nucleosome positions had been mapped
to high resolution.
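The idea of threading a sequence along the nucleosomal trajectory can be sketched as follows. This is a deliberately simplified illustration, not the published algorithm: it assumes a single step parameter with a harmonic energy term, whereas the approach described above accounts for multiple degrees of freedom. The function name, the `template` array, and the toy parameter values are all hypothetical placeholders.

```python
from itertools import product
import numpy as np

def deformation_energy_profile(seq, template, rest, stiffness):
    """Harmonic cost of forcing each window of `seq` onto a template path.

    seq       : DNA string
    template  : target value of one step parameter at each dimeric step of the path
    rest      : dict mapping each dimer to its equilibrium value (assumed input)
    stiffness : dict mapping each dimer to its force constant (assumed input)
    Returns one energy per possible window start.
    """
    s = seq.upper()
    steps = [s[i:i + 2] for i in range(len(s) - 1)]
    theta0 = np.array([rest[d] for d in steps])
    force = np.array([stiffness[d] for d in steps])
    template = np.asarray(template, dtype=float)
    w = len(template)
    n_windows = len(steps) - w + 1
    energies = np.empty(n_windows)
    for start in range(n_windows):
        delta = template - theta0[start:start + w]
        energies[start] = 0.5 * np.sum(force[start:start + w] * delta ** 2)
    return energies

# Toy parameters for illustration only; these are NOT measured values.
ALL_STEPS = ["".join(p) for p in product("ACGT", repeat=2)]
toy_rest = {d: 0.0 for d in ALL_STEPS}
toy_stiffness = {d: 1.0 for d in ALL_STEPS}
toy_stiffness["TA"] = 0.5  # pretend the TA step is the easiest to deform
```

Window starts with lower-than-average energy would play the role of nucleosome-attracting sites, and those with higher-than-average energy the role of nucleosome-repelling sites.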
Analysis of human genome sequences showed
that the “concentration” of nucleosome-attracting sites (characterized by a
lower-than-average DNA deformation energy) is noticeably lower and the
“concentration” of nucleosome-repelling sites (characterized by a
higher-than-average DNA deformation energy) is noticeably higher in CpG
islands. The observed non-trivial distribution of nucleosome-positioning
sites provides new insight into the well-documented phenomenon of nucleosome
depletion in CpG islands (these results are currently in preparation for
publication).
|

|
|
Distributions
of nucleosome-attracting (A) and nucleosome-repelling (B)
sites near gene starts. Data points represent the average numbers
<NATT> and
<NREP> of such sites
occurring in a 0.5-kb (kilobase) running window as a function of the
distance, dTSS, between the window center and the
transcription start site (denoted by hooked green arrows). Results for two
groups of aligned genes are shown: red, 10,773 genes with CpG islands;
blue, 15,642 genes without CpG islands. |
|
|
Effect
of base-pair shear deformations on formation of
nucleosomal DNA path
(with
Victor Zhurkin, NCI; Wilma Olson, Rutgers, the State University of New
Jersey)
The bending of DNA in nucleosomes is
accompanied by lateral displacements of adjacent base pairs, the effect of
which on the overall DNA folding is generally neglected. We demonstrate,
however, that these displacements play a much more important structural role
than previously appreciated. Specifically, the Slide deformations imposed on DNA by
the histones at sites of local anisotropic bending appear to govern both the
superhelical trajectory of DNA and the positioning of nucleosomes.
Furthermore, the computed cost of deforming DNA on the nucleosome is sequence
specific: in optimally positioned sequences the most easily deformed
base-pair steps (CA:TG and TA) occur at sites of large positive Slide and negative
Roll (where the DNA bends into the minor groove). These conclusions rest upon
a treatment of DNA that goes beyond the conventional ‘elastic-rod’ model,
incorporating all essential degrees of freedom of ‘real’ duplexes in the
estimation of DNA deformation energies. Indeed, only when lateral Slide
displacements are considered are we able to account for the
sequence-specific folding of DNA found in nucleosome structures. The close
correspondence between the predicted and observed nucleosome locations demonstrates
the potential advantage of our 'structural' approach in the assessment of
nucleosome positioning.
|

|
|
Effect of
base-pair Slide on the superhelical path of DNA in the best resolved
nucleosome core particle structure (NCP147, Davey et al., 2002). A.
DNA model (red) superimposed at the initial base pair on the superhelical
path of the real nucleosomal DNA (NCP147, white). The model structure is
constructed from the structural parameters of NCP147 with Slide equated to zero
at each dimeric step. B. DNA model with negative and positive
Slide at dimeric steps separated by 5 bp (colored blue and red
respectively). The DNA helical axis is represented by
yellow sticks. Because of the ~180° net helical twisting between base pairs,
the sliding occurs in the same direction (red and blue arrows), i.e.,
the overall effect of sliding is cumulative. |
|
|
Non-random distribution of A-tracts facilitates bacterial
genome packaging
(with
Sankar Adhya, Victor Zhurkin, Konstantin Virnik, NCI)
The molecular mechanisms of bacterial
chromatin packaging are still unclear, as bacteria lack nucleosomes or other
apparent basic elements of DNA compaction. It is known that
correlations in the genomic DNA sequence may constitute a structural code
that facilitates DNA folding. We elaborated on this concept by analyzing the
distribution of A-tracts (the sequence motifs that introduce the most
pronounced local curvature of DNA). We have observed that their distribution
is highly non-random: (i) A-tracts are phased with the DNA pitch, i.e.,
positioned in such a way that the individual bends they introduce produce a
curvature build-up; (ii) the phased A-tracts are organized in clusters
about 100 bp long, as revealed by a specially designed algorithm based on
the Fourier formalism. Such clusters are present throughout the genome,
including the coding sequences. The clustering of A-tracts greatly increases
the local curvature of DNA and therefore appears critical for formation of
the DNA loops and coils. Moreover, the clusters of A-tracts may serve as
binding sites for nucleoid-associated proteins that have propensities for
binding curved DNA (e.g., HU, H-NS, Hfq). Thus, for the first time we have
observed a clear structural signal in the DNA sequences that can facilitate
DNA folding genome-wide, introducing DNA intrinsic curvature and increasing
the stability of the DNA complexes with architectural proteins, so-called
“compactosomes.”
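A minimal sketch of how phasing of A-tracts with the helical repeat could be quantified is given below. It is an assumption-laden illustration rather than the algorithm used in this project: A-tracts are approximated as runs of four or more A's or T's, and phasing is measured as the normalized amplitude of a single Fourier component of their positions. The names `a_tract_centers` and `phasing_strength` are hypothetical.

```python
import cmath
import math
import re

# Simplified A-tract definition (assumption): a run of >= 4 A's or >= 4 T's.
A_TRACT = re.compile(r"A{4,}|T{4,}")

def a_tract_centers(seq):
    """Return the center coordinate of every A-tract found in `seq`."""
    return [(m.start() + m.end() - 1) / 2.0 for m in A_TRACT.finditer(seq.upper())]

def phasing_strength(centers, period):
    """Normalized Fourier amplitude of A-tract positions at the given period.

    The value is close to 1 when tracts are spaced at multiples of `period`
    (their bends add up in phase) and close to 0 for unphased positions.
    """
    if not centers:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * c / period) for c in centers)
    return abs(total) / len(centers)
```

Evaluating `phasing_strength` in a sliding window of about 100 bp, at a period close to the DNA helical repeat, would be one way to look for clusters of phased A-tracts of the kind described above.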
|
|
Indirect
sequence readout
(with
Victor Zhurkin, NCI; Robert Jernigan, Iowa State University)
The energy of
protein-induced structural deformations of a DNA duplex in complexes is
sequence-dependent, which provides another means of recognition (indirect
information readout), in addition to the specific patterns of hydrogen
bonding. To explore this phenomenon we have built a unique database of
protein-DNA complexes and developed a novel algorithm for analyzing
protein-DNA interactions separately in the DNA major and minor grooves. As a
result, for the first time we have observed hydrophobicity-structure
correlations in protein-DNA complexes, namely, that hydrophobic and polar
amino acids interacting in the minor groove induce distinct DNA structural
deformations. These results are currently used in my recently started
project on MD simulation of the evolution of DNA structural deformations
during formation and dissolution of protein-DNA complexes, which aims to shed
additional light on the problem of recognition of degenerate DNA sequences.
|
|
Large
nucleoprotein complexes
(with
Sankar Adhya, Victor Zhurkin, Szabolcs Semsey, Mofang Liu, NCI)
The details of the spatial
organization of large nucleoprotein complexes are unknown in many
cases. To calculate the minimum-energy 3D trajectory of DNA under the structural
constraints of a particular nucleoprotein assembly, we applied a
knowledge-based elastic model of DNA suitable for mesoscopic simulations
(DNA fragments of ~100 bp). This approach allowed us to determine the trajectory of
the repression loop and the relative positioning of the binding sites of
regulatory proteins (GalR, HU) in the gal repressosome in the E. coli
cell
(the higher-order nucleoprotein complex similar to that shown in the left
panel of the above figure). The computer modeling helped us
reveal specific pathways of gal operon repression.
|

|
Minimum-energy
configuration of the repression loop model. Two GalR dimers
(purple and teal blue) are shown as ribbons. The
OE operator is highlighted in
red, and the OI operator in
yellow. The -10 element of the P2 promoter is colored blue and
orange. The experimentally observed HU binding site (hbs, position +6.5)
spans six base pairs, colored magenta and orange. |
Substantial DNA
structural deformations are also known to be crucial for transcription
initiation. In collaboration with an experimental group (Lab of Mol. Biol.,
NCI), I study the dependence of bacterial promoter strength on the
sequence of its spacer region. In particular, we have observed that the presence
of a “soft” AT-rich sequence in the spacer region can increase the promoter
strength up to 100-fold. Structural analysis has shown that the
interactions of the beta subunit of RNA polymerase in the DNA minor groove are
responsible for the effect. Based on our results, we predicted mutations
in the promoter spacer region that would increase the promoter strength;
these predictions were confirmed experimentally.
|
|
B-to-A
transition in DNA:
Propensity scales based on
trimeric and dimeric models
(with
Victor Zhurkin, NCI; Robert Jernigan, Iowa State University)
Experimental data
on the sequence-dependent B↔A conformational transition in 24 oligo- and
polymeric duplexes yield optimal dimeric and trimeric scales for this
transition. The ten unique dimers and the 32 unique trimers of the DNA duplex
were characterized by the free energy differences between the B- and A-forms
in aqueous solution. In general, the trimeric scale describes the
sequence-dependent DNA conformational propensities more accurately than the
dimeric scale, which is likely related to the trimeric model accounting for
the two interfaces between adjacent base pairs on both sides (rather than
only one interface in the dimeric model). In particular, the
exceptional preference of the B-form for the AA:TT dimers and AAN:N'TT trimers
is consistent with the cooperative interactions in both grooves. In the
minor groove, this is the hydration spine that stabilizes adenine runs in
B-form. In the major groove, these are hydrophobic interactions between
the thymine methyls and the sugar methylene groups from the preceding
nucleotides, occurring in B-form. This interpretation is in accord with
the key role of hydration in the B↔A transition in DNA. Importantly,
our trimeric scale is consistent with the relative occurrences of DNA trimers
in A-form in protein-DNA cocrystals. The B/A-scales developed here can be
used for analyzing genome sequences in search of A-philic motifs, putatively
operative in protein-DNA recognition. |
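The counts of ten dimers and 32 trimers follow from treating a step and its reverse complement as the same duplex fragment. The sketch below enumerates these unique steps and shows how a scale could be applied along a sequence. The helper names and the `scale` dictionary are hypothetical; the actual free energy values are not reproduced here and must be supplied by the user.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    """Return the reverse complement of a DNA word."""
    return word.translate(_COMPLEMENT)[::-1]

def canonical(word):
    """Represent a duplex step by the alphabetically smaller strand."""
    return min(word, reverse_complement(word))

def unique_steps(k):
    """List all k-mers that are distinct in the context of a double helix."""
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({canonical(w) for w in words})

def sum_over_steps(seq, scale):
    """Sum user-supplied per-step values along `seq`.

    `scale` maps canonical k-mers to values (all keys of equal length k),
    e.g. a dimeric (k = 2) or trimeric (k = 3) B/A propensity scale.
    """
    k = len(next(iter(scale)))
    s = seq.upper()
    return sum(scale[canonical(s[i:i + k])] for i in range(len(s) - k + 1))
```

Scanning a genome with `sum_over_steps` in a sliding window is one simple way such a scale could be used to search for A-philic motifs.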
|
Modeling
the nucleic acids conformational transitions with hysteresis
over
hydration-dehydration cycle
(with Vladimir Maleev,
V. Karazin Kharkov National University)
A mathematical
model of the conformational transitions of nucleic acid molecules, mainly DNA, during
the water adsorption-desorption cycle has been proposed. The nucleic
acid-water system is considered as an open system. The model describes
the transitions between three main conformations of wet DNA samples: A-, B-
and unordered forms. The analysis of the kinetic equations reveals
non-trivial bifurcation behavior of the system, which leads to
multistability. This fact allows one to explain the hysteresis
phenomena observed experimentally in the DNA-water system. It was shown
that hysteresis phenomena appear only in the case of cooperative conformational
transitions. Microgravimetric experiments were performed to test the
model. The model and experimental results are in good agreement with
each other. A distributed-parameter model describes conformational junctions in
heterogeneous DNA sequences.
|
|
|