Program Objective
Broad Aim
The program project focuses on highly
parallel data processing technology that will result in (1) greatly
increased throughput and (2) significantly higher resolution in
electron cryo-microscopy of single particles. The goal is to
ensure that computation is not a major rate-limiting factor in the
completion of single-particle projects at better than ~12Å.
One of our objectives is to automate the boxing of particle images so
that data sets as large as 10⁵ or 10⁶
particles can be selected with minimal human time and effort. Another
objective is to ensure that all of the computational work needed for
particle-alignment, 3-D reconstruction and refinement of the
reconstruction is routinely completed with turnaround times ranging
from a few hours to a few tens of hours.
It is our ultimate goal that single-particle cryo-EM should
routinely produce structures at atomic resolution. In many cases,
atomic-resolution models of large, multi-protein complexes will be
obtained by fitting already-known atomic structures of the component
proteins into a moderate-resolution, cryo-EM density map of the
complex. The computational technology that we propose is intended
to allow the resolution of such EM density maps to routinely extend to
~8-12Å. Where the resolution currently extends to
~8-12Å, the new computational technology would make it possible
for the resolution to improve further, to better than ~5Å
(at which point β-sheets can be readily
distinguished from α-helices). In the
future, as the quality of EM images continues to increase, the
high-throughput computational technology developed should routinely
produce density maps at ~3.5Å (which would allow direct
chain-tracing).
Background and Significance
High-resolution electron microscopy has become an important
technology within the field of structural biology. Very
high-resolution images have been obtained for two-dimensional crystals,
and three-dimensional density maps have been obtained at a resolution
of 7-9Å (or even better) for objects with a high degree of
internal symmetry. However, when objects have relatively little
or no internal symmetry, a much larger number of single particles must
be used to build up the signal-to-noise ratio at high resolution.
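Single-particle methods rely on averaging: if N aligned images carry independent noise, averaging them reduces the noise variance N-fold, so the power signal-to-noise ratio grows roughly in proportion to N (the amplitude ratio as √N). The following is a minimal sketch of that effect, not the project's code. It assumes perfectly aligned copies, white Gaussian noise, and a toy 1-D "image"; the function name `averaged_snr` and the parameter values are illustrative only.

```python
import math
import random
import statistics


def averaged_snr(signal, n_images, noise_sigma, seed=0):
    """Power SNR of the average of n_images noisy copies of `signal`.

    Assumes (for illustration only) that every copy is already perfectly
    aligned and carries independent white Gaussian noise of std noise_sigma.
    """
    rng = random.Random(seed)
    total = [0.0] * len(signal)
    for _ in range(n_images):
        for i, s in enumerate(signal):
            total[i] += s + rng.gauss(0.0, noise_sigma)
    average = [t / n_images for t in total]
    # What remains after subtracting the true signal is the averaged noise.
    residual_noise = [a - s for a, s in zip(average, signal)]
    signal_power = statistics.fmean([s * s for s in signal])
    noise_power = statistics.fmean([r * r for r in residual_noise])
    return signal_power / noise_power


# A toy 1-D "image": one period of a sine wave sampled at 256 points.
signal = [math.sin(2 * math.pi * k / 256) for k in range(256)]
for n in (1, 10, 100, 1000):
    print(n, round(averaged_snr(signal, n, noise_sigma=2.0), 2))
```

With a single-image power SNR of about 0.125, the printed values should grow roughly tenfold for each tenfold increase in the number of averaged images. Real particles must first be aligned, and that alignment is itself the expensive step.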
The current need for state-of-the-art cryo-EM technology is fueled by
the fact that attention in biology is turning more and more to larger
structures and assemblies. The atomic structures of these machines,
motors, and subcellular structures are becoming increasingly difficult
to determine by X-ray crystallography, which is why EM is becoming
so important in structural biology. However, although it is
encouraging to have such an advanced technology, its scientific
throughput is quite slow, and the resolution routinely achieved falls
short of what is potentially attainable. The reason is that current
computational technology cannot cope with determining the translational
and rotational alignment parameters for the large number of individual
particles needed to achieve high throughput and high resolution.
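At its core, finding a particle's alignment parameters is a search for the offset that maximizes cross-correlation with a reference. The following is a minimal, illustrative sketch (not the project's software). It performs that search by brute force for a 1-D circular signal. The same search covers in-plane rotation once an image is resampled onto polar coordinates, because rotation then becomes a circular shift along the angle axis. The function name and test profile are hypothetical. Practical codes correlate 2-D images with FFTs and search several parameters jointly, which is why the cost becomes so large for 10⁵-10⁶ particles.

```python
def best_circular_shift(reference, particle):
    """Return the shift s that maximizes the circular cross-correlation
    sum_i reference[i] * particle[(i + s) % n].

    Brute-force O(n^2) search for clarity; practical codes use FFTs.
    """
    n = len(reference)
    if n != len(particle):
        raise ValueError("reference and particle must have the same length")
    best_shift, best_score = 0, float("-inf")
    for s in range(n):
        score = sum(reference[i] * particle[(i + s) % n] for i in range(n))
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


reference = [0, 0, 1, 3, 1, 0, 0, 0]
# The same profile, circularly shifted 3 samples to the right.
particle = reference[-3:] + reference[:-3]
print(best_circular_shift(reference, particle))  # prints 3
```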
For this reason, the software resulting from the program project would be a
great asset to researchers working in cryo-EM. Our goal is to
develop computational technology that makes it possible to obtain
high-resolution density maps from EM images of large,
isolated macromolecular particles at high throughput. We want
to develop versions of single-particle software that take full
advantage of modern, affordable, and, most importantly, highly parallel
machine architectures. Such parallel machines can process
large amounts of data simultaneously, completing the computing
tasks required for cryo-EM in a realistic amount of time.
Currently, very large data sets are necessary (though not sufficient)
to achieve high resolution. About 10⁵ asymmetric units
are needed to reach a resolution of ~8-12Å, and roughly 10⁶
particles are needed to reach 3-5Å. Not only is
the data set extremely large, but the early stages of the
reconstruction require human effort to identify and box images of
single particles, which further slows the process (this particle
selection typically proceeds at a mere 10⁴ particles per day
or less). As the demand for larger data sets grows, this becomes a
serious bottleneck. Besides the slow rate of selection, there is also the
problem of determining the relative position and rotational alignment
of each particle.
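The size of the manual-boxing bottleneck follows directly from the figures above. The short calculation below uses the stated picking rate of 10⁴ particles per day. Because the text says "or less", the results are lower bounds. The helper name is illustrative only.

```python
def manual_boxing_days(n_particles, particles_per_day=10_000):
    """Days of manual boxing at a given picking rate (default from the text)."""
    return n_particles / particles_per_day


for n in (10**5, 10**6):
    print(f"{n:>9,} particles -> {manual_boxing_days(n):.0f} days")
```

At that rate, a 10⁵-particle data set needs at least 10 days of manual picking, and a 10⁶-particle data set needs at least 100 days, before alignment even begins.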
Our strategy for developing optimized software that processes large
amounts of data in a relatively short time, at high resolution, is
first to implement pilot versions of the desired code on multiprocessor
clusters built from commodity PC hardware. The ultimate
goal is to develop computational technology that improves both
resolution and throughput when calculations are run on machines that
are affordable (a) for individual laboratories, (b) as shared
instrumentation, or (c) as dedicated machines run for the community as
multi-user facilities.
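Running such code on a multiprocessor cluster is natural if we assume that, within one refinement cycle, each particle can be aligned against the current reference independently of the others. In that case the data set can simply be divided among processors. The following is a minimal sketch of that static decomposition, not the project's design. Threads stand in for cluster nodes, and on a real cluster each index range would go to a separate process or node (for example via MPI). `align_particle` is a hypothetical placeholder for the per-particle work.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n_items, n_workers):
    """Split indices 0..n_items-1 into n_workers contiguous, near-equal ranges."""
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def align_particle(index):
    # Hypothetical stand-in for the real per-particle alignment work.
    return index * index


def process_chunk(chunk):
    return [align_particle(i) for i in chunk]


def run_parallel(n_particles, n_workers):
    chunks = partition(n_particles, n_workers)
    # Threads stand in for cluster nodes here; pool.map preserves chunk order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_chunk = list(pool.map(process_chunk, chunks))
    return [result for chunk_results in per_chunk for result in chunk_results]


print([len(r) for r in partition(10, 3)])  # prints [4, 3, 3]
print(run_parallel(10, 3) == [i * i for i in range(10)])  # prints True
```

Static contiguous ranges keep the bookkeeping trivial. If per-particle cost varied widely, a dynamic work queue would balance the load better.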