Research Projects
Five research projects (A through E) have been selected to address the
major computational elements that currently limit the throughput and
the resolution of work
that involves very large data sets. In addition, a core project
(Project F) will provide the experimental reference data sets that are
needed for all projects. The core project will also provide the
scientific infrastructure and research coordination that none
of the individual projects would be able to provide. Finally, six
associated projects (Project G) are included within the program to
provide opportunities for beta-testing of the software technology
developed here and to keep the technology development focused on the
needs encountered in real applications. These seemingly independent
projects are united under one program project because the final goal
of the program can be realized far more effectively when the projects
are pursued together than when they are pursued independently.
Project A
The purpose of Project A will be to develop a version of the
SPIDER software package that is optimized for running on large
clusters. Altering this well-proven, single-particle software
package so that it runs with optimal performance on highly parallel
platforms will prove to be an efficient way to overcome the
computational bottleneck that emerges with data sets of 10^5 or more
particles. The methods used for large-scale parallelization will be
designed to make the best possible use of the machine architecture
found in currently available as well as future commodity
clusters. Furthermore, in collaboration with the Scientific
Computing Project (Project C), this development phase will be used to
incorporate optimized, modern algorithms in place of existing legacy
code, wherever the latter is found to be less efficient than it might
be.
Project B
The need to provide ready access to some of the routine tools developed
for x-ray crystallography is also becoming an increasingly important
consideration within electron microscopy. Therefore, the purpose
of Project B will be to integrate single-particle computational tools
(in this case derived from the EMAN package) with other existing
software that has been developed for the field of structural
biology. This will be done within the umbrella of PHENIX, a
package that is currently being developed as the next generation of
crystallographic software. The integrated package, SPARX, will
provide a combination of single-particle and crystallographic software
with a uniform appearance, which will simplify the procedures required
as the work progresses from producing a 3-D density map to interpreting
the map.
The plan to include both the SPIDER project and the SPARX project
within the program, instead of focusing solely on one of these
options, will greatly increase the choice of software capabilities that
are optimized for use on highly parallel machines. SPIDER and SPARX
will interact extensively with each other; their merger will add unique
capabilities.
Project C
Project C will bring in expertise from the Scientific Computing Group
at Lawrence Berkeley National Laboratory. The members of this
project will collaborate with the SPIDER and SPARX software developers
in the design and implementation of suitably optimized algorithms and
appropriate methods for parallelization of code. The Scientific
Computing Group will also serve as a key resource in the design and
implementation stages of the work proposed in Projects D and E.
Project D
The goal of Project D is to fully automate the identification and
selection of images of single particles. Currently, there are
computer-assisted tools for particle boxing, but even these
aids become inadequate as the number of particles
required increases to about 10^6.
Fortunately, the arrival of inexpensive (commodity-processor)
cluster-machines now allows us to explore methods of particle
identification for which the computational time would have previously
been prohibitive.
Project E
Project E addresses the need to better optimize the parameters that
describe the relative alignment of images of single particles.
The transition to highly parallel computations that is a central theme
of this program project opens up the opportunity to employ
particle-alignment strategies that are prohibitively expensive to run on serial
machines. Any improvement that can be made in the quality of
alignment will in turn reduce the size of the data set that is needed
to reach a desired level of resolution. We therefore believe that
further research in optimal alignment of images must be a closely
integrated part of our total strategy for achieving high throughput at
high resolution.
Core F
Project F is intended to contribute the following:
(1) Experimental data sets of ~300,000 particles will be collected for
each of two large macromolecular structures for which atomic-resolution
models are already available. The data sets will be used to test the
software that will be developed for highly parallel computation, and to
validate the high-resolution density maps that can be produced with
this technology.
(2) The research core will provide staff who are dedicated to bridging
and facilitating the scientific and computational work of the five
major projects, as well as the associated projects (Project G).
(3) The core will also provide resources for administrative
infrastructure and overall coordination of the program as a whole.
Project G
Project G is a collaboration between the program project and
six associated projects that are individually funded. The
program project is concerned solely with improving the current
computational technology, and as a result no work is budgeted that
would apply the improved technology to actual biological
research. It is nevertheless very important that the software be applied
to problems in the actual research for which it is intended. For this
reason, six associated principal investigators are included in the
program project; they can test the software in their already-funded
projects, which cover a wide range of difficulties and issues. The goal in
doing so is to test our software technology under circumstances that
users in general might encounter, in contrast to just those which would
be experienced by the technology-developers themselves.