LBNL
Baylor College of Medicine
Houston Medical School, University of Texas.
Wadsworth Center, NYSDH
National Institutes of Health


Research Projects

Five research projects (A through E) have been selected to address the major computational elements that currently limit the throughput and the resolution of work that involves very large data sets.  In addition, a core project (Core F) will provide the experimental reference data sets that are needed by all projects.  The core project will also provide the scientific infrastructure and research coordination that none of the individual projects could provide on its own.  Finally, six associated projects (Project G) are included within the program to provide opportunities for beta-testing the software technology developed here and to keep the technology development focused on the needs encountered in real applications.  These seemingly independent projects are united under one program project because its final goal can be realized far more effectively when the projects are combined than when they are pursued independently.

Project A
The purpose of Project A will be to develop a version of the SPIDER software package that is optimized for running on large clusters.  Altering this well-proven, single-particle software package so that it runs with optimal performance on highly parallel platforms will prove to be an efficient way to overcome the computational bottleneck that emerges with data sets of 10^5 or more particles.  The methods used for large-scale parallelization will be designed to make the best possible use of the machine architecture found in currently available as well as future commodity clusters.  Furthermore, in collaboration with the Scientific Computing Project (Project C), this development phase will be used to replace existing legacy code with optimized, modern algorithms wherever the legacy code is found to be less efficient than it might be.

Project B
Ready access to some of the routine tools developed for x-ray crystallography is becoming an increasingly important consideration within electron microscopy.  Therefore, the purpose of Project B will be to integrate single-particle computational tools (in this case derived from the EMAN package) with other existing software that has been developed for the field of structural biology.  This will be done under the umbrella of PHENIX, a package that is currently being developed as the next generation of crystallographic software.  The integrated package, SPARX, will provide a combination of single-particle and crystallographic software that appears uniform to the user, which will simplify the procedures required as the work progresses from producing a 3-D density map to interpreting the map.

The plan to include both the SPIDER project and the SPARX project within the program, instead of focusing solely on one of these options, will greatly increase the choice of software capabilities that are optimized for use on highly parallel machines.  SPIDER and SPARX will interact extensively with each other; their merger will add unique capabilities.

Project C
Project C will bring in expertise from the Scientific Computing Group at Lawrence Berkeley National Laboratory.  The members of this project will collaborate with the SPIDER and SPARX software developers in the design and implementation of suitably optimized algorithms and appropriate methods for parallelization of code.  The Scientific Computing Group will also serve as a key resource in the design and implementation stages of the work proposed in Projects D and E.

Project D
The goal of Project D is to fully automate the identification and selection of images of single particles.  Currently, there are computer-assisted tools for particle boxing, but even these aids become inadequate as the number of particles required increases to about 10^6.  Fortunately, the arrival of inexpensive (commodity-processor) cluster machines now allows us to explore methods of particle identification for which the computational time would previously have been prohibitive.

Project E
Project E addresses the need to better optimize the parameters that describe the relative alignment of images of single particles.  The transition to highly parallel computation, which is a central theme of this program project, opens up the opportunity to employ particle-alignment strategies that are prohibitively expensive to run on serial machines.  Any improvement that can be made in the quality of alignment will in turn reduce the size of the data set that is needed to reach a desired level of resolution.  We therefore believe that further research in optimal alignment of images must be a closely integrated part of our total strategy for achieving high throughput at high resolution.

Core F
Core F is intended to make three contributions.  (1) Experimental data sets of approximately 300,000 particles will be collected for each of two large macromolecular structures for which atomic-resolution models are already available.  The data sets will be used to test the software that will be developed for highly parallel computation, and to validate the high-resolution density maps that can be produced with this technology.  (2) The research core will provide staff who are dedicated to bridging and facilitating the scientific and computational work of the five major projects, as well as the associated projects (Project G).  (3) The core will also provide resources for administrative infrastructure and overall coordination of the program as a whole.

Project G
Project G is a collaboration between the program project and six associated projects, which are individually funded.  The program project is concerned solely with improving the current computational technology, and as a result no work is budgeted that would apply the improved technology to actual biological research.  It is nevertheless very important that the software be applied to the research problems for which it is intended.  For this reason, six associated principal investigators are included in the program project; they can test the software in their already-funded projects, which cover a wide range of difficulties and issues.  The goal is to test our software technology under circumstances that users in general might encounter, in contrast to just those that the technology developers themselves would experience.