SPARX: Automated Single Particle Reconstruction Software
Director:
Steven Ludtke, Baylor College of Medicine

The goal of Project B is to provide a software environment called SPARX (Single
Particle Analysis for Resolution eXtension) capable of performing
nearly automated single particle reconstruction from large numbers of
scanned micrographs or CCD frames. SPARX will make use of
object-oriented concepts and be integrated with a modern scripting
language (Python). It will leverage existing object-oriented
image processing code for EMAN and will be built on top of a new
framework being developed for X-ray crystallography (PHENIX). The
overriding philosophy of this project is that single particle image
reconstruction should require minimal manual intervention from the
user, but should also allow flexibility in the selection of algorithms
and methodologies. Automation of those steps currently performed
manually will allow many more particles to be processed in a
high-throughput manner, and thus extend the maximum possible
resolution.
In order to develop the SPARX software, we will first build the
foundations of SPARX using the framework provided by PHENIX, an
object-oriented X-ray crystallography package currently under
development. This foundation will be based on a common infrastructure
for parallelism, process tracking, data communication and
algorithm development using a scripting language. In addition,
when similar data is generated by either technique, such as a
three-dimensional electron density map, this data can move seamlessly
back and forth in a common framework. Also, we will port existing
EMAN code into the SPARX framework to provide many fundamental EM image
processing routines. The existing C++ code in EMAN will be modified
as needed so that it can be seamlessly
integrated into the Python framework. We will adopt necessary data and
parameter formats/specifications to allow data to be exchanged between
Projects A and B at specific points in the reconstruction process, and
implement any necessary algorithms that are currently absent from EMAN,
and re-implement existing algorithms that are deemed impractical to
port to the SPARX environment. While many thoroughly tested routines
can be ported from EMAN relatively easily, some depend on data
structures that are incompatible with Python. Such routines will be
rewritten. We will also adopt or implement, as appropriate,
algorithms developed by Projects C, D, and E, and implement
Publish&Subscribe parallelism in PHENIX/SPARX. PHENIX and
EMAN both currently implement parallelism using a portable scheme based
on remote shell execution for a predetermined number of
processors. In collaboration with Project C, we will implement a
flexible Publish&Subscribe-based scheme that will make near-optimal
use of available computational resources while remaining easy
to use and install.
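
To make the contrast with a predetermined processor count concrete, the following is a minimal in-process sketch of the general idea, not the PHENIX/SPARX design. It assumes a work-queue flavor of publish/subscribe in which tasks are published to a topic and each task is taken by whichever subscribed worker is idle. The names Broker, run_tasks, and the "tasks" topic are hypothetical, threads stand in for remote processors, and squaring a number is a placeholder for real image processing work.

```python
import queue
import threading
from collections import defaultdict


class Broker:
    """Hypothetical in-process broker holding one shared queue per topic."""

    def __init__(self):
        self._topics = defaultdict(queue.Queue)
        self._lock = threading.Lock()

    def _queue(self, topic):
        # Guard topic creation so concurrent callers share the same queue.
        with self._lock:
            return self._topics[topic]

    def publish(self, topic, message):
        self._queue(topic).put(message)

    def subscribe(self, topic, handler):
        """Start a worker that handles messages on `topic` until it receives None."""
        q = self._queue(topic)

        def loop():
            while True:
                message = q.get()
                if message is None:  # stop signal (an assumption of this sketch)
                    break
                handler(message)

        worker = threading.Thread(target=loop)
        worker.start()
        return worker


def run_tasks(values, n_workers):
    """Publish one task per value and collect (task_id, result) pairs."""
    broker = Broker()
    results = []
    results_lock = threading.Lock()

    def handle(task):
        task_id, value = task
        outcome = value * value  # placeholder for real per-task computation
        with results_lock:
            results.append((task_id, outcome))

    # Workers can be added at any time; idle workers pull the next task.
    workers = [broker.subscribe("tasks", handle) for _ in range(n_workers)]
    for task in enumerate(values):
        broker.publish("tasks", task)
    for _ in workers:
        broker.publish("tasks", None)  # one stop signal per worker
    for worker in workers:
        worker.join()
    return sorted(results)
```

Because workers pull tasks when they become free, the publisher never needs to know how many workers exist or how fast each one is. A real deployment would replace the in-process queue with a networked broker.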
Also, in order to provide users with a flexible and optimized set of
reconstruction procedures, we will develop and test strategies for
reconstruction within the SPARX environment. EMAN currently
implements only two fundamental single particle reconstruction
strategies. One of the primary goals of SPARX is to allow for a
more flexible environment where a variety of alternative strategies can
be easily implemented. The completed SPARX package will include a
number of established reconstruction strategies, including the two
already implemented in EMAN. The strategies mentioned here will
be tested with standard data sets produced by Project F (F1).
Periodic comparisons will be made between SPARX and the software
developed in Project A to identify strengths and weaknesses of
algorithms and strategies used in each. The different approaches
used in the two packages will ultimately enable substantial
improvements to each.
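
One way a scripting environment could let alternative strategies be added and selected by name is a simple registry, sketched below. This illustrates the general pattern only and is not the SPARX API: register_strategy, available_strategies, and reconstruct are hypothetical names, and the two toy strategies (mean and median of a list of numbers) are stand-ins for real reconstruction procedures.

```python
import statistics
from typing import Callable, Dict, List

# A strategy here is any callable that turns a list of values into one result.
Strategy = Callable[[List[float]], float]

_STRATEGIES: Dict[str, Strategy] = {}


def register_strategy(name: str):
    """Decorator that records a strategy under `name` (hypothetical helper)."""

    def decorator(func: Strategy) -> Strategy:
        if name in _STRATEGIES:
            raise ValueError(f"strategy {name!r} is already registered")
        _STRATEGIES[name] = func
        return func

    return decorator


@register_strategy("mean")
def mean_strategy(values: List[float]) -> float:
    # Toy stand-in: combine the inputs by averaging them.
    return sum(values) / len(values)


@register_strategy("median")
def median_strategy(values: List[float]) -> float:
    # Toy stand-in: a combination that is less sensitive to outliers.
    return statistics.median(values)


def available_strategies() -> List[str]:
    return sorted(_STRATEGIES)


def reconstruct(strategy_name: str, values: List[float]) -> float:
    """Run the named strategy, failing clearly if it is unknown."""
    try:
        strategy = _STRATEGIES[strategy_name]
    except KeyError:
        raise ValueError(f"unknown strategy {strategy_name!r}") from None
    return strategy(values)
```

With this pattern, adding a new strategy means writing one decorated function, and the same input can be run through several strategies for side-by-side comparison.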