LBNL
Baylor College of Medicine
Houston Medical School, University of Texas.
Wadsworth Center, NYSDH
National Institutes of Health




Project A


Optimization of SPIDER for Highly Parallel Machines
Director: Joachim Frank, Wadsworth Center



The aim of this project is to adapt SPIDER to run on massively parallel platforms such as Linux Beowulf clusters or groups of distributed, networked workstations running Unix or Windows NT. SPIDER (System for Processing Image Data in Electron microscopy and Related fields) is currently parallelized only at a fine-grained level, using OpenMP (Open Multi-Processing) directives for shared-memory processing that are placed within the SPIDER Fortran code. We intend to add high-level, coarse-grained parallelism at the process or SPIDER-job level, in a way that does not require shared memory. Processing will be partitioned either by groups of images, such as defocus groups, or at the level of single images.
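
As a rough illustration of the two partitioning levels, the Python sketch below turns a mapping from image number to defocus group into a list of independent jobs, either one job per defocus group or one job per image. The function name `make_jobs` and the input dictionary are hypothetical; in practice the grouping would be read from SPIDER document files.

```python
from collections import defaultdict


def make_jobs(defocus_of_image, by_group=True):
    """Split a set of image numbers into independent jobs (sketch only).

    defocus_of_image: dict mapping image number -> defocus-group id
    (hypothetical input format).
    by_group=True  -> one job per defocus group
    by_group=False -> one job per single image
    """
    if not by_group:
        return [[img] for img in sorted(defocus_of_image)]
    groups = defaultdict(list)
    for img, group in defocus_of_image.items():
        groups[group].append(img)
    # Sort groups and members so the job list does not depend on dict order.
    return [sorted(groups[g]) for g in sorted(groups)]


if __name__ == "__main__":
    defocus = {1: "g1", 2: "g2", 3: "g1", 4: "g3", 5: "g2"}
    print(make_jobs(defocus))                  # [[1, 3], [2, 5], [4]]
    print(make_jobs(defocus, by_group=False))  # [[1], [2], [3], [4], [5]]
```

Each inner list is self-contained, so it can be handed to a separate process without any shared memory.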

This will be implemented through a system that schedules independent jobs on different processors. Starting from an initial serial or parallel SPIDER job, the availability of jobs will be "published" to all processors. Each available processor runs a small, active subscription process that checks the queue for pending jobs, "subscribes" to one, and locks it so that it is unavailable to other subscribers. Upon normal termination, the locked job is removed from the shared queue; upon abnormal termination, the job is released and becomes available for another subscription.
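
The publish, lock, and release cycle can be sketched in Python, which this proposal names for the master publish and subscribe processes. This is a minimal sketch under stated assumptions: the shared queue is a directory visible to all processors (the subdirectory names `incoming/`, `queue/`, `locked/` and the names `JobQueue` and `worker` are hypothetical), and `os.rename` is assumed to be atomic on that filesystem, so exactly one subscriber can move a given job file from the queue into the locked area.

```python
import os
import tempfile


class JobQueue:
    """Publish & Subscribe job queue kept in a shared directory (sketch only).

    Assumes every processor sees the same directory and that os.rename is
    atomic on it, as it is within a single local POSIX filesystem.
    """

    def __init__(self, root):
        self.incoming = os.path.join(root, "incoming")
        self.queue = os.path.join(root, "queue")
        self.locked = os.path.join(root, "locked")
        for d in (self.incoming, self.queue, self.locked):
            os.makedirs(d, exist_ok=True)

    def publish(self, name, command):
        """Make a job visible to all subscribers."""
        # Write into incoming/ first, then move the finished file into queue/,
        # so a subscriber never reads a half-written job.
        tmp = os.path.join(self.incoming, name)
        with open(tmp, "w") as f:
            f.write(command)
        os.rename(tmp, os.path.join(self.queue, name))

    def subscribe(self):
        """Lock one pending job; return (name, command), or None if none are left."""
        for name in sorted(os.listdir(self.queue)):
            try:
                # Atomic move: only one subscriber can win a given job.
                os.rename(os.path.join(self.queue, name),
                          os.path.join(self.locked, name))
            except FileNotFoundError:
                continue  # another subscriber locked it first
            with open(os.path.join(self.locked, name)) as f:
                return name, f.read()
        return None

    def finish(self, name):
        """Normal termination: remove the locked job for good."""
        os.remove(os.path.join(self.locked, name))

    def release(self, name):
        """Abnormal termination: put the job back for another subscriber."""
        os.rename(os.path.join(self.locked, name),
                  os.path.join(self.queue, name))


def worker(q, run):
    """Subscription loop for one processor; `run` executes a single job."""
    done = []
    while (job := q.subscribe()) is not None:
        name, command = job
        try:
            run(command)
        except Exception:
            q.release(name)  # free the job, then let this worker fail
            raise
        q.finish(name)
        done.append(name)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        q = JobQueue(root)
        for g in (1, 2, 3):
            q.publish(f"job{g:03d}", f"defocus group {g}")
        # print stands in for launching a SPIDER job.
        print(worker(q, run=print))
```

A process that dies outright, rather than raising an exception, would leave its job in `locked/`; a production version would also need a timeout or heartbeat to reclaim such stale locks, which this sketch does not implement.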

In collaboration with Project C, we intend to augment this approach by inserting Message Passing Interface (MPI) subroutine calls into our code, which will take advantage of homogeneous clusters wherever these are available. Together, we will develop and test implementations of SPIDER code that use both Publish & Subscribe and MPI. Specific computationally intensive parts of the code, such as orientation search and alignment (AP_MQ in the SPIDER command set) and iterative algebraic reconstruction, will be analyzed for opportunities for parallelization with MPI under the Single Program, Multiple Data (SPMD) programming model.

This overall plan requires the development of specific tools within SPIDER for error recovery, checkpointing, and message passing, all of which are addressed in this proposal. For the final design of the system, we propose writing the master publish and subscribe processes in Python.
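
One common way to checkpoint an iterative computation such as algebraic reconstruction is to save the state atomically after each iteration and, on restart, resume after the last completed one. The sketch below assumes the state fits in a JSON file; the function names and file layout are hypothetical and do not describe existing SPIDER facilities.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` atomically so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk
    os.replace(tmp, path)  # atomically swap in the new checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)


def run_iterations(path, n_iter, step):
    """Run `step` until n_iter iterations are done, resuming after a restart."""
    state = load_checkpoint(path, {"iteration": 0, "data": None})
    while state["iteration"] < n_iter:
        state["data"] = step(state["data"], state["iteration"])
        state["iteration"] += 1
        save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "recon_checkpoint.json")
        # Toy "refinement" step: the data is just a running sum.
        print(run_iterations(ckpt, 3, lambda data, i: (data or 0) + i))
        # {'iteration': 3, 'data': 3}
        # A rerun (e.g. after a crash) continues from iteration 3.
        print(run_iterations(ckpt, 5, lambda data, i: data + i))
        # {'iteration': 5, 'data': 10}
```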

A second aim is to develop a high-level organizational framework for image processing and reconstruction that facilitates overall optimization, and backtracking based on complete records of the data. Whereas the first aim addresses numerical processing speed without regard to the content of the operations, this aim addresses the lack of overall intelligence in executing a series of algorithms, each of which has been optimized for local rather than global needs.
Although this second aim marks a radical departure from the philosophy with which SPIDER was designed (maximum flexibility at the lowest level), we will refrain from any radical redesign of the software, both for practicality and to preserve an investment of many programmer-years. Instead, we will keep the SPIDER command structure and syntax in place and design a higher-level "shell" structure that controls processing following current SPIDER procedures and supports the design of new SPIDER procedures.
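
The "shell" idea can be sketched as a thin controller that leaves each procedure untouched, only sequences the steps, and records every step's parameters and output, so processing can be resumed from an earlier step with new settings. This is an illustrative Python sketch; `Shell` and `Record` are hypothetical names, and each step is a plain callable standing in for a call to an existing SPIDER procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One executed step: what ran, with which parameters, and what it produced."""
    step: str
    params: dict
    output: object


@dataclass
class Shell:
    """Thin controller around unchanged procedures (illustrative sketch).

    `steps` is a list of (name, function) pairs; each function takes the
    previous output plus keyword parameters. `history` is the current chain
    of steps; `archive` keeps every run, including superseded ones.
    """
    steps: list
    history: list = field(default_factory=list)
    archive: list = field(default_factory=list)

    def run(self, data, params, start=0):
        """Run steps from index `start`. If start > 0, `data` is ignored and the
        recorded output of step start - 1 is used instead (backtracking)."""
        if start > 0:
            data = self.history[start - 1].output
        del self.history[start:]
        for name, fn in self.steps[start:]:
            p = dict(params.get(name, {}))
            data = fn(data, **p)
            rec = Record(name, p, data)
            self.history.append(rec)
            self.archive.append(rec)
        return data


if __name__ == "__main__":
    shell = Shell([("scale", lambda d, k=1: d * k),
                   ("shift", lambda d, c=0: d + c)])
    print(shell.run(2, {"scale": {"k": 3}, "shift": {"c": 1}}))  # 7
    # Backtrack: keep the "scale" result, redo only "shift" with a new parameter.
    print(shell.run(None, {"shift": {"c": 10}}, start=1))        # 16
    print(len(shell.archive))                                    # 3
```

Keeping the archive separate from the current history is one way to retain complete records while still allowing backtracking.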

Project A will interact extensively with Project B, and the aims described above will be pursued in liaison with Project B. Many aspects of parallelization, algorithm design, and reconstruction strategy affect both software implementations in similar ways. We will jointly decide how to break the entire reconstruction procedure down into large, interchangeable modules with standardized interfaces. The purpose of this plan is threefold: (1) to make the benefits of either package available to all users at every step of development; (2) to facilitate comparative performance testing; and (3) to enforce standardization of parameters, options, and procedures.
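
A standardized module interface of this kind could look like the Python sketch below: both packages implement the same `run` signature and accept only an agreed set of parameter names, which makes the modules interchangeable and lets one harness time each on identical input. The interface, the parameter names, and both implementations (simple mean-centering standing in for a real processing step) are hypothetical placeholders.

```python
import statistics
import time
from typing import Protocol


class ImageModule(Protocol):
    """Hypothetical standardized interface shared by both packages."""

    name: str

    def run(self, images: list, params: dict) -> list: ...


# Parameter names both implementations agree to accept (illustrative only).
STANDARD_PARAMS = {"pixel_size", "radius"}


def check_params(params: dict) -> None:
    """Reject any parameter that is not part of the agreed standard."""
    unknown = set(params) - STANDARD_PARAMS
    if unknown:
        raise ValueError(f"non-standard parameters: {sorted(unknown)}")


class PackageA:
    name = "package A (stand-in)"

    def run(self, images, params):
        m = sum(images) / len(images)  # placeholder processing step
        return [x - m for x in images]


class PackageB:
    name = "package B (stand-in)"

    def run(self, images, params):
        m = statistics.fmean(images)  # same placeholder, different code path
        return [x - m for x in images]


def compare(modules, images, params):
    """Run interchangeable modules on the same input; return {name: (output, seconds)}."""
    check_params(params)
    results = {}
    for mod in modules:
        t0 = time.perf_counter()
        out = mod.run(images, params)
        results[mod.name] = (out, time.perf_counter() - t0)
    return results


if __name__ == "__main__":
    res = compare([PackageA(), PackageB()], [1.0, 2.0, 3.0], {"radius": 30})
    for name, (out, secs) in res.items():
        print(f"{name}: {out} in {secs:.6f} s")
```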