Automatic Particle Identification
Director: Ravikanth Malladi, LBL

In Project D, we aim to
develop a massively parallel software module for fully automated boxing
of images of single particles, which misses fewer than 25% of the
particles that would be selected manually, and which produces a data
set in which fewer than 10% of the particles chosen are ones that would
be rejected manually. Another goal is to use geometry-based
diffusion methodology to pre-process the particle images in order to
de-noise and enhance them. We will choose from a set of candidate
filtering schemes that includes curvature flow, weighted curvature
flow with a shock component to accentuate the feature information, and
Beltrami flow.
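As an illustration of the simplest of these candidate schemes, plain
curvature flow can be sketched in a few lines of NumPy. This is only a
minimal sketch under stated assumptions, not the project's
implementation: the function name, the explicit Euler time stepping,
and the default iteration count and step size are illustrative
choices, and the weighted, shock and Beltrami variants are not shown.

```python
import numpy as np

def curvature_flow(image, n_iter=20, dt=0.1, eps=1e-8):
    """Illustrative explicit-Euler curvature flow.

    Evolves I_t = |grad I| * div(grad I / |grad I|), which smooths along
    level lines (edges) rather than across them. n_iter, dt and eps are
    assumed values for this sketch, not tuned parameters.
    """
    u = np.asarray(image, dtype=float).copy()
    for _ in range(n_iter):
        # First derivatives: axis 0 is y (rows), axis 1 is x (columns).
        uy, ux = np.gradient(u)
        # Second derivatives obtained by differentiating again.
        uyy, _ = np.gradient(uy)
        uxy, uxx = np.gradient(ux)
        # Numerator of the curvature term; eps guards flat regions.
        num = uxx * uy**2 - 2.0 * ux * uy * uxy + uyy * ux**2
        u += dt * num / (ux**2 + uy**2 + eps)
    return u
```

A call such as curvature_flow(micrograph, n_iter=50) would then return
a smoothed copy of the input array; choosing n_iter and dt without
manual tuning is exactly the parameter-selection problem that the
project sets out to automate.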
We also plan to develop algorithms that automate the selection of the
geometry-driven filter parameters used in this preprocessing step, so
that the parameters do not need to be adjusted manually for each
micrograph. We will explore the differences between the
classical approach of using cross-correlation measures for particle
boxing and a new approach that would subsequently lead to a boxing
algorithm. Furthermore, we want to develop a toolbox of filters
(criteria such as area, perimeter-to-area ratio, axial ratio,
integrated intensity, etc.) that can be employed in a project-specific
way to reject false hits. The appropriate tools would first be
adjusted manually to match the characteristics of the particle images
that are selected by an expert, prior to advancing to a fully
automated phase of particle selection.
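The rejection criteria listed above can be sketched as simple
measurements on a candidate region. The following is a hypothetical
example, not the project's toolbox: it assumes each candidate is
available as a binary mask over the micrograph, and the function
names, the pixel-edge perimeter estimate, the moment-based axial ratio
and the acceptance limits are all illustrative assumptions.

```python
import numpy as np

def region_features(mask, image):
    """Measure one candidate region (hypothetical helper)."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())
    if area == 0:
        raise ValueError("empty candidate region")
    # Perimeter estimate: count pixel edges between region and background.
    padded = np.pad(mask, 1, constant_values=False)
    perimeter = int(np.sum(padded[1:, :] != padded[:-1, :])
                    + np.sum(padded[:, 1:] != padded[:, :-1]))
    # Axial ratio: square root of the ratio of the principal second moments.
    coords = np.stack(np.nonzero(mask)).astype(float)
    evals = np.linalg.eigvalsh(np.cov(coords, bias=True))  # ascending order
    axial_ratio = float(np.sqrt(evals[1] / evals[0])) if evals[0] > 0 else float("inf")
    return {
        "area": area,
        "perimeter_to_area": perimeter / area,
        "axial_ratio": axial_ratio,
        "integrated_intensity": float(np.asarray(image, dtype=float)[mask].sum()),
    }

def accept(features, limits):
    """Keep a candidate only if every listed feature lies within its limits."""
    return all(lo <= features[name] <= hi for name, (lo, hi) in limits.items())

# Example: a 4 x 8 pixel rectangle in a uniform image.
mask = np.zeros((10, 12), dtype=bool)
mask[3:7, 2:10] = True
image = np.ones((10, 12))
features = region_features(mask, image)
# Limits are placeholders; in practice they would be matched to expert picks.
keep = accept(features, {"area": (20, 40), "axial_ratio": (1.0, 2.5)})
```

In this sketch the limits dictionary plays the role of the
project-specific settings: it would first be adjusted against
particles chosen by an expert and then held fixed for automated runs.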
In the current state of the art, it is normal practice to use the
computer to identify ("box") images of candidate single
particles. In some implementations, the images of candidate
particles are subjected to further analysis in order to reject some of
the false positives. In the end, however, a human operator must
still edit, or "prune", the set of candidate particles in order to
further reduce the number of false positives included in the data
set. The current state of the art therefore provides the user
with valuable computer-assisted technology for boxing particles, but
this technology usually cannot be used in a completely automated
fashion, i.e. without further human effort. This is why the
goals of Project D are set as they are: to automate the boxing
process fully.
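The two numerical targets of Project D (fewer than 25% of manually
selected particles missed, and fewer than 10% of chosen particles
being ones that would be rejected manually) could be checked against
an expert's picks roughly as follows. This is a sketch only: it
assumes that automatic and manual boxes have already been matched to
common particle identifiers, and the function names are hypothetical.

```python
def boxing_scores(auto_picks, manual_picks):
    """Return (missed_fraction, false_fraction) for one data set.

    Both arguments are collections of particle identifiers; matching
    box coordinates to identifiers is assumed to be done elsewhere.
    """
    auto, manual = set(auto_picks), set(manual_picks)
    if not auto or not manual:
        raise ValueError("both pick sets must be non-empty")
    # Fraction of the expert's particles that the software did not box.
    missed_fraction = len(manual - auto) / len(manual)
    # Fraction of the software's boxes that the expert would reject.
    false_fraction = len(auto - manual) / len(auto)
    return missed_fraction, false_fraction

def meets_targets(missed_fraction, false_fraction,
                  max_missed=0.25, max_false=0.10):
    """Compare the scores with the targets stated for the project."""
    return missed_fraction < max_missed and false_fraction < max_false

# Example: 100 expert picks; the software finds 80 of them plus 5 false hits.
manual = range(100)
auto = list(range(20, 100)) + [1000, 1001, 1002, 1003, 1004]
missed, false_hits = boxing_scores(auto, manual)
ok = meets_targets(missed, false_hits)
```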
Existing computer-assisted tools for particle boxing work very
effectively for projects in which the data set is intended to include
as many as ten thousand particles. If, however, the study
involves repeating the data collection for numerous variants of the
same object, such as multiple conformational states or particles
labeled (e.g. with antibodies) at numerous different sites, the amount
of human effort required then begins to make the work tedious and
demoralizing. At this point, the need for fully automated
particle boxing becomes quite apparent.
The development of fully automated particle boxing thus becomes
virtually essential as one moves into the dual arenas of (1) higher
resolution, requiring individual data sets of 10^5 to 10^6
particles, and
(2) higher throughput, requiring turnaround times of days rather than
weeks or months for each step in the process.
One can expect to edit only 5 x 10^3 to 10^4 boxes a day and thus
10^5 boxes in about two weeks. Manual editing of a million boxes, on
the other hand, would be a heroic one-time-only task. Even the editing
of 10^5 boxes over the period of a week or two is time that might be
spent
on better things, and it is a frustrating bottleneck in the research
progress. Certainly, for higher resolution studies to be done in
a routine and rapid fashion, particle boxing now needs to be made fully
automatic.
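The editing-time estimate above follows from simple division, as the
short calculation below illustrates. The editing rates are the ones
quoted in the text; treating them as constant over consecutive working
days is an assumption of this sketch, and the function name is
hypothetical.

```python
def editing_days(n_boxes, rate_low=5e3, rate_high=1e4):
    """Return (fastest, slowest) number of days to edit n_boxes by hand."""
    return n_boxes / rate_high, n_boxes / rate_low

# 10^5 boxes: (10.0, 20.0) days, i.e. about two weeks.
days_1e5 = editing_days(10**5)
# 10^6 boxes: (100.0, 200.0) days of uninterrupted manual editing.
days_1e6 = editing_days(10**6)
```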