Name
sxk_means - K-means classification of a set of images
Usage
Usage on the command line:
sxk_means.py stack outdir <maskfile> --opt_method=optimization_method --K=number_of_classes --rand_seed=1000 --maxit=max_iter --trials=num_trials --crit=criterion_name --T0=first_temperature --F=factor_temperature --CTF --MPI
Usage in Python:
k_means_main(stack, outdir, maskfile=None, opt_method=minimization_method, K, rand_seed=1000, maxit=max_iter, trials=num_trials, crit=criterion_name, CTF, F, T0, MPI)
To use the MPI (parallel) version:
- 1. set the --MPI flag on the command line
- 2. launch the program through mpirun, e.g. mpirun -np 32 sxk_means.py followed by the remaining parameters
- The above example is for mympi.
Example:
sxk_means.py hri_stack.hdf RES mask2d_23.hdf --opt_method="SSE" --K=128 --maxit=500 --crit="D"
sxk_means.py bdb:hri_stack RES bdb:mask2d_23 --opt_method="SSE" --K=128 --maxit=1000 --rand_seed=100 --T0=2.5 --F=0.995 --MPI
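When several runs are scripted, the command lines above can be assembled from Python. The following is only a minimal sketch: build_sxk_means_command is a hypothetical helper, not part of the package, and it uses only the options listed under Usage.

```python
def build_sxk_means_command(stack, outdir, maskfile=None, nproc=None, **options):
    """Return the argument list for one sxk_means.py run (hypothetical helper)."""
    cmd = []
    if nproc is not None:
        # MPI runs are launched through mpirun and also need the --MPI flag
        cmd += ["mpirun", "-np", str(nproc)]
    cmd += ["sxk_means.py", stack, outdir]
    if maskfile is not None:
        cmd.append(maskfile)
    for name, value in options.items():
        if value is True:
            cmd.append("--%s" % name)               # bare flag such as --CTF
        else:
            cmd.append("--%s=%s" % (name, value))   # option with a value
    if nproc is not None:
        cmd.append("--MPI")
    return cmd


cmd = build_sxk_means_command("hri_stack.hdf", "RES", "mask2d_23.hdf",
                              opt_method="SSE", K=128, maxit=500, crit="D")
print(" ".join(cmd))
# sxk_means.py hri_stack.hdf RES mask2d_23.hdf --opt_method=SSE --K=128 --maxit=500 --crit=D
```

The resulting list can be handed to subprocess.run(cmd) on a machine where sxk_means.py is installed.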
Input
- stack
- The input stack of images
- maskfile
- optional mask file to be used
- outdir
- name of the directory where the results are written; if outdir='None', no results are written
The parameters preceded by -- are optional; default values are given in parentheses.
Criterion names (--crit): 'all' for all criteria, 'C' for Coleman, 'H' for Harabasz, or 'D' for Davies-Bouldin. These criteria return values that measure classification quality; see also sxk_means_groups. The letters can be combined freely, e.g. 'CD', 'HC', 'CHD', ...
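To illustrate how a combined criterion string can be read, here is a small sketch. The function parse_crit and the table CRITERIA are hypothetical names introduced for this example; how sxk_means.py itself parses the option is not described here.

```python
# Letter codes taken from the list of criterion names above.
CRITERIA = {"C": "Coleman", "H": "Harabasz", "D": "Davies-Bouldin"}


def parse_crit(crit):
    """Expand a criterion string such as 'CD' into a list of criterion names."""
    if crit == "all":
        return list(CRITERIA.values())
    names = []
    for letter in crit:
        if letter not in CRITERIA:
            raise ValueError("unknown criterion letter: %r" % letter)
        if CRITERIA[letter] not in names:   # ignore repeated letters
            names.append(CRITERIA[letter])
    return names


print(parse_crit("CD"))    # ['Coleman', 'Davies-Bouldin']
print(parse_crit("all"))   # ['Coleman', 'Harabasz', 'Davies-Bouldin']
```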
Output
- outdir
- The directory to which the averages of K clusters, the variance, and the classification charts are saved.
The program will write two kinds of image stack files:
the averages of each cluster (average.hdf) and the
variance of each cluster (variance.hdf).
The averages have the following attributes set:
- 'Class_average':1 (indicates that the image is a class average, not raw data),
- 'nobjects':number of objects in a given class,
- 'members':list of images assigned to this class.
The variances have the following attributes set:
- 'Class_average':1 and
- 'nobjects'.
The classification chart (kmeans_classification_chart.txt) specifies which objects are assigned to which cluster.
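The relation between an assignment and the attributes listed above can be sketched with plain NumPy. This is an illustration only: images are assumed to be flattened into rows of a float array, plain dictionaries stand in for image headers, every class is assumed to be non-empty, and the variance is taken as the per-pixel population variance, which may differ from what the program computes. The name summarize_classes is hypothetical.

```python
import numpy as np


def summarize_classes(images, assignment, K):
    """Build per-class average and variance records from an assignment."""
    averages, variances = [], []
    for k in range(K):
        members = np.flatnonzero(assignment == k)   # indices of images in class k
        data = images[members]
        attrs = {"Class_average": 1, "nobjects": int(members.size)}
        averages.append({"image": data.mean(axis=0),
                         "members": members.tolist(), **attrs})
        variances.append({"image": data.var(axis=0), **attrs})
    return averages, variances


images = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
assignment = np.array([0, 0, 1])
averages, variances = summarize_classes(images, assignment, K=2)
print(averages[0]["nobjects"], averages[0]["members"])   # 2 [0, 1]
```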
Description
- The command implements two minimization methods and two different algorithms depending on the CTF flag. In each case, random initialization is used, i.e., initially, images are randomly assigned to K classes.
- Minimization methods:
cla - classical K-means, in which class averages are updated only after all images have been reassigned. The method is fast, but except in trivial cases it fails to find a good assignment.
SSE - Sum-of-Squared-Error K-means, in which class averages are updated after the reassignment of each object. The method is slower (with CTF it is painfully slow), but it yields better classification results.
- The results of K-means classification are (in most cases) irreproducible, i.e., if the classification is repeated for the same number of classes but with a different initial assignment (as in this implementation), the result will be different. To obtain reproducible results, one is advised to repeat K-means many times and accept the 'best' solution, as identified by the criterion value. For a sufficiently large number of trials and reasonable data, it is possible to find the optimum solution. This process is facilitated by the number of trials the user can provide: the program will repeat the classification the specified number of times and return the best solution found.
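The difference between the two minimization methods can be sketched in NumPy. The sketch assumes that cla recomputes the averages once per pass over all images (the usual batch K-means) and that SSE moves one image at a time using the iterative-optimization rule from Duda, Hart and Stork; whether sxk_means.py follows exactly these rules is an assumption. Images are flattened float rows, CTF is ignored, and all function names are hypothetical.

```python
import numpy as np


def class_averages(images, assignment, K):
    # Assumes every class has at least one member.
    return np.array([images[assignment == k].mean(axis=0) for k in range(K)])


def kmeans_cla(images, assignment, K, maxit=100):
    """Batch update: averages are recomputed once per pass over all images."""
    assignment = assignment.copy()
    averages = class_averages(images, assignment, K)
    for _ in range(maxit):
        dist = ((images[:, None, :] - averages[None, :, :]) ** 2).sum(axis=2)
        new_assignment = dist.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):
            break
        assignment = new_assignment
        for k in range(K):
            if np.any(assignment == k):      # keep the old average if a class empties
                averages[k] = images[assignment == k].mean(axis=0)
    return assignment


def kmeans_sse(images, assignment, K, maxit=100):
    """Incremental update: averages change after every single reassignment."""
    assignment = assignment.copy()
    averages = class_averages(images, assignment, K)
    counts = np.bincount(assignment, minlength=K).astype(float)
    for _ in range(maxit):
        moved = False
        for idx, x in enumerate(images):
            i = assignment[idx]
            if counts[i] <= 1:
                continue                     # never empty a class
            d2 = ((averages - x) ** 2).sum(axis=1)
            cost = counts / (counts + 1.0) * d2               # SSE added by joining class j
            cost[i] = counts[i] / (counts[i] - 1.0) * d2[i]   # SSE removed by leaving class i
            j = int(cost.argmin())
            if cost[j] < cost[i]:            # move only if the total SSE decreases
                averages[i] -= (x - averages[i]) / (counts[i] - 1.0)
                averages[j] += (x - averages[j]) / (counts[j] + 1.0)
                counts[i] -= 1.0
                counts[j] += 1.0
                assignment[idx] = j
                moved = True
        if not moved:
            break
    return assignment


rng = np.random.default_rng(1000)
images = np.vstack([rng.normal(0.0, 0.1, (15, 4)), rng.normal(5.0, 0.1, (25, 4))])
init = rng.permutation(np.arange(40) % 2)    # random start, no empty class
print(kmeans_cla(images, init, 2))
print(kmeans_sse(images, init, 2))
```

On this easy two-group data both variants recover the same partition; the methods differ on harder data, where the single-image moves of the SSE variant can escape assignments that the batch update leaves unchanged.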
The program calculates and returns values of classification quality; see sxk_means_groups.
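The idea of repeating the classification and keeping the best solution can be sketched as follows. This is not the program's code: a compact batch K-means stands in for the real classifier, the within-class sum of squared errors stands in for the Coleman, Harabasz and Davies-Bouldin criteria, and the names kmeans_once and kmeans_trials are hypothetical.

```python
import numpy as np


def kmeans_once(images, K, rng, maxit=100):
    """One batch K-means run from a random initial assignment."""
    n = len(images)
    assignment = rng.permutation(np.arange(n) % K)   # random start, no empty class
    averages = np.array([images[assignment == k].mean(axis=0) for k in range(K)])
    for _ in range(maxit):
        dist = ((images[:, None, :] - averages[None, :, :]) ** 2).sum(axis=2)
        new_assignment = dist.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):
            break
        assignment = new_assignment
        for k in range(K):
            if np.any(assignment == k):              # keep the old average if a class empties
                averages[k] = images[assignment == k].mean(axis=0)
    sse = float(((images - averages[assignment]) ** 2).sum())
    return assignment, sse


def kmeans_trials(images, K, trials=10, rand_seed=1000):
    """Repeat K-means `trials` times and keep the run with the lowest SSE."""
    rng = np.random.default_rng(rand_seed)
    results = [kmeans_once(images, K, rng) for _ in range(trials)]
    best = min(results, key=lambda r: r[1])
    return best, [r[1] for r in results]


data_rng = np.random.default_rng(0)
images = np.vstack([data_rng.normal(c, 0.5, (20, 3)) for c in (0.0, 3.0, 6.0)])
(best_assignment, best_sse), all_sse = kmeans_trials(images, K=3, trials=5)
print("criterion per trial:", all_sse)
print("best:", best_sse)
```

Fixing rand_seed makes the whole sequence of trials repeatable, which is the role the --rand_seed option appears to play on the command line.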
Reference
Richard O. Duda, Peter E. Hart, David G. Stork: Pattern Classification, 2nd Edition
Author / Maintainer
Julien Bert
Keywords
- category 1
- APPLICATIONS
Files
statistics.py, sxk_means.py
See also
Maturity
- beta
- works for author, often works for others.
Bugs
None. It is perfect.
