cluster, data, centers=centers [, index=index,
size=size, sample=sample, phantom=phantom,
maxit=maxit, rms=rms, second=second, /update,
/iterate, /vocal, /quick, /record, /ordered]
This routine divides vectors into clusters, based on their proximity to cluster centers.
maxit are input variables;
second are output variables and must therefore be named
variables. The arguments are:
The data points. The first dimension of
data selects the
components of the vectors; i.e. the first vector is
data has two dimensions. Each data
point is assigned to the cluster whose center is closest of all cluster
centers to the data point at that time.
The cluster centers. If
centers is a scalar on entry, then it
denotes the number of clusters to divide the data into, and a random
sample (of appropriate size) of the data vectors is taken as initial
guess for the cluster center positions. If
centers is an array on
input, then each
centers(*,k…) is taken to contain the
(initial) position of a cluster center. The final (possibly updated)
cluster centers are returned in
centers on exit.
The cluster assignment of the data points. If
index is an array
of appropriate size on entry, and if
phantom is not specified, then
the elements of
index denote the initial cluster membership of the
data points. On exit,
index contains the cluster numbers that the
data points have been assigned to. No or an undefined
The cluster sizes. If
size is specified, then the number of
elements in each cluster is returned in it upon exit from the routine.
The data point sample size. If
sample is an integer larger than
one, then it indicates the size of a random sample of data points that
should be treated. If
sample equals 1, then a sample ten times
bigger than the number of clusters is used. If
sample is not
specified or falls outside the range mentioned before, then all data
points are treated.
If phantom is specified, then it indicates that the
clusters should be pre-stocked with phantom members, to partially
suppress movement of the cluster centers during clustering. The value
of phantom, if an integer larger than 1, indicates
how many phantom members to assign to each cluster prior to treatment of
the data points. If
phantom is not specified, and
index is an array of appropriate size, then the clusters
contain the members indicated by
index prior to clustering.
If phantom equals 1, then 10 phantom members are assigned
to each cluster prior to clustering. Any phantom members are removed
after clustering, before exiting the routine.
maxit specifies the maximum number of iterations that is
allowed for this call to cluster. If
maxit is not
specified, then the number of iterations is unlimited.
If rms is specified, then the average root-mean-square
distance of the members of each cluster to its center is returned in it.
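The assignment rule and the size and rms outputs described above can be illustrated with a small sketch. This is not the routine's implementation: it is written in Python, the function names are hypothetical, Euclidean distance is assumed (the text only says "closest"), and rms is read here as the square root of the mean squared member-to-center distance per cluster. Data points are stored as a list of vectors rather than with the components along the first dimension.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_to_nearest(data, centers):
    """Return, for each data point, the number of the closest center."""
    index = []
    for point in data:
        distances = [squared_distance(point, c) for c in centers]
        index.append(distances.index(min(distances)))
    return index


def cluster_sizes(index, n_clusters):
    """Count the members of each cluster."""
    sizes = [0] * n_clusters
    for k in index:
        sizes[k] += 1
    return sizes


def cluster_rms(data, centers, index):
    """Root-mean-square distance of each cluster's members to its center."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for point, k in zip(data, index):
        sums[k] += squared_distance(point, centers[k])
        counts[k] += 1
    # Empty clusters get 0.0 here; that choice is an assumption.
    return [math.sqrt(s / n) if n else 0.0 for s, n in zip(sums, counts)]


data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
index = assign_to_nearest(data, centers)
print(index, cluster_sizes(index, len(centers)), cluster_rms(data, centers, index))
```

With fixed centers, as in this sketch, the result corresponds to a call without /update: points are assigned but the centers do not move.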
The keywords are:
/update signals that the cluster center positions must be updated during the
clustering, so that each cluster center at any time during the
clustering equals the average position of all members in the cluster at
that time (including any phantom members). If
/update is not
specified, then the cluster centers do not move during clustering.
/iterate specifies that updating must be iterated until the cluster centers are
stable and upon exit all data points are members of the cluster whose
center is the closest one of all cluster centers.
/vocal specifies that the number of reclustered data points (and the number of
changed clusters, if
/iterate is also specified) are printed.
/quick specifies that only data points in clusters that were changed during the
last iteration or so far during the current one should be treated during
the current iteration. The other data points are left unchanged. This
/ordered specifies that there is some degree of order in
that there is more than a random chance that the current data point and
the previous data point belong in the same cluster. The time penalty of
this option is small, so it is selected by default. Specify
/noorder to deselect it. This option only affects the first
/record specifies that the cluster positions and sizes should be written to file
cluster.out after each iteration. This option has effect only if
/iterate was also selected.
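The interplay of /update, /iterate, phantom and maxit can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the routine's code: the function names are hypothetical, distances are Euclidean, phantom members are assumed to sit at the initial center position, and removing them on exit is modelled by recomputing each center from its real members only. A single pass (maxit=1) plays the role of /update without /iterate.

```python
def nearest(point, centers):
    """Number of the center closest to point (squared Euclidean distance)."""
    d = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return d.index(min(d))


def cluster_with_update(data, centers, phantom=0, maxit=None):
    """Reassign points and move centers immediately after each change."""
    start = [list(c) for c in centers]            # initial positions
    centers = [list(c) for c in centers]
    counts = [phantom] * len(centers)             # phantom members pre-stocked
    sums = [[x * phantom for x in c] for c in start]
    index = [None] * len(data)                    # no membership on entry
    passes = 0
    while maxit is None or passes < maxit:
        passes += 1
        changed = 0
        for i, point in enumerate(data):
            new, old = nearest(point, centers), index[i]
            if new == old:
                continue
            changed += 1
            index[i] = new
            # Remove the point from its old cluster, add it to the new one,
            # and keep each center equal to the mean of its current members.
            for k, sign in ((old, -1), (new, 1)):
                if k is None:
                    continue
                counts[k] += sign
                sums[k] = [s + sign * x for s, x in zip(sums[k], point)]
                if counts[k] > 0:
                    centers[k] = [s / counts[k] for s in sums[k]]
        if changed == 0:                          # centers are stable
            break
    # Drop the phantom members again before returning (assumed behaviour).
    for k in range(len(centers)):
        real = counts[k] - phantom
        if real > 0:
            centers[k] = [(s - phantom * x) / real
                          for s, x in zip(sums[k], start[k])]
    return centers, index


data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centers, index = cluster_with_update(data, [(1.0, 1.0), (9.0, 1.0)], phantom=2)
print(final_centers, index)
```

A larger phantom value makes each center harder to move, because every real member is averaged against more stationary phantom members.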
cluster,data,c,i with undefined
i yields cluster centers
which are not distributed evenly.
yields a more even distribution. For large data sets it is advised to
cluster,data,c,i,/sample,/iterate (or with
sample=sample_size) to get fairly evenly distributed cluster
centers, and then
cluster,data,c,i to assign all data points to
When random samples of data points are treated, the clustering algorithm used here is known as the Continuous k-Means Algorithm.
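The two-stage approach advised for large data sets (cluster a random sample with updating and iteration, then assign all data points to the resulting centers) might look like the sketch below. It is a Python illustration with hypothetical names, not the routine itself. The sample-size rule follows the description of sample; capping the sample at the number of data points, the fixed seed, and the default maxit are assumptions made for the example.

```python
import random


def nearest(point, centers):
    """Number of the center closest to point (squared Euclidean distance)."""
    d = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return d.index(min(d))


def sample_size(sample, n_clusters, n_data):
    """Number of data points to treat, per the description of sample."""
    if sample == 1:
        wanted = 10 * n_clusters          # ten times the number of clusters
    elif isinstance(sample, int) and sample > 1:
        wanted = sample
    else:
        return n_data                     # unspecified: treat all points
    return min(wanted, n_data)            # assumption: never exceed the data


def two_stage(data, n_clusters, sample=1, maxit=100, seed=0):
    rng = random.Random(seed)
    subset = rng.sample(data, sample_size(sample, n_clusters, len(data)))
    # Initial centers: a random sample of the data vectors.
    centers = [list(p) for p in rng.sample(subset, n_clusters)]
    counts = [0] * n_clusters
    sums = [[0.0] * len(data[0]) for _ in range(n_clusters)]
    index = [None] * len(subset)
    # Stage 1: continuous updating on the sample, iterated until stable.
    for _ in range(maxit):
        changed = 0
        for i, p in enumerate(subset):
            new, old = nearest(p, centers), index[i]
            if new == old:
                continue
            changed += 1
            index[i] = new
            for k, sign in ((old, -1), (new, 1)):
                if k is None:
                    continue
                counts[k] += sign
                sums[k] = [s + sign * x for s, x in zip(sums[k], p)]
                if counts[k] > 0:
                    centers[k] = [s / counts[k] for s in sums[k]]
        if changed == 0:
            break
    # Stage 2: assign every data point to the now fixed centers.
    return centers, [nearest(p, centers) for p in data]


rng = random.Random(1)
data = [(rng.gauss(cx, 0.5), rng.gauss(0.0, 0.5)) for cx in (0.0, 20.0) for _ in range(200)]
centers, index = two_stage(data, n_clusters=2)
print(centers, index.count(0), index.count(1))
```

Only the sample is visited repeatedly in stage 1, so the cost of iterating does not grow with the full data set; stage 2 is a single pass.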
See also: Topology