Next: colorcomponents, Previous: close, Up: Internal Routines [Contents][Index]

`cluster, data, centers=centers [, index=index, size=size, sample=sample, phantom=phantom, maxit=maxit, rms=rms, second=second, /update, /iterate, /vocal, /quick, /record, /ordered]`

This routine divides vectors into clusters, based on their proximity to
cluster centers. `data`, `centers`, `index`, `sample`, `phantom`, and
`maxit` are input variables; `centers`, `index`, `size`, `rms`, and
`second` are output variables and must therefore be named variables.
The arguments are:

`data`

The data points. The first dimension of `data` selects the components of the vectors; i.e. the first vector is `data(*,0)` if `data` has two dimensions. Each data point is assigned to the cluster whose center is closest of all cluster centers to the data point at that time.

`centers`

The cluster centers. If `centers` is a scalar on entry, then it denotes the number of clusters to divide the data into, and a random sample (of appropriate size) of the data vectors is taken as initial guess for the cluster center positions. If `centers` is an array on input, then each `centers(*,k…)` is taken to contain the (initial) position of a cluster center. The final (possibly updated) cluster centers are returned in `centers` on exit.

`index`

The cluster assignment of the data points. If `index` is an array of appropriate size on entry, and if `phantom` is not specified, then the elements of `index` denote the initial cluster membership of the data points. On exit, `index` contains the cluster numbers that the data points have been assigned to. No or an undefined `index` implies `/phantom`.

`size`

The cluster sizes. If `size` is specified, then the number of elements in each cluster is returned in it upon exit from the routine.

`sample`

The data point sample size. If `sample` is an integer larger than one, then it indicates the size of a random sample of data points that should be treated. If `sample` equals 1, then a sample ten times bigger than the number of clusters is used. If `sample` is not specified or falls outside the range mentioned before, then all data points are treated.

`phantom`

If `phantom` is specified, then it indicates that the clusters should be pre-stocked with phantom members, to partially suppress movement of the cluster centers during clustering. The value assigned to `phantom`, if an integer larger than 1, indicates how many phantom members to assign to each cluster prior to treatment of the data points. If `phantom` is not specified, and `index` is an array of appropriate size, then the clusters contain the members indicated by `index` prior to clustering. If `phantom` equals 1, then 10 phantom members are assigned to each cluster prior to clustering. Any phantom members are removed after clustering, before exiting the routine.

`maxit`

`maxit` specifies the maximum number of iterations that is allowed for this call to `cluster`. If `maxit` is not specified, then the number of iterations is unlimited.

`rms`

If `rms` is specified then the average root-mean-square distance of the members of each cluster to its center is returned in it.
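The roles of `data`, `centers`, `index`, `size`, and `rms` can be illustrated with a minimal Python sketch of nearest-center assignment with fixed centers. This is not the LUX implementation: the function name `assign_to_clusters` is hypothetical, the data are stored as a list of vectors rather than as a LUX array, Euclidean distance is assumed, and `rms` is taken here to mean the root-mean-square distance of each cluster's members to its center.

```python
import math

def assign_to_clusters(data, centers):
    """Assign each vector to its nearest center; centers stay fixed."""
    index = []
    for point in data:
        # Squared Euclidean distance from this point to every center.
        d2 = [sum((p - c) ** 2 for p, c in zip(point, center))
              for center in centers]
        index.append(d2.index(min(d2)))

    # Number of members per cluster.
    size = [index.count(k) for k in range(len(centers))]

    # Root-mean-square distance of the members of each cluster to its center.
    rms = []
    for k, center in enumerate(centers):
        members = [pt for pt, i in zip(data, index) if i == k]
        if members:
            mean_d2 = sum(sum((p - c) ** 2 for p, c in zip(pt, center))
                          for pt in members) / len(members)
            rms.append(math.sqrt(mean_d2))
        else:
            rms.append(0.0)  # assumption: an empty cluster reports 0
    return index, size, rms

data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(0.0, 1.0), (10.0, 11.0)]
index, size, rms = assign_to_clusters(data, centers)
# index == [0, 0, 1, 1], size == [2, 2], rms == [1.0, 1.0]
```

Each returned list plays the part of the like-named output argument: one cluster number per data point, one member count per cluster, and one spread value per cluster.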

The keywords are:

`/update`

signals that the cluster center positions must be updated during the clustering, so that each cluster center at any time during the clustering equals the average position of all members in the cluster at that time (including any phantom members). If `/update` is not specified, then the cluster centers do not move during clustering.

`/iterate`

specifies that updating must be iterated until the cluster centers are stable and upon exit all data points are members of the cluster whose center is the closest one of all cluster centers. `/iterate` implies `/update`.

`/vocal`

specifies that the number of reclustered data points (and the number of changed clusters, if `/iterate` is also specified) are printed.

`/quick`

specifies that only data points in clusters that were changed during the last iteration or so far during the current one should be treated during the current iteration. The other data points are left unchanged. This keyword implies `/iterate` and `/update`.

`/order`

specifies that there is some degree of order in `data`, so that there is more than a random chance that the current data point and the previous data point belong in the same cluster. The time penalty of this option is small, so it is selected by default. Specify `/noorder` to deselect it. This option only affects the first iteration.

`/record`

specifies that the cluster positions and sizes should be written to file `cluster.out` after each iteration. This option has effect only if `/iterate` was also selected.
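The effect of `/update` together with phantom members can be sketched in Python as a single pass in which each center is kept equal to the running average of its members. This is an illustration under stated assumptions, not the LUX code: the helper `one_pass_update` is hypothetical, phantom members are assumed to sit at the initial center position, all clusters are assumed to start without real members, and the removal of phantom members after clustering is not modelled.

```python
def one_pass_update(data, centers, phantom=0):
    """One pass over the data, moving each center as members arrive.

    Assumption: every cluster starts with `phantom` members located at
    its initial center, which damps the movement of that center.
    """
    centers = [list(c) for c in centers]
    counts = [phantom] * len(centers)
    index = []
    for point in data:
        # Find the center that is nearest at this moment.
        d2 = [sum((p - c) ** 2 for p, c in zip(point, center))
              for center in centers]
        k = d2.index(min(d2))
        index.append(k)
        # Running mean: the center stays the average of all its members.
        counts[k] += 1
        centers[k] = [c + (p - c) / counts[k]
                      for p, c in zip(point, centers[k])]
    return centers, index

data = [(2.0,), (4.0,)]
free, _ = one_pass_update(data, [(0.0,)], phantom=0)
damped, _ = one_pass_update(data, [(0.0,)], phantom=1)
# free == [[3.0]]    (mean of 2 and 4)
# damped == [[2.0]]  (mean of 0, 2 and 4: the phantom member holds the center back)
```

With more phantom members per cluster the center moves less for each data point that joins, which is the damping that the description of `phantom` refers to.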

`cluster,data,c,i` with undefined `i` yields cluster centers which are
not distributed evenly. `cluster,data,c,i,/iterate` yields a more even
distribution. For large data sets it is advised to first use
`cluster,data,c,i,/sample,/iterate` (or with `sample=sample_size`) to
get fairly evenly distributed cluster centers, and then
`cluster,data,c,i` to assign all data points to the clusters.
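The two-stage approach recommended for large data sets can be sketched in Python as follows. The names `nearest`, `iterate_centers`, and `two_stage` are hypothetical, and the iteration shown is an ordinary batch k-means loop substituted for illustration; it is not claimed to match how LUX updates centers internally.

```python
import random

def nearest(point, centers):
    """Return the number of the center closest to point."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center))
          for center in centers]
    return d2.index(min(d2))

def iterate_centers(points, centers, maxit=100):
    """Batch k-means loop: reassign, recompute means, stop when stable."""
    centers = [tuple(c) for c in centers]
    for _ in range(maxit):
        index = [nearest(pt, centers) for pt in points]
        new_centers = []
        for k, old in enumerate(centers):
            members = [pt for pt, i in zip(points, index) if i == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def two_stage(data, nclusters, sample_size, seed=0):
    rng = random.Random(seed)
    # Stage 1: find centers from a random sample of the data only.
    sample = rng.sample(data, sample_size)
    initial = rng.sample(sample, nclusters)
    centers = iterate_centers(sample, initial)
    # Stage 2: assign every data point to the now fixed centers.
    index = [nearest(pt, centers) for pt in data]
    return centers, index

# Two well-separated groups of five points each.
data = ([(float(x), 0.0) for x in range(5)]
        + [(100.0 + x, 0.0) for x in range(5)])
centers, index = two_stage(data, nclusters=2, sample_size=6)
```

Only the sample is visited repeatedly; the full data set is traversed once, in the final assignment step, which is where the time saving for large data sets comes from.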

When random samples of data points are treated, the clustering algorithm is known as the Continuous k-Means Algorithm.

See also: Topology
