Canopy clustering algorithm

The canopy clustering algorithm is an unsupervised clustering algorithm related to the K-means algorithm.

It is intended to speed up clustering operations on large data sets, where using another algorithm directly may be impractical because of the size of the data set.

The algorithm proceeds as follows:
* Cheaply partition the data into overlapping subsets, called 'canopies'
* Perform more expensive clustering, but only within these canopies
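
The two steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes the two-threshold scheme usually described for canopy formation, with a loose threshold t1 and a tight threshold t2 (neither is named in the text above), and a hypothetical cheap distance that looks only at the first coordinate. The names build_canopies, candidate_pairs and cheap_distance are made up for this example.

```python
from itertools import combinations


def cheap_distance(a, b):
    # Hypothetical cheap metric: compare only the first coordinate.
    return abs(a[0] - b[0])


def build_canopies(points, t1, t2, distance=cheap_distance):
    """Return overlapping canopies as lists of point indices.

    Assumes 0 <= t2 <= t1: t1 is the loose threshold, t2 the tight one.
    """
    if not 0 <= t2 <= t1:
        raise ValueError("expected 0 <= t2 <= t1")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]  # any remaining point may serve as a center
        # Remaining points within the loose threshold join this canopy.
        canopies.append(
            [i for i in remaining if distance(points[center], points[i]) <= t1]
        )
        # Points within the tight threshold are dropped from the candidate list;
        # points between t2 and t1 stay and may join later canopies too.
        remaining = [
            i for i in remaining if distance(points[center], points[i]) > t2
        ]
    return canopies


def candidate_pairs(canopies):
    """Pairs of indices that share a canopy; only these need the expensive metric."""
    pairs = set()
    for members in canopies:
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Made-up one-dimensional data for illustration.
points = [(0.0,), (0.5,), (1.0,), (5.0,), (5.4,)]
canopies = build_canopies(points, t1=1.5, t2=0.6)
pairs = candidate_pairs(canopies)
```

With this data, point 2 ends up in two canopies, and the expensive clustering step would only need to compare the pairs returned by candidate_pairs rather than every pair of points.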

Benefits

* The number of instances of training data that must be compared at each step is reduced
* There is some evidence that the resulting clusters are improved
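
As a rough illustration of the first benefit, the following Python sketch counts pairwise comparisons with and without canopies. The data set size and canopy sizes are hypothetical and chosen only to make the arithmetic easy to follow; the function names are invented for this example.

```python
def pair_count(n):
    # Number of unordered pairs among n items.
    return n * (n - 1) // 2


def canopy_pair_count(canopy_sizes):
    # Upper bound on comparisons when only points sharing a canopy are compared.
    # A pair that lies in several overlapping canopies is counted once per canopy.
    return sum(pair_count(size) for size in canopy_sizes)


# Hypothetical setting: 1,000,000 points covered by 1,000 canopies of 1,000 points.
all_pairs = pair_count(1_000_000)
within_canopies = canopy_pair_count([1_000] * 1_000)
print(all_pairs, within_canopies, all_pairs // within_canopies)
```

Under these assumed sizes the expensive comparison runs on about a thousand times fewer pairs. The actual saving depends on how many canopies are formed and how much they overlap.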

References

[http://www.kamalnigam.com/papers/canopy-kdd00.pdf McCallum, Nigam and Ungar: "Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching"]

External links

* [http://www.youtube.com/watch?v=1ZDybXl212Q Cluster Computing and MapReduce Lecture 4] from Google

See also

* Data clustering
* K-means algorithm
* Linde-Buzo-Gray algorithm


Wikimedia Foundation. 2010.
