Group B
Written by Ashim Abhinav Ekka (14133)
Cluster Analysis: Also called clustering, this is the task of assigning a set of cases to clusters so that cases in the same cluster are more similar to each other than to cases in other clusters.
[Figure: an example of four different clusters, each differing from the others in some aspect.]
The objective of clustering is to determine the intrinsic grouping in a set of unlabeled data with a large number of variables and observations.
Hierarchical clustering: A method of cluster analysis that seeks to build a hierarchy of clusters. It is most appropriate for small samples; when the sample is large, the algorithm may be very slow to reach a solution, so users should generally consider K-Means Cluster when the sample size is larger than 200. It is of two types:
· Agglomerative (bottom-up approach): every observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
· Divisive (top-down approach): all observations start in a single cluster, and splits are performed recursively as one moves down the hierarchy.
The results of hierarchical clustering can be presented in a
dendrogram.
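As an illustration, below is a minimal Python sketch of agglomerative hierarchical clustering using SciPy; the toy data, the choice of average linkage and the two-cluster cut are assumptions made purely for demonstration, not part of the original example.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy data: two loose groups of points (invented for illustration).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (10, 2)),
                  rng.normal(3, 0.5, (10, 2))])

# Agglomerative (bottom-up) clustering: each observation starts in its own
# cluster and the closest pairs of clusters are merged step by step.
Z = linkage(data, method="average", metric="euclidean")

# Cut the hierarchy into a chosen number of clusters (here, 2).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# The merge history can be presented as a dendrogram.
dendrogram(Z)
plt.show()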
Distance measures for hierarchical clusters (a small sketch of each follows the list):
· Interval data – Euclidean distance
· Count data – Chi-square measure
· Binary data – Jaccard coefficient
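For concreteness, the short Python sketch below computes each of these measures on invented example vectors. The chi-square formula shown is one common variant for count data; the exact formula used by a particular statistics package may differ.

import numpy as np
from scipy.spatial.distance import euclidean, jaccard

# Interval (continuous) data -> Euclidean distance.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(euclidean(a, b))  # sqrt(sum((a - b)^2))

# Count data -> a chi-square-style distance:
# sqrt(sum((a_i - b_i)^2 / (a_i + b_i))).
counts_a = np.array([10, 5, 1])
counts_b = np.array([6, 8, 2])
print(np.sqrt(np.sum((counts_a - counts_b) ** 2 / (counts_a + counts_b))))

# Binary data -> Jaccard distance (1 - Jaccard similarity coefficient).
x = np.array([1, 0, 1, 1], dtype=bool)
y = np.array([1, 1, 0, 1], dtype=bool)
print(jaccard(x, y))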
Different cluster methods (compared in the sketch after this list):
· Nearest neighbour: In this method, the distance between two clusters is taken to be the distance between their closest neighbouring objects. This method is recommended if the plotted clusters are elongated.
· Furthest neighbour: In this method, the distance between two clusters is the maximum distance between two objects in different clusters. This method is recommended if the plotted clusters form distinct clumps (not elongated chains).
· Group average: In this method, the distance between two clusters is calculated as the average distance between all pairs of objects in the two clusters. This method is usually recommended, as it makes use of more information.
· Centroid: In this method, the distance between two clusters is the distance between their centroids, and the pair of clusters whose centroids are closest is merged. The centroid of a cluster is the average point in the multidimensional variable space.
· Median: This method is identical to the Centroid method but is unweighted; it should not be used when cluster sizes vary markedly.
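The sketch below compares these cluster methods using SciPy's names for them: 'single' (nearest neighbour), 'complete' (furthest neighbour), 'average' (group average), plus 'centroid' and 'median'. The two-group toy data are invented for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.6, (15, 2)),
                  rng.normal(4, 0.6, (15, 2))])

for method in ["single", "complete", "average", "centroid", "median"]:
    # 'centroid' and 'median' are only well defined with Euclidean distances,
    # which is the default metric when raw observations are passed.
    Z = linkage(data, method=method)
    labels = fcluster(Z, t=2, criterion="maxclust")
    # Number of observations assigned to each of the two clusters.
    print(method, np.bincount(labels)[1:])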
Example of hierarchical clustering in marketing: finding groups of customers with similar behaviour, given a large database of customer records containing their attributes and past buying history (a brief sketch follows).
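As a rough illustration of this use case, the sketch below clusters a handful of hypothetical customers on two made-up behavioural features; the names, features and number of segments are invented and not from any real dataset.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

customers = ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay"]
# Columns: standardised annual spend, standardised purchase count.
features = np.array([[ 1.2,  0.9],
                     [ 1.0,  1.1],
                     [-0.8, -0.7],
                     [-1.1, -0.9],
                     [ 0.1,  0.0],
                     [-0.2,  0.1]])

Z = linkage(features, method="average")
segments = fcluster(Z, t=3, criterion="maxclust")
for name, seg in zip(customers, segments):
    print(name, "-> segment", seg)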