Today we learnt about and used clustering methods with various distance measures. We covered two clustering techniques: Hierarchical Clustering and K-means Clustering.
Hierarchical clustering requires a distance or similarity matrix between all pairs of cases. That's a humongous matrix if we have tens of thousands of cases trapped in our data file; even today's computers will take pause, as will we, waiting for results (the sketch below shows how quickly that pairwise matrix grows). A clustering method that doesn't require computing all possible distances is k-means clustering.
distances is k-means
clustering. It differs from hierarchical clustering in several ways. We have to
know in advance the number of clusters we want. We can’t get solutions for a
range of cluster numbers unless we rerun the analysis for each different number
of clusters. The algorithm repeatedly reassigns cases to clusters, so the same
case can move from cluster to cluster during the analysis.
The algorithm is called k-means, where k is the number of clusters we want, because each case is assigned to the cluster whose mean it is closest to. The action in the algorithm centers around finding those k means.
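As a rough illustration of that reassignment loop, here is a minimal NumPy sketch of the classic Lloyd's-algorithm form of k-means; it is not the exact routine we ran today, and the toy data and random initialization are assumptions:

```python
import numpy as np

def kmeans(cases, k, iters=100, seed=0):
    """Plain k-means: assign each case to its nearest mean, then
    recompute the k means, until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen cases as the initial means.
    means = cases[rng.choice(len(cases), size=k, replace=False)]
    labels = np.full(len(cases), -1)
    for _ in range(iters):
        # Distance from every case to every cluster mean (n x k).
        dists = np.linalg.norm(cases[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)      # reassign to nearest mean
        if np.array_equal(new_labels, labels):
            break                              # assignments stable: done
        labels = new_labels
        # Recompute each mean from the cases now assigned to it,
        # keeping the old mean if a cluster happens to empty out.
        means = np.array([cases[labels == j].mean(axis=0)
                          if np.any(labels == j) else means[j]
                          for j in range(k)])
    return labels, means

# Toy data: three well-separated blobs; note we must choose k ourselves.
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0, 3, 6)])
labels, means = kmeans(blobs, k=3)
print(means.round(2))   # should land near (0,0), (3,3), (6,6)
```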
The next segment is about the Jaccard Distance, which we used today. The Jaccard Distance measures dissimilarity between sample sets and is complementary to the Jaccard coefficient: the coefficient J(A, B) = |A ∩ B| / |A ∪ B| is the size of the intersection divided by the size of the union, and the Jaccard Distance is obtained by subtracting it from 1.
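A minimal sketch of that computation on two made-up sets (the example baskets are assumptions for illustration):

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 minus the Jaccard coefficient |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0                  # convention: two empty sets match exactly
    return 1 - len(a & b) / len(a | b)

# Two toy shopping baskets: 2 items in common out of 4 distinct items.
print(jaccard_distance({"milk", "bread", "eggs"},
                       {"milk", "bread", "jam"}))   # 1 - 2/4 = 0.5
```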
Team B
Author: Vivek Agarwal