Today we learned about cluster analysis and started by classifying the data using hierarchical clustering.
There are two main types of cluster analysis:
- Hierarchical clustering
- K-means clustering
Hierarchical clustering is a whole family of methods that differ in the way distances are computed. It is based on the core idea that objects are more related to nearby objects than to objects farther away. Apart from the usual choice of distance function, the user also needs to decide on the linkage criterion to use: since a cluster consists of multiple objects, there are multiple candidate points for computing the distance between two clusters.
Hierarchical clustering can be further classified into:
- agglomerative hierarchical clustering - starting with single elements and aggregating them into clusters
- divisive hierarchical clustering - starting with the complete data set and dividing it into partitions
In agglomerative hierarchical clustering we start with every element of the set of interest in its own cluster. After that we perform a sequence of steps that gradually merge the two closest clusters, until the desired number of clusters remains.
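The merge loop described above can be sketched in plain Python. This is a minimal illustration with single linkage (cluster distance = smallest pairwise point distance); the function names are my own, not from any library:

```python
# Minimal sketch of agglomerative clustering with single linkage.
# Function names here are illustrative, not from a library.

def euclidean(a, b):
    # straight-line distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(c1, c2, points):
    # cluster-to-cluster distance = smallest pairwise point distance
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points, k):
    # start with every element in its own cluster (clusters hold indices)
    clusters = [[i] for i in range(len(points))]
    # repeatedly merge the two closest clusters until only k remain
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(agglomerate(pts, 2))  # → [[0, 1], [2, 3]]
```

In practice one would use a library routine (e.g. SciPy's hierarchical clustering), but the loop shows the core idea: each step merges the closest pair of clusters.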
We used a dendrogram to understand the details of cluster formation. A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering: each merge is drawn as a branch, placed at the height of the distance at which the two clusters were joined.
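As a sketch, assuming SciPy is available, one can build the linkage matrix behind a dendrogram and inspect its structure without plotting:

```python
# Sketch assuming SciPy is installed: compute the merge history
# (linkage matrix) that a dendrogram visualizes.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

data = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# Each row of Z records one merge: [cluster_a, cluster_b, distance, size]
Z = linkage(data, method="single", metric="euclidean")

# no_plot=True returns the tree layout instead of drawing it;
# dendrogram(Z) alone would draw the diagram with matplotlib.
info = dendrogram(Z, no_plot=True)
print(info["ivl"])  # order of the leaves along the x-axis
```

The linkage matrix has one row per merge (n - 1 rows for n points), which is exactly the sequence of steps the agglomerative procedure performs.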
Another important component of a clustering algorithm is the distance measure between data points. For interval data, Euclidean distance is the simplest measure we can use for clustering: it is nothing but the length of the straight line drawn between two points.
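For example, the Euclidean distance in plain Python:

```python
import math

def euclidean_distance(p, q):
    # length of the straight line between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # → 5.0
```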
K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
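A minimal sketch of the standard iteration for k-means (Lloyd's algorithm), in plain Python with illustrative names; real work would typically use a library such as scikit-learn:

```python
# Minimal sketch of Lloyd's algorithm for k-means (plain Python).
# Starting centroids are passed in; libraries usually pick them randomly.

def kmeans(points, centroids, iterations=10):
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            groups[nearest].append(p)
        # update step: move each centroid to the mean of its group
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
cents, groups = kmeans(pts, [(0, 0), (5, 5)])
print(cents)  # → [(0.0, 0.5), (5.0, 5.5)]
```

The two steps alternate: assign every observation to its nearest mean, then recompute each mean from its assigned observations, until the assignments stop changing.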
Group D
Author : Ayush Agarwal