Wednesday, 5 September 2012

Hierarchical Clustering - Group D


Today we learned about Cluster Analysis and started by classifying the data using hierarchical clustering.
There are two types of cluster analysis:
  • Hierarchical 
  • K-means 
Hierarchical Clustering is a whole family of methods that differ in the way distances are computed. It is based on the core idea that objects are more related to nearby objects than to objects farther away. Apart from the usual choice of distance function, the user also needs to decide on the linkage criterion to use (since a cluster consists of multiple objects, there are multiple candidate pairs from which to compute the distance between two clusters).
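To make the linkage idea concrete, here is a minimal sketch (plain Python, not from the lecture) of two common linkage criteria: single linkage takes the distance between the two closest members of the clusters, while complete linkage takes the two farthest members.

```python
import math

def single_linkage(c1, c2):
    # distance between the two CLOSEST members of the clusters
    return min(math.dist(a, b) for a in c1 for b in c2)

def complete_linkage(c1, c2):
    # distance between the two FARTHEST members of the clusters
    return max(math.dist(a, b) for a in c1 for b in c2)

c1 = [(0, 0), (0, 1)]
c2 = [(3, 0), (4, 0)]
print(single_linkage(c1, c2))    # 3.0
print(complete_linkage(c1, c2))  # sqrt(17) ≈ 4.123
```

The choice of linkage changes which clusters get merged first, which is why the same data can produce quite different hierarchies.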

Hierarchical Clustering can be further classified into:

·        agglomerative hierarchical clustering - starting with single elements and aggregating them into clusters
·        divisive hierarchical clustering - starting with the complete data set and dividing it into partitions

Agglomerative hierarchical clustering is a bottom-up approach to the clustering task. We start with every element of the set of interest in its own cluster, and then perform a sequence of steps that gradually merge the closest pair of clusters.
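The merge procedure above can be sketched in a few lines of plain Python (an illustration, not the tool we used in class), here using single linkage and stopping once k clusters remain:

```python
import math

def agglomerative(points, k):
    # start with every point in its own cluster
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # find the pair of clusters with the smallest single-linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(math.dist(a, b)
                               for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        # merge the closest pair into one cluster
        clusters[i].extend(clusters.pop(j))
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(agglomerative(pts, 2))  # two clusters: the two left points, the two right points
```

Recording the order and distance of each merge is exactly the information a dendrogram draws.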

We used a dendrogram to understand the details of cluster formation. A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. In our case, the dendrogram was a visual representation of the spot correlation data.



Another important component of a clustering algorithm is the distance measure between data points. For interval data, Euclidean distance is the simplest measure we can use for clustering: it is nothing but the length of the straight line drawn between two points.
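In coordinates, that straight-line length is the square root of the sum of squared differences. A small sketch:

```python
import math

def euclidean(p, q):
    # straight-line distance: sqrt of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0, the classic 3-4-5 right triangle
```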

K-Means Clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
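That definition translates directly into the standard iteration (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal sketch in plain Python, assuming the starting centroids are given:

```python
import math

def kmeans(points, centroids, iterations=10):
    # Lloyd's algorithm: alternate assignment and centroid-update steps
    for _ in range(iterations):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: each centroid moves to the mean of its cluster
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

cents, _ = kmeans([(0, 0), (0, 1), (5, 5), (5, 6)], [(0, 0), (5, 5)])
print(cents)  # [(0.0, 0.5), (5.0, 5.5)]
```

Unlike hierarchical clustering, k must be chosen in advance, and the result depends on the initial centroids.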


Group D
Author : Ayush Agarwal

