In the 3rd and 4th lectures of Business Analytics we studied cluster analysis, which forms small groups of similar objects so that the data can be analysed. Cluster analysis can be done in many ways, but the following two types/models are used most often:
- Hierarchical (used for fewer than 500 objects)
- K-Means (used for more than 500 objects)
There are 3 steps in the process of clustering. They are:
- Selection of variables
- Distance measurement
- Clustering criteria
After selection of the appropriate variables, the distance between cases is measured so that similar cases can be combined into small groups called clusters. To determine the distance between clusters we use techniques/methods such as:
- Nearest Neighbour Clustering
- Furthest Neighbour Clustering
- Centroid Clustering
- Average Linkage Between Group Clustering
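The methods listed above differ only in how they measure the distance between two clusters; the merging procedure around them can stay the same. The following is a minimal sketch of that idea, assuming a bottom-up (agglomerative) hierarchical procedure and plain Euclidean distance. The function names (`agglomerate`, `nearest_neighbour`, `furthest_neighbour`) and the sample cases are hypothetical, chosen only for illustration.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbour(a, b):
    """Smallest distance between a case in cluster a and a case in cluster b."""
    return min(dist(p, q) for p, q in product(a, b))


def furthest_neighbour(a, b):
    """Largest distance between a case in cluster a and a case in cluster b."""
    return max(dist(p, q) for p, q in product(a, b))


def agglomerate(cases, n_clusters, cluster_distance):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[c] for c in cases]  # every case starts as its own cluster
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest cluster-to-cluster distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so position i is unaffected
    return clusters


# Made-up 2-D cases for illustration only.
cases = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(cases, 3, nearest_neighbour)
print(result)
```

Swapping `nearest_neighbour` for `furthest_neighbour` changes the clustering criterion without touching the merging loop.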
Nearest Neighbour Clustering:
It is also called single linkage clustering. In nearest neighbour clustering, the distance between two clusters is defined as the smallest distance between two cases in different clusters. A cluster may contain more than one case, so the shortest distance between any two cases drawn from different clusters determines the distance between those clusters.
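As a small illustration of the definition above, the sketch below computes the nearest neighbour distance between two clusters. It assumes Euclidean distance between cases; the function name and the sample points are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage_distance(cluster_a, cluster_b):
    """Smallest distance over all pairs with one case from each cluster."""
    return min(dist(p, q) for p, q in product(cluster_a, cluster_b))


# Made-up cases lying on a line, for illustration only.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

# The closest pair is (1, 0) and (4, 0), so the result is 3.0.
print(single_linkage_distance(cluster_a, cluster_b))
```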
Furthest Neighbour Clustering:
It is also called complete linkage clustering. In contrast to nearest neighbour, here the distance between two clusters is defined as the largest distance between two cases in different clusters. That is, the longest distance between any two cases drawn from different clusters is taken as the distance between the clusters.
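The same kind of sketch works for the furthest neighbour definition: only `min` becomes `max`. Euclidean distance is again an assumption, and the function name and sample points are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def complete_linkage_distance(cluster_a, cluster_b):
    """Largest distance over all pairs with one case from each cluster."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))


# Made-up cases lying on a line, for illustration only.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

# The most distant pair is (0, 0) and (6, 0), so the result is 6.0.
print(complete_linkage_distance(cluster_a, cluster_b))
```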
Centroid Clustering:
Centroid clustering is usually applied with squared Euclidean distance. The centroid of a cluster is the point whose parameter values are the means of the parameter values of all the points in the cluster. This method takes the distance between two clusters to be the distance between their centroids. When two clusters are merged, the centroid of the merged cluster is a weighted combination of the centroids of the two individual clusters, where the weights are proportional to the sizes of the clusters.
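A minimal sketch of the centroid calculations, assuming squared Euclidean distance between centroids is used as the distance between clusters. All function names and sample points are hypothetical.

```python
def centroid(cluster):
    """Point whose coordinates are the means of the cluster's coordinates."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def squared_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid_distance(cluster_a, cluster_b):
    """Distance between clusters, taken as the distance between centroids."""
    return squared_euclidean(centroid(cluster_a), centroid(cluster_b))


def merged_centroid(cluster_a, cluster_b):
    """Size-weighted combination of the two individual centroids."""
    na, nb = len(cluster_a), len(cluster_b)
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    return tuple((na * a + nb * b) / (na + nb) for a, b in zip(ca, cb))


# Made-up cases for illustration only.
cluster_a = [(0.0, 0.0), (2.0, 0.0)]               # centroid (1, 0)
cluster_b = [(4.0, 0.0), (6.0, 0.0), (8.0, 0.0)]   # centroid (6, 0)

print(centroid_distance(cluster_a, cluster_b))  # (6 - 1)^2 = 25.0
print(merged_centroid(cluster_a, cluster_b))    # (2*1 + 3*6) / 5 = 4.0 on the first axis
```

The weighted combination gives the same point as computing the centroid of all five cases directly, which is why the weights must be proportional to the cluster sizes.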
Average Linkage Between Group Clustering:
The average-linkage-between-groups method defines the distance between two clusters as the average of the distances between all pairs of cases in which one case comes from each cluster. For example, if one cluster contains case 1 and the other cluster contains cases 5, 6 and 7, the distance between the clusters is taken as the average of the distances from case 1 to cases 5, 6 and 7.
It uses information about all pairs of distances, not just the nearest or the furthest. Hence, it is usually preferred over the single and complete linkage methods for cluster analysis.
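A short sketch of the average linkage calculation, assuming Euclidean distance between cases. The function name is hypothetical, and the coordinates given to cases 1, 5, 6 and 7 are made up for illustration.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage_distance(cluster_a, cluster_b):
    """Average distance over all pairs with one case from each cluster."""
    distances = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return sum(distances) / len(distances)


# One cluster holds case 1; the other holds cases 5, 6 and 7.
cluster_a = [(0.0, 0.0)]                           # case 1
cluster_b = [(3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]   # cases 5, 6, 7

# Distances from case 1 are 3, 4 and 5, so the average is 4.0.
print(average_linkage_distance(cluster_a, cluster_b))
```

With these points the nearest neighbour distance would be 3 and the furthest neighbour distance 5; the average uses all three pairwise distances rather than a single extreme one.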
Submitted by-
Rohan Moon
Roll no- 14159
Operations Batch
Group D