After learning hierarchical clustering, today's class covered K-means clustering.
K-means clustering, as the name suggests, is a clustering method in which the
available data is classified into K groups (clusters) based on its
attributes/features. The initial centres of the required number of clusters are
chosen in such a way that they are mutually farthest apart. Next, the procedure
examines each case in the population and assigns it to the cluster whose centre
is at the minimum distance. The centroid's position is recalculated every time a
case is added to a cluster, and this continues until all the cases are grouped
into the final required number of clusters. The objective is to classify the
data into meaningful groups, find relationships between the variables, and
thereby formulate a strategy to enhance the business.
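The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the routine SPSS runs: the farthest-first starting rule, the function names and the sample points are all assumptions, and the centres here are updated after each full pass over the data rather than after every single case.

```python
import math

def farthest_first_centers(points, k):
    """Pick k starting centres that are far apart (farthest-first heuristic)."""
    centers = [points[0]]  # assumption: start from the first case
    while len(centers) < k:
        # take the point whose nearest existing centre is farthest away
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def k_means(points, k, max_iter=100):
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # assignment step: each case joins the cluster with the nearest centre
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # update step: each centre moves to the mean of its members
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the centre of an empty cluster
        if new_centers == centers:  # nothing moved, so the grouping is stable
            break
        centers = new_centers
    return labels, centers

# hypothetical two-variable data with two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = k_means(points, k=2)
```

With these sample points the first three cases end up in one cluster and the last three in the other.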
While clustering, the basic criteria are to ensure that:
· The clusters are clearly different from each other.
· There are enough cases in each cluster.
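The second criterion is easy to check once every case has a cluster label. A small sketch, assuming the labels are available as a plain list; the threshold `min_cases` and the sample labels are hypothetical, since the notes do not say how many cases count as enough.

```python
from collections import Counter

def small_clusters(labels, min_cases):
    """Return the clusters that have fewer than min_cases members."""
    sizes = Counter(labels)
    return {cluster: n for cluster, n in sizes.items() if n < min_cases}

# hypothetical cluster membership for eight cases
labels = [1, 1, 1, 1, 2, 2, 2, 3]
too_small = small_clusters(labels, min_cases=3)
```

Here only cluster 3, with a single case, is flagged as too small.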
To start the process, we first need to decide what strategy we want to make and
accordingly decide which variables we need to consider. The following route map
is then followed for clustering:
· Select the variables to be used in the cluster analysis.
· Specify the number of clusters. The number of clusters must be at least two
and must not be greater than the number of cases in the data file.
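The rule on the number of clusters can be written as a simple check. A minimal sketch; the function name and the example case count are made up for illustration.

```python
def validate_cluster_count(k, n_cases):
    """Allow k only if it is at least 2 and no larger than the number of cases."""
    if not 2 <= k <= n_cases:
        raise ValueError(f"k must be between 2 and {n_cases}, got {k}")
    return k

k = validate_cluster_count(2, n_cases=143)  # example values only
```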
Here we make sure that each cluster has a decent number of cases to build up a
meaningful story. Once we get the clusters, we draw a boxplot using the route:
Graphs > Legacy Dialogs > Boxplot > Simple > Define
Here the points plotted far off from the central distribution are called
outliers, and we need to remove them by applying some condition.
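One common condition for removing outliers is the boxplot rule: drop any value lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below assumes that rule and uses made-up bill amounts; the quartiles come from Python's `statistics.quantiles`, which may differ slightly from the hinges SPSS draws.

```python
import statistics

def split_outliers(values, whisker=1.5):
    """Separate values inside the boxplot fences from those outside."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

# hypothetical monthly bills; 900 sits far from the rest
bills = [120, 135, 150, 160, 170, 180, 195, 210, 900]
kept, outliers = split_outliers(bills)
```

For this sample only the value 900 falls outside the fences.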
After clustering, we use the Frequencies function to create a profile of the
cases. The route for creating profiles using frequencies is:
Analyze > Descriptive Statistics > Frequencies
For this we always choose the categorical variables.
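The same kind of profile can be built outside SPSS by counting how often each category occurs within each cluster. A rough sketch, assuming each case is a dictionary with a `cluster` key; the records and field names are hypothetical.

```python
from collections import Counter, defaultdict

def frequency_by_cluster(records, variable):
    """Count the categories of one variable separately for each cluster."""
    tables = defaultdict(Counter)
    for rec in records:
        tables[rec["cluster"]][rec[variable]] += 1
    return tables

# hypothetical cases with a cluster label and one categorical variable
records = [
    {"cluster": 1, "gender": "male"},
    {"cluster": 1, "gender": "female"},
    {"cluster": 1, "gender": "male"},
    {"cluster": 2, "gender": "male"},
]
freq = frequency_by_cluster(records, "gender")
```

Each entry of `freq` is then a frequency table for one cluster, for example `freq[1]["male"]`.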
In our exercise we got two groups like the following:
Cluster 1 | Cluster 2
Male-95, female-21 | Male-24, female-3
Relatively more females | Relatively fewer females
Relatively more educated | Moderately educated
Spends more on value-added services | Spends more on voice calls and SMS bills
Mostly recharges once a month | Not more than three recharges in a month
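The "relatively more females" reading can be verified from the counts reported for the two clusters. A short calculation, using only those four numbers; the variable names are illustrative.

```python
# gender counts as reported for the two clusters
counts = {
    "Cluster 1": {"male": 95, "female": 21},
    "Cluster 2": {"male": 24, "female": 3},
}

# share of females within each cluster
female_share = {
    name: c["female"] / (c["male"] + c["female"]) for name, c in counts.items()
}

for name, share in female_share.items():
    print(f"{name}: {share:.1%} female")
```

This gives roughly 18.1% females in Cluster 1 (21 of 116) against 11.1% in Cluster 2 (3 of 27).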
Now we use the Final Cluster Centers table, which gives the mean of each
variable in each of the clusters. This enables us to give descriptive names to
each cluster based on its dominant characteristics.
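The values in such a table are simply the per-cluster means of each variable. A minimal sketch of that calculation; the spending figures, column meanings and labels below are invented for illustration.

```python
from collections import defaultdict

def final_cluster_centers(rows, labels):
    """Mean of every variable within each cluster."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    return {
        lab: [sum(col) / len(members) for col in zip(*members)]
        for lab, members in groups.items()
    }

# hypothetical columns: (value-added services spend, voice/SMS spend)
rows = [(300, 100), (340, 120), (80, 400), (100, 420)]
labels = [1, 1, 2, 2]
cluster_means = final_cluster_centers(rows, labels)
```

In this made-up data cluster 1 averages higher on the first column and cluster 2 on the second, which is the kind of contrast used to name the clusters.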
Finally, depending on the clustering, we try to develop some meaningful
relationships and stories which eventually help us to formulate a strategy.
By Manas Kalita 14144