Monday, 10 September 2012

BA group B class 7&8


After learning hierarchical clustering, today's class covered K-means clustering. As the name suggests, K-means clustering is a method in which the available data is grouped into K clusters based on attributes/features. First, K initial cluster centres are chosen in such a way that they are mutually farthest apart. Next, the procedure examines each case in the population and assigns it to the cluster whose centroid is at the minimum distance. The centroid's position is recalculated every time a case is added to the cluster, and this continues until all the cases are grouped into the final required number of clusters. The objective is to classify the data into meaningful groups, find relationships between the variables, and thereby formulate a strategy to enhance the business.
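The procedure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what SPSS runs internally: the function names are made up for this post, the first case is assumed as the first starting centre, and distance is assumed to be Euclidean.

```python
import math

def distance(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def farthest_apart_centers(points, k):
    """Pick k starting centres greedily so that they lie far from one another."""
    centers = [points[0]]  # assumption: begin with the first case
    while len(centers) < k:
        # take the point whose nearest already-chosen centre is farthest away
        nxt = max(points, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(nxt)
    return centers

def k_means_sequential(points, k):
    """Assign each point to its nearest centre and update that centroid at once."""
    centers = [list(c) for c in farthest_apart_centers(points, k)]
    counts = [0] * k
    labels = []
    for p in points:
        j = min(range(k), key=lambda i: distance(p, centers[i]))
        counts[j] += 1
        # running mean of the members; the starting centre only guides
        # the first assignment and is replaced by the first member
        centers[j] = [c + (x - c) / counts[j] for c, x in zip(centers[j], p)]
        labels.append(j)
    return labels, centers
```

Calling `k_means_sequential(points, 2)` on a list of numeric tuples returns one cluster label per case plus the final centroids, which here are simply the means of the cases assigned to each cluster.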
While clustering, the basic criteria are to ensure that:
·         The clusters are clearly different from each other on the variables used.
·         There are enough cases in each cluster.
To start the process, we first need to decide what strategy we want to build and, accordingly, which variables we need to consider. After selecting the variables, the following route map has to be followed for clustering:
·         Select the variables to be used in the cluster analysis.
·         Specify the number of clusters. The number of clusters must be at least two and must not be greater than the number of cases in the data file.
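The rule on the number of clusters, together with the earlier criterion of having enough cases per cluster, can be written as two small checks. This is a rough sketch; the function names and the `min_cases` threshold are assumptions for illustration, not SPSS settings.

```python
from collections import Counter

def valid_cluster_count(k, n_cases):
    """True when k is at least two and not greater than the number of cases."""
    return 2 <= k <= n_cases

def small_clusters(labels, min_cases):
    """Return {cluster: size} for clusters with fewer than min_cases cases."""
    sizes = Counter(labels)
    return {cluster: n for cluster, n in sizes.items() if n < min_cases}
```

If `small_clusters` returns anything, it may be worth rerunning the clustering with a smaller K.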
Here we make sure that each cluster has a decent number of cases to build a meaningful story. Once we have the clusters, we plot them as a boxplot using the route: Graphs > Legacy Dialogs > Boxplot > Simple > Define

           
Here, the points plotted far from the central distribution are called outliers, and we need to remove them by applying some condition.
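One common condition for flagging such points is the boxplot rule: anything more than 1.5 times the interquartile range beyond the quartiles. The sketch below assumes that rule and uses Python's `statistics.quantiles` with the "inclusive" method, so the cut-offs may differ slightly from the hinges SPSS draws; `boxplot_outliers` is a name invented for this example.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return the values lying beyond whisker * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]
```

The cases returned here would be the ones to drop before rerunning the clustering.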
After clustering, we use the Frequencies function to profile the cases. The route for creating profiles using frequencies is:
Analyze > Descriptive Statistics > Frequencies
For this we always choose the categorical variables.
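Outside SPSS, the same kind of profile can be built by counting category values within each cluster. This is a minimal sketch; `frequency_profile` and its two parallel input lists are assumptions made for this example.

```python
from collections import Counter, defaultdict

def frequency_profile(clusters, categories):
    """Count how often each category value occurs within each cluster."""
    profile = defaultdict(Counter)
    for cluster, value in zip(clusters, categories):
        profile[cluster][value] += 1
    return {cluster: dict(counts) for cluster, counts in profile.items()}
```

Passing the cluster membership of each case together with, say, its gender gives the male/female counts per cluster.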
In our exercise we got two groups like the following:
Cluster 1                              | Cluster 2
Male: 95, female: 21                   | Male: 24, female: 3
Relatively more females                | Relatively fewer females
Relatively more educated               | Moderately educated
Spends more on value-added services    | Spends more on voice calls and SMS bills
Mostly recharges once a month          | Not more than three recharges in a month

Now we use the Final Cluster Centers table, which gives the mean value of each variable in each of the clusters. This enables us to give a descriptive name to each cluster based on its dominant characteristics.
Finally, depending on the clustering, we try to develop some meaningful relationships and stories, which eventually help us to formulate a strategy.
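As a rough illustration of what that table contains, the per-cluster means can be computed directly from the cases and their cluster labels. The function name and the dictionary-per-case layout are assumptions for this sketch, not SPSS output.

```python
from statistics import mean

def final_cluster_centers(rows, labels):
    """Return {cluster: {variable: mean}} for rows given as dicts of numbers."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: {var: mean(r[var] for r in members) for var in members[0]}
        for label, members in grouped.items()
    }
```

Comparing the means of a variable across clusters shows which characteristics dominate in each one.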

By Manas Kalita 14144
