Monday, 10 September 2012

K-MEANS CLUSTERING [Class 7 & 8 - Group G]



Clustering: The process of partitioning a given set of patterns into disjoint clusters, such that patterns in the same cluster are alike and patterns belonging to different clusters are dissimilar.

K-means clustering: An algorithm that groups objects into K clusters based on their attributes/features, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between each data point and the centroid of the cluster it is assigned to. Thus, the purpose of K-means clustering is to sort the data into distinct groups and thereby uncover relationships between variables that can be used to formulate strategy.
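To make the idea of minimizing the sum of squared distances concrete, here is a minimal sketch in plain Python. It is not what SPSS runs internally; the function names, the random choice of starting centroids and the toy data in the usage note are all assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iter=100, seed=0):
    """Alternate assignment and centroid update until the assignments stop changing."""
    rng = random.Random(seed)
    # Assumed initialisation: pick k of the cases at random as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # The quantity being minimized: total squared distance to the assigned centroid.
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss
```

For example, `k_means([(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)], 2)` separates the three small points from the three large ones.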

Two criteria for clustering:

  • Each cluster should have a sufficient number of cases.
  • The clusters should be clearly different from one another.
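The two criteria can be checked roughly in code. The sketch below is only an illustration: the function name `check_clusters` and both thresholds (`min_cases`, `min_separation`) are hypothetical, and what counts as "enough" cases or "different" clusters has to be decided for each data set.

```python
import math
from collections import Counter
from itertools import combinations

def check_clusters(labels, centroids, min_cases=5, min_separation=1.0):
    """Return (enough_cases, well_separated) for a finished clustering.

    labels    : cluster index (0..k-1) of every case
    centroids : one centre per cluster, as tuples of numbers
    Both thresholds are illustrative assumptions, not fixed rules.
    """
    sizes = Counter(labels)
    # Criterion 1: every cluster holds at least min_cases cases.
    enough_cases = all(sizes.get(j, 0) >= min_cases for j in range(len(centroids)))
    # Criterion 2: every pair of centres is at least min_separation apart.
    well_separated = all(
        math.dist(a, b) >= min_separation for a, b in combinations(centroids, 2)
    )
    return enough_cases, well_separated
```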

To Obtain a K-Means Cluster Analysis

From the menus, open the K-Means Cluster dialog, then:
  • Select the variables to be used in the cluster analysis.
  • Specify the number of clusters. The number of clusters must be at least two and must not be greater than the number of cases in the data file.
  • Select either Iterate and classify or Classify only.
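The steps above can be sketched in plain Python. This is an illustration and not SPSS code: the function names are hypothetical, `max_iter=10` is an assumed default, and the sketch assumes that "Classify only" assigns cases to the given centres without updating them, while "Iterate and classify" keeps updating the centres before the final assignment.

```python
def validate_k(k, n_cases):
    """The number of clusters must be at least 2 and at most the number of cases."""
    if not 2 <= k <= n_cases:
        raise ValueError(f"k must be between 2 and {n_cases}, got {k}")
    return k

def classify_only(points, centres):
    """Assign each case to its nearest centre; the centres are left unchanged."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
        for p in points
    ]

def iterate_and_classify(points, centres, max_iter=10):
    """Update the centres until they stop moving, then classify the cases."""
    centres = list(centres)
    for _ in range(max_iter):
        labels = classify_only(points, centres)
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # A cluster with no members keeps its old centre.
            new_centres.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            )
        if new_centres == centres:
            break
        centres = new_centres
    return classify_only(points, centres), centres
```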

Objective identified: To find the cluster that generates the maximum revenue.

First, identification of variables: The variables should be linked to the objective. From the file cell-inter, the selected variables are monthly expenditure, fixed component, voice calls bill, SMS bill and other charges.
  
Then we checked for outliers. We opened Graphs -> Legacy Dialogs -> Boxplot -> Simple -> Define.


The boxplot shows the median, the whiskers and the stars (extreme values). The outliers should be removed, which we did by selecting cases with the condition IF [Monthly expenditure < 600]. After the removal, the three clusters were better formed. We found that the maximum revenue is generated by the 1st cluster (339.5*116 = 39,382).
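The outlier check and the filter can be sketched in Python as follows. This is a rough illustration only: the field name `monthly_expenditure` and the sample values in the usage note are hypothetical, the 1.5 and 3 box-length cut-offs are the usual boxplot convention, and `statistics.quantiles` may compute the quartiles slightly differently from SPSS.

```python
import statistics

def boxplot_flags(values):
    """Return the median and a flag per value: 'ok', 'outlier' or 'extreme'."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    box_length = q3 - q1
    flags = []
    for v in values:
        gap = max(q1 - v, v - q3, 0)  # how far the value lies outside the box
        if gap > 3 * box_length:
            flags.append("extreme")   # drawn as a star
        elif gap > 1.5 * box_length:
            flags.append("outlier")
        else:
            flags.append("ok")
    return median, flags

def keep_below(records, field="monthly_expenditure", limit=600):
    """Keep only the cases that satisfy the condition field < limit."""
    return [r for r in records if r[field] < limit]
```

With made-up expenditures such as 200, 250, 300, 320, 350, 400, 450 and 2500, only the last value is flagged as extreme, and `keep_below` drops it before the clustering is rerun.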
After this we saved the K-means cluster membership, which tells us which customers are in which cluster.
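Once the cluster membership is saved, the revenue of each cluster can be added up directly. The sketch below assumes that the figure 339.5*116 is the first cluster's mean monthly expenditure multiplied by its number of customers; the function name and the argument layout are hypothetical.

```python
from collections import defaultdict

def revenue_by_cluster(expenditures, membership):
    """Sum the monthly expenditure of the customers in each cluster.

    expenditures : one amount per customer
    membership   : the saved cluster number of the same customer
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for amount, cluster in zip(expenditures, membership):
        totals[cluster] += amount
        counts[cluster] += 1
    return {
        c: {"cases": counts[c], "mean": totals[c] / counts[c], "revenue": totals[c]}
        for c in totals
    }

# Mean times number of cases gives the same total as summing the amounts.
cluster_1_revenue = 339.5 * 116
```

The cluster with the largest `revenue` value is the one that meets the objective.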

Comparison of cluster 1 with cluster 2:

Cluster 1                             | Cluster 2
Male-95, female-21                    | Male-24, female-3
Relatively more females               | Relatively fewer females
Relatively more educated              | Moderately educated
Spends more on value-added services   | Spends more on voice calls and SMS bills
More of a once-a-month recharge       | Not more than three recharges in a month

The "final cluster centres" table gives the mean of each variable in each of the clusters. This enables you to give a descriptive name to each cluster based on its dominant variables.
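A table of this kind can be rebuilt from the saved membership by averaging each variable within each cluster. The sketch below is an illustration under assumed names: the row layout (one dict per customer) and both function names are hypothetical.

```python
def final_cluster_centres(rows, membership, variables):
    """Mean of every variable within every cluster.

    rows       : one dict per customer, e.g. {"voice_calls_bill": 120, "sms_bill": 30}
    membership : the cluster number of the same customer
    variables  : the variable names to average
    """
    sums = {}
    counts = {}
    for row, cluster in zip(rows, membership):
        acc = sums.setdefault(cluster, {v: 0.0 for v in variables})
        for v in variables:
            acc[v] += row[v]
        counts[cluster] = counts.get(cluster, 0) + 1
    return {c: {v: sums[c][v] / counts[c] for v in variables} for c in sums}

def dominant_variable(centre):
    """The variable with the highest mean in one cluster centre."""
    return max(centre, key=centre.get)
```

Comparing the dominant variable across clusters only makes sense when the variables are on the same scale, as the bill components are here.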

Benefits:
  • With a large number of variables, K-Means may be computationally faster than hierarchical clustering (if K is small).
  • K-Means may produce tighter clusters than hierarchical clustering, especially if the clusters are globular.


Submitted by:
Somya shraddha
14166
HR-Group G
