K-MEANS CLUSTERING
Clustering: The process of partitioning or grouping a given set of patterns into disjoint
clusters, such that patterns in the same cluster are alike and
patterns belonging to different clusters are different.
K-means clustering: An algorithm that classifies or groups
objects, based on their attributes/features, into K groups, where K is a positive
integer. The grouping is done by minimizing the sum of squared distances
between each data point and its cluster centroid. Thus, the purpose of
K-means clustering is to classify the data into distinct groups and
thereby uncover relationships between variables that can be used to formulate
strategy.
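The minimization described above can be sketched in a few lines of plain Python. This is only an illustration of the standard iterate-and-reassign procedure (Lloyd's algorithm), not the exact routine SPSS runs; the function names, the random choice of starting centroids and the `max_iter` and `seed` parameters are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    # Assumption: starting centroids are k randomly chosen cases.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the solution is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_point(members)
    # Within-cluster sum of squares: the quantity K-means tries to minimize.
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss
```

Calling `k_means(points, 2)` on a list of numeric tuples returns the cluster label of each point, the final centroids and the within-cluster sum of squares.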
Two criteria for clustering:
- Each cluster should have a sufficient number of cases.
- There should be clear differences among the clusters.
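A rough way to check these two criteria on a finished solution is sketched below. The thresholds `min_cases` and `min_separation` are hypothetical values the analyst has to choose, and the distance between centroids is used only as a simple stand-in for "difference among clusters"; it is not the test SPSS reports.

```python
import math
from collections import Counter
from itertools import combinations


def cluster_sizes(labels):
    # Number of cases assigned to each cluster label.
    return dict(Counter(labels))


def min_centroid_distance(centroids):
    # Smallest Euclidean distance between any two cluster centroids.
    return min(math.dist(a, b) for a, b in combinations(centroids, 2))


def passes_criteria(labels, centroids, min_cases, min_separation):
    # Assumption: labels are the integers 0 .. len(centroids) - 1.
    sizes = cluster_sizes(labels)
    enough_cases = all(
        sizes.get(j, 0) >= min_cases for j in range(len(centroids))
    )
    well_separated = min_centroid_distance(centroids) >= min_separation
    return enough_cases and well_separated
```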
To Obtain a K-Means Cluster Analysis
From the menus choose:
- Select the variables to be used in the cluster analysis.
- Specify the number of clusters. The number of clusters must be at least two and must not be greater than the number of cases in the data file.
- Select either Iterate and classify or Classify only.
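The rule on the number of clusters and the "Classify only" option can be illustrated with a short sketch. It assumes that "Classify only" assigns each case to the nearest of the given centres without updating them; `validate_k` and `classify_only` are hypothetical helper names, not SPSS functions.

```python
def validate_k(k, n_cases):
    # The number of clusters must be at least two and
    # must not exceed the number of cases in the data file.
    if not isinstance(k, int) or k < 2:
        raise ValueError("number of clusters must be an integer >= 2")
    if k > n_cases:
        raise ValueError("number of clusters cannot exceed the number of cases")


def classify_only(points, centres):
    # Assign each case to its nearest centre; the centres are not updated.
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centres)), key=lambda j: squared_distance(p, centres[j]))
        for p in points
    ]
```

"Iterate and classify" would instead repeat the assignment and recompute the centres until the solution stops changing.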
Objective identified: To find the cluster that generates the maximum revenue.
Identification of variables: The variables should be linked to the objective. From the file cell-inter, the selected variables are monthly expenditure, fixed component, voice calls bill, SMS bill and other charges.
We then checked for outliers using Graphs -> Legacy Dialogs -> Boxplot -> Simple -> Define.
The boxplot shows the median, the whiskers and stars (extreme values). The outliers should be removed; we did this by selecting cases with the condition IF [Monthly expenditure < 600]. After
removal, the three clusters improved. We found that the maximum revenue is
generated by the 1st cluster (339.5*116).
After this we saved the k-means cluster membership, which
told us which customers are in which cluster.
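The outlier filter and the revenue figure can be reproduced with a small sketch. It assumes that 339.5 is the mean monthly expenditure of cluster 1 and that 116 is the number of customers in it (95 males plus 21 females); the record layout and the key `monthly_expenditure` are made up for the example.

```python
def drop_outliers(cases, limit=600):
    # Keep only cases with monthly expenditure below the limit,
    # mirroring the condition IF [Monthly expenditure < 600].
    return [c for c in cases if c["monthly_expenditure"] < limit]


def cluster_revenue(mean_expenditure, n_customers):
    # Assumption: revenue of a cluster = mean monthly expenditure x cluster size.
    return mean_expenditure * n_customers


# Hypothetical records, for illustration only.
sample = [
    {"customer": "A", "monthly_expenditure": 320},
    {"customer": "B", "monthly_expenditure": 1450},
    {"customer": "C", "monthly_expenditure": 599},
]
kept = drop_outliers(sample)            # customer B is removed
revenue = cluster_revenue(339.5, 116)   # 339.5 * 116 = 39382.0
```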
Comparison of cluster 1 with cluster 2:
Cluster 1 | Cluster 2
Male-95, female-21 | Male-24, female-3
Relatively more females | Relatively fewer females
Relatively more educated | Moderately educated
Spends more on value-added services | Spends more on voice calls and SMS bills
Mostly recharges once a month | No more than three recharges in a month
The "final cluster centres" table gives
the mean of each variable in each of the clusters. This will enable
you to give a descriptive name to each cluster based on its dominant characteristics.
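A table of this kind can be rebuilt from the saved cluster membership by averaging each clustering variable within each cluster. The sketch below assumes the cases are stored as dictionaries and that `labels` holds the saved cluster number of each case; the function name and the variable keys are hypothetical.

```python
from collections import defaultdict


def final_cluster_centres(rows, labels, variables):
    # Group the cases by their saved cluster number.
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    # The centre of a cluster is the mean of each variable over its members.
    return {
        label: {v: sum(r[v] for r in members) / len(members) for v in variables}
        for label, members in grouped.items()
    }
```

Comparing the means across clusters, for example a high value-added services mean in one cluster against a high voice calls mean in another, is what suggests a descriptive name for each cluster.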
Benefits:
- With a large number of variables, K-means may be computationally faster than hierarchical clustering (if K is small).
- K-means may produce tighter clusters than hierarchical clustering, especially if the clusters are globular.
Submitted by:
Somya shraddha
14166
HR-Group G