Business Analytics is not only about statistics; we also have to use our learning, intuition, and expert opinion when taking a decision. Today we were taught how to take decisions when the chi-square test is not conclusive enough. In such cases we need to include layers (control variables) in addition to the rows and columns.
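As a rough sketch of this idea (using pandas and SciPy rather than the menu-driven tool we used in class, with hypothetical column names), the chi-square test is run on the overall crosstab and then repeated within each layer of a third variable:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical retail survey data: department, satisfaction and gender per customer
df = pd.DataFrame({
    "department": ["Clothing", "Clothing", "Grocery", "Grocery", "Clothing", "Grocery"] * 20,
    "satisfied":  ["No", "Yes", "Yes", "Yes", "No", "No"] * 20,
    "gender":     ["F", "M", "F", "M", "F", "M"] * 20,
})

# Overall two-way table: rows vs. columns only
overall = pd.crosstab(df["department"], df["satisfied"])
chi2, p, dof, _ = chi2_contingency(overall)
print(f"Overall: chi2={chi2:.2f}, p={p:.3f}")

# Add a layer: repeat the test within each level of a third variable
for level, part in df.groupby("gender"):
    table = pd.crosstab(part["department"], part["satisfied"])
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"Layer gender={level}: chi2={chi2:.2f}, p={p:.3f}")
```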
While analysing the Retail Data, I learned how to select cases based on certain criteria using the If condition. In BA it rarely happens that we can arrive at a decision by analysing only one layer of data; we have to dig further and carry out analysis on the selected cases so that the root causes can be identified. In the Retail Data set, we had to select cases based on conditions, and further analysis showed that the majority of the customers were not satisfied with the Clothing department, and that those customers were mostly female.
It was further found that the customers who were satisfied in all the stores except store 4 came from a distance of 1-5 km, and that too mainly for the Clothing department.
This taught us about the factors that need to be considered for measuring and improving customer satisfaction in any retail store, based on the sample data.
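A rough equivalent of the Select Cases / If condition step in Python (again with hypothetical column names and made-up values) is simple boolean filtering, after which the analysis is repeated on the subset:

```python
import pandas as pd

# Hypothetical retail survey data
df = pd.DataFrame({
    "store":       [1, 2, 3, 4, 1, 2, 3, 4],
    "department":  ["Clothing", "Clothing", "Grocery", "Clothing",
                    "Grocery", "Clothing", "Clothing", "Grocery"],
    "satisfied":   ["No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes"],
    "gender":      ["F", "F", "M", "F", "M", "F", "F", "M"],
    "distance_km": [3, 7, 2, 12, 4, 1, 5, 9],
})

# Select cases: dissatisfied customers of the Clothing department only
subset = df[(df["department"] == "Clothing") & (df["satisfied"] == "No")]

# Further analysis on the selected cases, e.g. gender breakdown per store
print(subset.groupby("store")["gender"].value_counts())
```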
Cluster Analysis
Cluster analysis is an exploratory data analysis tool for solving classification problems. Its objective is to sort cases (people, things, events, etc.) into groups, or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. Each cluster thus describes, in terms of the data collected, the class to which its members belong, and this description may be abstracted from the particular cases to the general class or type.
The most widely used ways of doing cluster analysis are:
· Hierarchical clustering: generally used for fewer than 50 objects
· K-means clustering: generally used for more than 50 objects
Hierarchical clustering is based on the core idea that objects are more related to nearby objects than to objects farther away. As such, these algorithms connect "objects" to form "clusters" based on their distance. A cluster can be described largely by the maximum distance needed to connect its parts.
In K-means clustering, clusters are represented by a central vector, which need not be a member of the data set. When the number of clusters is fixed to k, k-means clustering can be stated as an optimization problem: find the k cluster centres and assign each object to its nearest cluster centre, such that the squared distances from the objects to their cluster centres are minimized.
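As an illustration, here is a minimal sketch using scikit-learn on made-up two-dimensional data (not the tool or data we used in class):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D data: three loose blobs of points
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(30, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(30, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(30, 2)),
])

# Fix the number of clusters to k = 3 and minimise within-cluster squared distances
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("Cluster centres:\n", km.cluster_centers_)   # central vectors, not necessarily data points
print("Within-cluster sum of squares:", km.inertia_)
print("First ten labels:", km.labels_[:10])
```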
Hierarchical Clustering
In hierarchical clustering, the clustering process consists of three steps:
1. Selection of variables
2. Distance measurement
3. Clustering criterion
The strategies for hierarchical
clustering generally fall into two types:
· Agglomerative: This is a "bottom up"
approach: each observation starts in its own cluster, and pairs of clusters are
merged as one moves up the hierarchy.
· Divisive: This is a "top down"
approach: all observations start in one cluster, and splits are performed
recursively as one moves down the hierarchy.
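A minimal sketch of the agglomerative ("bottom up") strategy with scikit-learn, on the same kind of made-up 2-D data (all names and values here are illustrative):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up 2-D data: two loose blobs of points
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(25, 2)),
    rng.normal(loc=(4, 4), scale=0.5, size=(25, 2)),
])

# Each observation starts in its own cluster; pairs of clusters are merged
# (here with Ward linkage) until only two clusters remain
agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = agg.fit_predict(X)

print("Cluster sizes:", np.bincount(labels))
```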
Dendrogram
A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. The individual objects are arranged along the bottom of the dendrogram and referred to as leaf nodes. Clusters are formed by joining individual objects or existing clusters, with the join point referred to as a node. At each node of the dendrogram there is a right and a left sub-branch of clustered objects, where a cluster may consist of a single object or a group of objects. The vertical axis is labelled distance and refers to the distance measure between objects or clusters; the height of a node can be thought of as the distance value at which its right and left sub-branch clusters are joined.
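A sketch of how such a dendrogram can be produced with SciPy and Matplotlib (illustrative data and labels only):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# A few made-up observations (e.g. customers described by two measurements)
X = np.array([[1.0, 1.2], [1.1, 0.9], [5.0, 5.1], [5.2, 4.8], [9.0, 1.0]])
labels = ["c1", "c2", "c3", "c4", "c5"]

# Agglomerative linkage: merge the closest objects/clusters step by step
Z = linkage(X, method="ward")

# Leaves appear along the bottom; the vertical axis is the merge distance
dendrogram(Z, labels=labels)
plt.ylabel("Distance")
plt.title("Dendrogram (illustrative data)")
plt.show()
```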
Rajendra Kumar Das
Group B
Operations Batch - 14156