Tuesday, 4 September 2012

BA_GroupB_2ndDay


Business Analytics is not only about statistics; we also have to use our learning, intuition and expert opinion when taking a decision. Today, we were taught how to take decisions when the chi-square test alone is not conclusive. In such cases we need to add layer variables to the crosstab in addition to the row and column variables.
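To make the idea of layers concrete, here is a minimal sketch in Python, assuming hypothetical counts (not the class Retail Data). It computes the Pearson chi-square statistic once for the pooled table and then separately within each level of a layer variable, which is what adding a layer to a crosstab does. The function name `chi_square`, the department names and the counts are illustrative only.

```python
# Pearson chi-square statistic for an r x c contingency table (list of rows).
def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# 5% critical value of the chi-square distribution with 1 degree of freedom
# (a 2 x 2 table has (2-1)*(2-1) = 1 degree of freedom).
CRITICAL_5PCT_DF1 = 3.841

# Hypothetical counts: rows = gender (female, male),
# columns = satisfied (no, yes), layer = department.
layers = {
    "Clothing": [[30, 10], [12, 18]],
    "Grocery": [[15, 15], [14, 16]],
}

# Pooled table: add the layers cell by cell, i.e. ignore the layer variable.
pooled = [[sum(t[i][j] for t in layers.values()) for j in range(2)] for i in range(2)]

for name, table in [("Pooled", pooled)] + list(layers.items()):
    stat = chi_square(table)
    verdict = "significant" if stat > CRITICAL_5PCT_DF1 else "not significant"
    print(f"{name}: chi-square = {stat:.3f} ({verdict} at 5%)")
```

With these made-up counts the pooled table and the Clothing layer exceed 3.841 while the Grocery layer does not, so the layered view shows where the association actually sits.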
While analysing the Retail Data, I learned how to select cases based on certain criteria using the If condition. In BA, we rarely arrive at a decision by analysing only one layer of the data; we have to dig further and analyse the selected cases so that the root causes can be identified. In the Retail Data set, we selected cases based on such conditions, and further analysis showed that the majority of the customers were not satisfied with the Clothing department, and that those customers were mostly female.
Further, it was found that in all the stores except Store 4, the satisfied customers were those living 1-5 km away, and this held for the Clothing department in particular.
This taught us which factors need to be considered when measuring and improving customer satisfaction in a retail store, based on sample data.
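Selecting cases with an If condition and then drilling into the selection can be sketched in Python as a filter over records. This is a minimal sketch, not the SPSS procedure used in class: the records, the field names (`gender`, `dept`, `store`, `distance_km`, `satisfied`) and the helper `select_cases` are all hypothetical.

```python
# Hypothetical customer records; field names are illustrative only.
customers = [
    {"id": 1, "gender": "F", "dept": "Clothing", "store": 2, "distance_km": 3, "satisfied": False},
    {"id": 2, "gender": "M", "dept": "Grocery", "store": 1, "distance_km": 8, "satisfied": True},
    {"id": 3, "gender": "F", "dept": "Clothing", "store": 4, "distance_km": 2, "satisfied": False},
    {"id": 4, "gender": "F", "dept": "Clothing", "store": 1, "distance_km": 4, "satisfied": True},
    {"id": 5, "gender": "M", "dept": "Clothing", "store": 3, "distance_km": 12, "satisfied": False},
]

def select_cases(records, condition):
    """Keep only the records for which condition(record) is True,
    analogous to selecting cases with an If condition."""
    return [r for r in records if condition(r)]

# First selection: dissatisfied Clothing customers.
unhappy_clothing = select_cases(
    customers, lambda r: r["dept"] == "Clothing" and not r["satisfied"]
)
# Analyse the selected cases further: what share of them is female?
share_female = sum(r["gender"] == "F" for r in unhappy_clothing) / len(unhappy_clothing)

# Second selection: stores other than Store 4, customers living 1-5 km away.
nearby_not_store4 = select_cases(
    customers, lambda r: r["store"] != 4 and 1 <= r["distance_km"] <= 5
)

print(len(unhappy_clothing), round(share_female, 2))
print([r["id"] for r in nearby_not_store4])
```

Each selection is just a predicate, so further layers of analysis can be added by chaining more conditions.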

Cluster Analysis
Cluster analysis is an exploratory data analysis tool for solving classification problems. Its objective is to sort cases (people, things, events, etc.) into groups, or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. Each cluster thus describes, in terms of the data collected, the class to which its members belong; with use, this description may be generalised from the particular cases to a general class or type.
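One way to check the stated objective (strong association within clusters, weak between them) is to compare average distances, treating a small distance as strong association. The sketch below assumes Euclidean distance as that (inverse) measure; the points, the labels and the function `mean_within_between` are hypothetical.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_within_between(points, labels):
    """Average distance between members of the same cluster (within) and
    between members of different clusters (between)."""
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (within if lp == lq else between).append(euclidean(p, q))
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical 2-D cases and a candidate grouping of them.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["A", "A", "B", "B"]
within, between = mean_within_between(points, labels)
print(f"mean within = {within:.2f}, mean between = {between:.2f}")
```

A good clustering makes the first number small relative to the second.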
The most widely used methods of cluster analysis are:
· Hierarchical clustering: typically used when there are fewer than 50 objects
· K-means clustering: typically used when there are more than 50 objects
Hierarchical clustering is based on the core idea that objects are more related to nearby objects than to objects farther away. These algorithms therefore connect objects to form clusters based on their distance. A cluster can be described largely by the maximum distance needed to connect parts of the cluster.
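Connecting clusters "based on their distance" needs a rule for the distance between two clusters, usually called the linkage criterion. A minimal sketch of two common choices, assuming Euclidean distance between points: single linkage (closest pair) and complete linkage (farthest pair, the "maximum distance" view). The function names and points are illustrative.

```python
import math
from itertools import product

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2):
    """Distance of the closest pair of points, one from each cluster."""
    return min(euclidean(a, b) for a, b in product(c1, c2))

def complete_linkage(c1, c2):
    """Distance of the farthest pair of points, one from each cluster."""
    return max(euclidean(a, b) for a, b in product(c1, c2))

cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]
print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # sqrt(17), about 4.123
```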
In k-means clustering, each cluster is represented by a central vector (its centre), which need not be a member of the data set. When the number of clusters is fixed to k, k-means clustering can be stated formally as an optimization problem: find k cluster centres and assign each object to its nearest centre, such that the sum of squared distances from the objects to their cluster centres is minimized.
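A common way to approximately solve this problem is Lloyd's algorithm: alternately assign each object to its nearest centre and move each centre to the mean of its cluster, until nothing changes. The sketch below is a minimal pure-Python version under that assumption; it reaches a local minimum of the squared-distance criterion, not necessarily the global one. Function names, the seed and the data are illustrative.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centre-update steps."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initial centres: k distinct data points
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster
        # (keep the old centre if a cluster happens to be empty).
        new_centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:
            break  # converged: the assignment can no longer change
        centres = new_centres
    return centres, clusters

def within_cluster_sse(centres, clusters):
    """The quantity k-means minimizes: sum of squared distances to centres."""
    return sum(squared_distance(p, centres[i]) for i, c in enumerate(clusters) for p in c)

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, clusters = k_means(points, k=2)
print(centres)
print(within_cluster_sse(centres, clusters))
```

Because the result depends on the starting centres, it is common to run k-means several times with different seeds and keep the solution with the lowest SSE.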

Hierarchical Clustering
In hierarchical clustering, the process consists of three steps:
1. Selection of variables
2. Distance measurement
3. Clustering criteria
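The first two steps can be sketched in Python: choose which variables describe each object, then compute a distance between every pair of objects. This is a minimal sketch assuming Euclidean distance and an optional z-score standardisation (one common choice so that a variable measured on a larger scale does not dominate); the records and field names are hypothetical. The third step, the clustering criterion, is the linkage rule used when merging clusters.

```python
import math

# Hypothetical records; only some fields are chosen as clustering variables.
records = [
    {"name": "A", "age": 25, "income": 30.0, "visits": 4},
    {"name": "B", "age": 27, "income": 32.0, "visits": 5},
    {"name": "C", "age": 52, "income": 80.0, "visits": 1},
]

# Step 1: selection of variables (an illustrative choice; "visits" is left out).
variables = ["age", "income"]
vectors = [[r[v] for v in variables] for r in records]

def standardise(rows):
    """Z-score each column (population standard deviation)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s if s else 0.0 for x, m, s in zip(row, means, sds)] for row in rows]

# Step 2: distance measurement (Euclidean distance matrix).
def distance_matrix(rows):
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2))) for r2 in rows] for r1 in rows]

dist = distance_matrix(standardise(vectors))
for r, row in zip(records, dist):
    print(r["name"], [round(d, 2) for d in row])
```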
Strategies for hierarchical clustering generally fall into two types:
· Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
· Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
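The agglomerative (bottom-up) strategy can be sketched in a few lines: start with every observation as its own cluster and repeatedly merge the closest pair, recording the distance at which each merge happens. These merge distances are the node heights drawn on a dendrogram's vertical axis. This is a minimal, unoptimised sketch assuming single linkage and Euclidean distance; the function names and points are illustrative.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link(c1, c2):
    """Cluster distance = distance of the closest pair of members."""
    return min(euclidean(a, b) for a in c1 for b in c2)

def agglomerative(points):
    """Bottom-up clustering: start with singletons, repeatedly merge the
    closest pair of clusters, and record (left, right, height) per merge."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        height = single_link(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return merges

merges = agglomerative([(0.0,), (1.0,), (5.0,), (6.5,)])
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```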

Dendrogram
A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. It is a visual representation of the correlation data between the clustered objects, which this description calls spots. The individual spots are arranged along the bottom of the dendrogram and are referred to as leaf nodes. Spot clusters are formed by joining individual spots or existing spot clusters, and each join point is referred to as a node. This can be seen in the diagram above. Each dendrogram node has a right and a left sub-branch of clustered spots. Here, a spot cluster can refer to a single spot or a group of spots. The vertical axis is labelled distance and refers to a distance measure between spots or spot clusters. The height of a node can be thought of as the distance between its right and left sub-branch clusters.

Rajendra Kumar Das
Group B
Operations Batch - 14156



