In the third session of Business Analytics we continued with the first level of analysis. We worked on
frequencies and crosstabs using SPSS, one of the well-known statistical software packages
used by companies. We had data about a chain of retail stores: customer
purchases in departments such as clothing, appliances and stationery, along with gender, frequency
of shopping, mode of payment and follow-up by customers. We tried to establish
relationships between different variables and to see how customers feel about the
service at the stores (negative or positive). Through this exercise we tried to
find where the problems were and where store managers could make improvements.
We worked on finding the problem areas in the stores using crosstabs and frequencies. Using these, we
arrived at computed results and analyzed them. We also had the chi-square statistic
calculated, which is a major part of our analysis. The significance level given
by the chi-square test tells us whether or not to reject our null hypothesis, which
states that the variables being compared are not related. If the significance value is
greater than 0.05, we fail to reject the null hypothesis, which means we have no evidence
that the variables are related. If it is 0.05 or less, we reject the null hypothesis and
conclude that the variables are related.
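To make the decision rule concrete, here is a minimal sketch of a plain Pearson chi-square test on a 2x2 crosstab, written in Python rather than SPSS. The counts, the gender-by-feedback layout and the function name `chi_square_2x2` are all made up for illustration and are not the class data. The p-value shortcut only holds for a 2x2 table (1 degree of freedom).

```python
import math

# Hypothetical 2x2 crosstab: rows = gender, columns = service feedback.
# The counts are invented for illustration only.
observed = [
    [30, 20],  # male:   positive, negative
    [25, 25],  # female: positive, negative
]

def chi_square_2x2(table):
    """Return (chi-square statistic, p-value) for a 2x2 table, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_square_2x2(observed)
# Reject the null hypothesis (variables unrelated) only when p_value <= 0.05.
related = p_value <= 0.05
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}, related = {related}")
```

For these invented counts the significance value comes out well above 0.05, so we would fail to reject the null hypothesis.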
In the second lecture, we went through a slideshow on cluster analysis. As I learnt from Wikipedia, cluster
analysis is the task of grouping a set of objects so that objects in the same
cluster are more similar to each other than to those in other clusters. Clustering is a
main task of exploratory data mining and a common technique for statistical
data analysis. It is used in many fields such as image analysis,
information retrieval and bioinformatics.
Connectivity-based clustering, also known as hierarchical clustering, is based on the core
idea that objects are more related to nearby objects than to objects farther away. A cluster can
be described largely by the maximum distance needed to connect its parts.
At different distances, different clusters will form. This can be
represented by a dendrogram, a tree diagram commonly used to show the arrangement
of the clusters produced by hierarchical clustering.
Hierarchical clustering can be one of the following:
· Agglomerative, starting with single elements and merging them into clusters
· Divisive, starting with the complete data set and dividing it into partitions
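The agglomerative approach can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the points are invented, and I use single linkage (distance between the two closest members) as the merging rule, which is one of several possible choices. The recorded merge distances are the heights at which branches would join in a dendrogram.

```python
import math

# Hypothetical 2-D points, invented for illustration.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]

def single_linkage(points):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(points))]  # start with single elements
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

merges = single_linkage(points)
for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height:.3f}")
```

Cutting the list of merges at a chosen distance gives the clusters that exist at that distance, which is why different distances produce different clusters.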
Centroid-based clustering is another type of cluster analysis. Each cluster is represented by a central vector,
which need not be a member of the data set. When the number of clusters is fixed to k,
k-means clustering gives a formal definition as an optimization problem:
find the k cluster centers and assign each object to its nearest center, so
that the squared distances from the objects to their cluster centers are minimized.
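A common way to approach this optimization is Lloyd's algorithm, which alternates between assigning objects to the nearest center and moving each center to the mean of its objects. The sketch below assumes k = 2, with invented data points and starting centers; the function name `k_means` is my own. It only finds a local optimum, so the result depends on the starting centers.

```python
import math

# Hypothetical 2-D data and starting centers, invented for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
initial_centers = [(0.0, 0.0), (10.0, 10.0)]  # k = 2

def k_means(points, centers, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and center update until stable."""
    centers = list(centers)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    return centers, labels

centers, labels = k_means(points, initial_centers)
# Objective that k-means tries to minimize: sum of squared distances to assigned centers.
sse = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
print(centers, labels, round(sse, 3))
```

Note that the final centers are averages of the points, so they do not have to coincide with any actual data point.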
We studied Euclidean distance, which is the straight-line distance between two points. The clustering
process involves the following steps:
· Selection of variables
· Distance measurement
· Clustering criteria
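The first two steps can be illustrated with a small Python sketch. The customer records, field names and values below are invented, and `euclidean_distance` is a helper I wrote for this example. Only the selected variables enter the distance.

```python
import math

# Hypothetical customer records; field names and values are invented.
customers = {
    "A": {"visits_per_month": 4.0, "items_per_visit": 6.0, "age": 30.0},
    "B": {"visits_per_month": 1.0, "items_per_visit": 2.0, "age": 52.0},
}

# Step 1: selection of variables (only these enter the distance).
selected = ["visits_per_month", "items_per_visit"]

def euclidean_distance(a, b, variables):
    """Straight-line distance between two records over the chosen variables."""
    return math.sqrt(sum((a[v] - b[v]) ** 2 for v in variables))

# Step 2: distance measurement.
distance = euclidean_distance(customers["A"], customers["B"], selected)
print(distance)  # differences of 3 and 4 give a distance of 5.0
```

The third step, the clustering criterion, then decides how these distances are used to form groups, for example by merging the closest clusters or by assigning objects to the nearest center.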
Aditya Kannan
Human Resources
Roll Number 14062