Tuesday, 4 September 2012

First level of analysis


In the third session of Business Analytics we continued with the first level of analysis. We worked on frequencies and crosstabs using SPSS, one of the well-known software packages used by companies. We had data about a chain of retail stores, including customer purchases in departments such as clothing, appliances and stationery, along with gender, frequency of shopping, mode of payment and follow-up by customers. We tried to establish relationships between different variables and to see whether customers feel positive or negative about the service at the stores. Through this exercise we tried to find where the problems were and what the store managers could improve.
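
To make the two terms concrete, here is a minimal sketch in Python of a frequency table and a crosstab. The records are hypothetical (they are not the class data set), and the variable names are my own; in class SPSS did all of this through its menus.

```python
from collections import Counter

# Hypothetical survey records: (gender, feedback). Not the class data set.
records = [
    ("F", "positive"), ("F", "negative"), ("M", "positive"),
    ("M", "positive"), ("F", "positive"), ("M", "negative"),
]

# Frequency table: how often each feedback value occurs.
feedback_freq = Counter(feedback for _, feedback in records)

# Crosstab: counts for every (gender, feedback) combination.
crosstab = Counter(records)

print(feedback_freq)
for (gender, feedback), count in sorted(crosstab.items()):
    print(gender, feedback, count)
```

A frequency table looks at one variable at a time, while a crosstab counts combinations of two variables, which is what lets us compare groups.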

We looked for the problem areas in the stores using frequencies and crosstabs, and analyzed the computed results. We also had SPSS calculate the chi-square statistic, which is a major part of our analysis. The significance value reported for the chi-square test tells us whether to reject the null hypothesis, which states that the two variables compared are not related. If the significance value is greater than 0.05, we fail to reject the null hypothesis, meaning the data show no significant relationship between the variables. If it is 0.05 or less, we reject the null hypothesis and conclude that the variables are related.
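
The chi-square decision rule can be sketched in Python for the simplest case, a 2x2 crosstab. The counts below are made up for illustration, and the function name is my own. The sketch computes the plain Pearson statistic without a continuity correction and assumes no row or column total is zero.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, where the p-value is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = gender, columns = positive / negative feedback.
observed = [[30, 20], [20, 30]]
stat, p_value = chi_square_2x2(observed)
print(round(stat, 3), round(p_value, 4))
if p_value > 0.05:
    print("Fail to reject the null hypothesis: no significant relationship")
else:
    print("Reject the null hypothesis: the variables appear related")
```

For larger tables the degrees of freedom change and the shortcut formula for the p-value no longer applies, so a statistics package is the safer choice there.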

In the next lecture, we went through a slideshow on cluster analysis. According to Wikipedia, cluster analysis is the task of grouping a set of objects so that objects in the same group (cluster) are more similar to each other than to those in other groups. Clustering is a main task of exploratory data mining and a common technique for statistical data analysis. It is used in many fields, such as image analysis, information retrieval and bioinformatics.

Connectivity-based clustering, also known as hierarchical clustering, is based on the idea that objects are more related to nearby objects than to objects farther away. A cluster can largely be described by the maximum distance needed to connect its parts. At different distances, different clusters form, and this can be represented by a dendrogram. A dendrogram is a tree diagram commonly used to show the arrangement of the clusters produced by hierarchical clustering.

Hierarchical clustering can be of two kinds:
·         Agglomerative, starting with single elements and putting them into clusters
·         Divisive, starting with the complete data set and dividing it into partitions
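
The agglomerative approach can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it uses single linkage (the distance between two clusters is the distance between their closest members), Euclidean distance, and made-up points. The function name is hypothetical.

```python
import math

def agglomerative(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain (single linkage)."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []  # (distance, merged cluster) in order; a dendrogram draws this history
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((d, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

# Made-up 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.0, 9.0), (1.0, 2.0)]
clusters, merges = agglomerative(points, 2)
for distance, merged in merges:
    print(round(distance, 2), merged)
```

The list of merges, with the distance at which each one happened, is the information a dendrogram displays as a tree.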

Centroid-based clustering is another type of cluster analysis. Each cluster is represented by a central vector, which need not be a member of the data set. When the number of clusters is fixed to k, k-means clustering gives a formal definition as an optimization problem: find the k cluster centers and assign each object to the nearest center, so that the squared distances from the objects to their cluster centers are minimized.
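
One common way to approach the k-means problem is to alternate between assigning objects to the nearest center and moving each center to the mean of its objects. The Python sketch below illustrates that idea. It is not the method SPSS uses internally; the starting centers (simply the first k points), the fixed number of iterations, the function name and the sample points are all my own assumptions.

```python
import math

def k_means(points, k, iterations=10):
    """Alternate between assigning points to centers and recomputing the centers."""
    centers = list(points[:k])  # naive start: the first k points (an assumption)
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its assigned points.
        for c, members in enumerate(groups):
            if members:
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, groups

# Made-up 2-D points forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centers, groups = k_means(points, 2)
print(centers)
print(groups)
```

Note that the resulting centers are averages, so they are usually not points from the data set, which matches the remark above about the central vector.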

We also studied Euclidean distance, which is the straight-line distance between two points. The clustering process involves the following steps:
·         Selection of variables
·         Distance measurement
·         Clustering criteria
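
For the distance measurement step, Euclidean distance is the square root of the sum of squared differences between the coordinates of two points. A small Python sketch, with a function name and sample points of my own choosing:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up points; the differences are 3 and 4, so the distance is 5.
p = (0.0, 0.0)
q = (3.0, 4.0)
print(euclidean_distance(p, q))

# Python 3.8+ also provides this directly in the standard library.
print(math.dist(p, q))
```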

Aditya Kannan
Human Resources Batch
Roll Number 14062
