SIBM B- Business Analytics

Wednesday, 5 September 2012

Group A- Session 5 & 6

Today we continued with cluster analysis, this time working through the practical side of hierarchical clustering.

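Hierarchical clustering is usually used to cluster variables rather than cases. As a rule of thumb from the session, a set of variables contains fewer than fifty objects, whereas a set of cases contains more than fifty, and the tree that hierarchical clustering produces is only readable for a small number of objects.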

In hierarchical clustering, we mainly learnt about two tools for finding relations between variables, on the basis of which we should be able to take useful business decisions. The two tools are:

a) Dendrogram

b) Proximity Matrix

Dendrogram –

A dendrogram (from Greek dendron "tree", -gramma "drawing") is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. Here it is a visual representation of the spot correlation data, where the spots are the objects being clustered. The individual spots are arranged along the bottom of the dendrogram and are referred to as leaf nodes. Spot clusters are formed by joining individual spots or existing spot clusters, and each join point is referred to as a node. This can be seen in the diagram below. The vertical axis is labelled distance and refers to a distance measure between spots or spot clusters. The height of a node can be thought of as the distance between its left and right sub-branch clusters.

It becomes difficult to interpret the distance between spot clusters as the clusters grow in size. One way to compare the expression profile behavior of two spots is to see how far up the dendrogram you need to go in order to move from one to the other. In the dendrogram above, to get from the spot on the left to the spot in the middle you need to move up to a distance of 0.6 (just follow the branches).
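The idea of "moving up" the dendrogram can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure used in class: the distance values are made up, single linkage (smallest distance between any two members) is an assumption because the notes do not name a linkage method, and the function names `single_linkage` and `climb_height` are hypothetical.

```python
from itertools import combinations


def single_linkage(dist):
    """Agglomerative clustering on a symmetric distance matrix.

    Returns a list of merges (cluster_a, cluster_b, height). Each cluster
    is a frozenset of leaf indices; height is the dendrogram node height.
    Single linkage is an assumption made for this sketch.
    """
    def gap(pair):
        # Single linkage: smallest distance between any two members.
        return min(dist[i][j] for i in pair[0] for j in pair[1])

    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((a, b, gap((a, b))))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def climb_height(merges, i, j):
    """Height of the first node that joins leaves i and j."""
    for a, b, height in merges:
        if (i in a and j in b) or (i in b and j in a):
            return height
    raise ValueError("the two leaves are never joined")


# Hypothetical distances between four objects A, B, C, D (indices 0..3).
dist = [
    [0.0, 0.2, 0.6, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.6, 0.7, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]

merges = single_linkage(dist)
heights = [height for _, _, height in merges]
print(heights)                      # node heights, bottom to top
print(climb_height(merges, 0, 2))   # how far up to get from A to C
```

With these made-up numbers, A and B join first, then C and D, and the two pairs join last, so getting from A to C means climbing to the top node.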

Proximity Matrix –

The proximity matrix is the distance matrix that SPSS produces as output. The matrix is symmetric, meaning that the numbers in the lower half are the same as the numbers in the upper half. Quite often only the lower half of a symmetric matrix is displayed, and the upper half is used for other information (for example, distances in one half and correlation coefficients in the other).

A proximity is a similarity if a larger value for a pair of objects means they are closer or more alike. It is a dissimilarity if a smaller value means they are closer or more alike. Proximities are normally symmetric, so the proximity of object a to object b is the same as the proximity of object b to object a.
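As a small illustration of these properties, the sketch below builds a proximity matrix by hand in Python rather than SPSS. The three objects are made-up data, Euclidean distance is chosen only as one common dissimilarity measure, and `proximity_matrix` is a hypothetical helper name.

```python
import math


def proximity_matrix(rows):
    """Euclidean distance (a dissimilarity) between every pair of rows."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical data: three objects measured on two variables.
objects = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = proximity_matrix(objects)

# The proximity of a to b equals the proximity of b to a.
n = len(matrix)
is_symmetric = all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))
print(is_symmetric)

# Because of the symmetry, showing only the lower half loses nothing.
for i, row in enumerate(matrix):
    print("  ".join(f"{value:5.1f}" for value in row[: i + 1]))
```

Since distance is a dissimilarity, the smaller entries mark the more alike pairs: here the first and second objects (distance 5) are closer than the first and third (distance 10).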

Finally we learnt about the OLAP cube.

OLAP – Online Analytical Processing

The OLAP (Online Analytical Processing) Cubes procedure calculates totals, means, and other univariate statistics for continuous summary variables within categories of one or more categorical grouping variables. A separate layer in the table is created for each category of each grouping variable. Summary variables are continuous (scale) variables, whereas grouping variables are categorical and may be ordinal or nominal.

Double-click on the output table to activate it and a drop-down menu will appear. By default it shows the total, but you can pull it down and select any individual category.
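To make the idea of layers concrete, here is a rough Python sketch of what such a table summarises. It is not how SPSS implements the procedure: the sales records are invented, and `olap_layers` is a hypothetical helper that handles a single grouping variable and one summary variable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (region, product, sales).
# Region and product are nominal grouping variables; sales is a scale variable.
records = [
    ("North", "Tea", 120.0),
    ("North", "Coffee", 80.0),
    ("South", "Tea", 100.0),
    ("South", "Coffee", 60.0),
]


def olap_layers(records, group_index, value_index):
    """Sum, mean and N of the summary variable for each category, plus a Total layer.

    Assumes no category is itself named "Total".
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_index]].append(record[value_index])
        groups["Total"].append(record[value_index])
    return {
        category: {"sum": sum(values), "mean": mean(values), "n": len(values)}
        for category, values in groups.items()
    }


layers = olap_layers(records, group_index=0, value_index=2)
print(layers["Total"])   # the default layer
print(layers["North"])   # like picking "North" from the drop-down
```

Choosing a key of `layers` plays the role of selecting a category from the drop-down menu, with "Total" as the default view.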

Submitted By:

Naquib Ahmed (14028)

Operations, Group-A
