Group A - Session 5 & 6
Today we continued to learn about cluster analysis, this time focusing on the practical aspects of hierarchical clustering.
Hierarchical clustering is usually applied to variables rather than to cases. As a rule of thumb, it suits small sets of fewer than about fifty objects, which is typical of variables, whereas a data set usually contains more than fifty cases.
In hierarchical clustering, we learnt mainly about two tools for finding the relations between variables. On the basis of these relations, we should be able to take helpful business decisions. The two tools are:
a) Dendrogram
b) Proximity Matrix
Dendrogram –
A Dendrogram (from Greek dendron "tree", -gramma "drawing") is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. The Dendrogram is a visual representation of the spot correlation data. The individual spots are arranged along the bottom of the Dendrogram and referred to as leaf nodes. Spot clusters are formed by joining individual spots or existing spot clusters, with the join point referred to as a node. This can be seen in the diagram below. The vertical axis is labelled distance and refers to a distance measure between spots or spot clusters. The height of the node can be thought of as the distance value between the right and left sub-branch clusters.
It becomes difficult to interpret the distance between spot clusters as the clusters grow in size. One way to think about the expression profile behavior of two spots is to see how far up the Dendrogram you need to go to move from one spot to the other. In the Dendrogram above, to get from the spot on the left to the spot in the middle, you need to move up to a distance of 0.6 (just follow the branches).
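The idea of "how far up you need to go" can be sketched in a few lines of Python. This is only an illustration, not what SPSS does internally: the three objects A, B and C and their distances are made up (0.6 is chosen to echo the example above), single linkage is assumed as the joining rule, and the function names `single_linkage` and `join_height` are hypothetical.

```python
from itertools import combinations

# Hypothetical pairwise distances between three objects (made-up values).
distances = {
    frozenset(("A", "B")): 0.6,
    frozenset(("A", "C")): 0.9,
    frozenset(("B", "C")): 0.8,
}


def single_linkage(dist, labels):
    """Repeatedly join the two closest clusters and record each join.

    Assumes single linkage: the distance between two clusters is the
    smallest distance between any member of one and any member of the other.
    Returns a list of (cluster_1, cluster_2, height) tuples in join order.
    """
    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            d = min(dist[frozenset((a, b))] for a in c1 for b in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        # Replace the two joined clusters with their union (a new node).
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, d))
    return merges


def join_height(merges, a, b):
    """Height of the first node whose cluster contains both a and b."""
    for c1, c2, height in merges:
        merged = c1 | c2
        if a in merged and b in merged:
            return height
    raise ValueError("objects never end up in the same cluster")


merges = single_linkage(distances, ["A", "B", "C"])
print(join_height(merges, "A", "B"))  # height of the node joining A and B
print(join_height(merges, "A", "C"))  # height where C joins the {A, B} cluster
```

Here A and B are joined first, so the height of their node is simply their own distance. C joins later, at the smallest distance between C and any member of the existing cluster, which is why the height between clusters gets harder to interpret as clusters grow.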
Proximity Matrix –
It is the distance matrix that SPSS produces as output. The matrix is symmetric, meaning that the numbers in the lower half are the same as the numbers in the upper half. Quite often only the lower half of a symmetric matrix is displayed, with other information shown in the upper half (for example, distances in one half and correlation coefficients in the other).
A proximity is a similarity if a larger value for a pair of objects means we think they are closer or more alike. A proximity is a dissimilarity if a smaller value for a pair of objects means we think they are closer or more alike. Proximities are normally symmetric, so that the proximity of object a to object b is the same as the proximity of object b to object a.
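As a rough sketch of what such a matrix contains, the Python below builds a small dissimilarity matrix by hand. The variable names and values are invented for illustration, and Euclidean distance is assumed as the measure (SPSS offers several others).

```python
import math

# Hypothetical data: three variables, each measured on the same four cases.
variables = {
    "price":   [1.0, 2.0, 3.0, 4.0],
    "quality": [1.0, 2.0, 3.0, 5.0],
    "service": [4.0, 3.0, 2.0, 1.0],
}

names = list(variables)
n = len(names)

# Euclidean distance is a dissimilarity: smaller means more alike.
matrix = [[math.dist(variables[r], variables[c]) for c in names] for r in names]

# The proximity of a to b equals the proximity of b to a.
is_symmetric = all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

# Because of the symmetry, printing only the lower half loses nothing.
for i, row_name in enumerate(names):
    cells = "  ".join(f"{matrix[i][j]:.3f}" for j in range(i + 1))
    print(f"{row_name:<8}{cells}")

# The pair with the smallest dissimilarity is the most alike.
pairs = [(matrix[i][j], names[j], names[i]) for i in range(n) for j in range(i)]
closest = min(pairs)
print("most alike:", closest[1], "and", closest[2])
```

With a similarity measure such as a correlation coefficient, the reading is reversed: the pair with the largest value would be the most alike.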
Finally, we learnt about the OLAP cube.
OLAP – Online Analytical Processing
The OLAP (Online Analytical Processing) Cubes procedure calculates totals, means, and other univariate statistics for continuous summary variables within categories of one or more categorical grouping variables. A separate layer in the table is created for each category of each grouping variable. Summary variables are continuous, measured at the scale (interval or ratio) level, whereas grouping variables are categorical and may be ordinal or nominal.
Double-click on the table to activate it and a drop-down menu will appear. By default it shows the total, but you can pull it down and select any individual category.
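The layers described above can be imitated with a short Python sketch. It is not the SPSS procedure itself: the sales records, the grouping variables (region and product) and the function name `olap_layers` are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (region, product, sales).
# region and product are grouping variables, sales is the summary variable.
records = [
    ("North", "Tea", 100.0),
    ("North", "Coffee", 150.0),
    ("South", "Tea", 200.0),
    ("South", "Coffee", 90.0),
]


def olap_layers(rows, group_index, value_index):
    """Build one layer of statistics per category, plus a 'Total' layer."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_index]].append(row[value_index])
        groups["Total"].append(row[value_index])
    return {
        name: {"sum": sum(values), "mean": mean(values), "n": len(values)}
        for name, values in groups.items()
    }


by_region = olap_layers(records, group_index=0, value_index=2)
by_product = olap_layers(records, group_index=1, value_index=2)

print(by_region["Total"])   # the layer shown by default
print(by_region["South"])   # like picking "South" from the drop-down
print(by_product["Tea"])
```

Choosing a key from `by_region` plays the role of the drop-down menu: "Total" corresponds to the default view, and each other key corresponds to one category of the grouping variable.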
Submitted By:
Naquib Ahmed(14028)
Operations, Group-A