Wednesday, 5 September 2012

Business Analytics Session 5&6 group G

Malvika Bhagat
Group G
14087

Today we started our session by analyzing "The mobile services", i.e. which features of the phone are being used by the customer, and later built a story on it using the OLAP Cube.

We started by classifying the data using hierarchical clustering.
In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
1) Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. 
2) Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
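
To make the "bottom up" idea concrete, here is a minimal sketch of agglomerative clustering in plain Python. It is only an illustration, not the procedure used in class: the sample points, the function name agglomerative and the choice of single linkage (distance between the closest members of two clusters) are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def agglomerative(points, target_clusters=1):
    """Single-linkage agglomerative clustering (illustrative sketch).

    Returns the final clusters (lists of point indices) and the merge history.
    """
    # Bottom up: every observation starts in its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage (an assumption): distance between the closest members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > target_clusters:
        # Find the two clusters that are closest to each other and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return clusters, merges


# Hypothetical 2-D observations, made up for this example.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, target_clusters=2)

for left, right, distance in merges:
    print(f"merged {left} and {right} at distance {distance:.2f}")
print("final clusters:", clusters)
```

The merge history records which clusters were joined and at what distance, which is the same information a tree diagram of the clustering displays.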

As the number of objects to be clustered should be less than 50, we ran the analysis on variables rather than cases.
The three parameters of clustering:
1) Selection of variables. (Objective: for what purpose one is forming these groups)
2) Distance measurement: Proximity Matrix.
3) Clustering criteria.
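
As a rough illustration of clustering variables rather than cases, the sketch below transposes a small table and builds a proximity matrix between the variables. The feature names, the 0/1 usage data and the use of Euclidean distance are assumptions for the example, not the class data set.

```python
from math import dist

# Hypothetical usage data: rows are customers (cases), columns are features.
features = ["calls", "sms", "internet"]
rows = [
    (1, 1, 0),
    (1, 0, 0),
    (1, 1, 1),
    (0, 0, 1),
]

# Transpose so that each variable (column) becomes an object to be clustered.
columns = list(zip(*rows))

# Proximity matrix: Euclidean distance between every pair of variables.
proximity = [[dist(a, b) for b in columns] for a in columns]

for name, row in zip(features, proximity):
    print(name.ljust(10), " ".join(f"{value:.2f}" for value in row))
```

The matrix is symmetric with zeros on the diagonal, so only the entries below (or above) the diagonal carry information.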

We also used a dendrogram to see the details/order of cluster formation. A dendrogram (from Greek dendron "tree", -gramma "drawing") is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. Dendrograms are often used in computational biology to illustrate the clustering of genes or samples.

Binary Jaccard:
The Jaccard index, also known as the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets.
The Jaccard coefficient measures similarity between sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:
 J(A,B) = \frac{|A \cap B|}{|A \cup B|}.

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:
 J_{\delta}(A,B) = 1 - J(A,B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}.
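
The two definitions above can be sketched in a few lines of Python. The function names and the two customers' feature sets are made up for illustration; returning a similarity of 1.0 for two empty sets is a convention assumed here, since the formula itself is undefined in that case.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Dissimilarity: one minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical sets of phone features used by two customers.
customer_1 = {"calls", "sms", "camera"}
customer_2 = {"calls", "sms", "internet", "games"}

print("similarity:", jaccard_similarity(customer_1, customer_2))
print("distance:  ", jaccard_distance(customer_1, customer_2))
```

Here the two customers share 2 features out of 5 distinct features in total, so the similarity is 2/5 and the distance is 3/5.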

A profile of values can be described by three aspects: level, amplitude and pattern. Level refers to the general size of the numbers and is measured by the mean of all the values. Amplitude refers to the extremeness or variability of the numbers and is measured by the standard deviation. Pattern refers to the sequence of ups and downs in the values as we move from case to case. It is not measurable in isolation. We can ask whether two profiles have the same pattern, and even how different they are from each other, but there is no monadic measurement of pattern, i.e. none that applies to a single profile on its own.


The Euclidean distance between two profiles is a function of differences in level (mean), differences in amplitude, and differences in pattern. Only if two profiles are the same on all three aspects will the Euclidean distance be zero. Euclidean distance is defined as the square root of the sum of squared differences between two profiles.
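
A small sketch of this definition, using made-up profiles: the second profile is the first one shifted up by 10, so the two share pattern and amplitude and differ only in level, yet the Euclidean distance between them is far from zero. The population standard deviation (pstdev) is used for amplitude here as an assumption; the notes do not say which variant was meant.

```python
from math import sqrt
from statistics import mean, pstdev


def euclidean(p, q):
    """Square root of the sum of squared differences between two profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Hypothetical profiles: `shifted` has the same ups and downs as `base`,
# the same spread, but a higher level.
base = [1.0, 3.0, 2.0, 4.0]
shifted = [x + 10.0 for x in base]

print("level:    ", mean(base), "vs", mean(shifted))
print("amplitude:", pstdev(base), "vs", pstdev(shifted))
print("distance: ", euclidean(base, shifted))
```

If the aim is to compare pattern only, the profiles would first need to be standardized so that level and amplitude no longer contribute to the distance.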
 
Thereafter we did the OLAP Cube.
An OLAP cube is an array of data that is understood in terms of its 0 or more dimensions. OLAP is an acronym for online analytical processing. OLAP is a computer-based technique for analyzing business data in the search for business intelligence.




A cube can be thought of as a generalization of a two-dimensional spreadsheet. For example, a company might wish to summarize financial data by product, by time period and by city, in order to compare actual and budgeted expenses. Product, time, city and scenario (actual and budget) are the data's dimensions.

The user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down/up, is sometimes called "slice and dice". Common operations include slice and dice, drill down, roll up, and pivot.
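
Two of these operations, slice and roll up, can be sketched on a tiny cube held in a Python dictionary. This is only an illustration under assumed data: the expense figures, the dimension values and the helper names slice_cube and roll_up are all hypothetical, and real OLAP tools store and query cubes very differently.

```python
from collections import defaultdict

DIMENSIONS = ("product", "period", "city", "scenario")

# Hypothetical expense figures; each cell is addressed by one value per dimension.
cube = {
    ("Phone", "Q1", "Delhi", "actual"): 120,
    ("Phone", "Q1", "Delhi", "budget"): 100,
    ("Phone", "Q2", "Mumbai", "actual"): 90,
    ("Tablet", "Q1", "Delhi", "actual"): 60,
    ("Tablet", "Q2", "Mumbai", "budget"): 80,
}


def slice_cube(cube, dimension, value):
    """Slice: keep only the cells where one dimension has a fixed value."""
    axis = DIMENSIONS.index(dimension)
    return {key: amount for key, amount in cube.items() if key[axis] == value}


def roll_up(cube, keep):
    """Roll up: sum the measure over every dimension not listed in `keep`."""
    axes = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[tuple(key[a] for a in axes)] += amount
    return dict(totals)


actuals = slice_cube(cube, "scenario", "actual")
by_product = roll_up(actuals, keep=["product"])
print(by_product)
```

Drilling down is the reverse of rolling up: calling roll_up with more dimensions in keep, for example ["product", "city"], shows the same totals at a finer level of detail.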

An OLAP Cube consists of:
1) Summary: gives information about the item being analyzed.
2) Grouping/category: specifies the various components (categories) of the item.
