Tuesday, 11 September 2012

Boxplot_session 7&8_Group-A


Boxplot
A boxplot is another useful visualization for viewing how data are distributed. In descriptive statistics, a boxplot is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). Boxplots are non-parametric: they display differences between populations without making any assumptions about the underlying statistical distribution. Boxplots can be drawn either horizontally or vertically. The spacings between the different parts of the box help indicate the degree of dispersion (spread) and skewness in the data, and help identify outliers.
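As a rough illustration of the five-number summary, here is a minimal Python sketch using only the standard library. The function name `five_number_summary` and the salary figures are made up for this example (they are not the data behind the plot in this post), and the "inclusive" quartile method is just one of several conventions, so other software may report slightly different Q1 and Q3 values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # "inclusive" is one common quartile convention; other conventions
    # can give slightly different Q1 and Q3 for the same data.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical salaries (in thousands), invented for illustration only.
salaries = [22, 25, 27, 28, 30, 31, 33, 35, 40, 90]
print(five_number_summary(salaries))
```

The five values returned are exactly what a boxplot draws: the two extremes, the two edges of the box, and the line inside the box.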
Outliers are extreme values, defined as values that fall outside the inner fences. The asterisks or stars in the plot mark extreme outliers: cases/rows whose values lie more than three times the height of the box beyond the edge of the box.
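The outlier rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: distances are measured from the nearest edge of the box, the 1.5 and 3 multipliers are the thresholds described in this post, and the function name `classify_outliers` and the sample data are hypothetical.

```python
import statistics

def classify_outliers(values):
    """Split values into (mild outliers, extreme outliers)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box = q3 - q1  # height of the box (interquartile range)
    mild, extreme = [], []
    for v in values:
        # Distance beyond the nearest box edge; 0 if v is inside the box.
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * box:
            extreme.append(v)   # would be drawn as an asterisk/star
        elif distance > 1.5 * box:
            mild.append(v)      # outside the inner fence, but not extreme
    return mild, extreme

# Hypothetical salaries (in thousands), invented for illustration only.
salaries = [22, 25, 27, 28, 30, 31, 33, 35, 40, 55, 90]
mild, extreme = classify_outliers(salaries)
print("mild:", mild, "extreme:", extreme)
```

With this sample, the box runs from 27.5 to 37.5, so 55 lies beyond the inner fence and 90 lies more than three box heights above the box.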



Exploring different parts of the boxplot
• The dark line in the middle of each box is the median of salary. Half of the cases/rows have a value greater than the median, and half have a value lower. Unlike the mean, the median is not strongly influenced by cases/rows with extreme values. In this example, the median is lower than the mean. The difference between the two indicates that a few cases/rows with extreme values are pulling the mean up; that is, a few employees earn large salaries.
• The bottom of the box indicates the 25th percentile. Twenty-five percent of cases/rows have values below the 25th percentile. The top of the box represents the 75th percentile. Twenty-five percent of cases/rows have values above the 75th percentile. This means that 50% of the cases/rows lie within the box. The box is much shorter for females than for males, which is one clue that salary varies less for females than for males. The top and bottom of the box are often called hinges.
• The T-bars that extend from the boxes are called inner fences or whiskers. Each extends up to 1.5 times the height of the box beyond the box edge, stopping at the most extreme case/row within that range; if no case/row lies farther out, the whisker ends at the minimum or maximum value. If the data are normally distributed, roughly 99% of the data are expected to lie between the inner fences. In this example, the inner fences extend less for females than for males, another indication that salary varies less for females than for males.
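To make the parts above concrete, here is a small Python sketch that compares the mean with the median and finds where the whiskers would end. It assumes the 1.5 times box-height rule described in the list; the helper name `whisker_ends` and the salary figures are hypothetical and not taken from the data behind the plot.

```python
import statistics

def whisker_ends(values):
    """Return the (lower, upper) whisker ends under the 1.5 x box-height rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)  # farthest a whisker may extend from the box
    # Each whisker stops at the most extreme value still within reach.
    inside = [v for v in values if q1 - reach <= v <= q3 + reach]
    return min(inside), max(inside)

# Hypothetical salaries (in thousands), invented for illustration only.
salaries = [22, 25, 27, 28, 30, 31, 33, 35, 40, 90]

print("mean:", statistics.mean(salaries))      # pulled up by the 90
print("median:", statistics.median(salaries))  # barely affected by the 90
print("whiskers:", whisker_ends(salaries))
```

In this sample the single large salary raises the mean above the median, and the upper whisker stops at 40 rather than at the maximum, leaving 90 to be plotted separately as an outlier.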

Rahul Rauniyar
14038
Operation Batch
Group-A
