The Box Plot
The box plot (a.k.a. box-and-whisker diagram) is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. In the simplest box plot, the central rectangle spans the first quartile to the third quartile (the interquartile range, or IQR). A segment inside the rectangle shows the median, and "whiskers" above and below the box show the locations of the minimum and maximum.
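As a rough illustration, the five-number summary can be computed with Python's standard library. This is only a sketch: the function name `five_number_summary` and the sample values are made up for this post, and the "inclusive" quartile method is one of several conventions, so other software (including SPSS) may report slightly different quartiles for the same data.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, Q3 and max for a list of numbers.

    Assumption: quartiles use the 'inclusive' interpolation method;
    other quartile conventions can give slightly different results.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}

# Hypothetical sample data, not taken from any real dataset.
summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
iqr = summary["q3"] - summary["q1"]  # height of the box
print(summary, "IQR =", iqr)
```

For these nine values the box would run from 3 to 7 with the median line at 5, and the whiskers of the simplest box plot would reach down to 1 and up to 9.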
This simplest possible box plot displays the full range of variation (from min to max), the likely range of variation (the IQR), and a typical value (the median). Real datasets quite often contain surprisingly high maximums or surprisingly low minimums; such values are called outliers. Two kinds are commonly distinguished:
- Extreme outliers (simply called "outliers" in some texts) lie 3×IQR or more above the third quartile or 3×IQR or more below the first quartile.
- Outliers (called "suspected outliers" in those same texts) are slightly more central: they lie 1.5×IQR or more above the third quartile or 1.5×IQR or more below the first quartile, but less than 3×IQR away.
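The two thresholds above can be sketched in a few lines of Python. The function name `classify_outliers` and the sample values are hypothetical, and the quartiles again assume the "inclusive" method, so this is an illustration of the 1.5×IQR and 3×IQR rules rather than a reproduction of what any particular package does.

```python
import statistics

def classify_outliers(values):
    """Split values into (suspected, extreme) outliers by distance from the box.

    Assumption: quartiles use the 'inclusive' method; other conventions
    may move the thresholds slightly.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    suspected, extreme = [], []
    for x in values:
        # Distance beyond the nearer box edge; zero or negative means inside the box.
        distance = max(q1 - x, x - q3)
        if distance <= 0:
            continue
        if distance >= 3 * iqr:
            extreme.append(x)
        elif distance >= 1.5 * iqr:
            suspected.append(x)
    return suspected, extreme

# Hypothetical data: Q1 = 13, Q3 = 18, so IQR = 5.
suspected, extreme = classify_outliers([10, 12, 13, 14, 15, 16, 18, 30, 60])
print("suspected:", suspected)
print("extreme:", extreme)
```

Here 30 lies 12 units above the third quartile (between 1.5×IQR = 7.5 and 3×IQR = 15), so it is a suspected outlier, while 60 lies 42 units above it and counts as extreme.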
An example of a box plot in SPSS
• The dark line in the middle of each box is the median salary. Half of the cases/rows have a value greater than the median, and half have a lower value. Like the mean, the median is a measure of central tendency. Unlike the mean, it is less influenced by cases/rows with extreme values. Here the mean is larger than the median, which indicates that a few cases/rows with extreme values are pulling the mean up. That is, there are a few employees who earn large salaries.
• The bottom of the box indicates the 25th percentile. Twenty-five percent of cases/rows have values below the 25th percentile. The top of the box represents the 75th percentile. Twenty-five percent of cases/rows have values above the 75th percentile. This means that 50% of the cases/rows lie within the box. The box is much shorter for females than for males. This is one clue that salary varies less for females than for males. The top and bottom of the box are often called hinges.
• The T-bars that extend from the boxes are called inner fences or whiskers. Each one extends to the most extreme value that lies within 1.5 times the height of the box from the box edge; if no case/row lies beyond that distance, it extends to the minimum or maximum value. If the data are distributed normally, approximately 99% of the data are expected to lie between the inner fences. In this example, the inner fences extend less for females than for males, another indication that salary varies less for females than for males.
• The points are outliers. These are defined as values that do not fall within the inner fences. The asterisks or stars are extreme outliers. These represent cases/rows whose values lie more than three times the height of the box beyond its edge. There are several outliers for both females and males. They explain why the mean is greater than the median: the extreme salaries raise the mean but barely affect the median.
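The whisker rule and the normal-distribution remark can both be checked with a short Python sketch. The names `whisker_ends` and `normal_coverage_between_fences` and the sample values are made up for illustration, the quartiles assume the "inclusive" method, and nothing here claims to reproduce SPSS's internal calculation.

```python
from statistics import NormalDist, quantiles

def whisker_ends(values):
    """Return the lowest and highest values within 1.5 box heights of the box."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)  # 1.5 times the height of the box
    low = min(x for x in data if x >= q1 - reach)
    high = max(x for x in data if x <= q3 + reach)
    return low, high

def normal_coverage_between_fences():
    """Share of a normal distribution lying within 1.5 box heights of the box."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    q1, q3 = std.inv_cdf(0.25), std.inv_cdf(0.75)
    reach = 1.5 * (q3 - q1)
    return std.cdf(q3 + reach) - std.cdf(q1 - reach)

# Hypothetical data: Q1 = 13, Q3 = 18, so the whiskers may reach 7.5 units from the box.
ends = whisker_ends([10, 12, 13, 14, 15, 16, 18, 20, 60])
coverage = normal_coverage_between_fences()
print("whiskers end at:", ends)
print("normal coverage: %.3f" % coverage)
```

In this sample the upper whisker stops at 20 because 60 lies beyond the inner fence and would be drawn as a separate point. The coverage comes out at roughly 0.993, so under a normal distribution fewer than 1 in 100 cases would be expected to fall outside the whiskers.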
Submitted by
Ruchika.M.S
HR – Group F
Roll No. 14044