Boxplot
A boxplot is another useful visualization for viewing how data are distributed. In descriptive statistics, a boxplot is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). Boxplots are non-parametric: they display differences between populations without making any assumptions about the underlying statistical distribution. Boxplots can be drawn either horizontally or vertically. The spacings between the different parts of the box help indicate the degree of dispersion (spread) and skewness in the data, and help identify outliers.
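As a rough illustration of the five-number summary described above, here is a minimal Python sketch using only the standard library. The salary figures and the function name five_number_summary are made up for illustration, and the "inclusive" quartile method is just one of several conventions; other software may report slightly different values for Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # Quartile conventions differ between packages; "inclusive" is an
    # assumption made here, not something the text prescribes.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical salaries (in thousands), invented for this example
salaries = [22, 25, 27, 30, 31, 35, 40, 65, 95]
print(five_number_summary(salaries))
```

These five values are what the box, the median line, and the extremes of a boxplot are built from.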
Outliers are extreme values, defined as values that do not fall within the inner fences. The asterisks or stars in the plot mark extreme outliers: cases/rows whose values lie more than three times the height of the box beyond the edge of the box.
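The outlier rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: distances are measured from the nearest edge of the box, a value more than 1.5 box heights away counts as an outlier, and a value more than 3 box heights away counts as an extreme outlier. The function name classify_outliers and the data are hypothetical.

```python
import statistics

def classify_outliers(values):
    """Split values into (inside, outliers, extreme outliers)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_height = q3 - q1  # the interquartile range
    inside, outliers, extremes = [], [], []
    for v in values:
        # Distance beyond the nearest box edge (0 if v is within the box)
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * box_height:
            extremes.append(v)   # would be drawn as an asterisk or star
        elif distance > 1.5 * box_height:
            outliers.append(v)   # beyond the inner fence, but not extreme
        else:
            inside.append(v)
    return inside, outliers, extremes

# Hypothetical salaries (in thousands), invented for this example
salaries = [22, 25, 27, 30, 31, 35, 40, 65, 95]
inside, outliers, extremes = classify_outliers(salaries)
print("outliers:", outliers)
print("extreme outliers:", extremes)
```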
Exploring different parts of the boxplot
• The dark line in the middle of the boxes is the median of salary. Half of the cases/rows have a value greater than the median, and half have a value lower. Unlike the mean, the median is not strongly influenced by cases/rows with extreme values. In this example, the median is lower than the mean. The difference between the mean and median indicates that there are a few cases/rows with extreme values that are elevating the mean. That is, there are a few employees who earn large salaries.
• The bottom of the box indicates the 25th percentile.
Twenty-five percent of cases/rows have values below the 25th percentile. The
top of the box represents the 75th percentile. Twenty-five percent of
cases/rows have values above the 75th percentile. This means that 50% of the
cases/rows lie within the box. The box is much shorter for females than for
males. This is one clue that salary varies less for females than for
males. The top and bottom of the box are often called hinges.
• The T-bars that extend from the boxes are called inner fences or whiskers. These extend up to 1.5 times the height of the box beyond each hinge or, if no case/row lies that far out, to the minimum or maximum value. If the data are distributed normally, approximately 99% of the data are expected to lie between the inner fences. In this example, the inner fences extend less for females compared to males, another indication that salary varies less for females than for males.
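To make the parts listed above concrete, the following Python sketch computes where the whiskers would end and compares the mean with the median. It assumes the whiskers stop at the most extreme data values within 1.5 box heights of each hinge. The function name whisker_ends and the salary data are hypothetical and are not the data behind the example discussed above.

```python
import statistics

def whisker_ends(values):
    """Return (lower, upper) whisker ends for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)  # 1.5 times the height of the box
    # Each whisker stops at the most extreme value still within reach
    lower = min(v for v in values if v >= q1 - reach)
    upper = max(v for v in values if v <= q3 + reach)
    return lower, upper

# Hypothetical salaries (in thousands), invented for this example
salaries = [22, 25, 27, 30, 31, 35, 40, 65, 95]
print("whiskers:", whisker_ends(salaries))
print("median:", statistics.median(salaries))
print("mean:", statistics.mean(salaries))
```

With data like these, a couple of large salaries pull the mean above the median, which is the same pattern described in the first bullet.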
Rahul Rauniyar
14038
Operation
Batch
Group-A