
Chapter 15.3 - Standard Deviation

In the previous section, we completed a discussion on mean deviation. In this section, we will see standard deviation.

• In the previous sections, we studied two items:
   ♦ Mean deviation about the mean.
   ♦ Mean deviation about the median.
• Both have some limitations. Those limitations can be explained in 4 steps:
1. Consider the mean deviation about mean.
• We take the absolute values of the deviations. So the -ve signs are ignored.
• When a result is obtained by ignoring the signs, that result cannot be used for any further algebraic calculations.
2. Consider the mean deviation about median.
• Here also, we take the absolute values of the deviations. So the -ve signs are ignored.
• So this result is also not helpful for any further algebraic calculations.
3. Consider the two sums:
   ♦ Sum of absolute deviations about mean
   ♦ Sum of absolute deviations about median
• For any series that we consider, the first sum mentioned above will be greater than or equal to the second sum.
• But we divide both sums by the same quantity, namely the total number of observations. So a confusion arises as to which sum gives the more accurate picture.
4. Consider the two series below:
   ♦ 35, 40, 47, 50, 54, 60, 65, 68, 72, 79
   ♦ 5, 8, 47, 50, 54, 60, 65, 68, 72, 79
• In the first series, the observations are more or less evenly distributed about the median.
• In the second series, the median will not represent the true nature of the data because two extreme values (5 and 8) are present. In such a series, if we use the deviations about the median, we will get inaccurate results.
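A quick numerical check of the points made in steps (3) and (4) can be done with the minimal Python sketch below. This sketch is only an illustration (it is not part of the original chapter); it computes the mean deviation about the mean and about the median for the two series given in step (4).

# Mean deviation about the mean and about the median, for the two series in step (4).
series_1 = [35, 40, 47, 50, 54, 60, 65, 68, 72, 79]
series_2 = [5, 8, 47, 50, 54, 60, 65, 68, 72, 79]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    ys = sorted(xs)
    n = len(ys)
    mid = n // 2
    # For an even number of observations, take the average of the two middle values.
    return (ys[mid - 1] + ys[mid]) / 2 if n % 2 == 0 else ys[mid]

def mean_deviation(xs, about):
    # Mean of the absolute deviations taken about the given value.
    return sum(abs(x - about) for x in xs) / len(xs)

for name, xs in [("series 1", series_1), ("series 2", series_2)]:
    print(name,
          "| MD about mean =", round(mean_deviation(xs, mean(xs)), 2),
          "| MD about median =", round(mean_deviation(xs, median(xs)), 2))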


• Based on the limitations mentioned above, we can write:
An improved method is necessary to measure dispersion.
Standard deviation is such an improved method, which is widely used in science, engineering, economics, business studies and sociology.


Variance and Standard deviation

This can be explained in 22 steps:
1. Recall that, for calculating mean deviation, we used the absolute values of the deviations. This was necessary to prevent the cancellation between +ve and -ve values.
• By taking the absolute values, we could ensure that only non-negative values come into the calculations.
2. There is another method to ensure the presence of only non-negative values:
Taking squares of the deviations.
3. Consider a series with n observations.
   ♦ Let $x_1, x_2, x_3, x_4,~.~.~.~, x_n$ be those n observations.
   ♦ Let $\bar{x}$ be the mean of those n observations.
• Then the deviations will be:
$(x_1 - \bar{x}), (x_2 - \bar{x}), (x_3 - \bar{x}), (x_4 - \bar{x}),~.~.~.~, (x_n - \bar{x})$
• So the squares of the deviations will be:
$(x_1 - \bar{x})^2, (x_2 - \bar{x})^2, (x_3 - \bar{x})^2, (x_4 - \bar{x})^2,~.~.~.~, (x_n - \bar{x})^2$
4. Now we can write the sum of those squares:
$(x_1 - \bar{x})^2~+~(x_2 - \bar{x})^2~+~(x_3 - \bar{x})^2~+~(x_4 - \bar{x})^2~+~.~.~.~+~ (x_n - \bar{x})^2~=~\sum{(x_i - \bar{x})^2}$
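As a small illustration of the notation in steps (3) and (4), the following Python sketch (an illustrative sketch only, using a made-up series) forms the deviations, their squares and the sum $\sum{(x_i - \bar{x})^2}$:

# Deviations from the mean, their squares, and the sum of the squares.
x = [4, 7, 9, 10, 15]          # a made-up series of n = 5 observations
x_bar = sum(x) / len(x)        # the mean, x-bar

deviations = [xi - x_bar for xi in x]
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)

print("mean =", x_bar)
print("deviations =", deviations)
print("squared deviations =", squared_deviations)
print("sum of squares =", sum_of_squares)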
5. Consider the sum in the above step (4).
• No term in the L.H.S can become -ve. This is because all the terms are squares.
• So each term in the L.H.S will be either zero or +ve.
• Then, if the R.H.S is zero, every term in the L.H.S must be zero.
6. Suppose that the R.H.S is zero.
Then we can write:

$\begin{array}{llcl}
{\text{(i)}} & {(x_1 - \bar{x})^2} & {~=~} & {0} \\
{\Rightarrow} & {x_1 - \bar{x}} & {~=~} & {0} \\
{\Rightarrow} & {x_1} & {~=~} & {\bar{x}} \\
{\text{(ii)}} & {(x_2 - \bar{x})^2} & {~=~} & {0} \\
{\Rightarrow} & {x_2 - \bar{x}} & {~=~} & {0} \\
{\Rightarrow} & {x_2} & {~=~} & {\bar{x}} \\
{\text{(iii)}} & {(x_3 - \bar{x})^2} & {~=~} & {0} \\
{\Rightarrow} & {x_3 - \bar{x}} & {~=~} & {0} \\
{\Rightarrow} & {x_3} & {~=~} & {\bar{x}} \\
\end{array}$

so on . . .

7. Based on the above step (6), we can write:
• If the sum of the squares of the deviations is zero, then each observation will be equal to $\bar{x}$
   ♦ That means, every observation is the same.
   ♦ That means, there is no dispersion.
• So we can write:
If the sum of the squares of the deviations is zero, then the series has no dispersion.
8. Consider again the result in (4).
• If the deviations are small, then the sum of the squares of the deviations will be small.
• If the deviations are small, then the dispersion of the series is said to be small.
• So we can write:
If the sum of the squares of the deviations is small, then the series has small dispersion.
9. Consider again the result in (4).
• If the deviations are large, then the sum of the squares of the deviations will be large.
• If the deviations are large, then the dispersion of the series is said to be large.
• So we can write:
If the sum of the squares of the deviations is large, then the series has large dispersion.
10. Let us compile the results in the above steps (7), (8) and (9):
(i) If the sum of squares is zero, then there is no dispersion.
(ii) If the sum of squares is small, then the dispersion is small.
(iii) If the sum of squares is large, then the dispersion is large.
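The three cases compiled in step (10) can be seen numerically in the Python sketch below. The three series used here are made-up examples (not from the original chapter), chosen so that all of them have the same mean:

# Sum of squares of deviations for three made-up series with the same mean (30).
no_dispersion    = [30, 30, 30, 30, 30]      # every observation equals the mean
small_dispersion = [28, 29, 30, 31, 32]      # observations close to the mean
large_dispersion = [10, 20, 30, 40, 50]      # observations far from the mean

def sum_of_squares(xs):
    x_bar = sum(xs) / len(xs)
    return sum((xi - x_bar) ** 2 for xi in xs)

for name, xs in [("no dispersion", no_dispersion),
                 ("small dispersion", small_dispersion),
                 ("large dispersion", large_dispersion)]:
    print(name, "->", sum_of_squares(xs))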
11. Based on the above step (10), we are tempted to conclude that:
Sum of squares of the deviations is a good measure of dispersion.
• But before making such a conclusion, we will look at some examples.
12. As our first example, let us consider the following series of 6 observations:
5, 15, 25, 35, 45, 55
• The table 15.17 below shows the calculations:

Table 15.17

• We can write:
   ♦ $\bar{x}~=~\frac{\sum{f_i x_i}}{\sum{f_i}}~=~\frac{180}{6}~=~30$
   ♦ $\sum{(x_i - 30)^2}~=~1750$
13. As our second example, let us consider the following series of 31 observations:
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45
• The table 15.18 below shows the calculations:

Table 15.18

• We can write:
   ♦ $\bar{x}~=~\frac{\sum{f_i x_i}}{\sum{f_i}}~=~\frac{930}{31}~=~30$
   ♦ $\sum{(x_i - 30)^2}~=~2480$


Note:
• The sum of all the items in column VI is:
$(-15)^2 + (15)^2 + (-14)^2 + (14)^2 + (-13)^2 + (13)^2 +~.~.~.~+ (-1)^2 + (1)^2 + (0)^2$
• This can be written as:
$2[15^2 + 14^2 + 13^2 +~.~.~.~+ 1^2]$
• In the above result, the portion inside the square brackets is:
The sum of the squares of the first n natural numbers, where n = 15.
• We have a formula for calculating that sum (details here):
$\sum\limits_{k\,=\,1}^{k\,=\,n}{k^2}~=~\frac{n(2n + 1) (n+1)}{6}$
• Substituting n = 15, we get:
$\sum\limits_{k\,=\,1}^{k\,=\,15}{k^2}~=~\frac{15(30 + 1) (15+1)}{6}~=~\frac{15 \times 31 \times 16}{6}~=~1240$
• So the sum in column VI = (2 × 1240) = 2480
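The sums obtained in steps (12) and (13), and the formula used in the note above, can be cross-checked with the short Python sketch below (an illustrative sketch, not part of the original chapter):

# Cross-check of the sums of squares in the two examples, and of the n(n+1)(2n+1)/6 formula.
example_1 = [5, 15, 25, 35, 45, 55]
example_2 = list(range(15, 46))                  # 15, 16, ..., 45  (31 observations)

def sum_of_squares_about_mean(xs):
    x_bar = sum(xs) / len(xs)
    return sum((xi - x_bar) ** 2 for xi in xs)

print("example 1:", sum_of_squares_about_mean(example_1))   # expected 1750
print("example 2:", sum_of_squares_about_mean(example_2))   # expected 2480

# Sum of the squares of the first 15 natural numbers, by direct addition and by the formula.
n = 15
direct  = sum(k ** 2 for k in range(1, n + 1))
formula = n * (n + 1) * (2 * n + 1) // 6
print("direct =", direct, ", formula =", formula)            # both should be 1240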


14. Comparing the results in steps (12) and (13), and based on the knowledge that we acquired so far, we are forced to write this:
Sum of squares is larger for the second example. So the series in the second example has greater dispersion.
15. However, let us plot both the series on the same graph. It is shown in fig.15.3 below:

Fig.15.3

• The green circles denote the six observations in the first example.
• The red diamonds denote the thirty one observations in the second example.
• We see that:
   ♦ The mean in both cases is 30.
   ♦ The six observations of example 1 are dispersed along a wider length of the number line.
   ♦ The thirty-one observations of example 2 are dispersed along a narrower length of the number line.
• So we can write:
Example 2 has a lesser dispersion.
16. It is clear that what we wrote in step (14) is wrong.
◼ How did it go wrong?
• We can write a brief answer in 3 steps:
(i) In the second example, the individual deviations are small.
(ii) But compared to the first example, there is a much larger number of observations.
(iii) So the sum of the squares became larger.
17. So we must not use the sum of squares, as it is, to measure dispersion.
• Instead, we must adjust it according to the number of observations.
• To apply this adjustment, we divide the sum by the number of observations ($\sum{f_i}$).
• When we divide the sum by $\sum{f_i}$, we get the mean of the squares of the deviations.
18. This mean is called the variance. It is denoted by $\sigma^2$, which is read as 'sigma squared'.
• We can write:
$\text{Variance}~(\sigma^2) ~=~ \frac{\sum{(x_i- \bar{x})^2}}{\sum{f_i}}$
19. Let us calculate the variance for our examples:
• The variance of our first example will be:
$\frac{1750}{6}~=~291.67$
• The variance of our second example will be:
$\frac{2480}{31}~=~80$
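The variance calculation in steps (18) and (19) can be written as a minimal Python sketch (for illustration only), using the same two example series:

# Variance: the mean of the squares of the deviations about the mean.
def variance(xs):
    x_bar = sum(xs) / len(xs)
    return sum((xi - x_bar) ** 2 for xi in xs) / len(xs)

example_1 = [5, 15, 25, 35, 45, 55]
example_2 = list(range(15, 46))   # 15, 16, ..., 45

print("variance of example 1:", round(variance(example_1), 2))   # 1750/6 = 291.67 (rounded)
print("variance of example 2:", variance(example_2))             # 2480/31 = 80.0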
20. We see that the second example has a smaller variance.
• Based on this, we can write:
The data in the second example has a smaller dispersion.
21. While dealing with variance, we encounter a difficulty related to units. It can be explained using some examples.
Example 1:
This can be written in 2 steps:
(i) Suppose that the observations in a series are the heights of students in a class.
(ii) Then we will be using two units:
   ♦ Each observation ($x_i$) will be expressed in cm.
   ♦ The mean ($\bar{x}$) will also be expressed in cm.
   ♦ The variance ($\sigma^2$) will be expressed in cm².
Example 2:
This can be written in 2 steps:
(i) Suppose that the observations in a series are the weights of fruits from a field.
(ii) Then we will be using two units:
   ♦ Each observation ($x_i$) will be expressed in gm.
   ♦ The mean ($\bar{x}$) will also be expressed in gm.
   ♦ The variance ($\sigma^2$) will be expressed in gm².
22. In order to avoid the use of a unit and its square, we take the square root of the variance.
• Remember that there will be a +ve square root and a -ve square root. We must discard the -ve square root.
• The +ve square root of the variance is called the standard deviation. It is denoted by $\sigma$.
• So we can write:
$\sigma~=~\sqrt{\frac{\sum{(x_i- \bar{x})^2}}{\sum{f_i}}}$
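As an illustration of step (22), the minimal Python sketch below takes the +ve square root of the variance for our two example series. The built-in function statistics.pstdev (population standard deviation) is used only as a cross-check; it is not mentioned in the original chapter.

import math
import statistics

def standard_deviation(xs):
    x_bar = sum(xs) / len(xs)
    var = sum((xi - x_bar) ** 2 for xi in xs) / len(xs)   # variance, sigma squared
    return math.sqrt(var)                                  # the +ve square root only

example_1 = [5, 15, 25, 35, 45, 55]
example_2 = list(range(15, 46))   # 15, 16, ..., 45

for name, xs in [("example 1", example_1), ("example 2", example_2)]:
    print(name, "sigma =", round(standard_deviation(xs), 2),
          "(cross-check:", round(statistics.pstdev(xs), 2), ")")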


In the next section, we will see some solved examples.
