Sunday, October 20, 2019
Sum of Squares Formula Shortcut
The calculation of a sample variance or standard deviation is typically stated as a fraction. The numerator of this fraction is a sum of squared deviations from the mean. In statistics, the formula for this total sum of squares is

$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Here the symbol $\bar{x}$ refers to the sample mean, and the symbol $\sum$ tells us to add up the squared differences $(x_i - \bar{x})^2$ for all $i$.

While this formula works for calculations, there is an equivalent shortcut formula that does not require us to calculate the sample mean first. This shortcut formula for the sum of squares is

$$\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

Here the variable $n$ refers to the number of data points in our sample.

Standard Formula Example

To see how this shortcut formula works, we will consider an example that is calculated using both formulas. Suppose our sample is 2, 4, 6, 8. The sample mean is $(2 + 4 + 6 + 8)/4 = 20/4 = 5$. Now we calculate the difference between each data point and the mean 5:

2 - 5 = -3
4 - 5 = -1
6 - 5 = 1
8 - 5 = 3

We now square each of these numbers and add them together:

$$(-3)^2 + (-1)^2 + 1^2 + 3^2 = 9 + 1 + 1 + 9 = 20.$$

Shortcut Formula Example

Now we will use the same data set, 2, 4, 6, 8, with the shortcut formula to determine the sum of squares. We first square each data point and add the results together:

$$2^2 + 4^2 + 6^2 + 8^2 = 4 + 16 + 36 + 64 = 120.$$

The next step is to add together all of the data and square this sum: $(2 + 4 + 6 + 8)^2 = 20^2 = 400$. We divide this by the number of data points to obtain $400/4 = 100$.

We now subtract this number from 120: $120 - 100 = 20$. So the sum of the squared deviations is 20, exactly the number we found with the standard formula.

How Does This Work?

Many people simply accept the formula at face value without any idea why it works.
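Before turning to the algebra, a quick numerical check can at least show that the agreement in the examples above is not a coincidence. The minimal Python sketch below computes the sum of squares both ways; the function names `ss_standard` and `ss_shortcut` are our own, chosen for illustration. It uses Python's `fractions.Fraction` so that both computations are exact (with floating-point numbers the two results could differ in the last few digits because of rounding), reproduces the value 20 for the sample 2, 4, 6, 8, and then compares the two formulas on random integer samples.

```python
import random
from fractions import Fraction


def ss_standard(data):
    """Standard formula: compute the mean first, then sum (x - mean)^2."""
    n = len(data)
    mean = Fraction(sum(data), n)  # exact rational mean, no rounding
    return sum((x - mean) ** 2 for x in data)


def ss_shortcut(data):
    """Shortcut formula: sum of x^2 minus (sum of x)^2 / n."""
    n = len(data)
    return sum(x * x for x in data) - Fraction(sum(data) ** 2, n)


sample = [2, 4, 6, 8]
print(ss_standard(sample), ss_shortcut(sample))  # 20 20

# Compare the two formulas on random integer samples of various sizes.
random.seed(0)
for _ in range(1000):
    data = [random.randint(-100, 100) for _ in range(random.randint(1, 50))]
    assert ss_standard(data) == ss_shortcut(data)
print("formulas agree on all random samples")
```

A check like this only builds confidence on the samples tried; it is the algebra that shows the two formulas agree for every data set.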
By using a little algebra, we can see why this shortcut formula is equivalent to the standard way of calculating the sum of squared deviations.

Although a real-world data set may contain hundreds, if not thousands, of values, we will assume that there are only three data values: $x_1$, $x_2$, $x_3$. The same argument extends to a data set of any size.

We begin by noting that $x_1 + x_2 + x_3 = 3\bar{x}$. Written out in full, the sum of squares is

$$\sum_{i=1}^{3} (x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2.$$

We now use the fact from basic algebra that $(a - b)^2 = a^2 - 2ab + b^2$. This means that $(x_1 - \bar{x})^2 = x_1^2 - 2x_1\bar{x} + \bar{x}^2$. We do this for the other two terms of our summation, and we have:

$$x_1^2 - 2x_1\bar{x} + \bar{x}^2 + x_2^2 - 2x_2\bar{x} + \bar{x}^2 + x_3^2 - 2x_3\bar{x} + \bar{x}^2.$$

Collecting like terms gives:

$$x_1^2 + x_2^2 + x_3^2 + 3\bar{x}^2 - 2\bar{x}(x_1 + x_2 + x_3).$$

Substituting $x_1 + x_2 + x_3 = 3\bar{x}$, the last two terms combine to $3\bar{x}^2 - 6\bar{x}^2 = -3\bar{x}^2$, so the expression becomes:

$$x_1^2 + x_2^2 + x_3^2 - 3\bar{x}^2.$$

Now since $\bar{x} = (x_1 + x_2 + x_3)/3$, we have $3\bar{x}^2 = (x_1 + x_2 + x_3)^2/3$, and our formula becomes:

$$x_1^2 + x_2^2 + x_3^2 - \frac{(x_1 + x_2 + x_3)^2}{3}.$$

This is the special case $n = 3$ of the general shortcut formula stated above. For $n$ values, the same steps give

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}.$$

Is It Really a Shortcut?

It may not seem like this formula is truly a shortcut. After all, in the example above there seem to be just as many calculations. Part of the reason is that we only looked at a small sample.

As the sample size grows, the savings become clearer. With the shortcut formula we never subtract the mean from each data point, which removes about $n$ of the roughly $4n$ arithmetic operations that the standard formula needs. Just as important, we do not need the mean before we start: the sum of the data and the sum of the squares can both be accumulated in a single pass through the data.
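One practical consequence of not needing the mean in advance is that the shortcut formula can be evaluated in a single pass over the data. Here is a minimal Python sketch of that idea: one loop keeps running totals of $x$ and $x^2$, so each value is read only once. The name `ss_one_pass` is hypothetical, and the sketch assumes the data arrive as an iterable of numbers, possibly a generator that can be consumed only once.

```python
def ss_one_pass(values):
    """Sum of squared deviations via the shortcut formula in a single pass.

    Works on any iterable, including a one-shot generator, because it
    never needs the mean before it has seen every value.
    """
    n = 0
    total = 0     # running sum of x
    total_sq = 0  # running sum of x^2
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    if n == 0:
        raise ValueError("need at least one value")
    # Shortcut formula: sum(x^2) - (sum x)^2 / n  (true division gives a float)
    return total_sq - total * total / n


# The generator below can only be iterated once; the standard formula
# would need to store the data (or read it twice) to subtract the mean.
print(ss_one_pass(x for x in (2, 4, 6, 8)))  # 20.0
```

A caveat worth knowing: with floating-point data whose mean is large compared with its spread, the final subtraction of two nearly equal numbers can cancel most of the significant digits, so numerical software often prefers a two-pass or incremental updating algorithm despite the extra work.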