What are measures of variation?
Measures of variation describe the width of a distribution. They define how spread out the values are in a dataset. They are also referred to as measures of dispersion or spread.
In this article, we will look at 4 measures of variation.
We will also see examples of how to calculate these measures of variation and when to use them. But before we get started, let’s understand why we need measures of variation in addition to measures of centre when exploring data for visualization.

Why do we need measures of variation?
A single statistic (the mode, the median or the mean) may not represent the entire dataset accurately. Any time we use a single number to represent the data, we lose the sense of variability in the data.

Do averages tell the whole story?
An average is a good measure for comparing the performance of “a group” over time. One way to think of an average is as a single frame from a movie: it gives a snapshot, not the whole story. Averages ignore the impact of the inevitable variations that occur in the data. Here is an example of two sample populations with the same mean and different standard deviations. Red population has mean 100 and SD 10; blue population has mean 100 and SD 50.

What are the disadvantages of averages?
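As a minimal sketch of how an average can hide variation, the snippet below uses Python's standard `statistics` module. The two lists are made-up illustrative values (they are not data from this article), chosen so that both have the same mean but very different spreads.

```python
import statistics

# Hypothetical samples, invented for illustration: both are centred on 100.
narrow = [90, 95, 100, 105, 110]
wide = [50, 75, 100, 125, 150]

for name, data in (("narrow", narrow), ("wide", wide)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean={mean}, sd={sd:.2f}")
```

Both samples report a mean of 100, yet the standard deviation of the second is about five times that of the first. The mean alone cannot distinguish them.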
Understanding Measures of Variation

Range
Range is the simplest measure of variation. The range of a dataset is the difference between the highest value and the lowest value in the dataset. Range is also the measure most affected by outliers, as it uses only the extreme values.

Interquartile Range (IQR)
The Interquartile Range or IQR describes the middle 50% of the values when ordered from lowest to highest. To calculate the IQR, we find the median of the lower half and the median of the upper half of the ordered data. These are Quartile 1 and Quartile 3. The IQR is the difference between Quartile 3 and Quartile 1. IQR is considered a good measure of variation in skewed datasets as it is resistant to outliers.

Variance
Variance is the average squared difference of values from the mean. To calculate variance, we square the difference between each data value and the mean. We then divide the sum of these squares by the number of items in the dataset (or, when estimating from a sample, by one less than the number of items). Because variance is a squared quantity, it is expressed in squared units, so there is no intuitive way to compare it directly to the data values or the mean.

Standard Deviation
Standard deviation is a measure of how much data values deviate from the mean. The larger the standard deviation, the greater the amount of variation. Standard deviation is calculated as the square root of variance. It uses the original units of the data, which makes interpretation easier. Hence standard deviation is the most commonly used measure of variation.

Calculate Range, IQR, Standard Deviation and Variance: Example
Let’s consider a small dataset of heights of 10 people. Here is how we can calculate the range, variance, standard deviation and interquartile range.

When to use Range, IQR, Standard Deviation and Variance?
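The four calculations described above can be sketched in Python as follows. The heights are hypothetical values invented for illustration, since the article's own dataset is not reproduced here. The helper `iqr_median_of_halves` is likewise an illustrative name, and it follows the median-of-halves method from the text. Library quartile functions may use a different interpolation rule and give slightly different quartiles.

```python
import statistics

def iqr_median_of_halves(values):
    """IQR as Q3 - Q1, where Q1 and Q3 are the medians of the lower
    and upper halves of the ordered data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # lower half
    upper = ordered[len(ordered) - half:]   # upper half (middle value skipped if n is odd)
    return statistics.median(upper) - statistics.median(lower)

# Hypothetical heights of 10 people, in centimetres (assumed data).
heights_cm = [150, 155, 158, 160, 162, 165, 168, 170, 175, 182]

data_range = max(heights_cm) - min(heights_cm)
iqr = iqr_median_of_halves(heights_cm)
variance = statistics.pvariance(heights_cm)  # divides by n (population variance)
std_dev = statistics.pstdev(heights_cm)      # square root of the variance above

print(f"range={data_range}, IQR={iqr}, variance={variance:.2f}, SD={std_dev:.2f}")
```

For these made-up heights, the range is 32, the IQR is 12, the variance is 82.85 and the standard deviation is about 9.10. To treat the ten heights as a sample instead of a whole population, `statistics.variance` and `statistics.stdev` divide by n - 1.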
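Which measure of variation is resistant to outliers?
The interquartile range (IQR) is resistant to outliers, because it depends only on the middle 50% of the values. The standard deviation is not resistant to outliers.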
Which of the following is a measure of central tendency that is resistant to outliers?
For distributions that have outliers or are skewed, the median is often the preferred measure of central tendency because the median is more resistant to outliers than the mean.
Which statistics are resistant to outliers?
When a statistic changes because of a “rogue” data point, your result can be far from the true value you're trying to estimate. This bias can be avoided if you eliminate outliers or use weighting factors to minimize the damage they cause. The median is a resistant statistic.
Is the variance resistant to outliers?
Neither the standard deviation nor the variance is robust to outliers. A data value that is separate from the body of the data can increase the value of the statistics by an arbitrarily large amount. The mean absolute deviation (MAD) is also sensitive to outliers.
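A rough way to see this sensitivity is to compute the measures before and after swapping one value for a rogue one. The data and the helper name `spread_summary` below are assumptions made up for this sketch, and the quartiles use the median-of-halves method.

```python
import statistics

def spread_summary(values):
    """Return range, IQR (median of halves), population variance and SD."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[len(ordered) - half:])
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "variance": statistics.pvariance(ordered),
        "sd": statistics.pstdev(ordered),
    }

# Hypothetical data, invented for illustration.
clean = [10, 12, 13, 14, 15, 16, 17, 18, 19, 21]
with_outlier = clean[:-1] + [210]  # replace the largest value with a rogue one

before = spread_summary(clean)
after = spread_summary(with_outlier)

for key in before:
    print(f"{key}: {before[key]:.2f} -> {after[key]:.2f}")
```

With this made-up data the IQR stays at 5 in both cases, while the range, variance and standard deviation all grow sharply once the rogue value is present.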