Which of the following is a measure of variation that is resistant to outliers?

Q: Which statistics are resistant to outliers?

When a statistic changes because of a “rogue” data point, your result can be far from the true value you're trying to estimate. This bias can be avoided if you eliminate outliers or use weighting factors to minimize the damage they cause. The median is a resistant statistic.

What are measures of variation?

Measures of variation describe the width of a distribution. They define how spread out the values are in a dataset. They are also referred to as measures of dispersion/spread.

Table of contents

What are measures of variation?
Why do we need measures of variation?
Do averages tell the whole story?
What are the disadvantages of averages?
Understanding Measures of Variation
Interquartile Range (IQR)
Standard Deviation
Calculate Range, IQR, Standard Deviation and Variance : Example
When to use Range, IQR, Standard Deviation and Variance?
Which measure of variation is resistant to outliers?
Which of the following is a measure of central tendency that is resistant to outliers?
Which statistics are resistant to outliers?
Is the variance resistant to outliers?

In this article, we will look at 4 measures of variation.

Range
Interquartile Range (IQR)
Variance
Standard Deviation

We will also see examples of how to calculate these measures of variation and when to use them. But before we get started, let’s understand why we need measures of variation in addition to measures of centre when exploring data for visualization.

Why do we need measures of variation?

A single statistic (the mode, the median or the mean) may not represent the entire dataset accurately. Anytime we use a single number to represent the data, we lose the sense of variability in the data.

Do averages tell the whole story?

An average is a good measure for comparing the performance of a group over time. One way to think of an average is as a single frame from a movie: it gives a snapshot, not the whole story. Averages ignore the impact of the inevitable variations that occur in the data.

Here is an example of two sample populations with the same mean and different standard deviations. Red population has mean 100 and SD 10; blue population has mean 100 and SD 50.
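The same idea can be checked with a few lines of Python. This is only a minimal sketch: the values in `red` and `blue` are hypothetical, chosen so that both groups have a mean of 100 while their population standard deviations are 10 and 50.

```python
import statistics

# Hypothetical values (not the article's data): both groups average 100,
# but the blue values sit much further from that average.
red = [90, 110, 90, 110]
blue = [50, 150, 50, 150]

red_mean, blue_mean = statistics.mean(red), statistics.mean(blue)
red_sd, blue_sd = statistics.pstdev(red), statistics.pstdev(blue)

print("red :", red_mean, red_sd)
print("blue:", blue_mean, blue_sd)
```

Looking only at the means, the two groups are indistinguishable; the standard deviation is what reveals the difference in spread.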

What are the disadvantages of averages?

Averages do not tell the whole story
Averages are susceptible to outliers. Outliers skew the averages and pull them in their direction. Averages make us believe data points are clustered around a point higher or lower than where they truly cluster.
Averages do not account for segments in data.

“…, many people tend to think of “average” as “typical;” the problem with that assumption in many data sets — certainly those involving humans — is there are many, many, exceptions to “typical.” One could even argue there is no such thing as “typical.” For example, one commonly misused statistic is the average divorce rate in the United States, which is often cited as “about 50%.” If one researches the question, they will discover average divorce rates vary widely based on factors such as age, level of educational attainment, profession, race, geography, religious beliefs, etc. As such, everyone is categorized differently by these factors resulting in many different average divorce rates depending on which factors describe those being measured.”
towardsdatascience.com

Understanding Measures of Variation

Range

Range is the simplest measure of variation. The range of a dataset is the difference between the highest value and the lowest value in the dataset.

Range is also the most affected by outliers as it uses only the extreme values.

Interquartile Range (IQR)

The Interquartile Range or IQR describes the middle 50% of the values when ordered from lowest to highest value.

To calculate the IQR, we order the data and find the medians of its lower and upper halves. These are Quartile 1 and Quartile 3. The IQR is the difference between Quartile 3 and Quartile 1.

IQR is considered a good measure of variation in skewed datasets as it is resistant to outliers.
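The median-of-halves procedure described above can be sketched as follows. The function name `iqr_median_of_halves` and the sample values are hypothetical, and the sketch assumes that the middle value is left out of both halves when the number of values is odd. Statistical packages use several different quartile conventions, so their results may differ slightly from this one.

```python
import statistics

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # lower half of the ordered values
    upper = data[len(data) - half:]   # upper half (middle value skipped if n is odd)
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, q3 - q1

# Hypothetical data with an even number of values
q1, q3, iqr = iqr_median_of_halves([1, 3, 5, 7, 9, 11, 13, 15])
print(q1, q3, iqr)
```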

Variance

Variance is the average squared difference of values from the mean.

To calculate variance, we square the difference between each data value and the mean. We divide the sum of these squares by the number of items in the dataset.
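Written as a formula, the calculation described above looks like this. The symbols are assumptions introduced here for illustration: N is the number of items, x_i are the data values and μ is their mean.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```

This is the population form of the variance. When the data is a sample used to estimate the variance of a larger population, the sum is usually divided by N - 1 instead of N.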

Because variance is a squared quantity, there is no intuitive way to compare variance directly to data values or mean.

Standard Deviation

Standard deviation is a measure of how much data values deviate from the mean. The larger the standard deviation, the greater the amount of variation.

Standard deviation is calculated as the square root of variance.

Standard deviation uses the original units of data which makes interpretation easier. Hence standard deviation is the most commonly used measure of variation.
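A short sketch of the relationship between the two measures and their units, using hypothetical heights in centimetres. The variable names are illustrative only.

```python
import math
import statistics

heights_cm = [150, 160, 170, 180, 190]  # hypothetical heights

# Population variance: average squared deviation, so the unit is cm squared
variance_cm2 = statistics.pvariance(heights_cm)

# Population standard deviation: square root of the variance, back in cm
sd_cm = statistics.pstdev(heights_cm)

print(variance_cm2, sd_cm)
```

Here the variance is 200 (in squared centimetres), which is hard to relate to a height, while the standard deviation of about 14.1 cm can be compared directly with the data values.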

Calculate Range, IQR, Standard Deviation and Variance : Example

Let’s consider a small dataset of heights of 10 people. Here is how we can calculate the range, variance, standard deviation and interquartile range.

Calculating Range, Variance and Standard Deviation

Calculating Interquartile Range
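The calculations can also be reproduced in code. The sketch below does not use the article's original dataset; `heights_cm` is a hypothetical set of 10 heights made up for illustration. It assumes the population formulas (dividing by the number of items) and the median-of-halves method for the quartiles.

```python
import statistics

# Hypothetical heights of 10 people, in cm
heights_cm = [152, 155, 158, 160, 163, 165, 168, 170, 173, 176]
data = sorted(heights_cm)

# Range: highest value minus lowest value
value_range = data[-1] - data[0]

# Variance and standard deviation (population versions)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# IQR: median of the upper half minus median of the lower half
half = len(data) // 2
q1 = statistics.median(data[:half])
q3 = statistics.median(data[half:])
iqr = q3 - q1

print("range:", value_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
print("Q1, Q3, IQR:", q1, q3, iqr)
```

For this made-up data the range is 24, the variance is 55.6, the standard deviation is about 7.46, and the IQR is 170 - 158 = 12.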


When to use Range, IQR, Standard Deviation and Variance?

Range uses only the extreme values of a dataset and is hence very susceptible to outliers. It is advisable to use range only for very small distributions with no outliers.
IQR is good for skewed distributions. This is because IQR is resistant to outliers in the data. It is generally paired with the median to describe the data.
Standard deviation is a good measure of variability for normal distributions or distributions that aren’t terribly skewed. Paired with mean this is a good way to describe the data.
Variance is not used much as it is represented in squared units and is not an intuitive measure.

Which measure of variation is resistant to outliers?

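The interquartile range (IQR) is resistant to outliers. The range, the variance and the standard deviation are not: a single extreme value can change each of them substantially.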

Which of the following is a measure of central tendency that is resistant to outliers?

For distributions that have outliers or are skewed, the median is often the preferred measure of central tendency because the median is more resistant to outliers than the mean.
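A small illustration of the difference, using hypothetical values: adding one extreme data point moves the mean a long way, while the median barely changes.

```python
import statistics

values = [30, 32, 35, 38, 40]     # hypothetical data without outliers
with_outlier = values + [500]     # the same data plus one extreme value

mean_before, median_before = statistics.mean(values), statistics.median(values)
mean_after, median_after = statistics.mean(with_outlier), statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean jumps from 35 to 112.5, a value that none of the original data points come close to, while the median only moves from 35 to 36.5.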

Which statistics are resistant to outliers?

When a statistic changes because of a “rogue” data point, your result can be far from the true value you're trying to estimate. This bias can be avoided if you eliminate outliers or use weighting factors to minimize the damage they cause. The median is a resistant statistic.

Is the variance resistant to outliers?

Neither the standard deviation nor the variance is robust to outliers. A single data value that lies far from the body of the data can increase either statistic by an arbitrarily large amount. The mean absolute deviation (MAD) is also sensitive to outliers.
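To see what "arbitrarily large" means in practice, the sketch below adds an increasingly extreme value to a hypothetical dataset and recomputes each measure. The helper names `iqr` and `mean_abs_dev` are illustrative, and the quartiles are assumed to be found with the median-of-halves method.

```python
import statistics

def iqr(values):
    """IQR as median of the upper half minus median of the lower half."""
    data = sorted(values)
    half = len(data) // 2
    return statistics.median(data[len(data) - half:]) - statistics.median(data[:half])

def mean_abs_dev(values):
    """Average absolute distance of the values from their mean."""
    mu = statistics.mean(values)
    return sum(abs(x - mu) for x in values) / len(values)

base = [10, 12, 14, 16, 18, 20, 22, 24]  # hypothetical data
results = {}
for outlier in (100, 1000, 10000):
    data = base + [outlier]
    results[outlier] = {
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data),
        "sd": statistics.pstdev(data),
        "mean_abs_dev": mean_abs_dev(data),
        "iqr": iqr(data),
    }
    print(outlier, results[outlier])
```

As the outlier grows, the range, variance, standard deviation and mean absolute deviation all keep growing with it, whereas the IQR stays at 10 in every case.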