You choose which measure of central tendency (mean, median, or mode) to use based on the spread of your observations. If some observations lie far away from the others, the mean can give a skewed picture, because those distant observations contribute more to the mean than their true importance warrants. In that case, it can be smart to look at the median or the mode as an alternative. If you have an even distribution without extreme values, the mean tends to give you a good overview of the situation.
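To see how a single faraway observation pulls on the mean, here is a minimal sketch using Python's standard `statistics` module. The list `values` is made up purely for illustration and is not part of the example that follows.

```python
import statistics

# Hypothetical observations: five values close together and one far away.
values = [3, 4, 4, 5, 6, 50]

mean = statistics.mean(values)      # pulled upwards by the value 50
median = statistics.median(values)  # middle of the sorted values
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)
```

The mean lands at 12, which is larger than five of the six observations, while the median (4.5) and the mode (4) stay close to where most of the values are.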
Example 1
22 people were asked how much they like pineapple on their pizza. They answered on a scale from 1 to 5, with 1 meaning they hated it and 5 meaning they loved it. The responses were as follows:
$$1,1,2,1,2,5,4,1,4,4,1,5,2,4,5,2,3,2,1,5,5,4$$
Because there are so many observations, the first thing you should do is organize the data in a frequency table. Put the ratings in the left column and the frequencies in the right column.
Rating | Frequency |
1 | 6 |
2 | 5 |
3 | 1 |
4 | 5 |
5 | 5 |
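The frequency table can also be built programmatically. The following is a minimal sketch using `collections.Counter` from Python's standard library; the variable names are chosen here for illustration.

```python
from collections import Counter

# The 22 responses from the example, in the order they were given.
ratings = [1, 1, 2, 1, 2, 5, 4, 1, 4, 4, 1, 5, 2, 4, 5, 2, 3, 2, 1, 5, 5, 4]

# Count how often each rating occurs, then sort by rating.
frequency = Counter(ratings)
frequency_table = sorted(frequency.items())

print("Rating | Frequency")
for rating, count in frequency_table:
    print(f"{rating} | {count}")
```

The printed rows should match the table above, and the frequencies add up to the 22 people who were asked.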
Let’s look at the mean first.
$$\begin{aligned}\bar{x} &= \frac{1\cdot 6+2\cdot 5+3\cdot 1+4\cdot 5+5\cdot 5}{22}\\ &= \frac{64}{22}\\ &\approx 2.91\end{aligned}$$
The mean rating is $2.91$. But most people answered towards the extremes: only one out of 22 people answered 3, the closest rating to 2.91! As a consequence, the mean isn’t a very good measure of central tendency for describing how much people like pineapple on pizza. It could give the false impression that people are mostly neutral about pineapple on pizza, which, as we can see from the data, definitely isn’t the case.
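The same calculation can be sketched in Python directly from the frequency table: multiply each rating by its frequency, add the products, and divide by the number of responses. The dictionary `frequency` simply restates the table from the example.

```python
# Frequency table from the example: rating -> number of people.
frequency = {1: 6, 2: 5, 3: 1, 4: 5, 5: 5}

total_responses = sum(frequency.values())
weighted_sum = sum(rating * count for rating, count in frequency.items())
mean = weighted_sum / total_responses

print(weighted_sum, total_responses)  # 64 22
print(round(mean, 2))                 # 2.91
```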
Next, you can find the median to see whether that will describe the distribution in a better way. Expand the table from above to get a column of cumulative frequencies.
Rating | Frequency | Cumulative Frequency |
1 | 6 | 6 |
2 | 5 | 11 |
3 | 1 | 12 |
4 | 5 | 17 |
5 | 5 | 22 |
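Each cumulative frequency is the running total of the frequencies up to and including that row. A minimal sketch with `itertools.accumulate` from the standard library:

```python
from itertools import accumulate

ratings = [1, 2, 3, 4, 5]
frequencies = [6, 5, 1, 5, 5]

# Running total of the frequencies, one entry per rating.
cumulative = list(accumulate(frequencies))

for rating, freq, cum in zip(ratings, frequencies, cumulative):
    print(f"{rating} | {freq} | {cum}")
```

The last cumulative frequency always equals the total number of observations, here 22, which is a quick way to check the column.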
The total number of observations is even, so the median is the average of observation $\frac{n}{2}$ and observation $\frac{n}{2}+1$ when the responses are sorted in ascending order. We have $n=22$, so we want to look at observations 11 and 12. The cumulative frequencies show that observation 11 is the last response of 2 and observation 12 is the single response of 3. The average of 2 and 3 is
$$\text{median}=\frac{2+3}{2}=2.5$$
That makes the median of this distribution $2.5$. Half of the people who were asked responded with 2 or lower, which makes the median a somewhat better measure of central tendency to describe the distribution.
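The lookup in the cumulative frequency column can be sketched in Python as follows. The helper `observation_at` is a name introduced here for illustration: it finds the first row whose cumulative frequency reaches a given position. The result is cross-checked against `statistics.median` on the expanded list of responses.

```python
from bisect import bisect_left
from itertools import accumulate
import statistics

ratings = [1, 2, 3, 4, 5]
frequencies = [6, 5, 1, 5, 5]
cumulative = list(accumulate(frequencies))  # running totals of the frequencies
n = cumulative[-1]                          # total number of observations


def observation_at(position):
    """Rating of the observation at a 1-based position in the sorted data."""
    # The first row whose cumulative frequency reaches `position` holds it.
    return ratings[bisect_left(cumulative, position)]


if n % 2 == 0:
    # Even n: average the two middle observations.
    median = (observation_at(n // 2) + observation_at(n // 2 + 1)) / 2
else:
    # Odd n: take the single middle observation.
    median = observation_at((n + 1) // 2)

# Cross-check against the standard library on the expanded responses.
expanded = [r for r, f in zip(ratings, frequencies) for _ in range(f)]
assert median == statistics.median(expanded)

print(median)  # 2.5
```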
Finally, you can find the mode, the value that occurs most frequently, and see what that tells you. From the table of frequencies you can see that the response 1 occurs the most often, making the mode 1. This doesn’t give a good picture of the distribution either, because 10 of the 22 people responded with a 4 or 5.
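Reading the mode off the frequency table amounts to finding the rating with the highest frequency. A minimal sketch, which collects every rating that shares the highest frequency in case of a tie:

```python
# Frequency table from the example: rating -> number of people.
frequency = {1: 6, 2: 5, 3: 1, 4: 5, 5: 5}

highest = max(frequency.values())
modes = [rating for rating, count in frequency.items() if count == highest]

# Number of people who answered 4 or 5, for comparison with the mode.
liked = frequency[4] + frequency[5]

print(modes)  # [1]
print(liked)  # 10
```

The mode wins by a single response: rating 1 has a frequency of 6, while ratings 2, 4, and 5 each have 5.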
The three measures of central tendency give us different numbers. This shows us how important it is to look at all three to get a complete picture of the situation.