Definition
A measure of central tendency is a value that
tells us where the middle of a set of data lies.
There are three main measures
of central tendency: the mode, the median and the mean. Each of these measures
gives a different indication of the typical or central value in the
distribution.
Introduction
A measure of central tendency is a single value
that attempts to describe a set of data by identifying the central position
within that set of data. As such, measures of central tendency are sometimes
called measures of central location. They are also classed as summary
statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as
the median and the mode.
The mean, median and mode are all valid
measures of central tendency, but under different conditions, some measures of
central tendency become more appropriate to use than others. In the following
sections, we will look at the mean, mode and median, and learn how to calculate
them and under what conditions they are most appropriate to be used.
What is the mean?
The mean is the most common measure of central tendency. It is simply the sum of the values in a set of data divided by the number of values. It is also known as the average.
Consider the retirement ages of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the number of observations (11), which gives approximately 56.6 years.
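The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch using the standard library; the variable names are illustrative and not part of the original example.

```python
from statistics import mean

# Retirement ages from the example above, in whole years.
retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# The mean is the sum of the values divided by the number of values.
total = sum(retirement_ages)
count = len(retirement_ages)
manual_mean = total / count

# statistics.mean performs the same calculation.
library_mean = mean(retirement_ages)

print(total, count)            # 623 11
print(round(manual_mean, 1))   # 56.6
```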
Advantage of the mean:
The mean can be used for both continuous and discrete numeric data.
Limitations of the mean:
The mean cannot be calculated for categorical data, as the values cannot be summed.
Because the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.
What else do I need to know about the mean?
The population mean is indicated by the Greek symbol μ (pronounced ‘mu’). When the mean is calculated on a distribution from a sample it is indicated by the symbol x̅ (pronounced X-bar).
What is the mode?
The mode is the most commonly occurring value in a distribution.
Consider this data-set showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age data.
| Age | Frequency |
| 54 | 3 |
| 55 | 1 |
| 56 | 1 |
| 57 | 2 |
| 58 | 2 |
| 60 | 2 |
The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.
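The frequency distribution and the mode above can be rebuilt with Python's collections.Counter. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter

# Retirement ages from the example above, in whole years.
retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Count how often each age occurs (the frequency distribution).
frequencies = Counter(retirement_ages)

for age in sorted(frequencies):
    print(age, frequencies[age])

# most_common(1) returns a one-item list holding the (value, count)
# pair with the highest count.
mode_value, mode_count = frequencies.most_common(1)[0]
print(mode_value, mode_count)   # 54 3
```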
Advantage of the mode:
The mode has an advantage over the median and the mean as it can be found for both numerical and categorical (non-numerical) data.
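To illustrate the point about categorical data, here is a small sketch. The survey answers are hypothetical and were made up for this example; they do not come from the original data.

```python
from collections import Counter

# Hypothetical survey answers (invented for illustration): mode of travel.
transport = ["bus", "car", "bus", "train", "bus", "car", "walk"]

# Counting works for categories even though they cannot be summed or averaged.
counts = Counter(transport)
modal_category, frequency = counts.most_common(1)[0]

print(modal_category, frequency)   # bus 3
```

A mean of these answers is not defined, because the categories cannot be added together, but the most frequent category can still be counted.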
Limitations of the mode:
There are some limitations to using the mode. In some distributions, the mode may not reflect the center of the distribution very well. When the distribution of retirement age is ordered from lowest to highest value, it is easy to see that the center of the distribution is 57 years, but the mode is lower, at 54 years.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
It is also possible for there to be more than one mode for the same distribution of data (bi-modal or multi-modal). The presence of more than one mode limits how well the mode describes the center or typical value of the distribution, because no single value can be identified as the center.
In some cases, particularly where the data are continuous, the distribution may have no mode at all (i.e. if all values are different).
In cases such as these, it may be better to consider using the median or mean, or to group the data into appropriate intervals and find the modal class.
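Both situations can be seen with statistics.multimode (available from Python 3.8), which returns every value that shares the highest frequency. The two small data sets below are hypothetical and exist only to illustrate the behaviour.

```python
from statistics import multimode

# Hypothetical data with two modes (2 and 4 both occur twice).
two_modes = [1, 2, 2, 3, 4, 4, 5]

# Hypothetical continuous-style data in which every value is different.
all_different = [1.2, 3.4, 5.6]

print(multimode(two_modes))       # [2, 4]

# When no value repeats, every value ties for "most frequent",
# which is a sign that the data have no meaningful mode.
print(multimode(all_different))   # [1.2, 3.4, 5.6]
```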
In other words, the mode is the most frequent score in our data set. On a bar chart or histogram it corresponds to the highest bar. You can, therefore, sometimes consider the mode as being the most popular option. An example of a mode is presented below:
Normally, the mode is used for
categorical data where we wish to know which is the most common category, as
illustrated below:
We can see above that the most
common form of transport, in this particular data set, is the bus. However, one
of the problems with the mode is that it is not necessarily unique, so it leaves
us with problems when two or more values share the highest frequency, such
as below:
We are now stuck as to which mode
best describes the central tendency of the data. This is particularly
problematic when we have continuous data, because we are unlikely to have
any one value that is clearly more frequent than the others. For example, consider
measuring the weight of 30 people (to the nearest 0.1 kg). How often will
several people have exactly the same weight (e.g., 67.4 kg)? Not very
often: many people might be close, but with such a
small sample (30 people) and a large range of possible weights, few if any
values will repeat, and a value that does repeat is as likely to reflect chance
as the center of the data. This is why the mode is very rarely used with continuous data.
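One workaround mentioned earlier is to group continuous values into intervals and report the modal class instead. The sketch below assumes a hypothetical sample of ten weights and an assumed class width of 5 kg; neither comes from the original text.

```python
from collections import Counter

# Hypothetical weights in kg, to the nearest 0.1 kg; no value repeats.
weights = [58.3, 61.7, 62.4, 65.9, 66.1, 67.4, 68.8, 71.2, 73.5, 79.0]

bin_width = 5  # assumed class width in kg

# Map each weight to the lower bound of its class, e.g. 67.4 -> 65.
class_counts = Counter(int(w // bin_width) * bin_width for w in weights)

# The modal class is the interval that contains the most values.
lower, count = class_counts.most_common(1)[0]
print(f"Modal class: {lower} to under {lower + bin_width} kg ({count} values)")
```

The choice of class width affects which interval comes out as the modal class, so it should be chosen to suit the data.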
Another problem with the mode is
that it will not provide us with a very good measure of central tendency when
the most common mark is far away from the rest of the data in the data set, as
depicted in the diagram below:
In the above diagram the mode has a
value of 2. We can clearly see, however, that the mode is not representative of
the data, which is mostly concentrated around the 20 to 30 value range. To use
the mode to describe the central tendency of this data set would be misleading.
What is the median?
The median is the middle value in a distribution when the values are arranged in ascending or descending order. If the number of values in a data set is even, then the median is the mean of the two middle values.
The median divides the distribution in half (there are 50% of observations on either side of the median value). In a distribution with an odd number of observations, the median value is the middle value.
Looking at the retirement age distribution (which has 11 observations), the median is the middle value, which is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value is the mean of the two middle values. In the following distribution, the two middle values are 56 and 57, therefore the median equals 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
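The odd and even cases can be written out as a short function. This is a minimal sketch; the function name middle_value is made up for this example, and statistics.median from the standard library gives the same results.

```python
from statistics import median

def middle_value(values):
    """Sort the values, then take the middle one (odd count)
    or the mean of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
even_ages = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(middle_value(odd_ages))    # 57
print(middle_value(even_ages))   # 56.5
```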
Advantage of the median:
The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.
Limitation of the median:
The median cannot be identified for nominal (unordered categorical) data, as the values cannot be logically ordered.
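The claim that the median is less affected by outliers than the mean can be checked with a small experiment. The extra value of 90 below is a hypothetical outlier added for illustration; it is not part of the original retirement age data.

```python
from statistics import mean, median

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Hypothetical outlier (invented for illustration): one person retiring at 90.
with_outlier = retirement_ages + [90]

mean_before = round(mean(retirement_ages), 1)
mean_after = round(mean(with_outlier), 1)
median_before = median(retirement_ages)
median_after = median(with_outlier)

print(mean_before, mean_after)       # the mean moves from 56.6 to 59.4
print(median_before, median_after)   # the median stays at 57
```

A single extreme value pulls the mean up by almost three years, while the median does not move.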
EXAMPLES OF MEASURES OF CENTRAL TENDENCY
For the data 1, 2, 3, 4, 5, 5, 6, 7, 8 the
measures of central tendency are
Mean = 41/9 ≈ 4.56
Median = 5
Mode = 5
SOLVED EXAMPLE ON MEASURES OF CENTRAL TENDENCY
Question: Find the measures of central
tendency for the data set 3, 7, 9, 4, 5, 4, 6, 7, 9.
Choices:
· A. Mean = 6, median = 6 and modes are 4, 7 and 9
· B. Mean = 6, median = 6 and mode is 4
· C. Mean = 6, median = 6 and modes are 4 and 9
· D. Mean = 6, median = 9 and modes are 4, 7 and 9
Correct Answer: A
SOLUTION:
· Step 1: Mean, median and mode of a data set are the measures of central tendency.
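· Step 2: Mean = (sum of the data values) / (number of data values).
· Step 3: Mean = (3 + 7 + 9 + 4 + 5 + 4 + 6 + 7 + 9) / 9.
· Step 4: Mean = 54 / 9 = 6.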
· Step 5: The data set in ascending order is 3, 4, 4, 5, 6, 7, 7, 9, 9. So, the median of the set is 6. [The median is the middle data value of the ordered set.]
· Step 6: The mode is/are the data value(s) that appear most often in the data set. So, the modes of the data set are 4, 7 and 9.
· Step 7: So, the measures of central tendency of the given set of data are mean = 6, median = 6 and modes are 4, 7 and 9.
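The answer can be double-checked with Python's statistics module. This is a minimal sketch; multimode requires Python 3.8 or later, and its result is sorted here only to match the order used in the answer.

```python
from statistics import mean, median, multimode

# Data set from the question above.
data = [3, 7, 9, 4, 5, 4, 6, 7, 9]

data_mean = mean(data)
data_median = median(data)
data_modes = sorted(multimode(data))

print(data_mean)     # 6
print(data_median)   # 6
print(data_modes)    # [4, 7, 9]
```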