Wednesday, 13 July 2016

Measures of Central Tendency

Definition

A measure of central tendency is a statistic that tells us where the middle of a data set lies.
There are three main measures of central tendency: the mode, the median and the mean. Each gives a different indication of the typical or central value in the distribution.

Introduction

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others. In the following sections, we will look at the mean, mode and median, and learn how to calculate them and under what conditions they are most appropriate to be used.


What is the mean?


The mean is the most common measure of central tendency. It is the sum of the values divided by the number of values in the data set. It is also known as the average.

Consider the retirement ages of 11 people, in whole years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the number of observations (11), which gives approximately 56.6 years.
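The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation above; the variable names are chosen here and are not part of the original example.

```python
# Retirement ages from the example above.
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(ages)         # sum of all observations
count = len(ages)         # number of observations
mean_age = total / count  # mean = sum / count

print(total, count)        # 623 11
print(round(mean_age, 1))  # 56.6
```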


Advantage of the mean:

The mean can be used for both continuous and discrete numeric data.


Limitations of the mean:

The mean cannot be calculated for categorical data, as the values cannot be summed.

As the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.


What else do I need to know about the mean?

The population mean is indicated by the Greek symbol µ (pronounced ‘mu’). When the mean is calculated on a distribution from a sample it is indicated by the symbol x̄ (pronounced X-bar).
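Written as formulas, the two means have the same form. The notation below is a common convention rather than something taken from this post: N is assumed to denote the population size, n the sample size, and x_i the individual observations.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

For the retirement age data treated as a sample, n = 11 and the sum is 623, so x̄ = 623 / 11 ≈ 56.6.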


What is the mode?


The mode is the most commonly occurring value in a distribution.

Consider this data-set showing the retirement age of 11 people, in whole years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

This table shows a simple frequency distribution of the retirement age data.

Age    Frequency
54     3
55     1
56     1
57     2
58     2
60     2

The most commonly occurring value is 54, therefore the mode of this distribution is 54 years. 
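A frequency table like this one can be built with Python's `collections.Counter`. The following is a minimal sketch, assuming the ages are held in a plain list; the variable names are illustrative only.

```python
from collections import Counter

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Count how often each age occurs.
frequency = Counter(ages)
for age in sorted(frequency):
    print(age, frequency[age])

# most_common(1) returns the (value, count) pair with the highest count.
mode_age, mode_count = frequency.most_common(1)[0]
print(mode_age, mode_count)  # 54 3
```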


Advantage of the mode:

The mode has an advantage over the median and the mean as it can be found for both numerical and categorical (non-numerical) data. 


Limitations of the mode:

There are some limitations to using the mode. In some distributions, the mode may not reflect the center of the distribution very well. When the distribution of retirement age is ordered from lowest to highest value, it is easy to see that the center of the distribution is 57 years, but the mode is lower, at 54 years.

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

It is also possible for the same distribution of data to have more than one mode (bi-modal or multi-modal). The presence of more than one mode limits how well the mode describes the center or typical value of the distribution, because no single value can be identified as the center.

In some cases, particularly where the data are continuous, the distribution may have no mode at all (i.e. if all values are different).

In cases such as these, it may be better to consider using the median or mean, or to group the data into appropriate intervals and find the modal class.
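Finding a modal class can be sketched as follows. The measurements (body weights in kg) and the 5 kg interval width are made up for illustration and do not come from this post; a different interval width may give a different modal class.

```python
from collections import Counter

# Hypothetical continuous measurements in kg (invented for this sketch).
# Every value is distinct, so the raw data has no useful mode.
weights = [61.2, 63.8, 64.1, 66.5, 67.4, 68.0, 68.9, 69.3, 71.7, 74.6]

BIN_WIDTH = 5  # assumed interval width in kg


def interval_start(value, width=BIN_WIDTH):
    """Return the lower bound of the interval that contains value."""
    return int(value // width) * width


# Count how many values fall into each interval.
class_counts = Counter(interval_start(w) for w in weights)
modal_start, modal_count = class_counts.most_common(1)[0]

print(f"Modal class: {modal_start} to under {modal_start + BIN_WIDTH} kg "
      f"({modal_count} values)")
```

With these invented values, the interval from 65 kg to under 70 kg holds the most observations, so it is the modal class.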

In other words, the mode is the most frequent score in our data set. On a bar chart or histogram, it corresponds to the highest bar. You can, therefore, sometimes consider the mode as being the most popular option. An example of a mode is presented below:



Normally, the mode is used for categorical data where we wish to know which is the most common category, as illustrated below:



We can see above that the most common form of transport, in this particular data set, is the bus. However, one of the problems with the mode is that it is not unique, so it leaves us with problems when we have two or more values that share the highest frequency, such as below:



We are now stuck as to which mode best describes the central tendency of the data. This is particularly problematic when we have continuous data, because we are then less likely to have any one value that is more frequent than the others. For example, consider measuring the weight of 30 people (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? Probably not very likely: many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.
Another problem with the mode is that it will not provide us with a very good measure of central tendency when the most common value is far away from the rest of the data in the data set, as depicted in the diagram below:



In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the mode to describe the central tendency of this data set would be misleading.


What is the median?


The median is the middle value in a distribution when the values are arranged in ascending or descending order. If the number of values in a data set is even, then the median is the mean of the two middle values.

The median divides the distribution in half (there are 50% of observations on either side of the median value). In a distribution with an odd number of observations, the median value is the middle value. 

Looking at the retirement age distribution (which has 11 observations), the median is the middle value, which is 57 years: 

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 

When the distribution has an even number of observations, the median value is the mean of the two middle values. In the following distribution, the two middle values are 56 and 57, therefore the median equals 56.5 years: 

52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
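Both cases can be expressed as a short Python function. This is a minimal sketch of the rule described above; the function name `median` and the variable names are chosen here for illustration.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle value exists.
        return ordered[middle]
    # Even number of observations: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
even_ages = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(median(odd_ages))   # 57
print(median(even_ages))  # 56.5
```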


Advantage of the median:

The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical. 
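A small experiment makes the difference visible. The sketch below adds one invented outlier (a retirement age of 81, which is not part of the original data) and compares both measures using Python's standard `statistics` module.

```python
from statistics import mean, median

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Hypothetical outlier added for illustration only.
ages_with_outlier = ages + [81]

mean_before = round(mean(ages), 1)
mean_after = round(mean(ages_with_outlier), 1)
median_before = median(ages)
median_after = median(ages_with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

With this one extra value the mean rises from about 56.6 to about 58.7 years, while the median stays at 57.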


Limitation of the median:

The median cannot be identified for nominal categorical data, as the categories cannot be logically ordered.

EXAMPLES OF MEASURES OF CENTRAL TENDENCY
For the data 1, 2, 3, 4, 5, 5, 6, 7, 8 the measures of central tendency are

Mean = (1 + 2 + 3 + 4 + 5 + 5 + 6 + 7 + 8) / 9 = 41 / 9 ≈ 4.56
Median = 5
Mode = 5

SOLVED EXAMPLE ON MEASURES OF CENTRAL TENDENCY

Ques: Find the measures of central tendency for the data set 3, 7, 9, 4, 5, 4, 6, 7, and 9.

Choices:
· A. Mean = 6, median = 6 and modes are 4, 7 and 9
· B. Mean = 6, median = 6 and mode is 4
· C. Mean = 6, median = 6 and modes are 4 and 9
· D. Mean = 6, median = 9 and modes are 4, 7 and 9
Correct Answer: A

SOLUTION:

· Step 1: Mean, median and mode of a data set are the measures of central tendency.
· Step 2: Mean of the data set = (sum of the data values) / (number of data values) [Formula.]
· Step 3: = (3 + 7 + 9 + 4 + 5 + 4 + 6 + 7 + 9) / 9 [Substitute the values.]
· Step 4: = 54 / 9 = 6 [Add the data values in the numerator and divide.]
· Step 5: The data set in ascending order is 3, 4, 4, 5, 6, 7, 7, 9, and 9. So, the median of the set is 6. [Median is the middle data value of the ordered set.]
· Step 6: Mode is/are the data value(s) that appear most often in the data set. So, the modes of the data set are 4, 7 and 9.
· Step 7: So, the measures of central tendency of the given set of data are mean = 6, median = 6 and modes are 4, 7, and 9.
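The answer can be checked with Python's standard `statistics` module. This is a quick verification sketch, not part of the original solution; `multimode` requires Python 3.8 or later and returns the modes in the order they first appear in the data, so the result is sorted here.

```python
from statistics import mean, median, multimode

data = [3, 7, 9, 4, 5, 4, 6, 7, 9]

data_mean = mean(data)               # 54 / 9
data_median = median(data)           # middle value of the sorted data
data_modes = sorted(multimode(data)) # all values sharing the top frequency

print(data_mean)    # 6
print(data_median)  # 6
print(data_modes)   # [4, 7, 9]
```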
     