Academic Courses
FACULTY OF CLINICAL MEDICINE
Click here to access Unit one Content..
Topic One: objectives
Topic Objectives
By the end of the topic, the learner should be able to understand, define and explain the various terminologies related to health statistics.
Topic 1: definitions
Statistics is the mathematical science involving the collection, analysis and interpretation of data.
- It is the methodology for collecting, analyzing, interpreting and drawing conclusions from information.
Biostatistics is the application of statistical analysis to the study of biological phenomena and observations, and includes medical statistics.
· Demography is the statistical study of all populations. It can be a very general science that can be applied to any kind of dynamic population, that is, one that changes over time or space.
· Environmental statistics is the application of statistical methods to environmental science. Weather, climate, air and water quality are included, as are studies of plant and animal populations.
· Epidemiology is the study of factors affecting the health and illness of populations, and serves as the foundation and logic of interventions made in the interest of public health and preventive medicine.
- Population: All subjects possessing a common characteristic that is being studied.
- Sample: A subgroup or subset of the population.
- Parameter: Characteristic or measure obtained from a population.
- Statistic (not to be confused with Statistics): Characteristic or measure obtained from a sample.
- Descriptive Statistics: Collection, organization, summarization, and presentation of data.
- Inferential Statistics: Generalizing from samples to populations using probabilities. Performing hypothesis testing, determining relationships between variables, and making predictions.
Topic one : definitions (cntd)
· Population ecology is a sub-field of ecology that deals with the dynamics of species populations and how these populations interact with the environment.
· Psychometrics is the theory and technique of educational and psychological measurement of knowledge, abilities, attitudes, and personality traits.
· Quality control reviews the factors involved in manufacturing and production; it can make use of statistical sampling of product items to aid decisions in process control or in accepting deliveries.
· Quantitative psychology is the science of statistically explaining and changing mental processes and behaviors in humans.
- Biostatistics more commonly connotes all applications of statistics to biology.
- Pharmaceutical statistics is the application of statistics to matters concerning the pharmaceutical industry. This can be from issues of design of experiments, to analysis of drug trials, to issues of commercialization of a medicine.
- Vital statistics are records of the vital events of life: births, deaths, and the occurrence of a particular disease.
Click here to access Unit Two Content..
Topic 1: INTRODUCTION TO SCALES/LEVELS OF MEASUREMENT
OVERVIEW
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from lowest level to highest level.
Besides being classified as either qualitative or quantitative, variables can be described according to the scale on which they are defined. The scale gives a certain structure to the variable and also defines its meaning.
Data are classified according to the highest level which they fit. Each additional level adds something the previous level didn't have.
- Nominal is the lowest level. Only names are meaningful here.
- Ordinal adds an order to the names.
- Interval adds meaningful differences.
- Ratio adds a zero so that ratios are meaningful.
Scales for Qualitative Variables
Depending on the scale on which it is defined, a qualitative variable is called a nominal variable or an ordinal variable.
Scales for Quantitative Variables
Quantitative variables, whether discrete or continuous, are defined either on an interval scale or on a ratio scale.
Topic 1: Definition of terms

1. Variable
a. Characteristic or attribute that can assume different values.
2. Random Variable
a. Variable whose values are determined by chance.
3. Qualitative Variables
a. Variables which assume non-numerical values.
4. Quantitative Variables
a. Variables which assume numerical values, e.g. the number of students in a class, which can only take whole-number values such as 50 students.
5. Discrete Variables
a. Variables which assume a finite or countable number of possible values. Usually obtained by counting.
6. Continuous Variables
a. Variables which assume an infinite number of possible values. Usually obtained by measurement.

Topic 1: Definition of terms (cntd)

7. Nominal Level
a. Level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data.
8. Ordinal Level
a. Level of measurement which classifies data into categories that can be ranked. Precise differences between the ranks are not meaningful.
9. Interval Level
a. Level of measurement which classifies data that can be ranked and for which differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.
10. Ratio Level
a. Level of measurement which classifies data that can be ranked, for which differences are meaningful, and for which there is a true zero. True ratios exist between the different units of measure.
11. Data
a. The quantities (numbers) or qualities (attributes) measured or observed that are to be collected and/or analyzed.

Topic 1: nominal scale
Nominal data/scale:- Data that represent categories or names. There is no implied order to the categories of nominal data. In these types of data, individuals are simply placed in the proper category or group, and the number in each category is counted. Each item must fit into exactly one category.
The simplest data consist of unordered, dichotomous, or "either-or" types of observations, e.g., either the patient lives or the patient dies, either he has some particular attribute or he does not.
Some other examples of nominal data:
Eye color - brown, black, etc.
Religion - Christianity, Islam, Hinduism, etc
Sex - male, female
Topic 1: Ordinal Data/scale
Ordinal Data/scale:- Data that have an order among the response classifications (categories). The spaces or intervals between the categories are not necessarily equal.
Example:
1. strongly agree
2. agree
3. no opinion
4. disagree
5. strongly disagree
In the above situation, we only know that the data are ordered.
Topic 1: Interval Data/scale
Interval Data/scale:- In interval data the intervals between values are the
same. For example, in the Fahrenheit temperature scale, the difference
between 70 degrees and 71 degrees is the same as the difference
between 32 and 33 degrees. But the scale is not a RATIO Scale. 40
degrees Fahrenheit is not twice as much as 20 degrees Fahrenheit.
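The point about Fahrenheit can be checked numerically. The following is a minimal sketch, assuming we convert to kelvins (a scale with a true zero) to see what the ratio of the two temperatures really is; the helper name `fahrenheit_to_kelvin` is an illustrative choice and not part of the course material.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15


t_low, t_high = 20.0, 40.0

# Dividing the Fahrenheit readings suggests "twice as hot" ...
naive_ratio = t_high / t_low

# ... but on a scale with a true zero the ratio is close to 1.
true_ratio = fahrenheit_to_kelvin(t_high) / fahrenheit_to_kelvin(t_low)

# Differences, on the other hand, are meaningful on an interval scale.
diff_a = 71 - 70
diff_b = 33 - 32

print(f"Naive Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Ratio on the Kelvin scale: {true_ratio:.3f}")
print(f"Equal one-degree differences: {diff_a == diff_b}")
```

The naive ratio depends entirely on where the scale places its zero, which is why ratios are only interpreted for ratio-level data.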

Topic 1: Ratio Data/scale
Ratio Data/scale:- The data values in ratio data do have meaningful ratios. For example, age is ratio data: someone who is 40 is twice as old as someone who is 20.
NOTE: Both interval and ratio data involve measurement. Most data analysis techniques that apply to ratio data also apply to interval data. Therefore, in most practical aspects, these types of data (interval and ratio) are grouped under metric data. In some other instances, these types of data are also known as numerical discrete and numerical continuous.
Topic 1: Numerical discrete vs Numerical continuous
Numerical discrete vs Numerical continuous
Numerical discrete
Numerical discrete data occur when the observations are integers that
correspond with a count of some sort. Some common examples are:
the number of bacteria colonies on a plate, the number of cells within a
prescribed area upon microscopic examination, the number of heart
beats within a specified time interval, a mother’s history of number of
births ( parity) and pregnancies (gravidity), the number of episodes of
illness a patient experiences during some time period, etc.
Numerical continuous
The scale with the greatest degree of quantification is a numerical
continuous scale. Each observation theoretically falls somewhere along
a continuum. One is not restricted, in principle, to particular values such
as the integers of the discrete scale. The restricting factor is the degree
of accuracy of the measuring instrument. Most clinical measurements,
such as blood pressure, serum cholesterol level, height, weight, age,
etc., are on a numerical continuous scale.
Topic One: Summary
In this topic we have defined the various types of scales of measurement, when to use them, and the specific variables they measure. We have also seen the difference between numerical discrete and numerical continuous data.
Topic One: Further Reading
Reference Material
1. Agresti, A. and Finlay, B. (2008). Statistical Methods for the Social Sciences, 4th edition. Edinburgh: Pearson Education Limited
2. Clarke, G. M. and Cooke, D. (2004). A Basic Program in Statistics, 5th edition. West Sussex: John Wiley & Sons
3. Freund, J. E. (2001). Modern Elementary Statistics. Upper Saddle River, NJ: Prentice-Hall
4. Johnson, R. A. and Bhattacharyya, G. K. (1992). Statistics: Principles and Methods, 2nd edition. New York: John Wiley & Sons
5. Moore, D. S., Craig, B. and McCabe, G. P. (2007). Introduction to the Practice of Statistics, 6th edition. New York: W. H. Freeman
Further Reading Resources
Click here to access Unit Three Content..
Topic 1: Measures of central tendency (Definition of terms)
INTRODUCTION
The tendency of statistical data to concentrate at certain values is called the "central tendency", and the various methods of determining the actual value at which the data tend to concentrate are called measures of central tendency, or averages. Hence, an average is a value which tends to sum up or describe the mass of the data. The clustering at a particular value is known as the central location or central tendency of a frequency distribution. Commonly used measures of central location are the arithmetic mean, median, mode, midrange and geometric mean.
Definition of terms
Mean
o Sum of all the values divided by the number of values. This can either be a population mean (denoted by μ) or a sample mean (denoted by x̄).
Median
o The midpoint of the data after being ranked (sorted in ascending order). There are as many numbers below the median as above the median.
Mode
o The most frequent number
Weighted Mean
o The mean when each value is multiplied by its weight and summed. This sum is divided by the total of the weights.
Midrange
o The mean of the highest and lowest values. (Max + Min) / 2
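The definitions above can be tried out on a small data set. This is a minimal sketch using Python's standard `statistics` module; the data values, the weights, and the helper names `weighted_mean` and `midrange` are made up for illustration and do not come from the course material.

```python
import statistics

data = [2, 3, 3, 5, 7]  # hypothetical observations


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def midrange(values):
    """Mean of the highest and lowest values: (Max + Min) / 2."""
    return (max(values) + min(values)) / 2


print("Mean:", statistics.mean(data))
print("Median:", statistics.median(data))
print("Mode:", statistics.mode(data))
print("Midrange:", midrange(data))

# Hypothetical scores where later assessments carry more weight.
scores = [60, 70, 80]
weights = [1, 2, 3]
print("Weighted mean:", round(weighted_mean(scores, weights), 2))
```

Note that the mean (4) and the median (3) differ here because the largest value pulls the mean upward.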
Topic 1: measures of central tendency (mode)
The Mode
The sample mode of a qualitative or a discrete quantitative variable is the value of the variable which occurs with the greatest frequency in a data set. Simply put, "the mode is the most frequent observation in a data set". There may be no mode if no one value appears more often than any other. There may also be two modes (bimodal), three modes (trimodal), or more than three modes (multimodal).
Topic 1: measures of central tendency (mode)
Calculation of mode
Obtain the frequency of each observed value of the variable in a data set and note the greatest frequency.
1. If the greatest frequency is 1 (i.e. no value occurs more than once), then the variable has no mode.
2. If the greatest frequency is 2 or greater, then any value that occurs with that greatest frequency is called a sample mode of the variable.
To obtain the mode(s) of a variable, we first construct a frequency distribution for the data using classes based on a single value. The mode(s) can then be determined easily from the frequency distribution.
Topic 1: measures of central tendency (mode)
Calculation of mode (cntd)
Example: Let us consider the frequency table for blood types of 10 persons.
| Blood group | No of clients |
|---|---|
| A | 4 |
| B | 2 |
| AB | 1 |
| O | 3 |
We can see from the frequency table that the mode of blood types is A.
For grouped frequency distributions, the modal class is the class with the largest frequency.
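The rule for finding the mode can be sketched in a few lines of Python. This is only an illustration: the function name `sample_modes` is an assumption, and the list of observations is rebuilt from the blood-type frequencies given in the example.

```python
from collections import Counter


def sample_modes(observations):
    """Return every value that occurs with the greatest frequency.

    Returns an empty list when the greatest frequency is 1, i.e. when
    no value occurs more than once and the variable has no mode.
    """
    counts = Counter(observations)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, freq in counts.items() if freq == top]


# Blood types of 10 persons: A = 4, B = 2, AB = 1, O = 3.
blood_types = ["A"] * 4 + ["B"] * 2 + ["AB"] * 1 + ["O"] * 3

print(Counter(blood_types))       # the frequency distribution
print(sample_modes(blood_types))  # the mode(s)
```

Because the function returns a list, it also covers bimodal and multimodal data sets, where two or more values tie for the greatest frequency.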
Topic 1: measures of central tendency ( Properties and uses of the mode)
Properties and uses of the mode
• The mode is the easiest measure of central location to understand and explain. It is also the easiest to identify, and requires no calculations.
• The mode is the preferred measure of central location for addressing which value is the most popular or the most common.
• As demonstrated, a distribution can have a single mode. However, a distribution has more than one mode if two or more values tie as the most frequent values. It has no mode if no value appears more than once.
• The mode is used almost exclusively as a "descriptive" measure. It is almost never used in statistical manipulations or analyses.
• The mode is not typically affected by one or two extreme values (outliers).
Topic 1: MEASURES OF CENTRAL TENDENCY (The Arithmetic Mean or simple Mean)
a) The Mean (Arithmetic Mean or Simple Mean)
This is what people usually intend when they say "average". The most commonly used measure of center for a quantitative variable is the (arithmetic) sample mean.
Definition (Mean): The sample mean of the variable is the sum of the observed values in a data set divided by the number of observations.
Example: 7 participants in a bike race had the following finishing times in minutes: 28, 22, 26, 29, 21, 23, 24.
What is the mean?
Mean (x̄) = Sum / Number of observations = 173/7 ≈ 24.7
Example: 8 participants in a bike race had the following finishing times in minutes: 28, 22, 26, 29, 21, 23, 24, 50.
What is the mean?
Mean (x̄) = Sum / Number of observations = 223/8 ≈ 27.9
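The arithmetic in the two bike-race examples can be checked with a short script. This is a minimal sketch; the variable names are illustrative only.

```python
times_7 = [28, 22, 26, 29, 21, 23, 24]
times_8 = times_7 + [50]  # the same riders plus one very slow finisher

# Sample mean = sum of observed values / number of observations.
mean_7 = sum(times_7) / len(times_7)
mean_8 = sum(times_8) / len(times_8)

print(f"Sum = {sum(times_7)}, n = {len(times_7)}, mean = {mean_7:.1f}")
print(f"Sum = {sum(times_8)}, n = {len(times_8)}, mean = {mean_8:.1f}")
```

Adding the single extreme value of 50 minutes raises the mean by about three minutes, which illustrates how sensitive the mean is to outliers.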
Topic 1: MEASURES OF CENTRAL TENDENCY (The population and sample Mean)

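Population mean: $\mu = \frac{\sum x}{N}$, where N is the number of values in the population.
Sample mean: $\bar{x} = \frac{\sum x}{n}$, where n is the number of values in the sample.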
Topic 1: MEASURES OF CENTRAL TENDENCY (Geometric mean)
Geometric mean
It is obtained by taking the nth root of the product of n values; i.e., if the values of the observations are denoted by x1, x2, …, xn, then $GM = \sqrt[n]{x_1 x_2 \cdots x_n}$.
Definition of geometric mean
The geometric mean is the mean or average of a set of data measured on a logarithmic scale. The geometric mean is used when the logarithms of the observations, rather than the observations themselves, are distributed normally (symmetrically). The geometric mean is particularly useful in the laboratory for data from serial dilution assays (1/2, 1/4, 1/8, 1/16, etc.) and in environmental sampling data.
Topic 1: MEASURES OF CENTRAL TENDENCY (Geometric mean)
Method for calculating the geometric mean
There are two methods for calculating the geometric mean.
Method A
Step 1. Take the logarithm of each value.
Step 2. Calculate the mean of the log values by summing the log values, then dividing by the number of observations.
Step 3. Take the antilog of the mean of the log values to get the geometric mean.
Method B
Step 1. Calculate the product of the values by multiplying all of the values together.
Step 2. Take the nth root of the product (where n is the number of observations) to get the geometric mean.
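Both methods can be sketched in Python and should agree up to rounding. The titer values below are hypothetical, the function names are illustrative, natural logarithms are used for Method A (any base works as long as the antilog uses the same base), and `math.prod` assumes Python 3.8 or later.

```python
import math


def geometric_mean_logs(values):
    """Method A: mean of the logarithms, then the antilog."""
    logs = [math.log(v) for v in values]      # Step 1
    mean_log = sum(logs) / len(logs)          # Step 2
    return math.exp(mean_log)                 # Step 3


def geometric_mean_root(values):
    """Method B: nth root of the product of the n values."""
    product = math.prod(values)               # Step 1
    return product ** (1 / len(values))       # Step 2


titers = [2, 4, 8, 16, 32]  # hypothetical reciprocal dilution titers

gm_a = geometric_mean_logs(titers)
gm_b = geometric_mean_root(titers)
arithmetic = sum(titers) / len(titers)

print(f"Method A: {gm_a:.3f}")
print(f"Method B: {gm_b:.3f}")
print(f"Arithmetic mean for comparison: {arithmetic:.1f}")
```

Method A is usually the safer choice for long lists, because the product in Method B can become extremely large or small.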
Topic 1: MEASURES OF CENTRAL TENDENCY (Properties and uses of the geometric mean)
Properties and uses of the geometric mean
• The geometric mean is the average of logarithmic values, converted back to the base. The geometric mean tends to dampen the effect of extreme values and is never larger than the corresponding arithmetic mean (the two are equal only when all values are identical). In that sense, the geometric mean is less sensitive than the arithmetic mean to one or a few extreme values.
• The geometric mean is the measure of choice for variables measured on an exponential or logarithmic scale, such as dilutional titers or assays.
• The geometric mean is often used for environmental samples, when levels can range over several orders of magnitude. For example, levels of coliforms in samples taken from a body of water can range from less than 100 to more than 100,000.
Topic 1: measures of central tendency (median)
Median
The median is the middle value of a set of data that has been put into rank order. The statistical median is the value that divides the data into two halves, with one half of the observations being smaller than the median value and the other half being larger. The median is also the 50th percentile of the distribution.
Method for identifying the median
Step 1. Arrange the observations into increasing or decreasing order.
Step 2. Find the middle position of the distribution by using the following formula:
Middle position = (n + 1) / 2
a. If the number of observations (n) is odd, the middle position falls on a single observation.
b. If the number of observations is even, the middle position falls between two observations.
Step 3. Identify the value at the middle position.
a. If the number of observations (n) is odd and the middle position falls on a single observation, the median equals the value of that observation.
b. If the number of observations is even and the middle position falls between two observations, the median equals the average of the two values.
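The three steps can be sketched as a small Python function. The function name `median_by_position` is illustrative, and the data are the bike-race finishing times used earlier in this unit.

```python
def median_by_position(observations):
    """Identify the median using the middle position (n + 1) / 2."""
    ordered = sorted(observations)            # Step 1: rank the data
    n = len(ordered)
    middle = (n + 1) / 2                      # Step 2: 1-based position
    if n % 2 == 1:
        # Step 3a: odd n, the position falls on a single observation.
        return ordered[int(middle) - 1]
    # Step 3b: even n, the position falls between two observations,
    # so the median is the average of those two values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


times_7 = [28, 22, 26, 29, 21, 23, 24]
times_8 = times_7 + [50]

print(median_by_position(times_7))  # middle position 4
print(median_by_position(times_8))  # middle position 4.5
```

The extreme value of 50 minutes moves the median by only one minute, in contrast to its effect on the mean.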
Topic 1: measures of central tendency ( Properties and uses of the median)
Properties and uses of the median
• The median is a good descriptive measure, particularly for data that are skewed, because it is the central point of the distribution.
• The median is relatively easy to identify. It is equal to either a single observed value (if odd number of observations) or the average of two observed values (if even number of observations).
• The median, like the mode, is not generally affected by one or two extreme values (outliers).
• The median has less-than-ideal statistical properties. Therefore, it is not often used in statistical manipulations and analyses.
Topic 2: Measures of central tendency for grouped data (objectives)
Objectives
At the end of this section, learners should be able to:
1. Outline the various methods used to describe data in a data set
2. Calculate the mean, mode and median of an ungrouped data set
3. Calculate the mean, mode and median of a grouped data set
Topic 2: Measures of central tendency for grouped data (definitions cntd)
Class Mark (Midpoint)
o The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.
Cumulative Frequency
o The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.
Relative Frequency
o The frequency divided by the total frequency. This gives the percent of values falling in that class.
Cumulative Relative Frequency (Relative Cumulative Frequency)
o The running total of the relative frequencies or the cumulative frequency divided by the total frequency. Gives the percent of the values which are less than the upper class boundary.
NOTE: For grouped data, we cannot find the exact Mean, Median and Mode, we can only give estimates.
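The columns defined above can be built step by step for a small grouped frequency table. This is a minimal sketch; the class limits and frequencies are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(10, 19, 3), (20, 29, 5), (30, 39, 2)]

freqs = [f for _, _, f in classes]
total = sum(freqs)

# Class mark: (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(freqs))

# Relative frequency: frequency divided by the total frequency.
relative = [f / total for f in freqs]

# Cumulative relative frequency: cumulative frequency / total frequency.
cum_relative = [c / total for c in cumulative]

for (lo, hi, f), m, c, r, cr in zip(
    classes, midpoints, cumulative, relative, cum_relative
):
    print(f"{lo}-{hi}: f={f} midpoint={m} cum={c} rel={r:.0%} cum rel={cr:.0%}")
```

The last cumulative frequency always equals the total number of observations, and the last cumulative relative frequency always equals 100%.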
Topic 2: Mean of grouped data
Mean of grouped data
To estimate the Mean use the midpoints of the class intervals:
Estimated Mean = Sum of (Midpoint × Frequency) / Sum of Frequencies
To calculate the mean for grouped data,
First find the midpoint of each class and then multiply the midpoint by the frequencies of the corresponding classes.
The sum of these products gives an approximation for the sum of all values. To find the value of mean, divide this sum by the total number of observations in the data.
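The procedure can be sketched as follows. The grouped table is hypothetical and the function name `grouped_mean` is an illustrative choice.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower limit, upper limit, frequency) rows."""
    total_frequency = sum(f for _, _, f in classes)
    # Midpoint of each class multiplied by its frequency, then summed.
    sum_of_products = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    return sum_of_products / total_frequency


# Hypothetical grouped data.
classes = [(10, 19, 3), (20, 29, 5), (30, 39, 2)]

# 14.5*3 + 24.5*5 + 34.5*2 = 235, and 235 / 10 observations = 23.5
print(grouped_mean(classes))
```

The result is an estimate because every observation in a class is treated as if it were equal to the class midpoint.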
Topic 2: Mean of grouped data (cntd)
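The formulas used to calculate the mean for grouped data are as follows.
Mean for population data: $\mu = \frac{\sum m f}{N}$
Mean for sample data: $\bar{x} = \frac{\sum m f}{n}$
where m is the midpoint and f is the frequency of a class, and N and n are the total numbers of observations in the population and in the sample, respectively.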
Topic 2: Median of grouped data
To estimate the Median use:
Estimated Median = $L + \frac{(n/2) - B}{G} \times w$
where:
- L is the lower class boundary of the group containing the median
- n is the total number of data
- B is the cumulative frequency of the groups before the median group
- G is the frequency of the median group
- w is the group width
Alternative
Find out what proportion of the distance into the median class the median lies by dividing the sample size by 2, subtracting the cumulative frequency of the previous class, and then dividing the result by the frequency of the median class.
Multiply this proportion by the class width and add it to the lower boundary of the median class.
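The estimate can be sketched in Python using the symbols L, n, B, G and w from the formula. The table of class boundaries and frequencies is hypothetical, and the function name `grouped_median` is illustrative.

```python
def grouped_median(table):
    """Estimate the median from (lower boundary, upper boundary, frequency) rows."""
    n = sum(freq for _, _, freq in table)
    if n == 0:
        raise ValueError("the frequency table has no observations")
    half = n / 2
    cumulative_before = 0  # B: cumulative frequency before the current class
    for lower, upper, freq in table:
        if cumulative_before + freq >= half:
            # This is the median class: L = lower, G = freq, w = upper - lower.
            proportion = (half - cumulative_before) / freq
            return lower + proportion * (upper - lower)
        cumulative_before += freq
    raise ValueError("median class not found")


# Hypothetical grouped data with class boundaries.
table = [(9.5, 19.5, 3), (19.5, 29.5, 5), (29.5, 39.5, 2)]

# n/2 = 5 falls in the second class: 19.5 + (5 - 3)/5 * 10 = 23.5
print(grouped_median(table))
```

The loop finds the median class as the first class whose cumulative frequency reaches n/2, which matches the worked steps in the "Alternative" description.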
Topic 2: Mode of grouped data
To estimate the Mode use:
Estimated Mode = $L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$
where:
- $L$ is the lower class boundary of the modal group
- $f_{m-1}$ is the frequency of the group before the modal group
- $f_m$ is the frequency of the modal group
- $f_{m+1}$ is the frequency of the group after the modal group
- $w$ is the group width
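A minimal sketch of this estimate is given below. The table is hypothetical and the function name `grouped_mode` is illustrative. Two choices in the code are assumptions that the formula itself does not cover: a missing neighbouring group (when the modal group is first or last) is treated as having frequency 0, and the class midpoint is returned if the denominator is zero.

```python
def grouped_mode(table):
    """Estimate the mode from (lower boundary, upper boundary, frequency) rows."""
    freqs = [freq for _, _, freq in table]
    m = freqs.index(max(freqs))  # first group with the largest frequency
    lower, upper, f_m = table[m]
    # Assumption: a missing neighbouring group counts as frequency 0.
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    denominator = (f_m - f_prev) + (f_m - f_next)
    if denominator == 0:
        # Assumption: fall back to the class midpoint when the formula
        # is undefined (neighbouring groups have the same frequency).
        return (lower + upper) / 2
    return lower + (f_m - f_prev) / denominator * (upper - lower)


# Hypothetical grouped data with class boundaries.
table = [(9.5, 19.5, 4), (19.5, 29.5, 7), (29.5, 39.5, 1)]

# L = 19.5, f_m = 7, f_(m-1) = 4, f_(m+1) = 1, w = 10
# 19.5 + 3 / (3 + 6) * 10 = 22.83 (to two decimal places)
print(round(grouped_mode(table), 2))
```

The estimate is pulled towards the neighbouring group with the higher frequency, here the group before the modal group.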
Topic 2: Summary
In summary
The Mean is used in computing other statistics (such as the variance) and does not exist for open-ended grouped frequency distributions (1). It is often not appropriate for skewed distributions such as salary information.
The Median is the center number and is good for skewed distributions because it is resistant to change.
The Mode is used to describe the most typical case. The mode can be used with nominal data whereas the others can't. The mode may or may not exist, and there may be more than one value for the mode (multimodal).
The Midrange is not used very often. It is a very rough estimate of the average and is greatly affected by extreme values (even more so than the mean).
Properties of the various measures
| Property | Mean | Median | Mode | Midrange |
|---|---|---|---|---|
| Always Exists | No (1) | Yes | No (2) | Yes |
| Uses all data values | Yes | No | No | No |
| Affected by extreme values | Yes | No | No | Yes |
Topic 3: Measures of dispersion /spread/ variation (objectives)
Objectives
At the end of this section, learners should be able to:
1. Outline the various methods used to describe variability of data in a data set
2. Calculate the range, variance and standard deviation of an ungrouped data set
3. Calculate the range, variance and standard deviation of a grouped data set
Topic 3: Measures of dispersion /spread/ variation (Definition of terms)
Definition of terms
Range
o The difference between the highest and lowest values. Max - Min
Population Variance
o The average of the squares of the distances from the population mean. It is the sum of the squares of the deviations from the mean divided by the population size. The units of the variance are the squares of the units of the data.
Sample Variance
o Unbiased estimator of the population variance. Instead of dividing by the population size, the sum of the squares of the deviations from the sample mean is divided by one less than the sample size. The units of the variance are the squares of the units of the data.
Standard Deviation
o The square root of the variance. The population standard deviation is the square root of the population variance and the sample standard deviation is the square root of the sample variance. The sample standard deviation is not an unbiased estimator of the population standard deviation. The units of the standard deviation are the same as the units of the data.
Coefficient of Variation
o Standard deviation divided by the mean, expressed as a percentage. We won't work with the Coefficient of Variation in this course.
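The difference between the population and sample versions of the variance can be seen with Python's standard `statistics` module. This is a minimal sketch that reuses the seven bike-race finishing times from the section on the mean; whether those times are treated as a whole population or as a sample is purely for illustration.

```python
import statistics

times = [28, 22, 26, 29, 21, 23, 24]  # finishing times in minutes

data_range = max(times) - min(times)

# Population variance: squared deviations divided by N.
pop_var = statistics.pvariance(times)
pop_sd = statistics.pstdev(times)

# Sample variance: squared deviations divided by n - 1.
sample_var = statistics.variance(times)
sample_sd = statistics.stdev(times)

print(f"Range: {data_range} minutes")
print(f"Population variance: {pop_var:.2f} minutes squared")
print(f"Sample variance: {sample_var:.2f} minutes squared")
print(f"Population standard deviation: {pop_sd:.2f} minutes")
print(f"Sample standard deviation: {sample_sd:.2f} minutes")
```

Dividing by n - 1 rather than N always makes the sample variance slightly larger than the population variance computed from the same numbers.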
Topic 3: Measures of dispersion /spread/ variation (Introduction)
Introduction
Spread, or dispersion or variation, is the second important feature of frequency distributions. Just as measures of central location describe where the peak is located, measures of spread describe the dispersion (or variation) of values from that peak in the distribution. The object of measuring this scatter or dispersion is to obtain a single summary figure which adequately exhibits whether the distribution is compact or spread out. Measures of spread include the range, interquartile range, and standard deviation.
In addition to locating the center of the observed values of the variable in the data, another important aspect of a descriptive study of the variable is numerically measuring the extent of variation around the center. Two data sets of the same variable may exhibit similar positions of center but may be remarkably different with respect to variability.
Just as there are several different measures of center, there are also several different measures of variation. In this section, we will examine three of the most frequently used measures of variation: the sample range, the sample interquartile range and the sample standard deviation. Measures of variation are mostly used only for quantitative variables.
Topic 3: Measures of dispersion /spread/ variation (range)
Range
The sample range is obtained by computing the difference between the
largest observed value of the variable in a data set and the smallest one.
Definition 5.1 (Range). The sample range of the variable is the difference
between its maximum and minimum values in a data set:
Range = Max − Min.
The sample range of the variable is quite easy to compute. However, in using the range, a great deal of information is ignored: only the largest and smallest values of the variable are considered, and the other observed values are disregarded. It should also be remarked that the range can never decrease, but can increase, when additional observations are included in the data set; in that sense the range is overly sensitive to the sample size.
Example 5.1. Seven participants in a bike race had the following finishing times in
minutes: 28, 22, 26, 29, 21, 23, 24.
What is the range?
Solution: Range = 29 − 21 = 8
Example 5.2. Eight participants in a bike race had the following finishing times in
minutes: 28, 22, 26, 29, 21, 23, 24, 50.
What is the range?
Solution: Range = 50 − 21 = 29
Since the range uses only the largest and smallest values, it is greatly affected by extreme values; that is, it is not a resistant measure. Here a single slow finisher raises the range from 8 to 29 minutes.
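The two bike race examples can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_range` and the variable names are illustrative and not part of the course material.

```python
def sample_range(values):
    """Return the difference between the largest and smallest observation."""
    return max(values) - min(values)

race_1 = [28, 22, 26, 29, 21, 23, 24]   # finishing times in minutes
race_2 = race_1 + [50]                  # the same times plus one extreme finisher

print(sample_range(race_1))  # 8
print(sample_range(race_2))  # 29
```

Adding a single extreme observation changes the result substantially, which is the sense in which the range is not resistant.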
Topic 3: Measures of dispersion /spread/ variation (Interquartile range)
Interquartile range
The interquartile range is a measure of spread used most commonly with the median. It represents the central portion of the distribution, from the 25th percentile to the 75th percentile. In other words, the interquartile range covers the second and third quarters of a distribution. The interquartile range thus includes approximately one half of the observations in the set, leaving one quarter of the observations on each side.
Method for determining the interquartile range
Step 1. Arrange the observations in increasing order.
Step 2. Find the position of the 1st and 3rd quartiles with the following formulas, where n is the number of observations.
Position of 1st quartile (Q1) = 25th percentile = (n + 1) / 4
Position of 3rd quartile (Q3) = 75th percentile = 3(n + 1) / 4 = 3 × position of Q1
Step 3. Identify the value of the 1st and 3rd quartiles.
a. If a quartile lies on an observation (i.e., if its position is a whole number), the value of the quartile is the value of that observation.
b. If a quartile lies between observations, the value of the quartile is the value of the lower observation plus the specified fraction of the difference between the observations.
Step 4. Epidemiologically, report the values at Q1 and Q3. Statistically, calculate the interquartile range as Q3 minus Q1.
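The four steps above can be sketched in Python as follows. The function names are illustrative, and the clamping for very small samples (where a position falls outside the data) is an assumption added here, not something the method above specifies.

```python
def quartile_value(ordered, k):
    """Value of the k-th quartile (k = 1 or 3) at position k(n + 1)/4."""
    n = len(ordered)
    position = k * (n + 1) / 4      # position counted from 1
    whole = int(position)           # observation at or just below the position
    fraction = position - whole     # how far to move toward the next observation
    if whole < 1:                   # assumption: clamp to the first observation
        return ordered[0]
    if whole >= n:                  # assumption: clamp to the last observation
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

def interquartile_range(values):
    ordered = sorted(values)            # Step 1
    q1 = quartile_value(ordered, 1)     # Steps 2 and 3
    q3 = quartile_value(ordered, 3)
    return q1, q3, q3 - q1              # Step 4

print(interquartile_range([28, 22, 26, 29, 21, 23, 24]))      # (22.0, 28.0, 6.0)
print(interquartile_range([28, 22, 26, 29, 21, 23, 24, 50]))  # (22.25, 28.75, 6.5)
```

Other quartile conventions exist, so statistical software may report slightly different values for the same data.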
Topic 3: Measures of dispersion /spread/ variation (Interquartile range Properties)
Properties and uses of the interquartile range
• The interquartile range is generally used in conjunction with the median. Together, they are useful for characterizing the central location and spread of any frequency distribution, but particularly those that are skewed.
• For a more complete characterization of a frequency distribution, the 1st and 3rd quartiles are sometimes used with the minimum value, the median, and the maximum value to produce a five-number summary of the distribution.
Topic 3: Measures of dispersion /spread/ variation (Average Deviation)
Average Deviation
The range only involves the smallest and largest numbers, and it would be desirable to have a statistic which involved all of the data values.
The first attempt one might make at this is the average deviation from the mean, defined as:
$$\text{average deviation} = \frac{\sum (x - \bar{x})}{n}$$
The problem is that this summation is always zero, because the positive and negative deviations from the mean cancel exactly. So the average deviation will always be zero, which is why it is never used as a measure of spread.
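A quick numerical check of this cancellation, using the bike race times from the range example (a sketch; the variable names are illustrative). Because of floating-point rounding the sum is compared with zero within a small tolerance rather than exactly.

```python
import math

times = [28, 22, 26, 29, 21, 23, 24]
mean = sum(times) / len(times)

# Signed deviations: some are positive, some negative.
deviations = [x - mean for x in times]
total = sum(deviations)

print(math.isclose(total, 0.0, abs_tol=1e-9))  # True
```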
Topic 3: Measures of dispersion /spread/ variation (Variance)
Variance
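Unbiased Estimate of the Population Variance
One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of a statistic is to estimate the corresponding parameter, and that formula has the problem that, on average, it underestimates the population variance. To counteract this, the sum of the squares of the deviations is divided by one less than the sample size:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \qquad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Here $\sigma^2$ is the population variance (population mean $\mu$, population size $N$) and $s^2$ is the sample variance (sample mean $\bar{x}$, sample size $n$).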
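The effect of the two divisors can be seen in a short Python sketch. The helper `variance` below is illustrative; it is compared against `statistics.variance` and `statistics.pvariance` from the Python standard library, which use the n − 1 and n divisors respectively.

```python
import statistics

def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sum_sq / (n - 1) if sample else sum_sq / n

times = [28, 22, 26, 29, 21, 23, 24]
s2 = variance(times, sample=True)        # divides by n - 1
sigma2 = variance(times, sample=False)   # divides by n

# The n - 1 divisor always gives the larger of the two values.
print(s2 > sigma2)
print(abs(s2 - statistics.variance(times)) < 1e-9)
print(abs(sigma2 - statistics.pvariance(times)) < 1e-9)
```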
Topic 3: Measures of dispersion /spread/ variation (standard deviation)
Definition of standard deviation
The standard deviation is the measure of spread used most commonly with the arithmetic mean. Earlier, the centering property of the mean was described: subtracting the mean from each observation and then summing the differences adds to 0. This concept of subtracting the mean from each observation is the basis of the standard deviation. However, the difference between the mean and each observation is squared to eliminate negative numbers. Then the average is calculated and the square root is taken to get back to the original units.
Topic 3: Measures of dispersion /spread/ variation (standard deviation cntd)
Method for calculating the standard deviation
Step 1. Calculate the arithmetic mean.
Step 2. Subtract the mean from each observation. Square the difference.
Step 3. Sum the squared differences.
Step 4. Divide the sum of the squared differences by n – 1.
Step 5. Take the square root of the value obtained in Step 4.
The result is the standard deviation.
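The five steps map directly onto code. The following is a minimal sketch using the bike race times from the range example; the function name `sample_sd` is illustrative.

```python
import math

def sample_sd(values):
    n = len(values)
    mean = sum(values) / n                        # Step 1: arithmetic mean
    squared = [(x - mean) ** 2 for x in values]   # Step 2: squared differences
    total = sum(squared)                          # Step 3: sum of squared differences
    variance = total / (n - 1)                    # Step 4: divide by n - 1
    return math.sqrt(variance)                    # Step 5: square root

times = [28, 22, 26, 29, 21, 23, 24]
print(round(sample_sd(times), 2))  # 3.04
```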
Topic 3: Measures of dispersion /spread/ variation (standard deviation cntd)
Properties and uses of the standard deviation
• The numeric value of the standard deviation does not have an easy, non-statistical interpretation, but similar to other measures of spread, the standard deviation conveys how widely or tightly the observations are distributed from the center.
• Standard deviation is usually calculated only when the data are more-or-less “normally distributed,” i.e., the data fall into a typical bell-shaped curve. For normally distributed data, the arithmetic mean is the recommended measure of central location, and the standard deviation is the recommended measure of spread. In fact, means should never be reported without their associated standard deviation.
Topic 3: Measures of dispersion /spread/ variation (standard deviation cntd)
There is a problem with variances. Recall that the deviations were squared. That means that the units were also squared. To get the units back the same as the original data values, the square root must be taken.
Topic 3: Measures of dispersion /spread/ variation (standard deviation cntd)
The sample standard deviation is the most frequently used measure of variability, although it is not as easily understood as the range. It can be considered as a kind of average of the absolute deviations of observed values from the mean of the variable in question.
Although the sample variance is an unbiased estimator of the population variance, the sample standard deviation is not an unbiased estimator of the population standard deviation.
The more variation there is in the observed values, the larger is the standard deviation for the variable in question. Thus the standard deviation satisfies the basic criterion for a measure of variation and, as noted, it is the most commonly used measure of variation. However, the standard deviation does have its drawbacks. For instance, its value can be strongly affected by a few extreme observations.
Topic 3: variance and standard deviation for grouped data
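Variance and standard deviation for grouped data
Following are the basic formulas used to calculate the population and sample variances for grouped data.
$$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum m f)^2}{N}}{N} \qquad s^2 = \frac{\sum m^2 f - \frac{(\sum m f)^2}{n}}{n - 1}$$
where $\sigma^2$ is the population variance, $s^2$ is the sample variance, $m$ is the midpoint of a class, $f$ is the frequency of that class, and $N$ (or $n$) is the total number of observations, $\sum f$.
In either case, the standard deviation is obtained by taking the positive square root of the variance.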
Topic 3: variance and standard deviation for grouped data (worked example)
Solved Example Problem
The solved example below shows how the values are used in the formulas above to work out the standard deviation of a frequency distribution.
Question 1
The following gives the frequency distribution of the daily commuting time (in minutes) from home to work for all 25 employees of a company.
| Daily commuting time (minutes) | Number of employees |
|---|---|
| 0 to less than 10 | 4 |
| 10 to less than 20 | 9 |
| 20 to less than 30 | 6 |
| 30 to less than 40 | 4 |
| 40 to less than 50 | 2 |
Topic 3: variance and standard deviation for grouped data (worked example cntd)
Calculate the mean, variance and standard deviation of the daily commuting times.
| Daily commuting time (minutes) | f | m | mf | m²f |
|---|---|---|---|---|
| 0 to less than 10 | 4 | 5 | 20 | 100 |
| 10 to less than 20 | 9 | 15 | 135 | 2025 |
| 20 to less than 30 | 6 | 25 | 150 | 3750 |
| 30 to less than 40 | 4 | 35 | 140 | 4900 |
| 40 to less than 50 | 2 | 45 | 90 | 4050 |
| Total | N = 25 | | Σmf = 535 | Σm²f = 14825 |
Here f is the number of employees in each class and m is the class midpoint.
Topic 3: variance and standard deviation for grouped data (worked example cntd)
How to calculate grouped data standard deviation?
Step by step calculation:
Follow the steps below, using the formulas above, to calculate the standard deviation for a frequency table.
Step 1: Find the midpoint of each class (group) in the frequency table.
Step 2: Find the total number of observations by summing the frequencies.
Step 3: Find the mean of the grouped data by multiplying each class midpoint by its frequency, adding these products, and dividing the total by the number of observations.
Step 4: Calculate the variance for the frequency table using the formula above.
Step 5: Obtain the standard deviation by taking the square root of the variance.
Topic 3: variance and standard deviation for grouped data (worked example cntd)
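The values are as below. Because the table covers all 25 employees of the company, the population formulas apply.
mean
$$\mu = \frac{\sum m f}{N} = \frac{535}{25} = 21.4 \text{ minutes}$$
variance
$$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum m f)^2}{N}}{N} = \frac{14825 - \frac{535^2}{25}}{25} = \frac{14825 - 11449}{25} = 135.04$$
standard deviation
$$\sigma = \sqrt{135.04} \approx 11.62 \text{ minutes}$$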
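The commuting-time calculation can be reproduced with the sketch below. It assumes the population form of the grouped variance (the table covers all 25 employees), with class midpoints standing in for the individual values; the variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each class of the commuting-time table
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]          # step 1
freqs = [f for _, _, f in classes]
N = sum(freqs)                                                # step 2
sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))
mean = sum_mf / N                                             # step 3
variance = (sum_m2f - sum_mf ** 2 / N) / N                    # step 4
sd = math.sqrt(variance)                                      # step 5

print(mean, variance, round(sd, 2))  # 21.4 135.04 11.62
```

These are approximations to the true mean and spread, since the exact commuting times within each class are not known.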
Topic 3: Summary
In summary
The measures of dispersion covered in this topic are the range, the interquartile range, the variance and the standard deviation.
The standard deviation is the square root of the variance.
Measures of dispersion tell us about variability in the data. Dispersion gives us information about how much our variables vary from the mean; without such variation it is difficult to infer anything from the data. Dispersion is also known as the spread or variability of the data.
Basic question: how much do the values of a variable differ from the minimum to the maximum, and how far apart are the scores in between?
The common measures of dispersion are:
– Range
– Interquartile range
– Standard Deviation
– Variance
Topic 3: Further Reading
Reference Material
1. Clarke, G. M. and Cooke, D. (2004). A Basic Program in Statistics, 5th edition. West Sussex: John Wiley & Sons
2. Freund, J. E. (2001). Modern Elementary Statistics. Upper Saddle River, NJ: Prentice-Hall
3. Johnson, R. A. and Bhattacharyya, G. K. (1992). Statistics: Principles and Methods, 2nd edition. New York: John Wiley & Sons
4. Moore, D. S., Craig, B. and McCabe, G. P. (2007). Introduction to the Practice of Statistics, 6th edition. New York: W. H. Freeman
Further Reading Material
For more details on measures of spread, read
https://statistics.laerd.com/statistical-guides/measures-of-spread-range-quartiles.php
Topic 4: Measures of position (Definition of terms)
Definition of terms
Percentile
o The value below which a given percent of the population lies. The data must be ranked to find percentiles.
Quartile
o Either the 25th, 50th, or 75th percentiles. The 50th percentile is also called the median.
Decile
o Either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentiles.
Lower Quartile
o The median of the lower half of the numbers (up to and including the median). The lower hinge is the first Quartile unless the remainder when dividing the sample size by four is 3.
Upper Quartile
o The median of the upper half of the numbers (including the median). The upper hinge is the 3rd Quartile unless the remainder when dividing the sample size by four is 3.
InterQuartile Range (IQR)
o The difference between the 3rd and 1st Quartiles.
Outlier
o An extremely high or low value when compared to the rest of the values.
Mild Outliers
o Values which lie between 1.5 and 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd Quartile. Note, some texts use hinges instead of Quartiles.
Extreme Outliers
o Values which lie more than 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd Quartile.
Topic 4: Measures of position (Quartiles)
Quartiles
The quartiles divide the ranked data into 4 equal regions, in the same way that percentiles divide it into 100.
The lower quartile is the median of the lower half of the data, up to and including the median. The upper quartile is the median of the upper half of the data, from the median (inclusive) upward.
The statement about the lower half or upper half including the median tends to be confusing to some students. If the median is split between two values (which happens whenever the sample size is even), the median isn't included in either half, since the median isn't actually part of the data.
Note: The 2nd quartile is the same as the median. The 1st quartile is the 25th percentile, the 3rd quartile is the 75th percentile.
The quartiles are commonly used (much more so than the percentiles or deciles).
Topic 4: Measures of position (Quartiles worked example)
Example 1: sample size of 20
The median will be in position 10.5. The lower half is positions 1 - 10 and the upper half is positions 11 - 20. The lower quartile is the median of the lower half and would be in position 5.5. The upper quartile is the median of the upper half and would be in position 5.5 starting with original position 11 as position 1 -- this is the original position 15.5.
Example 2: sample size of 21
The median is in position 11. The lower half is positions 1 - 11 and the upper half is positions 11 - 21. The lower quartile is the median of the lower half and would be in position 6. The upper quartile is the median of the upper half and would be in position 6 when starting at position 11 -- this is original position 16.
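The positions in both examples follow one pattern, sketched below. The function name `quartile_positions` is illustrative; it implements the "median of each half, including the median when the sample size is odd" convention used in these examples, which can differ from the (n + 1)/4 rule given for the interquartile range.

```python
def quartile_positions(n):
    """Return (lower, median, upper) positions, counted from 1, for a ranked sample of size n."""
    median = (n + 1) / 2
    # Each half holds n/2 observations when n is even,
    # and (n + 1)/2 (the median is included) when n is odd.
    half = n // 2 if n % 2 == 0 else (n + 1) // 2
    lower = (half + 1) / 2          # median position within the lower half
    upper = (n - half) + lower      # same offset, counted from the start of the upper half
    return lower, median, upper

print(quartile_positions(20))  # (5.5, 10.5, 15.5)
print(quartile_positions(21))  # (6.0, 11.0, 16.0)
```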
Topic 4: Measures of position (inter Quartile range)
Interquartile Range (IQR)
The interquartile range is the difference between the third and first quartiles. That's it: Q3 - Q1
Topic 4: Skewness
Skewness
If extremely low or extremely high observations are present in a distribution, then the mean tends to shift towards those scores. Based on the type of skewness, distributions can be:
a) Negatively skewed distribution: occurs when the majority of scores are at the right end of the curve and a few small scores are scattered at the left end.
b) Positively skewed distribution: occurs when the majority of scores are at the left end of the curve and a few extremely large scores are scattered at the right end.
c) Symmetrical distribution: it is neither positively nor negatively skewed. A curve is symmetrical if one half of the curve is the mirror image of the other half.
In unimodal (one-peak) symmetrical distributions, the mean, median and mode are identical. On the other hand, in unimodal skewed distributions, it is important to remember that the mean, median and mode occur in alphabetical order when the longer tail is at the left of the distribution, or in reverse alphabetical order when the longer tail is at the right of the distribution.
Topic 4: skewness
Choosing the Right Measure of Central Location and Spread
Measures of central location and spread are useful for summarizing a distribution of data. They also facilitate the comparison of two or more sets of data. However, not every measure of central location and spread is well suited to every set of data. For example, because the normal distribution (or bell-shaped curve) is perfectly symmetrical, the mean, median, and mode all have the same value. In practice, however, observed data rarely approach this ideal shape. As a result, the mean, median, and mode usually differ.
Topic 4: Effect of Skewness on Mean, Median, and Mode
Effect of Skewness on Mean, Median, and Mode
How, then, do you choose the most appropriate measures? A partial answer to this question is to select the measure of central location on the basis of how the data are distributed, and then use the corresponding measure of spread.
In statistics, the arithmetic mean is the most commonly used measure of central location, and is the measure upon which the majority of statistical tests and analytic techniques are based.
The standard deviation is the measure of spread most commonly used with the mean. But as noted previously, one disadvantage of the mean is that it is affected by the presence of one or a few observations with extremely high or low values. The mean is “pulled” in the direction of the extreme values. You can tell the direction in which the data are skewed by comparing the values of the mean and the median; the mean is pulled away from the median in the direction of the extreme values. If the mean is higher than the median, the distribution of data is skewed to the right. If the mean is lower than the median, the distribution is skewed to the left.
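The mean-versus-median comparison can be written as a small helper. This is a sketch of the rule of thumb described above, not a formal test of skewness; the function name `skew_direction` is illustrative.

```python
import statistics

def skew_direction(values):
    """Compare mean and median to suggest the direction of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"   # mean pulled toward high extreme values
    if mean < median:
        return "left"    # mean pulled toward low extreme values
    return "none"

times = [28, 22, 26, 29, 21, 23, 24, 50]   # bike race times with one slow finisher
print(skew_direction(times))  # right
```

For these times the mean is 27.875 and the median is 25, so the single value of 50 pulls the mean to the right of the median.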
The advantage of the median is that it is not affected by a few extremely high or low observations. Therefore, when a set of data is skewed, the median is more representative of the data than is the mean. For descriptive purposes, and to avoid making any assumption that the data are normally distributed, present the median for incubation periods, duration of illness, and age of the study subjects.
Two measures of spread can be used in conjunction with the median: the range and the interquartile range.
The mode is the least useful measure of central location. Some sets of data have no mode; others have more than one. The most common value may not be anywhere near the center of the distribution. Modes generally cannot be used in more elaborate statistical calculations. Nonetheless, even the mode can be helpful when one is interested in the most common value or most popular choice.
The geometric mean is used for exponential or logarithmic data such as laboratory titers, and for environmental sampling data whose values can span several orders of magnitude.
Topic 4: outliers
i) Outliers
Outliers are extreme values. There are mild outliers and extreme outliers.
Extreme Outliers
Extreme outliers are any data values which lie more than 3.0 times the interquartile range below the first quartile or above the third quartile. x is an extreme outlier if ...
x < Q1 - 3 * IQR or x > Q3 + 3 * IQR
Mild Outliers
Mild outliers are any data values which lie between 1.5 times and 3.0 times the interquartile range below the first quartile or above the third quartile. x is a mild outlier if ...
Q1 - 3 * IQR <= x < Q1 - 1.5 * IQR or Q3 + 1.5 * IQR < x <= Q3 + 3 * IQR
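The two rules can be combined into one classification function. In this sketch the quartiles are passed in directly; the values 22.25 and 28.75 used in the example are the quartiles of the eight bike race times under the (n + 1)/4 position rule, and the function name is illustrative.

```python
def classify_outlier(x, q1, q3):
    """Return 'extreme', 'mild' or 'none' using the 1.5 and 3.0 IQR fences."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild"
    return "none"

q1, q3 = 22.25, 28.75                  # IQR = 6.5, upper fences at 38.5 and 48.25
print(classify_outlier(50, q1, q3))    # extreme
print(classify_outlier(40, q1, q3))    # mild
print(classify_outlier(24, q1, q3))    # none
```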
Topic 4: Standard Scores (z-scores)
Standard Scores (z-scores)
The standard score is obtained by subtracting the mean from a value and dividing the difference by the standard deviation: z = (x − mean) / standard deviation. The symbol is z, which is why it's also called a z-score.
The mean of the standard scores is zero and the standard deviation is 1. This is the nice feature of the standard score: no matter what the original scale was, when the data are converted to standard scores, the mean is zero and the standard deviation is 1.
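This property can be checked numerically. The sketch below uses the sample standard deviation from Python's `statistics` module (an assumption; the same property holds with the population version as long as it is used consistently), and the function name `z_scores` is illustrative.

```python
import statistics

def z_scores(values):
    """Convert each value to (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1 divisor)
    return [(x - mean) / sd for x in values]

times = [28, 22, 26, 29, 21, 23, 24]
z = z_scores(times)

# Up to floating-point rounding, the z-scores have mean 0 and standard deviation 1.
print(abs(statistics.mean(z)) < 1e-9)
print(abs(statistics.stdev(z) - 1) < 1e-9)
```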
Topic 4: Measures of position (percentiles)
Percentiles
Percentiles divide the data in a distribution into 100 equal parts.
The Pth percentile (P ranging from 0 to 100) is the value that has P percent of the observations falling at or below it. In other words, the 90th percentile has 90% of the observations at or below it. The median, the halfway point of the distribution, is the 50th percentile.
The maximum value is the 100th percentile, because all values fall at or below the maximum.
The kth percentile is the number which has k% of the values below it. The data must be ranked. It is sometimes easier to count from the high end rather than counting from the low end. For example, the 80th percentile is the number which has 80% below it and 20% above it. Rather than counting 80% from the bottom, count 20% from the top.
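The "at or below" definition can be illustrated by computing the percentage of observations at or below a given value. This is a minimal sketch with made-up data (the whole numbers 1 to 10); the function name is illustrative.

```python
def percent_at_or_below(values, x):
    """Percentage of observations that are less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

scores = list(range(1, 11))   # hypothetical data: 1, 2, ..., 10

print(percent_at_or_below(scores, 9))    # 90.0
print(percent_at_or_below(scores, 10))   # 100.0, the maximum
print(percent_at_or_below(scores, 5))    # 50.0
```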
Topic 4: Summary
In summary
Topic four has highlighted the measures of position, as well as the effect of skewness and outliers on various statistical measures.
Click here to access Unit Four Content..
Topic 1: Statistical data (Objectives)
Objectives
At the end of this section, learners should be able to:
- Discuss procedures in managing data.
- Explain steps that need to be followed to carry out an effective analysis of quantitative and qualitative data
Topic 1: statistical data (definition of terms)
Definitions of terms
Statistic
o Characteristic or measure obtained from a sample
Parameter
o Characteristic or measure obtained from a population
Empirical or Normal Rule
o Only valid when a distribution is bell-shaped (normal). Approximately 68% of the values lie within 1 standard deviation of the mean; 95% within 2 standard deviations; and 99.7% within 3 standard deviations of the mean.
Standard Score or Z-Score
o The value obtained by subtracting the mean and dividing by the standard deviation. When all values are transformed to their standard scores, the new mean (for Z) will be zero and the standard deviation will be one.
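The percentages in the empirical rule can be recovered from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {100 * share:.1f}%")
```

The computed shares are roughly 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.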
Topic 1: Parameters & Statistics
Parameters & Statistics
Usually the features of the population under investigation can be summarized by numerical parameters. Hence the research problem usually becomes an investigation of the values of parameters. These population parameters are unknown and sample statistics are used to make inference about them. That is, a statistic describes a characteristic of the sample which can then be used to make inference about unknown parameters.
Definition (Parameters and Statistics).
A parameter is an unknown numerical summary of the population. A statistic is a
known numerical summary of the sample which can be used to make inference about
parameters. (Agresti & Finlay, 1997)
So the inference about some specific unknown parameter is based on a statistic.
We use known sample statistics in making inferences about unknown population
parameters. The primary focus of most research studies is the parameters of the
population, not statistics calculated for the particular sample selected. The
sample and statistics describing it are important only insofar as they provide
information about the unknown parameters.
Example (Parameters and Statistics). Consider the research problem of finding out what percentage of under-five year-olds present at the MCH clinic with malnutrition at least once a month.
• Parameter: The proportion p of all under-five year-olds who present at the MCH clinic with malnutrition at least once a month.
• Statistic: The proportion p̂ of under-five year-olds who present at the MCH clinic with malnutrition at least once a month, calculated from the sample of under-five year-olds.
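As a small illustration of a statistic being computed from a sample, the sketch below calculates a sample proportion. The ten observations are made up purely for illustration and do not come from any clinic data.

```python
# Hypothetical sample of 10 children: True if the child presented with
# malnutrition at least once a month. These values are invented for illustration.
sample = [True, False, False, True, False, False, False, True, False, False]

# The sample proportion is a known statistic used to estimate the unknown parameter p.
p_hat = sum(sample) / len(sample)
print(p_hat)  # 0.3
```

A different sample would generally give a different value of the statistic, while the parameter p stays fixed.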
Topic 1: Types of data
Types of data
Primary vs secondary
Data can be categorized as primary or secondary, and as quantitative or qualitative. Primary data is data that has not previously been documented and is collected through survey or experimentation. It is first-hand information. Secondary data is data that already exists in the form of published or unpublished documentation.
Topic 1: Variables
Variables can be classified into one of four types, depending on the type of scale used to characterize their values:
• A nominal-scale variable is one whose values are categories without any numerical ranking, such as county of residence. In epidemiology, nominal variables with only two categories are very common: alive or dead, ill or well, vaccinated or unvaccinated, or did or did not eat the potato salad. A nominal variable with two mutually exclusive categories is sometimes called a dichotomous variable.
• An ordinal-scale variable has values that can be ranked but are not necessarily evenly spaced, such as stage of cancer.
• An interval-scale variable is measured on a scale of equally spaced units, but without a true zero point, such as date of birth.
• A ratio-scale variable is an interval variable with a true zero point, such as height in centimeters or duration of illness.
Nominal- and ordinal-scale variables are considered qualitative or categorical variables, whereas interval- and ratio-scale variables are considered quantitative or continuous variables. Sometimes the same variable can be measured using both a nominal scale and a ratio scale. For example, the tuberculin skin tests of a group of persons potentially exposed to a co-worker with tuberculosis can be measured as “positive” or “negative” (nominal scale) or in millimeters of induration (ratio scale).
Quantitative data is information expressed in numerical form that can be measured with standard scales, while qualitative data includes descriptions or measurements with non-standard scales. It describes the quality of a phenomenon with terms such as yes or no, bad or good, etc.
Topic 2: Vital statistics
Natality (Birth) Measures
Natality measures are population-based measures of birth. These measures are used primarily by persons working in the field of maternal and child health.
Morbidity Frequency Measures
Morbidity has been defined as any departure, subjective or objective, from a state of physiological or psychological wellbeing. In practice, morbidity encompasses disease, injury, and disability. In addition, although for this lesson the term refers to the number of persons who are ill, it can also be used to describe the periods of illness that these persons experienced, or the duration of these illnesses.
Measures of morbidity frequency characterize the number of persons in a population who become ill (incidence) or are ill at a given time (prevalence).
Incidence refers to the occurrence of new cases of disease or injury in a population over a specified period of time. Two types of incidence are commonly used: incidence proportion and incidence rate.
Incidence proportion or risk
Definition of incidence proportion
Incidence proportion is the proportion of an initially disease-free population that develops disease, becomes injured, or dies during a specified (usually limited) period of time.
Synonyms include attack rate, risk, probability of getting disease, and cumulative incidence.
Topic 2: Vital statistics (incidence proportions)
Properties and uses of incidence proportions
• Incidence proportion is a measure of the risk of disease or the probability of developing the disease during the specified period. As a measure of incidence, it includes only new cases of disease in the numerator. The denominator is the number of persons in the population at the start of the observation period. Because all of the persons with new cases of disease (numerator) are also represented in the denominator, a risk is also a proportion.
• In the outbreak setting, the term attack rate is often used as a synonym for risk. It is the risk of getting the disease during a specified period, such as the duration of an outbreak. A variety of attack rates can be calculated.
– Overall attack rate is the total number of new cases divided by the total population.
– A food-specific attack rate is the number of persons who ate a specified food and became ill divided by the total number of persons who ate that food, as illustrated in the previous potato salad example.
– A secondary attack rate is sometimes calculated to document the difference between community transmission of illness versus transmission of illness in a household, barracks, or other closed population.
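The overall and food-specific attack rates share the same arithmetic: new cases divided by the population at risk at the start of the period. The sketch below uses invented counts purely for illustration; the function name and the input checks are assumptions added here.

```python
def incidence_proportion(new_cases, population_at_start):
    """New cases divided by the number of persons at risk at the start of the period."""
    if population_at_start <= 0:
        raise ValueError("population at start must be positive")
    if not 0 <= new_cases <= population_at_start:
        raise ValueError("new cases must be between 0 and the population at start")
    return new_cases / population_at_start

# Hypothetical outbreak figures, made up for illustration only.
overall = incidence_proportion(new_cases=30, population_at_start=120)
food_specific = incidence_proportion(new_cases=24, population_at_start=40)

print(f"{overall:.1%}")        # 25.0%
print(f"{food_specific:.1%}")  # 60.0%
```

Because every new case is also counted in the denominator, the result always lies between 0 and 1.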


