Medical Student Arena

Topic Objectives

By the end of the topic, the learner should be able to understand, define and explain the various terms related to health statistics.

Course dashboard

Click here to access Unit one Content..

Topic 1: definitions

Statistics is the mathematical science involving the collection, analysis and interpretation of data.

- It is the methodology for collecting, analyzing, interpreting and drawing conclusions from information.

Biostatistics is a branch of biology that studies biological phenomena and observations by means of statistical analysis, and includes medical statistics.

· Demography is the statistical study of all populations. It can be a very general science that can be applied to any kind of dynamic population, that is, one that changes over time or space.

· Environmental statistics is the application of statistical methods to environmental science. Weather, climate, air and water quality are included, as are studies of plant and animal populations.

· Epidemiology is the study of factors affecting the health and illness of populations, and serves as the foundation and logic of interventions made in the interest of public health and preventive medicine.

Population: All subjects possessing a common characteristic that is being studied.

Sample: A subgroup or subset of the population.

Parameter: Characteristic or measure obtained from a population.

Statistic (not to be confused with Statistics): Characteristic or measure obtained from a sample.

Descriptive Statistics: Collection, organization, summarization, and presentation of data.

Inferential Statistics: Generalizing from samples to populations using probabilities. Performing hypothesis testing, determining relationships between variables, and making predictions.

Topic 1: definitions (cntd)

· Population ecology is a sub-field of ecology that deals with the dynamics of species populations and how these populations interact with the environment.

· Psychometrics is the theory and technique of educational and psychological measurement of knowledge, abilities, attitudes, and personality traits.

· Quality control reviews the factors involved in manufacturing and production; it can make use of statistical sampling of product items to aid decisions in process control or in accepting deliveries.

· Quantitative psychology is the science of statistically explaining and changing mental processes and behaviors in humans.

Biostatistics more commonly connotes all applications of statistics to biology.

Pharmaceutical statistics is the application of statistics to matters concerning the pharmaceutical industry. This can range from the design of experiments, to the analysis of drug trials, to issues of commercialization of a medicine.

Vital statistics are statistics on the vital events of life: births, deaths and the occurrence of particular diseases.

Click here to access Unit Two Content..

Topic 1: INTRODUCTION TO SCALES/LEVELS OF MEASUREMENT

OVERVIEW
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from lowest level to highest level.

Besides being classified as either qualitative or quantitative, variables can be described according to the scale on which they are defined. The scale of a variable gives it a certain structure and also defines its meaning.

Data are classified according to the highest level which they fit. Each additional level adds something the previous level didn't have.

Nominal is the lowest level. Only names are meaningful here.

Ordinal adds an order to the names.

Interval adds meaningful differences.

Ratio adds a true zero so that ratios are meaningful.

Scales for Qualitative Variables

Depending on the scale on which it is defined, a qualitative variable is called either a nominal variable or an ordinal variable.

Scales for Quantitative Variables

Quantitative variables, whether discrete or continuous, are defined either on an interval scale or on a ratio scale.

Topic 1: Definition of terms

1.      Variable

a.       Characteristic or attribute that can assume different values

2.      Random Variable

a.       variable whose values are determined by chance

3.      Qualitative Variables

a.       Variables which assume non-numerical values.

4.      Quantitative Variables

a.       Variables which assume numerical values, e.g. the number of students in a class. This particular variable can only assume whole numbers, such as 50 students.

5.      Discrete Variables

a.       Variables which assume a finite or countable number of possible values. Usually obtained by counting.

6.      Continuous Variables

a.       Variables which can assume any value within an interval, so the number of possible values is infinite. Usually obtained by measurement.

Topic 1: Definition of terms (cntd)

7. Nominal Level

a.       Level of measurement which classifies data into mutually exclusive, all inclusive categories in which no order or ranking can be imposed on the data.

8. Ordinal Level

a.       Level of measurement which classifies data into categories that can be ranked. Precise differences between the ranks do not exist.

9.   Interval Level

a.       Level of measurement which classifies data that can be ranked and differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.

10.   Ratio Level

a.       Level of measurement which classifies data that can be ranked, differences are meaningful, and there is a true zero. True ratios exist between the different units of measure.

11. Data

a.       The quantities (numbers) or qualities (attributes) measured or observed that are to be collected and/or analyzed.

Topic 1: nominal scale

Nominal data/scale:- Data that represent categories or names. There is no implied order to the categories of nominal data. In these types of data, individuals are simply placed in the proper category or group, and the number in each category is counted. Each item must fit into exactly one category.

The simplest data consist of unordered, dichotomous, or "either - or" types of observations, i.e., either the patient lives or the patient dies, either he has some particular attribute or he does not.

Some other examples of nominal data:

Eye color - brown, black, etc.

Religion - Christianity, Islam, Hinduism, etc

Sex - male, female

Topic 1: Ordinal Data/scale

Ordinal Data/scale:- Ordinal data have order among the response classifications (categories). The spaces or intervals between the categories are not necessarily equal.

Example:

1. strongly agree

2. agree

3. no opinion

4. disagree

5. strongly disagree

In the above situation, we only know that the data are ordered.

Topic 1: Interval Data/scale

Interval Data/scale:- In interval data the intervals between values are the same. For example, in the Fahrenheit temperature scale, the difference between 70 degrees and 71 degrees is the same as the difference between 32 and 33 degrees. But the scale is not a ratio scale: because 0 degrees Fahrenheit is not a true zero, 40 degrees Fahrenheit is not twice as hot as 20 degrees Fahrenheit.
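
A minimal Python sketch of why ratios are not meaningful on an interval scale. The helper name `fahrenheit_to_celsius` and the variable names are introduced here for illustration only. The same two temperatures give a different ratio once the unit changes, whereas the difference is preserved up to a constant factor.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

warm_f, cold_f = 40.0, 20.0
warm_c = fahrenheit_to_celsius(warm_f)
cold_c = fahrenheit_to_celsius(cold_f)

# The ratio depends on where the arbitrary zero sits, so it changes with the unit.
ratio_f = warm_f / cold_f   # 2.0 when expressed in Fahrenheit
ratio_c = warm_c / cold_c   # negative in Celsius, because 20 F lies below 0 C

# Differences survive the change of unit (scaled by the constant 5/9).
diff_f = warm_f - cold_f
diff_c = warm_c - cold_c

print(ratio_f, ratio_c, diff_f, diff_c)
```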

Topic 1: Ratio Data/scale

Ratio Data/scale:- The data values in ratio data do have meaningful ratios. For example, age is ratio data: someone who is 40 is twice as old as someone who is 20.

NOTE: Both interval and ratio data involve measurement. Most data analysis techniques that apply to ratio data also apply to interval data. Therefore, in most practical aspects, these types of data (interval and ratio) are grouped under metric data. In some other instances, these types of data are also described as numerical discrete and numerical continuous.

Topic 1: Numerical discrete vs Numerical continuous

Numerical discrete vs Numerical continuous

Numerical discrete

Numerical discrete data occur when the observations are integers that correspond with a count of some sort. Some common examples are: the number of bacteria colonies on a plate, the number of cells within a prescribed area upon microscopic examination, the number of heart beats within a specified time interval, a mother’s history of number of births (parity) and pregnancies (gravidity), the number of episodes of illness a patient experiences during some time period, etc.

Numerical continuous

The scale with the greatest degree of quantification is a numerical continuous scale. Each observation theoretically falls somewhere along a continuum. One is not restricted, in principle, to particular values such as the integers of the discrete scale. The restricting factor is the degree of accuracy of the measuring instrument. Most clinical measurements, such as blood pressure, serum cholesterol level, height, weight, age etc., are on a numerical continuous scale.

Topic One: Summary

In this topic we have defined the various scales of measurement, when to use them and the kinds of variables they measure. We have also seen the difference between numerical discrete and numerical continuous data.

Click here to access Unit Three Content..

Topic 1: Measures of central tendency (Definition of terms)

INTRODUCTION

The tendency of statistical data to get concentrated at certain values is called the “Central Tendency” and the various methods of determining the actual value at which the data tend to concentrate are called measures of central tendency or averages. Hence, an average is a value which tends to sum up or describe the mass of the data.

The clustering at a particular value is known as the central location or central tendency of a frequency distribution. The commonly used measures of central location are the arithmetic mean, median, mode, midrange and geometric mean.

Definition of terms

Mean

o Sum of all the values divided by the number of values. This can either be a population mean (denoted by μ) or a sample mean (denoted by x̄).

Median

o The midpoint of the data after being ranked (sorted in ascending order). There are as many numbers below the median as above the median.

Mode

o The most frequent value.

Weighted Mean

o The mean when each value is multiplied by its weight and summed. This sum is divided by the total of the weights.

Midrange

o The mean of the highest and lowest values. (Max + Min) / 2
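
The weighted mean and midrange definitions above can be sketched in a few lines of Python. The function names and the marks and credit weights are hypothetical, invented purely for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def midrange(values):
    """Mean of the highest and lowest values: (Max + Min) / 2."""
    return (max(values) + min(values)) / 2

# Hypothetical marks and credit weights (not taken from the course material).
marks = [60, 70, 80]
credits = [1, 2, 1]

print(weighted_mean(marks, credits))  # (60*1 + 70*2 + 80*1) / 4
print(midrange(marks))                # (80 + 60) / 2
```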

Topic 1: measures of central tendency (mode)

The Mode

The sample mode of a qualitative or a discrete quantitative variable is that value of the variable which occurs with the greatest frequency in a data set. Simply put, “the mode is the most frequent observation in a data set”. There may be no mode if no value appears more often than any other. There may also be two modes (bimodal), three modes (trimodal), or more than three modes (multimodal).

Topic 1: measures of central tendency (mode)

Calculation of mode

Obtain the frequency of each observed value of the variable in a data set and note the greatest frequency.

1. If the greatest frequency is 1 (i.e. no value occurs more than once), then the variable has no mode.

2. If the greatest frequency is 2 or greater, then any value that occurs with that greatest frequency is called a sample mode of the variable.

To obtain the mode(s) of a variable, we first construct a frequency distribution for the data using classes based on a single value. The mode(s) can then be determined easily from the frequency distribution.

Topic 1: measures of central tendency (mode)

Calculation of mode (cntd)

Example: Let us consider the frequency table for blood types of 10 persons.

Blood group	No of clients
A	4
B	2
AB	1
O	3

We can see from the frequency table that the mode of blood types is A.

For grouped frequency distributions, the modal class is the class with the largest frequency.
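
The counting procedure for the mode can be sketched in Python with `collections.Counter`. The function name `sample_modes` is hypothetical, and the list of observations is simply the blood-type frequency table above expanded into ten individual values.

```python
from collections import Counter

def sample_modes(observations):
    """Return every value that occurs with the greatest frequency.

    Returns an empty list when the data are empty or when no value
    occurs more than once (no mode).
    """
    counts = Counter(observations)
    if not counts:
        return []
    greatest = max(counts.values())
    if greatest == 1:
        return []
    return [value for value, count in counts.items() if count == greatest]

# The ten blood types from the frequency table: A = 4, B = 2, AB = 1, O = 3.
blood_types = ["A"] * 4 + ["B"] * 2 + ["AB"] * 1 + ["O"] * 3
print(sample_modes(blood_types))
```

Returning a list rather than a single value lets the same function report bimodal or multimodal data.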

Topic 1: measures of central tendency ( Properties and uses of the mode)

Properties and uses of the mode

• The mode is the easiest measure of central location to understand and explain. It is also the easiest to identify, and requires no calculations.

• The mode is the preferred measure of central location for addressing which value is the most popular or the most common.

• As demonstrated, a distribution can have a single mode. However, a distribution has more than one mode if two or more values tie as the most frequent values. It has no mode if no value appears more than once.

• The mode is used almost exclusively as a “descriptive” measure. It is almost never used in statistical manipulations or analyses.

• The mode is not typically affected by one or two extreme values (outliers).

Topic 1: MEASURES OF CENTRAL TENDENCY (The Arithmetic Mean or simple Mean)

a) The Mean/The Arithmetic Mean or simple Mean

The most commonly used measure of center for a quantitative variable is the (arithmetic) sample mean. When people speak of taking an "average", it is the mean that they are most often referring to.

Definition (Mean) The sample mean of the variable is the sum of the observed values in a data set divided by the number of observations.

Example: 7 participants in a bike race had the following finishing times in minutes: 28, 22, 26, 29, 21, 23, 24.

What is the mean?

Mean (x̄) = Sum of values / number of values = 173/7 = 24.7

Example: 8 participants in a bike race had the following finishing times in minutes: 28, 22, 26, 29, 21, 23, 24, 50.

What is the mean?

Mean (x̄) = Sum of values / number of values = 223/8 = 27.9
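
The two bike-race means can be checked with a short Python sketch using the standard library's `statistics.mean`; the variable names are illustrative. Adding the single slow time of 50 minutes pulls the mean up by roughly three minutes, which shows how sensitive the mean is to an extreme value.

```python
from statistics import mean

times = [28, 22, 26, 29, 21, 23, 24]      # seven finishing times (minutes)
times_with_outlier = times + [50]         # the same race with an eighth, slow rider

mean_7 = mean(times)                      # sum of values / number of values
mean_8 = mean(times_with_outlier)

print(sum(times), round(mean_7, 1))
print(sum(times_with_outlier), round(mean_8, 1))
```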

Topic 1: MEASURES OF CENTRAL TENDENCY (The population and sample Mean)

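Population mean: $\mu = \frac{\sum x}{N}$, where N is the number of values in the population.

Sample mean: $\bar{x} = \frac{\sum x}{n}$, where n is the number of values in the sample.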

Topic 1: MEASURES OF CENTRAL TENDENCY (Geometric mean)

Geometric mean

It is obtained by taking the nth root of the product of “n” values, i.e., if the values of the observations are denoted by x1, x2, …, xn then $GM = \sqrt[n]{x_1 x_2 \cdots x_n}$.

Definition of geometric mean

The geometric mean is the mean or average of a set of data measured on a logarithmic scale. The geometric mean is used when the logarithms of the observations are distributed normally (symmetrically) rather than the observations themselves. The geometric mean is particularly useful in the laboratory for data from serial dilution assays (1/2, 1/4, 1/8, 1/16, etc.) and in environmental sampling data.

Topic 1: MEASURES OF CENTRAL TENDENCY (Geometric mean)

Method for calculating the geometric mean

There are two methods for calculating the geometric mean.

Method A

Step 1. Take the logarithm of each value.

Step 2. Calculate the mean of the log values by summing the log values, then dividing by the number of observations.

Step 3. Take the antilog of the mean of the log values to get the geometric mean.

Method B

Step 1. Calculate the product of the values by multiplying all of the values together.

Step 2. Take the nth root of the product (where n is the number of observations) to get the geometric mean.
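
Both methods can be sketched in Python to confirm that they give the same answer. The function names and the titer values are hypothetical, chosen for illustration; base-10 logarithms are an assumption (any base works as long as the antilog uses the same base), and all values must be positive.

```python
import math

def geometric_mean_logs(values):
    """Method A: antilog of the mean of the (base-10) logarithms."""
    logs = [math.log10(v) for v in values]          # Step 1
    mean_log = sum(logs) / len(logs)                # Step 2
    return 10 ** mean_log                           # Step 3

def geometric_mean_root(values):
    """Method B: nth root of the product of the values."""
    product = math.prod(values)                     # Step 1
    return product ** (1 / len(values))             # Step 2

# Hypothetical serial-dilution denominators (not from the course material).
titers = [2, 4, 8, 16, 32]

print(geometric_mean_logs(titers), geometric_mean_root(titers))
```

Method A is usually the safer choice for long lists, because the product in Method B can become very large.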

Topic 1: MEASURES OF CENTRAL TENDENCY (Properties and uses of the geometric mean)

Properties and uses of the geometric mean

• The geometric mean is the average of logarithmic values, converted back to the base. The geometric mean tends to dampen the effect of extreme values and, for positive values, is never larger than the corresponding arithmetic mean (the two are equal only when all values are identical). In that sense, the geometric mean is less sensitive than the arithmetic mean to one or a few extreme values.

• The geometric mean is the measure of choice for variables measured on an exponential or logarithmic scale, such as dilutional titers or assays.

• The geometric mean is often used for environmental samples, when levels can range over several orders of magnitude. For example, levels of coliforms in samples taken from a body of water can range from less than 100 to more than 100,000.

Topic 1: measures of central tendency (median)

Median

The median is the middle value of a set of data that has been put into rank order. The statistical median is the value that divides the data into two halves, with one half of the observations being smaller than the median value and the other half being larger. The median is also the 50th percentile of the distribution.

Method for identifying the median

Step 1. Arrange the observations into increasing or decreasing order.

Step 2. Find the middle position of the distribution by using the following formula: Middle position = (n + 1) / 2

a. If the number of observations (n) is odd, the middle position falls on a single observation.

b. If the number of observations is even, the middle position falls between two observations.

Step 3. Identify the value at the middle position.

a. If the number of observations (n) is odd and the middle position falls on a single observation, the median equals the value of that observation.

b. If the number of observations is even and the middle position falls between two observations, the median equals the average of the two values.
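
The three steps can be sketched as a small Python function. The name `median_by_position` is hypothetical; the data are the bike-race finishing times used in the arithmetic mean examples.

```python
def median_by_position(observations):
    """Find the median using the middle position (n + 1) / 2."""
    ordered = sorted(observations)            # Step 1: rank the data
    n = len(ordered)
    middle = (n + 1) / 2                      # Step 2: 1-based middle position
    if n % 2 == 1:
        # Step 3a: odd n, the position falls on a single observation
        return ordered[int(middle) - 1]
    # Step 3b: even n, average the two observations either side of the position
    lower = ordered[int(middle) - 1]
    upper = ordered[int(middle)]
    return (lower + upper) / 2

times = [28, 22, 26, 29, 21, 23, 24]
print(median_by_position(times))              # odd number of observations
print(median_by_position(times + [50]))       # even number of observations
```

Unlike the mean, the median barely moves when the slow time of 50 minutes is added.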

Topic 1: measures of central tendency ( Properties and uses of the median)

Properties and uses of the median

• The median is a good descriptive measure, particularly for data that are skewed, because it is the central point of the distribution.

• The median is relatively easy to identify. It is equal to either a single observed value (if odd number of observations) or the average of two observed values (if even number of observations).

• The median, like the mode, is not generally affected by one or two extreme values (outliers).

• The median has less-than-ideal statistical properties. Therefore, it is not often used in statistical manipulations and analyses.

Topic 2: Measures of central tendency for grouped data (objectives)

Objectives

At the end of this section, learners should be able to:

1. Outline the various methods used to describe data in a data set

2. Calculate the mean, mode and median of an ungrouped data set

3. Calculate the mean, mode and median of a grouped data set

Topic 2: Measures of central tendency for grouped data (definitions cntd)

Class Mark (Midpoint)

o The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.

Cumulative Frequency

o The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.

Relative Frequency

o The frequency divided by the total frequency. This gives the proportion (or, multiplied by 100, the percent) of values falling in that class.

Cumulative Relative Frequency (Relative Cumulative Frequency)

o The running total of the relative frequencies or the cumulative frequency divided by the total frequency. Gives the proportion (or percent) of the values which are less than the upper class boundary.

NOTE: For grouped data, we cannot find the exact Mean, Median and Mode; we can only give estimates.
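
A minimal Python sketch of the frequency columns defined above. The class frequencies are hypothetical, invented for illustration, and the variable names are not from the course material.

```python
from itertools import accumulate

# Hypothetical class frequencies for four consecutive classes.
frequencies = [2, 5, 8, 5]
total = sum(frequencies)

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Relative frequency: each frequency divided by the total frequency.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: cumulative frequency divided by the total.
cumulative_relative = [c / total for c in cumulative]

for row in zip(frequencies, cumulative, relative, cumulative_relative):
    print(row)
```

The last cumulative frequency always equals the total frequency, so the last cumulative relative frequency is 1.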

Topic 2: Mean of grouped data

Mean of grouped data

To estimate the Mean use the midpoints of the class intervals:

Estimated Mean = Sum of (Midpoint × Frequency) / Sum of Frequencies

To calculate the mean for grouped data,

First find the midpoint of each class and then multiply the midpoint by the frequencies of the corresponding classes.

The sum of these products gives an approximation for the sum of all values. To find the value of mean, divide this sum by the total number of observations in the data.

Topic 2: Mean of grouped data (cntd)

The formulas used to calculate the mean for grouped data are as follows.

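Mean for population data: $\mu = \frac{\sum mf}{N}$

Mean for sample data: $\bar{x} = \frac{\sum mf}{n}$

Where m is the midpoint and f is the frequency of a class, and N (or n) is the total number of observations, i.e. the sum of the frequencies.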
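
The grouped-mean estimate can be sketched in Python as follows. The function name `grouped_mean` and the frequency table are hypothetical; each class is assumed to be given as a (lower limit, upper limit, frequency) triple.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) triples."""
    # Sum of midpoint * frequency approximates the sum of all values.
    total_mf = sum(((lower + upper) / 2) * f for lower, upper, f in classes)
    # Sum of the frequencies is the number of observations.
    total_f = sum(f for _, _, f in classes)
    return total_mf / total_f

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40.
table = [(0, 10, 2), (10, 20, 5), (20, 30, 8), (30, 40, 5)]

# Midpoints 5, 15, 25, 35 give 10 + 75 + 200 + 175 = 460, and 460 / 20 = 23.
print(grouped_mean(table))
```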

Topic 2: Median of grouped data

To estimate the Median use:

Estimated Median = $L + \frac{(n/2) - B}{G} \times w$

where:

L is the lower class boundary of the group containing the median
n is the total number of data
B is the cumulative frequency of the groups before the median group
G is the frequency of the median group
w is the group width

Alternative

Find what proportion of the way into the median class the median lies by dividing the sample size by 2, subtracting the cumulative frequency of the previous class, and then dividing the result by the frequency of the median class.

Multiply this proportion by the class width and add it to the lower boundary of the median class.
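
A sketch of the grouped-median estimate in Python, under the assumption that each class is given as a (lower boundary, upper boundary, frequency) triple in increasing order. The function name and the frequency table are hypothetical; the comments map the variables onto L, n, B, G and w.

```python
def grouped_median(classes):
    """Estimate the median from (lower_boundary, upper_boundary, frequency) triples."""
    n = sum(f for _, _, f in classes)              # n: total number of data
    before = 0                                     # B: cumulative frequency so far
    for lower, upper, f in classes:
        if f > 0 and before + f >= n / 2:          # first class to reach n/2 holds the median
            width = upper - lower                  # w: group width
            proportion = (n / 2 - before) / f      # ((n/2) - B) / G
            return lower + proportion * width      # L + proportion * w
        before += f
    raise ValueError("no observations")

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40.
table = [(0, 10, 2), (10, 20, 5), (20, 30, 8), (30, 40, 5)]

# n = 20, so n/2 = 10 falls in the 20-30 class: 20 + (10 - 7) / 8 * 10 = 23.75
print(grouped_median(table))
```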

Topic 2: Mode of grouped data

To estimate the Mode use:

Estimated Mode = $L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$

where:

L is the lower class boundary of the modal group
f_m-1 is the frequency of the group before the modal group
f_m is the frequency of the modal group
f_m+1 is the frequency of the group after the modal group
w is the group width
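
A sketch of the grouped-mode estimate in Python. The function name and the frequency table are hypothetical. Two assumptions go beyond the text: a missing neighbouring group (when the modal group is first or last) is treated as having frequency 0, and ties are resolved by taking the first group with the largest frequency.

```python
def grouped_mode(classes):
    """Estimate the mode from (lower_boundary, upper_boundary, frequency) triples."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                         # first group with the largest frequency
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0               # assumption: missing neighbour counts as 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    lower, upper, _ = classes[m]                        # L and L + w
    denominator = (f_m - f_prev) + (f_m - f_next)
    if denominator == 0:
        raise ValueError("modal group is not higher than its neighbours")
    return lower + (f_m - f_prev) / denominator * (upper - lower)

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40.
table = [(0, 10, 2), (10, 20, 5), (20, 30, 8), (30, 40, 5)]

# Modal group 20-30: 20 + (8 - 5) / ((8 - 5) + (8 - 5)) * 10 = 25
print(grouped_mode(table))
```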

Topic 2: Summary

In summary

The Mean is used in computing other statistics (such as the variance) and does not exist for open-ended grouped frequency distributions (1). It is often not appropriate for skewed distributions such as salary information.

The Median is the middle value and is good for skewed distributions because it is resistant to extreme values.

The Mode is used to describe the most typical case. The mode can be used with nominal data whereas the others can't. The mode may or may not exist (2), and there may be more than one value for the mode (multimodal).

The Midrange is not used very often. It is a very rough estimate of the average and is greatly affected by extreme values (even more so than the mean).

Properties of the various measures

Property	Mean	Median	Mode	Midrange
Always Exists	No (1)	Yes	No (2)	Yes
Uses all data values	Yes	No	No	No
Affected by extreme values	Yes	No	No	Yes

Topic 3: Measures of dispersion /spread/ variation (objectives)

Objectives

At the end of this section, learners should be able to:

1. Outline the various methods used to describe variability of data in a data set

2. Calculate the range, variance and standard deviation of an ungrouped data set

3. Calculate the range, variance and standard deviation of a grouped data set

Topic 3: Measures of dispersion /spread/ variation (Definition of terms)

Definition of terms

Range

o The difference between the highest and lowest values. Max - Min

Population Variance

o The average of the squares of the distances from the population mean. It is the sum of the squares of the deviations from the mean divided by the population size. The units on the variance are the units of the population squared.

Sample Variance

o Unbiased estimator of a population variance. Instead of dividing by the population size, the sum of the squares of the deviations from the sample mean is divided by one less than the sample size. The units on the variance are the units of the population squared.

Standard Deviation

o The square root of the variance. The population standard deviation is the square root of the population variance and the sample standard deviation is the square root of the sample variance. The sample standard deviation is not an unbiased estimator of the population standard deviation. The units of the standard deviation are the same as the units of the data.

Coefficient of Variation

o Standard deviation divided by the mean, expressed as a percentage. We won't work with the Coefficient of Variation in this course.
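
These definitions map directly onto Python's standard `statistics` module, as the sketch below suggests. The data are the seven bike-race finishing times used in the arithmetic mean examples; the variable names are illustrative, and using the sample standard deviation for the coefficient of variation is an assumption.

```python
from statistics import mean, pvariance, variance, pstdev, stdev

times = [28, 22, 26, 29, 21, 23, 24]    # finishing times in minutes

value_range = max(times) - min(times)   # Max - Min
pop_var = pvariance(times)              # squared deviations divided by n
samp_var = variance(times)              # squared deviations divided by n - 1
pop_sd = pstdev(times)                  # square root of the population variance
samp_sd = stdev(times)                  # square root of the sample variance
cv_percent = samp_sd / mean(times) * 100   # standard deviation as a percentage of the mean

print(value_range, pop_var, samp_var, pop_sd, samp_sd, cv_percent)
```

Because the sample variance divides by n - 1 rather than n, it is always slightly larger than the population variance computed from the same values.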

Topic 3: Measures of dispersion /spread/ variation (Introduction)

Introduction

Spread, or dispersion or variation, is the second important feature of frequency distributions. Just as measures of central location describe where the peak is located, measures of spread describe the dispersion (or variation) of values from that peak in the distribution.

The object of measuring this scatter or dispersion is to obtain a single summary figure which adequately exhibits whether the distribution is compact or spread out.

Measures of spread include the range, interquartile range, and standard deviation.

In addition to locating the center of the observed values of the variable in the data, another important aspect of a descriptive study of the variable is numerically measuring the extent of variation around the center. Two data sets of the same variable may exhibit similar positions of center but may be remarkably different with respect to variability.

Just as there are several different measures of center, there are also several different measures of variation. In this section, we will examine three of the most frequently used measures of variation; the sample range, the sample interquartile range and the sample standard deviation. Measures of variation are used mostly only for quantitative variables.

Topic 3: Measures of dispersion /spread/ variation (range)

Range
The sample range is obtained by computing the difference between the largest observed value of the variable in a data set and the smallest one.

Definition (Range). The sample range of the variable is the difference between its maximum and minimum values in a data set:

Range = Max − Min.

The sample range of the variable is quite easy to compute. However, in using the range, a great deal of information is ignored, that is, only the largest and smallest values of the variable are considered; the other observed values are disregarded. It should also be remarked that the range can never decrease, but can increase, when additional observations are included in the data set, and in that sense the range is overly sensitive to the sample size.

Example: 7 participants in a bike race had the following finishing times in minutes: 28, 22, 26, 29, 21, 23, 24.

What is the range?

Solution: Range = 29 − 21 = 8

Example: 8 participants in a bike race had the following finishing times in minutes: 28, 22, 26, 29, 21, 23, 24, 50.

What is the range?

Solution: Range = 50 − 21 = 29

Since the range only uses the largest and smallest values, it is greatly affected by extreme values; that is, it is not resistant to change.

Topic 3: Measures of dispersion /spread/ variation (Interquartile range)

Interquartile range

The interquartile range is a measure of spread used most commonly with the median. It represents the central portion of the distribution, from the 25th percentile to the 75th percentile. In other words, the interquartile range includes the second and third quartiles of a distribution. The interquartile range thus includes approximately one half of the observations in the set, leaving one quarter of the observations on each side.

Method for determining the interquartile range

Step 1. Arrange the observations in increasing order.

Step 2. Find the position of the 1st and 3rd quartiles with the following formulas, where n is the number of observations.

Position of 1st quartile (Q1) = 25th percentile = (n + 1) / 4

Position of 3rd quartile (Q3) = 75th percentile = 3(n + 1) / 4 = 3 x position of Q1

Step 3. Identify the value of the 1st and 3rd quartiles.

a. If a quartile lies on an observation (i.e., if its position is a whole number), the value of the quartile is the value of that observation.

b. If a quartile lies between observations, the value of the quartile is the value of the lower observation plus the specified fraction of the difference between the observations.

Step 4. Epidemiologically, report the values at Q1 and Q3.

Statistically, calculate the interquartile range as Q3 minus Q1.
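The steps above can be sketched in Python as follows. The function names are illustrative assumptions, and the sketch uses the (n + 1) / 4 position rule from this method; statistical software often uses other quartile conventions, so its results may differ slightly.

```python
def quartile_value(sorted_obs, position):
    """Value at a 1-based position, interpolating between neighbours (Step 3)."""
    whole = int(position)          # position of the lower observation
    fraction = position - whole    # how far the quartile lies toward the next one
    lower = sorted_obs[whole - 1]
    if fraction == 0:
        return lower               # the quartile lies exactly on an observation
    upper = sorted_obs[whole]
    return lower + fraction * (upper - lower)


def interquartile_range(observations):
    """Return (Q1, Q3, IQR) using the (n + 1) / 4 position rule."""
    obs = sorted(observations)                  # Step 1
    n = len(obs)
    if n < 3:
        raise ValueError("this sketch needs at least 3 observations")
    q1 = quartile_value(obs, (n + 1) / 4)       # Step 2 and Step 3
    q3 = quartile_value(obs, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1                      # Step 4


print(interquartile_range([28, 22, 26, 29, 21, 23, 24]))
print(interquartile_range([28, 22, 26, 29, 21, 23, 24, 50]))
```

For the eight bike-race times, the extreme value 50 moves the interquartile range only from 6 to 6.5, whereas it moved the range from 8 to 29.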

Topic 3: Measures of dispersion /spread/ variation (Interquartile range Properties)

Properties and uses of the interquartile range

• The interquartile range is generally used in conjunction with the median. Together, they are useful for characterizing the central location and spread of any frequency distribution, but particularly those that are skewed.

• For a more complete characterization of a frequency distribution, the 1st and 3rd quartiles are sometimes used with the minimum value, the median, and the maximum value to produce a five-number summary of the distribution.

Topic 3: Measures of dispersion /spread/ variation (Average Deviation)

Average Deviation

The range only involves the smallest and largest numbers, and it would be desirable to have a statistic which involved all of the data values.

A first attempt at this might be the average deviation from the mean, defined as:

$$\text{average deviation} = \frac{\sum (x - \bar{x})}{n}$$

The problem is that this summation is always zero, because the positive and negative deviations from the mean cancel exactly. So, the average deviation will always be zero. That is why the average deviation, defined this way, is never used.
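The cancellation can be checked numerically. The sketch below uses the seven bike-race finishing times as illustrative data, and exact fractions so that rounding error does not hide the result; the function name is an assumption made for this example.

```python
from fractions import Fraction


def deviations_from_mean(values):
    """Return each value minus the mean, using exact fractions."""
    mean = Fraction(sum(values), len(values))
    return [v - mean for v in values]


times = [28, 22, 26, 29, 21, 23, 24]
deviations = deviations_from_mean(times)

print(sum(deviations))  # the deviations always sum to zero
```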

Topic 3: Measures of dispersion /spread/ variation (Variance)

Variance

So, to keep it from being zero, each deviation from the mean is squared and called the "squared deviation from the mean". The average of these squared deviations from the mean is called the variance. For a population of N values with mean μ:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Unbiased Estimate of the Population Variance

One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of a statistic is to estimate the corresponding parameter, and a sample variance calculated by dividing by n tends to underestimate the population variance. To counteract this, the sum of the squares of the deviations is divided by one less than the sample size:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
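A short Python sketch of the two versions of the variance, dividing by the number of values for a population and by one less than the sample size for a sample, is given below. The function names and the use of the bike-race times as data are illustrative assumptions.

```python
def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


times = [28, 22, 26, 29, 21, 23, 24]
print(population_variance(times))
print(sample_variance(times))  # slightly larger, because of the n - 1 divisor
```

Python's standard library offers `statistics.pvariance` and `statistics.variance` for the same two calculations.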

Topic 3: Measures of dispersion /spread/ variation (standard deviation)

Definition of standard deviation

The standard deviation is the measure of spread used most commonly with the arithmetic mean. Earlier, the centering property of the mean was described: subtracting the mean from each observation and then summing the differences adds to 0. This concept of subtracting the mean from each observation is the basis of the standard deviation. However, the difference between the mean and each observation is squared to eliminate negative numbers. Then the average is calculated and the square root is taken to get back to the original units.

Topic 3: Measures of dispersion /spread/ variation (standard deviation cntd)

Method for calculating the standard deviation

Step 1. Calculate the arithmetic mean.

Step 2. Subtract the mean from each observation. Square the difference.

Step 3. Sum the squared differences.

Step 4. Divide the sum of the squared differences by n – 1.

Step 5. Take the square root of the value obtained in Step 4.

The result is the standard deviation.
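The five steps can be followed line by line in Python. This is a minimal sketch; the function name and the sample data (the seven bike-race times) are assumptions made for illustration.

```python
import math


def sample_standard_deviation(values):
    """Sample standard deviation, following Steps 1 to 5."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n                              # Step 1
    squared_diffs = [(x - mean) ** 2 for x in values]   # Step 2
    total = sum(squared_diffs)                          # Step 3
    variance = total / (n - 1)                          # Step 4
    return math.sqrt(variance)                          # Step 5


times = [28, 22, 26, 29, 21, 23, 24]
print(round(sample_standard_deviation(times), 2))
```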

Topic 3: Measures of dispersion /spread/ variation (standard deviation cntd)

Properties and uses of the standard deviation

• The numeric value of the standard deviation does not have an easy, non-statistical interpretation, but similar to other measures of spread, the standard deviation conveys how widely or tightly the observations are distributed from the center.

• Standard deviation is usually calculated only when the data are more-or-less “normally distributed,” i.e., the data fall into a typical bell-shaped curve. For normally distributed data, the arithmetic mean is the recommended measure of central location, and the standard deviation is the recommended measure of spread. In fact, means should never be reported without their associated standard deviation.

Topic 3: Measures of dispersion /spread/ variation (standard deviation cntd)

There is a problem with variances. Recall that the deviations were squared. That means that the units were also squared. To get the units back the same as the original data values, the square root must be taken.

Topic 3: Measures of dispersion /spread/ variation (standard deviation cntd)

The sample standard deviation is the most frequently used measure of variability, although it is not as easily understood as ranges. It can be considered as a kind of average of the absolute deviations of observed values from the mean of the variable in question.

The sample standard deviation is not an unbiased estimator of the population standard deviation.

The more variation there is in the observed values, the larger the standard deviation for the variable in question. Thus the standard deviation satisfies the basic criterion for a measure of variation and, as noted, it is the most commonly used measure of variation. However, the standard deviation does have its drawbacks. For instance, its value can be strongly affected by a few extreme observations.

Topic 3: variance and standard deviation for grouped data

Variance and standard deviation for grouped data

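The following are the basic formulas used to calculate the population and sample variances for grouped data:

$$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum m f)^2}{N}}{N}$$

and

$$s^2 = \frac{\sum m^2 f - \frac{(\sum m f)^2}{n}}{n - 1}$$

where $\sigma^2$ is the population variance, $s^2$ is the sample variance, m is the midpoint of a class and f is the frequency of that class.

In either case, the standard deviation is obtained by taking the positive square root of the variance.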

Topic 3: variance and standard deviation for grouped data (worked example)

Solved Example Problem

The solved example below shows how the values in a frequency distribution are used in the formulas above to work out the variance and standard deviation.

Question
The following table gives the frequency distribution of the daily commuting time (in minutes) from home to work for all 25 employees of a company.

Daily commuting time	Number of employees
0 to less than 10	4
10 to less than 20	9
20 to less than 30	6
30 to less than 40	4
40 to less than 50	2

Topic 3: variance and standard deviation for grouped data (worked example cntd)

Calculate the mean, variance and standard deviation of the daily commuting times.


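Daily commuting time	f	m	mf	m²f
0 to less than 10	4	5	20	100
10 to less than 20	9	15	135	2025
20 to less than 30	6	25	150	3750
30 to less than 40	4	35	140	4900
40 to less than 50	2	45	90	4050
Total	N = 25		Σmf = 535	Σm²f = 14825

Here f is the number of employees in each class and m is the midpoint of the class.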

Topic 3: variance and standard deviation for grouped data (worked example cntd)

How to calculate grouped data standard deviation?

Step by step calculation:
Follow the steps below, using the formulas above, to calculate the standard deviation for a frequency table.

Step 1: Find the midpoint of each class (group) of the frequency table.
Step 2: Calculate the number of observations in the data set by summing the frequencies.
Step 3: Find the mean of the grouped data by multiplying each class midpoint by its frequency, adding the products, and dividing the total by the number of observations.
Step 4: Calculate the variance of the frequency table data using the formula above.
Step 5: Obtain the standard deviation by taking the square root of the variance.
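These steps can be sketched in Python for the commuting-time table. The sketch assumes the data describe a whole population (all 25 employees), so it divides by N; it uses the computational form of the variance built from the sums of mf and m²f. The function name is an illustrative assumption.

```python
import math


def grouped_summary(midpoints, frequencies):
    """Return (mean, population variance, standard deviation) for grouped data."""
    n = sum(frequencies)                                                # Step 2
    sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, frequencies))
    mean = sum_mf / n                                                   # Step 3
    variance = (sum_m2f - sum_mf ** 2 / n) / n                          # Step 4
    return mean, variance, math.sqrt(variance)                          # Step 5


# Step 1: class midpoints of 0-10, 10-20, 20-30, 30-40 and 40-50 minutes
midpoints = [5, 15, 25, 35, 45]
frequencies = [4, 9, 6, 4, 2]

mean, variance, sd = grouped_summary(midpoints, frequencies)
print(mean, variance, round(sd, 2))
```

For a sample rather than a population, the final division in Step 4 would be by n − 1 instead of n.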

Topic 3: variance and standard deviation for grouped data (worked example cntd)

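The values are as below. Because the data cover all 25 employees, the population formulas are used, with Σmf = 535 and Σm²f = 14825 from the table.

mean

$$\mu = \frac{\sum m f}{N} = \frac{535}{25} = 21.40 \text{ minutes}$$

variance

$$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum m f)^2}{N}}{N} = \frac{14825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04$$

standard deviation

$$\sigma = \sqrt{135.04} \approx 11.62 \text{ minutes}$$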

Topic 3: Summary

In summary

The measures of dispersion covered in this topic include the range, the interquartile range, the variance and the standard deviation.

The standard deviation is the square root of the variance.

Measures of dispersion tell us about the variability in the data. Dispersion gives us information about how much the values of a variable differ from the mean; if the values hardly vary at all, it is difficult to infer anything from the data. Dispersion is also known as the spread or range of variability.

Basic question: how much do the values of a variable differ from the minimum to the maximum, and how far apart are the values in between?

The common measures of dispersion are:

– Range

– Standard Deviation

– Variance

Topic 3: Further Reading

Reference Material

1. Clarke, G. M. and Cooke, D. (2004). A Basic Program in Statistics. 5th edition. West Sussex: John Wiley & Sons.

2. Freund, J. E. (2001). Modern Elementary Statistics. Upper Saddle River, NJ: Prentice-Hall.

3. Johnson, R. A. and Bhattacharyya, G. K. (1992). Statistics: Principles and Methods, 2nd edition. New York: John Wiley & Sons.

4. Moore, D. S., Craig, B. and McCabe, G. P. (2007). Introduction to the Practice of Statistics, 6th edition. New York: W. H. Freeman.

Further Reading Material

For more details on measures of spread, including the standard deviation, read:

https://statistics.laerd.com/statistical-guides/measures-of-spread-range-quartiles.php

Topic 4: Measures of position (Definition of terms)

Definition of terms

Percentile

o The value below which a given percent of the population lies. The data must be ranked to find percentiles.

Quartile

o Either the 25th, 50th, or 75th percentiles. The 50th percentile is also called the median.

Decile

o Either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentiles.

Lower Quartile

o The median of the lower half of the numbers (up to and including the median). The lower hinge is the first Quartile unless the remainder when dividing the sample size by four is 3.

Upper Quartile

o The median of the upper half of the numbers (including the median). The upper hinge is the 3rd Quartile unless the remainder when dividing the sample size by four is 3.

InterQuartile Range (IQR)

o The difference between the 3rd and 1st Quartiles.

Outlier

o An extremely high or low value when compared to the rest of the values.

Mild Outliers

o Values which lie between 1.5 and 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd Quartile. Note, some texts use hinges instead of Quartiles.

Extreme Outliers

o Values which lie more than 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd Quartile.

Topic 4: Measures of position (Quartiles)

Quartiles

The quartiles divide the data into 4 equal regions. They are found in the same way as percentiles, except that instead of dividing by 100 in step 2 of the percentile procedure, you divide by 4.

The lower quartile is the median of the lower half of the data, up to and including the median. The upper quartile is the median of the upper half of the data, from the median upward and including it.

The statement about the lower half or upper half including the median tends to be confusing to some students. If the median is split between two values (which happens whenever the sample size is even), the median isn't included in either since the median isn't actually part of the data.

Note: The 2nd quartile is the same as the median. The 1st quartile is the 25th percentile, the 3rd quartile is the 75th percentile.

The quartiles are commonly used (much more so than the percentiles or deciles).

Topic 4: Measures of position (Quartiles worked example)

Example 1: sample size of 20

The median will be in position 10.5. The lower half is positions 1 - 10 and the upper half is positions 11 - 20. The lower quartile is the median of the lower half and would be in position 5.5. The upper quartile is the median of the upper half and would be in position 5.5 starting with original position 11 as position 1 -- this is the original position 15.5.

Example 2: sample size of 21

The median is in position 11. The lower half is positions 1 - 11 and the upper half is positions 11 - 21. The lower quartile is the median of the lower half and would be in position 6. The upper quartile is the median of the upper half and would be in position 6 when starting at position 11 -- this is original position 16.
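The positions in both examples can be reproduced with a small Python sketch of the "median of each half" rule, in which the median is counted in both halves when the sample size is odd. The function name is an assumption made for illustration.

```python
def quartile_positions(n):
    """Return the positions (lower quartile, median, upper quartile)
    in a ranked data set of n values, using the median-of-halves rule."""
    median = (n + 1) / 2
    # Each half holds n/2 values when n is even; when n is odd the median
    # observation belongs to both halves, so each half holds (n + 1)/2 values.
    half = n // 2 if n % 2 == 0 else (n + 1) // 2
    lower = (half + 1) / 2        # median position within the lower half
    upper = (n - half) + lower    # same position, counted from the start of the upper half
    return lower, median, upper


print(quartile_positions(20))
print(quartile_positions(21))
```

A position ending in .5 means the quartile is the midpoint of the two neighbouring ranked values.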

Topic 4: Measures of position (inter Quartile range)

Interquartile Range (IQR)

The interquartile range is the difference between the third and first quartiles. That's it: Q3 - Q1

Topic 4: Skewness

Skewness

If extremely low or extremely high observations are present in a distribution, then the mean tends to shift towards those scores.

Based on the type of skewness, distributions can be:

a) Negatively skewed distribution: Occurs when the majority of scores are at the right end of the curve and a few small scores are scattered at the left end.

b) Positively skewed distribution: Occurs when the majority of scores are at the left end of the curve and a few extreme large scores are scattered at the right end.

c) Symmetrical distribution: It is neither positively nor negatively skewed. A curve is symmetrical if one half of the curve is the mirror image of the other half.

In unimodal (one-peak) symmetrical distributions, the mean, median and mode are identical.

On the other hand, in unimodal skewed distributions, it is important to remember that the mean, median and mode occur in alphabetical order when the longer tail is at the left of the distribution or in reverse alphabetical order when the longer tail is at the right of the distribution.

Topic 4: skewness

Choosing the Right Measure of Central Location and Spread

Measures of central location and spread are useful for summarizing a distribution of data. They also facilitate the comparison of two or more sets of data. However, not every measure of central location and spread is well suited to every set of data. For example, because the normal distribution (or bell-shaped curve) is perfectly symmetrical, the mean, median, and mode all have the same value. In practice, however, observed data rarely approach this ideal shape. As a result, the mean, median, and mode usually differ.

Topic 4: Effect of Skewness on Mean, Median, and Mode

Effect of Skewness on Mean, Median, and Mode

How, then, do you choose the most appropriate measures? A partial answer to this question is to select the measure of central location on the basis of how the data are distributed, and then use the corresponding measure of spread.

In statistics, the arithmetic mean is the most commonly used measure of central location, and is the measure upon which the majority of statistical tests and analytic techniques are based.

The standard deviation is the measure of spread most commonly used with the mean. But as noted previously, one disadvantage of the mean is that it is affected by the presence of one or a few observations with extremely high or low values. The mean is “pulled” in the direction of the extreme values. You can tell the direction in which the data are skewed by comparing the values of the mean and the median; the mean is pulled away from the median in the direction of the extreme values. If the mean is higher than the median, the distribution of data is skewed to the right. If the mean is lower than the median, the distribution is skewed to the left.
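The mean-versus-median comparison described above can be sketched in Python. This is only a rough indicator of the direction of skew, not a formal test; the function name and the example data are assumptions made for illustration.

```python
import statistics


def skew_direction(values):
    """Compare the mean with the median to suggest the direction of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"   # mean pulled toward high extreme values
    if mean < median:
        return "left"    # mean pulled toward low extreme values
    return "no skew indicated"


print(skew_direction([21, 22, 23, 24, 26, 28, 29, 50]))  # one very high value
print(skew_direction([1, 20, 21, 22, 23]))               # one very low value
print(skew_direction([1, 2, 3]))                         # symmetric data
```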

The advantage of the median is that it is not affected by a few extremely high or low observations. Therefore, when a set of data is skewed, the median is more representative of the data than is the mean. For descriptive purposes, and to avoid making any assumption that the data are normally distributed, present the median for incubation periods, duration of illness, and age of the study subjects.

Two measures of spread can be used in conjunction with the median: the range and the interquartile range.

The mode is the least useful measure of central location. Some sets of data have no mode; others have more than one. The most common value may not be anywhere near the center of the distribution. Modes generally cannot be used in more elaborate statistical calculations. Nonetheless, even the mode can be helpful when one is interested in the most common value or most popular choice.

The geometric mean is used for exponential or logarithmic data such as laboratory titers, and for environmental sampling data whose values can span several orders of magnitude.

Topic 4: outliers

Outliers

Outliers are extreme values. There are mild outliers and extreme outliers.

Extreme Outliers

Extreme outliers are any data values which lie more than 3.0 times the interquartile range below the first quartile or above the third quartile. x is an extreme outlier if ...

x < Q1 - 3 * IQR or x > Q3 + 3 * IQR

Mild Outliers

Mild outliers are any data values which lie between 1.5 times and 3.0 times the interquartile range below the first quartile or above the third quartile. x is a mild outlier if ...

Q1 - 3 * IQR <= x < Q1 - 1.5 * IQR or Q3 + 1.5 * IQR < x <= Q3 + 3 * IQR
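The two definitions can be written as a small Python function. This is a minimal sketch: the quartiles are passed in rather than computed, because different methods of finding quartiles give slightly different values, and the function name and example quartiles are assumptions made for illustration.

```python
def classify_outlier(x, q1, q3):
    """Classify x as 'extreme', 'mild' or 'not an outlier' from Q1 and Q3."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild"
    return "not an outlier"


# Illustrative quartiles: Q1 = 22.25, Q3 = 28.75, so IQR = 6.5
for value in (30, 40, 50):
    print(value, classify_outlier(value, 22.25, 28.75))
```

With these quartiles the mild fences are 12.5 and 38.5 and the extreme fences are 2.75 and 48.25, so 40 is a mild outlier and 50 is an extreme outlier.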

Topic 4: Standard Scores (z-scores)

Standard Scores (z-scores)

The standard score is obtained by subtracting the mean from a value and dividing the difference by the standard deviation: z = (x − mean) / standard deviation. The symbol is z, which is why it's also called a z-score.

The mean of the standard scores is zero and the standard deviation is 1. This is the nice feature of the standard score -- no matter what the original scale was, when the data is converted to its standard score, the mean is zero and the standard deviation is 1.
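The property that standard scores have mean 0 and standard deviation 1 can be checked with a short Python sketch. The function name and the data are illustrative assumptions; the sample standard deviation from Python's `statistics` module is used here.

```python
import statistics


def z_scores(values):
    """Convert each value to its standard score (z-score)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation
    if sd == 0:
        raise ValueError("all values are equal, so z-scores are undefined")
    return [(x - mean) / sd for x in values]


times = [28, 22, 26, 29, 21, 23, 24]
z = z_scores(times)

print(round(statistics.mean(z), 10))   # mean of the z-scores
print(round(statistics.stdev(z), 10))  # standard deviation of the z-scores
```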

Topic 4: Measures of position (percentiles)

Percentiles

Percentiles divide the data in a distribution into 100 equal parts.

The Pth percentile (P ranging from 0 to 100) is the value that has P percent of the observations falling at or below it. In other words, the 90th percentile has 90% of the observations at or below it. The median, the halfway point of the distribution, is the 50th percentile.

The maximum value is the 100th percentile, because all values fall at or below the maximum.

To find the kth percentile, the number which has k% of the values below it, follow these steps:

1. Rank the data.
2. Find k% (k / 100) of the sample size, n.
3. If this is an integer, add 0.5. If it isn't an integer, round up.
4. Find the number in this position. If the position ends in 0.5, take the midpoint between the two numbers on either side of it.

It is sometimes easier to count from the high end rather than counting from the low end. For example, the 80th percentile is the number which has 80% below it and 20% above it. Rather than counting 80% from the bottom, count 20% from the top.
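The percentile procedure above can be sketched in Python. The sketch assumes k is a whole number strictly between 0 and 100, and the function name and data are illustrative; other textbooks and software packages use different percentile rules and may give slightly different answers.

```python
import math


def percentile(values, k):
    """kth percentile using the rank, k% of n, add 0.5 or round up rule."""
    if not 0 < k < 100:
        raise ValueError("this sketch handles 0 < k < 100 only")
    ranked = sorted(values)                       # rank the data
    n = len(ranked)
    if (k * n) % 100 == 0:
        # k% of n is an integer: add 0.5, i.e. take the midpoint of the
        # values in that position and the next one
        position = k * n // 100
        return (ranked[position - 1] + ranked[position]) / 2
    position = math.ceil(k * n / 100)             # not an integer: round up
    return ranked[position - 1]


times = [28, 22, 26, 29, 21, 23, 24, 50]
print(percentile(times, 50))   # the median
print(percentile(times, 25))
print(percentile(times, 80))
```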

Topic 4: Summary

In summary

Topic 4 has highlighted the measures of position, as well as the effect of skewness and outliers on various statistical measures.

Course dashboard

Click here to access Unit Four Content..

Topic 1: Statistical data (Objectives)

Objectives

At the end of this section, learners should be able to:

Discuss procedures in managing data.
Explain steps that need to be followed to carry out an effective analysis of quantitative and qualitative data

Topic 1: statistical data (definition of terms)

Definitions of terms

Statistic

o Characteristic or measure obtained from a sample

Parameter

o Characteristic or measure obtained from a population

Empirical or Normal Rule

o Only valid when a distribution is bell-shaped (normal). Approximately 68% of the values lie within 1 standard deviation of the mean; 95% within 2 standard deviations; and 99.7% within 3 standard deviations of the mean.

Standard Score or Z-Score

o The value obtained by subtracting the mean and dividing by the standard deviation. When all values are transformed to their standard scores, the new mean (for Z) will be zero and the standard deviation will be one.
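The percentages in the empirical rule can be checked against the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the function name is an assumption made for illustration.

```python
from statistics import NormalDist


def share_within(k_sd):
    """Proportion of a normal distribution within k_sd standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k_sd) - z.cdf(-k_sd)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

The exact figure for 2 standard deviations is slightly above 95%, which is why the rule is stated as an approximation.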

Topic 1: Parameters & Statistics

Parameters & Statistics
Usually the features of the population under investigation can be summarized by numerical parameters. Hence the research problem usually becomes an investigation of the values of parameters. These population parameters are unknown and sample statistics are used to make inferences about them. That is, a statistic describes a characteristic of the sample which can then be used to make inferences about unknown parameters.

Definition (Parameters and Statistics). A parameter is an unknown numerical summary of the population. A statistic is a known numerical summary of the sample which can be used to make inference about parameters. (Agresti & Finlay, 1997)

So the inference about some specific unknown parameter is based on a statistic. We use known sample statistics in making inferences about unknown population parameters. The primary focus of most research studies is the parameters of the population, not statistics calculated for the particular sample selected. The sample and statistics describing it are important only insofar as they provide information about the unknown parameters.

Example (Parameters and Statistics). Consider the research problem of finding out what percentage of under-five year-olds present at the MCH clinic with malnutrition at least once a month.

• Parameter: The proportion p of all under-five year-olds who present at the MCH clinic with malnutrition at least once a month.

• Statistic: The proportion p̂ of under-five year-olds who present at the MCH clinic with malnutrition at least once a month, calculated from the sample of under-five year-olds.
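The difference between the parameter p and the statistic p̂ can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population size (5000 children), the true proportion (30%), the sample size (200) and the names are invented for illustration and are not data from any clinic.

```python
import random


def proportion(flags):
    """Share of True values in a list of True/False flags."""
    return sum(flags) / len(flags)


# Hypothetical population: 5000 children, 1500 of whom meet the case definition
population = [True] * 1500 + [False] * 3500
p = proportion(population)            # the parameter (unknown in practice)

rng = random.Random(0)                # fixed seed so the run is repeatable
sample = rng.sample(population, 200)  # simple random sample without replacement
p_hat = proportion(sample)            # the statistic, an estimate of p

print(p, p_hat)
```

In a real study only p̂ is observed; p is known here only because the population was constructed for the example.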

Topic 1: Types of data

Types of data

Primary vs secondary

Data can be categorized as primary or secondary, and as quantitative or qualitative.

Primary data is data that has not previously been documented and is collected through a survey or experiment. It is first-hand information. Secondary data is data that already exists in the form of published or unpublished documentation.

Numerical vs categorical

Topic 1: Variables

Variables can be classified into one of four types, depending on the type of scale used to characterize their values.

• A nominal-scale variable is one whose values are categories without any numerical ranking, such as county of residence. In epidemiology, nominal variables with only two categories are very common: alive or dead, ill or well, vaccinated or unvaccinated, or did or did not eat the potato salad. A nominal variable with two mutually exclusive categories is sometimes called a dichotomous variable.

• An ordinal-scale variable has values that can be ranked but are not necessarily evenly spaced, such as stage of cancer.

• An interval-scale variable is measured on a scale of equally spaced units, but without a true zero point, such as date of birth.

• A ratio-scale variable is an interval variable with a true zero point, such as height in centimeters or duration of illness.

Nominal- and ordinal-scale variables are considered qualitative or categorical variables, whereas interval- and ratio-scale variables are considered quantitative or continuous variables. Sometimes the same variable can be measured using both a nominal scale and a ratio scale. For example, the tuberculin skin tests of a group of persons potentially exposed to a co-worker with tuberculosis can be measured as “positive” or “negative” (nominal scale) or in millimeters of induration (ratio scale).

Quantitative data is information expressed in numerical form that can be measured with standard scales, while qualitative data includes descriptions or measurements made with non-standard scales. Qualitative data describes the quality of a phenomenon with terms such as yes or no, bad or good, etc.

Topic 2: Vital statistics

Natality (Birth) Measures

Natality measures are population-based measures of birth. These measures are used primarily by persons working in the field of maternal and child health.

Morbidity Frequency Measures

Morbidity has been defined as any departure, subjective or objective, from a state of physiological or psychological wellbeing.

In practice, morbidity encompasses disease, injury, and disability.

In addition, although for this lesson the term refers to the number of persons who are ill, it can also be used to describe the periods of illness that these persons experienced, or the duration of these illnesses.

Measures of morbidity frequency characterize the number of persons in a population who become ill (incidence) or are ill at a given time (prevalence).

Incidence refers to the occurrence of new cases of disease or injury in a population over a specified period of time.

Two types of incidence are commonly used — incidence proportion and incidence rate.

Incidence proportion or risk

Definition of incidence proportion

Incidence proportion is the proportion of an initially disease-free population that develops disease, becomes injured, or dies during a specified (usually limited) period of time.

Synonyms include attack rate, risk, probability of getting disease, and cumulative incidence.

Topic 2: Vital statistics (incidence proportions)

Properties and uses of incidence proportions

• Incidence proportion is a measure of the risk of disease or the probability of developing the disease during the specified period. As a measure of incidence, it includes only new cases of disease in the numerator. The denominator is the number of persons in the population at the start of the observation period. Because all of the persons with new cases of disease (numerator) are also represented in the denominator, a risk is also a proportion.

• In the outbreak setting, the term attack rate is often used as a synonym for risk. It is the risk of getting the disease during a specified period, such as the duration of an outbreak. A variety of attack rates can be calculated.

- Overall attack rate is the total number of new cases divided by the total population.

- A food-specific attack rate is the number of persons who ate a specified food and became ill divided by the total number of persons who ate that food, as illustrated in the previous potato salad example.

- A secondary attack rate is sometimes calculated to document the difference between community transmission of illness and transmission of illness in a household, barracks, or other closed population.
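The attack rates described above can be sketched in a few lines of Python. This is only an illustration: the function name `attack_rate` and all of the outbreak figures are hypothetical and are not taken from the lesson.

```python
def attack_rate(new_cases, population_at_risk, per=100):
    """Incidence proportion: new cases divided by the population at risk
    at the start of the period, expressed per `per` persons."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases * per / population_at_risk


# Hypothetical outbreak figures, for illustration only.
overall = attack_rate(46, 75)        # 46 of 75 attendees became ill
food_specific = attack_rate(30, 40)  # 30 of the 40 who ate one food became ill

print(f"Overall attack rate: {overall:.1f} per 100")
print(f"Food-specific attack rate: {food_specific:.1f} per 100")
```

Because every case counted in the numerator is also counted in the denominator, the result can never exceed 100 per 100 persons.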


Topic 2: vital statistics (Incidence rate )

Incidence rate or person-time rate

Definition of incidence rate

Incidence rate or person-time rate is a measure of incidence that incorporates time directly into the denominator. A person-time rate is generally calculated from a long-term cohort follow-up study, wherein enrollees are followed over time and the occurrence of new cases of disease is documented. Typically, each person is observed from an established starting time until one of four “end points” is reached: onset of disease, death, migration out of the study (“lost to follow-up”), or the end of the study.

Similar to the incidence proportion, the numerator of the incidence rate is the number of new cases identified during the period of observation. However, the denominator differs. The denominator is the sum of the time each person was observed, totaled for all persons. This denominator represents the total time the population was at risk of and being watched for disease. Thus, the incidence rate is the ratio of the number of cases to the total time the population is at risk of disease.
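A minimal sketch of the person-time calculation follows. The follow-up records are hypothetical, and the choice of years as the time unit is an assumption made for the example.

```python
# Hypothetical follow-up records: (years observed, developed disease?)
follow_up = [
    (5.0, False),  # followed to the end of the study
    (2.5, True),   # onset of disease after 2.5 years
    (4.0, False),  # lost to follow-up after 4 years
    (1.5, True),   # onset of disease after 1.5 years
    (5.0, False),  # followed to the end of the study
]

# Numerator: new cases identified during observation.
new_cases = sum(1 for _, diseased in follow_up if diseased)

# Denominator: time observed, totaled over all persons.
person_years = sum(years for years, _ in follow_up)

rate_per_1000 = new_cases * 1000 / person_years
print(f"{new_cases} cases over {person_years} person-years")
print(f"Incidence rate: {rate_per_1000:.1f} per 1,000 person-years")
```

Each person contributes time only until their own end point, which is how the denominator accounts for people who develop disease, die, or leave the study early.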


Topic 2: vital statistics (Properties and uses of incidence rates)

Properties and uses of incidence rates

• An incidence rate describes how quickly disease occurs in a population. It is based on person-time, so it has some advantages over an incidence proportion. Because person-time is calculated for each subject, it can accommodate persons coming into and leaving the study.


Topic 2: vital statistics (prevalence)

Prevalence

Definition of prevalence

Prevalence, sometimes referred to as prevalence rate, is the proportion of persons in a population who have a particular disease or attribute at a specified point in time or over a specified period of time. Prevalence differs from incidence in that prevalence includes all cases, both new and preexisting, in the population at the specified time, whereas incidence is limited to new cases only.

Point prevalence refers to the prevalence measured at a particular point in time. It is the proportion of persons with a particular disease or attribute on a particular date.

Period prevalence refers to prevalence measured over an interval of time. It is the proportion of persons with a particular disease or attribute at any time during the interval.
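The difference between point and period prevalence can be illustrated with a short sketch. All figures are hypothetical, and the example assumes, for simplicity, that the same population size is used as the denominator for both measures.

```python
def prevalence(cases, population, per=100):
    """Proportion of the population with the condition, per `per` persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


# Hypothetical figures, for illustration only.
population = 10_000
cases_on_1_jan = 120          # new and preexisting cases present on 1 January
new_cases_during_year = 30    # additional cases arising later in the year

# Point prevalence: cases present on a particular date.
point_prevalence = prevalence(cases_on_1_jan, population)

# Period prevalence: everyone who had the condition at any time in the year.
period_prevalence = prevalence(cases_on_1_jan + new_cases_during_year, population)

print(f"Point prevalence on 1 January: {point_prevalence}%")
print(f"Period prevalence for the year: {period_prevalence}%")
```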


Topic 2: vital statistics (Properties and uses of prevalence)

Properties and uses of prevalence

• Prevalence and incidence are frequently confused. Prevalence refers to the proportion of persons who have a condition at or during a particular time period, whereas incidence refers to the proportion or rate of persons who develop a condition during a particular time period. So prevalence and incidence are similar, but prevalence includes new and preexisting cases whereas incidence includes new cases only. The key difference is in their numerators.

- Numerator of incidence = new cases that occurred during a given time period

- Numerator of prevalence = all cases present during a given time period

• The numerator of an incidence proportion or rate consists only of persons whose illness began during the specified interval. The numerator for prevalence includes all persons ill from a specified cause during the specified interval regardless of when the illness began. It includes not only new cases, but also preexisting cases representing persons who remained ill during some portion of the specified interval.

• Prevalence is based on both incidence and duration of illness. High prevalence of a disease within a population might reflect high incidence or prolonged survival without cure or both. Conversely, low prevalence might indicate low incidence, a rapidly fatal process, or rapid recovery.

• Prevalence rather than incidence is often measured for chronic diseases such as diabetes or osteoarthritis, which have long duration and dates of onset that are difficult to pinpoint.
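The dependence of prevalence on incidence and duration is often summarized by a standard approximation. The sketch below assumes a steady state, in which the incidence rate and the average duration of illness are stable over time. The symbols $P$ (point prevalence), $I$ (incidence rate) and $\bar{D}$ (mean duration of illness) are introduced here for illustration and do not appear in the lesson text.

```latex
\frac{P}{1 - P} = I \times \bar{D}
\qquad\Longrightarrow\qquad
P \approx I \times \bar{D} \quad \text{when } P \text{ is small}
```

For example, under these assumptions an incidence rate of 2 per 1,000 person-years and a mean duration of 5 years would give a prevalence of roughly 10 per 1,000, or about 1%.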


Topic 2: vital statistics (Mortality measures)

Mortality measures

A mortality rate is a measure of the frequency of occurrence of death in a defined population during a specified interval. Morbidity and mortality measures are often the same mathematically; it’s just a matter of what you choose to measure, illness or death.

When mortality rates are based on vital statistics (e.g., counts of death certificates), the denominator most commonly used is the size of the population at the middle of the time period.

Crude mortality rate (crude death rate)

The crude mortality rate is the mortality rate from all causes of death for a population.

Cause-specific mortality rate

The cause-specific mortality rate is the mortality rate from a specified cause for a population. The numerator is the number of deaths attributed to a specific cause. The denominator remains the size of the population at the midpoint of the time period. The fraction is usually expressed per 100,000 population.

Age-specific mortality rate

An age-specific mortality rate is a mortality rate limited to a particular age group. The numerator is the number of deaths in that age group; the denominator is the number of persons in that age group in the population.

Some specific types of age-specific mortality rates are neonatal, postneonatal, and infant mortality rates.
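The crude, cause-specific and age-specific rates above share one formula and differ only in which deaths and which population are counted. A minimal sketch, using hypothetical figures and a hypothetical helper named `mortality_rate`:

```python
def mortality_rate(deaths, midperiod_population, per=100_000):
    """Deaths during the period divided by the mid-period population,
    expressed per `per` persons."""
    if midperiod_population <= 0:
        raise ValueError("population must be positive")
    return deaths * per / midperiod_population


# Hypothetical one-year figures for a single area.
midyear_population = 500_000
all_deaths = 4_000
heart_disease_deaths = 900
population_65_plus = 60_000
deaths_65_plus = 600

crude = mortality_rate(all_deaths, midyear_population)
cause_specific = mortality_rate(heart_disease_deaths, midyear_population)
# For the age-specific rate, both numerator and denominator are
# restricted to the same age group.
age_specific = mortality_rate(deaths_65_plus, population_65_plus)

print(f"Crude mortality rate: {crude:.0f} per 100,000")
print(f"Cause-specific mortality rate: {cause_specific:.0f} per 100,000")
print(f"Age-specific mortality rate (65+): {age_specific:.0f} per 100,000")
```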


Topic 2: vital statistics (mortality measures cntd)

Infant mortality rate

The infant mortality rate is perhaps the most commonly used measure for comparing health status among nations. The numerator is the number of deaths among children under 1 year of age reported during a given time period; the denominator is the number of live births reported during the same time period. The rate is usually expressed per 1,000 live births.

The infant mortality rate is generally calculated on an annual basis. It is a widely used measure of health status because it reflects the health of the mother and infant during pregnancy and the year thereafter. The health of the mother and infant, in turn, reflects a wide variety of factors, including access to prenatal care, prevalence of prenatal maternal health behaviors (such as alcohol or tobacco use and proper nutrition during pregnancy), postnatal care and behaviors (including childhood immunizations and proper nutrition), sanitation, and infection control.

Is the infant mortality rate a ratio? Yes. Is it a proportion? No, because some of the deaths in the numerator were among children born the previous year.

Neonatal mortality rate

The neonatal period covers birth up to but not including 28 days. The numerator of the neonatal mortality rate therefore is the number of deaths among children under 28 days of age during a given time period. The denominator of the neonatal mortality rate, like that of the infant mortality rate, is the number of live births reported during the same time period. The neonatal mortality rate is usually expressed per 1,000 live births.


Topic 2: vital statistics (mortality measures cntd)

Postneonatal mortality rate

The postneonatal period is defined as the period from 28 days of age up to but not including 1 year of age. The numerator of the postneonatal mortality rate therefore is the number of deaths among children from 28 days up to but not including 1 year of age during a given time period. The denominator is the number of live births reported during the same time period. The postneonatal mortality rate is usually expressed per 1,000 live births.

Maternal mortality rate

The maternal mortality rate is really a ratio used to measure mortality associated with pregnancy. The numerator is the number of deaths during a given time period among women while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and the site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management, but not from accidental or incidental causes. The denominator is the number of live births reported during the same time period. Maternal mortality rate is usually expressed per 100,000 live births.
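The rates that use live births as the denominator can be computed with one small helper. The sketch below uses hypothetical counts, and it assumes that infant deaths (under 1 year) are the sum of neonatal deaths (under 28 days) and postneonatal deaths (28 days to under 1 year), which follows from the age ranges given above.

```python
def per_live_births(deaths, live_births, per=1_000):
    """Deaths divided by live births in the same period, per `per` live births."""
    if live_births <= 0:
        raise ValueError("live births must be positive")
    return deaths * per / live_births


# Hypothetical one-year counts, for illustration only.
live_births = 20_000
neonatal_deaths = 300        # under 28 days of age
postneonatal_deaths = 200    # 28 days up to but not including 1 year
maternal_deaths = 60

neonatal_rate = per_live_births(neonatal_deaths, live_births)
postneonatal_rate = per_live_births(postneonatal_deaths, live_births)
infant_rate = per_live_births(neonatal_deaths + postneonatal_deaths, live_births)
# Maternal mortality is conventionally expressed per 100,000 live births.
maternal_rate = per_live_births(maternal_deaths, live_births, per=100_000)

print(f"Neonatal mortality rate: {neonatal_rate} per 1,000 live births")
print(f"Postneonatal mortality rate: {postneonatal_rate} per 1,000 live births")
print(f"Infant mortality rate: {infant_rate} per 1,000 live births")
print(f"Maternal mortality rate: {maternal_rate} per 100,000 live births")
```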


Topic 2: vital statistics (mortality measures cntd)

Sex-specific mortality rate

A sex-specific mortality rate is a mortality rate among either males or females. Both numerator and denominator are limited to the one sex.

Race-specific mortality rate

A race-specific mortality rate is a mortality rate related to a specified racial group. Both numerator and denominator are limited to the specified race.

Combinations of specific mortality rates

Mortality rates can be further stratified by combinations of cause, age, sex, and/or race. For example, the mortality rate from diseases of the heart among women aged 45–54 years is a cause-, age-, and sex-specific rate, because it refers to one cause (diseases of the heart), one age group (45–54 years), and one sex (female).

Age-adjusted mortality rates

Mortality rates can be used to compare the rates in one area with the rates in another area, or to compare rates over time. However, because mortality rates obviously increase with age, a higher mortality rate among one population than among another might simply reflect the fact that the first population is older than the second.

To eliminate the distortion caused by different underlying age distributions in different populations, statistical techniques are used to adjust or standardize the rates among the populations to be compared. These techniques take a weighted average of the age-specific mortality rates, and eliminate the effect of different age distributions among the different populations. Mortality rates computed with these techniques are called age-adjusted or age-standardized mortality rates.
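One common way to take this weighted average is direct standardization, in which each area's age-specific rates are applied to a shared standard population. The lesson does not say which technique it has in mind, so treat the sketch below as one possible illustration; the two areas, the age groups and the standard population are all hypothetical.

```python
# Hypothetical data: age group -> (deaths, population)
area_a = {"0-44": (50, 60_000), "45-64": (120, 30_000), "65+": (400, 10_000)}
area_b = {"0-44": (30, 30_000), "45-64": (150, 40_000), "65+": (900, 30_000)}

# Hypothetical standard population used as the common set of weights.
standard_population = {"0-44": 50_000, "45-64": 30_000, "65+": 20_000}


def crude_rate(area, per=100_000):
    """All deaths divided by the whole population."""
    deaths = sum(d for d, _ in area.values())
    population = sum(p for _, p in area.values())
    return deaths * per / population


def age_adjusted_rate(area, standard, per=100_000):
    """Direct standardization: apply each age-specific rate to the
    standard population, then divide expected deaths by its total size."""
    expected_deaths = sum(
        (deaths / population) * standard[age]
        for age, (deaths, population) in area.items()
    )
    return expected_deaths * per / sum(standard.values())


for name, area in (("A", area_a), ("B", area_b)):
    print(
        f"Area {name}: crude {crude_rate(area):.1f}, "
        f"age-adjusted {age_adjusted_rate(area, standard_population):.1f} per 100,000"
    )
```

In this made-up example area B has the higher crude rate only because its population is older; once both areas are weighted by the same age distribution, area A has the higher rate.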


Topic 2: vital statistics (mortality measures cntd)

Death-to-case ratio

Definition of death-to-case ratio

The death-to-case ratio is the number of deaths attributed to a particular disease during a specified time period divided by the number of new cases of that disease identified during the same time period. The death-to-case ratio is a ratio but not necessarily a proportion, because some of the deaths that are counted in the numerator might have occurred among persons who developed disease in an earlier period, and are therefore not counted in the denominator.

Case-fatality rate

The case-fatality rate is the proportion of persons with a particular condition (cases) who die from that condition. It is a measure of the severity of the condition.

The case-fatality rate is a proportion, so the numerator is restricted to deaths among people included in the denominator.
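The contrast between the two measures can be shown with a small sketch. The counts are hypothetical; the point is only that the death-to-case ratio pairs deaths and new cases from the same calendar period, while the case-fatality rate follows a fixed group of cases.

```python
def per_100(numerator, denominator):
    """Express numerator / denominator per 100."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator * 100 / denominator


# Death-to-case ratio: deaths and new cases counted in the same year.
# Some of these deaths may be among people diagnosed in earlier years.
deaths_in_year = 15
new_cases_in_year = 300
death_to_case = per_100(deaths_in_year, new_cases_in_year)

# Case-fatality rate: deaths counted only among the cases being followed.
cohort_cases = 200
deaths_among_cohort = 8
case_fatality = per_100(deaths_among_cohort, cohort_cases)

print(f"Death-to-case ratio: {death_to_case} per 100 new cases")
print(f"Case-fatality rate: {case_fatality}%")
```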

Proportionate mortality

Definition of proportionate mortality

Proportionate mortality describes the proportion of deaths in a specified population over a period of time attributable to different causes. Each cause is expressed as a percentage of all deaths, and the sum of the causes must add to 100%. These proportions are not mortality rates, because the denominator is all deaths rather than the population in which the deaths occurred.
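As a brief illustration, proportionate mortality can be computed from a table of deaths by cause. The causes and counts below are hypothetical.

```python
# Hypothetical deaths by cause in one population over one year.
deaths_by_cause = {
    "Heart disease": 250,
    "Cancer": 200,
    "Injuries": 50,
    "All other causes": 500,
}

# The denominator is all deaths, not the population at risk.
total_deaths = sum(deaths_by_cause.values())

proportionate_mortality = {
    cause: deaths * 100 / total_deaths
    for cause, deaths in deaths_by_cause.items()
}

for cause, percent in proportionate_mortality.items():
    print(f"{cause}: {percent:.1f}% of all deaths")
```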


Topic 3: Data analysis

Method of data analysis

The common methods of data analysis are:

1. Descriptive statistics – is concerned with accurately portraying the characteristics of a sample. It includes frequency distributions, grouped frequencies, measures of central tendency (mean, median, mode), measures of variability (range, interquartile range, standard deviation), measures of association or relationship (correlation and regression) and measures of relative position (percentile rank).

2. Inferential statistics – is concerned with what data obtained from a sample can tell us about the population. It is used to make generalizations about the population from data obtained from the sample, and is thus used to test the null hypothesis. It includes the t-test for independent groups, the t-test for correlated (paired) groups, ANOVA, the chi-square test and the Pearson product-moment correlation.
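Several of the descriptive measures listed above are available in Python's standard `statistics` module. The sketch below applies them to a small set of hypothetical systolic blood pressure readings; the variable names and data are made up for the example.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg) from a sample of 7 patients.
systolic_bp = [118, 122, 130, 122, 140, 126, 134]

# Measures of central tendency.
mean = statistics.mean(systolic_bp)
median = statistics.median(systolic_bp)
mode = statistics.mode(systolic_bp)

# Measures of variability.
value_range = max(systolic_bp) - min(systolic_bp)
sample_sd = statistics.stdev(systolic_bp)  # sample standard deviation (n - 1)

print(f"Mean: {mean:.1f}, median: {median}, mode: {mode}")
print(f"Range: {value_range}, sample standard deviation: {sample_sd:.1f}")
```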


Topic 3: data analysis

Data analysis is defined as the process of cleaning, transforming, and modeling data to discover useful information for business decision-making. The purpose of data analysis is to extract useful information from data and to make decisions based on that information.

Whenever we make a decision in day-to-day life, we think about what happened last time or what is likely to happen if we choose a particular option. This amounts to analyzing our past or future and deciding on that basis, drawing on memories of the past or expectations of the future. That is data analysis in its everyday form. When an analyst does the same thing for business purposes, it is called data analysis.


Topic 3: Types of Data Analysis: Techniques and Methods


Types of Data Analysis: Techniques and Methods

Several types of data analysis techniques exist, depending on the business need and the technology used. The major types of data analysis are:

Text Analysis
Statistical Analysis
Diagnostic Analysis
Predictive Analysis
Prescriptive Analysis


Topic 3: Types of Data Analysis: Text Analysis


Text Analysis

Text Analysis is also referred to as Data Mining. It is a method of discovering patterns in large data sets using databases or data mining tools, and it is used to transform raw data into business information. Business Intelligence tools available on the market are used to make strategic business decisions. Overall, it offers a way to extract and examine data, derive patterns, and finally interpret the data.


Topic 3: Types of Data Analysis: Statistical Analysis

Statistical Analysis

Statistical Analysis shows "What happened?" by using past data, often in the form of dashboards. Statistical Analysis includes the collection, analysis, interpretation, presentation, and modeling of data. It analyses a complete set of data or a sample of data. There are two categories of this type of analysis: Descriptive Analysis and Inferential Analysis.

Descriptive Analysis

Descriptive Analysis analyses complete data or a sample of summarized numerical data. It shows the mean and standard deviation for continuous data, and percentages and frequencies for categorical data.

Inferential Analysis

Inferential Analysis analyses a sample drawn from the complete data. In this type of analysis, you can reach different conclusions from the same data by selecting different samples.
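The point that different samples from the same data can lead to different conclusions can be seen in a short simulation. This is a sketch under stated assumptions: the "population" is 10,000 simulated blood pressure values generated for the example, the sample size of 50 is arbitrary, and the random seed is fixed only so that the run is repeatable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example is repeatable

# Hypothetical population: 10,000 simulated systolic blood pressure values.
population = [random.gauss(120, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw five independent random samples of 50 people and compare their means.
sample_means = [
    statistics.mean(random.sample(population, 50))
    for _ in range(5)
]

print(f"Population mean: {population_mean:.1f}")
for i, sample_mean in enumerate(sample_means, start=1):
    print(f"Sample {i} mean: {sample_mean:.1f}")
```

Each sample mean lands near the population mean but not exactly on it, which is why inferential methods attach a measure of uncertainty to estimates made from a sample.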


Topic 3: Types of Data Analysis: Diagnostic Analysis

Diagnostic Analysis

Diagnostic Analysis shows "Why did it happen?" by finding the cause behind the insights found in Statistical Analysis. This analysis is useful for identifying behavior patterns in data. If a new problem arises in your business process, you can look into this analysis to find similar patterns for that problem, and there may be a chance to apply similar prescriptions to the new problem.


Topic 3: Types of Data Analysis: Predictive Analysis

Predictive Analysis

Predictive Analysis shows "What is likely to happen?" by using previous data. A simple example: if last year I bought two dresses out of my savings, and this year my salary doubles, then I can buy four dresses. Of course it is not that easy, because you have to think about other circumstances: clothes prices may rise this year, or instead of dresses you may want to buy a new bike, or you may need to buy a house.

So this analysis makes predictions about future outcomes based on current or past data. A forecast is only an estimate; its accuracy depends on how much detailed information you have and how deeply you dig into it.


Topic 3: Types of Data Analysis: Prescriptive Analysis

Prescriptive Analysis

Prescriptive Analysis combines the insights from all the previous analyses to determine which action to take on a current problem or decision. Most data-driven companies use Prescriptive Analysis because predictive and descriptive analysis alone are not enough to improve data performance. Based on current situations and problems, they analyze the data and make decisions.


