MEV 019: Unit 08 - Descriptive Statistics-II
UNIT 8: DESCRIPTIVE STATISTICS – II
8.1 Introduction
While Unit 7 focused on summarizing data through
measures of central tendency and dispersion, this unit moves toward
understanding relationships between variables.
In environmental science, for example, we might study whether air temperature
is related to humidity, or whether pollution levels are linked to respiratory
diseases.
Two main statistical tools help us in this regard:
- Correlation – Measures the strength and direction of association between two variables.
- Regression – Provides a mathematical equation to predict one variable based on another.
8.2 Objectives
After studying this unit, you will be able to:
- Explain the meaning and types of correlation.
- Draw and interpret a scatter diagram.
- Calculate Karl Pearson’s correlation coefficient.
- Compute Spearman’s rank correlation coefficient.
- Understand regression and regression lines.
- Calculate regression equations for predicting values.
- Explain the properties of correlation and regression coefficients.
8.3 Correlation Analysis
Correlation indicates the degree to which two
variables move together.
8.3.1 Types of Correlation
- Positive Correlation – Both variables move in the same direction. Example: Temperature and ice cream sales.
- Negative Correlation – Variables move in opposite directions. Example: Rainfall and outdoor cricket matches played.
- Zero Correlation – No linear relationship between the variables.
- Perfect Correlation – Correlation coefficient is exactly +1 or –1.
8.3.2 Measures of Correlation
- Scatter Diagram – A graphical method.
- Karl Pearson’s Correlation Coefficient – Measures linear correlation numerically.
- Spearman’s Rank Correlation – Based on ranks rather than actual values.
8.4 Scatter Diagram
A scatter diagram is a plot of paired values (X, Y)
on a Cartesian plane.
- Upward slope: Positive correlation
- Downward slope: Negative correlation
- No clear slope: Zero correlation
Example: Plotting PM₂.₅ concentration against daily
hospital visits.
8.5 Karl Pearson’s Correlation Coefficient
Karl Pearson’s method gives a numerical value (r)
ranging between –1 and +1.
Formula:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}}$$
8.5.1 Properties of Correlation Coefficient
- Range: –1 ≤ r ≤ +1
- Sign: Indicates direction (positive or negative)
- Magnitude: Indicates strength
- Unit-free: Independent of measurement units
- Symmetry: $r_{XY} = r_{YX}$
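The unit-free property can be checked directly from the formula in Section 8.5. The short derivation below is a sketch under an assumed notation that does not appear in the unit: X' = cX + k stands for X re-expressed in other units (for example, Celsius to Fahrenheit), with c > 0.

```latex
% Assumed notation: X' = cX + k with c > 0, so \bar{X'} = c\bar{X} + k.
\begin{aligned}
X' - \bar{X'} &= c\,(X - \bar{X}) \\
r_{X'Y} &= \frac{\sum c\,(X - \bar{X})(Y - \bar{Y})}
               {\sqrt{c^2 \sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}}
         = \frac{c}{|c|}\, r_{XY} = r_{XY}
\end{aligned}
```

The factor c cancels between numerator and denominator, so changing the units of X (or, by symmetry, of Y) leaves r unchanged. If c were negative, only the sign of r would flip.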
8.5.2 Calculation of Correlation Coefficient
Example:
X | Y
10 | 20
20 | 25
30 | 28
40 | 35
50 | 40
Step 1: Find deviations from mean.
Step 2: Apply the formula.
Step 3: Interpret r.
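The three steps can be sketched in Python for the five (X, Y) pairs of this example. This is a minimal illustration rather than a prescribed method; the function name `pearson_r` is an assumption made for the sketch, and it does not handle the case where one variable is constant (the denominator would be zero).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r, computed from deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Step 1: deviations from the means
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    # Step 2: apply the formula
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    return s_xy / sqrt(s_xx * s_yy)


X = [10, 20, 30, 40, 50]
Y = [20, 25, 28, 35, 40]
r = pearson_r(X, Y)
print(round(r, 4))  # 0.9937
```

Step 3: here the means are 30 and 29.6, the sum of products of deviations is 500, and the sums of squared deviations are 1000 and 253.2, so r = 500 / √(1000 × 253.2) ≈ 0.99. This indicates a very strong positive linear relationship between X and Y.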
8.6 Spearman’s Rank Correlation Coefficient
Spearman’s method is useful for qualitative data or
when the data is in ranks.
$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
Where:
- d = difference between paired ranks
- n = number of observations
8.6.1 Method of Calculation of Rank Correlation
- Rank the data for X and Y separately.
- Compute $d = \text{Rank}(X) - \text{Rank}(Y)$ for each pair.
- Square each d and sum the squares to obtain $\sum d^2$.
- Apply the formula for ρ.
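The four steps above can be sketched in Python as follows. The scores are hypothetical values invented for illustration, and the helper names `ranks` and `spearman_rho` are assumptions of this sketch. It assumes there are no tied values; ties would need average ranks, which the simple formula does not cover.

```python
def ranks(values):
    """Rank values from 1 (largest) to n (smallest); assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks, not handled in this sketch")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rho from the sum of squared rank differences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    rank_x, rank_y = ranks(xs), ranks(ys)                       # rank separately
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # sum of d squared
    return 1 - 6 * sum_d2 / (n * (n * n - 1))                   # apply the formula


# Hypothetical scores given to five sites by two assessors
scores_x = [86, 60, 75, 92, 50]
scores_y = [80, 65, 58, 90, 70]
rho = spearman_rho(scores_x, scores_y)
print(round(rho, 2))  # 0.6
```

For these values the ranks are (2, 4, 3, 1, 5) and (2, 4, 5, 1, 3), the rank differences are 0, 0, –2, 0, 2, and the sum of squared differences is 8, so ρ = 1 – 48/120 = 0.6.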
8.7 Concept of Regression
Regression estimates the value of one variable
based on the value of another.
- Dependent Variable (Y) – The variable to be predicted.
- Independent Variable (X) – The variable used for prediction.
8.8 Lines of Regression
Two lines of regression exist:
- Regression of Y on X: Predicts Y from X.
$$Y = a + bX$$
- Regression of X on Y: Predicts X from Y.
$$X = a' + b'Y$$
8.8.1 Single Variable Linear Regression Lines
A single-variable (simple) linear regression line is used when only one independent variable is present.
8.8.2 Calculation of Regression Lines
Formulas:
$$b_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$a = \bar{Y} - b_{YX} \bar{X}$$
Similarly for $b_{XY}$ and $a'$.
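As an illustration, the formulas above can be applied to the five (X, Y) pairs from Section 8.5.2. This is a sketch only; the function name `regression_line` and the prediction point X = 35 are assumptions chosen for the example. Swapping the two arguments gives the line of X on Y.

```python
def regression_line(xs, ys):
    """Return (intercept, slope) of the least-squares line ys = a + b * xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx                 # regression coefficient
    intercept = y_bar - slope * x_bar   # line passes through the point of means
    return intercept, slope


X = [10, 20, 30, 40, 50]
Y = [20, 25, 28, 35, 40]

a, b_yx = regression_line(X, Y)        # Y on X: Y = a + b_yx * X
a_prime, b_xy = regression_line(Y, X)  # X on Y: X = a_prime + b_xy * Y

print(round(a, 2), round(b_yx, 2))           # 14.6 0.5
print(round(a_prime, 2), round(b_xy, 3))     # -28.45 1.975

y_at_35 = a + b_yx * 35                      # predicted Y when X = 35
print(round(y_at_35, 1))                     # 32.1
```

The line of Y on X for these data is therefore Y = 14.6 + 0.5X, and the line of X on Y is approximately X = –28.45 + 1.975Y. Both lines pass through the point of means (30, 29.6).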
8.9 Regression Coefficients
The regression coefficients ($b_{YX}$ and $b_{XY}$) measure the slope of the regression lines.
8.9.1 Properties of Regression Coefficients
- Both have the same sign as the correlation coefficient.
- Product relation: $b_{YX} \cdot b_{XY} = r^2$.
- If r = 0, both coefficients are zero.
- Sensitive to extreme values.
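The product relation follows from the formulas in Sections 8.5 and 8.8.2, taking $b_{XY}$ as the same expression as $b_{YX}$ with the roles of X and Y interchanged. A short derivation:

```latex
\begin{aligned}
b_{YX} \cdot b_{XY}
  &= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}
     \cdot
     \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} \\
  &= \frac{\left[\sum (X - \bar{X})(Y - \bar{Y})\right]^2}
          {\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}
   = r^2
\end{aligned}
```

As a numerical check with the five pairs from Section 8.5.2, the sum of products of deviations is 500 and the sums of squared deviations are 1000 and 253.2. This gives $b_{YX}$ = 0.5 and $b_{XY}$ ≈ 1.9747, whose product is about 0.9874, which matches r² for r ≈ 0.9937. Both numerators are the same sum, which is also why the two coefficients share the sign of r.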
8.10 Let Us Sum Up
- Correlation measures the degree and direction of association between two variables.
- Karl Pearson’s coefficient is used for quantitative linear relationships; Spearman’s is used for rank data.
- Regression helps in prediction using a mathematical equation.
- Regression coefficients indicate the slope and are related to the correlation coefficient.
8.11 Key Words
- Correlation – Degree of association between two variables.
- Scatter Diagram – Graphical representation of paired data.
- Karl Pearson’s r – Numerical measure of linear correlation.
- Spearman’s ρ – Rank-based correlation measure.
- Regression Line – Best-fit line used for prediction.
- Regression Coefficient – Slope indicating rate of change.