MEV 019: Unit 08 - Descriptive Statistics-II

 UNIT 8: DESCRIPTIVE STATISTICS – II


8.1 Introduction

While Unit 7 focused on summarizing data through measures of central tendency and dispersion, this unit moves toward understanding relationships between variables.
In environmental science, for example, we might study whether air temperature is related to humidity, or whether pollution levels are linked to respiratory diseases.

Two main statistical tools help us in this regard:

  1. Correlation – Measures the strength and direction of association between two variables.
  2. Regression – Provides a mathematical equation to predict one variable based on another.

8.2 Objectives

After studying this unit, you will be able to:

  • Explain the meaning and types of correlation.
  • Draw and interpret a scatter diagram.
  • Calculate Karl Pearson’s correlation coefficient.
  • Compute Spearman’s rank correlation coefficient.
  • Understand regression and regression lines.
  • Calculate regression equations for predicting values.
  • Explain the properties of correlation and regression coefficients.

8.3 Correlation Analysis

Correlation indicates the degree to which two variables move together.

8.3.1 Types of Correlation

  1. Positive Correlation – Both variables move in the same direction.
    Example: Temperature and ice cream sales.
  2. Negative Correlation – Variables move in opposite directions.
    Example: Rainfall and outdoor cricket matches played.
  3. Zero Correlation – No linear relationship between the variables.
  4. Perfect Correlation – Correlation coefficient is exactly +1 or –1.

8.3.2 Measures of Correlation

  • Scatter Diagram – A graphical method.
  • Karl Pearson’s Correlation Coefficient – Measures linear correlation numerically.
  • Spearman’s Rank Correlation – Based on ranks rather than actual values.

8.4 Scatter Diagram

A scatter diagram is a plot of paired values (X, Y) on a Cartesian plane.

  • Upward slope: Positive correlation
  • Downward slope: Negative correlation
  • No clear slope: Zero correlation

Example: Plotting PM₂.₅ concentration against daily hospital visits.


8.5 Karl Pearson’s Correlation Coefficient

Karl Pearson’s method gives a numerical value (r) ranging between –1 and +1.

Formula:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}}$$


8.5.1 Properties of Correlation Coefficient

  • Range: –1 ≤ r ≤ +1
  • Sign: Indicates direction (positive or negative)
  • Magnitude: Indicates strength
  • Unit-free: Independent of measurement units
  • Symmetry: $r_{XY} = r_{YX}$
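
The unit-free and symmetry properties listed above can be checked numerically. The sketch below uses plain Python; the function name `pearson_r` and the temperature and humidity readings are invented here purely for illustration and do not come from the unit.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical readings, invented for illustration only
temp_c = [18.0, 21.0, 24.0, 27.0, 30.0]     # air temperature in deg C
humidity = [70.0, 64.0, 66.0, 55.0, 50.0]   # relative humidity in %

# The same temperatures expressed in deg F (a change of units)
temp_f = [c * 9 / 5 + 32 for c in temp_c]

r_c = pearson_r(temp_c, humidity)
r_f = pearson_r(temp_f, humidity)         # unit-free: should match r_c
r_swapped = pearson_r(humidity, temp_c)   # symmetry: should match r_c

print(r_c, r_f, r_swapped)
```

All three values agree up to floating-point rounding, and the shared value is negative because humidity falls as temperature rises in this made-up sample.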

8.5.2 Calculation of Correlation Coefficient

Example:

   X     Y
  10    20
  20    25
  30    28
  40    35
  50    40

Step 1: Find the deviations of each X and Y value from its mean.
Step 2: Apply the formula.
Step 3: Interpret r.
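
The three steps can be traced in code for the five (X, Y) pairs of this example. This is a minimal sketch in plain Python; the function name `pearson_r` is chosen here for illustration and is not part of the unit.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 1: deviations from the means
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Step 2: apply the formula
    s_xy = sum(a * b for a, b in zip(dx, dy))   # sum of products of deviations
    s_xx = sum(a * a for a in dx)               # sum of squared X deviations
    s_yy = sum(b * b for b in dy)               # sum of squared Y deviations
    return s_xy / math.sqrt(s_xx * s_yy)

X = [10, 20, 30, 40, 50]
Y = [20, 25, 28, 35, 40]

r = pearson_r(X, Y)
print(round(r, 4))  # 0.9937
```

Here the means are 30 and 29.6, the sum of products of deviations is 500, and the sums of squared deviations are 1000 and 253.2. Step 3 then reads r ≈ 0.99 as a very strong positive linear correlation.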


8.6 Spearman’s Rank Correlation Coefficient

Spearman’s method is useful for ordinal data, that is, when observations are given as ranks or when a qualitative attribute can only be placed in order rather than measured.

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

Where:

  • $d$ = difference between paired ranks
  • $n$ = number of observations

8.6.1 Method of Calculation of Rank Correlation

  1. Rank the data for X and Y separately.
  2. Compute $d = \text{Rank}(X) - \text{Rank}(Y)$.
  3. Square each $d$ and sum them.
  4. Apply the formula for $\rho$.
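
The four steps above can be sketched in plain Python as follows. The helper names `ranks` and `spearman_rho` are chosen here for illustration, the sketch assumes there are no tied values (the simple formula is exact only in that case), and the two judges' rankings are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rank correlation via the sum of squared rank differences."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)                          # step 1
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # steps 2-3
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))                  # step 4

# The five (X, Y) pairs from Section 8.5.2: both series rise together
X = [10, 20, 30, 40, 50]
Y = [20, 25, 28, 35, 40]
rho_unit = spearman_rho(X, Y)
print(rho_unit)  # 1.0

# Hypothetical rankings of five sites by two judges
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho_judges = spearman_rho(judge_a, judge_b)
print(rho_judges)  # 0.8
```

In the second case the rank differences are -1, 1, -1, 1, 0, so the sum of squared differences is 4 and ρ = 1 − 24/120 = 0.8.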

8.7 Concept of Regression

Regression estimates the value of one variable based on the value of another.

  • Dependent Variable (Y) – The variable to be predicted.
  • Independent Variable (X) – The variable used for prediction.

8.8 Lines of Regression

Two lines of regression exist:

  1. Regression of Y on X: Predicts Y from X.

$$Y = a + bX$$

  2. Regression of X on Y: Predicts X from Y.

$$X = a' + b'Y$$

8.8.1 Single Variable Linear Regression Lines

A single-variable (simple) linear regression line is used when only one independent variable is present.


8.8.2 Calculation of Regression Lines

Formulas:

$$b_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

$$a = \bar{Y} - b_{YX} \bar{X}$$

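Similarly, interchanging the roles of X and Y gives $b_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$ and $a' = \bar{X} - b_{XY} \bar{Y}$. Here $b_{YX}$ and $b_{XY}$ are the slopes written as $b$ and $b'$ in Section 8.8.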
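
As an illustration, both regression lines can be computed for the five (X, Y) pairs from Section 8.5.2. This is a minimal sketch in plain Python; the function name `regression_line` and the prediction point X = 35 are choices made here, not part of the unit.

```python
def regression_line(xs, ys):
    """Return (intercept, slope) of the regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx                    # regression coefficient
    intercept = mean_y - slope * mean_x    # line passes through the means
    return intercept, slope

X = [10, 20, 30, 40, 50]
Y = [20, 25, 28, 35, 40]

a, b_yx = regression_line(X, Y)         # regression of Y on X
a_prime, b_xy = regression_line(Y, X)   # regression of X on Y

# Predict Y for X = 35, a value inside the observed range of X
y_at_35 = a + b_yx * 35

print(round(a, 2), round(b_yx, 2))          # 14.6 0.5
print(round(a_prime, 2), round(b_xy, 4))    # -28.45 1.9747
print(round(y_at_35, 2))                    # 32.1
```

The two lines are therefore approximately Y = 14.6 + 0.5X and X = −28.45 + 1.9747Y. They are different lines, so the Y-on-X line should be used to predict Y and the X-on-Y line to predict X.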


8.9 Regression Coefficients

The regression coefficients ($b_{YX}$ and $b_{XY}$) measure the slope of the regression lines.

8.9.1 Properties of Regression Coefficients

  • Both have the same sign as the correlation coefficient.
  • Product relation: $b_{YX} \cdot b_{XY} = r^2$.
  • If r = 0, both coefficients are zero.
  • Sensitive to extreme values.
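
The first three properties follow directly from the formulas in Sections 8.5 and 8.8.2. As a short derivation, using the shorthand $S_{XY}$, $S_{XX}$ and $S_{YY}$ (introduced here for brevity, not used elsewhere in the unit) for the sums of products and squares of deviations:

```latex
S_{XY} = \sum (X - \bar{X})(Y - \bar{Y}), \quad
S_{XX} = \sum (X - \bar{X})^2, \quad
S_{YY} = \sum (Y - \bar{Y})^2

b_{YX} \cdot b_{XY}
  = \frac{S_{XY}}{S_{XX}} \cdot \frac{S_{XY}}{S_{YY}}
  = \frac{S_{XY}^2}{S_{XX} \, S_{YY}}
  = \left( \frac{S_{XY}}{\sqrt{S_{XX} \, S_{YY}}} \right)^2
  = r^2
```

Because $S_{XX}$ and $S_{YY}$ are positive, both coefficients and $r$ take the sign of $S_{XY}$, and all three are zero when $S_{XY} = 0$. For the data in Section 8.5.2, $S_{XY} = 500$, $S_{XX} = 1000$ and $S_{YY} = 253.2$, so $b_{YX} = 0.5$, $b_{XY} \approx 1.9747$, and their product is about 0.9874, which matches $r^2$ for $r \approx 0.9937$.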

8.10 Let Us Sum Up

  • Correlation measures the degree and direction of association between two variables.
  • Karl Pearson’s coefficient is used for quantitative linear relationships; Spearman’s is used for rank data.
  • Regression helps in prediction using a mathematical equation.
  • Regression coefficients indicate the slope and are related to the correlation coefficient.

8.11 Key Words

  • Correlation – Degree of association between two variables.
  • Scatter Diagram – Graphical representation of paired data.
  • Karl Pearson’s r – Numerical measure of linear correlation.
  • Spearman’s ρ – Rank-based correlation measure.
  • Regression Line – Best-fit line used for prediction.
  • Regression Coefficient – Slope indicating rate of change.

 
