Non-Parametric Statistical Tests
Introduction
Parametric tests such as the t-test and ANOVA rely on assumptions like normality, equal variances, and interval-scale data. When these assumptions are violated, non-parametric tests provide robust alternatives. The following are the most commonly used non-parametric tests:
1. Wilcoxon Signed-Rank Test
Purpose: The Wilcoxon signed-rank test is a non-parametric alternative to the paired t-test. It is used when comparing two related samples or paired observations, especially when the data are not normally distributed.
Typical Use Cases:
- Before–after measurements on the same subjects
- Pre-treatment vs post-treatment data
- Matched or paired samples
Hypotheses:
- Null hypothesis (H₀): The median of the differences between pairs is zero.
- Alternative hypothesis (H₁): The median of the differences is not zero.
Mathematical Idea: Let the paired observations be:
$$ (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) $$
Compute the differences:
$$ d_i = x_i - y_i $$
Steps:
- Remove pairs where \( d_i = 0 \)
- Rank the absolute differences \( |d_i| \)
- Assign signs (+/–) based on direction
- Sum the signed ranks
The test statistic is:
$$ W = \sum \text{signed ranks} $$
For large samples, \(W\) is approximately normally distributed.
When to use:
- ✔ Paired data
- ✔ Non-normal distribution
- ✔ Ordinal or continuous data
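As a quick illustration, here is a minimal sketch in Python (assuming SciPy is available; the before/after values are hypothetical):

```python
from scipy import stats

# Hypothetical before/after measurements on the same eight subjects.
before = [125, 130, 118, 140, 135, 128, 132, 126]
after = [120, 128, 119, 132, 130, 125, 127, 124]

# Wilcoxon signed-rank test on the paired differences;
# pairs with a zero difference are dropped by default.
w_stat, p_value = stats.wilcoxon(before, after)
print(f"W = {w_stat}, p = {p_value:.4f}")
```

A small p-value here would suggest the median of the paired differences is not zero.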
2. Mann–Whitney U Test (Wilcoxon Rank-Sum Test)
Purpose: The Mann–Whitney U test compares two independent groups when the assumption of normality is violated.
It is the non-parametric alternative to the independent t-test.
Hypotheses:
- H₀: The two groups come from the same distribution
- H₁: One group tends to have larger values than the other
Conceptual Idea: Instead of comparing means, the test:
- Ranks all observations together
- Compares the sum of ranks between groups
Test Statistic: Let:
- \( n_1, n_2 \) = sample sizes
- \( R_1 \) = sum of ranks for group 1
Then:
$$ U_1 = R_1 - \frac{n_1(n_1 + 1)}{2} $$
and the test statistic is \( U = \min(U_1, U_2) \), where \( U_2 = n_1 n_2 - U_1 \).
When to use:
- ✔ Two independent samples
- ✔ Ordinal or continuous data
- ✔ Non-normal distributions
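A minimal sketch of the same test in Python (SciPy assumed; the two groups below are hypothetical):

```python
from scipy import stats

# Hypothetical scores from two independent groups.
group_a = [14, 18, 22, 25, 17, 20, 16]
group_b = [24, 28, 21, 30, 27, 26]

# Two-sided Mann-Whitney U test; no normality assumption is needed.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```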
3. Kruskal–Wallis Test
Purpose: The Kruskal–Wallis test extends the Mann–Whitney test to three or more independent groups.
It is the non-parametric alternative to one-way ANOVA.
Hypotheses:
- H₀: All groups come from the same distribution
- H₁: At least one group differs
Mathematical Idea: All observations are ranked together. Let:
- \(R_i\) = sum of ranks in group \(i\)
- \(n_i\) = size of group \(i\)
- \(N\) = total sample size
The test statistic is:
$$ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) $$
Under \(H_0\), \(H\) approximately follows a \(\chi^2\) distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of groups.
When to Use:
- ✔ More than two independent groups
- ✔ Non-normal distributions
- ✔ Ordinal or continuous data
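A minimal sketch with three hypothetical groups (SciPy assumed):

```python
from scipy import stats

# Hypothetical measurements from three independent groups.
g1 = [7.1, 6.8, 7.5, 8.0, 6.9]
g2 = [8.2, 8.5, 7.9, 9.1, 8.8]
g3 = [6.0, 6.4, 5.9, 6.7, 6.2]

# Kruskal-Wallis H test across all groups at once; a significant
# result says at least one group differs, not which one.
h_stat, p_value = stats.kruskal(g1, g2, g3)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

A significant result is usually followed by pairwise post-hoc comparisons (for example, Mann–Whitney tests with a multiplicity correction).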
4. Spearman Rank Correlation (ρ)
Purpose: Measures the strength and direction of a monotonic relationship between two variables. Unlike Pearson correlation, Spearman does not assume linearity or normality.
How It Works:
- Convert raw data into ranks
- Compute correlation between ranks
Mathematical Idea:
$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$
Where:
- \( d_i \) = difference between ranks
- \( n \) = number of observations
Interpretation
| ρ Value | Interpretation |
|---|---|
| +1 | Perfect positive monotonic relationship |
| 0 | No monotonic relationship |
| −1 | Perfect negative monotonic relationship |
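A minimal sketch (SciPy assumed; x and y are hypothetical paired observations):

```python
from scipy import stats

# Hypothetical paired observations, e.g., study hours vs. exam score.
x = [2, 4, 5, 7, 8, 10, 12]
y = [30, 42, 45, 60, 58, 71, 80]

# Spearman's rho is the Pearson correlation computed on the ranks.
rho, p_value = stats.spearmanr(x, y)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```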
5. Chi-Square Test (\(\chi^2\) Test)
The Chi-Square (\(\chi^2\)) test is a non-parametric statistical test used to examine whether there is a significant association between categorical variables. Unlike t-tests or ANOVA, it does not compare means; instead, it compares observed frequencies with expected frequencies.
- When to Use the Chi-Square Test: Use the Chi-square test when:
- The data are categorical (nominal or ordinal)
- You want to test association or independence
- Observations are independent
- Expected frequencies in each cell are sufficiently large (usually ≥ 5)
- Types of Chi-Square Tests:
- 1️⃣ Chi-Square Test of Independence: Used to determine whether two categorical variables are related.
Example:
- Gender (Male / Female)
- Product preference (A / B / C)
- 2️⃣ Chi-Square Goodness-of-Fit Test: Used to test whether observed data follows a theoretical distribution.
Example:
- Is a die fair?
- Do observed frequencies match expected proportions?
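As a quick illustration of the goodness-of-fit test just described, here is a minimal sketch with hypothetical counts from 60 rolls of a die (SciPy assumed):

```python
from scipy import stats

# Hypothetical counts for faces 1-6 from 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]

# With no expected frequencies given, scipy.stats.chisquare assumes
# a uniform distribution: 60 / 6 = 10 expected per face.
chi2, p_value = stats.chisquare(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```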
- Hypotheses for the Chi-Square Test of Independence:
- Null hypothesis (H₀): The two categorical variables are independent.
- Alternative hypothesis (H₁): The variables are dependent (associated).
- Test Statistic (Mathematical Formulation): The chi-square statistic is defined as:
$$
\chi^2 =\sum \frac{(O_i - E_i)^2}{E_i}
$$
where:
- \(O_i\) = observed frequency
- \(E_i\) = expected frequency
- The sum is taken over all cells in the contingency table
Expected Frequency Formula:
$$ E_{ij} = \frac{(\text{Row Total}_i)(\text{Column Total}_j)}{\text{Grand Total}} $$
- Degrees of Freedom:
$$ df = (r - 1)(c - 1) $$
Where:
- \(r\) = number of rows
- \(c\) = number of columns
- Decision Rule:
- Compute the chi-square statistic
- Find the critical value from the \(\chi^2\) distribution table (or \(p\)-value)
- If:
$$\chi_{\text{calculated}}^2 > \chi_{\text{critical}}^2$$
or
$$p < \alpha$$
→ Reject the null hypothesis
- Example: Chi-Square Test of Independence
Suppose 100 people are cross-classified by gender and drink preference:

| | Tea | Coffee | Total |
|---|---|---|---|
| Male | 30 | 20 | 50 |
| Female | 20 | 30 | 50 |
| Total | 50 | 50 | 100 |
- Step 1: Compute Expected Frequencies
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
For males who prefer tea:
$$ E = \frac{50 \times 50}{100} = 25 $$
By symmetry, every cell in this table has an expected frequency of 25.
- Step 2: Compute \(\chi^2\)
$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(30-25)^2}{25} = 4 $$
- Step 3: Degrees of Freedom
$$ df = (2 - 1)(2 - 1) = 1 $$
- Step 4: Decision
If \(\chi^2 > \chi^2_{0.05}(1) = 3.84\), reject \(H_0\). Here \(4 > 3.84\), so we reject \(H_0\).
- Interpretation:
- Significant result → variables are associated
- Non-significant result → no evidence of association
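The same calculation can be checked in Python with scipy.stats.chi2_contingency (a sketch; the continuity correction is disabled so the output matches the hand computation above):

```python
from scipy import stats

# Observed 2x2 table from the example: rows = gender, columns = drink.
observed = [[30, 20],
            [20, 30]]

# correction=False disables Yates' continuity correction so the result
# matches the hand-computed chi-square of 4.0 on 1 degree of freedom.
chi2, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
print("Expected counts:\n", expected)
```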
- When NOT to Use Chi-Square:
- Expected cell counts < 5 (use Fisher’s Exact Test)
- Continuous data
- Very small sample sizes
Final Note: The Chi-square test is one of the most widely used tools in statistics for analyzing categorical data. It is simple, robust, and powerful when used correctly—but always ensure its assumptions are met before applying it.
Advantages of Non-Parametric Methods
Non-parametric methods offer several benefits:
- Flexibility: They handle non-normal, ordinal, or categorical data.
- Small Samples: They work well with limited data.
- No Normality Assumption: They don’t require data to follow a specific distribution.
- Outlier Robustness: Non-parametric tests are less sensitive to outliers.
For instance, in medical research, patient recovery times may be skewed; non-parametric tests analyze such data reliably.
Limitations of Non-Parametric Tests
Despite their strengths, non-parametric tests have drawbacks:
- Lower Power: They’re less sensitive than parametric tests when assumptions are met.
- Complex Interpretation: Results based on ranks can be harder to interpret.
- Limited Scope: They’re not suitable for all statistical analyses, like regression.
Researchers must weigh these factors when choosing non-parametric methods.
Non-Parametric Statistics in Research
Non-parametric statistics are widely used in research. They’re common in:
- Social Sciences: Analyzing survey data with ordinal scales, like Likert scores.
- Medical Studies: Studying non-normal data, like recovery times or symptom severity.
- Business: Comparing customer preferences or product rankings.
- Environmental Science: Analyzing skewed data, like pollution levels.
Their flexibility makes them invaluable across fields.
Non-Parametric Methods and Sample Size
Sample size impacts both parametric and non-parametric tests. Parametric tests typically need larger samples so that the central limit theorem can justify the normality assumption. Non-parametric tests work with smaller samples, often n < 30, which makes them ideal for pilot studies or limited datasets.
For example, a study with 15 participants may use a Mann-Whitney U test. A t-test would be less reliable due to the small sample.
Summary table
| Test | Purpose | Parametric Equivalent | Data Type |
|---|---|---|---|
| Wilcoxon Signed-Rank | Paired samples | Paired t-test | Ordinal / Continuous |
| Mann–Whitney U | Two independent samples | Independent t-test | Ordinal / Continuous |
| Kruskal–Wallis | ≥ 3 independent groups | One-way ANOVA | Ordinal / Continuous |
| Spearman Correlation | Association between variables | Pearson correlation | Ordinal / Continuous |
| Chi-Square | Association between categorical variables / goodness of fit | — | Categorical (nominal) |
Key takeaway
When data violate normality assumptions, non-parametric tests provide reliable, distribution-free alternatives to classical parametric tests, at the cost of somewhat reduced statistical power.