Non-Parametric Statistical Tests
Introduction
Parametric tests such as the t-test and ANOVA rely on assumptions like normality, equal variances, and interval-scale data. When these assumptions are violated, non-parametric tests provide robust alternatives. The following are the most commonly used non-parametric tests:
1. Wilcoxon Signed-Rank Test
Purpose: The Wilcoxon signed-rank test is a non-parametric alternative to the paired t-test. It is used when comparing two related samples or paired observations, especially when the data are not normally distributed.
Typical Use Cases:
- Before–after measurements on the same subjects
- Pre-treatment vs post-treatment data
- Matched or paired samples
Hypotheses:
- Null hypothesis (H₀): The median of the differences between pairs is zero.
- Alternative hypothesis (H₁): The median of the differences is not zero.
Mathematical Idea: Let the paired observations be:
$$ (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) $$
Compute the differences:
$$ d_i = x_i - y_i $$
Steps:
- Remove pairs where \( d_i = 0 \)
- Rank the absolute differences \( |d_i| \)
- Assign signs (+/–) based on direction
- Sum the signed ranks
The test statistic is:
$$ W = \sum \text{signed ranks} $$
For large samples, \(W\) is approximately normally distributed.
When to use:
- ✔ Paired data
- ✔ Non-normal distribution
- ✔ Ordinal or continuous data
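As a quick illustration, here is a minimal sketch in Python (assuming SciPy is available; the before/after values are hypothetical):

```python
from scipy import stats

# Hypothetical before/after measurements on the same eight subjects.
before = [125, 130, 118, 140, 135, 128, 132, 126]
after = [120, 128, 119, 132, 130, 125, 127, 124]

# Wilcoxon signed-rank test on the paired differences;
# pairs with a zero difference are dropped by default.
w_stat, p_value = stats.wilcoxon(before, after)
print(f"W = {w_stat}, p = {p_value:.4f}")
```

A small p-value here would suggest the median of the paired differences is not zero.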
2. Mann–Whitney U Test (Wilcoxon Rank-Sum Test)
Purpose: The Mann–Whitney U test compares two independent groups when the assumption of normality is violated.
It is the non-parametric alternative to the independent t-test.
Hypotheses:
- H₀: The two groups come from the same distribution
- H₁: One group tends to have larger values than the other
Conceptual Idea: Instead of comparing means, the test:
- Ranks all observations together
- Compares the sum of ranks between groups
Test Statistic: Let:
- \( n_1, n_2 \) = sample sizes
- \( R_1 \) = sum of ranks for group 1
Then:
$$ U_1 = R_1 - \frac{n_1(n_1 + 1)}{2} $$
and the test statistic is \( U = \min(U_1, U_2) \), where \( U_2 = n_1 n_2 - U_1 \).
When to use:
- ✔ Two independent samples
- ✔ Ordinal or continuous data
- ✔ Non-normal distributions
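A minimal sketch of the same test in Python (SciPy assumed; the two groups below are hypothetical):

```python
from scipy import stats

# Hypothetical scores from two independent groups.
group_a = [14, 18, 22, 25, 17, 20, 16]
group_b = [24, 28, 21, 30, 27, 26]

# Two-sided Mann-Whitney U test; no normality assumption is needed.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```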
3. Kruskal–Wallis Test
Purpose: The Kruskal–Wallis test extends the Mann–Whitney test to three or more independent groups.
It is the non-parametric alternative to one-way ANOVA.
Hypotheses:
- H₀: All groups come from the same distribution
- H₁: At least one group differs
Mathematical Idea: All observations are ranked together. Let:
- \(R_i\) = sum of ranks in group \(i\)
- \(n_i\) = size of group \(i\)
- \(N\) = total sample size
The test statistic is:
$$ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) $$
Under \(H_0\), \(H\) approximately follows a \(\chi^2\) distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of groups.
When to Use:
- ✔ More than two independent groups
- ✔ Non-normal distributions
- ✔ Ordinal or continuous data
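A minimal sketch with three hypothetical groups (SciPy assumed):

```python
from scipy import stats

# Hypothetical measurements from three independent groups.
g1 = [7.1, 6.8, 7.5, 8.0, 6.9]
g2 = [8.2, 8.5, 7.9, 9.1, 8.8]
g3 = [6.0, 6.4, 5.9, 6.7, 6.2]

# Kruskal-Wallis H test across all groups at once; a significant
# result says at least one group differs, not which one.
h_stat, p_value = stats.kruskal(g1, g2, g3)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

A significant result is usually followed by pairwise post-hoc comparisons (for example, Mann–Whitney tests with a multiplicity correction).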
4. Spearman Rank Correlation (ρ)
Purpose: Measures the strength and direction of a monotonic relationship between two variables. Unlike Pearson correlation, Spearman does not assume linearity or normality.
How It Works:
- Convert raw data into ranks
- Compute correlation between ranks
Mathematical Idea:
$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$
Where:
- \( d_i \) = difference between ranks
- \( n \) = number of observations
Interpretation
| ρ Value | Interpretation |
|---|---|
| +1 | Perfect positive monotonic relationship |
| 0 | No monotonic relationship |
| −1 | Perfect negative monotonic relationship |
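A minimal sketch (SciPy assumed; x and y are hypothetical paired observations):

```python
from scipy import stats

# Hypothetical paired observations, e.g., study hours vs. exam score.
x = [2, 4, 5, 7, 8, 10, 12]
y = [30, 42, 45, 60, 58, 71, 80]

# Spearman's rho is the Pearson correlation computed on the ranks.
rho, p_value = stats.spearmanr(x, y)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```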
5. Chi-Square Test (\(\chi^2\) Test)
The Chi-Square (\(\chi^2\)) test is a non-parametric statistical test used to examine whether there is a significant association between categorical variables. Unlike t-tests or ANOVA, it does not compare means; instead, it compares observed frequencies with expected frequencies.
- When to Use the Chi-Square Test: Use the Chi-square test when:
- The data are categorical (nominal or ordinal)
- You want to test association or independence
- Observations are independent
- Expected frequencies in each cell are sufficiently large (usually ≥ 5)
- Types of Chi-Square Tests:
- 1️⃣ Chi-Square Test of Independence: Used to determine whether two categorical variables are related.
Example:
- Gender (Male / Female)
- Product preference (A / B / C)
- 2️⃣ Chi-Square Goodness-of-Fit Test: Used to test whether observed data follows a theoretical distribution.
Example:
- Is a die fair?
- Do observed frequencies match expected proportions?
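As a quick illustration of the goodness-of-fit test just described, here is a minimal sketch with hypothetical counts from 60 rolls of a die (SciPy assumed):

```python
from scipy import stats

# Hypothetical counts for faces 1-6 from 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]

# With no expected frequencies given, scipy.stats.chisquare assumes
# a uniform distribution: 60 / 6 = 10 expected per face.
chi2, p_value = stats.chisquare(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```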
- Hypotheses for the Chi-Square Test of Independence:
- Null hypothesis (H₀): The two categorical variables are independent.
- Alternative hypothesis (H₁): The variables are dependent (associated).
- Test Statistic (Mathematical Formulation): The chi-square statistic is defined as:
$$
\chi^2 =\sum \frac{(O_i - E_i)^2}{E_i}
$$
where:
- \(O_i\) = observed frequency
- \(E_i\) = expected frequency
- The sum is taken over all cells in the contingency table
Expected Frequency Formula:
$$ E_{ij} = \frac{(\text{Row Total}_i)(\text{Column Total}_j)}{\text{Grand Total}} $$
- Degrees of Freedom:
$$ df = (r - 1)(c - 1) $$
Where:
- \(r\) = number of rows
- \(c\) = number of columns
- Decision Rule:
- Compute the chi-square statistic
- Find the critical value from the \(\chi^2\) distribution table (or \(p\)-value)
- If:
$$\chi_{\text{calculated}}^2 > \chi_{\text{critical}}^2$$
or
$$p < \alpha$$
→ Reject the null hypothesis
- Example: Chi-Square Test of Independence
Suppose 100 people are cross-classified by gender and drink preference:

| | Tea | Coffee | Total |
|---|---|---|---|
| Male | 30 | 20 | 50 |
| Female | 20 | 30 | 50 |
| Total | 50 | 50 | 100 |
- Step 1: Compute Expected Frequencies
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
For males who prefer tea:
$$ E = \frac{50 \times 50}{100} = 25 $$
By symmetry, every cell in this table has an expected frequency of 25.
- Step 2: Compute \(\chi^2\)
$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(30-25)^2}{25} = 4 $$
- Step 3: Degrees of Freedom
$$ df = (2 - 1)(2 - 1) = 1 $$
- Step 4: Decision
If \(\chi^2 > \chi^2_{0.05}(1) = 3.84\), reject \(H_0\). Here \(4 > 3.84\), so we reject \(H_0\).
- Interpretation:
- Significant result → variables are associated
- Non-significant result → no evidence of association
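The same calculation can be checked in Python with scipy.stats.chi2_contingency (a sketch; the continuity correction is disabled so the output matches the hand computation above):

```python
from scipy import stats

# Observed 2x2 table from the example: rows = gender, columns = drink.
observed = [[30, 20],
            [20, 30]]

# correction=False disables Yates' continuity correction so the result
# matches the hand-computed chi-square of 4.0 on 1 degree of freedom.
chi2, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
print("Expected counts:\n", expected)
```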
- When NOT to Use Chi-Square:
- Expected cell counts < 5 (use Fisher’s Exact Test)
- Continuous data
- Very small sample sizes
Final Note: The Chi-square test is one of the most widely used tools in statistics for analyzing categorical data. It is simple, robust, and powerful when used correctly—but always ensure its assumptions are met before applying it.
Advantages of Non-Parametric Methods
Non-parametric methods offer several benefits:
- Flexibility: They handle non-normal, ordinal, or categorical data.
- Small Samples: They work well with limited data.
- No Normality Assumption: They don’t require data to follow a specific distribution.
- Outlier Robustness: Non-parametric tests are less sensitive to outliers.
For instance, in medical research, patient recovery times may be skewed; non-parametric tests analyze such data reliably.
Limitations of Non-Parametric Tests
Despite their strengths, non-parametric tests have drawbacks:
- Lower Power: They’re less sensitive than parametric tests when assumptions are met.
- Complex Interpretation: Results based on ranks can be harder to interpret.
- Limited Scope: They’re not suitable for all statistical analyses, like regression.
Researchers must weigh these factors when choosing non-parametric methods.
Non-Parametric Statistics in Research
Non-parametric statistics are widely used in research. They’re common in:
- Social Sciences: Analyzing survey data with ordinal scales, like Likert scores.
- Medical Studies: Studying non-normal data, like recovery times or symptom severity.
- Business: Comparing customer preferences or product rankings.
- Environmental Science: Analyzing skewed data, like pollution levels.
Their flexibility makes them invaluable across fields.
Non-Parametric Methods and Sample Size
Sample size impacts both parametric and non-parametric tests. Parametric tests typically need larger samples so that the central limit theorem can justify the normality assumption. Non-parametric tests work with smaller samples, often n < 30, which makes them ideal for pilot studies or limited datasets.
For example, a study with 15 participants may use a Mann-Whitney U test. A t-test would be less reliable due to the small sample.
Summary table
| Test | Purpose | Parametric Equivalent | Data Type |
|---|---|---|---|
| Wilcoxon Signed-Rank | Paired samples | Paired t-test | Ordinal / Continuous |
| Mann–Whitney U | Two independent samples | Independent t-test | Ordinal / Continuous |
| Kruskal–Wallis | ≥ 3 independent groups | One-way ANOVA | Ordinal / Continuous |
| Spearman Correlation | Association between variables | Pearson correlation | Ordinal / Continuous |
| Chi-Square | Association between categorical variables / goodness of fit | — | Categorical (nominal) |
Key takeaway
When data violate normality assumptions, non-parametric tests provide reliable, distribution-free alternatives to classical parametric tests, at the cost of somewhat reduced statistical power.