Definition of Hypothesis Testing
Hypothesis testing is a formal process of statistical analysis using inferential statistics. The goal of hypothesis testing is to compare populations or assess relationships between variables using samples. Hypotheses, or predictions, are tested using statistical tests, which also estimate sampling error so that valid inferences can be made. Statistical tests can be either parametric or non-parametric.
1. Parametric tests
Parametric tests are considered more statistically powerful because they are more likely to detect an effect if one exists. Parametric tests make assumptions that include the following:
- the population that the sample comes from follows a normal distribution of scores
- the sample size is large enough to represent the population
- the variances, a measure of variability, of each group being compared are similar
Examples: t-test, ANOVA, regression analysis, Pearson correlation coefficient.
A short description of each of these examples (a code sketch follows the list):
- t-test: Used to compare the means of two groups when the data are normally distributed and the variances of the two groups are equal.
- ANOVA: Used to compare the means of three or more groups when the data are normally distributed and the variances of the groups are equal.
- Regression analysis: Used to model the relationship between two or more variables when the data are normally distributed and the assumptions of the model are met.
- Pearson correlation coefficient: Used to measure the strength and direction of the linear relationship between two continuous variables when the data are normally distributed.
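To make these concrete, here is a minimal sketch that runs each of the four parametric tests with Python's scipy.stats on simulated normal data; all group means, spreads, and sample sizes are illustrative assumptions, not values from this text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated, normally distributed scores for three groups (values are illustrative)
g1 = rng.normal(loc=50, scale=10, size=100)
g2 = rng.normal(loc=53, scale=10, size=100)
g3 = rng.normal(loc=55, scale=10, size=100)

# t-test: compare the means of two groups (equal variances assumed)
t_stat, p = stats.ttest_ind(g1, g2, equal_var=True)
print(f"t-test:     t = {t_stat:.3f}, p = {p:.4f}")

# ANOVA: compare the means of three or more groups
f_stat, p = stats.f_oneway(g1, g2, g3)
print(f"ANOVA:      F = {f_stat:.3f}, p = {p:.4f}")

# Pearson correlation: linear association between two continuous variables
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
r, p = stats.pearsonr(x, y)
print(f"Pearson:    r = {r:.3f}, p = {p:.4f}")

# Simple linear regression: model y as a linear function of x
res = stats.linregress(x, y)
print(f"Regression: slope = {res.slope:.3f}, p = {res.pvalue:.4g}")
```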
2. Non-parametric tests
When your data violates any of these assumptions, non-parametric tests are more suitable. Non-parametric tests are called “distribution-free tests” because they don’t assume anything about the distribution of the population data. Examples (a code sketch follows the list): Wilcoxon signed-rank test, Mann-Whitney U test, Kruskal-Wallis test, Spearman correlation coefficient.
- Wilcoxon signed-rank test: Used to compare the medians of two related samples when the data are not normally distributed.
- Mann-Whitney U test: Used to compare the medians of two independent groups when the data are not normally distributed.
- Kruskal-Wallis test: Used to compare the medians of three or more groups when the data are not normally distributed.
- Spearman correlation coefficient: Used to measure the strength and direction of the monotonic relationship between two continuous variables when the data are not normally distributed.
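A matching sketch for the four non-parametric tests, again using scipy.stats; the skewed distributions and sample sizes below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Skewed (non-normal) data, e.g. reaction times; values are illustrative
a = rng.exponential(scale=1.0, size=60)
b = rng.exponential(scale=1.3, size=60)
c = rng.exponential(scale=1.6, size=60)
before, after = a, a + rng.exponential(scale=0.2, size=60)  # paired measurements

# Wilcoxon signed-rank: two related (paired) samples
w, p = stats.wilcoxon(before, after)
print(f"Wilcoxon signed-rank: W = {w:.1f}, p = {p:.4f}")

# Mann-Whitney U: two independent groups
u, p = stats.mannwhitneyu(a, b)
print(f"Mann-Whitney U:       U = {u:.1f}, p = {p:.4f}")

# Kruskal-Wallis: three or more independent groups
h, p = stats.kruskal(a, b, c)
print(f"Kruskal-Wallis:       H = {h:.3f}, p = {p:.4f}")

# Spearman correlation: monotonic association between two variables
rho, p = stats.spearmanr(a, b)
print(f"Spearman:             rho = {rho:.3f}, p = {p:.4f}")
```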
Different forms of statistical tests
1. Comparison tests: Comparison tests assess whether there are differences in the means, medians, or rankings of scores of two or more groups. To decide which test suits your aim, consider whether your data meets the conditions necessary for parametric tests, the number of samples, and the levels of measurement of your variables.
| Comparison test | Parametric? | What’s being compared? | Samples |
|---|---|---|---|
| t-test | Yes | Means | 2 samples |
| ANOVA | Yes | Means | 3+ samples |
| Mood’s median | No | Medians | 2+ samples |
| Wilcoxon signed-rank | No | Distributions | 2 samples |
| Wilcoxon rank-sum (Mann-Whitney U) | No | Sums of rankings | 2 samples |
| Kruskal-Wallis H | No | Mean rankings | 3+ samples |
2. Correlation tests: Correlation tests determine the extent to which two variables are associated. Although Pearson’s r is the most statistically powerful test, Spearman’s r is appropriate for ordinal variables and for interval/ratio variables when the data doesn’t follow a normal distribution. The chi-square test of independence is the only test that can be used with nominal variables (a code sketch follows the table).
| Correlation test | Parametric? | Variables |
|---|---|---|
| Pearson’s r | Yes | Interval/ratio variables |
| Spearman’s r | No | Ordinal/interval/ratio variables |
| Chi square test of independence | No | Nominal/ordinal variables |
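As a minimal sketch, the chi-square test of independence runs on a contingency table of counts; the 2×2 table below is invented for illustration.

```python
from scipy import stats

# Hypothetical 2x2 contingency table: group (rows) vs. preference (columns)
table = [[30, 20],
         [15, 35]]

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}, df = {dof}")
```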
3. Regression tests: Regression tests assess whether changes in predictor variables predict changes in an outcome variable. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes. Most commonly used regression tests are parametric. If your data is not normally distributed, you can perform data transformations: mathematical operations, like taking the square root of each value, that can bring your data closer to a normal distribution. (A code sketch follows the table.)
| Regression test | Predictor | Outcome |
|---|---|---|
| Simple linear regression | 1 interval/ratio variable | 1 interval/ratio variable |
| Multiple linear regression | 2+ interval/ratio variable(s) | 1 interval/ratio variable |
| Logistic regression | 1+ any variable(s) | 1 binary variable |
| Nominal regression | 1+ any variable(s) | 1 nominal variable |
| Ordinal regression | 1+ any variable(s) | 1 ordinal variable |
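The sketch below pairs a simple linear regression with the square-root transformation mentioned above; the data-generating process is an invented example, so treat it as an illustration rather than a prescription.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
# Right-skewed outcome (illustrative); squaring produces the skew
y_skewed = (0.5 * x + rng.exponential(scale=1.0, size=200)) ** 2

y = np.sqrt(y_skewed)          # data transformation: take the square root
res = stats.linregress(x, y)   # simple linear regression on the transformed outcome
print(f"slope = {res.slope:.3f}, intercept = {res.intercept:.3f}, "
      f"R^2 = {res.rvalue**2:.3f}, p = {res.pvalue:.4g}")
```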
Degrees of Freedom: Degrees of freedom, often denoted ν or df, is the number of independent pieces of information used to calculate a statistic. It is calculated as the sample size minus the number of restrictions. For example, a one-sample t-test on n observations has df = n − 1, because estimating the sample mean imposes one restriction.
Step-by-step guide to hypothesis testing
1. Formulate the null and alternative hypotheses
The null hypothesis (denoted H₀) is the hypothesis that there is no significant difference or relationship. The alternative hypothesis (denoted Hₐ) states that a significant difference or relationship exists.
- Example: In a clinical trial, the null hypothesis might state that there is no difference between a new drug and a placebo, while the alternative hypothesis states that the new drug is more effective.
2. Choose a significance level
The significance level (denoted α) represents the probability of rejecting the null hypothesis when it is actually true. A commonly used value is 0.05.
3. Select an appropriate statistical test
The choice of statistical test depends on the type of data, number of groups, and assumptions of the test. For example, a t-test compares two means, while ANOVA compares more than two.
4. Calculate the test statistic
The test statistic measures how far the observed sample result deviates from the null hypothesis.
t-test formula:
$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
where x̄ is the sample mean, μ the population mean, s the sample standard deviation, and n the sample size.
z-test formula:
$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$
The z-test is used when the population standard deviation is known or the sample size is large.
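To make these formulas concrete, the sketch below computes a one-sample t statistic directly from the definition and checks it against scipy.stats.ttest_1samp; the sample and the hypothesized mean μ = 100 are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=102, scale=15, size=40)  # illustrative sample
mu = 100                                         # hypothesized population mean

# t statistic computed directly from the formula t = (x̄ − μ) / (s / √n)
x_bar = sample.mean()
s = sample.std(ddof=1)   # sample standard deviation
n = len(sample)
t_manual = (x_bar - mu) / (s / np.sqrt(n))

# The same statistic via scipy, for comparison
t_scipy, p = stats.ttest_1samp(sample, popmean=mu)
print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, p = {p:.4f}")
```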
5. Calculate the p-value
The p-value represents the probability of observing a test statistic as extreme as the one calculated, assuming the null hypothesis is true.
- A small p-value indicates strong evidence against the null hypothesis.
- If p < α, the null hypothesis is rejected.
- If p ≥ α, the null hypothesis is not rejected.
The p-value depends on the sampling distribution (t, z, chi-square, or F) used in the test.
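A minimal sketch of turning a test statistic into a p-value with the t-distribution's survival function; the statistic and degrees of freedom below are placeholder values.

```python
from scipy import stats

t_stat, df = 2.3, 39  # placeholder test statistic and degrees of freedom

# Two-sided p-value: probability of a result at least this extreme under H0
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

# One-sided p-value (upper tail)
p_one_sided = stats.t.sf(t_stat, df)

print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```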
6. Interpret the results
If the p-value is less than the significance level, the result is statistically significant and the null hypothesis is rejected. Otherwise, there is insufficient evidence to reject the null hypothesis.
A step-by-step example
To illustrate this process, consider the following scenario.
Suppose we want to test whether there is a difference in the average height of men and women. We collect a sample of 100 men and 100 women and measure their heights. The goal is to determine whether the observed difference in sample means is statistically significant.
- Null hypothesis (H₀): μ₁ = μ₂ (There is no difference in height between men and women.)
- Alternative hypothesis (Hₐ): μ₁ ≠ μ₂ (There is a difference in height between men and women.)
- Significance level: α = 0.05
- Statistical test: A two-sample t-test is used to compare the means of the two groups.
- Test statistic:
The test statistic for a pooled two-sample t-test is:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
where x̄₁ and x̄₂ are the sample means, s_p is the pooled standard deviation, and n₁ and n₂ are the sample sizes.
Suppose the sample mean height for men is 175 cm and for women is 162 cm. Let the sample standard deviations be s₁ = 6 cm and s₂ = 5 cm.
The pooled standard deviation is:
$$ s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{30.5} \approx 5.523 $$
Substituting into the test statistic:
t = (175 − 162) / (5.523 × √(1/100 + 1/100)) ≈ 16.64
- P-value:
The p-value represents the probability of observing a t-statistic as extreme as 16.64, assuming the null hypothesis is true. Using a t-distribution with 198 degrees of freedom, the p-value is far less than 0.05.
- Conclusion:
Since the p-value is smaller than the significance level (0.05), we reject the null hypothesis and conclude that there is a statistically significant difference in average height between men and women.
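As a check on the arithmetic, the same pooled two-sample t-test can be reproduced from the summary statistics alone; this sketch assumes scipy is available.

```python
from scipy import stats

# Summary statistics from the example above
res = stats.ttest_ind_from_stats(
    mean1=175, std1=6, nobs1=100,  # men
    mean2=162, std2=5, nobs2=100,  # women
    equal_var=True,                # pooled-variance t-test
)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3g}")
# t ≈ 16.64 on 198 degrees of freedom; p is far below 0.05
```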
Null and Alternative Hypotheses
The null hypothesis (H₀) states that there is no effect or no difference, while the alternative hypothesis (Hₐ) states that a meaningful effect or difference exists.
- The null hypothesis is assumed true unless evidence suggests otherwise.
- The hypotheses must be mutually exclusive and collectively exhaustive.
Example:
- H₀ ⇒ The new drug has no effect on blood pressure.
- Hₐ ⇒ The new drug has a significant effect on blood pressure.
Alternatively, suppose a researcher is interested in whether there is a difference in job satisfaction between men and women. They could formulate the null and alternative hypotheses as follows:
- H₀ ⇒ There is no significant difference in job satisfaction between men and women.
- Hₐ ⇒ There is a significant difference in job satisfaction between men and women.
The null and alternative hypotheses can be one-tailed or two-tailed, depending on the direction of the expected difference or relationship between the variables.
A one-tailed hypothesis predicts the direction of the effect (e.g., the new drug will lower blood pressure), while a two-tailed hypothesis does not predict the direction of the effect (e.g., there is a difference in job satisfaction between men and women).
In summary, formulating the null and alternative hypotheses is a critical step in hypothesis testing, as it defines the research question and the direction of the analysis. The hypotheses must be mutually exclusive and collectively exhaustive, and their formulation depends on the research question and the expected relationship or difference between the variables being studied.
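To sketch the one-tailed/two-tailed distinction in code, the example below reuses the blood-pressure scenario with simulated data; the group means are invented, and the `alternative` keyword assumes scipy ≥ 1.6.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
drug = rng.normal(loc=118, scale=10, size=50)     # simulated blood pressure, new drug
placebo = rng.normal(loc=124, scale=10, size=50)  # simulated blood pressure, placebo

# Two-tailed: is there any difference between the groups?
t2, p2 = stats.ttest_ind(drug, placebo)

# One-tailed: does the drug group have a lower mean than the placebo group?
t1, p1 = stats.ttest_ind(drug, placebo, alternative="less")

print(f"two-tailed p = {p2:.4f}, one-tailed p = {p1:.4f}")
```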
Significance Level (α)
The significance level (α) represents the probability of committing a Type I error — rejecting a true null hypothesis.
- Common values: 0.05 or 0.01
- Lower α reduces false positives but may increase false negatives
- Choice depends on context, risk, and required confidence
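A small simulation can make α concrete: when the null hypothesis is true by construction, roughly a fraction α of tests should still (wrongly) reject it. The sketch below uses arbitrary simulation settings and assumes scipy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims = 0.05, 2000

# Both groups are drawn from the same distribution, so H0 is true by construction
false_positives = 0
for _ in range(n_sims):
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

# The observed rejection rate should be close to alpha (the Type I error rate)
print(f"false-positive rate ≈ {false_positives / n_sims:.3f} (alpha = {alpha})")
```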
Selecting an Appropriate Statistical Test
Choosing the correct statistical test depends on multiple factors:
- Type of data: Continuous, categorical, or ordinal
- Sample size: Small samples may require non-parametric tests
- Number of groups: Two groups vs multiple groups
- Assumptions: Normality, equal variance, independence
- Research question: Difference, relationship, or association
Selecting the correct test ensures valid conclusions and reliable statistical inference.
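As an illustration only, the hypothetical helper below encodes a simplified version of this decision logic; real test selection also requires checking the assumptions listed above, so treat it as a sketch, not a rule.

```python
def suggest_test(outcome: str, n_groups: int, parametric_ok: bool) -> str:
    """Very simplified test chooser following the guidance above.

    outcome: "continuous" or "categorical". This heuristic is a
    hypothetical illustration, not a substitute for checking assumptions.
    """
    if outcome == "categorical":
        return "chi-square test of independence"
    if n_groups == 2:
        return "t-test" if parametric_ok else "Mann-Whitney U test"
    if n_groups >= 3:
        return "ANOVA" if parametric_ok else "Kruskal-Wallis H test"
    raise ValueError("need at least two groups for a comparison test")

print(suggest_test("continuous", 2, parametric_ok=True))   # t-test
print(suggest_test("continuous", 3, parametric_ok=False))  # Kruskal-Wallis H test
```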