Z-Test and t-Test: Theory, Hypotheses & Complete R Analysis
Hypothesis testing is the formal procedure for deciding whether sample data provide sufficient evidence to reject a claim about a population parameter. The Z-test and t-test are the two workhorses for testing means. This post covers the theory, assumptions, the null ($H_0$) and alternative ($H_1$) hypotheses, test statistics, and full R walkthroughs with realistic datasets.
1. The Logic of Hypothesis Testing
Every test follows the same five steps:
- State $H_0$ and $H_1$
- Choose significance level $\alpha$ (commonly 0.05)
- Compute the test statistic
- Find the p-value (or critical value)
- Decision: reject $H_0$ if $p \leq \alpha$
Error types:
| Decision \ Truth | $H_0$ True | $H_0$ False |
|---|---|---|
| Fail to reject $H_0$ | ✅ Correct | ❌ Type II error ($\beta$) |
| Reject $H_0$ | ❌ Type I error ($\alpha$) | ✅ Power ($1-\beta$) |
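These error rates can be checked empirically. The sketch below is an illustration under stated assumptions — normal data, a true $H_0$ with $\mu_0 = 0$, known $\sigma = 1$, and $\alpha = 0.05$ — simulating 10,000 Z-tests when $H_0$ holds; the rejection rate should land near $\alpha$:

```r
set.seed(1)
alpha  <- 0.05
n_sims <- 10000
# Each replicate draws n = 30 observations under a TRUE H0 (mu = 0, sigma = 1)
rejections <- replicate(n_sims, {
  x <- rnorm(30, mean = 0, sd = 1)
  z <- mean(x) / (1 / sqrt(30))      # sigma = 1 is known here
  2 * pnorm(-abs(z)) <= alpha        # TRUE if the test (wrongly) rejects
})
mean(rejections)                     # empirical Type I error rate, ~0.05
```

Any rejection here is, by construction, a Type I error, so the long-run rejection frequency estimates $\alpha$.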
2. Z-Test
When to use
Use a Z-test for a population mean when:
- Population standard deviation $\sigma$ is known, or
- Sample size is large ($n \geq 30$), so the Central Limit Theorem makes $\bar{x}$ approximately normal and $s$ can stand in for $\sigma$
Hypotheses
Two-tailed:
\[H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0\]
One-tailed (right):
\[H_0: \mu \leq \mu_0 \qquad H_1: \mu > \mu_0\]
One-tailed (left):
\[H_0: \mu \geq \mu_0 \qquad H_1: \mu < \mu_0\]
Test Statistic
\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\]
Under $H_0$, $Z \sim \mathcal{N}(0,1)$.
Decision rule (two-tailed, $\alpha = 0.05$):
\[\text{Reject } H_0 \text{ if } |Z| > z_{\alpha/2} = 1.96\]
Standard Error & Confidence Interval
\[SE = \frac{\sigma}{\sqrt{n}}\]
\[\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]
Z-Test in R
Example: Wheat grain yield
A wheat breeder claims the mean grain yield of a new variety is 35 q/ha. A trial of 50 plots gives $\bar{x} = 36.8$ q/ha. Historical $\sigma = 5$ q/ha. Test at $\alpha = 0.05$.
# ── Parameters ────────────────────────────────────────────────────────────
x_bar <- 36.8 # sample mean
mu_0 <- 35 # hypothesised mean
sigma <- 5 # known population SD
n <- 50 # sample size
# ── Test statistic ────────────────────────────────────────────────────────
SE <- sigma / sqrt(n)
Z <- (x_bar - mu_0) / SE
cat("Standard Error :", round(SE, 4), "\n")
cat("Z statistic :", round(Z, 4), "\n")
# ── p-value (two-tailed) ──────────────────────────────────────────────────
p_value <- 2 * pnorm(-abs(Z))
cat("p-value :", round(p_value, 4), "\n")
# ── Critical value ────────────────────────────────────────────────────────
z_crit <- qnorm(0.975)
cat("Critical value :", round(z_crit, 4), "\n")
# ── Decision ──────────────────────────────────────────────────────────────
if (p_value < 0.05) {
cat("Decision: Reject H0 — yield differs significantly from 35 q/ha\n")
} else {
cat("Decision: Fail to reject H0\n")
}
# ── 95 % Confidence Interval ──────────────────────────────────────────────
CI_lower <- x_bar - z_crit * SE
CI_upper <- x_bar + z_crit * SE
cat(sprintf("95%% CI: [%.3f, %.3f]\n", CI_lower, CI_upper))
Output:
Standard Error : 0.7071
Z statistic : 2.5456
p-value : 0.0109
Critical value : 1.96
Decision: Reject H0 — yield differs significantly from 35 q/ha
95% CI: [35.414, 38.186]
Interpretation: $Z = 2.55 > 1.96$, $p = 0.011 < 0.05$. We reject $H_0$: the trial provides sufficient evidence that the mean yield differs from 35 q/ha. Since the 95 % CI $[35.4,\ 38.2]$ lies entirely above 35, the difference is in the positive direction.
Visualise the Z distribution
library(ggplot2)
x_seq <- seq(-4, 4, length.out = 500)
df_z <- data.frame(x = x_seq, y = dnorm(x_seq))
ggplot(df_z, aes(x, y)) +
geom_line(linewidth = 1, colour = "#2C7BB6") +
# rejection regions
geom_area(data = subset(df_z, x > 1.96), fill = "#D7191C", alpha = 0.4) +
geom_area(data = subset(df_z, x < -1.96), fill = "#D7191C", alpha = 0.4) +
# observed Z
geom_vline(xintercept = Z, linetype = "dashed", colour = "#1A9641", linewidth = 1) +
annotate("text", x = Z + 0.3, y = 0.35,
label = paste0("Z = ", round(Z, 2)), colour = "#1A9641", size = 4) +
annotate("text", x = 2.8, y = 0.05, label = "α/2", colour = "#D7191C", size = 4) +
annotate("text", x = -2.8, y = 0.05, label = "α/2", colour = "#D7191C", size = 4) +
labs(title = "Z-Test: Standard Normal Distribution",
subtitle = "Red shaded = rejection region | Green dashed = observed Z",
x = "Z", y = "Density") +
theme_minimal(base_size = 13)
Two-proportion Z-test
Tests whether two proportions $p_1$ and $p_2$ are equal.
\[H_0: p_1 = p_2 \qquad H_1: p_1 \neq p_2\]
\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\]
# Example: germination rate of two seed lots
x1 <- 180; n1 <- 200 # Lot A: 180/200 germinated
x2 <- 160; n2 <- 200 # Lot B: 160/200 germinated
prop.test(c(x1, x2), c(n1, n2), correct = FALSE)
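As a cross-check, the statistic can be computed by hand from the formula above; with `correct = FALSE`, the X-squared value reported by `prop.test()` is exactly $Z^2$:

```r
x1 <- 180; n1 <- 200                     # Lot A: 180/200 germinated
x2 <- 160; n2 <- 200                     # Lot B: 160/200 germinated
p1_hat <- x1 / n1                        # 0.90
p2_hat <- x2 / n2                        # 0.80
p_pool <- (x1 + x2) / (n1 + n2)          # pooled proportion: 0.85
SE     <- sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
Z      <- (p1_hat - p2_hat) / SE         # Z ~ 2.80
p_val  <- 2 * pnorm(-abs(Z))             # p ~ 0.005
c(Z = Z, p = p_val)
```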
3. t-Test
When to use
Use a t-test when $\sigma$ is unknown (the usual real-world situation) and you estimate it from the sample as $s$.
\[T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t_{(n-1)}\]
The t-distribution has heavier tails than the normal — accounting for extra uncertainty from estimating $\sigma$. As $n \to \infty$, $t \to Z$.
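The convergence is easy to see numerically: compare the two-tailed 5 % critical value of the t-distribution with the normal's 1.96 as the degrees of freedom grow.

```r
df_values <- c(5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = df_values)   # 2.571, 2.228, 2.042, 1.984, 1.962
z_crit <- qnorm(0.975)                # 1.960
round(data.frame(df = df_values, t_crit = t_crit, z_crit = z_crit), 3)
```

With only 5 df the critical value is about 31 % larger than 1.96; by df = 100 the difference is negligible in practice.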
Three variants
| Variant | Hypothesis | When |
|---|---|---|
| One-sample | $H_0: \mu = \mu_0$ | Compare sample to a known value |
| Independent two-sample | $H_0: \mu_1 = \mu_2$ | Two unrelated groups |
| Paired | $H_0: \mu_d = 0$ | Before/after, matched pairs |
3.1 One-Sample t-Test
\[H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0\]
\[T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \quad df = n - 1\]
Example: Protein content in rice
Standard protein content is 8 %. A new line is sampled ($n = 15$). Test whether it differs.
protein <- c(8.4, 8.1, 8.7, 7.9, 8.5, 8.3, 8.6, 8.2,
8.8, 8.0, 8.4, 8.5, 8.3, 8.6, 8.4)
# Summary statistics
cat("n :", length(protein), "\n")
cat("Mean :", round(mean(protein), 4), "\n")
cat("SD :", round(sd(protein), 4), "\n")
# One-sample t-test
result_1samp <- t.test(protein, mu = 8, alternative = "two.sided")
print(result_1samp)
Output:
One Sample t-test
data: protein
t = 5.7918, df = 14, p-value = 4.669e-05
alternative hypothesis: true mean is not equal to 8
95 percent confidence interval:
8.2393 8.5207
sample estimates:
mean of x
8.380
Interpretation: $T = 5.79,\ df = 14,\ p < 0.001$. Strong evidence that the new rice line has protein content significantly different from 8 %. The 95 % CI $[8.24,\ 8.52]$ lies entirely above 8.
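It is worth verifying `t.test()` against the formula once by hand. This cross-check recomputes $T$ from the same protein data and confirms it matches the built-in result:

```r
protein <- c(8.4, 8.1, 8.7, 7.9, 8.5, 8.3, 8.6, 8.2,
             8.8, 8.0, 8.4, 8.5, 8.3, 8.6, 8.4)
mu_0 <- 8
# T = (x_bar - mu_0) / (s / sqrt(n)), df = n - 1
t_manual  <- (mean(protein) - mu_0) / (sd(protein) / sqrt(length(protein)))
t_builtin <- unname(t.test(protein, mu = mu_0)$statistic)
all.equal(t_manual, t_builtin)   # TRUE: identical up to floating point
```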
3.2 Independent Two-Sample t-Test
\[H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2\]
Equal variances (Student):
\[T = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \quad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}, \quad df = n_1+n_2-2\]
Unequal variances (Welch — default in R):
\[T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}, \quad df = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2} {\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}\]
Example: Grain yield of two maize hybrids
hybrid_A <- c(62.1, 64.3, 61.8, 63.5, 65.0, 62.7, 63.9, 64.5, 61.2, 63.8)
hybrid_B <- c(58.4, 60.1, 59.7, 61.0, 58.8, 60.5, 59.3, 60.8, 58.1, 59.9)
# Check variance equality first (Levene / F-test)
var.test(hybrid_A, hybrid_B)
# Welch t-test (robust to unequal variances — recommended)
result_welch <- t.test(hybrid_A, hybrid_B,
alternative = "two.sided",
var.equal = FALSE)
print(result_welch)
# Student t-test (assumes equal variances)
result_student <- t.test(hybrid_A, hybrid_B,
alternative = "two.sided",
var.equal = TRUE)
print(result_student)
# Effect size: Cohen's d
library(effectsize)
cohens_d(hybrid_A, hybrid_B)
Output (Welch):
Welch Two Sample t-test
t = 7.1024, df = 17.059, p-value = 1.738e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.5449 4.6951
sample estimates:
mean of x mean of y
63.280 59.660
Interpretation: Hybrid A yields significantly more than Hybrid B ($T = 7.10,\ p < 0.001$). The mean difference is $3.62$ q/ha (95 % CI: $2.54$–$4.70$).
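The Welch statistic and fractional df can also be verified by hand from the formulas above, using the same hybrid data; `t.test()` reports the identical values:

```r
hybrid_A <- c(62.1, 64.3, 61.8, 63.5, 65.0, 62.7, 63.9, 64.5, 61.2, 63.8)
hybrid_B <- c(58.4, 60.1, 59.7, 61.0, 58.8, 60.5, 59.3, 60.8, 58.1, 59.9)
v1 <- var(hybrid_A) / length(hybrid_A)   # s1^2 / n1
v2 <- var(hybrid_B) / length(hybrid_B)   # s2^2 / n2
t_welch  <- (mean(hybrid_A) - mean(hybrid_B)) / sqrt(v1 + v2)
# Welch–Satterthwaite degrees of freedom (non-integer)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(hybrid_A) - 1) + v2^2 / (length(hybrid_B) - 1))
res <- t.test(hybrid_A, hybrid_B, var.equal = FALSE)
all.equal(c(t_welch, df_welch),
          unname(c(res$statistic, res$parameter)))   # TRUE
```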
Visualise with boxplot
library(ggplot2)
library(tidyr)
df_hyb <- data.frame(A = hybrid_A, B = hybrid_B) |>
pivot_longer(everything(), names_to = "Hybrid", values_to = "Yield")
ggplot(df_hyb, aes(Hybrid, Yield, fill = Hybrid)) +
geom_boxplot(alpha = 0.7, width = 0.4) +
geom_jitter(width = 0.1, size = 2, alpha = 0.6) +
scale_fill_manual(values = c("#4DAF4A", "#377EB8")) +
labs(title = "Grain Yield: Hybrid A vs Hybrid B",
subtitle = paste0("Welch t-test: p = ",
format(result_welch$p.value, digits = 3)),
y = "Yield (q/ha)") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
3.3 Paired t-Test
Used when observations are matched (same plot before/after treatment, same animal in two conditions, twin studies).
\[d_i = x_{i1} - x_{i2}\]
\[H_0: \mu_d = 0 \qquad H_1: \mu_d \neq 0\]
\[T = \frac{\bar{d}}{s_d/\sqrt{n}}, \quad df = n - 1\]
Example: Soil nitrogen before and after legume cover crop
before <- c(12.1, 11.8, 13.0, 12.5, 11.2, 12.8, 13.3, 12.0, 11.7, 12.4)
after <- c(14.5, 13.9, 15.1, 14.8, 13.6, 15.0, 15.5, 14.2, 13.8, 14.7)
differences <- after - before
cat("Mean difference:", round(mean(differences), 3), "mg/kg\n")
cat("SD of differences:", round(sd(differences), 3), "\n")
result_paired <- t.test(after, before,
paired = TRUE,
alternative = "greater") # expect after > before
print(result_paired)
Output:
Paired t-test
data: after and before
t = 60.818, df = 9, p-value = 2.212e-13
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
2.1628 Inf
sample estimates:
mean difference
2.230
Interpretation: Soil nitrogen increased by a mean of 2.23 mg/kg after the legume cover crop ($T = 60.82,\ p < 0.001$). The one-sided test confirms a significant increase.
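A paired t-test is algebraically identical to a one-sample t-test on the differences, which is easy to confirm with the same data:

```r
before <- c(12.1, 11.8, 13.0, 12.5, 11.2, 12.8, 13.3, 12.0, 11.7, 12.4)
after  <- c(14.5, 13.9, 15.1, 14.8, 13.6, 15.0, 15.5, 14.2, 13.8, 14.7)
# Paired test vs one-sample test on d = after - before (same alternative)
paired_res  <- t.test(after, before, paired = TRUE, alternative = "greater")
onesamp_res <- t.test(after - before, mu = 0, alternative = "greater")
all.equal(unname(paired_res$statistic), unname(onesamp_res$statistic))  # TRUE
all.equal(paired_res$p.value, onesamp_res$p.value)                      # TRUE
```

This is why the paired test uses $df = n - 1$ with $n$ the number of pairs, not the total number of observations.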
# Paired difference plot
df_paired <- data.frame(
Plot = factor(1:10),
Before = before,
After = after,
Difference = differences
)
ggplot(df_paired, aes(x = Plot)) +
geom_segment(aes(xend = Plot, y = Before, yend = After),
arrow = arrow(length = unit(0.2, "cm")),
colour = "#377EB8", linewidth = 0.8) +
geom_point(aes(y = Before), colour = "#E41A1C", size = 3) +
geom_point(aes(y = After), colour = "#4DAF4A", size = 3) +
labs(title = "Soil Nitrogen Before (red) → After (green) Cover Crop",
y = "N (mg/kg)", x = "Plot") +
theme_minimal(base_size = 13)
4. Checking Assumptions
Normality
# Shapiro-Wilk test (best for n < 50)
shapiro.test(protein)
# Q-Q plot
qqnorm(protein, main = "Q-Q Plot — Protein Content")
qqline(protein, col = "red", lwd = 2)
Equality of Variances
# F-test (sensitive to normality)
var.test(hybrid_A, hybrid_B)
# Levene's test (robust)
library(car)
leveneTest(Yield ~ Hybrid, data = df_hyb)
What if normality fails?
| Parametric | Non-parametric equivalent |
|---|---|
| One-sample t-test | Wilcoxon signed-rank test |
| Two-sample t-test | Mann-Whitney U (Wilcoxon rank-sum) |
| Paired t-test | Wilcoxon signed-rank test (paired) |
# Non-parametric alternatives
wilcox.test(protein, mu = 8) # one-sample
wilcox.test(hybrid_A, hybrid_B) # two-sample
wilcox.test(after, before, paired = TRUE) # paired
5. Z vs t — Decision Flowchart
Start: Testing a mean?
│
├─ Yes ──► Is σ known (or n very large)?
│ │
│ ┌───────┴───────┐
│ Yes No
│ │ │
│ Z-test t-test
│ ┌──────┴──────┐
│ One group Two groups?
│ │ │
│ One-sample t ┌──────┴──────┐
│ Paired? Independent?
│ │ │
│ Paired t-test Welch t-test
│
└─ No ──► Use proportion Z-test / chi-square
6. Complete Summary Table
| Feature | Z-test | One-sample t | Two-sample t (Welch) | Paired t |
|---|---|---|---|---|
| $\sigma$ known | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Min sample size | $n \geq 30$ | Any | Any | Any |
| Groups | 1 | 1 | 2 independent | 2 matched |
| Test stat | $Z \sim \mathcal{N}(0,1)$ | $T \sim t_{n-1}$ | $T \sim t_{df_W}$ | $T \sim t_{n-1}$ |
| R function | pnorm() | t.test(mu=) | t.test(var.equal=FALSE) | t.test(paired=TRUE) |
7. Power Analysis
Determine sample size needed to detect a given effect.
library(pwr)
# One-sample t-test: detect d = 0.5 with 80 % power
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
type = "one.sample", alternative = "two.sided")
# Two-sample t-test: detect d = 0.5
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
type = "two.sample", alternative = "two.sided")
# Power curve
power_seq <- sapply(seq(5, 100, by = 5), function(n) {
pwr.t.test(n = n, d = 0.5, sig.level = 0.05,
type = "two.sample")$power
})
plot(seq(5, 100, by = 5), power_seq, type = "b",
xlab = "Sample Size (per group)", ylab = "Power",
main = "Power Curve: Two-Sample t-Test (d = 0.5)",
col = "#2C7BB6", pch = 19)
abline(h = 0.80, col = "red", lty = 2)
8. References
- Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd.
- Student [Gosset, W. S.] (1908). The probable error of a mean. Biometrika, 6(1), 1–25.
- Welch, B. L. (1947). The generalisation of Student’s problem. Biometrika, 34(1–2), 28–35.
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). LEA.
- R Core Team (2026). R: A Language and Environment for Statistical Computing.