Statistical Significance
The likelihood that an observed test result isn't due to random chance.
Statistical significance measures how unlikely an observed difference between two groups (A and B in an A/B test) would be if the groups were actually identical, i.e. how implausible it is that random variation alone produced the result. The conventional threshold is 95% confidence (p < 0.05).
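A minimal sketch of how that p-value is typically computed for a conversion-rate A/B test, using a two-proportion z-test with the normal approximation. The visitor and conversion counts here are hypothetical, not from the text.

    import math

    def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b):
        """Two-sided p-value for a difference in conversion rates
        (two-proportion z-test, normal approximation)."""
        rate_a = conv_a / visitors_a
        rate_b = conv_b / visitors_b
        # Pooled rate under the null hypothesis that both variants convert identically.
        pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
        z = (rate_b - rate_a) / se
        # Two-sided tail probability of the standard normal.
        return math.erfc(abs(z) / math.sqrt(2))

    # Hypothetical counts: 10,000 visitors per variant, 5.0% vs 5.8% conversion.
    p = two_proportion_p_value(500, 10_000, 580, 10_000)
    print(f"p = {p:.3f}, significant at 95%: {p < 0.05}")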
Context
95% confidence means that if the groups were actually equal, there is still a 5% chance the test would flag a difference, a false positive produced by random fluctuation. Running many tests at 95% confidence therefore produces a predictable stream of false wins.
For high-stakes decisions or when testing many variants simultaneously, 99% confidence is more appropriate because it reduces the false-positive rate, at the cost of requiring a larger sample.
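A rough comparison of that sample-size cost, using the standard normal-approximation formula for a two-proportion test. The 5% baseline conversion rate, 10% relative lift, and 80% power are assumptions for illustration only.

    import math

    # Standard normal quantiles: two-sided alpha of 0.05 / 0.01, and 80% power.
    Z_95 = 1.960
    Z_99 = 2.576
    Z_POWER_80 = 0.842

    def visitors_per_variant(base_rate, relative_lift, z_alpha, z_power=Z_POWER_80):
        """Approximate sample size per variant to detect a relative lift in
        conversion rate with 80% power (normal-approximation formula)."""
        p1 = base_rate
        p2 = base_rate * (1 + relative_lift)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

    # Hypothetical scenario: 5% baseline conversion, detecting a 10% relative lift.
    print("95% confidence:", visitors_per_variant(0.05, 0.10, Z_95))
    print("99% confidence:", visitors_per_variant(0.05, 0.10, Z_99))

With these assumed numbers, the 99% threshold needs roughly half again as many visitors per variant as the 95% threshold.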
An A/B test showing variant B won by 8% with p = 0.04 passes the 95% threshold. Run 20 such tests on changes that make no real difference and you'd still expect roughly one false positive, a 'win' that is actually noise. This is why testing everything is worse than testing important things.
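A small simulation of that arithmetic, assuming two identical variants with a hypothetical 5% conversion rate: each of the 20 'tests' compares two samples drawn from the same distribution, so any significant result is noise by construction.

    import math
    import random

    def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
        """Two-sided p-value for a difference in conversion rates (normal approximation)."""
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        if se == 0:
            return 1.0
        z = (conv_b / n_b - conv_a / n_a) / se
        return math.erfc(abs(z) / math.sqrt(2))

    random.seed(1)
    TRUE_RATE = 0.05   # both variants convert identically, so every "win" is noise
    VISITORS = 10_000
    false_wins = 0
    for _ in range(20):
        conv_a = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
        conv_b = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
        if two_proportion_p_value(conv_a, VISITORS, conv_b, VISITORS) < 0.05:
            false_wins += 1
    print(f"False wins in 20 no-difference tests: {false_wins}")  # about 1 on average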
Statistical significance is not practical significance. A 0.5% lift measured at 95% confidence on millions of sessions is 'statistically significant' but often too small to bother shipping.