Statistical Significance
The likelihood that an observed test result isn't due to random chance.
Statistical significance measures how unlikely an observed difference between two groups (A and B in an A/B test) would be if the groups were actually identical, i.e. how implausible it is that random variation alone produced the result. The conventional threshold is 95% confidence (p < 0.05).
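A minimal sketch of how that p-value is typically computed for a conversion-rate A/B test, using a two-proportion z-test with the normal approximation. The visitor and conversion counts here are hypothetical, not from the text.

    import math

    def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b):
        """Two-sided p-value for a difference in conversion rates
        (two-proportion z-test, normal approximation)."""
        rate_a = conv_a / visitors_a
        rate_b = conv_b / visitors_b
        # Pooled rate under the null hypothesis that both variants convert identically.
        pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
        z = (rate_b - rate_a) / se
        # Two-sided tail probability of the standard normal.
        return math.erfc(abs(z) / math.sqrt(2))

    # Hypothetical counts: 10,000 visitors per variant, 5.0% vs 5.8% conversion.
    p = two_proportion_p_value(500, 10_000, 580, 10_000)
    print(f"p = {p:.3f}, significant at 95%: {p < 0.05}")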
Context
95% confidence means that if the groups were actually equal, there is still a 5% chance the test would flag a difference, a false positive produced by random fluctuation. Running many tests at 95% confidence therefore produces a predictable stream of false wins.
For high-stakes decisions or when testing many variants simultaneously, 99% confidence is more appropriate because it reduces the false-positive rate, at the cost of requiring a larger sample.
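A rough comparison of that sample-size cost, using the standard normal-approximation formula for a two-proportion test. The 5% baseline conversion rate, 10% relative lift, and 80% power are assumptions for illustration only.

    import math

    # Standard normal quantiles: two-sided alpha of 0.05 / 0.01, and 80% power.
    Z_95 = 1.960
    Z_99 = 2.576
    Z_POWER_80 = 0.842

    def visitors_per_variant(base_rate, relative_lift, z_alpha, z_power=Z_POWER_80):
        """Approximate sample size per variant to detect a relative lift in
        conversion rate with 80% power (normal-approximation formula)."""
        p1 = base_rate
        p2 = base_rate * (1 + relative_lift)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

    # Hypothetical scenario: 5% baseline conversion, detecting a 10% relative lift.
    print("95% confidence:", visitors_per_variant(0.05, 0.10, Z_95))
    print("99% confidence:", visitors_per_variant(0.05, 0.10, Z_99))

With these assumed numbers, the 99% threshold needs roughly half again as many visitors per variant as the 95% threshold.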
An A/B test showing variant B won by 8% with p = 0.04 passes the 95% threshold. Run 20 such tests on changes that make no real difference and you'd still expect roughly one false positive, a 'win' that is actually noise. This is why testing everything is worse than testing important things.
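A small simulation of that arithmetic, assuming two identical variants with a hypothetical 5% conversion rate: each of the 20 'tests' compares two samples drawn from the same distribution, so any significant result is noise by construction.

    import math
    import random

    def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
        """Two-sided p-value for a difference in conversion rates (normal approximation)."""
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        if se == 0:
            return 1.0
        z = (conv_b / n_b - conv_a / n_a) / se
        return math.erfc(abs(z) / math.sqrt(2))

    random.seed(1)
    TRUE_RATE = 0.05   # both variants convert identically, so every "win" is noise
    VISITORS = 10_000
    false_wins = 0
    for _ in range(20):
        conv_a = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
        conv_b = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
        if two_proportion_p_value(conv_a, VISITORS, conv_b, VISITORS) < 0.05:
            false_wins += 1
    print(f"False wins in 20 no-difference tests: {false_wins}")  # about 1 on average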
Statistical significance is not practical significance. A 0.5% lift measured at 95% confidence on millions of sessions is 'statistically significant' but often too small to bother shipping.