Confidence Intervals
Confidence intervals are an intuitive way to quantify the uncertainty in the observed metric deltas. A 95% confidence interval should contain the true effect 95% of the time: if we ran the same experiment 100 times and computed a confidence interval each time, we'd expect the true metric delta to fall inside the interval in about 95 of them.
In practical terms, a 95% confidence interval that doesn't contain zero (the green bar above) represents a statistically significant result (with α = 0.05). If the true effect were zero, we'd expect the confidence interval to exclude zero only 5% of the time (a.k.a. a false positive). Wider confidence intervals imply less certainty about the exact size of the effect, with a larger range of plausible values.
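To make the coverage guarantee concrete, here's a minimal simulation (illustrative only, not Statsig's implementation) that repeats an A/A-style experiment many times and checks how often the 95% interval contains the true effect of zero; the sample sizes and distributions are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n, true_effect, z = 1_000, 0.0, 1.96  # users per group, true delta, z-critical value
n_experiments = 1_000
covered = 0

for _ in range(n_experiments):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_effect, 1.0, n)
    delta = treatment.mean() - control.mean()
    # Standard error of the difference of two independent sample means
    se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
    lower, upper = delta - z * se, delta + z * se
    covered += lower <= true_effect <= upper

print(f"{covered / n_experiments:.1%} of intervals contain the true effect")  # ~95%
```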
Computing Confidence Intervals
Confidence intervals in Statsig are calculated using a two-sample z-test. This test requires an estimate of the variance of the metric delta we're measuring, which is derived differently depending on the type of metric (details here).
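As a concrete sketch of the variance step for a simple mean metric (an assumption for illustration; Statsig's per-metric-type derivations are in the linked details), the variance of the absolute delta is the sum of the variances of the two independent group means, and the variance of the relative delta can be approximated with a first-order delta method:

```python
import numpy as np

def delta_variances(treatment: np.ndarray, control: np.ndarray):
    """Variance of the absolute and relative deltas between two
    independent samples (mean metric; delta method for the ratio).
    Function name and approach are illustrative, not Statsig's code."""
    mu_t, mu_c = treatment.mean(), control.mean()
    # Variance of each group's sample mean
    var_mu_t = treatment.var(ddof=1) / len(treatment)
    var_mu_c = control.var(ddof=1) / len(control)
    # Absolute delta: variances of independent means add
    var_abs = var_mu_t + var_mu_c
    # Relative delta (mu_t / mu_c - 1): first-order delta method
    var_rel = var_mu_t / mu_c**2 + (mu_t**2 / mu_c**4) * var_mu_c
    return var_abs, var_rel
```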
Once we've established the variance of the delta, it's straightforward to compute the confidence intervals.
Two-Sided Tests
For the absolute metric delta, the confidence interval is given by:

$$\Delta_{abs} \pm z_{\alpha/2} \cdot \sqrt{\text{Var}(\Delta_{abs})}$$

where:
- $z_{\alpha/2}$ is the z-critical value for the desired significance level (1.96 for the standard 95% confidence interval when we run a two-sided test)
- $\text{Var}(\Delta_{abs})$ is the variance of the absolute delta (details here)

Similarly, the confidence interval for the relative metric delta is:

$$\Delta_{rel} \pm z_{\alpha/2} \cdot \sqrt{\text{Var}(\Delta_{rel})}$$
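Putting the pieces together, here is a short sketch of the two-sided interval that directly instantiates the formulas above, reusing the `delta_variances` helper from the earlier sketch (the function names and data are illustrative):

```python
import numpy as np

def two_sided_ci(delta: float, var_delta: float, z: float = 1.96):
    """Two-sided CI: delta ± z_{alpha/2} * sqrt(Var(delta))."""
    half_width = z * var_delta ** 0.5
    return delta - half_width, delta + half_width

# Illustrative usage with the variance sketch above:
rng = np.random.default_rng(0)
t = rng.normal(10.2, 3.0, 5_000)  # treatment group metric values
c = rng.normal(10.0, 3.0, 5_000)  # control group metric values
var_abs, var_rel = delta_variances(t, c)
print(two_sided_ci(t.mean() - c.mean(), var_abs))        # absolute delta CI
print(two_sided_ci(t.mean() / c.mean() - 1.0, var_rel))  # relative delta CI
```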
One-Sided Tests
When running one-sided tests, the form of the confidence interval calculation changes slightly to account for a redistribution of the desired false positive rate when looking only for increases or only for decreases in the metric. The entire rate α sits in a single tail, so the critical value becomes $z_{\alpha}$ (1.645 for α = 0.05) instead of $z_{\alpha/2}$, and the interval is unbounded on the other side:

$$\left[\Delta - z_{\alpha} \cdot \sqrt{\text{Var}(\Delta)},\ +\infty\right) \quad \text{when testing for an increase}$$

$$\left(-\infty,\ \Delta + z_{\alpha} \cdot \sqrt{\text{Var}(\Delta)}\right] \quad \text{when testing for a decrease}$$

where $\Delta$ is the absolute or relative delta as above.
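A corresponding sketch for the one-sided case, under the same assumptions as the examples above (the `direction` parameter and function name are illustrative):

```python
from scipy.stats import norm

def one_sided_ci(delta: float, var_delta: float, alpha: float = 0.05,
                 direction: str = "increase"):
    """One-sided CI: all of alpha sits in one tail, so the critical
    value is z_alpha (~1.645 at alpha = 0.05) rather than z_{alpha/2}."""
    z = norm.ppf(1.0 - alpha)  # one-tailed critical value
    se = var_delta ** 0.5
    if direction == "increase":  # looking only for a metric increase
        return (delta - z * se, float("inf"))
    return (float("-inf"), delta + z * se)  # looking only for a decrease
```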