p-Value Calculation
In hypothesis testing, the p-value is the probability of observing an effect at least as large in magnitude as the measured metric delta, assuming the null hypothesis is true. In practice, a p-value lower than your pre-defined significance threshold ($\alpha$) is treated as evidence of a true effect.
The method used to calculate the p-value depends on the number of degrees of freedom ($\nu$). A two-sample z-test is appropriate for most experiments. Welch's t-test is used instead for smaller experiments, where $\nu$ falls below a cutoff. In both cases, the p-value depends on the metric mean and variance computed for the test and control groups.
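The text does not state how $\nu$ is computed. A common choice for Welch's t-test is the Welch–Satterthwaite approximation, sketched below under that assumption. The function name and its inputs (unit-level sample variance and sample size per group) are illustrative, not part of any product API.

```python
def welch_satterthwaite_df(var_t: float, n_t: int, var_c: float, n_c: int) -> float:
    """Approximate degrees of freedom for Welch's t-test (assumed method).

    var_t, var_c: unit-level sample variances of the metric in treatment / control
    n_t, n_c: number of units in treatment / control (each must be > 1)
    """
    se2_t = var_t / n_t  # squared standard error of the treatment mean
    se2_c = var_c / n_c  # squared standard error of the control mean
    numerator = (se2_t + se2_c) ** 2
    denominator = se2_t ** 2 / (n_t - 1) + se2_c ** 2 / (n_c - 1)
    return numerator / denominator


# With equal variances and equal group sizes this reduces to n_t + n_c - 2.
print(welch_satterthwaite_df(4.0, 10, 4.0, 10))  # approximately 18
```

As the group sizes grow, $\nu$ grows with them, which is why the normal approximation (z-test) suffices for most experiments.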
Two-Sample Tests
Two-Sided z-Test
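The z-statistic (a.k.a. z-score) of a two-sample z-test can be written in several equivalent forms:
$$Z = \frac{\Delta}{\sqrt{\operatorname{var}(\Delta)}} = \frac{\bar{X}_t - \bar{X}_c}{\sqrt{\operatorname{var}(\bar{X}_t) + \operatorname{var}(\bar{X}_c)}} = \frac{\bar{X}_t - \bar{X}_c}{\sqrt{SE_{\bar{X}_t}^2 + SE_{\bar{X}_c}^2}}$$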
where:
- $Z$ is the observed z-statistic (not the critical value $Z_{\alpha/2}$)
- $\operatorname{var}(\Delta)$ is the variance of the absolute delta of means, $\Delta = \bar{X}_t - \bar{X}_c$
- $\operatorname{var}(\bar{X})$ is the variance of the sample mean of either the control or the treatment group (details here)
- $SE_{\bar{X}}$ is the standard error of the mean of either the control or the treatment group (these are the values shown in Pulse under a metric's Statistics tab)
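The equivalent forms above can be sketched directly from the per-group means and standard errors. This minimal sketch assumes the two groups are independent, so that $\operatorname{var}(\Delta) = SE_{\bar{X}_t}^2 + SE_{\bar{X}_c}^2$; the function name and arguments are hypothetical.

```python
import math


def z_statistic(mean_t: float, se_t: float, mean_c: float, se_c: float) -> float:
    """Observed two-sample z-statistic from group means and standard errors.

    Assumes independent groups, so var(delta) = se_t**2 + se_c**2.
    """
    delta = mean_t - mean_c            # absolute delta of means
    var_delta = se_t ** 2 + se_c ** 2  # var(X_t mean) + var(X_c mean)
    return delta / math.sqrt(var_delta)


# Treatment mean 10.5 (SE 0.3) vs. control mean 10.0 (SE 0.4):
# var(delta) = 0.09 + 0.16 = 0.25, so Z = 0.5 / 0.5
print(z_statistic(10.5, 0.3, 10.0, 0.4))  # approximately 1.0
```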
The two-sided p-value is obtained from the standard normal cumulative distribution function $\Phi$:
$$p = 2\left(1 - \Phi(|Z|)\right)$$
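The formula above can be evaluated with only the standard library, using the identity $2\left(1 - \Phi(|Z|)\right) = \operatorname{erfc}\left(|Z| / \sqrt{2}\right)$. The function name below is illustrative.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for an observed z-statistic under a standard normal null."""
    # 1 - Phi(x) = 0.5 * erfc(x / sqrt(2)), so 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


print(two_sided_p_value(1.96))  # close to 0.05
print(two_sided_p_value(0.0))   # 1.0
```

Using `math.erfc` rather than computing `1 - Phi` directly avoids losing precision when $|Z|$ is large and the p-value is very small.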