
Running an A/A Test

In this guide, we will walk you through how to leverage Statsig’s platform to run an A/A test on your product.

This guide assumes that you have successfully set up and configured your Statsig SDK. For a step-by-step guide on how to do this, see our “Your first feature” guide.

Why run an A/A test?

There are many reasons to run an A/A test, one of the most common being to validate a new experimentation engine you may be integrating with (in this case Statsig). For new users just getting started with Statsig, we often recommend running an A/A test to provide a “low-stakes” first test environment, ensuring that you’ve got your metrics set up correctly and are seeing exposures flowing through as expected before kicking off your first real A/B test.

Here at Statsig, we are continuously running both live and offline simulated A/A tests on our stats engine. If you want to check one out for yourself, see this example in our Swaggr Demo Project!

How to run an A/A test

The easiest way to run an A/A test in Statsig is with a Feature Gate. You can also use an Experiment to run an A/A test, but we chose a Feature Gate for this tutorial for simplicity.

Step 1: Create a new feature gate in the Statsig console

Log into the Statsig console at https://console.statsig.com/ and navigate to Feature Gates in the left-hand navigation panel.

Click on the Create button and enter the name and (optional) description for your feature gate. We will call our feature gate “aatest_example”. Click Create.

[Screenshot: create_new_fg_empty]

In the Setup tab, define the rules for this feature gate. Click + Add New Rule. While you could run an A/A test on a specific user group, platform, etc., the easiest setup is simply to divide all of your traffic 50/50 and deliver the same experience (your default product experience) to each group.

[Screenshot: add_new_rule_empty]

To do this, under Criteria select Everyone (you may need to scroll up), name your rule, and then change the Pass Percentage to 50%. Click Add Rule, and that’s it! Click Save Changes in the upper right-hand corner.

[Screenshot: add_new_rule_filled]

Your feature gate setup should now look as follows:

[Screenshot: aa_rule_filled_out]

Check that it is working as expected by typing some dummy user IDs into the console: roughly 50% of the time your IDs should pass, and 50% of the time they should fail.

[Screenshot: check_rule_pass]

[Screenshot: check_rule_fail]
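
If you would rather sanity-check the split programmatically, a rough sketch like the one below also works. It assumes the statsig-node server SDK; the secret key and dummy user IDs are placeholders for illustration.

const Statsig = require('statsig-node');

async function checkSplit() {
  // 'secret-XXXX' is a placeholder for your server secret key.
  await Statsig.initialize('secret-XXXX');

  let passes = 0;
  const total = 1000;
  for (let i = 0; i < total; i++) {
    // Each dummy userID is bucketed independently, so roughly half should pass.
    if (await Statsig.checkGate({ userID: `dummy_user_${i}` }, 'aatest_example')) {
      passes++;
    }
  }

  console.log(`${passes} of ${total} dummy users passed (expect roughly 50%)`);
  Statsig.shutdown();
}

checkSplit();

Keep in mind that these programmatic checks log exposures just like real ones.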

Step 2: Check the feature gate in your application and log an event

Copy the code snippet from the upper right-hand corner of your feature gate page (under the < > symbol) and drop it into your application at the point where you want to call the A/A check.

statsig.checkGate("aatest_example") 
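
In context, a fuller (but still minimal) sketch might look like the following, including logging an event so Pulse has something to analyze. This assumes the statsig-js client SDK; the client key, user ID, and event name are placeholders.

import statsig from 'statsig-js';

async function setupAaTest() {
  // Initialize once per session. 'client-XXXX' and the user object are placeholders.
  await statsig.initialize('client-XXXX', { userID: 'a_user_id' });

  // Checking the gate logs an exposure for this user. In an A/A test the result is
  // intentionally ignored: both groups get the same default experience.
  statsig.checkGate('aatest_example');

  // Log an event you care about so it shows up in Pulse for both groups.
  statsig.logEvent('page_view', window.location.pathname);
}

setupAaTest();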

Now, when a user renders this page in their client application, you will see a live log stream in the Diagnostics tab of your feature gate in the Statsig console.

[Screenshot: logstream]

Step 3: Review A/A test results

Within 24 hours of starting your A/A test, you'll see the cumulative exposures in the Pulse Results tab of your feature gate.

[Screenshot: cumulative_exposures]

This will break down your logged exposures (as well as the distribution of the logged exposures). If something looks off, check the Diagnostics tab for more granular, day-by-day exposure breakdowns at both the Checks and User level.

In the Metric Lifts panel, you can see the full picture of how all your tagged metrics are performing.

[Screenshot: pulse_results_empty]

What should you expect to see?

  • Exposures: make sure you’re seeing exposures flowing through as expected from your product. If you’re not seeing exposures, use the Diagnostics tab and the Exposure Stream to debug.
  • Pulse results: because Statsig’s stats engine uses 95% confidence intervals, roughly 5% of your metrics in Pulse will show a statistically significant change by chance alone (see the quick calculation below).
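
To make that ~5% expectation concrete, here is a quick back-of-the-envelope calculation. The count of 40 tagged metrics is just an example, and it treats metrics as independent, which real metrics usually aren't.

// Each truly-unchanged metric has a ~5% chance of reading as stat-sig by chance
// when using a 95% confidence interval.
const metrics = 40;              // example: number of tagged metrics in Pulse
const falsePositiveRate = 0.05;

const expectedStatSig = metrics * falsePositiveRate;                  // ~2 metrics
const probAtLeastOne = 1 - Math.pow(1 - falsePositiveRate, metrics);  // ~0.87

console.log(`Expected stat-sig metrics by chance: ~${expectedStatSig}`);
console.log(`Chance of seeing at least one: ${(probAtLeastOne * 100).toFixed(0)}%`);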

We recommend running your A/A test for at least a week, or long enough to reach most of your weekly active users.

Simulated A/A Tests

We’ve made running A/A tests at scale easy by setting up simulated A/A tests that run every day in the background for every company on the platform. An A/A test is like an A/B test, except both groups get the same experience. A/A tests help build trust in your experimentation platform (and your metrics!).

A/A tests can be Online or Offline. An Online A/A test is run on real users. An engineer instruments your app with the Statsig SDK to check for experiment assignment. Assignment is logged, but there's no difference in experience to the user.

Since there is no real effect, you expect to see only statistical noise. When using 95% confidence intervals, only ~1 in 20 metrics will show a stat-sig difference between Control and Test.

Offline A/A tests

A single request runs on one unit type. An offline A/A test works by:

  1. Querying a representative sample of your data
  2. Randomly assigning subjects to Test or Control
  3. Computing relevant metrics for Test vs Control and running them through the stats engine
  4. Checking the percentage of false positives: with a typical p-value cutoff of 0.05, you'd expect a ~5% false positive rate (a toy simulation of this follows the list)
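
To illustrate the idea (this is a toy sketch, not Statsig's actual stats engine), the following simulation randomly splits synthetic units 50/50, compares a metric with no real difference between groups using a two-sample z-test, and repeats many times to estimate the false positive rate.

// Toy offline A/A simulation -- illustrative only.

// Standard normal CDF via the Abramowitz-Stegun approximation.
function normalCdf(z) {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp(-z * z / 2);
  const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}

// Two-sided p-value for a two-sample z-test on the difference in means.
function twoSampleP(a, b) {
  const mean = xs => xs.reduce((s, x) => s + x, 0) / xs.length;
  const variance = (xs, m) => xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
  const ma = mean(a), mb = mean(b);
  const se = Math.sqrt(variance(a, ma) / a.length + variance(b, mb) / b.length);
  const z = (ma - mb) / se;
  return 2 * (1 - normalCdf(Math.abs(z)));
}

const numTests = 100;      // number of simulated A/A tests
const unitsPerTest = 2000; // synthetic units per test
let falsePositives = 0;

for (let i = 0; i < numTests; i++) {
  const test = [], control = [];
  for (let u = 0; u < unitsPerTest; u++) {
    const metricValue = Math.random() * 10;                    // same distribution for everyone
    (Math.random() < 0.5 ? test : control).push(metricValue);  // random 50/50 assignment
  }
  if (twoSampleP(test, control) < 0.05) falsePositives++;
}

console.log(`False positive rate: ${(100 * falsePositives / numTests).toFixed(1)}% (expect ~5%)`);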

You can download the running history of your simulated A/A test performance via the “Tools” menu in your Statsig Console. We run 100 tests per request.

File Description

Column Name                           Description
metric_name                           Name of the metric
metric_type                           Type of metric
unit_type                             The unit used to randomize (e.g., userID)
n_tests                               The number of tests run
pct_ss_95_pct_confidence              The percentage of tests with a stat-sig result for this metric
avg_units_per_test                    The average number of units (often users) sampled into each test
avg_participating_units_per_test      The average number of units in each test with a value for this metric
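
As a rough example of how you might skim this file after downloading it (assuming a CSV export with the columns above; the filename is a placeholder):

const fs = require('fs');

// 'aa_test_history.csv' is a placeholder name for the downloaded export.
const rows = fs.readFileSync('aa_test_history.csv', 'utf8').trim().split('\n');
const header = rows[0].split(',');
const col = name => header.indexOf(name);

// Print each metric's stat-sig rate across the simulated A/A tests; values far
// above the ~5% you expect at 95% confidence are worth a closer look.
for (const line of rows.slice(1)) {
  const cells = line.split(',');
  console.log(`${cells[col('metric_name')]}: ${cells[col('pct_ss_95_pct_confidence')]} stat-sig`);
}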
