Your First A/B Test

In this guide, you will create and implement your first A/B/n test. While you can use Statsig's Feature Flags to roll out new features, Statsig's Experiments enable you to run all kinds of A/B tests, from simple bivariant (A vs. B) experiments to multi-variant experiments (A vs. B/C/D/n) and mutually exclusive experiments.

This guide is for a user-level experiment

Most experiments randomize the users that are exposed to different variants. We call these user-level experiments. To randomize based on devices, say when you don't yet have a userID (e.g. for unregistered users), consider a Device-level Experiment.

Do I need a large sample size?

This is a common question that comes up in the context of A/B tests. Most assume (incorrectly) that they can't get statistically significant results with small sample sizes. Here's a good article that clears this up:

You don't need large sample sizes to run A/B Tests

Step 1 - Create a new experiment in the Statsig console

Log into the Statsig console at https://console.statsig.com/ and navigate to Experiments in the left-hand navigation panel.

Click on the Create button and enter the name and hypothesis for your experiment.

We are naming this experiment:

onboarding_banner

And the hypothesis:

A new onboarding banner will improve the onboarding flow completion rate

Click Create.

In the Setup tab, you can fill out the scorecard for the primary metrics you are interested in watching. These key metrics should reflect your hypothesis for the experiment. Establishing a hypothesis upfront ensures that the experiment serves to improve your understanding of users.

Since you haven't logged any events for this fictional example, let's just set "dau (daily active users)" as the primary metric.

Underneath the scorecard, you can see a tree view of the experiment configuration which illustrates how users will be assigned to different variants.

First, you can set the allocation percentage of this experiment (default 100%). This controls the percentage of users that will be considered for this experiment.

Then, you can set targeting criteria. If you only want this experiment to run on a specific app version, or for a specific segment of users, use these criteria to filter which users can be assigned to a group.

Next, set the number of experiment groups and the percentage of users to allocate to each group. In this example, we will keep the default 2 groups.

Finally, there are parameters. These are the values in code that will differ between groups. In this example, we will create a boolean parameter called "enable_banner". Create that parameter, then set it to false for the Control group and true for the Test group.

Here's what our setup looks like. Don't forget to hit "Save" in the bottom right to finalize this setup.

Experiment Setup

Once you have set up your experiment, it won't run until you hit "Start" at the top of the page. This makes production traffic eligible for the experiment. You can test the configuration in non-production environments first, but you won't get results until you start the experiment.

Step 2: Install and Initialize the Statsig SDK

In this guide, we use the Statsig JavaScript client SDK. See the Statsig client or server SDK docs for your desired programming language and technology.

You can install the Statsig SDK via npm, yarn, or jsDelivr:

npm install @statsig/js-client
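If your project uses yarn instead, the equivalent command is:

yarn add @statsig/js-client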

After you install the SDK, you must initialize the SDK using an active Client API Key from the API Keys tab under Settings in the Statsig console.

In addition to the Client API key, you must also pass in a Statsig User ID to ensure users can be assigned and tracked for different variants of the experiment.

...

When calling StatsigClient.initializeAsync, the client first loads values from cache and makes them available immediately, then fetches the latest values from the network; awaiting the call gives you the latest values from Statsig.
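As a minimal sketch of that initialization, assuming the standard StatsigClient constructor that takes a client key and a user object (the key and userID values below are placeholders, not real credentials):

import { StatsigClient } from "@statsig/js-client";

// Placeholder key and user; substitute your own Client API Key and user identifier
const myStatsigClient = new StatsigClient("client-YOUR_CLIENT_API_KEY", { userID: "a-user-id" });

// Loads cached values immediately, then awaits the latest values from the network
await myStatsigClient.initializeAsync();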

Step 3: Check the experiment in your application and log an event

Get your experiment's configuration parameters to enable the user to see the experience you want to create with each variant. In this case, we fetch the value of the enable_banner parameter that we created earlier.

const expConfig = myStatsigClient.getExperiment("onboarding_banner");

const showBanner = expConfig.get("enable_banner", false);
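As a hypothetical example of using that value, you might gate the new banner on it (renderOnboardingBanner here stands in for whatever rendering logic your app uses):

if (showBanner) {
  // Show the new onboarding banner to users in the Test group
  renderOnboardingBanner();
}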

Now when a user loads this page in your client application, you will automatically start to see a live log stream in the Statsig console when you navigate to your experiment and select the Diagnostics tab.

Experiment Diagnostics

If you want to measure how your users respond to different variants of the experiment, you need to log events to Statsig. To do this, call the logEvent API with the event (and value) you want to aggregate, as well as any metadata you may want to use as additional dimensions to break down the aggregate results. Events logged to Statsig are associated with user identifiers and are not experiment-specific.

myStatsigClient.logEvent('add_to_cart', 'groceries', {'price': '9.99', 'SKU_ID': 'SKU-123'});

As the experiment progresses, you will see how many experiment checks occur and how many events are logged on an hourly basis in the Diagnostics tab.

Step 4: Review experiment results

Within 24 hours of starting your experiment, you'll see the cumulative exposures in the Pulse Results tab.

You will also start to see the key results for each test group compared to the control. In the Scorecard panel, you can see the full picture of how all your tagged metrics have shifted in the experiment. Here's a sample of what that looks like:

Example experiment results

After your experiment has run for your predetermined duration, you can start to make a decision based on the results in the Scorecard panel.

Alternatively, you can choose to ramp up the experiment to a larger user base by increasing the allocation in the Setup tab. We recommend scaling up allocation slowly in the initial stages, and then by a larger margin in later stages, say 0% -> 2% -> 10% -> 50% -> 100%. This helps you get more data and build more confidence before you decide to ship the experiment.