
Contextual Bandit

Model

Autotune AI implements a variant of the LinUCB algorithm. It estimates a user's expected outcome for each variant and incorporates the model's uncertainty to compute an "upper bound" for each prediction. That upper bound is used to select a variant, which is the exploration mechanism; in layperson's terms, the bandit selects the variant with the "highest potential upside".

For example, suppose Autotune AI has two variants with the following click-through-rate predictions for user "Bob":

  • Variant A: 6% ± 0.5%
  • Variant B: 4% ± 3%

Bob will be served variant B, even though A has a higher point prediction, because B's upper bound is 7% vs. A's 6.5%. A variant's prediction usually has higher variance when there are few samples, or when the relationship between features and outcome is still fuzzy. As more traffic is delivered, the uncertainty shrinks. After more samples, the values might become:

  • Variant A: 5.9% ± 0.4%
  • Variant B: 4.2% ± 1.5%

In which case Autotune would serve A, since 6.3% > 5.7%.
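
As a rough illustration (a sketch of the selection rule, not Statsig's production implementation), the upper-bound selection in the example above could be written as:

```python
# Illustrative sketch only -- not Statsig's production code.
# Each variant carries a predicted outcome (e.g. click-through rate) and an
# uncertainty estimate for the current user's features.
predictions = {
    "Variant A": {"mean": 0.060, "uncertainty": 0.005},  # 6.0% +/- 0.5%
    "Variant B": {"mean": 0.040, "uncertainty": 0.030},  # 4.0% +/- 3.0%
}

def select_variant(predictions):
    # LinUCB-style rule: serve the variant with the highest upper bound
    # (prediction + uncertainty), i.e. the "highest potential upside".
    return max(predictions, key=lambda v: predictions[v]["mean"] + predictions[v]["uncertainty"])

print(select_variant(predictions))  # "Variant B": upper bound 7.0% beats A's 6.5%
```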


Feature Types

Autotune AI works with categorical and numerical features. Any key-value pairs attached to the custom object on the Statsig user are converted into categorical or numerical features based on their data type. Categorical features are generally one-hot encoded.

This means that you don't need to build complex training pipelines - whatever features you use for model evaluation will also be used for training.
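
For intuition, here is a hedged sketch of how mixed custom fields could be turned into model features (the field names and the exact encoding are illustrative assumptions, not Statsig's internal pipeline):

```python
# Hypothetical custom fields on a Statsig user; names are illustrative only.
custom = {"country": "NZ", "plan": "pro", "cart_value": 52.0, "past_orders": 3}

def encode(custom, known_categories):
    """One-hot encode string fields; pass numeric fields through unchanged."""
    features = {}
    for key, value in custom.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            features[key] = float(value)                       # numerical feature
        else:
            for cat in known_categories.get(key, []):          # categorical -> one-hot
                features[f"{key}={cat}"] = 1.0 if value == cat else 0.0
    return features

print(encode(custom, {"country": ["NZ", "US"], "plan": ["free", "pro"]}))
# {'country=NZ': 1.0, 'country=US': 0.0, 'plan=free': 0.0, 'plan=pro': 1.0,
#  'cart_value': 52.0, 'past_orders': 3.0}
```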

Outcome Types

Autotune AI has a few different model types under the hood - this means it can be used both for classification use cases (will this user click a button?) and for continuous outcomes (how much time will the user spend reading articles?).

These models follow the approach described by the LinUCB family of algorithms: normalizing the data, fitting a linear model to estimate the outcome, and applying the model's uncertainty to score an upper bound.
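
For reference, the standard LinUCB scoring rule from the literature looks like the following (Statsig's exact variant may differ):

$$
a_t = \arg\max_{a}\left( x_{t,a}^{\top}\hat{\theta}_a + \alpha \sqrt{x_{t,a}^{\top} A_a^{-1} x_{t,a}} \right)
$$

where $x_{t,a}$ is the user's feature vector, $\hat{\theta}_a$ the ridge-regression coefficients for variant $a$, $A_a$ the regularized design matrix, and $\alpha$ a tunable exploration parameter.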

For classification cases, Autotune AI checks whether the outcome event occurs at all within its attribution window. For continuous cases, Autotune AI requires an event name and a field name, and uses the numerical value of that field on the first event observed after exposure.
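
To make the two outcome types concrete, here is a simplified sketch of how an outcome could be derived from events logged after exposure (event and field names are assumptions for illustration):

```python
from datetime import datetime, timedelta

# Illustrative post-exposure events; names and fields are assumptions.
exposure_time = datetime(2024, 1, 1, 12, 0)
attribution_window = timedelta(hours=24)
events = [
    {"name": "read_article", "time": datetime(2024, 1, 1, 13, 0), "value": {"seconds_read": 180}},
    {"name": "read_article", "time": datetime(2024, 1, 1, 18, 0), "value": {"seconds_read": 45}},
]

def in_window(event):
    return exposure_time <= event["time"] <= exposure_time + attribution_window

def classification_outcome(events, event_name):
    # 1 if the target event occurred at least once inside the window, else 0.
    return int(any(e["name"] == event_name and in_window(e) for e in events))

def continuous_outcome(events, event_name, field_name):
    # Numerical value of the named field on the first qualifying event after exposure.
    qualifying = sorted((e for e in events if e["name"] == event_name and in_window(e)),
                        key=lambda e: e["time"])
    return qualifying[0]["value"][field_name] if qualifying else 0.0

print(classification_outcome(events, "read_article"))              # 1
print(continuous_outcome(events, "read_article", "seconds_read"))  # 180
```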

Advantages

The major advantage of a contextual bandit is its ability to optimize based on user attributes. This allows you to deeply optimize product and marketing outcomes beyond picking a single overall "best" variant. Contextual bandits can also function with very little training data and are re-trained hourly, meaning you have a functional personalization tool running within hours of launching your bandit.

For example:

  • Increasing Outcomes: If you're offering discount promotions at checkout to increase completion rates, users might respond differently to a "10% off" coupon vs. a "Free shipping" coupon depending on the total value of their cart, their spending history, and their country code.
  • Avoiding Harm: If you want to show a referral-code upsell, but don't want to burn users who won't share, Autotune AI can help you show the unit to users with a high likelihood of copying the referral link and avoid showing it to users who will just dismiss it.

Disadvantages

The major disadvantages of a contextual bandit, compared to a multi-armed bandit, are the lack of convergence and the potential for over-fitting to the training data. We use regression techniques (ridge regression, normalized logistic regression, and others) that deliberately omit predictors with low signal and are tuned to avoid overfitting.
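
As a toy illustration of why regularization helps here (a sketch, not Statsig's actual training code), a ridge penalty shrinks coefficients on noisy, low-signal features toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)   # genuinely predictive feature
noise = rng.normal(size=n)    # low-signal feature
X = np.column_stack([signal, noise])
y = 0.5 * signal + rng.normal(scale=0.1, size=n)

def ridge_fit(X, y, lam):
    # Closed-form ridge regression: theta = (X^T X + lam * I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(ridge_fit(X, y, lam=0.0))    # ordinary least squares
print(ridge_fit(X, y, lam=50.0))   # ridge: coefficients shrink, especially the low-signal one
```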

Additionally, contextual bandit models may not capture complex interactions between features that a more complex model (a well-tuned GBDT or neural network) may be able to exploit. Contextual bandits are a powerful personalization tool, but won't outperform a dedicated ML team.

Methodology and Notes

Samples Required

Contextual Bandits can start personalizing with extremely minimal traffic (10s of samples). However, your initial traffic will be mostly used for exploration.

Attribution

Attribution is done by joining the target event to downstream events within the attribution window. The first event observed, by logging timestamp, is the one whose metadata is used in metadata-based contextual bandits. For binary bandits (did an event happen?), the flag is 1 or 0 depending on whether any events occur during the attribution window.

If you're running multiple bandits, there's currently no cross-bandit attribution logic; if several bandits share the same outcome event within their attribution windows, each bandit will count that event as part of its own outcome.

Exploration

During the explore period, all units are assigned a random variant. Afterwards, a small portion of traffic is still served a managed "Explore" (random) variant in order to keep the model from becoming stale. This portion decreases to a terminal 1% based on elapsed time and the number of samples observed. Explore traffic is distributed inversely to the current serving distribution, so "rare" variants are up-sampled; this generally means underperforming variants are over-represented in explore traffic to give them a chance.
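
One possible reading of "distributed inversely to the current distribution" is weighting each variant by the inverse of its current serving share (a simplified sketch; the actual weighting may differ):

```python
# Current share of traffic each variant receives from the model (illustrative numbers).
current_share = {"A": 0.70, "B": 0.25, "C": 0.05}

def explore_distribution(current_share):
    # Weight each variant by the inverse of its current share, then normalize,
    # so rarely-served variants get a larger slice of explore traffic.
    inverse = {v: 1.0 / share for v, share in current_share.items()}
    total = sum(inverse.values())
    return {v: w / total for v, w in inverse.items()}

print(explore_distribution(current_share))
# {'A': ~0.06, 'B': ~0.16, 'C': ~0.79} -- the "rare" variant C is heavily up-sampled
```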

Explore exposures can be seen in the log stream, with :explore appended.