The Three Tenets of Experimentation System Design
By Igor Urisman, June 27, 2024.
In their influential 2017 paper, Aleksander Fabijan et al. postulate the following three paramount requirements for any experimentation service:
First, end users must be equally likely to see each variant of an experiment (assuming a 50-50 split). Second, repeat assignments of a single end user must be consistent, i.e. the user should be assigned to the same variant on each successive visit to the product. Third, when multiple experiments are run, there must be no correlation between experiments: an end user’s assignment to a variant in one experiment must have no effect on the probability of being assigned to a variant in any other experiment.
Let’s consider these in detail and see how Variant stacks up, and even challenges some of the assumptions inherent in these principles. We start with the second tenet, for a more gradual introduction for readers unfamiliar with our technology.
1. Targeting Durability
The most immediate problem with the statement “the user should be assigned to the same variant on each successive visit to the product” is that in a large proportion of use cases, when the end user is anonymous, it is simply not possible to guarantee that the same end user will see the same variant on the next visit. Naive solutions, like using a cookie or the mobile device ID, are too unreliable to be considered. Blanket targeting durability is simply infeasible.
The less immediate problem with the statement is that even if it were doable, durable targeting is frequently unnecessary and even unwanted. For example, consider testing variants that all present the same user interface, like optimizing a product suggestions algorithm by varying config parameter values. There’s no harm whatsoever if the user sees slightly different suggestions on a return visit. In fact, users likely expect the suggestions engine to be evolving as new options and user feedback are received regardless of any experiments. Furthermore, allowing the targeting decision to expire with the user session means that each user visit presents an independent trial, greatly increasing the experiment’s power.
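The power gain from treating each visit as an independent trial can be quantified with the standard two-proportion sample-size approximation. The sketch below is generic statistics, not anything Variant-specific, and the baseline and lift figures are made up for illustration:

```python
from statistics import NormalDist

def trials_per_arm(p_control: float, p_variant: float,
                   alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of independent trials needed per arm to
    detect the lift from p_control to p_variant (two-sided z-test,
    pooled-variance normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p_control + p_variant) / 2
    delta = abs(p_variant - p_control)
    n = 2 * (z_a + z_b) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return int(n) + 1

# Detecting a 1-point lift on a 10% baseline conversion rate:
n = trials_per_arm(0.10, 0.11)
```

If targeting expires with the session and each user averages several visits, those trials accumulate from proportionally fewer unique users, shortening the experiment.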
Variant allows experiment designers to choose among three time-to-live settings: state, session, and experiment, configured for each experiment independently:
experiments:
  - name: myExperiment
    timeToLive:
      targeting: experiment
Here, experiment means that once targeted, this end user will always see the same experience in a given experiment, provided the current user session belongs to a recognized user and is qualified for this experiment.
It is important to keep in mind that Variant’s domain model clearly separates the notions of targeting and qualification: only users who pass the qualification criteria are eligible for targeting. If a user is not qualified for an experiment, he is assigned to the control experience but does not contribute to the experiment readout. Conversely, if a user qualifies for an experiment, she contributes to the readout regardless of which experience her session has been targeted for.
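The interplay between qualification and targeting can be sketched in a few lines of Python. This is a hypothetical model for illustration only, not Variant’s actual API; the Experiment class and the qualifies predicate are made up:

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Experiment:                       # hypothetical model, not Variant's API
    control: str
    variants: List[str]
    qualifies: Callable[[dict], bool]   # the qualification criteria

def serve(session: dict, exp: Experiment) -> tuple:
    """Return (experience, contributes_to_readout)."""
    if not exp.qualifies(session):
        # Unqualified: control experience, excluded from the readout.
        return exp.control, False
    # Qualified: the session counts toward the readout, whichever
    # experience it is targeted for -- including control.
    return random.choice([exp.control] + exp.variants), True

# Free-shipping example: only sessions without a recent purchase qualify.
promo = Experiment("noPromo", ["promo"], lambda s: not s.get("recentPurchase"))
serve({"recentPurchase": True}, promo)   # -> ('noPromo', False)
```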
Targeting and qualification need not have the same time-to-live. This independence affords unmatched flexibility, as illustrated in our demo application. There we demonstrate a simple experiment where free shipping is offered to unengaged customers who haven’t purchased recently. Clearly, whenever the offer is taken, we want it off the table immediately to prevent a double take in the same session. The following schema fragment achieves exactly that:
timeToLive:
  targeting: experiment
  qualification: request
The request time-to-live scope is the least durable: the user session is re-qualified on each state transition. (A state is an abstraction of a web page, mobile view, call center menu, etc.) Hence, as the user completes the purchase and navigates to the order confirmation page, he is re-qualified, with the qualification hook now failing the criteria.
2. Uniformity of Targeting Distribution
The first tenet postulates that users should be targeted to an experiment’s experiences randomly and uniformly. The 50-50 split is given as an example, presumably of a single-variant experiment where half of the participants go to the control experience and half to the variant.
Such a uniform distribution produces a sound experiment readout, but it is unnecessarily and consequentially restrictive. More recent research in experiment design, such as that by Simchi-Levi and Wang, has advanced the idea of using reinforcement learning algorithms like the multi-armed bandit to minimize the cost associated with losing experiences. Such an approach requires not only a non-uniform, but a variable targeting distribution. So long as the targeting is random at any given point, a non-uniform distribution is perfectly sound.
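To make the weighted-random idea concrete, here is a minimal Python sketch. It is illustrative only, not Variant’s implementation; the weights are the same as in the schema fragment that follows:

```python
import random

def weighted_target(weights: dict) -> str:
    """Pick an experience at random, with probability proportional
    to its weight. Illustrative sketch, not Variant's implementation."""
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Weights foo=1, bar=1.5, baz=0.75, as in the schema fragment:
counts = {"foo": 0, "bar": 0, "baz": 0}
for _ in range(10_000):
    counts[weighted_target({"foo": 1, "bar": 1.5, "baz": 0.75})] += 1
# In expectation: foo ~31%, bar ~46%, baz ~23% of targetings.
```

Each individual targeting remains random, so the readout stays interpretable; only the long-run proportions change.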
At the time of this writing, Variant offers a weighted random targeting hook as part of its standard extension library. It can be easily configured in the experimentation schema:
experiences:
  - name: foo
    properties:
      weight: 1
  - name: bar
    properties:
      weight: 1.5
  - name: baz
    properties:
      weight: 0.75
hooks:
  - class: com.variant.spi.stdlib.lifecycle.WeightedRandomTargetingHook
By default, the hook looks for the weight experience property, unless overridden in the init key, like so:
experiences:
  - name: foo
    properties:
      pondus: 1
  - name: bar
    properties:
      pondus: 1.5
  - name: baz
    properties:
      pondus: 0.75
hooks:
  - class: com.variant.spi.stdlib.lifecycle.WeightedRandomTargetingHook
    init:
      key: pondus
To change the weight distribution, redeploy the updated schema file to the server; it will take effect immediately for all new sessions.
3. Targeting Isolation
The requirement that “an end user’s assignment to a variant in one experiment must have no effect on the probability of being assigned to a variant in any other experiment” seems unassailable: any deviation from random targeting would call into question our ability to interpret correlation as causation. How to guarantee this independence is less intuitive, and the paper leaves it to experiment designers and application programmers to figure out. None of the incumbent experimentation products offer any help; rather, in typical experimentation practice, experiment designers must evaluate each proposed experiment for possible interference with existing experiments.
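A common industry technique for achieving statistically independent assignments, not specific to Variant, is deterministic hashing salted per experiment: hashing the user ID together with the experiment name makes a user’s bucket in one experiment uncorrelated with their bucket in any other, while keeping repeat assignments of a recognized user consistent. A minimal sketch, with made-up experiment names:

```python
import hashlib

def assign(user_id: str, experiment: str, experiences: list) -> str:
    """Deterministic bucketing: the experiment name salts the hash, so
    the same user hashes independently in different experiments, while
    repeat visits by a recognized user stay consistent."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return experiences[int.from_bytes(digest[:8], "big") % len(experiences)]

# Same user, two hypothetical experiments: each assignment is
# deterministic per experiment, but uncorrelated across experiments.
a = assign("user-42", "smallPromo", ["control", "promo"])
b = assign("user-42", "bigPromotion", ["control", "promo"])
```

Note, however, that such hashing only guarantees statistical independence of assignments; it says nothing about business-level interference between experiments, which is the subject of the rest of this section.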
By contrast, the Variant server knows enough about each experiment’s topology to provide helpful semantics as a matter of configuration. In particular, Variant knows whether two experiments are serial (no states in common) or concurrent (share one or more states). Serial experiments are by default targeted independently, as if the other did not exist. Although intuitive and likely acceptable in the vast majority of use cases, this default may be overly optimistic in certain cases. For example, two incompatible promotions may be instrumented on different pages, but it is conceivable that a user session could see both of them. In such instances, the default can be overridden with a custom qualification hook, or simply with the disqualWhen schema spec:
experiments:
  - name: small
    experiences:
      - name: noPromo
        isControl: true
      - name: promo
    onStates:
      - name: homePage
  - name: bigPromotion
    disqualWhen:
      liveExperiences: [small.promo]
    onStates:
      - name: checkoutPage
Here, the user session will be disqualified from the bigPromotion experiment if it has already been targeted to small.promo.
Unlike serial experiments, concurrent experiments are by default mutually exclusive. This default is implemented by the default qualification hook, which is posted if no other qualification hooks were instrumented, or none returned a result. The default qualification hook disqualifies a user session from an experiment if the session has already been targeted for a concurrent experiment. This default can be overridden with a custom qualification hook or, in the schema, with the concurrentWith schema spec:
experiments:
  - name: exp1
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - name: homePage
  - name: exp2
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - name: homePage
    concurrentWith: [exp1]