Release 1.3.1, May 2024
1 Key Concepts
1.1 Code Variations
Software application development is accelerating. Many leading teams release new code continuously, deploying each independent code delta as soon as it’s ready, sometimes multiple times per second. In such a high-velocity operational environment it’s critical to diminish the risk of defects. One of the mainstays of defect reduction in software development is the use of code variations, the term we use to denote a bifurcation point in an application code path where alternate code paths temporarily co-exist and the path taken is determined at run time by an external component. Two classes of use cases call for the instrumentation of such code variations; both are described below.
1.1.1 Online Controlled Experiments
In an online controlled experiment, a modification to the existing user experience co-exists, for a time, with the original experience. User traffic is split randomly between the two experiences, and measurements are collected of some target metric, e.g. rate of conversion to the next page. In scientific terms, the existing experience serves as control and the new experience as treatment. The experiment is said to succeed if it reaches statistical significance, a mathematical term connoting that a) the number of measurements taken is large enough, and b) the observed difference between the metric’s values in control and in treatment is large enough, to conclude that this difference is far more likely due to the difference between the two experiences than to mere chance.
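To make the two conditions concrete, here is a minimal sketch of one common significance check, the pooled two-proportion z-test, applied to conversion counts. It is not part of Variant; the class name, method names and the conventional 1.96 threshold (two-sided 5% level) are choices made for this illustration only.

```java
class SignificanceSketch {

    /** z statistic for the difference between two conversion rates (pooled two-proportion z-test). */
    static double zScore(long controlConversions, long controlSessions,
                         long treatmentConversions, long treatmentSessions) {
        double controlRate = (double) controlConversions / controlSessions;
        double treatmentRate = (double) treatmentConversions / treatmentSessions;
        double pooledRate = (double) (controlConversions + treatmentConversions)
                / (controlSessions + treatmentSessions);
        double standardError = Math.sqrt(
                pooledRate * (1 - pooledRate) * (1.0 / controlSessions + 1.0 / treatmentSessions));
        return (treatmentRate - controlRate) / standardError;
    }

    /** |z| above 1.96 corresponds to the conventional two-sided 5% significance level. */
    static boolean isSignificant(double z) {
        return Math.abs(z) > 1.96;
    }

    public static void main(String[] args) {
        // Same observed rates (10% vs. 13%), different numbers of measurements.
        System.out.println(isSignificant(zScore(10, 100, 13, 100)));
        System.out.println(isSignificant(zScore(100, 1000, 130, 1000)));
    }
}
```

The same observed difference in conversion rate can fall short of significance with a hundred sessions per experience and reach it with a thousand, which is why both the sample size and the size of the difference matter.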
For example, you may want to run an experiment to find out the optimal minimum order amount you can ask in return for free shipping. In such an experiment you offer several experiences, each requiring a different minimum order amount, and target your user traffic to these experiences randomly. As your customers pass through these experiences you can compare the offer take rate and your revenue lift between the treatments.
Note that in online controlled experiments, session targeting must be random in order to interpret correlation as causation, because randomness serves as a natural control for everything other than the difference in user experience itself.
1.1.2 Managed Feature Flags
The other use case for code variations is feature flags, sometimes also called feature roll-outs. They refer to a software delivery practice, where a new product feature is rolled out gradually to a carefully controlled group of customers before it is made generally available. Whenever you roll out a new product feature, a feature flag enables you to first publish it to a limited population of users, while sending all others into the stable existing experience. If all goes well, you gradually increase traffic into the new code path until you reach full production, at which point the existing code path can be discarded. But if a defect is discovered, the new feature can be temporarily toggled off until the problem is fixed.
In contrast with online controlled experiments, when instrumenting feature flags you will likely use some deterministic targeting rules for your user traffic. For example, you may want to start by allowing into the new code only internal users, then users selected by organization ID, etc.
1.2 Interactive Application as a Graph
The only assumption Variant makes about the host application is that it is interactive, i.e. pauses for and responds to user input. Its control flow is commonly represented with traversal graphs, like in Figure 1 below. Here the nodes represent the interface states where the system awaits user input and the arcs represent the application responding to user input. Interpreted as a state machine, each node is also a state of the application.
Figure 1. An interactive application modeled as a state graph. (Source: Offutt et al, 2004)
Irrespective of the user interface mechanism, the host application pauses in an application state awaiting user response. These application states render some user interface and provide the means for the user to respond. Depending on the type of the host application, this interface may be manifested as a computer desktop window (desktop application), an HTML page (Web application), an activity (Android mobile app), a phone menu (an IVR application), an XML document (RESTful API), etc. These details are not relevant to the Code Variation Model (CVM). A traversal of a set of interface states, as the user transitions from one Web page or one telephone menu to the next, is what constitutes a user experience, which can be more strictly defined as some connected segment of the application state graph.
Typically, each application state exists in a single variant. (There’s only one checkout page.) To model alternate code paths, CVM distinguishes between a base state and zero or more of its state variants. At runtime, the host application chooses which state variant to traverse. These alternate code paths are what we call code variations. The control user experience is the one that traverses the base states, while a variant user experience is one that traverses variant states.
Whenever the host application receives user input, it navigates to the next interface state. If the target interface state exists in more than one variant, the host application must pick a variant to present to the user, as shown in Figure 2 below:
Figure 2. A state transition without code variation (A); and with code variation (B).
In the regular, uninstrumented case (A), the application simply figures out the next state based on the user’s input, carries out requisite computations, renders the state’s interface to the user, and pauses for user input. However, if the next state is instrumented by one or more code variations (B), the host application has a set of additional state variants it can choose from. It is exactly this task of figuring out the particular state variant that the host application delegates to Variant server, just like it delegates to a database server the task of storing data on disk.
The part of the state transition where the host application defers to Variant for targeting is called a state request. A Variant session is, in a nutshell, a succession of state requests plus the common session state preserved between the requests.
1.3 Code Variation Model (CVM)
Code Variation Model is a domain model for code variations. It offers a formal framework for defining code variations and for reasoning about them. Its key practical benefit is to provide a way to externalize the metadata for a set of related code variations into human readable schema files managed centrally by Variant server. These schemas enable developers to define variations declaratively, rather than programmatically, leaving the implementation details to be handled by the Variant server.
This removes oodles of instrumentation code from the host application. The application developer uses familiar tools to implement new application behaviors, unconcerned with how these new code paths will be instrumented as experiments or feature flags. This instrumentation is accomplished with only a few lines of glue code facilitating the communication between the host application and the Variant server, and the experiment schema containing the complete configuration of all code variations and managed entirely by the Variant server. The Variant server handles the rest, hiding enormous amounts of complexity from the application developer.
This clean separation between implementation and instrumentation dramatically reduces the amount of code the application developer must write in order to instrument code variations. In fact, it takes the same number of lines of code (about a dozen) to instrument your 100th concurrent experiment as it takes you to instrument your first.
In other words, the complexity of instrumenting N concurrent code variations = O(N).
1.4 Simple Variation Schema
A minimal valid variation schema consists of a single state, instrumented by a single variation with a single experience, as in the following listing, where we model a feature flag protecting the new code path which adds reCAPTCHA to the existing password reset page.
# A very simple variation schema take 1.
name: minimal_schema
states:
  - name: passwordResetPage
variations:
  - name: recaptcha
    experiences:
      - name: recaptcha
    onStates:
      - state: passwordResetPage
Listing 1. A minimal valid variation schema that compiles but does nothing useful.
To deploy this schema, simply copy the file into the server’s schemata directory. Although this schema file parses without error, it does not do much yet because all traffic will be qualified by the default qualifier and targeted for the only experience, recaptcha. To make this feature flag more useful, we need to add a custom qualification hook, which will qualify into this variation not all, but only certain users. For example, the following hook qualifies only those users whose IDs were passed to its constructor as a list:
package mycompany.variant.spi;

import com.variant.server.spi.QualificationLifecycleEvent;
import com.variant.server.spi.QualificationLifecycleHook;
import com.variant.share.yaml.YamlList;
import com.variant.share.yaml.YamlNode;
import com.variant.share.yaml.YamlScalar;
import java.util.Arrays;
import java.util.Optional;

/**
 * Custom qualification hook qualifies user IDs provided at initialization.
 */
public class RecaptchaQualificationHook implements QualificationLifecycleHook {

    private final String[] qualifiedUserIds;

    public RecaptchaQualificationHook(YamlNode<?> init) {
        // The init value is expected to be a YAML list of scalar user IDs.
        qualifiedUserIds =
            ((YamlList)init).value().stream()
                .map(node -> ((YamlScalar<String>)node).value())
                .toArray(String[]::new);
    }

    @Override
    public Optional<Boolean> post(QualificationLifecycleEvent event) {
        // Qualify only sessions whose owner ID is one of the configured user IDs.
        Boolean isQualified = event.getSession().getOwnerId()
            .map(userId -> Arrays.stream(qualifiedUserIds).anyMatch(userId::equals))
            .orElse(false);
        return Optional.of(isQualified);
    }
}
Listing 2. Custom qualification hook qualifies into a code variation only those users whose IDs are passed to its constructor.
To add this hook to the variation, use the hooks key:
# A very simple variation schema take 2.
name: minimal_schema
states:
  - name: passwordResetPage
variations:
  - name: recaptcha
    experiences:
      - name: recaptcha
    onStates:
      - state: passwordResetPage
    hooks:
      - class: mycompany.variant.spi.RecaptchaQualificationHook
        init: [USERID1, USERID2, USERID3]
Listing 3. The minimal valid variation schema that does something useful.
Now, whenever the Variant server needs to qualify a session for the recaptcha feature flag, it will delegate to the RecaptchaQualificationHook hook, which will only qualify into the feature those user sessions whose user IDs match those provided in the init list.
1.5. Session Qualification and Targeting
1.5.1. Qualification vs. Targeting
Variant’s Code Variation Model clearly distinguishes between qualification and targeting. Let’s go back to the example introduced in Section 1.1.1:
For example, you may want to run an experiment to find out the optimal minimum order amount you can ask in return for free shipping. In such an experiment you offer several experiences, each requiring a different minimum order amount, and target your user traffic to these experiences randomly. As your customers pass through these experiences you can compare the offer take rate and your revenue lift between the treatments.
Suppose now that you do not wish to combine the free shipping offer with some other promotion. Clearly, this constraint is a matter of qualification, not targeting. Users already in another promotion should be disqualified from the free shipping experiment, not merely targeted to the control experience. Variant server still assigns a disqualified session to the control experience, but it triggers no state visited events as the session passes through the experiment for which it is not qualified.
Before any actual targeting can take place, a user session must be first qualified. Only qualified sessions will be targeted to an experience through some randomized mechanism. A qualified session may also end up being targeted to the control experience, but this time Variant server will trigger state visited events for it, in contrast to the disqualified case.
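The order of operations described above can be sketched in plain Java. This is an illustration of the concept only, not Variant's implementation or API: the class, the Decision type and the convention that the first list element is the control experience are all assumptions made for the example, as is uniform random targeting.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

class QualifyThenTargetSketch {

    /** Outcome of one state request in this sketch. */
    static final class Decision {
        final String experience;
        final boolean triggersStateVisitedEvents;

        Decision(String experience, boolean triggersStateVisitedEvents) {
            this.experience = experience;
            this.triggersStateVisitedEvents = triggersStateVisitedEvents;
        }
    }

    /**
     * Qualifies first, then targets. By assumption of this sketch,
     * experiences.get(0) is the control experience.
     */
    static Decision decide(String userId, Predicate<String> qualifier,
                           List<String> experiences, Random random) {
        if (!qualifier.test(userId)) {
            // Disqualified: the session sees control but leaves no trace in the experiment.
            return new Decision(experiences.get(0), false);
        }
        // Qualified: random targeting; control is one of the possible outcomes.
        String targeted = experiences.get(random.nextInt(experiences.size()));
        return new Decision(targeted, true);
    }

    public static void main(String[] args) {
        List<String> experiences = List.of("control", "minOrder25", "minOrder50");
        // Hypothetical rule: users already in another promotion are disqualified.
        Predicate<String> notInOtherPromotion = userId -> !userId.startsWith("promo-");
        Decision decision = decide("promo-42", notInOtherPromotion, experiences, new Random());
        System.out.println(decision.experience + " " + decision.triggersStateVisitedEvents);
    }
}
```

Both a disqualified session and a qualified session targeted to control see the same experience; the difference lies only in whether the session counts toward the experiment's measurements.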
1.5.2. Durability
When a user session is first qualified or targeted for a variation, the schema designer chooses the effective lifespan of these decisions. For example, it is typically desirable that a user continues to see the same experiment experience at least for the remainder of the current user session. Moreover, there are many cases where qualification or targeting information must be preserved between sessions, for example to ensure that returning users continue through a multi-page wizard without the risk of inconsistencies if they switch to a different device.
Variant lets the experiment designer choose between three durability scopes: state, session and durable. A variation’s targeting durability is specified in the schema independently of its qualification durability, so they do not have to be the same. For example, a user’s eligibility for an experiment, e.g. related to a promotion, may vary from visit to visit. But whenever she is qualified, the experiment designer typically wants her to see the same experiment experience.
- State scoped durability means that the outcome of qualification or targeting is not reused. A variation with state-scoped qualification durability will be re-qualified for each state request, and a variation with state-scoped targeting durability, if qualified, will be re-targeted for each state request.
- Session scoped durability means that the outcome of qualification or targeting is reused for the duration of this user session. A variation with session-scoped qualification durability will be qualified once per session and reused for the remainder of this session, but will be re-qualified in a different session. A variation with session-scoped targeting durability will be targeted once per session and reused for the remainder of this session, but will be re-targeted in a different session.
- Durable scoped durability means that the outcome of qualification or targeting is reused for the entire lifespan of the variation, or, effectively, forever. A variation with durable qualification will be qualified once, and the qualification decision will be reused for as long as this variation is defined in the schema. A variation with durable targeting will be targeted once, and the targeting decision will be reused for as long as the variation is defined in the schema.
State and session durability levels do not require any support from the host application. However, durable targeting or qualification can only be accomplished for recognized sessions, i.e. those for which the host application supplies a durable owner ID when it creates a new Variant session. Variant uses this owner ID to store qualification and targeting information in an embedded database. Therefore, durable decisions survive not only session expirations, but also Variant server restarts.
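One way to picture the three scopes is as three different cache keys for the same decision. The following sketch is illustrative only and does not reflect how Variant server is implemented: the class, the in-memory maps (the second one standing in for the embedded database) and the decide method are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

class DurabilitySketch {

    enum Scope { STATE, SESSION, DURABLE }

    // Hypothetical in-memory stores; the durable one stands in for a persistent database.
    private final Map<String, String> sessionStore = new HashMap<>();
    private final Map<String, String> durableStore = new HashMap<>();

    /** Returns a cached decision when the scope allows reuse, otherwise computes a fresh one. */
    String decide(Scope scope, String sessionId, String ownerId,
                  String variation, Supplier<String> decider) {
        switch (scope) {
            case SESSION:
                // Reused within one session, recomputed in another.
                return sessionStore.computeIfAbsent(sessionId + "/" + variation, key -> decider.get());
            case DURABLE:
                // Keyed by owner ID, so the decision outlives any single session.
                Objects.requireNonNull(ownerId, "durable scope needs an owner ID");
                return durableStore.computeIfAbsent(ownerId + "/" + variation, key -> decider.get());
            default:
                // STATE: never reused, recomputed for each state request.
                return decider.get();
        }
    }

    public static void main(String[] args) {
        DurabilitySketch sketch = new DurabilitySketch();
        int[] calls = {0};
        Supplier<String> decider = () -> "decision#" + (++calls[0]);
        System.out.println(sketch.decide(Scope.DURABLE, "session1", "user1", "recaptcha", decider));
        System.out.println(sketch.decide(Scope.DURABLE, "session2", "user1", "recaptcha", decider));
    }
}
```

The sketch also shows why the durable scope depends on the owner ID: without a key that is stable across sessions, there is nothing to look the earlier decision up by.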
1.6. Concurrent Variations
1.6.1. Definitions
If two variations instrument no states in common, they are called serial variations; a user session can only traverse them one at a time. Conversely, whenever two code variations instrument some states in common, they are called concurrent variations because a user session may be traversing them at the same time. (The term “overlapping experiments” is also commonly used.)
Concurrent variations are more likely than it may first seem, because of the Pareto principle; your users spend 80% of their time on 20% of your pages. These higher-contention code paths are very likely to be instrumented by multiple concurrent experiments and feature flags. Variant’s Code Variation Model gives you a cogent abstraction to manage this concurrency.
In Figure 3 below, the Blue and the Green variations are serial, but the Red variation is concurrent with both of them.
Figure 3. Concurrent experiments. Blue and Green variations are serial, while Red is concurrent with both Blue and Green. The grey boxes denote control states, while the colored ones denote state variants.
When a user session targets a state that is instrumented by two or more variations, Variant server creates a state variant space of possible experience permutations from which any combination of state variants can be chosen. For example, the state S2 is instrumented by Blue and Red variations. Blue has one variant experience and Red has two variant experiences, so, counting each variation’s control experience, the complete variant space of the state S2 has 2 × 3 = 6 cells:
Figure 4. Variant space of the state S2 has one control, three proper, and two hybrid state variants.
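The six cells can be enumerated as a Cartesian product of the two experience lists. The sketch below is a plain Java illustration, not Variant code; it assumes the experience names used for Blue and Red in this chapter and classifies a cell as control, proper or hybrid by counting how many of its coordinates are non-control.

```java
import java.util.ArrayList;
import java.util.List;

class VariantSpaceSketch {

    /** Cartesian product of the experience lists; each cell holds one experience per variation. */
    static List<List<String>> cells(List<List<String>> experiencesPerVariation) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (List<String> experiences : experiencesPerVariation) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : result) {
                for (String experience : experiences) {
                    List<String> cell = new ArrayList<>(partial);
                    cell.add(experience);
                    next.add(cell);
                }
            }
            result = next;
        }
        return result;
    }

    /** Classifies a cell by the number of non-control coordinates: 0, exactly 1, or more. */
    static String classify(List<String> cell, String controlName) {
        long variantCoordinates = cell.stream().filter(e -> !e.equals(controlName)).count();
        if (variantCoordinates == 0) {
            return "control";
        }
        return variantCoordinates == 1 ? "proper" : "hybrid";
    }

    public static void main(String[] args) {
        List<List<String>> space = cells(List.of(
                List.of("grey", "blue"),             // Blue: control plus one variant
                List.of("grey", "red_1", "red_2"))); // Red: control plus two variants
        for (List<String> cell : space) {
            System.out.println(cell + " -> " + classify(cell, "grey"));
        }
    }
}
```

Adding a third concurrent variation on the same state multiplies the number of cells again, which is why hybrid state variants become expensive to implement as concurrency grows.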
The relationship of concurrence between two variations V1 and V2 has the following properties:
- Symmetric: If variation V1 is concurrent with variation V2, then V2 is concurrent with V1. The variation schema grammar takes advantage of this by requiring the concurrency relationship to be defined by the variation that appears in the schema after the referenced variation.
- Not Reflexive: A variation is not concurrent with itself.
- Not Transitive: If V1 is concurrent with V2 and V2 is concurrent with V3, then V1 and V3 need not be concurrent.
Variant server supports two runtime strategies for managing concurrent variations: a simplified, pseudo-serial concurrency, as well as the unconstrained concurrency, as discussed in the next two sections.
1.6.2. Pseudo-Serial Concurrency
First, let’s consider the pseudo-serial strategy, where no user session is targeted for a hybrid state variant. To support Blue variation by itself, the application developer needs to implement the S2blue experience. Similarly, to support Red variation in isolation, (probably some other) developer needs to implement its two variant experiences S21red and S22red. This is a perfectly acceptable scenario. So long as Variant targets no user session to variant experiences in both variations, everything works and the two developers need not know about one another’s work.
This type of constrained pseudo-concurrency is the default behavior. Unless instructed otherwise, Variant will not target a user session to a hybrid experience. This default makes sense: application developers should not have to coordinate with each other simply because they work on potentially overlapping features.
However, this convenient default comes with a price:
- Potential starvation of downstream variations of user traffic, because only those sessions targeted to a control experience in an upstream concurrent test will be eligible to be targeted in the downstream test.
- Potential bias in downstream variations due to induced control targeting.
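A rough feel for the starvation effect can be had with a little arithmetic. The sketch below assumes, purely for illustration, that every upstream variation targets its experiences with equal probability; actual traffic splits depend on how targeting is configured. The class and method names are hypothetical.

```java
class StarvationSketch {

    /**
     * Fraction of sessions still eligible for downstream targeting, given the number of
     * experiences (control included) in each upstream concurrent variation.
     * Assumes equal-probability targeting in every upstream variation.
     */
    static double eligibleFraction(int... upstreamExperienceCounts) {
        double fraction = 1.0;
        for (int count : upstreamExperienceCounts) {
            // Only the share targeted to control passes through.
            fraction *= 1.0 / count;
        }
        return fraction;
    }

    public static void main(String[] args) {
        // One upstream variation with a control and two variant experiences.
        System.out.println(eligibleFraction(3));
        // Two upstream variations, with two and three experiences respectively.
        System.out.println(eligibleFraction(2, 3));
    }
}
```

Under this assumption a single upstream variation with three experiences leaves about one third of the traffic for the downstream variation, and each additional upstream variation shrinks that share further.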
1.6.3. Unconstrained Concurrency
In the unconstrained concurrency approach, a session’s ability to participate in Red variation is not constrained by its participation in Blue variation and vice versa. To instrument two properly concurrent variations, the application developer must declare them as such to Variant server by using the concurrentVariations schema key.
Listing 4 below is the complete variation schema for the Blue, Red and Green variations from Figure 3 above. To illustrate both concurrency modes, Red and Blue variations are defined as conjointly concurrent, but Green and Red variations are not.
name: 'Tricolor Schema'
description: 'Demonstrates instrumentation of concurrent variations in Figure 3'
states:
  - name: S1
  - name: S2
  - name: S3
  - name: S4
variations:
  - name: Blue
    experiences:
      - name: grey
        isControl: true
      - name: blue
    onStates:
      - state: S1
      - state: S2
  - name: Red
    # Red is conjointly concurrent with Blue
    concurrentVariations: [Blue]
    experiences:
      - name: grey
        isControl: true
      - name: red_1
      - name: red_2
    onStates:
      - state: S2
      - state: S3
  - name: Green
    # Serial with Blue and concurrent with Red
    experiences:
      - name: grey
        isControl: true
      - name: green
    onStates:
      - state: S3
      - state: S4
        # S4 does not exist in control.
        experiences: [green]
Listing 4. The Tricolor variation schema of concurrent tests from Figure 3.
Note the explicit experiences key for the Green variation on state S4. It is needed in order to alert Variant that the control experience is not defined on S4.
2. Variant Platform Architecture
2.1. Overview
Variant server is deployed on the network local to the host application and its operational database, either on premises or on the customer’s own compute instance in the cloud. (A fully managed Variant Platform-as-a-Service is under development.) Being on the same local network facilitates reliable real-time integration with the operational data for the purposes of qualification and targeting.
The following diagram presents a high-level overview of the different components of Variant software platform:
Figure 5. Variant Platform Architecture.
2.2 Client-Facing Network API
Each component of the host application that wishes to participate in an experiment or a feature flag communicates with Variant server via a native client SDK. Typically, only one instance of the Variant API handle is necessary per process, and only one connection is necessary per variation schema. If you need to connect to multiple schemas, each schema requires a separate connection handle.
At the time of this writing only the Variant Java Client library is available.
2.3. Server-side Extension SPI
Variant server’s functionality can be extended through the use of the server-side extension service programming interface (SPI), which enables user-defined code to be directly executed by the server process. The server-side SPI exposes Java bindings which facilitate injection of custom semantics into the server’s default execution path via an event subscription mechanism. Two types of user-defined event handlers are supported:
- Lifecycle Hooks are handlers for various lifecycle events, such as when a session is about to be qualified for a variation. A user-defined handler can implement custom qualification logic, for example checking whether the user is already registered when a feature flag is open only to unregistered users.
- Trace Event Flushers handle the egest of trace events into external storage, like a database or an event queue. Each variation schema can have its own event flusher.
Refer to the Server-Side Extension SPI User Guide for more information.
2.4 The Lifecycle of Code Variations
The other responsibility of the Variant server is the management of code variations’ lifecycle, such as creation, alteration, suspension, resumption, etc. of experiments and feature flags. These actions are triggered by changes to variation schema files residing in the server’s /schemata directory. Each schema file is a YAML file containing definitions of related code variations, or code variation metadata. A schema is first deployed to the Variant server when its YAML file is placed into the /schemata directory. A schema is undeployed from the Variant server when its YAML file is removed from the /schemata directory. Whenever a schema file is modified in place in the /schemata directory, the Variant server detects the changes and attempts to redeploy the schema.
A single server instance can manage an unlimited number of variation schemas.
2.5 Distributed Session Management
Variant maintains its own user sessions instead of relying on the native user sessions maintained by the host application, e.g. HTTP sessions. Variant user sessions are distributed; changes made to a user session by a Variant client are available to all concurrent Variant clients. This architecture is particularly attractive to modern distributed applications made up of multiple service components. Any such component, connected to a Variant server, can get hold of a user session by its session ID and obtain its most up-to-date shared state.
Variant does not guarantee an automatically consistent view of the session’s shared state to all concurrent clients; a change made by one client is not automatically visible to others. Attempting to provide such a strong guarantee would be too expensive while still susceptible to race conditions. However, the Variant client interface provides a way for the application programmer to explicitly synchronize the state when required.
2.6 Advantages of the Variant Architecture
Other benefits of the variation schema include:
- Separation of artifacts’ lifecycles. Experimentation and feature flagging metadata is source controlled in its own artifacts, independently of the host application. This makes it easy to make trackable changes to experiments and feature flags external to the host application’s code base.
- Separation of workloads. The runtime workload associated with experiments and feature flags is handled by the Variant server out of band of the host application. The Variant server runs on separately provisioned compute and network resources, leaving the host application’s resources unaffected by increases or decreases in experimentation workload.
3. Variation Schema Reference
3.1 Syntactical Conventions
The Variant server manages code variation metadata in human readable YAML files, called schema files. Each schema file contains a single CVM schema describing a set of related code variations instrumented on some host application using the familiar YAML syntax.
All schema keys (nouns to the left of the :) are case-insensitive reserved keywords that have specific meanings. For example, 'name:', 'Name:', or 'NAME:' are interchangeable.
The following conventions are used throughout this section:
any-string | Arbitrary, case sensitive, Unicode string. Follow YAML’s escape rules if you want a string to contain special characters. |
name-string | Case insensitive string containing only Unicode letters, digits or the ‘_’ (underscore) and not starting with a digit. For example, _mySchema is a valid name and is indistinguishable from, e.g. _MYSCHEMA , but 3rdField is not a valid name. |
boolean | YAML boolean scalar value of true or false . |
number | YAML numeric scalar value. |
fragment | Arbitrary YAML fragment |
type | YAML mapping (dictionary) of some type. |
[type] | YAML sequence of mappings of some type. |
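The name-string rule above can be approximated with a regular expression. The following is a sketch of the rule as stated in the table, not the server's actual validation code; in particular, reading "digits" as the Unicode decimal digit category is an assumption, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class NameStringSketch {

    // A Unicode letter or underscore, followed by Unicode letters, decimal digits or underscores.
    private static final Pattern NAME = Pattern.compile("[\\p{L}_][\\p{L}\\p{Nd}_]*");

    /** True if the string satisfies the name-string rule as sketched here. */
    static boolean isValid(String candidate) {
        return NAME.matcher(candidate).matches();
    }

    /** Names are case insensitive, so two spellings that differ only in case denote the same name. */
    static boolean sameName(String first, String second) {
        return first.equalsIgnoreCase(second);
    }

    public static void main(String[] args) {
        System.out.println(isValid("_mySchema"));
        System.out.println(isValid("3rdField"));
        System.out.println(sameName("_mySchema", "_MYSCHEMA"));
    }
}
```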
3.2 Schema Top Level Keys
The top-level keys in the CVM grammar are:
Key | Type | Required | Comment | Default |
---|---|---|---|---|
name | name-string | Yes | The schema name. Must be server-wide unique. | |
description | any-string | No | Optional description. | None |
states | [state] | Yes | A list of the host application’s interface states. | |
variations | [variation] | Yes | A list of potentially inter-dependent code experiments and feature flags. | |
flusher | flusher | No | Defines a schema-specific trace event flusher. Applies to all trace events generated by the variations configured by this schema. | Server-wide default1. |
hooks | [hook] | No | A list of schema-scoped lifecycle hook specifications. Hooks defined at this scope apply to all states and all variations defined by this schema. | [] |
1 Configured with the variant.event.fluher.* config parameters.
Code variations are complex structures packing most of CVM’s expressive power. At a minimum, a variation must have a name, at least one experience of which exactly one must be declared control, and at least one onStates element, establishing the list of states on which this experiment or feature flag is instrumented. The control experience typically represents the existing code path and zero or more variant experience(s) represent the alternate code path(s).
3.3 States
The states sequence key contains a list of state elements, each representing a node in the application state graph. A state element has the following keys:
Key | Type | Required | Comment | Default |
---|---|---|---|---|
name | name-string | Yes | This state’s name. | |
parameters | param-map | No | A map of state-scoped user-defined parameters. | {} |
hooks | [hook] | No | A list of state-scoped lifecycle hook specifications which apply only to this state. | [] |
Example:
states:
  - name: state1
    parameters:
      key1: "a string"
      key2: "a string"
  - name: state2
    hooks:
      - class: mycompany.variant.spi.RecaptchaTargetingHook
All state-scoped hooks must listen to StateAwareLifecycleEvents.
3.4. Variations
3.4.1. Variation Top Level Keys
Code variations are described in the variations list key. Each variation element is a mapping with the following keys:
Key | Type | Required | Comment | Default |
---|---|---|---|---|
name | name-string | Yes | The variation’s name. | |
experiences | [experience] | Yes | This variation’s experiences. | |
onStates | [on-state] | Yes | The mappings of this variation to states it instruments. | |
isOn | boolean | No | Is this variation online? | true |
concurrentVariations | [name-string] | No | A list of previously defined variation names conjointly concurrent with this variation. | [] |
durability | durability | No | Durability specification. | Session-scoped durability for both qualification and targeting. |
hooks | [hook] | No | A list of variation-scoped lifecycle hook specifications. Hooks defined at this scope apply only to this variation. | [] |
parameters | param-map | No | A map of variation-scoped user-defined parameters. | {} |
For example:
variations:
  - name: myVariation
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - state: state1
      - state: state2
    durability:
      targeting: durable
The isOn property is used to take an experiment or a feature toggle temporarily offline without removing it from the schema. No sessions are targeted for an offline variation, as if it didn’t even exist. In fact, the only difference between an offline variation and a variation that is completely removed from the schema is that if it defines its targeting or qualification durability as durable, this information is preserved so that, when the variation is taken back online, returning users will continue seeing the same experience they saw before the variation was taken offline.
All variation-scoped hooks must listen to VariationAwareLifecycleEvents.
3.4.2. Variation Experiences
Each element of a variation’s experiences list describes one of its experiences.
Key | Type | Required | Comment | Default |
---|---|---|---|---|
name | name-string | Yes | The experience’s name. | |
isControl | boolean | Yes, unless this variation has only one experience. | Defines whether this experience is the control experience in this variation. | true if the only experience in this variation, false otherwise. |
parameters | param-map | No | A map of experience-scoped user-defined parameters. | {} |
3.4.3. Durability
Variation durability determines the retention rules for qualification and targeting decisions with respect to the given variation. The complete durability specification looks as follows:
durability:
  qualification: state|session|durable
  targeting: state|session|durable
state | The decision is retained for the duration of the state request only. |
session | (Default.) The decision is retained for the duration of the user session, but not persisted. |
durable | The decision is persisted and retained for as long as this variation remains in the schema. |
3.4.4. OnStates
The onStates key contains a list of elements, each of which describes this variation’s instrumentation details on a particular state. Whenever a variation instruments a state with an onStates element, this implies an obligation, on the part of the host application, to provide an implementation of a state variant for any experience defined by the variation.
An onStates element can have the following keys:
Key | Type | Required | Comment | Default |
---|---|---|---|---|
state | name-string | Yes | The name of the state being mapped by this onStates element. | |
experiences | [name-string] | No | The list of this variation’s experiences defined on this state. | The list of all of this variation’s experiences. |
variants | [state-variant] | No | The list of this state’s variants. | [] |
For example, the following listing defines an experiment on two consecutive pages of a signup wizard:
states:
  - name: page1
  - name: page2
variations:
  - name: WizardTest
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - state: page1
      - state: page2
If the experiment is testing a new combined page that replaces the two existing pages with only one, the experiences
key must be used to explicitly list those experiences that are defined:
states:
  - name: page1
  - name: page2
variations:
  - name: WizardTest
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - state: page1
      - state: page2
        experiences: [control] # page2 is not instrumented by the variant experience
The experiences list on line 13 implies an obligation, on the part of the host application, not to attempt to target a session for page2 if its live experience is WizardTest.variant. Doing so will result in a runtime error.
Conversely, if we wanted to test splitting an existing page1 into two new pages page1 and page2, line 13 would list the variant experience instead:
states:
  - name: page1
  - name: page2
variations:
  - name: WizardTest
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - state: page1
      - state: page2
        experiences: [variant] # page2 is not instrumented by the control experience
3.4.5. State Variants and State Parameter Resolution
For each element of the onStates list, the Variant schema parser creates the state variant space as a Cartesian product of the set of this variation’s experiences and the experience sets of all variations that are conjointly concurrent with this variation and also defined on this state. All state variants in this variant space implicitly inherit the state parameters defined for this state, if any.
However, in some cases, it is useful to also create state parameters at the state variant level. This is accomplished with the variants key:
# ...
onStates:
  - state: state1
    variants:
      - experience: variant
        parameters:
          path: '/path/to/something'
In most cases, this inferred state variant space is sufficient, and you will only need to define state variants explicitly if you wish to override one or more state variants.
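The inferred state variant space can be pictured with a small, self-contained sketch. The class and method names below are hypothetical and are not part of the Variant API. The code only illustrates how a Cartesian product of experience sets yields one combination per state variant, under the assumption that each variation contributes exactly one experience to each combination.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Variant API.
final class VariantSpaceSketch {

    // Returns every combination that takes exactly one experience
    // from each of the given experience sets.
    static List<List<String>> cartesianProduct(List<List<String>> experienceSets) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with the single empty combination
        for (List<String> experiences : experienceSets) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : result) {
                for (String experience : experiences) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(experience);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two concurrent variations, both defined on the same state.
        List<List<String>> space = cartesianProduct(List.of(
                List.of("A.control", "A.variant"),
                List.of("B.control", "B.variant")));
        space.forEach(System.out::println); // four combinations in total
    }
}
```

Two variations with two experiences each produce four combinations. This growth is why explicit variants entries are normally reserved for the few state variants that need overriding.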
An element of the variants
key can contain the following keys:
Key | Type | Required | Comment | Default |
---|---|---|---|---|
experience | name-string | Yes | The name of this variation’s experience. Cannot be the control experience. | |
concurrentExperiences | [name-string] | No | The list of concurrent experiences defining this state variant. Cannot contain a control experience. | [] |
parameters | {name:value,...} | No | Arbitrary properties dictionary. | {} |
If a state parameter is defined at both the base state and a state variant, the state variant value overrides the base value:
name: example
states:
  - name: state1
    # State parameters, specified at the state level,
    # provide the base values for all variants of this state.
    parameters:
      key1: value1
      key2: value2
variations:
  - name: variation1
    experiences:
      - name: existing
        isControl: true
      - name: variant
    onStates:
      - state: state1
        variants:
          - experience: variant
            # State parameters, specified at the state variant level,
            # override the like-keyed base values at runtime within
            # the scope of the enclosing state variant.
            parameters:
              key2: 'value2 in state variant'
              key3: 'value3 in state variant'
At runtime, Variant will return the following values to the host application:
stateRequest.getResolvedStateParameters().get("key1"); // "value1"
stateRequest.getResolvedStateParameters().get("key2"); // "value2 in state variant"
stateRequest.getResolvedStateParameters().get("key3"); // "value3 in state variant"
This mechanism of state parameter overrides is a convenient way for the developer to introduce application state into the schema at both global and local scopes.
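The override rule can be reproduced outside of Variant with two plain maps. The sketch below is a hypothetical illustration, not the server’s implementation: the class and method names are invented, and the only behavior it assumes is the one described above, namely that state variant values replace like-keyed base values and add any new keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the Variant implementation.
final class StateParameterResolutionSketch {

    // Starts from the base state parameters and lets like-keyed
    // state variant parameters override them.
    static Map<String, String> resolve(Map<String, String> stateParams,
                                       Map<String, String> variantParams) {
        Map<String, String> resolved = new LinkedHashMap<>(stateParams);
        resolved.putAll(variantParams);
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> base = Map.of("key1", "value1", "key2", "value2");
        Map<String, String> variant = Map.of(
                "key2", "value2 in state variant",
                "key3", "value3 in state variant");
        Map<String, String> resolved = resolve(base, variant);
        System.out.println(resolved.get("key1")); // value1
        System.out.println(resolved.get("key2")); // value2 in state variant
        System.out.println(resolved.get("key3")); // value3 in state variant
    }
}
```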
3.5. Flusher
The schema definition of a schema flusher has the following components:
Key | Type | Required | Comment | Default |
---|---|---|---|---|
class | any-string | Yes | The fully qualified name of the Java class implementing the flusher. | |
name | name-string | No | This flusher’s name. | The simple (unqualified) class name. |
init | fragment | No | Any YAML value. | None |
Example:
name: my_schema
flusher:
  - class: mycompany.variant.spi.CustomFlusher
    init:
      endpoints: [http://some.url, http://some.other.url]
In this example, CustomFlusher’s constructor will know what to do with the two URLs supplied in the init key. Refer to the Variant CVM Server-Side Extension SPI User Guide for more information.
3.6. Common Schema Components
3.6.1 User-Defined Parameters
User-defined parameters (UDPs) help host applications enrich variation schema with application-specific state. They are simple read-only key/value pairs of strings, whose semantics are entirely up to the host application.
UDPs can be attached to a state or to a variation:
name: example
states:
  - name: state1
    parameters:
      key1: 'state param 1 in state1'
      key2: 'state param 2 in state1'
variations:
  - name: variation1
    experiences:
      - name: new_feature
    parameters:
      key1: 'variation param 1 in variation1'
      key2: 'variation param 2 in variation1'
    onStates:
      - state: state1
State UDPs can be further specified at the state variant level:
name: example
states:
  - name: state1
    parameters:
      key1: 'state param 1 in state1'
      key2: 'state param 2 in state1'
variations:
  - name: variation1
    experiences:
      - name: exp1
    onStates:
      - state: state1
        variants:
          - experience: new_feature
            parameters:
              key1: 'overrides value of key1 in state1 and live experience exp1'
              key3: 'adds a new parameter in state1 and live experience exp1'
At runtime, the values of state user-defined parameters are retrieved via the client SDKs:
- State.getParameters() retrieves those parameters defined with the state.
- StateRequest.getStateParameters() retrieves those parameters defined with the state, overridden, where applicable, by the values defined with the state variant, according to the live experience in effect.
Likewise, variation UDPs can be further specified at the experience level:
name: example
states:
  - name: state1
variations:
  - name: variation1
    parameters:
      key1: 'variation param 1 in variation1'
      key2: 'variation param 2 in variation1'
    experiences:
      - name: A
        isControl: true
      - name: B
        parameters:
          key1: 'overrides value of key1 in experience B'
          key3: 'adds a new parameter only to experience B'
    onStates:
      - state: state1
At runtime, the values of variation user-defined parameters are retrieved via the client SDKs:
- Variation.getParameters() retrieves those parameters defined with the variation.
- Experience.getParameters() retrieves those parameters defined at the variation level, overridden, where applicable, by the values defined with the experience.
User-defined parameters cannot be updated by the host application.
3.6.2 Lifecycle Event Hooks
The schema definition of a hook in any scope has the following three components:
Key | Type | Required | Comment | Default |
---|---|---|---|---|
class | any-string | Yes | The fully qualified name of the Java class implementing the hook. | |
name | name-string | No | This hook’s name. | The simple (unqualified) class name. |
init | fragment | No | Any YAML value. | None |
Example:
name: minimal_schema
# ...
hooks:
  - class: mycompany.variant.spi.RecaptchaQualificationHook
    init: [USERID1, USERID2, USERID3]
In this example, RecaptchaQualificationHook’s constructor will know what to do with the three user IDs supplied in the init key. Refer to the Variant CVM Server-Side Extension SPI User Guide for more information.
4 Variant Runtime
4.1 The Lifecycle of a State Request
Code Variation Model treats interactive applications as finite state machines. Each user session traverses some state graph, whose nodes are application states, where the host application pauses for user input. Whenever a session navigates to the next application state, the host application must determine if this state exists in more than one variant (i.e. if it is instrumented by any code variations), and, if so, determine which of these variants to return. This inference is known as targeting of a session for a state and is accessed via the Session.targetForState(state)
client method. It returns the StateRequest
object which can be further examined for the list of live experiences in all variations instrumenting this state. Before any targeting can happen, the session must be created first with the Connection.getOrCreateSession()
method.
See Variant CVM Java Client for more details.
A Variant session can be thought of as a succession of consecutive state requests, each advancing it from one application state to the next. Variant sessions provide:
- A way to identify a user across multiple state requests;
- Storage for the session state that must be preserved between state requests;
- Metadata isolation context.
Variant server acts as the centralized session repository, accessible to any Variant client by the session ID. All clients sharing a session are guaranteed a consistent view of the session state. Sessions are expired after a configurable period of inactivity.
Variant hides any changes to variation schema from active sessions, which continue to see the variation metadata as it was at the time when the sessions were created. This isolation guarantee is critical in protecting user sessions from (potentially fatal) inconsistencies. For example, if a variation is taken offline, or one of its variant experiences is dropped, existing sessions, currently traversing this variation, would be thrown out of their experiences, if this change were visible.
Note that Variant sessions are completely separate from the host application’s own native sessions. Variant sessions are configured independently and do not require that the host application even have any native notion of a session.
Much of the complexity, hidden by Variant server from the application developer, is inside the Session.targetForState(state)
method. For each variation, instrumented on the given state, Variant server must perform the following steps:
Figure 7. Qualification and targeting of a session.
4.2 Session Qualification
Qualification is a distinct idea from targeting. It directly models feature flags as single-experience variations gated by a qualification hook, but it is equally useful for implementing experiments, where a clear delineation between qualification and targeting is essential. For example, suppose that a newspaper wants to test promotional rates, offered on its website. This promotion cannot be combined with another promotion, so the traffic coming from other promotional offers must be disqualified from the experiment.
Variant server will consider pre-existing qualification information, subject to the durability rules.
Whenever Variant determines that the calling session’s qualification for a particular code variation must be (re)established, it raises the VariationQualificationLifecycleEvent
lifecycle event, which posts eligible lifecycle hooks. If none were defined or none returned a result, the default built-in qualification hook is posted, which unconditionally qualifies all sessions for all variations. For more information on lifecycle hooks, refer to [todo] Section 5.1.
If the session is disqualified, it is assigned to the control experience, but not targeted for it. The difference is that:
- The set of live experiences, returned by the StateRequest.getLiveExperiences() method, doesn’t contain an entry for the disqualified variation;
- No [todo] trace events are triggered on behalf of disqualified variations.
If a session is qualified, Variant server proceeds to targeting it for the requested state.
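The fallback from user-defined hooks to the default qualification hook can be sketched as a simple chain. Everything in the block below is hypothetical: hooks are modeled as plain functions over a session ID rather than as implementations of the Variant SPI, and the sketch assumes that the first hook to return a result decides the outcome, which the text above does not spell out.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical illustration only; real hooks implement the Variant SPI.
final class QualificationChainSketch {

    // A sample hook modeled on the newspaper example: sessions arriving from
    // another promotion are disqualified, all others are left undecided.
    static final Function<String, Optional<Boolean>> NO_OTHER_PROMOTIONS =
            sessionId -> sessionId.startsWith("promo-")
                    ? Optional.of(false)
                    : Optional.<Boolean>empty();

    // Posts each hook in turn. An empty result passes the decision on.
    // Assumption: the first hook that returns a result wins.
    static boolean qualify(String sessionId,
                           List<Function<String, Optional<Boolean>>> hooks) {
        for (Function<String, Optional<Boolean>> hook : hooks) {
            Optional<Boolean> result = hook.apply(sessionId);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return true; // default built-in behavior: qualify unconditionally
    }

    public static void main(String[] args) {
        List<Function<String, Optional<Boolean>>> hooks = List.of(NO_OTHER_PROMOTIONS);
        System.out.println(qualify("promo-123", hooks));   // false
        System.out.println(qualify("organic-456", hooks)); // true
    }
}
```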
4.3 Session Targeting
Targeting a session for a state produces the session’s set of live variation experiences on that state. Variant server will consider pre-existing targeting information, subject to the durability rules.
Even in a serial case, when the requested state is only instrumented by one variation, the targeting algorithm is complex. If the requested state is partially instrumented, Variant server will only consider those experiences that are defined on this state. The complexity of the targeting algorithm grows dramatically for concurrent variations.
Whenever, inside the targetForState(state)
method, Variant determines that the calling session must be (re)targeted for a particular state+variation combination, it raises the VariationTargetingLifecycleEvent
lifecycle event, which posts eligible lifecycle hooks. If none were defined or none returned a result, the default built-in targeting hook is posted, which targets randomly, according to the sampling weights provided in the schema, e.g.:
#...
variations:
  - name: Blue
    experiences:
      - name: grey
        isControl: true
        weight: 9 # Sampling weight
      - name: blue
        weight: 1 # Sampling weight
    onStates:
      - state: S1
      - state: S2
#...
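Weighted random targeting of this kind is commonly implemented by walking the cumulative weights. The sketch below is a minimal, hypothetical version of that technique using the 9:1 weights from the listing above. It is not the code of the built-in targeting hook, and all class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; not the built-in targeting hook.
final class WeightedTargetingSketch {

    // Picks an experience with probability proportional to its weight.
    // 'roll' must lie in [0, 1), e.g. a value from Random.nextDouble().
    static String pick(Map<String, Integer> weights, double roll) {
        int total = 0;
        for (int weight : weights.values()) {
            total += weight;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("Weights must sum to a positive number");
        }
        double threshold = roll * total;
        int cumulative = 0;
        String chosen = null;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            chosen = entry.getKey();
            if (threshold < cumulative) {
                break; // the roll falls inside this experience's band
            }
        }
        return chosen;
    }

    // The sampling weights from the schema fragment: grey 9, blue 1.
    static Map<String, Integer> blueWeights() {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("grey", 9);
        weights.put("blue", 1);
        return weights;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int blue = 0;
        for (int i = 0; i < 10_000; i++) {
            if (pick(blueWeights(), random.nextDouble()).equals("blue")) {
                blue++;
            }
        }
        System.out.println("Share of sessions targeted to blue: " + blue / 10_000.0);
    }
}
```

With these weights roughly one session in ten should land in the blue experience. The exact share varies from run to run.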
4.4 Qualification and Targeting Durability
Once a session has been qualified or targeted for a variation, the natural question is how long this qualification or targeting decision should remain in effect. Variant supports three durability scopes: state, session and variation (declared in the schema as state, session and durable, respectively), which provide these three retention guarantees:
- The state-scoped durability is the most volatile: the qualification or targeting decision is not preserved between state requests. It is unlikely to be appropriate for retaining targeting decisions, but may be quite useful for qualification decisions. Suppose you want to test a new signup funnel experience which is alleged to improve conversion. Clearly, you want to only qualify those visitors who have not yet signed up. However, if they do sign up in the course of a session, you don’t want them to continue seeing the experiment experience.
- The session-scoped durability preserves the qualification or targeting decision between state requests and for the duration of the current session. This is the default behavior and it requires no database I/O. However, it is not strong enough if the qualification or targeting decision must be preserved between user sessions.
- The variation-scoped durability is commonly used for the retention of targeting information for the lifespan of the variation. This is a common strategy for those experiments where it is desirable that return users see the same experience. Variant server saves variation-scoped durability decisions to the embedded RocksDB database. RocksDB is extremely fast, and Variant server further limits the overhead by making at most one random database read per session.
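The three scopes differ only in where a decision is kept and for how long. The following sketch makes that concrete with two in-memory maps. It is a hypothetical model, not Variant’s storage layer: the second map merely stands in for the persistent database, and the class, enum and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not Variant's storage layer.
final class DurabilitySketch {

    enum Scope { STATE, SESSION, DURABLE }

    // Stands in for the persistent store; keyed by "ownerId/variation".
    private final Map<String, String> persistent = new HashMap<>();
    // Lives only as long as the current session; keyed by variation name.
    private Map<String, String> session = new HashMap<>();

    void remember(String ownerId, String variation, String decision, Scope scope) {
        switch (scope) {
            case STATE:
                break; // never retained beyond the current state request
            case SESSION:
                session.put(variation, decision);
                break;
            case DURABLE:
                persistent.put(ownerId + "/" + variation, decision);
                break;
        }
    }

    // Looks up a retained decision, preferring the session-scoped one.
    Optional<String> recall(String ownerId, String variation) {
        String found = session.get(variation);
        if (found == null) {
            found = persistent.get(ownerId + "/" + variation);
        }
        return Optional.ofNullable(found);
    }

    // Models session expiration: session-scoped decisions are lost.
    void startNewSession() {
        session = new HashMap<>();
    }

    public static void main(String[] args) {
        DurabilitySketch store = new DurabilitySketch();
        store.remember("user1", "Blue", "blue", Scope.DURABLE);
        store.startNewSession();
        System.out.println(store.recall("user1", "Blue").isPresent()); // true
    }
}
```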
Durability is declared in the variation schema using the durability key, as explained in Section 3.4.3 Durability. For example, we can re-write [todo] listing … with explicitly declared qualification and targeting durability as follows:
#...
variations:
  - name: Blue
    experiences:
      - name: grey
        isControl: true
        weight: 9
      - name: blue
        weight: 1
    onStates:
      - state: S1
      - state: S2
    durability:
      qualification: session # Default, could have been omitted
      targeting: durable # Variation-scoped
#...
Listing 5. The Tricolor variation schema with different levels of qualification longevity.
State- and session-scoped durability do not require that the user be recognized. Variation-scoped durability, however, requires a database key unique to the user, so that the user can be recognized in a different Variant session. The host application sets the value of this key in the ownerId argument to the Connection.getOrCreateSession() method:
User appUser = ... // Represents the host application's User object
Session ssn = variantConnection.getOrCreateSession(userData, appUser.userId);
Qualification and targeting are guaranteed to be stable, because Variant sessions are isolated from any schema changes, as explained in [todo] Schema Management. However, variation-scoped durability cannot be guaranteed unconditionally because the variation schema may have changed between two consecutive sessions. Consider the following scenario:
- Your schema contains two conjointly concurrent variations, both defined with variation-scoped targeting durability.
- Some user has traversed these variations and was randomly targeted to variant experiences in both;
- A bug was discovered in the hybrid experience and you’ve changed concurrency to disjoint in order to avoid the hybrid experience.
- The same user visits again. Her targeting information is no longer consistent with the schema and must be revised.
When cases like this arise, Variant will discard the least recently used targeting decision and re-target.
4.5 Schema Management
When Variant server starts, it looks for variation schema files in the schemata
directory and attempts to deploy them. A schema file must contain exactly one uniquely named Variant schema. There is no requirement that the schema file name match that of the schema it contains, though it is recommended that you name each schema file similarly to the schema therein.
For each schema file in the schemata
directory Variant server takes these steps:
- Parse. Any messages emitted by the parser are written to the server log file.
- Deploy. If any parser errors were encountered, Variant server skips this schema file. Otherwise, provided no already deployed schema has the same name, Variant server deploys this schema.
In order to make any changes to the schema, its schema file must be edited and replaced in the schemata
directory. It is not necessary to restart Variant server to redeploy a schema. The new file timestamp will be detected and Variant server will attempt to parse it and re-deploy its schema by following these steps:
- Parse the schema file. Any messages emitted by the parser are written to the server log file.
- Deploy. If any parser errors were encountered, Variant server skips this schema file. Otherwise, Variant server attempts to deploy this schema, subject to the following conditions:
  - If no currently deployed schema has the same name as this schema, this schema is deployed.
  - If a currently deployed schema has the same name as this schema, their respective file names must also be the same. In that case, the currently deployed schema is undeployed and the new one is deployed in its place.
To undeploy a currently deployed schema, simply remove the corresponding schema file.
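The deployment rules above amount to a small decision procedure, sketched below. This is a hypothetical model rather than the server’s code: the class, enum and method names are invented, the file names are placeholders, and the rejection of a same-named schema arriving in a different file is this sketch’s reading of the file name requirement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Variant server's deployer.
final class SchemaDeployerSketch {

    enum Outcome { SKIPPED, DEPLOYED, REPLACED, REJECTED, UNDEPLOYED, IGNORED }

    // Maps the name of each deployed schema to the file it came from.
    private final Map<String, String> deployed = new HashMap<>();

    // Called for a new schema file or for one whose timestamp has changed.
    Outcome onFileAddedOrChanged(String fileName, String schemaName, boolean hasParseErrors) {
        if (hasParseErrors) {
            return Outcome.SKIPPED; // files with parser errors are never deployed
        }
        String currentFile = deployed.get(schemaName);
        if (currentFile == null) {
            deployed.put(schemaName, fileName);
            return Outcome.DEPLOYED;
        }
        if (currentFile.equals(fileName)) {
            return Outcome.REPLACED; // same name, same file: new generation takes over
        }
        return Outcome.REJECTED; // assumption: same name from a different file is refused
    }

    // Called when a schema file disappears from the schemata directory.
    Outcome onFileRemoved(String fileName) {
        boolean removed = deployed.values().removeIf(fileName::equals);
        return removed ? Outcome.UNDEPLOYED : Outcome.IGNORED;
    }

    public static void main(String[] args) {
        SchemaDeployerSketch deployer = new SchemaDeployerSketch();
        System.out.println(deployer.onFileAddedOrChanged("a.yaml", "my_schema", false)); // DEPLOYED
        System.out.println(deployer.onFileAddedOrChanged("a.yaml", "my_schema", false)); // REPLACED
        System.out.println(deployer.onFileAddedOrChanged("b.yaml", "my_schema", false)); // REJECTED
        System.out.println(deployer.onFileRemoved("a.yaml"));                            // UNDEPLOYED
    }
}
```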
Whenever a schema is undeployed, Variant server will hold on to its memory representation, while all active sessions connected to it naturally expire. All new sessions are created against the currently deployed generation, if any. Session draining isolates active sessions from schema updates, which is instrumental in Variant’s ability to provide stable qualification and targeting. In practice this means that, for instance, you can shut off a feature flag without worrying about disrupting active users who are already in the experience.
4.6 Trace Event Logging
Variant trace events are generated by user traffic, as it flows through code variations, with the purpose of subsequent analysis by a downstream process. Trace events can be triggered implicitly, by Variant, or explicitly by the host application. In either case, the host application can attach attributes to these events, to aid in the downstream analysis.
The only implicit trace event is the state visited event (SVE). It is created at the start of a state request, [todo] Figure 3, and triggered when a StateRequest
is committed or failed. This gives the host application a chance to attach custom attributes to the event. For example, if the host application caught an exception, it may wish to set the status of the event to error, and add the name of the class that threw the exception. This information can be used downstream to exclude this session from the statistical analysis (if this is an experiment), or to shut off the variation (if this is a feature flag).
Explicit trace events are triggered by calling the Session.triggerTraceEvent()
method.
Trace events are egested onto external storage via Trace Event Flushers which are part of the Extension API, discussed next.
Appendix A
Analyzing Variant Controlled Experiments
A.1. Trace Event Data Aggregation
Each Variant experiment is designed with particular target metric(s) in mind. But regardless of the target metric(s), the starting data point is always a time series of trace events, such as the page visited event, which must be aggregated into a time series of measurements, such as revenue as a function of the number of users through the experiment. The details of this aggregation step depend entirely on the persistence mechanism you’ve chosen for your trace events. If your flusher inserts them into a relational database, you will likely use SQL. A distributed data processing framework, like Apache Hadoop, can also be successfully deployed for the storage and aggregation of Variant trace events.
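For small data sets the aggregation step can also be done in plain application code. The sketch below shows one possible aggregation, a per-experience conversion rate over distinct sessions. The TraceEvent class here is a hypothetical, simplified stand-in for whatever record your flusher writes, not the Variant trace event type, and its converted flag is assumed to come from an attribute set by the host application.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; field names are assumptions.
final class TraceAggregationSketch {

    // Simplified trace event record, as it might be read back from storage.
    static final class TraceEvent {
        final String sessionId;
        final String experience;
        final boolean converted;

        TraceEvent(String sessionId, String experience, boolean converted) {
            this.sessionId = sessionId;
            this.experience = experience;
            this.converted = converted;
        }
    }

    // Share of distinct sessions in the given experience that produced at
    // least one converting event. Returns 0.0 if no sessions were seen.
    static double conversionRate(List<TraceEvent> events, String experience) {
        Set<String> sessions = new HashSet<>();
        Set<String> convertedSessions = new HashSet<>();
        for (TraceEvent event : events) {
            if (!event.experience.equals(experience)) {
                continue;
            }
            sessions.add(event.sessionId);
            if (event.converted) {
                convertedSessions.add(event.sessionId);
            }
        }
        return sessions.isEmpty() ? 0.0 : (double) convertedSessions.size() / sessions.size();
    }

    static List<TraceEvent> sampleEvents() {
        return List.of(
                new TraceEvent("s1", "control", false),
                new TraceEvent("s1", "control", true), // same session, counted once
                new TraceEvent("s2", "control", false),
                new TraceEvent("s3", "variant", true),
                new TraceEvent("s4", "variant", false),
                new TraceEvent("s5", "variant", true));
    }

    public static void main(String[] args) {
        System.out.println(conversionRate(sampleEvents(), "control")); // 0.5
        System.out.println(conversionRate(sampleEvents(), "variant"));
    }
}
```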
A.2. Statistical Analysis
The goal of an experiment is to:
- Discover if there is a difference between control and variant experience(s) with respect to the target metric of interest;
- Assess how certain we can be that this difference is not just random noise.
The latter can be accomplished with some well-known mathematical formulas developed in the field of statistical hypothesis testing. The fundamental idea there is to develop a procedure that will enable the researcher to make a claim about the entire population with a given degree of certainty, based on a set of sample observations. Refer to the Statistical Analysis of Variant Experiments white paper for more information.
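As one concrete instance of such a procedure, the sketch below computes the pooled two-proportion z statistic, a standard test for comparing two conversion rates. It is offered only as an illustration of hypothesis testing in general and is not necessarily the method prescribed by the white paper. The class and method names are hypothetical, and the sample counts are made up.

```java
// Hypothetical illustration only; a standard pooled two-proportion z-test.
final class TwoProportionZTestSketch {

    // z statistic for the difference between two conversion rates, using the
    // pooled estimate of the common rate under the null hypothesis.
    // Not meaningful when the pooled rate is exactly 0 or 1.
    static double zStatistic(long controlConversions, long controlSessions,
                             long variantConversions, long variantSessions) {
        double p1 = (double) controlConversions / controlSessions;
        double p2 = (double) variantConversions / variantSessions;
        double pooled = (double) (controlConversions + variantConversions)
                / (controlSessions + variantSessions);
        double standardError = Math.sqrt(
                pooled * (1 - pooled) * (1.0 / controlSessions + 1.0 / variantSessions));
        return (p2 - p1) / standardError;
    }

    // |z| > 1.96 corresponds to a two-sided test at the 5% significance level.
    static boolean significantAt5Percent(double z) {
        return Math.abs(z) > 1.96;
    }

    public static void main(String[] args) {
        // Made-up counts: 100 of 1000 control sessions converted, 130 of 1000 variant.
        double z = zStatistic(100, 1000, 130, 1000);
        System.out.println("z = " + z);
        System.out.println("significant at 5%: " + significantAt5Percent(z));
    }
}
```

For the made-up counts in main, z comes out at about 2.1, which clears the 1.96 threshold. With equal conversion rates in both experiences the statistic is zero.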