
Variant CVM Server User Guide

Release 1.0.1, December 2023

1 Introduction

1.1 Code Variations

Software application development is accelerating. Many leading teams release new code continuously, deploying each independent code delta as soon as it’s ready, sometimes multiple times per second. In such a high-velocity operational environment it’s critical to diminish the risk of defects. One of the mainstays of defect reduction in software development is the use of code variations: the term we use to denote a bifurcation point in an application code path, where alternate code paths temporarily co-exist and the path taken is determined at run time by an external component. Two classes of use cases call for the instrumentation of such code variations; they are described below.

1.1.1 Online Controlled Experiments

In an online controlled experiment, a modification to the existing user experience co-exists, for a time, with the original experience. User traffic is split randomly between the two experiences, and measurements are collected of some target metric, e.g. rate of conversion to the next page. In scientific terms, the existing experience serves as control and the new experience as treatment. The experiment is said to succeed if it reaches statistical significance, a mathematical term connoting that a) the number of measurements taken is large enough, and b) the observed difference between the metric’s values in control and in treatment is large enough, to conclude that this difference is far more likely due to the difference between the two experiences than to mere chance.
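
To make the two conditions concrete, here is a minimal sketch of one common significance test, the pooled two-proportion z-test. This guide does not say which test to use, so the choice of test, the 1.96 threshold (two-sided, 5% level) and every name below are illustrative assumptions, not part of Variant.

```java
final class SignificanceSketch {

    /** z statistic for the difference between two conversion rates (pooled two-proportion z-test). */
    static double zScore(long controlConversions, long controlSessions,
                         long treatmentConversions, long treatmentSessions) {
        double pControl = (double) controlConversions / controlSessions;
        double pTreatment = (double) treatmentConversions / treatmentSessions;
        // Conversion rate of both experiences taken together.
        double pooled = (double) (controlConversions + treatmentConversions)
                / (controlSessions + treatmentSessions);
        double standardError = Math.sqrt(
                pooled * (1 - pooled) * (1.0 / controlSessions + 1.0 / treatmentSessions));
        return (pTreatment - pControl) / standardError;
    }

    /** True when |z| exceeds 1.96, the two-sided critical value at the 5% level (an assumed threshold). */
    static boolean isSignificant(double z) {
        return Math.abs(z) > 1.96;
    }

    public static void main(String[] args) {
        // The same 10% vs. 12% conversion rates, measured on a small and on a large sample.
        double small = zScore(10, 100, 12, 100);
        double large = zScore(1000, 10000, 1200, 10000);
        System.out.printf("small sample: z=%.2f significant=%b%n", small, isSignificant(small));
        System.out.printf("large sample: z=%.2f significant=%b%n", large, isSignificant(large));
    }
}
```

With the same observed difference, 100 sessions per experience are not enough to reach significance under this test, while 10,000 sessions per experience are: both the sample size and the size of the difference matter.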

For example, you may want to run an experiment to find out the optimal order amount which entitles your customer to free shipping. In such an experiment you offer several experiences, each promoting a different minimal order amount, and target your user traffic to these experiences randomly. As your customers pass through these experiences you can compare the revenue lift your offer of free shipping has generated.

Note that in the case of online controlled experiments, session targeting must be random if you are going to interpret correlation as causation, because randomness is a natural control for everything other than the difference in user experience itself.

1.1.2 Managed Feature Roll-Outs

The other use case for code variations is feature flags. They refer to a software delivery practice, where a new product feature is rolled out gradually to a carefully controlled group of customers before it is made generally available. Whenever you roll out a new product feature, a feature flag enables you to first publish it to a limited population of users, while sending all others into the stable existing experience. If all goes well, you gradually increase traffic into the new code path until you reach full production, at which point the existing code path can be discarded. But if a defect is discovered, the new feature can be temporarily toggled off until the problem is fixed.

In contrast with online controlled experiments, when instrumenting feature flags you will likely use some deterministic targeting rules for your user traffic. For example, you may want to start by admitting users into the new code path by their Zip code, or customers by their organization ID.

1.2 Interactive Application as a Graph

The only assumption Variant makes about the host application is that it is interactive, i.e. waits for and responds to user input. Its control flow is commonly represented with traversal graphs, like in Figure 1 below. Here the nodes represent the interface states where the system awaits user input and the arcs represent the application responding to user input. Interpreted as a state machine, each node is also a state of the application.

Figure 1. An interactive application modeled as a state graph. (Source: Offutt et al, 2004)

Irrespective of the user interface mechanism, the host application pauses in an application state awaiting user response. These application states render some user interface and provide the means for the user to respond. Depending on the type of the host application, this interface may be manifested as a computer desktop window (desktop application, e.g. MFC), an HTML page (Web application), an activity (Android mobile app), a phone menu (an IVR application), an XML document (RESTful API), etc. These details are not relevant to the CVM; its concept of user experience is a traversal of a set of interface states, as the user transitions from one Web page or one telephone menu to the next.

We can now give a stricter definition to code variation as some connected segment of the application state graph which exists in more than one variant. For each base state in a code variation, there are one or more state variants, which the host application may choose from in place of the base state. The control user experience is the one that traverses the base states, while a variant user experience is one that traverses variant states.

A feature toggle or an online experiment are both examples of code variations: they encompass the control experience (the current code path, possibly nil) and one or more variant experiences (new code paths), which co-exist with the control experience. Whenever the host application is in a processing state, it must decide which interface state to present to the user next, as shown in Figure 2 below:

Figure 2. A state transition without code variation (A); and with code variation (B).

In the regular, uninstrumented case (A), the application simply figures out the next state based on the user’s input, carries out requisite computations, renders the state’s interface to the user, and pauses for user input. However, if the next state is instrumented by one or more code variations (B), the host application has a set of additional state variants it can choose from. It is exactly this task of figuring out the particular state variant that the host application delegates to Variant server, just like it delegates to a database server the task of storing data on disk.

The part of the state transition where the host application defers to Variant for targeting is called a state request. A Variant session is, in a nutshell, a succession of state requests plus the common session state, preserved between the requests.

1.3 Code Variation Model

Code Variation Model is a domain model for code variations. It offers a formal framework for defining code variations and for reasoning about them. Its key practical benefit is to provide a way to externalize the metadata for a set of related code variations into human readable schema files managed centrally by Variant server. These schemas enable developers to define variations declaratively, rather than programmatically, leaving the implementation details to be handled by the Variant server.

This removes oodles of instrumentation code smell from the host application. The application developer uses familiar tools to implement new application behaviors, unconcerned with how these new code paths will be instrumented as experiments or feature flags. This instrumentation is accomplished with only a few lines of glue code facilitating the communication between the host application and the Variant server, and the experiment schema containing the complete configuration of all code variations and managed entirely by the Variant server. The Variant server handles the rest, hiding enormous amounts of complexity from the application developer.

This clean separation between implementation and instrumentation dramatically reduces the amount of code the application developer must write in order to instrument code variations. In fact, it takes the same number of lines of code (about a dozen) to instrument your 100th concurrent experiment as it takes you to instrument your first. In other words, the total effort of instrumenting N concurrent code variations grows only linearly, as O(N).

Other benefits of variation schema include

  • Separation of artifacts’ lifecycles. Experimentation and feature flagging metadata is source controlled in their own artifacts, independently of the host application. This makes it easy to make trackable changes to experiments and feature flags external to the host application’s code base.
  • Separation of workloads. The runtime workload associated with experiments and feature flags is handled by the Variant server out of band of the host application. The Variant server runs on separately provisioned compute and network resources, leaving the host application’s resources unaffected by an increase or decrease in experimentation workload.

1.4 Simple Variation Schema

A minimal valid variation schema consists of a single state, instrumented by a single variation with a single experience, as in the following listing, where we model a feature flag protecting the new code path which adds reCAPTCHA to the existing password reset page.

# A very simple variation schema take 1.
name: 'Minimal Schema'
states: 
  - name: passwordResetPage
variations:
  - name: RecaptchaOnPasswordResetFF
    experiences:
      - name: recaptcha 
    onStates:
      - state: passwordResetPage

Listing 1. A minimal valid variation schema which compiles but does nothing useful.

To deploy this schema, simply copy the file into the server’s schemata directory. Although this schema file parses without error, it does not do much yet because all traffic will be qualified by the default qualifier and targeted for the only experience recaptcha. To make this feature flag more useful, we need to add a custom qualification hook, which will qualify into this variation not all, but only certain users. For example, the following hook qualifies only those users whose IDs were passed to its constructor in a space separated string:

package mycompany.variant.spi;

import com.variant.server.spi.lifecycle.VariationQualificationLifecycleEvent;
import com.variant.server.spi.lifecycle.VariationQualificationLifecycleHook;
import java.util.Arrays;
import java.util.Optional;

/**
 * Custom qualification hook qualifies user IDs provided at initialization.
 */
public class RecaptchaQualificationHook
    implements VariationQualificationLifecycleHook {

  // User IDs that qualify for the variation, parsed from the schema's init key.
  private final String[] qualifiedUserIds;

  public RecaptchaQualificationHook(String init) {
    qualifiedUserIds = init.split(" ");
  }

  @Override
  public Optional<Boolean> post(VariationQualificationLifecycleEvent event) {
    // A session without an owner ID is not qualified.
    Boolean isQualified = event.getSession().getOwnerId()
      .map(userId -> Arrays.stream(qualifiedUserIds).anyMatch(userId::equals))
      .orElse(false);
    return Optional.of(isQualified);
  }
}

Listing 2. Custom qualification hook qualifies into a code variation only those users whose IDs are passed to its constructor.

To add this hook to the variation use the hooks key:

# A very simple variation schema take 2.
name: 'Minimal Schema'
states: 
  - name: passwordResetPage
variations:
  - name: RecaptchaOnPasswordResetFF
    experiences:
      - name: recaptcha 
    onStates:
      - state: passwordResetPage
    hooks:
      - class: mycompany.variant.spi.RecaptchaQualificationHook
        init: "USERID1 USERID2 USERID3"

Listing 3. The minimal valid variation schema that does something useful.

Now, whenever the Variant server needs to qualify a session for the RecaptchaOnPasswordResetFF feature flag, it will delegate to the RecaptchaQualificationHook hook, which will qualify into the feature only those user sessions whose user IDs match those provided in the init key.

See [todo] on how to make Java classes available to the Variant server at runtime.

2. Variant CVM Platform Architecture

2.1. Overview

The Variant Code Variation Management (CVM) server provides a service through which the host application obtains, at runtime, information about the live experiences assigned to a user session. The host application accesses this service via the Variant client library suitable for its compilation environment.

Variant server is deployed on the network local to the host application and its operational database, either on premises or on the customer’s own compute instance in the cloud. (A fully managed Variant Platform-as-a-Service is under development.) Being on the same local network facilitates reliable real-time integration with the operational data for the purposes of contextual qualification and targeting.

The following diagram presents a high-level overview of the different components of Variant software platform:

Figure 3. Variant Platform Architecture.

2.2 The Client-Side API

Each component of the host application that wishes to participate in an experiment or a feature flag communicates with Variant server via a native client library. Typically, only one instance of the Variant API handle is necessary per process, and only one connection is necessary per variation schema. If you need to connect to multiple schemas, each schema requires a separate connection handle.

At the time of this writing only the Variant Java Client library is available.

2.3 The Server-Side SPI

Variant server’s default behavior can be extended via the server-side service program interface (SPI). It supports creation and configuration of user custom executable code, to which the server can delegate for handling various events. These hooks run in the server’s address space, seamlessly overriding the server’s default behavior with custom semantics.

There are two principal extension mechanisms: lifecycle hooks and trace event flushers. Lifecycle hooks are listeners for various lifecycle events raised by the Variant server, such as the session qualification or session targeting events. Lifecycle hooks can be chained to help you modularize and reuse your code. Event flushers handle the egestion of Variant trace events.

A few standard extensions come with the Variant server in the standard extension library, but creating your own lifecycle hook or trace event flusher is as simple as implementing a Java interface. Both lifecycle hooks and event flushers are configured in the variation schema and made available to the server’s JVM at run time via the /spi directory.

2.4 The Lifecycle of Code Variations

The other responsibility of the Variant server is the management of code variations’ lifecycle, such as creation, alteration, suspension, resumption, etc. of experiments and feature flags. These actions are triggered by changes to variation schema files residing in the server’s /schemata directory. Each schema file is a YAML file, containing definitions of related code variations, or code variation metadata. A schema is first deployed to the Variant server when its YAML file is placed into the /schemata directory. A schema is undeployed from the Variant server when its YAML file is removed from the /schemata directory. Whenever a schema file is modified in place in the /schemata directory, the Variant server detects the changes and attempts to redeploy the schema.

A single server instance can manage an unlimited number of variation schemas.

2.5 Distributed Session Management

Variant maintains its own user sessions, instead of relying on the native user sessions maintained by the host application, e.g. HTTP sessions. Variant user sessions are distributed; changes made to a user session by a Variant client are available to all concurrent Variant clients. This architecture is particularly attractive to modern distributed applications made up of multiple service components. Any such component, connected to a Variant server, can get hold of a user session by its session ID and obtain its most up-to-date shared state.

Variant does not guarantee an automatically consistent view of the session’s shared state to all concurrent clients; a change made by one client is not automatically visible to others. Attempting to provide such a strong guarantee would be too expensive while still susceptible to race conditions. However, the Variant client interface provides a way for the application programmer to explicitly synchronize the state when required.

2.6 Targeting and Qualification Durability

When a user session is first qualified or targeted for a variation, the schema designer has the choice of the effective duration of these decisions. For example, it is typically desirable that a user continues to see the same experiment experience at least for the remainder of the current user session. Moreover, there are many cases when qualification or targeting information must be preserved between sessions, for example to ensure that returning users continue seeing the same experience.

Variant supports three durability scopes: state, session and variation, which are declared in the variation’s schema definition. A variation’s targeting durability is declared independently from its qualification durability, so they do not have to be the same. For example, a user’s eligibility for an experiment, e.g. one related to a promotion, may vary from visit to visit. But whenever she is qualified, the experiment designer typically wants her to see the same experiment experience.

State scoped durability means that the outcome of qualification or targeting is not reused. A variation with state-scoped qualification durability will be re-qualified for each state request, and a variation with state-scoped targeting durability, if qualified, will be re-targeted for each state request.

Session scoped durability means that the outcome of qualification or targeting is reused for the duration of this user session. A variation with session-scoped qualification durability will be qualified once per session and reused for the duration of this session, but will be re-qualified in a different session. A variation with session-scoped targeting durability will be targeted once per session and reused for the duration of this session, but will be re-targeted in a different session.

Variation scoped durability means that the outcome of qualification or targeting is reused for the entire lifespan of the variation, which is essentially forever. A variation with variation-scoped qualification durability will be qualified once, and the qualification decision will be reused for as long as this variation is defined in the schema. A variation with variation-scoped targeting durability will be targeted once, and the targeting decision will be reused for as long as the variation is defined in the schema.
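
The three scopes differ only in how long a decision is reused. The following minimal sketch models that reuse rule for the qualification decision of a single variation. It is not Variant's implementation: the class, the in-memory maps and the use of a user ID as the persistence key are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class DurabilitySketch {

    enum Scope { STATE, SESSION, VARIATION }

    // Session scope: decisions kept per session ID (hypothetical in-memory store).
    private final Map<String, Boolean> perSession = new HashMap<>();
    // Variation scope: decisions kept per user ID for as long as the variation
    // exists (hypothetical stand-in for persistent storage).
    private final Map<String, Boolean> persisted = new HashMap<>();

    /** Returns the qualification decision, invoking the qualifier only when no reusable decision exists. */
    boolean qualify(Scope scope, String userId, String sessionId, Supplier<Boolean> qualifier) {
        switch (scope) {
            case SESSION:
                return perSession.computeIfAbsent(sessionId, id -> qualifier.get());
            case VARIATION:
                return persisted.computeIfAbsent(userId, id -> qualifier.get());
            default:
                // STATE: the outcome is never reused; re-qualify on every state request.
                return qualifier.get();
        }
    }

    public static void main(String[] args) {
        DurabilitySketch store = new DurabilitySketch();
        int[] calls = {0};
        Supplier<Boolean> qualifier = () -> { calls[0]++; return true; };
        // The same user returns in a second session.
        store.qualify(Scope.VARIATION, "user1", "session1", qualifier);
        store.qualify(Scope.VARIATION, "user1", "session2", qualifier);
        System.out.println("qualifier invoked " + calls[0] + " time(s)");
    }
}
```

The same rule applies to targeting decisions, which is why the two durabilities can be declared independently.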

3. Variation Schema Reference

3.1 Syntactical Conventions

The Variant server manages code variation metadata in human readable YAML files, called schema files. Each schema file contains a single CVM schema describing a set of related code variations instrumented on some host application using the familiar YAML syntax. Keys are reserved keywords that have specific meanings. Keywords are case-insensitive, e.g. 'name:', 'Name:', or 'NAME:' are interchangeable.

The following conventions are used throughout this chapter:

| Notation | Meaning |
|---|---|
| any-string | Arbitrary, case sensitive, Unicode string. Follow YAML’s escape rules if you want a string to contain special characters. |
| name-string | Case insensitive string containing only Unicode letters, digits or the ‘_’ (underscore) and not starting with a digit. For example, _mySchema is a valid name and is indistinguishable from, e.g. _MYSCHEMA, but 3rdField is not a valid name. |
| boolean | YAML Boolean value of true or false. |
| number | YAML numeric value. |
| literal | Arbitrary YAML literal. This may be a primitive type, an object, or an array. |
| {type} | YAML mapping (dictionary) literal of a given type. |
| [type] | YAML sequence of mappings of a given type. |
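
The name-string rule can be expressed as a regular expression. The sketch below is one way a host application might pre-validate names before writing them into a schema; the class and method names are hypothetical, and the server's own parser may differ in detail (for example, in which Unicode digit categories it accepts).

```java
import java.util.regex.Pattern;

final class NameStringSketch {

    // A Unicode letter or underscore, followed by any number of Unicode letters,
    // decimal digits or underscores. The leading character may not be a digit.
    private static final Pattern NAME = Pattern.compile("[\\p{L}_][\\p{L}\\p{Nd}_]*");

    /** True if the string satisfies the name-string rule as described above. */
    static boolean isValidName(String candidate) {
        return NAME.matcher(candidate).matches();
    }

    /** Names are case insensitive, so two names are the same if they differ only by case. */
    static boolean sameName(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(isValidName("_mySchema"));            // valid name
        System.out.println(isValidName("3rdField"));             // starts with a digit
        System.out.println(sameName("_mySchema", "_MYSCHEMA"));  // differ only by case
    }
}
```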

3.2 Schema Top Level Keys

The top-level keys in CVM grammar are:

| Key | Type | Required | Comment | Default |
|---|---|---|---|---|
| name | name-string | Yes | The schema name. Must be server-wide unique. | |
| description | any-string | No | Optional description. | None |
| states | [state] | Yes | A list of the host application’s interface states. | |
| variations | [variation] | Yes | A list of potentially inter-dependent code variations. Code variations are complex structures packing most of CVM’s power. At a minimum, a variation must have a name, at least one experience, of which exactly one must be declared control, and at least one onStates element, establishing the list of states on which this experiment or feature flag is instrumented. The control experience typically represents the existing code path and zero or more variant experience(s) represent the alternate code path(s). | |
| flusher | {flusher} | No | Defines a schema-specific trace event flusher. Applies to all trace events generated by the variations configured by this schema. | As configured by the [todo] server config properties. |
| hooks | [hook] | No | A list of schema-scoped lifecycle hook specifications. Hooks defined at this scope apply to all states and all variations defined by this schema. | [] |

3.2 States

The states list key contains a list of state elements, each representing a node in the application state graph. A state element can have the following keys:

| Key | Type | Required | Comment | Default |
|---|---|---|---|---|
| name | name-string | Yes | This state’s name. | |
| parameters | [state-parameter] | No | This state’s parameters. | [] |
| hooks | [hook] | No | A list of state-scoped lifecycle hook specifications. Hooks defined at this scope apply only to this state. | [] |

For example:

states:
  - name: state1
    parameters:
      key1: "a string"
      key2: "a string"
  - name: state2
    parameters:
      key1: "a string"
      key3: "a string"

State parameters help host applications enrich variation schema with application-specific state. They are simple key/value pairs of strings, whose semantics are entirely up to the host application. State parameters can be specified either at the state or at the state variant level as explained further.

Each state-scoped hook must listen to a lifecycle event descended from [todo] StateAwareLifecycleEvent. For more information, see [todo] Section 4.1 Lifecycle Hooks.

Note that from Variant’s point of view states form a set, not a graph; i.e. user sessions are free to traverse states in any order.

3.3 Variations

3.3.1 Variation Top Level Keys

Code variations are described in the variations list key. Each variation element is a mapping with the following keys:

| Key | Type | Required | Comment | Default |
|---|---|---|---|---|
| name | name-string | Yes | The variation’s name. | |
| experiences | [experience] | Yes | This variation’s experiences. | |
| onStates | [on-state] | Yes | The mappings of this variation to states it instruments. | |
| isOn | boolean | No | Is this variation online? | true |
| concurrentVariations | [name-string] | No | A list of previously defined variation names conjointly concurrent with this variation. | [] |
| durability | {durability} | No | Durability specification. | See [todo] |
| hooks | [hook] | No | A list of variation-scoped lifecycle hook specifications. Hooks defined at this scope apply only to this variation. | [] |

For example:

variations:
  - name: myVariation
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - state: state1
      - state: state2

Each variation-scoped hook must listen to a lifecycle event descended from [todo] VariationAwareLifecycleEvent. For more information, see [todo] Section 4.1 Lifecycle Hooks.

The isOn property is used to take an experiment or a feature toggle temporarily offline without removing it from the schema. No sessions are targeted for an offline variation, as if it didn’t even exist. In fact, the only difference between an offline variation and a variation that is completely removed from the schema is that, if the offline variation defines its targeting or qualification durability as variation, this information is preserved. In practice this means that, after an offline variation is taken back online, returning users will see the same experience they saw before the variation was taken offline.

3.3.2 Variation Experiences

Each element of a variation’s experiences property describes one of its experiences.

| Key | Type | Required | Comment | Default |
|---|---|---|---|---|
| name | name-string | Yes | The experience’s name. | |
| isControl | boolean | Yes, unless this variation has only one experience. | Defines whether this experience is the control experience in this variation. | true if the only experience in this variation, false otherwise. |

3.3.3 Durability

Variation durability determines the retention rules for qualification and targeting decisions with respect to the given variation. The complete durability specification looks as follows:

durability: 
  qualification: state|session|variation
  targeting: state|session|variation

| Value | Meaning |
|---|---|
| state | The decision is retained for the duration of the state request only. |
| session | (Default.) The decision is retained for the duration of the user session, but not persisted. |
| variation | The decision is persisted and retained for as long as this variation remains in the schema. |

For details, refer to Section 4.4. Qualification and Targeting Durability.

3.3.4 OnStates

The onStates key contains a list of elements, each of which describes this variation’s instrumentation details on a particular state. Whenever a variation instruments a state with an onStates element, this implies an obligation, on the part of the host application, to provide an implementation of a state variant for any experience defined by the variation.

An onStates element can have the following keys:

| Key | Type | Required | Comment | Default |
|---|---|---|---|---|
| state | name-string | Yes | The name of the state being mapped by this variation. | |
| experiences | [name-string] | No | The list of this variation’s experiences defined on this state. | The list of all of this variation’s experiences. |
| variants | [state-variant] | No | The list of this state’s variants. | [] |

For example, the following listing defines an experiment on two consecutive pages of a signup wizard:

states:
  - name: page1
  - name: page2
variations:
  - name: WizardTest
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - state: page1
      - state: page2

If the experiment is testing a new combined page that replaces the two existing pages with only one, the experiences key must be used to explicitly list those experiences that are defined:

states:
  - name: page1
  - name: page2
variations:
  - name: WizardTest
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - state: page1
      - state: page2
        experiences: [control] # page2 is not instrumented by variant experience

The experiences list on line 13 implies an obligation, on the part of the host application, not to attempt to target a session for page2 if its live experience is WizardTest.variant. Doing so will result in a runtime error.

Conversely, if we wanted to test splitting an existing page1 into two new pages page1 and page2, line 13 would list the variant experience instead:

states:
  - name: page1
  - name: page2
variations:
  - name: WizardTest
    experiences:
      - name: control
        isControl: true
      - name: variant
    onStates:
      - state: page1
      - state: page2
        experiences: [variant] # page2 is not instrumented by control experience

3.3.5. State Variants and State Parameter Resolution

For each element of the onStates list, the Variant schema parser creates the state variant space as the Cartesian product of the set of this variation’s experiences and the experience sets of all variations conjointly concurrent with this variation and also defined on this state. All state variants in this variant space implicitly inherit the state parameters as defined for this state, if any.

However, in some cases, it is useful to also create state parameters at the state variant level. This is accomplished with the variants key:

# ...
    onStates:
      - state: state1
        variants:
          - experience: variant
            parameters:
              path: '/path/to/something'

In most cases, this inferred state variant space is sufficient, and you will only need to define state variants explicitly if you wish to override one or more state variants.

An element of the variants key can contain the following keys:

| Key | Type | Required | Comment | Default |
|---|---|---|---|---|
| experience | name-string | Yes | The name of this variation’s experience. Cannot be the control experience. | |
| concurrentExperiences | [name-string] | No | The list of concurrent experiences defining this state variant. Cannot be a control experience. | [] |
| parameters | {name:value,...} | No | Arbitrary properties dictionary. | {} |

If a state parameter is defined at both the base state and a state variant, the state variant value overrides the base value:

name: example
states:
  - name: state1
    # State parameters, specified at the state level,
    # provide the base values for all variants of this state.
    parameters:
      key1: value1
      key2: value2
variations:
  - name: variation1
    experiences:
      - name: existing
        isControl: true
      - name: variant
    onStates:
      - state: state1
        variants:
          - experience: variant
            # State parameters, specified at the state variant level,
            # at runtime override the like-keyed base values within
            # the scope of the enclosing state variant. 
            parameters:
              key2: 'value2 in state variant'
              key3: 'value3 in state variant'

At runtime, Variant will return the following values to the host application:

stateRequest.getResolvedStateParameters().get("key1");  // "value1"
stateRequest.getResolvedStateParameters().get("key2");  // "value2 in state variant"
stateRequest.getResolvedStateParameters().get("key3");  // "value3 in state variant"

This mechanism of state parameter overrides is a convenient way for the developer to introduce application state into the schema at both global and local scopes.
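
The override rule amounts to a map merge in which state variant entries win over like-keyed base entries. The following sketch reproduces the resolution of the example above; the class and method names are hypothetical and only illustrate the rule, not the server's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ParameterResolutionSketch {

    /** Starts from the base state's parameters, then lets the state variant's parameters override them. */
    static Map<String, String> resolve(Map<String, String> base, Map<String, String> variant) {
        Map<String, String> resolved = new LinkedHashMap<>(base);
        resolved.putAll(variant); // like-keyed entries are replaced, new keys are added
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> base = Map.of("key1", "value1", "key2", "value2");
        Map<String, String> variant = Map.of(
                "key2", "value2 in state variant",
                "key3", "value3 in state variant");
        System.out.println(resolve(base, variant));
    }
}
```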

3.4. Concurrent Variations

3.4.1. Definitions

If two variations instrument no states in common, they are called serial variations; a user session can only traverse them one at a time. Conversely, whenever two code variations instrument some states in common, they are called concurrent variations because a user session may be traversing them concurrently. (The term “overlapping experiments” is also commonly used.)

Concurrent variations are more likely than it may first seem, because of the Pareto principle: your users spend 80% of their time on 20% of your pages. These higher-contention code paths are very likely to be instrumented by multiple concurrent experiments and feature flags. Variant’s Code Variation Model gives you a cogent abstraction to manage this concurrency.

In Figure 5 below, the Blue and the Green variations are serial, but the Red variation is concurrent with both of them.

Figure 5. Concurrent experiments. Blue and Green variations are serial, while Red is concurrent with both Blue and Green. The grey boxes denote control states, while the colored ones denote state variants.

When a user session targets a state that is instrumented by two or more variations, there is a state variant space of possible experience combinations from which any state variant can be chosen. For example, the state S2 is instrumented by the Blue and Red variations. Besides its control experience, Blue has one variant experience and Red has two, so the complete variant space of the state S2 has 2 × 3 = 6 cells:

Figure 6. Variant space of the state S2 has one control, three proper, and two hybrid state variants.
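
The cell count can be reproduced by enumerating the Cartesian product of the two experience sets. In the sketch below the experience names are made up, and the classification is our reading of the caption: a proper state variant has exactly one non-control experience, a hybrid one has two.

```java
import java.util.ArrayList;
import java.util.List;

final class VariantSpaceSketch {

    enum Kind { CONTROL, PROPER, HYBRID }

    /** One cell of the variant space: one experience from each of two concurrent variations. */
    static final class Cell {
        final String first;
        final String second;

        Cell(String first, String second) {
            this.first = first;
            this.second = second;
        }

        Kind kind() {
            int nonControl = 0;
            if (!first.equals("control")) nonControl++;
            if (!second.equals("control")) nonControl++;
            if (nonControl == 0) return Kind.CONTROL;
            return nonControl == 1 ? Kind.PROPER : Kind.HYBRID;
        }
    }

    /** Cartesian product of the experience lists of two variations instrumenting the same state. */
    static List<Cell> variantSpace(List<String> firstExperiences, List<String> secondExperiences) {
        List<Cell> cells = new ArrayList<>();
        for (String a : firstExperiences) {
            for (String b : secondExperiences) {
                cells.add(new Cell(a, b));
            }
        }
        return cells;
    }

    static long count(List<Cell> cells, Kind kind) {
        return cells.stream().filter(cell -> cell.kind() == kind).count();
    }

    public static void main(String[] args) {
        // Blue: control plus one variant. Red: control plus two variants.
        List<Cell> s2 = variantSpace(
                List.of("control", "blue"),
                List.of("control", "red1", "red2"));
        System.out.println(s2.size() + " cells: "
                + count(s2, Kind.CONTROL) + " control, "
                + count(s2, Kind.PROPER) + " proper, "
                + count(s2, Kind.HYBRID) + " hybrid");
    }
}
```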

The relationship of concurrence between two variations V1 and V2 has the following properties:

  • Symmetric: If variation V1 is concurrent with variation V2, then V2 is concurrent with V1. The variation schema grammar takes advantage of this by requiring the concurrency relationship to be defined by the variation that appears in the schema after the referenced variation.
  • Not Reflexive: a variation cannot be concurrent with itself.
  • Not Transitive: If V1 is concurrent with V2 and V2 is concurrent with V3, then V1 and V3 need not be concurrent.

Variant server supports two runtime strategies for managing concurrent variations: a simplified, pseudo-serial strategy, called disjoint concurrency and the more powerful conjoint concurrency, as discussed in the next two sections.

3.4.2. Disjoint Concurrency

First, let’s consider the pseudo-serial execution, where no user session is targeted for a hybrid state variant. To support the Blue variation by itself, the application developer needs to implement the S2blue experience. Similarly, to support the Red variation in isolation, a (probably different) developer needs to implement its two variant experiences S21red and S22red. This is a perfectly acceptable scenario, so long as no user session ends up targeted to variant experiences in both variations. If that were to happen, the host application would have no code path implementing both the S2blue and S21red state variants at once.

This type of constrained concurrency is referred to as disjoint concurrency and is the default behavior. Unless instructed otherwise, Variant will not target a user session to two variant experiences in two concurrent variations. This default makes sense: application developers should not have to coordinate with each other simply because they work on potentially overlapping features.

However, this convenient default comes with a price:

  • Potential starvation of downstream variations of user traffic.
  • Potential bias in downstream variations.

3.4.3. Conjoint Concurrency

The unconstrained concurrency mode, where a session’s ability to participate in Red variation is not constrained by its participation in Blue variation, and vice versa, is referred to as conjoint concurrency. To instrument two conjointly concurrent variations, the application developer has to do the following:

  • Implement all hybrid experiences, e.g. the two hybrid state variants shaded in two colors in Figure 6 above.
  • Instruct Variant to treat the two variations conjointly by using the concurrentVariations schema key.

Listing 4 below is the complete variation schema for the Blue, Red and Green variations from Figure 5 above. To illustrate both concurrency modes, the Red and Blue variations are defined as conjoint and the Green and Red variations as disjoint.

name: 'Tricolor Schema'
description: 'Demonstrates instrumentation of concurrent variations in Figure 5'
states:
  - name: S1
  - name: S2
  - name: S3
  - name: S4
variations:
  - name: Blue
    experiences:
      - name: grey
        isControl: true
      - name: blue
    onStates:
      - state: S1
      - state: S2
  - name: Red
    # Red is conjointly concurrent with Blue
    concurrentVariations: ['Blue']
    experiences:
      - name: grey
        isControl: true
      - name: red_1
      - name: red_2
    onStates: 
      - state: S2
      - state: S3
  - name: Green
    # Serial with Blue and disjointly concurrent with Red
    experiences:
      - name: grey
        isControl: true
      - name: green
    onStates:
      - state: S3
      - state: S4
        # S4 does not exist in control.
        experiences: [green]

Listing 4. The Tricolor variation schema of concurrent tests from Figure 5.

Note the explicit state variant for the Green variation’s control experience on state S4. It is needed in order to declare it as phantom, to account for the fact that there is no control state variant on S4, i.e. that a user session cannot be targeted for S4 if it has already been targeted to the control experience in the Green variation.
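
The difference between the two concurrency modes can be expressed as a simple eligibility check. The sketch below is an illustration under stated assumptions, not Variant's actual targeting algorithm: it assumes that two variations are concurrent exactly when they instrument at least one state in common, and the class ConcurrencySketch, its methods and its parameters are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Illustrative only: may a session be targeted to a variant experience in a given variation? */
final class ConcurrencySketch {

    /** Assumption: variations are concurrent when they share at least one instrumented state. */
    static boolean concurrent(Set<String> statesA, Set<String> statesB) {
        return !Collections.disjoint(statesA, statesB);
    }

    /**
     * @param candidate        variation the session is about to be targeted for
     * @param alreadyInVariant variations in which the session already has a variant experience
     * @param onStates         variation name -> names of the states it instruments
     * @param conjointPairs    unordered pairs of variations declared conjointly concurrent
     */
    static boolean mayTargetToVariant(String candidate,
                                      Set<String> alreadyInVariant,
                                      Map<String, Set<String>> onStates,
                                      Set<Set<String>> conjointPairs) {
        for (String other : alreadyInVariant) {
            if (other.equals(candidate)) continue; // a variation is not concurrent with itself
            boolean overlap = concurrent(onStates.get(candidate), onStates.get(other));
            boolean conjoint = conjointPairs.contains(Set.of(candidate, other));
            if (overlap && !conjoint) return false; // disjoint concurrency: fall back to control
        }
        return true;
    }
}
```

With the Tricolor schema, a session already in the blue experience may still receive a variant experience in Red (conjoint) and in Green (serial with Blue), whereas a session already in red_1 or red_2 may not receive the green experience (disjoint). The last case also shows how disjoint concurrency can starve a downstream variation of traffic.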

4 Variant Runtime

4.1 The Lifecycle of a State Request

Code Variation Model treats interactive applications as finite state machines. Each user session traverses some state graph, whose nodes are application states, where the host application pauses for user input. Whenever a session navigates to the next application state, the host application must determine if this state exists in more than one variant (i.e. if it is instrumented by any code variations), and, if so, determine which of these variants to return. This inference is known as targeting of a session for a state and is accessed via the Session.targetForState(state) client method. It returns the StateRequest object which can be further examined for the list of live experiences in all variations instrumenting this state. Before any targeting can happen, the session must be created first with the Connection.getOrCreateSession() method.

See Variant CVM Java Client for more details.

A Variant session can be thought of as a succession of consecutive state requests, each advancing it from one application state to the next. Variant sessions provide:

  • A way to identify a user across multiple state requests;
  • Storage for the session state that must be preserved between state requests;
  • Metadata isolation context.

Variant server acts as the centralized session repository, accessible to any Variant client by the session ID. All clients sharing a session are guaranteed a consistent view of the session state. Sessions are expired after a configurable period of inactivity.

Variant hides any changes to variation schema from active sessions, which continue to see the variation metadata as it was at the time when the sessions were created. This isolation guarantee is critical in protecting user sessions from (potentially fatal) inconsistencies. For example, if a variation is taken offline, or one of its variant experiences is dropped, existing sessions, currently traversing this variation, would be thrown out of their experiences, if this change were visible.

Note that Variant sessions are completely separate from the host application’s own native sessions. Variant sessions are configured independently and do not require that the host application even have any native notion of a session.

Much of the complexity that Variant server hides from the application developer is inside the Session.targetForState(state) method. For each variation instrumented on the given state, Variant server must perform the steps shown in Figure 7:

Figure 7. Qualification and targeting of a session.

4.2 Session Qualification

Qualification is a distinct idea from targeting. It directly models feature flags as single-experience variations gated by a qualification hook, but it is equally useful for implementing experiments, where a clear delineation between qualification and targeting is essential. For example, suppose that a newspaper wants to test promotional rates, offered on its website. This promotion cannot be combined with another promotion, so the traffic coming from other promotional offers must be disqualified from the experiment.

Variant server will consider pre-existing qualification information, subject to the durability rules.

Whenever Variant determines that the calling session’s qualification for a particular code variation must be (re)established, it raises the VariationQualificationLifecycleEvent lifecycle event, which posts eligible lifecycle hooks. If none were defined or none returned a result, the default built-in qualification hook is posted, which unconditionally qualifies all sessions for all variations. For more information on lifecycle hooks, refer to Section 5.1.

If the session is disqualified, it is assigned to the control experience, but not targeted for it. The difference is that

If a session is qualified, Variant server proceeds to targeting it for the requested state.

4.3 Session Targeting

Targeting a session for a state produces the session’s set of live variation experiences on that state. Variant server will consider pre-existing targeting information, subject to the durability rules.

Even in the serial case, when the requested state is instrumented by only one variation, the targeting algorithm is complex. If the requested state is partially instrumented, Variant server will only consider those experiences that are defined on this state. The complexity of the targeting algorithm grows dramatically for concurrent variations.

Whenever, inside the targetForState(state) method, Variant determines that the calling session must be (re)targeted for a particular state+variation combination, it raises the VariationTargetingLifecycleEvent lifecycle event, which posts eligible lifecycle hooks. If none were defined or none returned a result, the default built-in targeting hook is posted, which targets randomly, according to the sampling weights provided in the schema, e.g.:

#...
variations:
  - name: Blue
    experiences: 
      - name: grey
        isControl: true
        weight: 9   # Sampling weight
      - name: blue
        weight: 1   # Sampling weight
    onStates:
      - state: S1
      - state: S2
#...
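
To make the effect of sampling weights concrete, here is a minimal sketch of weighted random selection. It is not Variant's built-in targeting hook, whose implementation is not shown in this guide; the class WeightedTargetingSketch and its pick() method are hypothetical, and the sketch assumes non-negative integer weights.

```java
import java.util.Map;
import java.util.Random;

/** Illustrative only: picks an experience with probability proportional to its weight. */
final class WeightedTargetingSketch {

    static String pick(Map<String, Integer> weights, Random random) {
        int total = 0;
        for (int weight : weights.values()) {
            if (weight < 0) throw new IllegalArgumentException("Weights must be non-negative");
            total += weight;
        }
        if (total == 0) throw new IllegalArgumentException("At least one weight must be positive");

        int roll = random.nextInt(total); // uniform in [0, total)
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            roll -= entry.getValue();
            if (roll < 0) return entry.getKey(); // roll landed inside this experience's slice
        }
        throw new IllegalStateException("Unreachable: roll is always less than the total weight");
    }
}
```

With the weights above (grey: 9, blue: 1), about one session in ten would be targeted to the blue experience and the rest to control.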

4.4 Qualification and Targeting Durability

Once a session has been qualified or targeted for a variation, the natural question is how long this qualification or targeting decision should remain in effect. Variant supports three durability scopes: request, session and variation, which represent these three guarantees:

  • The request-scoped durability is the most volatile: the qualification or targeting decision is not preserved between state requests. It is unlikely to be appropriate for retaining targeting decisions, but may be quite useful for qualification decisions. Suppose you want to test a new signup funnel experience which is alleged to improve conversion. Clearly, you want to qualify only those visitors who have not yet signed up. However, if they do sign up in the course of a session, you don’t want them to continue seeing the experiment experience.
  • The session-scoped durability preserves the qualification or targeting decision between state requests and for the duration of the current session. This is the default behavior and it requires no database I/O. However, it is not strong enough if the qualification or targeting decision must be preserved between user sessions.
  • The variation-scoped durability is commonly used for the retention of targeting information for the lifespan of the variation. This is a common strategy for those experiments where it is desirable that returning users see the same experience. Variant server saves variation-scoped durability decisions to the embedded RocksDB database. Although RocksDB is extremely fast, Variant server further limits the overhead by making at most one random database read per session.

Durability is declared in the variation schema using the durability key, as explained in Section 3.3.3 Durability. For example, we can re-write [todo] listing … with explicitly declared qualification and targeting durability as follows:

#...
variations:
  - name: Blue
    experiences: 
      - name: grey
        isControl: true
        weight: 9
      - name: blue
        weight: 1
    onStates:
      - state: S1
      - state: S2
    durability:
      qualification: session # Default, could have been omitted
      targeting: variation
#...

Listing 5. The Tricolor variation schema with different levels of qualification longevity.

Request-scoped and session-scoped durability do not require that the user be recognized. Variation-scoped durability, however, requires a database key unique to the user, so that the user can be recognized in a different Variant session. The host application sets the value of this key in the ownerId argument to the Connection.getOrCreateSession() method:

User appUser = ... // Represents the host application's User object
Session ssn = variantConnection.getOrCreateSession(userData, appUser.userId);

Within a session, qualification and targeting are guaranteed to be stable, because Variant sessions are isolated from any schema changes, as explained in Section 4.5 Schema Management. However, variation-scoped durability cannot be guaranteed unconditionally, because the variation schema may have changed between two consecutive sessions. Consider the following scenario:

  1. Your schema contains two conjointly concurrent variations, both defined with variation-scoped targeting durability.
  2. Some user has traversed these variations and was randomly targeted to variant experiences in both.
  3. A bug was discovered in the hybrid experience and you’ve changed concurrency to disjoint in order to avoid the hybrid experience.
  4. The same user visits again. Her targeting information is no longer consistent with the schema and must be revised.

When cases like this arise, Variant will discard the least recently used targeting decision and re-target.

4.5 Schema Management

When Variant server starts, it looks for variation schema files in the schemata directory and attempts to deploy them. A schema file must contain exactly one uniquely named Variant schema. There is no requirement that the schema file name match that of the schema it contains, though it is recommended that you name each schema file similarly to the schema therein.

For each schema file in the schemata directory Variant server takes these steps:

  1. Parse. Any messages emitted by the parser are written to the server log file.
  2. Deploy schema, if no parse errors. If any parser errors were encountered, Variant server skips this schema file. Otherwise, provided no already deployed schema has the same name, Variant will deploy this schema.

In order to make any changes to the schema, its schema file must be edited and replaced in the schemata directory. It is not necessary to restart Variant server to redeploy a schema. The new file timestamp will be detected and Variant server will attempt to parse it and re-deploy its schema by following these steps:

  1. Parse the schema file. Any messages emitted by the parser are written to the server log file.
  2. Deploy if no parse errors. If any parser errors were encountered, Variant server skips this schema file. Otherwise, Variant will attempt to deploy this schema, subject to the following conditions:
    1. If no currently deployed schema has the same name as this schema, this schema is deployed.
    2. If a currently deployed schema has the same name as this schema, their respective file names must also be the same.
  3. If a schema with the same name is currently deployed from the same file, it is undeployed and the new one is deployed in its place.
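
The redeployment rules above can be summarized as a small decision function. This is a sketch of one reading of those rules, not Variant server code: the class SchemaRedeploySketch, the Outcome enum and the registry of deployed schemata (schema name mapped to file name) are hypothetical, and the sketch assumes that a name clash between different files causes the new file to be rejected.

```java
import java.util.Map;

/** Illustrative only: decides what happens when a changed schema file is detected. */
final class SchemaRedeploySketch {

    enum Outcome { SKIPPED, DEPLOYED, REPLACED, REJECTED }

    /**
     * @param deployed       currently deployed schemata: schema name -> file name (updated in place)
     * @param schemaName     name of the schema found in the changed file
     * @param fileName       name of the changed file
     * @param hasParseErrors whether the parser reported any errors
     */
    static Outcome redeploy(Map<String, String> deployed,
                            String schemaName,
                            String fileName,
                            boolean hasParseErrors) {
        if (hasParseErrors) return Outcome.SKIPPED;  // step 2: the file is skipped

        String currentFile = deployed.get(schemaName);
        if (currentFile == null) {                   // condition 1: no schema by this name
            deployed.put(schemaName, fileName);
            return Outcome.DEPLOYED;
        }
        if (currentFile.equals(fileName)) {          // condition 2 holds: same name, same file
            deployed.put(schemaName, fileName);      // step 3: new generation replaces the old
            return Outcome.REPLACED;
        }
        return Outcome.REJECTED;                     // same name from a different file
    }
}
```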

To undeploy a currently deployed schema, simply remove the corresponding schema file.

Whenever a schema is undeployed, Variant server holds on to its in-memory representation until all active sessions connected to it have naturally expired. All new sessions are created against the currently deployed generation, if any. This session draining isolates active sessions from schema updates, which is instrumental in Variant’s ability to provide stable qualification and targeting. In practice this means that, for instance, you can shut off a feature flag without worrying about disrupting active users who are already in the experience.

4.6 Trace Event Logging

Variant trace events are generated by user traffic, as it flows through code variations, with the purpose of subsequent analysis by a downstream process. Trace events can be triggered implicitly, by Variant, or explicitly by the host application. In either case, the host application can attach attributes to these events, to aid in the downstream analysis.

The only implicit trace event is the state visited event (SVE). It is created at the start of a state request, [todo] Figure 3, and triggered when a StateRequest is committed or failed. This gives the host application a chance to attach custom attributes to the event. For example, if the host application caught an exception, it may wish to set the status of the event to error, and add the name of the class that threw the exception. This information can be used downstream to exclude this session from the statistical analysis (if this is an experiment), or to shut off the variation (if this is a feature flag).

Explicit trace events are triggered by calling the Session.triggerTraceEvent() method.

Trace events are egested onto external storage via Trace Event Flushers which are part of the Extension API, discussed next.

5. Extending Variant Server

Variant CVM Server’s default behavior can be extended via the server-side Extension API. It supports creation and configuration of user code which runs in the server’s address space, augmenting the server’s default behavior with custom semantics. ExtAPI exposes two principal extension mechanisms: lifecycle hooks and trace event flushers. They are configured in the variation schema and made available to the server’s JVM at run time via the /ext directory.

Refer to the Variant CVM Server Reference for further details on configuring ExtAPI.

5.1. Lifecycle Hooks

The ScheduleVisitTest from Listing 2 above defined a lifecycle hook class UserQualifyingHook, which disqualifies black-listed users from the experiment. Here’s the relevant section from Listing 2:

    ...
          'hooks': [
            {
              // Disqualify blacklisted users.
              'class':'com.variant.extapi.std.demo.UserQualifyingHook',
              'init': {'blackList':['Nikita Krushchev']}
            } 
          ]
    ...

Lifecycle event hooks are callback methods, executed by Variant server when corresponding lifecycle events are raised. For example, when a user session must be qualified or targeted for a particular variation, two corresponding lifecycle events are raised: the session qualification event and the session targeting event. If you have defined custom hooks for these events, Variant will post them by calling their post() method.

Lifecycle hooks provide a way to extend Variant server’s default behavior with application-specific semantics. They are executed in the server process’s address space and are highly reusable modules encapsulating common semantics and having their own lifecycle, independent of that of the host application.

Depending on where a hook is defined in the schema, it may have the global (or meta) scope, a state scope or a variation scope. Global hooks are defined in the meta section and apply to all states and all variations in this schema. A state-scoped hook only applies to the state with which it is defined, and a variation-scoped hook applies only to the variation with which it is defined.

In any scope, any number of hooks can be defined. If more than one lifecycle hook is eligible to be posted by a lifecycle event at runtime, they form a hook chain. More locally defined hooks are posted before the global ones on the chain, and within a scope hooks are posted in ordinal order. The hooks are posted serially, until a hook’s post() method returns a non-empty Optional. If no custom hooks have been defined for a lifecycle event, or all returned an empty Optional, the default built-in hook for the event is posted, which is guaranteed to return a usable value.
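
The chain semantics described above can be sketched as follows. This is a simplified illustration, not the ExtAPI itself: the Hook interface and the postChain() method are hypothetical stand-ins, and the chain is assumed to arrive already ordered (more local scopes first, then ordinal order within a scope).

```java
import java.util.List;
import java.util.Optional;

/** Illustrative only: posts hooks serially until one of them returns a non-empty Optional. */
final class HookChainSketch {

    /** Hypothetical stand-in for a lifecycle hook: may or may not produce a result for an event. */
    @FunctionalInterface
    interface Hook<E, R> {
        Optional<R> post(E event);
    }

    static <E, R> R postChain(List<Hook<E, R>> orderedChain, Hook<E, R> defaultHook, E event) {
        for (Hook<E, R> hook : orderedChain) {
            Optional<R> result = hook.post(event);
            if (result.isPresent()) {
                return result.get(); // the first non-empty result ends the chain
            }
        }
        // No custom hook produced a result: fall back to the built-in default,
        // which is expected to always return a usable value.
        return defaultHook.post(event)
                .orElseThrow(() -> new IllegalStateException("Default hook returned no result"));
    }
}
```

For example, with a chain of two qualification hooks where the first returns an empty Optional and the second returns Optional.of(false), the session is disqualified and the default hook is never posted.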

For more information, refer to the Variant Server Reference.

5.2. Trace Event Flushers

Event flushers handle the terminal ingestion of Variant trace events. A typical event flusher writes them to a persistent storage mechanism, such as an external database or event stream. Whenever a trace event is triggered, whether implicitly by Variant server or explicitly by user code, it is picked up by Variant’s asynchronous event writer, where it is held in a memory buffer until a dedicated flusher thread becomes available. There is one event writer per Variant server, shared by all schemata. The event writer groups trace events by the schema that produced them and turns them over to the appropriate event flusher by calling its flush() method.

The size of the trace event buffer passed to the flush() method is configured by the variant.event.writer.flush.size server config property, whose value refers to the number of trace events held in a single flush buffer. The overall size of the event writer cache is configured by the variant.event.writer.flush.buffers server config property, whose value refers to the total number of flush buffers available to the event writer. The larger the number of flush buffers, the better the event writer is able to cope with bursts of trace events, but at the price of a larger memory footprint.

Whenever the event writer is not keeping up with the event load, it will discard new events (with an error message to the server log) until a flush buffer becomes available.
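
The discard-on-overload behavior can be illustrated with a bounded queue. The sketch below is a simplified model, not Variant's event writer: the class BoundedEventWriterSketch is hypothetical, events are plain strings, and the total capacity is assumed to be the flush size multiplied by the number of flush buffers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a bounded event writer that discards new events when it is full. */
final class BoundedEventWriterSketch {

    private final int flushSize;
    private final BlockingQueue<String> pending;
    private final AtomicInteger discarded = new AtomicInteger();

    BoundedEventWriterSketch(int flushSize, int flushBuffers) {
        this.flushSize = flushSize;
        this.pending = new ArrayBlockingQueue<>(flushSize * flushBuffers);
    }

    /** Never blocks the caller: returns false and logs an error if the event had to be discarded. */
    boolean write(String event) {
        if (pending.offer(event)) {
            return true;
        }
        discarded.incrementAndGet();
        System.err.println("Event writer is full, discarding trace event: " + event);
        return false;
    }

    /** Called by a flusher thread: removes and returns up to one flush buffer worth of events. */
    List<String> nextFlushBuffer() {
        List<String> buffer = new ArrayList<>(flushSize);
        pending.drainTo(buffer, flushSize);
        return buffer;
    }

    int discardedCount() {
        return discarded.get();
    }
}
```

Discarding rather than blocking keeps trace logging from slowing down state requests during a burst, at the cost of losing some events.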

A few ready-made event flushers, intended for saving trace events in popular databases such as PostgreSQL and MySQL, are part of the standard extension library included with Variant server. These can be configured and used out of the box.

It is also straightforward to create a custom event flusher by implementing the TraceEventFlusher interface. See the Variant Server Reference Guide for more information.

Appendix A Analyzing Variant Controlled Experiments

A.1. Trace Event Data Aggregation

Each Variant experiment is designed with particular target metric(s) in mind. But regardless of the target metric(s), the starting data point is always a time series of trace events, such as the page visited event, which must be aggregated into a time series of measurements, such as revenue as a function of the number of users through the experiment. The details of this aggregation step depend entirely on the persistence mechanism you’ve chosen for your trace events. If your flusher inserts them into a relational database, you will likely use SQL. A distributed data processing framework, like Apache Hadoop, can also be successfully deployed for storage and aggregation of Variant trace events.

A.2. Statistical Analysis

The goal of an experiment is to:

  • Discover if there is a difference between control and variant experience(s) with respect to the target metric of interest;
  • Assess how certain we can be that this difference is not just random noise.

The latter can be accomplished with some well-known mathematical formulas developed in the field of statistical hypothesis testing. The fundamental idea there is to develop a procedure that will enable the researcher to make a claim about the entire population with a given degree of certainty, based on a set of sample observations. Refer to the Statistical Analysis of Variant Experiments white paper for more information.
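
As one illustration of such a procedure, the sketch below computes the pooled two-proportion z statistic for a conversion-rate metric. This is a standard textbook test chosen here as an example; it is not necessarily the method described in the white paper, and the class TwoProportionZTestSketch and its parameter names are hypothetical.

```java
/** Illustrative only: pooled two-proportion z-test for a conversion-rate metric. */
final class TwoProportionZTestSketch {

    /**
     * Returns the z statistic for the difference between the variant and control conversion rates.
     * The result is NaN or infinite if the pooled rate is exactly 0 or 1.
     */
    static double zStatistic(long controlConversions, long controlSessions,
                             long variantConversions, long variantSessions) {
        if (controlSessions <= 0 || variantSessions <= 0) {
            throw new IllegalArgumentException("Both samples must contain at least one session");
        }
        double controlRate = (double) controlConversions / controlSessions;
        double variantRate = (double) variantConversions / variantSessions;
        // Under the null hypothesis both experiences share one conversion rate, so pool the samples.
        double pooledRate = (double) (controlConversions + variantConversions)
                / (controlSessions + variantSessions);
        double standardError = Math.sqrt(
                pooledRate * (1 - pooledRate) * (1.0 / controlSessions + 1.0 / variantSessions));
        return (variantRate - controlRate) / standardError;
    }
}
```

For example, 100 conversions out of 1,000 control sessions against 130 out of 1,000 variant sessions gives a z statistic of about 2.1. An absolute value above 1.96 corresponds to significance at the conventional 5% level for a two-sided test.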