Fidelity and Biases

Dynamic Sampling allows Sentry to automatically adjust the amount of data retained based on how valuable the data is to the user. This is technically achieved by applying a sample rate to every event, which is determined by a set of rules that are evaluated for each event.

At the core of Dynamic Sampling there is the concept of fidelity, which translates to an overall target sample rate that should be applied across all events of an organization.

There are two available modes to govern the target sample rates for Dynamic Sampling: Automatic Mode and Manual Mode.

  • Automatic mode dynamically manages the target sample rate for each project based on the target sample rate for the organization, prioritizing lower volume projects to increase visibility.
  • Manual mode allows the user to set static target sample rates on a per-project basis that serve as the baseline sample rate before applying the dynamic biases outlined below. Target sample rates are not adjusted by the system.

Within this target sample rate, Dynamic Sampling can create a bias toward more meaningful data. This is achieved by constantly updating and communicating special rules to Relay, via a project configuration, which then applies targeted sampling to every event.

Concept of Fidelity

It is important to note that fidelity only determines an approximate target sample rate, so there is flexibility in creating exact sample rates. The ingestion pipeline, composed of Relay and other components, does not have the infrastructure to track volume, so it cannot create an actual weighted distribution within the target sample rate.

Instead, the Sentry backend computes a set of rules whose goal is to cooperatively achieve the target sample rate. Determining when and how to set these rules is part of the Dynamic Sampling infrastructure.

Sentry supports two fundamentally different types of sampling: trace sampling and transaction sampling. While this is completely opaque to the user, these rule types provide the basic building blocks for all Dynamic Sampling functionality and biases.

A trace is a collection of events that are related to each other. For example, a trace could contain events that start in your frontend and then trigger further events in your backend.

Trace sampling ensures that either all events of a trace are sampled, or none. That is, these rules always yield the same sampling decision for every event in the same trace. This requires the cooperation of SDKs and thus allows sampling only by project, release, environment, and transaction name.

To achieve trace sampling, SDKs pass all fields that can be sampled on in a Dynamic Sampling Context (DSC) (defined here) as they propagate traces. This ensures that every event from the same trace comes with the same DSC.
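The "all or none" property of trace sampling can be illustrated with a minimal sketch. It assumes the decision is derived deterministically from the trace ID by hashing it to a value in [0, 1); the function name `trace_sample_decision` and the use of SHA-256 are hypothetical choices for illustration and are not taken from Relay's implementation.

```python
import hashlib


def trace_sample_decision(trace_id: str, sample_rate: float) -> bool:
    """Return the same keep/drop decision for every event sharing trace_id.

    Hypothetical helper: the hashing scheme is an assumption for illustration.
    """
    # Map the trace ID to a stable value in [0, 1). Six bytes (48 bits) are
    # used so the division is exact in double precision.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:6], "big") / 2**48
    return value < sample_rate


# Two events from the same trace, sent by different projects.
frontend_event = {"trace_id": "771a43a4192642f0b136d5159a501700", "project": "frontend"}
backend_event = {"trace_id": "771a43a4192642f0b136d5159a501700", "project": "backend"}

# Both events yield the same decision because only the trace ID is hashed.
decisions = {
    trace_sample_decision(event["trace_id"], 0.25)
    for event in (frontend_event, backend_event)
}
```

Because the decision depends only on data shared by the whole trace, no coordination between the components that receive the individual events is needed.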

Trace Sampling

Transaction Sampling does not guarantee complete traces and instead applies to individual transactions by looking at the incoming transaction's body. It can be used to remove unwanted transactions from traces, or to individually boost transactions at the expense of incomplete contextual traces.

A bias is a set of one or more rules that are evaluated for each event. More specifically, each bias has a specific objective, which is expressed as a set of rules. You can learn more about rules on the architecture page here.

Sentry has already defined a set of biases that are available to all customers. These biases have different goals, but they can be combined to express more complex semantics.

Some of the biases defined by Sentry can be enabled or disabled in the UI, more specifically under Project Settings -> Performance, while others are enabled by default and cannot be disabled.

An example of how the UI looks is shown in the following screenshot (the content of this screenshot is subject to change):

Biases in the UI

This bias is used to prioritize traces that are coming from a new release. The goal is to increase the sample rate in the time window that occurs between the creation of a release and its adoption by users. The identification of a new release is done in the event_manager defined here.

Since the adoption of a release is not constant, we created a system of decaying rules, which interpolate between two sample rates over a given time window using a given function (e.g. linear). The idea is to reduce the sample rate over time, since the number of samples increases as the release gets adopted by users.

Sample Rate and Adoption

The latest release bias uses a decaying rule to interpolate between a starting sample rate and an ending sample rate over a time window that is statically defined for each platform (the list of times to adoption is defined here). For example, Android has a bigger time window than JavaScript because, on average, Android apps take more time to get adopted by users.
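As a rough illustration of a linear decaying rule, the following sketch interpolates between a starting and an ending sample rate over a time window. The function name, parameters, and the example values are assumptions made for illustration; they do not reflect Sentry's actual rule format or its per-platform windows.

```python
def decayed_sample_rate(
    start_rate: float,
    end_rate: float,
    window_seconds: float,
    elapsed_seconds: float,
) -> float:
    """Linearly interpolate from start_rate to end_rate over the window.

    Hypothetical helper: names and signature are illustrative only.
    """
    if window_seconds <= 0:
        return end_rate
    # Clamp progress to [0, 1] so the rate stays between the two endpoints
    # before the window starts and after it ends.
    progress = min(max(elapsed_seconds / window_seconds, 0.0), 1.0)
    return start_rate * (1.0 - progress) + end_rate * progress


# Example with made-up values: decay from 100% to 20% over one hour.
rate_at_release = decayed_sample_rate(1.0, 0.2, 3600, 0)
rate_halfway = decayed_sample_rate(1.0, 0.2, 3600, 1800)
rate_after_window = decayed_sample_rate(1.0, 0.2, 3600, 7200)
```

Other decay functions can be swapped in by changing how `progress` is mapped to the returned rate.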

This bias is used to prioritize traces coming from a development environment in order to increase the amount of data retained for such environments, since they are more likely to be useful for debugging.

To mark a trace's root transaction as belonging to a development environment, we leverage a list of known development environments, which is maintained and updated regularly by Sentry.

ENVIRONMENT_GLOBS = [
    "*debug*",
    "*dev*",
    "*local*",
    "*qa*",
    "*test*",
    # ...
]

The list of development environments is available here.

For prioritizing dev environments, we use a sample rate of 1.0 (100%), which results in all traces being sampled.
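A minimal sketch of how such a glob list could be applied is shown below. It uses Python's standard `fnmatch` module; the helper names and the case-insensitive comparison are assumptions for illustration, and only the glob entries shown in the excerpt above are included.

```python
from fnmatch import fnmatchcase

# Only the entries shown in the excerpt above; the real list is longer.
ENVIRONMENT_GLOBS = ["*debug*", "*dev*", "*local*", "*qa*", "*test*"]


def is_dev_environment(environment: str) -> bool:
    """Check an environment name against the development globs.

    Hypothetical helper: lowercasing the name is an assumption.
    """
    name = (environment or "").lower()
    return any(fnmatchcase(name, glob) for glob in ENVIRONMENT_GLOBS)


def environment_sample_rate(environment: str, base_rate: float) -> float:
    """Sample dev environments at 100%, everything else at the base rate."""
    return 1.0 if is_dev_environment(environment) else base_rate


dev_rate = environment_sample_rate("dev-eu", 0.1)
prod_rate = environment_sample_rate("production", 0.1)
```

Note that globs of the form `*name*` match substrings, so an environment called `latest` would also match `*test*` in this sketch.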

This bias is used to prioritize low-volume transactions that can be drowned out by high-volume transactions. The goal is to rebalance sample rates of the individual transactions so that low-volume transactions are more likely to have representative samples. The bias is of type trace, which means that the transaction considered for rebalancing will be the root transaction of the trace.

In order to rebalance transactions, the system computes the counts of the transactions for each project and runs an algorithm that, given the sample rate of the organization and the counts of each transaction, computes a new sample rate for each transaction assuming an ideal distribution of the counts.
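To make the idea of rebalancing more concrete, here is a simplified sketch. It is not Sentry's actual algorithm: it assumes that the "ideal distribution" gives every transaction an equal share of the sampling budget, and that transactions too small to use their full share are kept entirely while the unused budget is redistributed to larger ones. The function name and example counts are hypothetical.

```python
def rebalance_sample_rates(counts: dict, target_rate: float) -> dict:
    """Compute per-transaction sample rates that favor low-volume transactions.

    Hypothetical, simplified sketch; the equal-share model is an assumption.
    """
    # Total number of events we aim to keep across all transactions.
    budget = target_rate * sum(counts.values())
    remaining = len(counts)
    rates = {}
    # Process the smallest transactions first so that any budget they cannot
    # use flows to the larger ones.
    for name, count in sorted(counts.items(), key=lambda item: item[1]):
        share = budget / remaining
        kept = min(count, share)
        rates[name] = kept / count if count else 1.0
        budget -= kept
        remaining -= 1
    return rates


# Made-up counts for three transactions in one project.
counts = {"/checkout": 100, "/search": 900, "/home": 9000}
rates = rebalance_sample_rates(counts, target_rate=0.1)
# Expected number of kept events, which should match the overall target.
kept_total = sum(counts[name] * rates[name] for name in counts)
```

In this example the low-volume `/checkout` transaction is kept in full, while the high-volume `/home` transaction is sampled well below the target rate, and the overall number of kept events still matches the 10% target.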

This bias is used to de-prioritize transactions that are classified as health checks. The goal is to reduce the amount of data retained for health checks, since they are not very useful for debugging.

In order to mark a transaction as a health check, we leverage a list of known health check endpoints, which is maintained by Sentry and updated regularly.

HEALTH_CHECK_GLOBS = [
    "*healthcheck*",
    "*healthy*",
    "*live*",
    "*ready*",
    "*heartbeat*",
    "*/health",
    "*/healthz",
    # ...
]

The list of health check endpoints is available here.

For deprioritizing health checks, we compute a new sample rate by dividing the base sample rate of the project by a factor, which is defined here.
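The following sketch shows how a health check rule of this kind might be applied. The helper names and the factor value of `5.0` are placeholders chosen for illustration, not the value used by Sentry, and only the glob entries shown in the excerpt above are included.

```python
from fnmatch import fnmatchcase

# Only the entries shown in the excerpt above; the real list is longer.
HEALTH_CHECK_GLOBS = [
    "*healthcheck*",
    "*healthy*",
    "*live*",
    "*ready*",
    "*heartbeat*",
    "*/health",
    "*/healthz",
]

# Placeholder value for illustration; the real factor is defined by Sentry.
HEALTH_CHECK_FACTOR = 5.0


def is_health_check(transaction_name: str) -> bool:
    """Check a transaction name against the health check globs.

    Hypothetical helper: lowercasing the name is an assumption.
    """
    name = (transaction_name or "").lower()
    return any(fnmatchcase(name, glob) for glob in HEALTH_CHECK_GLOBS)


def health_check_sample_rate(
    transaction_name: str,
    base_rate: float,
    factor: float = HEALTH_CHECK_FACTOR,
) -> float:
    """Divide the base rate by the factor for health check transactions."""
    if is_health_check(transaction_name):
        return base_rate / factor
    return base_rate


health_rate = health_check_sample_rate("GET /healthz", 0.5)
normal_rate = health_check_sample_rate("GET /api/users", 0.5)
```

With a base rate of 0.5 and the placeholder factor, the health check transaction is sampled at 0.1 while the regular transaction keeps the base rate.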

If you want to learn more about the architecture behind Dynamic Sampling, continue to the next page.
