
Rock-Solid Application Monitoring with Acceptance Tests

16 mins
This post introduces the concept of acceptance tests and how they can be used to catch production issues as soon as they appear. It defines what exactly acceptance tests are in this context, how they fit into application monitoring and how to get started.

What are Acceptance Tests? #

Definition #

The common definition of an acceptance test in software is that of a test performed to determine whether a set of requirements is fulfilled. Such tests are often conducted manually and only before delivering an application to a customer (i.e., these tests are used by the customer to decide whether or not to “accept” the product).

For the purposes of this post, however, we define acceptance tests as automated programs that run at regular intervals to make sure the core business functionality of an application is working at all times. That is, such tests run both before and after an application is delivered to a customer.

Core Characteristics #

The core characteristics of acceptance tests are:

  • Interface: Acceptance tests interact with the actual production environment, often through the same APIs that real clients are using.
  • Deployment: Acceptance tests should be deployed separately from the main infrastructure to ensure isolation in the case of failures.
  • Development: Acceptance tests for a particular feature are usually written by the engineers responsible for that feature during the development process.
  • Handling Failure: If an acceptance test fails, this usually means that core business functionality is not working as expected; this should be treated as a high-priority incident and investigated immediately.

Each of these topics will be explored in more depth later in this post. At the end, I will give some suggestions on how to get started with acceptance tests.

Case Study: An Example Acceptance Test #

To illustrate the idea of acceptance tests, consider the following example I recently built for a customer as part of my freelance work:

This company operates an IoT application with hundreds of devices sending sensor data to an event processing platform every few seconds. There are several consumers of these events, one of which takes the sensor data and writes it into a time series database for further analysis. In the past, this consumer has had some problems, causing sensor data to not be written into the database as specified. Since the contents of the time series database were checked only manually from time to time, often no one noticed the failing consumer for days, causing data to be irrevocably lost.

In addition to fixing the problems with the event consumer, I made sure such issues no longer stay undetected. To that end, I developed an acceptance test that verifies this entire pipeline from start (sending sensor data) to finish (finding the sensor data in the time series database). If some component of the pipeline fails, this test will detect that and alert us.

Concretely, the acceptance test creates a virtual device and sends a few mock events containing random but well-known sensor data points, in exactly the same way as an actual IoT device does. It then probes the time series database with a linear back-off to look for the specific data points it has sent until it has either found them (in which case the test succeeds) or a timeout has been reached (in which case the test fails). The timeout value (e.g., 20 seconds) has been carefully chosen using historical metrics of device-to-database latencies.
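
To make this concrete, here is a simplified sketch in Python of what such a test can look like. The ingestion endpoint, query endpoint and payload format are placeholders rather than the customer’s actual APIs, and the requests library stands in for whichever HTTP client is used:

```python
import time
import uuid
import requests

INGEST_URL = "https://ingest.example.com/events"  # placeholder ingestion endpoint
QUERY_URL = "https://tsdb.example.com/query"      # placeholder time series query endpoint
TIMEOUT_SECONDS = 20                              # chosen from historical device-to-database latencies


def run_pipeline_acceptance_test() -> bool:
    # Create a virtual device and a few random but well-known data points.
    device_id = f"acceptance-test-{uuid.uuid4()}"
    data_points = [{"device_id": device_id, "value": 42.0 + i} for i in range(3)]

    # Send the mock events in exactly the same way as a real IoT device would.
    for point in data_points:
        requests.post(INGEST_URL, json=point, timeout=5).raise_for_status()

    # Poll the time series database with a linear back-off until the points
    # appear or the timeout is reached.
    deadline = time.monotonic() + TIMEOUT_SECONDS
    delay = 1
    while time.monotonic() < deadline:
        response = requests.get(QUERY_URL, params={"device_id": device_id}, timeout=5)
        stored = {p["value"] for p in response.json().get("points", [])}
        if all(p["value"] in stored for p in data_points):
            return True  # all data points arrived: the pipeline works end to end
        time.sleep(delay)
        delay += 1  # linear back-off

    return False  # timeout reached: treat as a failed acceptance test
```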

This test runs every 30 minutes and immediately sends alerts if it fails. Whenever some component of this IoT event processing pipeline is failing, the acceptance test will detect this within 30 minutes, which will allow us to investigate and resolve the issue. There has been no loss of data since deploying the acceptance test.

Acceptance Tests for Application Monitoring #

Application monitoring has many facets, including the aggregated collection of logs and the collection and analysis of instance- and application-specific metrics (e.g., CPU utilization of various instances, task queue lengths, event ages at processing time, etc.). I consider acceptance tests a separate concept that can provide a type of monitoring that the other solutions cannot.

Log aggregation and metric collection, along with anomaly detection and alerting, are often available as standardized services that can be used in a wide variety of applications (e.g., DataDog or Elastic have offerings in this segment). Acceptance tests, on the other hand, are custom-made, application-specific and business-logic-sensitive programs built by the same engineers who build the application itself. This allows them not only to document what is going on with the application and infrastructure (collecting logs and metrics) and “guess” when something is failing (anomaly detection and alerting rules based on metrics), but to actually check whether the application is doing what it should.

One could say that metrics and logs observe the visible symptoms of an application and make guesses about its health, while acceptance tests actually do a bottom-line fitness test to determine an application’s health.

When it comes down to it, high CPU utilization, overloaded task queues or a spiking number of 5xx status codes are not what cause your customers to become unhappy. What counts is whether core business functionality is working for them or not. Of course, these two things are correlated most of the time, but acceptance tests allow you to test what is actually relevant, not what is a likely symptom of degraded service.

Benefits of Acceptance Tests #

The core benefits of acceptance tests in general and over other monitoring solutions (which are not replaced, but complemented by acceptance tests) are:

  • catch business functionality issues early, before your customers do: if a bad release or infrastructure issue breaks some core functionality, acceptance tests will detect this quickly (depending on how often they run)
  • issues that are not visible in logs and metrics can be detected
  • fewer false alarms: a spike in CPU utilization or a few 5xx status codes should not necessarily cause an incident, and acceptance tests do not trust such lag measures - they check whether something is actually going wrong or not

Some drawbacks include that acceptance tests:

  • are not plug-and-play – they have to be carefully developed, just like the application itself: this means that you likely cannot cover every single feature with acceptance tests
  • require an isolated deployment architecture, which increases operational burden and cost
  • are hard to use in deployment pipelines before shipping a new release, as they rely on a fully available production environment
  • depending on the scheduling frequency, may not catch an issue fast enough (so they need to be complemented with other monitoring solutions)

How Acceptance Tests Interact with an Application #

To truly verify that the core business functionality of an application is working correctly for its users, acceptance tests should be as close to the actual clients of the application as possible. Thus, they should use the same interfaces as regular clients - which could be a mobile app, a website or a REST API.

Types of Acceptance Tests #

Based on the type of interaction with an application, several types of acceptance tests can be defined:

  • API-based acceptance tests: this is the most common type in my experience - tests in this class use the same API endpoints as client applications
  • UI-based acceptance tests: such tests use a UI testing framework like Selenium along with the actual client app UI to perform checks
  • database-based acceptance tests: such tests have a direct connection to a production database which they use to validate core assertions about the behavior of an application
  • hybrid tests: such tests use multiple means of interacting with the application (e.g., they call an API and then validate certain assertions in a database - the IoT pipeline test from above is an example)

Discussion #

While UI-based tests are technically the closest to an actual user’s interaction with the application, they are often hard to develop and maintain. In the standard client-server model, end-user UI applications often don’t contain a lot of business logic to begin with, so the main logic sits on the server anyway. Additionally, testing the server through the UI client puts one at risk of mistaking UI bugs for application-wide production issues.

Thus, in my opinion and experience, API-based tests are usually the best way to create acceptance tests. Making requests to an API within the code of the test is simple and effective. In practice, there is little logic that cannot be validated by calling a set of API endpoints and making assertions on the results.
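
As a minimal sketch of what this can look like, the following test calls a hypothetical booking API (the endpoints, fields and bearer-token scheme are assumptions for illustration) and asserts on the result:

```python
import requests

BASE_URL = "https://api.example.com"  # placeholder for the application's public API


def test_create_and_fetch_booking(auth_token: str) -> None:
    headers = {"Authorization": f"Bearer {auth_token}"}

    # Create a booking via the same endpoint a real client would use.
    created = requests.post(
        f"{BASE_URL}/bookings",
        json={"room": "standard", "nights": 2},
        headers=headers,
        timeout=10,
    )
    created.raise_for_status()
    booking_id = created.json()["id"]

    # Read the booking back and assert on the business-relevant fields.
    fetched = requests.get(f"{BASE_URL}/bookings/{booking_id}", headers=headers, timeout=10)
    fetched.raise_for_status()
    assert fetched.json()["nights"] == 2, "booking was not stored as requested"
```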

The database of an application is often hard to access from the outside (as it should be), so database-based acceptance tests should usually be replaced by API-based tests by introducing a thin API wrapper around the database. This also allows the application to change its internal way of managing data (e.g., changing the type of database, or using another implementation altogether) without breaking the acceptance test.
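
For illustration, such a thin wrapper can be a single read-only endpoint. The sketch below uses Flask and SQLite purely as stand-ins for whatever stack the application actually runs on:

```python
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)


# A single read-only endpoint the acceptance test can call instead of
# connecting to the database directly. Table, schema and SQLite are
# placeholders for the application's actual storage.
@app.get("/internal/sensor-data/<device_id>")
def sensor_data(device_id: str):
    with sqlite3.connect("app.db") as conn:
        rows = conn.execute(
            "SELECT timestamp, value FROM sensor_data WHERE device_id = ?",
            (device_id,),
        ).fetchall()
    return jsonify([{"timestamp": t, "value": v} for t, v in rows])
```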

Acceptance-Test Specific Endpoints #

Introducing a thin API around the database is an example of adding new API endpoints just for an acceptance test. In general, this should be avoided. It not only adds additional code to maintain but also makes the acceptance test validate functionality that the application’s real clients do not use.

However, I have found that some architectures are too complex or too isolated from the outside to test certain critical assertions without adding acceptance-test-specific code. In such cases, one should be careful to keep the additional code as simple and minimal as possible (and make sure it’s absolutely required - usually it’s better to make the acceptance test more complex if it prevents having to change the application).

Challenges #

The following challenges will likely arise when integrating acceptance tests into an application monitoring solution. They should be considered as early as possible in the development process.

Access rights and authentication: Usually, API endpoints are protected by authentication mechanisms that verify the caller has the right to call the endpoint. This could be realized using an authentication token supplied in the header of an HTTP request, for example. Acceptance tests may need to be treated as a special type of client and require special handling when it comes to authentication. If the tests run on a separate infrastructure (see deployment), this might also involve firewall adjustments to ensure the atypical API traffic is not blocked.
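
One common pattern is to register the acceptance tests as a dedicated machine client and fetch a short-lived token before each run. The sketch below assumes a hypothetical OAuth2-style client-credentials endpoint and environment variables for the credentials:

```python
import os

import requests

AUTH_URL = "https://auth.example.com/oauth/token"  # placeholder token endpoint


def fetch_test_client_token() -> str:
    # The acceptance tests authenticate as a dedicated machine client;
    # credentials are injected via environment variables, never hard-coded.
    response = requests.post(
        AUTH_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": os.environ["ACCEPTANCE_TEST_CLIENT_ID"],
            "client_secret": os.environ["ACCEPTANCE_TEST_CLIENT_SECRET"],
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["access_token"]
```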

Running tests during development: While making changes in an application, it can be helpful to run the acceptance test suite locally against an un-deployed new version of the application before deploying it (similarly to running unit tests). This is often hard to accomplish, since the locally running new version of the application likely relies on other services and databases that do not have a locally running version. Additionally, the acceptance tests need to be aware of the locally running version of the application and adjust the endpoints they call accordingly. Having good tooling for such issues in place before developing the first acceptance test is vital.
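
A simple piece of tooling that helps here is making the target of the tests configurable, for example via an environment variable (the variable name and URLs below are placeholders):

```python
import os

# The suite targets production by default but can be pointed at a locally
# running version of the application during development.
# Example: ACCEPTANCE_TEST_BASE_URL=http://localhost:8080 python run_acceptance_tests.py
BASE_URL = os.environ.get("ACCEPTANCE_TEST_BASE_URL", "https://api.example.com")
```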

Important Properties of Acceptance Tests #

To ensure the acceptance tests work as they should and don’t have any unwanted side-effects, one should additionally ensure the following important properties of acceptance tests:

Independence: To ensure acceptance tests can be run at any time and any frequency, and even simultaneously by different developers or deployments, they should be fully independent and assume no particular state in the application.

Isolation: While acceptance tests should interact with the actual production environment, they should not interfere with the data of other users or have any effects that are visible to users of the application. A good practice is to create a fresh testing account each time a test is run (which also helps with independence) or to have a set of fixed test accounts used by the tests.
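
A minimal sketch of the fresh-account approach, assuming a hypothetical account-creation endpoint, could look like this:

```python
import uuid

import requests

BASE_URL = "https://api.example.com"  # placeholder


def create_fresh_test_account() -> dict:
    # A unique, clearly marked account per run keeps the test isolated from
    # real users and independent of any previous run's state.
    email = f"acceptance-test+{uuid.uuid4()}@example.com"
    response = requests.post(
        f"{BASE_URL}/accounts",
        json={"email": email, "name": "Acceptance Test Account"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```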

Determinism: Like all automated tests, acceptance tests must not rely on any source of randomness or non-determinism.

Deployment of Acceptance Tests #

If acceptance tests are there to verify an application is fulfilling its most important requirements, they definitely should not be running on the same instances as the application itself. Otherwise, if there is an infrastructure problem that causes the application to fail, the acceptance tests will also be affected and might not run to detect the issue in the first place.

Types of Deployment #

I have come across two different approaches for deploying acceptance tests:

  1. a dedicated application that continuously runs and schedules the acceptance tests
  2. treating acceptance tests as standard short-running tasks managed by a centralized scheduler

Which one is better in a specific use case depends on the number of acceptance tests, the scale of the infrastructure and which mechanisms one already has in place.

A big company likely already has some kind of centralized scheduler that manages a pool of resources and assigns short-running and long-running tasks to instances (for example, Apache Mesos). In that case, the second approach is likely easier. Acceptance tests are just treated as recurring tasks that are scheduled and run by the central scheduler.

For smaller companies or different types of architectures, a dedicated and isolated application (approach 1) that schedules and executes the tests itself can be easier to get started with. An alternative is to deploy acceptance tests as serverless functions (e.g., using AWS Lambda) and schedule them using recurring triggers (e.g., AWS CloudWatch schedule-based event triggers).
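
As a sketch of the serverless variant, the handler below assumes the case-study test from above lives in a hypothetical acceptance_tests module and is invoked by a schedule-based trigger:

```python
# Deployed as an AWS Lambda function and invoked by a schedule-based trigger
# (e.g., an every-30-minutes CloudWatch/EventBridge rule).
from acceptance_tests import run_pipeline_acceptance_test  # hypothetical module holding the test


def handler(event, context):
    if not run_pipeline_acceptance_test():
        # Raising marks the invocation as failed, so standard Lambda error
        # metrics and alarms can pick it up in addition to custom alerting.
        raise RuntimeError("acceptance test failed")
    return {"success": True}
```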


When to Run Acceptance Tests #

Choosing the right scheduling interval of an acceptance test depends on the following factors:

  • the duration it takes for the acceptance test to run (e.g., is it a few seconds or 2 minutes)
  • how critical the tested business functionality is
  • how fast you need to catch production issues to conform to your SLOs

In addition, it makes sense to run all acceptance tests after a new deployment to make sure that regressions are caught as early as possible. One could even roll back a deployment automatically if the acceptance tests don’t succeed with the new version. One needs to be careful to avoid noisy alerts when deploying a new version of the application requires adjusting its acceptance tests as well. Temporarily pausing the acceptance tests in that case, or having a mechanism that ensures the acceptance tests run against the matching version of the application, can help.
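
A minimal post-deployment gate could look like the following sketch, where the suite entry point and the rollback command are placeholders for whatever deployment tooling is in use:

```python
import subprocess
import sys

# Run the acceptance test suite against the freshly deployed version and roll
# back if it does not pass. The suite entry point and rollback command are
# placeholders, not actual tooling.
result = subprocess.run([sys.executable, "-m", "acceptance_tests"], check=False)
if result.returncode != 0:
    subprocess.run(["./rollback_to_previous_release.sh"], check=True)
    raise SystemExit("acceptance tests failed after deployment, rolled back")
```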

In practice, I have seen acceptance tests being run anywhere between every minute and every hour (in addition to after deployments).

Verify that Acceptance Tests are Running #

An important caveat to keep in mind is that acceptance tests that are not run, or that crash, cannot send alerts themselves. You need to verify that acceptance tests are actually being run successfully and send alerts if they are not (which is technically another acceptance test).

A good solution I have come across is to use the “inverted Hollywood Principle”:

We won’t call you, call us.

The idea is that instead of the acceptance tests themselves sending an alert when they fail, all they do is call some third party (“us”) when they succeed. The third party then regularly checks when the latest successful acceptance test run was. If it is older than some threshold (e.g., 1.5x the scheduling interval of the test), it sends an alert, notifying the developers that the acceptance test has not succeeded within the given time span.
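
A sketch of such a watchdog is shown below. It keeps the timestamp of the last successful run in memory for simplicity; a real implementation would persist it in a small data store and page the on-call engineer instead of printing:

```python
import time

SCHEDULING_INTERVAL = 30 * 60          # the acceptance test runs every 30 minutes
THRESHOLD = 1.5 * SCHEDULING_INTERVAL  # alert if no success within 1.5x the interval

# Timestamp of the last successful run per test; kept in memory for simplicity.
last_success: dict[str, float] = {}


def report_success(test_name: str) -> None:
    # Called by the acceptance test itself after a successful run ("call us").
    last_success[test_name] = time.time()


def check_and_alert(test_name: str) -> None:
    # Run regularly (e.g., every few minutes) by the watchdog.
    if time.time() - last_success.get(test_name, 0.0) > THRESHOLD:
        send_alert(f"no successful run of '{test_name}' in the last {THRESHOLD / 60:.0f} minutes")


def send_alert(message: str) -> None:
    print(f"ALERT: {message}")  # placeholder: page the on-call engineer instead
```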

Of course, the third party that makes this check is another application to develop and maintain, but in my opinion it is critical to decouple the acceptance test itself from the process that handles the result (and that notices if the test is not run at all).

Development of Acceptance Tests #

The engineers developing a particular feature know best how to validate that it is working as expected. Thus, it makes sense for the same engineers to write the acceptance tests for that feature.

Acceptance Tests as Documentation #

An added benefit is that a team’s acceptance tests serve as a form of documentation: other teams interfacing with that team’s applications can use them to understand how the application is supposed to work and how to interact with it.

Acceptance Tests in the Development Lifecycle #

Since an application usually needs to be deployed before it can be tested end-to-end, acceptance tests are often developed after the corresponding features themselves. To prevent shipping a new feature to production before it is covered by acceptance tests, it also makes sense to first deploy it to a development environment and run the acceptance tests there.

While not every single feature needs to be validated with acceptance tests (many can be covered by unit tests), it makes sense to have working acceptance tests for critical business functionality in place before deploying such functionality to production.

Handling the Failure of Acceptance Tests #

In general, a failing acceptance test means that core business functionality is not working as expected, which should immediately trigger an incident and notify the responsible engineers.

Avoiding Alert Fatigue #

However, transient issues do exist, and in complex end-to-end tests like acceptance tests they can easily cause a failure. In order to avoid alert fatigue (triggering so many false alarms that engineers eventually ignore alerts), it may make sense to immediately execute the failing acceptance test again in such cases. If the issue persists, then it’s time to send an alert. Of course, the test itself must be deterministic for this to have the desired effect of only ignoring transient issues.
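
A simple way to implement this is a small retry wrapper around the test function, for example:

```python
from typing import Callable


def run_with_retry(test: Callable[[], bool], retries: int = 1) -> bool:
    # Re-run a failing acceptance test before treating the failure as real, so
    # that transient issues do not trigger alerts. The test must be
    # deterministic for this to only filter out transient problems.
    for _ in range(retries + 1):
        if test():
            return True
    return False
```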

This plays well with the proposed solution of how one should verify that acceptance tests are running, by only sending alerts if there has not been a successful run within a given time span (as described above).

Conclusion #

To conclude, acceptance tests are a useful extension to an application monitoring strategy, not replacing but complementing other practices such as log aggregation and metric collection. They can verify that core business functionality is working as expected at all times at a level of precision that other approaches cannot provide.

Acceptance tests come at a cost, however - they need to be carefully developed and integrating them into one’s infrastructure can entail dealing with complex challenges.

For me personally, the effort is well worth it, as it allows me to trust that the most important features of my systems are working as expected at all times, and that I will be alerted if anything is wrong, often before customers will notice it.

How to Get Started with Acceptance Tests #

To get started with adding acceptance tests to your existing application and infrastructure, I suggest the following steps:

  1. Identify the core workloads that you want to ensure are working at all times (e.g., making a booking in a hotel reservation system) and cover them with acceptance tests.
  2. Choose a platform or solution that schedules your acceptance tests at regular intervals (see the deployment approaches suggested above).
  3. Develop, deploy and schedule your first simple hello-world acceptance test. Extend it to cover the most important business functionality you want to test, making sure it fulfills the important properties discussed above.
  4. Choose a solution to verify that your tests are actually running (as described above).

If you need any help or consulting along the way, feel free to reach out!

Author: Florian Pfisterer
Hi, I’m Florian! I’m in my final semester of my Master’s in CS at the Technical University of Munich – focusing primarily on query optimization and execution in databases. I have previously interned at AWS and HubSpot and worked as a software engineer at several startups for over 4 years. I hold a BSc. in CS from KIT and CMU.
