Monitoring as code is the practice of managing monitoring configurations and alerts through code. This approach offers several benefits for engineering teams at scale, including codified, version-controlled, and reusable monitoring configurations.
Synthetic monitoring is a proven practice for ensuring reliability of applications and services in production. It excels at proactively discovering issues that affect real users, which is why companies—from startups to Fortune 500—rely on it to enhance their user experience. But teams that are moving fast or growing at scale can face real challenges with traditional synthetic monitoring workflows and tools:
Monitoring as Code (MaC) addresses these pain points by automating and integrating monitoring directly into CI/CD pipelines, delivering efficiency, scalability, and actionable insights. It redefines monitoring as a dynamic, code-driven process that evolves seamlessly with your applications.
Let’s explore MaC, its benefits, and how you can use it to level up your monitoring.
Monitoring as Code involves defining monitoring configurations and operations as code, enabling automated setups, real-time observability across DevOps lifecycles, and seamless integration with development workflows.
With Monitoring as Code, your monitoring is implemented in your CI/CD lifecycle and maintained in version control, without having to rummage around a UI. This approach ensures that monitoring evolves alongside your application, providing continuous feedback and maintaining stability.
While synthetic monitoring is a tried and true software practice, there are a few challenges that come with the way tools have traditionally worked.
Traditional monitoring tools are difficult to configure alerts at a granular level, often over alerting a chain of teams, wasting time and precious resources. This leads to “alert fatigue,” where important alerts can be missed, and teams become desensitized to notifications.
Synthetic monitors can be difficult to implement and maintain in large, complex systems. Organizations with many services and screens can spend hundreds of hours creating and reconfiguring monitors and alerts as their underlying software changes.
Operations teams have traditionally owned monitoring tools within organizations and often times are siloed from the engineering teams who ultimately need to fix the error.
Traditional synthetic monitoring tools work at the surface level of an application, unable to give detailed explanations of the root cause of the alert.
Whenever we discuss the ideal setup for synthetic monitoring, experienced DevOps and SRE engineers will respond: “Sure, that would be great, if we had the budget for it.” While it would be nice to simulate a complex user behavior every 30 seconds from 5 different geographic regions, the cost of most synthetic tools makes that cost-prohibitive.
Here are the most important benefits of adopting Monitoring as Code:
Monitoring as Code transforms your monitoring setup into code that is stored, managed, and versioned alongside your application code. This approach brings significant advantages over traditional monitoring tools:
With MaC, your monitoring evolves alongside your codebase, enabling precision, flexibility, and scalability that traditional methods simply can’t match.
Reusability in Monitoring as Code (MaC) goes beyond efficiency—it ensures consistent, reliable, and scalable monitoring across environments and teams. By reusing existing assets like Playwright tests and API definitions, MaC streamlines setup and eliminates redundancy.
By reusing existing assets and creating modular configurations, MaC ensures monitoring is efficient, consistent, and adaptable to changing needs.
MaC breaks down silos by embedding monitoring into the development lifecycle, fostering collaboration across teams:
By embedding collaboration into the monitoring process, MaC drives alignment, reduces friction, and creates a culture of shared accountability.
To maintain high-quality deployments, monitoring should span every stage of the Software Development Life Cycle (SDLC)—from local development to staging and production. Monitoring as Code (MaC) integrates directly into CI/CD pipelines, enabling automatic and consistent observability across environments. This ensures that issues are detected and addressed before they reach production, significantly reducing downtime and errors.
For instance, commercetools, a leader in headless commerce, adopted MaC to implement continuous monitoring throughout their SDLC. By codifying their monitoring setup, they ensured every stage of their pipeline was equipped with accurate, automated checks. This approach caught issues early in development, reduced production incidents, and provided end-to-end visibility into their systems without adding extra operational overhead.
Monitoring APIs and UIs in fast-paced environments is a complex challenge, especially when new endpoints are constantly being added or updated. Traditional approaches often involve manually setting up and maintaining checks, which is time-consuming and prone to errors. As systems scale, gaps in monitoring coverage or misconfigured alerts can lead to missed issues and delayed responses, impacting the reliability of your services. Monitoring as Code solves this by automating the creation and deployment of monitoring configurations, ensuring every API is consistently and comprehensively monitored from staging to production. By integrating monitoring into your development pipelines, you can keep up with rapid changes without sacrificing reliability.
For example, TrueLayer, a leading open banking platform, adopted Monitoring as Code to address these challenges. By codifying their API monitoring configurations and automating their setup with Checkly, they reduced manual effort by 60%. This allowed their team to scale monitoring as they introduced new endpoints, ensuring complete coverage and faster issue detection, all while focusing more on innovation rather than operational tasks.
In large organizations, monitoring is often centralized under SRE teams, creating bottlenecks and slowing down response times. This approach limits the ability of engineering teams to adapt and manage monitoring for their own services. Monitoring as Code (MaC) decentralizes this process, empowering teams to define and control their monitoring setups directly within their workflows. This fosters greater accountability and enables faster detection and resolution of issues.
LinkedIn leveraged MaC to transition from an SRE-managed model to one where engineering teams took full ownership of their monitoring. Using Checkly, they streamlined the process, allowing teams to tailor monitoring configurations to their specific needs. This shift reduced time to detection, improved system reliability, and gave teams the autonomy to innovate without relying on a central authority.
Monitoring as Code (MaC) enables you to test key aspects of your application and infrastructure with precision and automation. Here are some of the areas where it excels:
Application uptime reflects how reliably your application is available to users, and even a few minutes of downtime can lead to user frustration, loss of revenue, and damage to your reputation.
MaC allows you to define uptime checks programmatically, ensuring these checks are consistent across all environments. For example, you can configure MaC to ping specific endpoints or services at regular intervals to verify they are reachable and responding within acceptable timeframes. If a failure is detected, MaC immediately triggers alerts, allowing teams to investigate and resolve issues before they escalate.
MaC also simplifies scaling uptime monitoring across multiple regions. By deploying checks from different geographic locations, you can ensure your application is accessible worldwide and identify regional connectivity issues. Additionally, codified monitoring enables you to keep pace with application changes—new services and endpoints are automatically incorporated into your monitoring setup, reducing the risk of missed coverage.
MaC allows you to simulate and test user interactions with your web application. For example, you can create browser checks to validate critical workflows like logging in, completing a checkout, or performing a search.
These automated tests ensure that every step—loading pages, submitting forms, and rendering elements—works as expected. They catch issues early, safeguarding functionality and user satisfaction.
With MaC, browser tests are version-controlled and evolve with your application. You can easily scale monitoring across regions, browsers, and devices, ensuring a seamless experience for all users.
Here’s an example of a browser monitor that logs into Checkly, and waits for the dashboard to fully load.
import { expect, test } from '@playwright/test' // 1
test('Login to Checkly', async ({ page }) => { // 2
await page.goto('https://app.checklyhq.com/login') // 3
await page.locator('input[type="email"]').type('john@example.com') // 4
await page.locator('input[type="password"]').type('mypassword') // 4
await page.getByRole('button', { name: 'Log In' }).click() // 5
const homeDashboardTable = page.getByTestId('home-dashboard-table')
await expect(homeDashboardTable).toBeVisible() // 6
})
APIs power modern applications, and MaC ensures their availability, performance, and correctness. You can automate checks to validate response status codes, measure latency, and verify payloads.
These checks provide confidence that your APIs function reliably under different conditions. By deploying API checks globally, you can ensure consistent performance for users across all regions.
Since monitoring setups are code-based, they are easy to update as your APIs evolve. This keeps monitoring aligned with API changes, eliminating gaps in coverage.
Here’s an example of an API check that tests an e-commerce flow of updating an inventory item:
import { ApiCheck, Frequency, AssertionBuilder, RetryStrategyBuilder } from 'checkly/constructs'
new ApiCheck('update-inventory-item', {
name: 'update inventory item',
activated: true,
muted: false,
shouldFail: false,
runParallel: false,
locations: ['eu-central-1', 'us-west-1'],
tags: [],
frequency: Frequency.EVERY_10M,
environmentVariables: [],
// group: your check belongs to group 'Inventory API',
maxResponseTime: 10000,
degradedResponseTime: 5000,
request: {
url: 'https://danube-webshop.herokuapp.com/api/books',
method: 'GET',
followRedirects: true,
skipSSL: false,
assertions: [
AssertionBuilder.jsonBody('$.length').equals(30),
AssertionBuilder.statusCode().equals(200),
],
body: ``,
bodyType: 'NONE',
headers: [],
queryParameters: [],
},
retryStrategy: RetryStrategyBuilder.fixedStrategy({
baseBackoffSeconds: 0,
maxRetries: 1,
maxDurationSeconds: 600,
sameRegion: false,
}),
})
Slow app load times or high latency can drive users away and impact revenue. Monitoring as Code provides a streamlined way to track performance metrics like page load times, API response times, and latency, ensuring your application consistently meets expectations.
With MaC, performance checks can be integrated into your CI/CD pipelines to catch slowdowns early in development or staging. Automated alerts notify your team of performance issues in real time, enabling proactive fixes before users are impacted.
MaC also excels at monitoring deployments by running performance checks automatically after each release. This ensures that new changes don’t introduce regressions, and if issues arise, teams can act quickly to resolve or roll back, maintaining stability and trust.
MaC makes it easy to monitor performance across regions, simulating user experiences in different locations. This helps identify latency caused by network issues, server configurations, or CDNs, ensuring fast, consistent performance worldwide. By codifying performance monitoring, MaC standardizes and automates the process, allowing your team to maintain a responsive, reliable web app at scale.
Scheduled jobs or cron jobs are crucial for automating tasks, but failures can lead to major problems if undetected. MaC enables heartbeat monitoring to ensure these jobs run on schedule.
A heartbeat check verifies that a job reports back within the expected timeframe. If it doesn’t, an alert is triggered, giving you a chance to fix the issue before it impacts your system.
This proactive approach ensures your critical automation workflows are reliable and resilient.
MaC also tests the health and reliability of your infrastructure. This includes uptime monitoring for servers, latency checks for databases, and validating metrics like CPU or memory usage.
By codifying these checks, you create an automated system that detects potential issues early. This helps you resolve problems before they escalate into critical incidents.
Monitoring as Code (MaC) with Checkly empowers you to streamline, automate, and scale your monitoring strategy by leveraging advanced tools and integrations. With native Playwright support, Terraform and Pulumi integrations, and seamless CI/CD pipeline compatibility, Checkly simplifies complex monitoring setups and enables dynamic, code-driven observability.
Checkly’s compatibility with CI/CD pipelines ensures monitoring is embedded into your development workflow. You can configure performance, uptime, and functional checks to run automatically during deployments, catching issues early and maintaining system reliability. This tight integration helps teams:
With native Playwright support, Checkly enables you to directly use existing Playwright scripts as browser checks. This integration allows you to monitor complex user workflows—such as logins, form submissions, or multi-step transactions—without needing to rewrite tests.
Your monitoring aligns with your application logic, offering precise, actionable insights into the user experience.
A strong alerting strategy ensures critical issues are addressed promptly while minimizing noise for your team. With Checkly, you can configure alerting integrations like PagerDuty or OpsGenie directly in your codebase. This approach allows you to:
Dashboards offer a real-time view of your application’s health, and MaC enables you to configure these visualizations programmatically. Tools like Grafana integrate easily with MaC, allowing you to:
Performance benchmarking helps you understand your application’s limits and track improvements over time. MaC allows you to automate performance tests and benchmark configurations, ensuring consistent measurement across deployments. You can:
Codified benchmarks keep your team aligned and make it easier to identify potential bottlenecks before they impact users.
Postmortems are crucial for learning from incidents, and MaC ensures these lessons are integrated directly into your monitoring setup. After identifying root causes, you can:
This approach turns every incident into an opportunity for growth, building resilience into your application with each iteration.
Modern applications often run across multiple regions or environments, and ensuring consistent performance everywhere is a major challenge. MaC allows you to:
This granular, automated approach ensures your application delivers a seamless experience to users regardless of their location or environment.
Checkly integrates seamlessly with popular IaC tools like Terraform and Pulumi. This means you can define monitoring configurations alongside your infrastructure code, ensuring monitoring setups are automatically deployed with your infrastructure updates. These integrations enable:
Implementing monitoring as code effectively requires following best practices to ensure reliability, scalability, and maintainability.
By combining native Playwright support, IaC integrations like Terraform and Pulumi, and seamless CI/CD pipeline compatibility, Checkly offers a robust, flexible, and scalable solution for Monitoring as Code. Whether it’s managing alerts, benchmarking performance, or monitoring deployments, Checkly ensures your monitoring evolves dynamically with your application, delivering reliability and efficiency at every step.
Start streamlining your monitoring strategy today with Checkly and experience the power of Monitoring as Code firsthand.