
Monitoring as Code: Prometheus Rules and Grafana Dashboards

Monitoring as Code (MaC) applies infrastructure-as-code principles to observability. Instead of manually configuring dashboards, alerts, and recording rules through web interfaces, all monitoring configuration is defined in version-controlled files, reviewed through pull requests, and deployed through CI/CD pipelines.

Why Monitoring as Code?

Manual monitoring configuration creates fragile, undocumented observability setups that are difficult to reproduce, audit, and maintain. Codifying monitoring configuration solves these problems.

Key Benefits

  1. Version control — every dashboard and alert change is tracked, diffable, and revertable
  2. Peer review — monitoring changes go through pull requests like application code
  3. Reproducibility — the entire monitoring stack can be rebuilt from the repository
  4. Auditability — the Git history records who changed which threshold, when, and why

Prometheus Recording Rules

Recording rules pre-compute frequently used or expensive PromQL expressions and save the result as new time series. This reduces query load on Prometheus and speeds up dashboard rendering.

Recording Rules Configuration

# recording-rules.yml
groups:
  - name: http_request_rates
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
        labels:
          aggregation: "rate5m"

      - record: job:http_request_duration:p99
        expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job))

      - record: job:http_errors:ratio5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
          /
          sum(rate(http_requests_total[5m])) by (job)

Recording Rule Best Practices

  1. Follow the naming convention — use level:metric:operations (e.g. job:http_requests:rate5m) so the aggregation level and applied operations are obvious from the name
  2. Group related rules — rules in the same group are evaluated sequentially, so later rules can build on earlier results
  3. Record only what you reuse — each rule adds evaluation and storage cost, so record expressions that are expensive or shared across dashboards and alerts

Prometheus Alerting Rules

Alerting rules define conditions that trigger notifications when metrics cross thresholds. Well-designed alerts are actionable, properly scoped, and include sufficient context for responders.

Alerting Rules Configuration

# alerting-rules.yml
groups:
  - name: application_alerts
    rules:
      - alert: HighErrorRate
        expr: job:http_errors:ratio5m > 0.05
        for: 5m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High HTTP error rate on {{ $labels.job }}"
          description: "Error rate is {{ $value | humanizePercentage }} for job {{ $labels.job }}"
          runbook: "https://wiki.internal/runbooks/high-error-rate"
          dashboard: "https://grafana.internal/d/app-overview"

      - alert: HighLatency
        expr: job:http_request_duration:p99 > 2.0
        for: 10m
        labels:
          severity: warning
          team: backend
        annotations:
          summary: "High p99 latency on {{ $labels.job }}"
          description: "p99 latency is {{ $value }}s for job {{ $labels.job }}"

      - alert: PodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 5 > 0
        for: 15m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"

Alert Design Principles

  1. Alert on symptoms, not causes — alert on user-visible impact (high latency, errors) rather than internal metrics
  2. Use appropriate thresholds — set thresholds based on SLO targets and historical data
  3. Include for duration — require conditions to persist before firing to avoid flapping
  4. Provide context — annotations should include runbook links, dashboard URLs, and current values
  5. Set proper severity — distinguish between critical (pages on-call) and warning (next business day)
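Severity labels only matter if something routes on them. A minimal Alertmanager routing sketch illustrating the critical/warning split (receiver names like pagerduty-oncall and slack-backend are placeholders, not integrations defined in this article):

# alertmanager.yml (sketch) — receiver names are illustrative
route:
  receiver: slack-backend          # default: warnings go to chat
  group_by: [alertname, job]
  routes:
    - matchers:
        - severity = critical
      receiver: pagerduty-oncall   # critical alerts page the on-call engineer
      repeat_interval: 1h
receivers:
  - name: slack-backend
  - name: pagerduty-oncall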

Grafana Dashboards as Code

Grafana dashboards can be defined as JSON or generated programmatically with tools such as Grafonnet (a Jsonnet library) or the Grafana Terraform provider. This keeps dashboards consistent, version-controlled, and automatically deployed.

Dashboard JSON Structure

{
  "dashboard": {
    "title": "Application Overview",
    "uid": "app-overview",
    "tags": ["application", "production"],
    "timezone": "utc",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "timeseries",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
        "targets": [
          {
            "expr": "job:http_requests:rate5m",
            "legendFormat": "{{ job }}"
          }
        ]
      }
    ]
  }
}
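The same dashboard can be applied declaratively with the Grafana Terraform provider mentioned above. A hedged sketch (provider configuration and authentication omitted; note the provider expects the dashboard object itself, so the wrapping "dashboard" key shown above would be stripped from the committed file):

# main.tf (sketch) — assumes the Grafana provider is configured elsewhere
resource "grafana_dashboard" "app_overview" {
  # Reads the dashboard JSON committed alongside this configuration
  config_json = file("${path.module}/dashboards/application/overview.json")
}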

Provisioning Dashboards

# grafana/provisioning/dashboards/default.yml
apiVersion: 1
providers:
  - name: default
    orgId: 1
    folder: "Application"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: false
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true
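For Grafana to pick this up, the provisioning files and the dashboard path must be visible inside the container. A minimal Docker Compose sketch, assuming the repository layout shown later in this article (/etc/grafana/provisioning is Grafana's default provisioning path):

# docker-compose.yml (sketch)
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - ./monitoring/provisioning/dashboards.yml:/etc/grafana/provisioning/dashboards/default.yml:ro
      - ./monitoring/provisioning/datasources.yml:/etc/grafana/provisioning/datasources/default.yml:ro
      - ./monitoring/dashboards:/var/lib/grafana/dashboards:ro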

CI/CD Pipeline Integration

Monitoring configuration should be validated and deployed through the same CI/CD pipeline as application code.

Validation Pipeline

# .github/workflows/monitoring.yml
name: Monitoring Validation
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate recording rules
        run: promtool check rules monitoring/rules/*.yml
      - name: Validate alerting rules
        run: promtool check rules monitoring/alerts/*.yml
      - name: Unit test rules
        run: promtool test rules monitoring/tests/*.yml
      - name: Validate Grafana dashboards
        run: |
          for f in monitoring/dashboards/*.json; do
            python -m json.tool "$f" > /dev/null
          done
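The `python -m json.tool` step only proves the files parse. A small hedged helper can additionally enforce local conventions, such as requiring every dashboard to set a stable uid and a title (the required-field list here is a repository convention, not a Grafana rule):

```python
import json
import pathlib

REQUIRED = ("uid", "title")  # local convention: fields every dashboard must set

def check_dashboard(path: pathlib.Path) -> list[str]:
    """Return a list of problems found in one dashboard JSON file."""
    data = json.loads(path.read_text())
    dash = data.get("dashboard", data)  # accept wrapped or bare dashboard JSON
    return [f"{path}: missing '{field}'" for field in REQUIRED if not dash.get(field)]

# In CI: collect problems across sys.argv[1:] and exit non-zero if any are found.
```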

Testing Monitoring Configuration

Prometheus provides built-in support for testing alerting and recording rules with unit tests defined in YAML files.

Rule Unit Tests

# tests/alert-tests.yml
rule_files:
  # The alert expression references the recording rule, so both files must be loaded
  - ../rules/recording-http.yml
  - ../alerts/application.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'http_requests_total{job="api", status="500"}'
        values: "0+10x20"
      - series: 'http_requests_total{job="api", status="200"}'
        values: "0+100x20"
    alert_rule_test:
      - eval_time: 10m
        alertname: HighErrorRate
        exp_alerts:
          - exp_labels:
              severity: critical
              team: backend
              job: api
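Recording rules can be unit-tested the same way with promql_expr_test. A sketch for the error-ratio rule defined earlier (file path follows the directory layout below):

# tests/recording-tests.yml (sketch)
rule_files:
  - ../rules/recording-http.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'http_requests_total{job="api", status="500"}'
        values: "0+10x20"   # 10 errors per minute
      - series: 'http_requests_total{job="api", status="200"}'
        values: "0+90x20"   # 90 successes per minute
    promql_expr_test:
      - expr: job:http_errors:ratio5m
        eval_time: 10m
        exp_samples:
          - labels: 'job:http_errors:ratio5m{job="api"}'
            value: 0.1      # 10 errors out of 100 total requests per minute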

Directory Structure

Organize monitoring configuration in a clear, maintainable directory structure alongside your application code.

monitoring/
  rules/
    recording-http.yml
    recording-database.yml
  alerts/
    application.yml
    infrastructure.yml
    business.yml
  dashboards/
    application/
      overview.json
      api-details.json
    infrastructure/
      nodes.json
      kubernetes.json
  tests/
    alert-tests.yml
    recording-tests.yml
  provisioning/
    dashboards.yml
    datasources.yml

Conclusion

Monitoring as Code transforms observability from a fragile manual process into a reliable, auditable, and automated practice. By defining Prometheus rules and Grafana dashboards in version-controlled files, teams gain consistency, reproducibility, and the ability to evolve monitoring alongside applications. Start by codifying your most critical alerts and dashboards, validate them in CI, and gradually expand coverage as the practice matures.
