
LLM API cost alerts

Key idea:

LLM spend can grow 100× in an hour from a prompt loop, infinite retries, or an attack. Use two layers of defense: a hard cap at the provider (OpenAI usage limit, Anthropic spend limit) plus a soft alert on your own budget (a billing script that reports to a heartbeat monitor every 5 min). Attribute spend by user_id so you can ban runaway users fast.



Details

  • OpenAI hard limit: Settings → Limits → Usage limits → Monthly budget (cuts off API)
  • Anthropic spend limit: Account → Plans & Billing → Spend limit
  • Soft alert every 5 min: a cron script fetches the usage API and, if spend exceeds daily_target × 1.2, sends a Telegram alert
  • Attribution: tag every LLM call with user_id + endpoint in a JSON log for post-mortems
  • Prompt-loop defense: max_tokens (50-500 for chat, 4K for long-form), 30 s timeout, retry no more than 1×
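The last two bullets can be combined in one wrapper. The following is a minimal sketch, not a provider integration: `send` is a hypothetical stand-in for your client call (assumed to accept `prompt`, `max_tokens`, and `timeout` and to return a dict with a `tokens` count), and the log file name and field names are assumptions.

```python
import json
import time
from typing import Callable

MAX_TOKENS_CHAT = 500   # upper end of the 50-500 range for chat
TIMEOUT_S = 30          # per-request timeout in seconds
MAX_RETRIES = 1         # retry no more than once


def guarded_call(send: Callable[..., dict], user_id: str, endpoint: str,
                 prompt: str, log_path: str = "llm_calls.jsonl") -> dict:
    """Call `send` with a token cap and timeout, retrying at most once.

    `send` is a hypothetical placeholder for the real provider client.
    Every attempt is appended to a JSON-lines log tagged with user_id
    and endpoint, so runaway users can be found in a post-mortem.
    """
    last_error = None
    for attempt in range(1 + MAX_RETRIES):
        try:
            result = send(prompt=prompt, max_tokens=MAX_TOKENS_CHAT,
                          timeout=TIMEOUT_S)
            status, tokens = "ok", result.get("tokens", 0)
        except Exception as exc:  # any failure counts as one attempt
            last_error = exc
            result, status, tokens = None, "error", 0
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"ts": time.time(), "user_id": user_id,
                                  "endpoint": endpoint, "attempt": attempt,
                                  "status": status, "tokens": tokens}) + "\n")
        if result is not None:
            return result
    raise RuntimeError(
        f"LLM call failed after {1 + MAX_RETRIES} attempts") from last_error
```

Bounding the loop with `range(1 + MAX_RETRIES)` means a failing upstream costs at most two requests per user action, rather than an open-ended retry storm.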

Example

# Cron: every 5 min — heartbeat to enterno.io with current daily spend
# /etc/cron.d/llm-cost-watch
*/5 * * * * www-data /usr/bin/python3 /opt/llm-cost-check.py

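# llm-cost-check.py (simplified)
import os
import requests

HEARTBEAT_URL = 'https://enterno.io/api/heartbeat'
BUDGET_USD = 50.0    # USD/day
ALERT_FACTOR = 1.2   # alert at 120% of the daily budget


def fetch_today_usage() -> float:
    """Return today's spend in USD. Placeholder: wire this to your billing."""
    raise NotImplementedError('query your usage API or billing data here')


spent = fetch_today_usage()

# HEARTBEAT_TOKEN must be set in the cron environment, or this raises KeyError.
params = {'token': os.environ['HEARTBEAT_TOKEN'], 'status': 'ok'}
if spent > BUDGET_USD * ALERT_FACTOR:
    params['status'] = 'critical'
    params['msg'] = (f'LLM spend ${spent:.2f} > {ALERT_FACTOR:.0%} '
                     f'of ${BUDGET_USD:.2f}/day')

# One request either way; the timeout keeps a hung call from stalling cron.
requests.post(HEARTBEAT_URL, params=params, timeout=10)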
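The example leaves fetch_today_usage() to your own billing source. One possible way to fill it in, assuming you already write one JSON line per LLM call with a Unix timestamp `ts` and a `cost_usd` field (both field names and the log path are hypothetical), is to sum today's entries:

```python
import json
from datetime import date, datetime, timezone
from typing import Optional


def fetch_today_usage(log_path: str, today: Optional[date] = None) -> float:
    """Sum the cost_usd field of today's (UTC) entries in a JSON-lines log.

    Assumes each line looks like {"ts": <unix seconds>, "cost_usd": <float>}.
    """
    today = today or datetime.now(timezone.utc).date()
    total = 0.0
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            day = datetime.fromtimestamp(entry["ts"], tz=timezone.utc).date()
            if day == today:
                total += float(entry.get("cost_usd", 0.0))
    return total
```

Computing spend from your own log avoids depending on how quickly the provider's usage data updates, at the price of keeping your per-call cost estimates accurate.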


TL;DR: Setting Up LLM API Cost Alerts

If your LLM calls are billed through a cloud account, configure a budget in the provider's billing console and add anomaly detection with a monitoring tool such as AWS CloudWatch or Google Cloud Monitoring. Set alert thresholds below the budget limit so notifications arrive while spending is still approaching it, and route alerts to a channel you watch in real time, such as email or SMS.

Configuring Budget Caps for LLM API Usage

Establishing a budget cap is crucial to managing costs effectively when using LLM APIs. Here’s how to configure a budget cap using AWS as an example:

  1. Access the AWS Billing Console: Log into your AWS account and navigate to the Billing Dashboard.
  2. Create a Budget: Select 'Budgets' from the side menu and then click on 'Create budget.'
  3. Define Budget Type: Choose 'Cost budget' and click 'Set your budget.'
  4. Set Budget Amount: Specify your budget limit. For instance, set a monthly budget of $500 for LLM API calls.
  5. Configure Alerts: Under 'Set alerts,' specify the notification threshold. For example, set an alert for when costs exceed 80% of your budget, which would be $400 in this case.
  6. Choose Notification Channels: AWS Budgets can email recipients directly or publish to an Amazon SNS topic, which can in turn deliver SMS. Enter the email addresses or the topic that should receive notifications.
  7. Review and Create: Review your settings and click 'Create budget' to finalize the setup.

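This setup lets you track LLM API spend against your budget and be warned before costs run away. Note that a budget configured this way only notifies you; on its own it does not stop API calls when the limit is reached.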
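The same budget can be expressed as an API request instead of console clicks. The sketch below only builds the request payload for the AWS Budgets CreateBudget operation; the budget name, account ID, and email address are placeholders, and the actual call (shown in a comment) assumes boto3 is installed and credentials are configured.

```python
def build_budget_request(account_id: str, limit_usd: float,
                         alert_pct: float, email: str) -> dict:
    """Build keyword arguments for the AWS Budgets CreateBudget call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "llm-api-monthly",  # hypothetical name
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [{
            "Notification": {
                "NotificationType": "ACTUAL",        # alert on actual spend
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": alert_pct,              # percent of the limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }],
    }


# $500/month with an alert at 80%, as in steps 4 and 5.
request = build_budget_request("123456789012", 500.0, 80.0, "ops@example.com")
alert_at_usd = 500.0 * 80.0 / 100  # 400.0

# With boto3 available, the request would be sent like this:
#   import boto3
#   boto3.client("budgets").create_budget(**request)
```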

Implementing Anomaly Detection for Cost Management

In addition to budget caps, implementing anomaly detection can help identify unexpected spikes in LLM API usage, which could lead to increased costs. Here’s how to set up anomaly detection using Google Cloud Monitoring:

  1. Access Google Cloud Console: Navigate to the Google Cloud Console and select 'Monitoring.'
  2. Create an Alert Policy: Click 'Alerting' and then 'Create Policy.'
  3. Select Condition Type: Choose 'Metrics' as the condition type. For LLM API costs, select a metric that tracks spend or request volume. The name 'Total API Cost' is illustrative here, since the metrics available depend on what your project exports.
  4. Configure Anomaly Detection: In the condition configuration, set the condition to fire on deviation from the baseline and specify the threshold. For example, trigger an alert if costs exceed the average by 20% over a defined period.
  5. Set Notification Channels: Choose how you want to be notified (e.g., email, SMS, or Slack) when an anomaly is detected.
  6. Review and Save: Review your settings and click 'Save' to activate the alert policy.

This approach allows you to proactively manage costs and react quickly to unusual spending patterns, ensuring your LLM API usage remains within budget.
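The "exceeds the average by 20%" rule from step 4 is simple enough to run in your own script as well. This is a minimal sketch of that rule only, with hypothetical names; it is not how any cloud provider's anomaly detection is implemented.

```python
from statistics import mean
from typing import Sequence


def is_cost_anomaly(history: Sequence[float], current: float,
                    tolerance: float = 0.20) -> bool:
    """True when `current` exceeds the mean of `history` by more than `tolerance`.

    `history` holds costs for previous periods of the same length
    (for example, the last 7 daily totals).
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return current > mean(history) * (1 + tolerance)


# Baseline of $10/day: $11 is within tolerance, $13 is flagged.
print(is_cost_anomaly([10, 10, 10], 11.0))
print(is_cost_anomaly([10, 10, 10], 13.0))
```

A plain mean is easy to reason about but reacts slowly to sustained growth; a longer or shorter history window trades false alerts against detection delay.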


What is Heartbeat Monitoring?

A heartbeat monitor is a "reverse monitor": instead of us polling the service, the service signals us that it's alive. If no signal arrives within the set interval, we send an alert.

Simple Integration

One GET request to a unique URL tells the monitor that the job completed.

Grace Period

Set an acceptable ping delay to avoid false alerts.
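The interval and grace period combine into a single deadline. The following sketch shows the general dead man's switch check under that assumption; the function and parameter names are hypothetical, and it does not describe the service's internal implementation.

```python
from datetime import datetime, timedelta


def is_overdue(last_ping: datetime, interval: timedelta,
               grace: timedelta, now: datetime) -> bool:
    """True when the next ping is later than interval + grace allows."""
    return now > last_ping + interval + grace


last = datetime(2024, 1, 1, 12, 0, 0)
every_5_min = timedelta(minutes=5)
grace = timedelta(minutes=1)

# 5 min 30 s after the last ping: late, but still inside the grace window.
print(is_overdue(last, every_5_min, grace, datetime(2024, 1, 1, 12, 5, 30)))
# 6 min 30 s after the last ping: past the deadline, so an alert is due.
print(is_overdue(last, every_5_min, grace, datetime(2024, 1, 1, 12, 6, 30)))
```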

Smart Notifications

Email and Telegram alerts on a missed ping, with a repeated alert if the silence continues.

Execution History

Full ping log with timestamps, so you can see every job execution.

Who uses this

  • DevOps: cron job monitoring
  • Developers: background worker check
  • Sysadmins: dead man's switch
  • Business: payment queue monitoring

Common Mistakes

  • No grace period: Without a grace period, any minor delay triggers a false alert.
  • Pinging before the task starts: Ping at the end of the task, so the ping confirms successful completion and not just the start.
  • One URL for different tasks: Create a separate monitor for each cron job; otherwise you won't know which one failed.
  • Pinging on error: If the task fails, don't ping. Missing ping = failure signal.

Best Practices

  • Ping at the very end: Make the heartbeat URL call the last command in the script.
  • Use curl in cron: curl -s https://enterno.io/api/heartbeat/TOKEN is simple and reliable.
  • Set grace = 20–30%: If the job takes 5 min, add a grace period of 1–2 min on top.
  • Cover all critical jobs: Backups, report generation, and data sync should all have a heartbeat monitor.
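For jobs driven from Python rather than a shell one-liner, the first and third practices can be sketched as follows. The wrapper names are hypothetical, TOKEN is the same placeholder used above, and the 25% default is just one value inside the 20–30% range.

```python
import subprocess
import urllib.request
from typing import Callable, Sequence

# TOKEN is a placeholder; substitute your monitor's token.
HEARTBEAT_URL = "https://enterno.io/api/heartbeat/TOKEN"


def http_ping(url: str = HEARTBEAT_URL) -> None:
    """Send the heartbeat as a single GET request."""
    urllib.request.urlopen(url, timeout=10).close()


def run_then_ping(command: Sequence[str],
                  ping: Callable[[], None] = http_ping) -> int:
    """Run `command`; ping last, and only if it exited with status 0.

    On failure no ping is sent, so the missing heartbeat is the alert.
    """
    exit_code = subprocess.run(command).returncode
    if exit_code == 0:
        ping()
    return exit_code


def grace_seconds(job_seconds: float, fraction: float = 0.25) -> float:
    """Grace period as a fraction of the job's usual run time."""
    return job_seconds * fraction


# A 5-minute job gets 75 s of grace at 25%, inside the 1-2 min guideline.
print(grace_seconds(300))
```

Passing `ping` as a parameter keeps the wrapper testable without network access; in production the default `http_ping` is used.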


Frequently Asked Questions

Why is the provider hard cap not enough?

The cap fires only after billing catches up, typically a 10-15 min lag. In that window a prompt loop can eat $1,000+. The 5-min soft alert catches the spike before the cap does.

How do I defend against a runaway attack?

Apply a per-user rate limit (5 req/min), a daily token budget per user, and an IP ban after more than 3 hot alarms in a row. The provider hard cap is the last line of defense, not the first.
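A per-user limit of 5 requests per minute can be enforced with a sliding window. This is an in-memory sketch with hypothetical names, suitable for a single process; a multi-instance deployment would need shared storage instead.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PerUserRateLimiter:
    """Sliding-window limiter: at most `max_requests` per `window_s` per user."""

    def __init__(self, max_requests: int = 5, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._hits = defaultdict(deque)  # user_id -> timestamps of allowed calls

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Record and allow the request, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = PerUserRateLimiter()
results = [limiter.allow("user-1", now=float(t)) for t in range(6)]
print(results)  # five allowed requests, then one rejected
```

Rejected requests are not recorded, so a client that keeps hammering the endpoint does not extend its own lockout beyond the window.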

What baseline budget should I use?

For a chatbot: $/req × avg RPS × 86,400 s/day. For example, gpt-4o-mini at ~$0.0005/req × 1 RPS × 86,400 = ~$43/day. Alert at 120% of baseline.
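The arithmetic above, written out as a small sketch so you can plug in your own numbers. The function name is hypothetical, and the per-request cost is the rough figure from the answer, not a quoted price.

```python
SECONDS_PER_DAY = 86_400


def daily_baseline_usd(cost_per_request: float, avg_rps: float) -> float:
    """Expected daily spend: $/req × average requests per second × 86,400."""
    return cost_per_request * avg_rps * SECONDS_PER_DAY


baseline = daily_baseline_usd(0.0005, 1.0)   # about $43.20/day
alert_threshold = baseline * 1.2             # about $51.84/day (120%)
print(f"baseline ${baseline:.2f}/day, alert above ${alert_threshold:.2f}/day")
```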
