OpenSearch — alert on the high disk watermark
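OpenSearch reacts to disk usage on each data node in stages. With the default settings, above 85 % (low watermark) it stops allocating new shards to the node, above 90 % (high watermark) it starts relocating shards away, and above 95 % (flood stage) it sets index.blocks.read_only_allow_delete on every index that has a shard on that node, so writes to those indices fail. You want to catch the growth well before the flood stage.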
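To make the stages concrete, here is a minimal sketch of a hypothetical helper, watermark_stage, that maps an integer disk percentage to the stage a node has crossed. It assumes the stock percentage defaults (85/90/95) and treats a watermark as crossed when usage is strictly above it. If your cluster overrides cluster.routing.allocation.disk.watermark.*, pass your own numbers; if it sets them as absolute byte values (free space) instead of percentages, this helper does not apply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the watermark stage a node's disk usage has crossed.
# Args: PCT [LOW=85] [HIGH=90] [FLOOD=95], all integer percentages of disk used.
# Assumes percentage-style watermarks; byte-valued watermarks are not handled.
watermark_stage() {
  local pct=$1 low=${2:-85} high=${3:-90} flood=${4:-95}
  if   [ "$pct" -gt "$flood" ]; then echo flood   # indices get read_only_allow_delete
  elif [ "$pct" -gt "$high" ];  then echo high    # shards relocated off the node
  elif [ "$pct" -gt "$low" ];   then echo low     # no new shards allocated here
  else                               echo ok
  fi
}

watermark_stage 80   # ok
watermark_stage 87   # low
watermark_stage 96   # flood
```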
Recipe
#!/usr/bin/env bash
# Schedule: put this line in /etc/cron.d/os-disk
# */5 * * * * root /opt/os-disk.sh
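set -uo pipefail
URL=${OS_URL:-http://localhost:9200}
THRESH=${THRESH:-80} # alert above 80 % usage
# One "node<TAB>disk.percent" line per node; skip the UNASSIGNED pseudo-row.
DATA=$(curl -fsS "$URL/_cat/allocation?format=json" \
  | jq -r '.[] | select(.node != "UNASSIGNED") | [.node, .["disk.percent"]] | @tsv') \
  || { echo "cannot read $URL/_cat/allocation" >&2; exit 3; }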
OVER=0
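# Feed the loop from a here-string, not a pipe: a piped while runs in a
# subshell, so OVER would still be 0 after the loop and no alert would fire.
while IFS=$'\t' read -r NODE PCT; do
  [ -n "$PCT" ] || continue   # skip blank input / nodes without disk stats
  if [ "${PCT%.*}" -gt "$THRESH" ]; then
    OVER=$((OVER + 1))
    echo "node $NODE @ ${PCT}%"
  fi
done <<< "$DATA"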
if [ "$OVER" -gt 0 ]; then
  curl -fsS "${HEARTBEAT_URL:?HEARTBEAT_URL is not set}" --data "disk_over=$OVER,threshold=$THRESH"
  exit 2
fi
echo "OK (all nodes at or below ${THRESH}%)"
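The alert logic can be checked without a cluster. A minimal sketch, assuming you are willing to move the loop into a function (the name count_over is hypothetical): it reads the same node<TAB>percent lines the jq filter produces, reports offenders on stderr, and prints the count on stdout. Because the counter and the final echo run in the same shell, the count survives even when the function is fed from a pipe.

```shell
#!/usr/bin/env bash
# Hypothetical refactor of the threshold loop for offline testing.
# Reads "node<TAB>percent" lines on stdin; offenders go to stderr, count to stdout.
count_over() {
  local thresh=$1 over=0 node pct
  while IFS=$'\t' read -r node pct; do
    [ -n "$pct" ] || continue            # skip blank lines / nodes without stats
    if [ "${pct%.*}" -gt "$thresh" ]; then
      over=$((over + 1))
      echo "node $node @ ${pct}%" >&2    # same message as the script, on stderr
    fi
  done
  echo "$over"                           # still inside the same (sub)shell as the loop
}

# Fixture shaped like the jq output; no cluster needed.
printf 'os-1\t42\nos-2\t83\nos-3\t91.5\n' | count_over 80   # prints 2
```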
Same thing in Enterno.io
Wrap the script in an Enterno heartbeat: it keeps 30 days of disk-percent history and adds a degradation pattern (e.g. +1 %/h) that tells you when to plan a disk upgrade.
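The source does not say how Enterno computes its degradation pattern. As a rough local approximation, a linear extrapolation from two samples gives the growth rate and the time left before the flood stage. The helper hours_to_flood is hypothetical, and a straight line will mislead when growth is bursty (e.g. a nightly backup).

```shell
#!/usr/bin/env bash
# Hypothetical helper: linear extrapolation from two disk-percent samples.
# Usage: hours_to_flood PCT_THEN EPOCH_THEN PCT_NOW EPOCH_NOW [FLOOD=95]
# Prints the growth rate in %/h and the hours left until FLOOD, or "not growing".
hours_to_flood() {
  awk -v p0="$1" -v t0="$2" -v p1="$3" -v t1="$4" -v flood="${5:-95}" 'BEGIN {
    if (t1 <= t0) { print "need EPOCH_NOW > EPOCH_THEN"; exit 1 }
    rate = (p1 - p0) / ((t1 - t0) / 3600)   # percentage points per hour
    if (rate <= 0) { print "not growing"; exit 0 }
    printf "%.2f %%/h, %.1f h to %s%%\n", rate, (flood - p1) / rate, flood
  }'
}

# 79 % an hour ago, 80 % now: +1 %/h, so 15 h until the 95 % flood stage.
hours_to_flood 79 1700000000 80 1700003600   # 1.00 %/h, 15.0 h to 95%
```

Pass 90 as the fifth argument to get the time left before the high watermark instead.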
Related recipes
Logs or backup files fill up /var, and within 24 hours the server falls over. A basic df check every 10 minutes saves you a 2 AM incident.
Production ES cluster turns yellow. You need an alert now, not 30 minutes later via Kibana.
logrotate has stopped (a syntax error in the last config edit, or a disabled systemd timer), so the main log file keeps growing. Nobody notices until the disk fills.