Incident response

This page provides step-by-step response procedures for the most critical validator incidents. Bookmark it and review it before you need it.

Severity levels

Level	Examples	Response time
P0 — Critical	Node down, tombstoned, signing failure	Immediate
P1 — High	Jailed, catching up, low peers	Within 1 hour
P2 — Medium	High memory/CPU, missed blocks trending up	Within 4 hours
P3 — Low	Non-critical log errors, configuration drift	Next maintenance window
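
For alerting or paging scripts, the table above can be mirrored as a small lookup. This is a minimal sketch: the function name response_time_for is hypothetical, and the values are copied from the table.

```shell
#!/bin/sh
# Hypothetical helper: map a severity level from the table above to its
# target response time. Values mirror the table; nothing else is assumed.
response_time_for() {
  case "$1" in
    P0) echo "Immediate" ;;
    P1) echo "Within 1 hour" ;;
    P2) echo "Within 4 hours" ;;
    P3) echo "Next maintenance window" ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Example: response_time_for P1
```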

P0: Node not signing blocks

Detect:

# Check validator power (0 = not signing)
curl -s localhost:26660/metrics | grep tendermint_consensus_validator_power

# Check signing info
autheod query slashing signing-info \
  $(autheod tendermint show-validator --home /path/to/node-home)

Response:

Check if the service is running: sudo systemctl status autheod
If stopped, restart: sudo systemctl start autheod
Check logs: sudo journalctl -u autheod -n 200 --no-pager
Check sync status: autheod status | jq '.SyncInfo'
If out of sync, restore from snapshot (see Backups and restore)
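
The response steps above can be sketched as a small decision helper. The function name next_step and its output labels are hypothetical; it only encodes the order of checks listed above and does not run any commands itself.

```shell
#!/bin/sh
# Hypothetical helper: choose the next response step for a node that is not
# signing. Arguments:
#   $1  service state as printed by `systemctl is-active autheod`
#   $2  value of .SyncInfo.catching_up ("true" or "false")
next_step() {
  if [ "$1" != "active" ]; then
    echo "start-service"   # sudo systemctl start autheod
  elif [ "$2" = "true" ]; then
    echo "check-sync"      # node is behind; restore from snapshot if needed
  else
    echo "check-logs"      # running and synced; inspect the journal for errors
  fi
}

# Example wiring (requires systemctl, autheod, and jq on the host):
#   next_step "$(systemctl is-active autheod)" \
#             "$(autheod status | jq -r '.SyncInfo.catching_up')"
```

Keeping the decision separate from the commands that gather state makes the logic easy to exercise without a running node.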

P0: Validator tombstoned

Detect:

autheod query slashing signing-info \
  $(autheod tendermint show-validator --home /path/to/node-home) \
  | grep tombstoned

Returns tombstoned: true if tombstoned.

Response:

Tombstoning is permanent. It results from double-signing and cannot be undone.

Stop the tombstoned node immediately
Do not attempt unjail — it will fail
Commission a new server
Generate a new consensus key: autheod init new-validator --chain-id autheo_2127-1
Register a new validator with MsgCreateValidator and a new consensus key
Bind your Sovereign license to the new validator address
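
The detection command for this incident can be wrapped so that a monitoring job gets an exit code instead of text. This is a sketch that assumes the signing-info output contains a line of the form tombstoned: true, as described above; the function name is_tombstoned is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read signing-info output on stdin and succeed (exit 0)
# only if it contains a line reporting tombstoned: true.
is_tombstoned() {
  grep -Eq '^[[:space:]]*tombstoned:[[:space:]]*true[[:space:]]*$'
}

# Example wiring:
#   autheod query slashing signing-info \
#     "$(autheod tendermint show-validator --home /path/to/node-home)" \
#     | is_tombstoned && echo "TOMBSTONED: stop the node, do not unjail"
```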

P1: Validator jailed (liveness)

Detect:

autheod query staking validator <autheovaloper-address> | grep -E "jailed|status"
# Look for: jailed: true (status moves to BOND_STATUS_UNBONDING, then BOND_STATUS_UNBONDED)

Response:

Diagnose the cause

Check why blocks were missed — look for crashes, restarts, or network interruptions in the logs:

sudo journalctl -u autheod --since "2 hours ago" | grep -E "error|panic|missed"

Fix the root cause

Resolve the underlying issue before unjailing: disk full, OOM, misconfiguration, etc.

Ensure node is synced

autheod status | jq '.SyncInfo.catching_up'
# Must return false before unjailing

Verify license is not REVOKED

autheod query license license <license-id>

Submit unjail

autheod tx slashing unjail \
  --from mykey \
  --chain-id autheo_2127-1 \
  --keyring-backend file

Re-delegate if needed

If the license shows BOUND after unjailing:

autheod tx staking delegate <autheovaloper-address> <amount>aauth \
  --from mykey --chain-id autheo_2127-1 --keyring-backend file
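
The two preconditions in the procedure above (node synced, license not REVOKED) can be checked by a small gate before the unjail transaction is submitted. This is a sketch under assumptions: the function name ready_to_unjail is hypothetical, and the caller is expected to extract the license status from the license query output, whose exact format is not shown here.

```shell
#!/bin/sh
# Hypothetical pre-flight gate for the unjail transaction. Arguments:
#   $1  value of .SyncInfo.catching_up ("true" or "false")
#   $2  license status string as reported by the license query
# Succeeds only when the node is synced and the license is not REVOKED.
ready_to_unjail() {
  if [ "$1" != "false" ]; then
    echo "not ready: node is still catching up" >&2
    return 1
  fi
  if [ "$2" = "REVOKED" ]; then
    echo "not ready: license is REVOKED" >&2
    return 1
  fi
  return 0
}

# Example:
#   ready_to_unjail "$(autheod status | jq -r '.SyncInfo.catching_up')" "$license_status" \
#     && echo "safe to submit unjail"
```

The gate treats anything other than a literal false for catching_up as not ready, so a failed status query also blocks the unjail.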

P1: Node not syncing (catching_up: true)

Detect:

autheod status | jq '.SyncInfo.catching_up'
# Returns true if behind

Response:

Check peer count: curl -s localhost:26657/net_info | jq '.result.n_peers'
If fewer than 3 peers are connected, add entries to the persistent_peers setting in config/config.toml, then restart the service
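
The peer threshold above can be checked in a script. One detail to watch: the RPC may return n_peers as a quoted JSON string, so jq without -r can print it with surrounding quotes. The sketch below strips them before comparing; the function name too_few_peers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the peer count is below a minimum.
#   $1  peer count, with or without surrounding double quotes
#   $2  minimum acceptable peers (defaults to 3, the threshold used above)
# Returns 2 if the first argument is not a non-negative integer.
too_few_peers() {
  peers=$(printf '%s' "$1" | tr -d '"')
  min=${2:-3}
  case "$peers" in
    ''|*[!0-9]*) echo "not a number: $1" >&2; return 2 ;;
  esac
  [ "$peers" -lt "$min" ]
}

# Example wiring:
#   too_few_peers "$(curl -s localhost:26657/net_info | jq '.result.n_peers')" \
#     && echo "fewer than 3 peers: add persistent peers"
```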

If syncing is extremely slow (hours behind), restore from snapshot:

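# Stop the node before touching the data directory
sudo systemctl stop autheod
rm -rf /path/to/node-home/data/
# Download and unpack the latest snapshot
wget https://snapshot.autheo.com/data_backup_latest.tar.gz
tar xzvf data_backup_latest.tar.gz
mv data/ /path/to/node-home/data/
# Restore the signing state; this should be the copy saved after the node last stopped
cp /secure/backup/priv_validator_state.json /path/to/node-home/data/priv_validator_state.json
sudo systemctl start autheod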
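
Restoring a signing state older than the one the node last used can lead to double-signing, so it may be worth comparing the backup against the live file before deleting data/. The sketch below assumes priv_validator_state.json contains a field of the form "height": "12345"; the function names state_height and backup_is_current are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the "height" recorded in a
# priv_validator_state.json file. Assumes a field like "height": "12345".
state_height() {
  sed -n 's/.*"height"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9][0-9]*\)"\{0,1\}.*/\1/p' "$1" | head -n 1
}

# Succeed only if the backup ($2) is at least as recent as the live file ($1).
# Fails if either height cannot be read.
backup_is_current() {
  live=$(state_height "$1")
  backup=$(state_height "$2")
  [ -n "$live" ] && [ -n "$backup" ] && [ "$backup" -ge "$live" ]
}

# Example, run after stopping the service and before rm -rf:
#   backup_is_current /path/to/node-home/data/priv_validator_state.json \
#                     /secure/backup/priv_validator_state.json \
#     || echo "backup is stale: copy the live file to the backup location first"
```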

P1: Hardware failure — migrate to new host

See Runbook B: Full hardware failure. The critical rule: confirm the old host is completely powered off before starting the new host with the same consensus key. Two hosts signing with the same key risk double-signing, which results in permanent tombstoning.

Post-incident review

After every P0 or P1 incident:

Document the timeline (when detected, when resolved)
Identify the root cause
Update monitoring thresholds if the incident was not caught early enough
Review the monitoring checklist
Update runbooks if the incident revealed a gap
