Three-layer monitoring architecture: because process alive does not mean healthy, and healthy does not mean correct.
Process-level monitoring by the operating system. If openclaw-gateway crashes, systemd automatically restarts it. This catches catastrophic process death but cannot detect application-level issues.
Detects
Process crash, OOM kill, service failure
Limitation
Process alive ≠ healthy. Gateway can be running but unresponsive.
Lightweight health ping that runs independently of the gateway process. Verifies the gateway is not just alive but actually responding to HTTP requests. Runs via cron — survives gateway death.
Detects
Unresponsive gateway, hung process, failed HTTP health endpoint
Limitation
Only checks availability, not correctness of data or configuration.
Mighty Mark's 138+ checks across 13 categories. Deep validation of gateway config, agent state, API connectivity, data integrity, fleet communication, security posture, and temporal systems.
Detects
Configuration drift, API failures, audit chain corruption, stale data, security violations
Limitation
Slower to run (minutes, not seconds). Scheduled daily + on-demand.
When problems are detected, Mighty Mark classifies them and emits structured JSON recommendations. A separate executor (fleet-heal) picks up FIXABLE items and performs the actual remediation—preserving the sentinel's read-only constraint.
| Classification | Description | Example |
|---|---|---|
| FIXABLE | Mighty Mark knows the exact remediation steps. fleet-heal can execute automatically. | Gateway process dead → restart via systemctl |
| BROKEN | Problem detected but remediation requires investigation. Human review recommended. | Audit chain hash mismatch → data corruption investigation |
| MANUAL | Requires human intervention. Cannot be automated safely. | Expired API key → credential rotation by operator |
Every check run produces a JSONL append-only incident log. Each record includes the fleet-bus auditHash so incident investigation can cross-reference the cryptographic audit trail—connecting operational incidents to tamper-proof evidence.
{
"timestamp": "2026-04-03T06:00:00Z",
"category": "gateway",
"passed": 9,
"failed": 1,
"warned": 0,
"classification": "FIXABLE",
"recommendation": "Gateway process unresponsive — restart via systemctl",
"auditHash": "a3f2c1...b8e9d4"
}mighty-mark triage performs version-indexed breaking change detection before an OpenClaw upgrade. It compares your current configuration against known breaking changes in the target version, flagging items that need attention before you upgrade.
# mighty-mark v0.7.x
# Start watchdog mode (lightweight health ping loop)
mighty-mark watchdog
# Run heal detection and emit recommendations
mighty-mark heal
# View incident log
mighty-mark incidents
# Run upgrade triage for target version
mighty-mark triage
# Install watchdog as system service
mighty-mark install-watchdog