Observability: Local vs Cloud
When your agent fails at 2 AM, what do you actually see?
Scenario: Rate-Limit Failure at 2:07 AM
Your agent is processing a batch of 50 research queries overnight. At 2:07 AM, the third-party API starts returning 429 Too Many Requests. The agent should back off and retry — but instead it burns through retries and halts with an unhandled exception.
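For reference, the back-off-and-retry behavior the agent was supposed to show can be sketched roughly as follows. This is a minimal illustration, not code from Hermes or Gobii: `RateLimitError`, `call_with_backoff`, and every parameter name are hypothetical, and the sketch assumes the API client raises an exception on a 429 that may carry the server's suggested wait time.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the API answers 429 Too Many Requests."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(call, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Run call(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            if err.retry_after is not None:
                # Honor the server's hint when one is available.
                delay = err.retry_after
            else:
                # Otherwise double the ceiling each attempt and pick a random
                # point below it, so parallel workers do not retry in lockstep.
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
```

The `sleep` parameter is injectable only so the delays can be inspected in tests. An agent that "burns through retries" typically lacks the growing delay, or never catches the final re-raised error.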
Now the question: how long does it take to figure out what happened?
Hermes Agent: The Terminal Detective Work
With a locally-running Hermes agent, your observability surface is whatever the terminal printed — and whatever log files you remembered to configure.
- SSH into the machine at 8 AM, when you wake up and see the Slack notification about the failed run.
- Scroll through raw terminal output — thousands of lines of mixed stdout/stderr. The 429 errors are buried between routine status lines.
- Grep the log file: `grep "429" ~/.hermes/logs/agent.log | tail -20`. You find the rate-limit hits, but not the context: which query triggered it? What was the retry strategy?
- Cross-reference timestamps manually to reconstruct the sequence. Total time to root cause: ~25 minutes.
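Part of that manual cross-referencing can be scripted. The sketch below prints each matching line together with a few lines before and after it, much like `grep -C` does. It assumes only that the log is plain text with one event per line; the function name `matches_with_context` is made up for illustration, and nothing here depends on a particular Hermes log format.

```python
import re
from collections import deque


def matches_with_context(lines, pattern, before=3, after=3):
    """Yield (line_number, line) for each match plus its surrounding lines."""
    regex = re.compile(pattern)
    recent = deque(maxlen=before)   # most recent lines not yet emitted
    remaining_after = 0             # trailing context lines still owed
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if regex.search(line):
            yield from recent       # leading context
            recent.clear()
            yield number, line
            remaining_after = after
        elif remaining_after > 0:
            yield number, line      # trailing context
            remaining_after -= 1
        else:
            recent.append((number, line))
```

Passing an open log file as `lines` and `r"\b429\b"` as `pattern` yields each rate-limit hit with its neighbors. That narrows the search, but which query a hit belongs to still has to be inferred from whatever the surrounding lines happen to say.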
Gobii: Structured Observability by Default
Every Gobii agent run produces a structured trace — timestamped, filterable, and queryable from the cloud dashboard. No SSH required.
- Open the Gobii dashboard from your phone at 2:10 AM when the alert fires. (Yes, Gobii has alerting.)
- Filter the trace view by error severity. The 429 responses are highlighted in amber, with the exact tool call, payload, and response code.
- See the full causal chain: which query triggered the rate limit, how many retries were attempted, and the backoff intervals — all in one expandable tree.
- Adjust the retry policy from the dashboard and re-run. Total time to root cause: ~3 minutes.
Observability Comparison
| Capability | Hermes (Local) | Gobii (Cloud) |
|---|---|---|
| Structured Tracing | Raw terminal output and plain-text logs | Full trace tree per run |
| Real-Time Alerting | Manual (check logs) | Configurable webhooks + email |
| Historical Search | Grep text files | Filterable dashboard with date ranges |
| Multi-Agent View | One terminal per agent | Unified dashboard for all agents |
| Remote Access | SSH + VPN required | Browser or mobile, anywhere |
| Error Aggregation | Manual correlation | Auto-grouped by error type |
Why This Matters for Production
Development-time debugging is one thing. But when you’re running agents in production — especially overnight or across time zones — the observability gap becomes a business continuity risk.
Hermes’s local-first model puts the burden of observability entirely on you: configure logging, set up log shipping, build dashboards, wire up alerting. Gobii provides all of this as part of the platform, because cloud-native agents need cloud-native observability.
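To give a sense of what the first item on that do-it-yourself list involves, here is a minimal sketch of structured (JSON) logging using only Python's standard `logging` module. The logger name and the field names (`query_id`, `status`, `attempt`, `delay_s`) are assumptions chosen for this example, not part of Hermes or Gobii, and log shipping, dashboards, and alerting would all still have to be built on top.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        event = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` show up as attributes on the record.
        for key in ("query_id", "status", "attempt", "delay_s"):
            if hasattr(record, key):
                event[key] = getattr(record, key)
        return json.dumps(event)


def build_logger(stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("agent.retries")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()
logger.warning("rate limited",
               extra={"query_id": 17, "status": 429, "attempt": 2, "delay_s": 4.0})
```

With records in this shape, the questions from the 2:07 AM scenario (which query, which attempt, what delay) become a filter on fields rather than a text search.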