📱 Hardened Human-in-the-Loop (HITL) via SMS
Gobii v2.21.0 brings real human oversight to autonomous agents. Here is why that matters when self-evaluation fails.
The Trust Problem: Self-Evaluation Is Broken
Hermes Agent relies on self-evaluation: the agent itself decides whether it did a good job. According to the Kilo.ai analysis of 1,300+ Reddit comments, users report that it rates its own work as a success even during catastrophic failures:
"It always thinks it did a good job. ALWAYS. I had it pull water test results from the Indiana DNR site and it jumbled up everything... It thought it kicked ass!"
— u/CustomMerkins4u (+107 upvotes)
This creates a compounding failure loop: errors are encoded into auto-generated skills, degrading performance over time with no human checkpoint.
Gobii v2.21.0: SMS Contact Approval
Gobii v2.21.0 introduces SMS Contact Approval — a hardened HITL mechanism that routes critical action approvals through an out-of-band SMS channel.
🔐 Out-of-Band Security
Approval requests arrive on a separate device via SMS. Even if the agent session is compromised, an attacker would also need access to the approver's phone to approve a destructive action.
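One way such an out-of-band gate could work is sketched below. This is a minimal illustration, not Gobii's implementation: `ApprovalRequest`, `create_request`, `format_sms`, and `parse_reply` are hypothetical names, and the one-time code and the `YES <code>` reply format are assumptions. Sending the message is left out, since that depends on the SMS provider in use.

```python
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class ApprovalRequest:
    """A pending approval for one gated action (hypothetical structure)."""
    action: str  # human-readable description of the gated action
    phone: str   # approver's phone number
    code: str    # one-time code the approver must echo back


def create_request(action: str, phone: str) -> ApprovalRequest:
    """Create a request bound to a random 6-digit one-time code."""
    code = f"{secrets.randbelow(10**6):06d}"
    return ApprovalRequest(action=action, phone=phone, code=code)


def format_sms(req: ApprovalRequest) -> str:
    """Build the text that would be sent to the approver."""
    return f"Approve '{req.action}'? Reply YES {req.code} or NO {req.code}."


def parse_reply(req: ApprovalRequest, sender: str, body: str) -> bool:
    """Return True only for an explicit YES, from the expected number,
    carrying the code issued for this request. Anything else is a denial."""
    if sender != req.phone:
        return False
    parts = body.strip().split()
    if len(parts) != 2:
        return False
    verdict, code = parts
    # Constant-time comparison avoids leaking the code through timing.
    if not hmac.compare_digest(code.encode(), req.code.encode()):
        return False
    return verdict.upper() == "YES"
```

Binding each approval to a random code means a reply only counts for the request it was issued for. Sender numbers can be spoofed, so the sender check on its own should not be treated as authentication.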
✅ Real Accountability
No more "always pass" self-evaluation. A human explicitly approves or denies each gated action. The audit trail records who approved what and when.
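A record of "who approved what and when" could be as small as the sketch below. The `AuditEntry` fields and the `record_decision` helper are hypothetical and only illustrate the idea; the actual Gobii log format is not described in the source.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One human decision on one gated action (hypothetical schema)."""
    approver: str    # who decided, e.g. the approver's phone number
    action: str      # what was decided on
    decision: str    # "approved" or "denied"
    decided_at: str  # when, as an ISO 8601 UTC timestamp


def record_decision(log: list, approver: str, action: str, approved: bool) -> AuditEntry:
    """Append one JSON line per decision to an in-memory log and return the entry."""
    entry = AuditEntry(
        approver=approver,
        action=action,
        decision="approved" if approved else "denied",
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(entry), sort_keys=True))
    return entry
```

Denials are recorded alongside approvals, so the log shows that a human reviewed the action either way.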
📲 Mobile-First Workflow
Approve production deployments, contact outreach, or configuration changes from your phone. No desk required — the agent waits for your SMS reply.
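The waiting step can be sketched as a blocking read with a timeout. The `wait_for_decision` function and the in-memory queue standing in for incoming SMS replies are hypothetical. Treating silence as a denial (failing closed) is an assumption of this sketch, not documented Gobii behavior.

```python
import queue


def wait_for_decision(replies: "queue.Queue[str]", timeout_s: float) -> str:
    """Block until a reply arrives or the timeout expires.

    Assumption: no reply within the timeout counts as a denial, so the
    gated action never runs without an explicit human YES.
    """
    try:
        reply = replies.get(timeout=timeout_s)
    except queue.Empty:
        return "denied"
    return "approved" if reply.strip().upper() == "YES" else "denied"
```

Failing closed keeps an unattended phone from turning into an implicit approval.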
🛡 Compliance Ready
SMS approval logs can serve as evidence of human authorization in SOC 2 and ISO 27001 audits. An LLM's self-evaluation provides no such evidence.
Comparison: Hermes Self-Eval vs. Gobii SMS HITL
| Approval Dimension | Hermes Self-Evaluation | Gobii SMS HITL |
|---|---|---|
| Decision Maker | LLM (always approves itself) | Human (SMS reply) |
| Error Detection | Reportedly near-zero (self-rated "success") | Human judgment |
| Audit Trail | Auto-generated "pass" log | Timestamped SMS + agent log |
| Channel Security | Same session as agent | Out-of-band SMS device |
| Compliance | No record of human authorization | Auditable human authorization |
📚 Sources
- Gobii Platform v2.21.0 Release Notes — SMS Contact Approval
- Kilo.ai: OpenClaw vs Hermes Analysis — Self-evaluation design flaw (1,300+ comments)
- r/hermesagent: Self-Sabotaging Plugins — User report of silent integration breakage