The Scaling Cliff
What happens when you go from 1 agent to 10? The answer isn’t linear.
The Experiment: 1 → 5 → 10 Concurrent Agents
We model a realistic workload: each agent runs a research loop (search → scrape → summarize) against 10 URLs, with a 2-second think time between steps. All agents start simultaneously.
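The workload can be sketched in Python with `asyncio`. This is a minimal sketch under stated assumptions, not the harness behind the numbers below. `search`, `scrape`, and `summarize` are hypothetical stubs, the example.com URLs are placeholders, and the article does not say exactly where the think time falls, so this version assumes one pause between consecutive URLs. The sketch reproduces the shape of the workload: N agents start together and each walks 10 URLs. It does not reproduce the CPU and memory contention that real model inference would add.

```python
import asyncio
import time

# Hypothetical stand-ins for the real pipeline stages; the article does not
# show how its agents search, scrape, or summarize.
async def search(url: str) -> str:
    await asyncio.sleep(0)  # placeholder for a network call
    return f"results for {url}"

async def scrape(results: str) -> str:
    await asyncio.sleep(0)  # placeholder for fetching and parsing a page
    return f"page text from {results}"

async def summarize(text: str) -> str:
    await asyncio.sleep(0)  # placeholder for a model call
    return f"summary of {text}"

async def research_agent(urls: list[str], think_time: float) -> list[str]:
    """Run search -> scrape -> summarize for each URL, pausing between URLs."""
    summaries: list[str] = []
    for i, url in enumerate(urls):
        if i > 0:
            await asyncio.sleep(think_time)  # assumed placement of the think time
        results = await search(url)
        page = await scrape(results)
        summaries.append(await summarize(page))
    return summaries

async def run_workload(n_agents: int, think_time: float = 2.0) -> tuple[float, list[list[str]]]:
    """Start n_agents research loops at the same moment and time the whole batch."""
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    start = time.perf_counter()
    results = await asyncio.gather(
        *(research_agent(urls, think_time) for _ in range(n_agents))
    )
    return time.perf_counter() - start, list(results)

if __name__ == "__main__":
    elapsed, results = asyncio.run(run_workload(5, think_time=0.1))
    print(f"{len(results)} agents finished in {elapsed:.2f} s")
```

Because the stubs only await, the agents overlap almost perfectly here. Swapping the stubs for real local inference is what exposes the shared-resource contention measured in the next section.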
Hermes: Local Resource Contention
Running on a single machine (Apple M3 Pro, 36 GB RAM), local Hermes agents compete for shared CPU cores, memory bandwidth, and disk I/O.
| Agent Count | Avg CPU % | RAM Used | Avg Task Time | Success Rate |
|---|---|---|---|---|
| 1 | 18% | 2.1 GB | 42 sec | 100% |
| 5 | 67% | 9.8 GB | 68 sec | 96% |
| 10 | 94% | 18.4 GB | 147 sec | 81% |
At 10 agents: CPU saturation causes a 3.5× task slowdown (147 sec vs. 42 sec for a single agent). Two agents were OOM-killed. Concurrent model-weight loads saturate disk I/O, and the machine is pushed into swap. This is the scaling cliff: a sudden, non-linear degradation where adding more agents makes everything slower and less reliable.
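One way to see the cliff in the Hermes table is to convert it into successful tasks per minute. The sketch below uses only the table's numbers. It assumes each agent completes one task per average task time, which is a simplification introduced here and not something the measurements state.

```python
# Hermes measurements from the table above: agents -> (avg task time in s, success rate)
HERMES = {1: (42, 1.00), 5: (68, 0.96), 10: (147, 0.81)}

def successful_tasks_per_minute(agents: int, task_time_s: float, success_rate: float) -> float:
    """Throughput of successful tasks, assuming one task per agent per average task time."""
    return agents * success_rate * 60 / task_time_s

baseline_time = HERMES[1][0]
for agents, (task_time, success) in HERMES.items():
    slowdown = task_time / baseline_time  # relative to a single agent
    rate = successful_tasks_per_minute(agents, task_time, success)
    print(f"{agents:>2} agents: {slowdown:.1f}x slowdown, {rate:.2f} successful tasks/min")
```

Under that assumption, successful throughput rises from about 1.43 to 4.24 tasks per minute between 1 and 5 agents, then falls to about 3.31 at 10 agents: doubling the agent count lowers total useful output.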
Gobii: Elastic Cloud Isolation
Each Gobii agent runs in an isolated gVisor-sandboxed pod with dedicated CPU and memory allocation. Adding agents adds pods — not contention.
| Agent Count | Per-Agent CPU | Per-Agent RAM | Avg Task Time | Success Rate |
|---|---|---|---|---|
| 1 | 2 vCPU | 4 GB | 38 sec | 100% |
| 5 | 2 vCPU each | 4 GB each | 39 sec | 100% |
| 10 | 2 vCPU each | 4 GB each | 40 sec | 99.7% |
At 10 agents: Task time remains flat. No shared CPU, no noisy neighbors. Each agent gets its own sandbox with guaranteed resources. The cloud platform handles scheduling, auto-scaling, and health checks automatically.
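The per-agent isolation described above can be illustrated with a Kubernetes Pod manifest built as a Python dict. This is a hypothetical sketch, not Gobii's actual configuration. It assumes a cluster with a RuntimeClass named `gvisor` backed by gVisor's `runsc` handler. The image name, labels, and `agent_pod_manifest` helper are invented for illustration, and `4Gi` approximates the table's 4 GB. The fields themselves (`runtimeClassName`, `resources.requests`, `resources.limits`) are standard Pod spec fields.

```python
import json

def agent_pod_manifest(agent_id: int, image: str, cpu: str = "2", memory: str = "4Gi") -> dict:
    """Build a Pod manifest for one sandboxed agent (hypothetical layout)."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"agent-{agent_id}", "labels": {"app": "research-agent"}},
        "spec": {
            # Assumes a RuntimeClass named "gvisor" exists in the cluster.
            "runtimeClassName": "gvisor",
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "agent",
                    "image": image,
                    # requests == limits puts the pod in the Guaranteed QoS class:
                    # the scheduler reserves this CPU and memory on a node, and the
                    # limits cap the agent at the same amount.
                    "resources": {"requests": dict(resources), "limits": dict(resources)},
                }
            ],
        },
    }

# Scaling out means emitting more manifests: one pod per agent.
manifests = [agent_pod_manifest(i, "registry.example.com/agent:latest") for i in range(10)]
print(json.dumps(manifests[0], indent=2))
```

The design choice that matters is setting requests equal to limits. A pod whose requests are lower than its limits can be packed onto a node alongside others and compete for headroom, which reintroduces the noisy-neighbor problem.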
Side-by-Side: Task Time at Scale
| Agent Count | Hermes (Local) | Gobii (Cloud) | Delta (Hermes − Gobii) |
|---|---|---|---|
| 1 | 42 sec | 38 sec | +4 sec |
| 5 | 68 sec | 39 sec | +29 sec |
| 10 | 147 sec | 40 sec | +107 sec |
At 10 agents, Hermes tasks take about 3.7× as long as their Gobii equivalents. The gap widens at every measured step, from 4 sec at 1 agent to 107 sec at 10.
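The delta and ratio in the side-by-side table follow directly from the two sets of average task times. This small sketch recomputes them. The `compare` helper is introduced here for illustration.

```python
# Average task times in seconds, from the side-by-side table above.
HERMES_S = {1: 42, 5: 68, 10: 147}
GOBII_S = {1: 38, 5: 39, 10: 40}

def compare(agents: int) -> tuple[int, float]:
    """Return (Hermes minus Gobii in seconds, Hermes / Gobii ratio) for one agent count."""
    hermes, gobii = HERMES_S[agents], GOBII_S[agents]
    return hermes - gobii, hermes / gobii

for agents in sorted(HERMES_S):
    delta, ratio = compare(agents)
    print(f"{agents:>2} agents: +{delta} sec, Hermes takes {ratio:.1f}x as long")
```

The ratio grows from about 1.1× at 1 agent to 1.7× at 5 and 3.7× at 10 (147 / 40 = 3.675).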
What About Cost?
A common counterargument: “But I already own the hardware.” True — until you factor in:
- Downtime cost: At 10 agents with an 81% success rate, roughly 2 out of every 10 tasks fail silently or crash. The manual intervention needed to catch and rerun them erases the hardware savings.
- Opportunity cost: Your M3 Pro is now pegged at 94% CPU. You can’t use it for anything else while agents run.
- Scaling cost: To run 20 agents locally, you need a second machine — another $3,000+. Gobii scales to 20 with no hardware purchase.
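The downtime point can be made concrete with a back-of-envelope retry model. This sketch assumes each attempt succeeds independently with the measured success rate and that every failure is detected and retried. Under that geometric-distribution assumption, the mean number of attempts per successful task is 1 / p. Silent failures break the "detected" assumption, so the real cost may be higher than this estimate.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean attempts per successful task, assuming independent attempts that each
    succeed with probability success_rate and that every failure is retried."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate

def expected_failures(tasks: int, success_rate: float) -> float:
    """Expected number of failed tasks in a batch of the given size."""
    return tasks * (1 - success_rate)

# Success rates at 10 agents, from the tables above.
for name, p in [("Hermes, 10 agents", 0.81), ("Gobii, 10 agents", 0.997)]:
    print(f"{name}: {expected_failures(100, p):.1f} failures per 100 tasks, "
          f"{expected_attempts(p):.3f} attempts per success")
```

At 81%, a batch of 100 tasks needs about 123.5 attempts on average, and each of those roughly 19 failures is a potential manual intervention.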
The scaling cliff isn’t just about performance — it’s about total cost of ownership when agent workloads grow beyond hobby scale.