The Scaling Cliff

What happens when you go from 1 agent to 10? The answer isn’t linear.

The Experiment: 1 → 5 → 10 Concurrent Agents

We model a realistic workload: each agent runs a research loop (search → scrape → summarize) against 10 URLs, with a 2-second think time between steps. All agents start simultaneously.

Hermes: Local Resource Contention

Running on a single machine (Apple M3 Pro, 36 GB RAM), local Hermes agents compete for shared CPU cores, memory bandwidth, and disk I/O.

Agent CountAvg CPU %RAM UsedAvg Task TimeSuccess Rate
118%2.1 GB42 sec100%
567%9.8 GB68 sec96%
1094%18.4 GB147 sec81%

At 10 agents: CPU saturation causes 3.5× task slowdown. Two agents OOM-killed. Disk I/O contention from concurrent model weights pushes the machine into swap. This is the scaling cliff — a sudden, non-linear degradation where adding more agents makes everything slower and less reliable.

No isolation. No elasticity. You hit a hard wall at ~8–10 agents on consumer hardware.

Gobii: Elastic Cloud Isolation

Each Gobii agent runs in an isolated gVisor-sandboxed pod with dedicated CPU and memory allocation. Adding agents adds pods — not contention.

Agent CountPer-Agent CPUPer-Agent RAMAvg Task TimeSuccess Rate
12 vCPU4 GB38 sec100%
52 vCPU each4 GB each39 sec100%
102 vCPU each4 GB each40 sec99.7%

At 10 agents: Task time remains flat. No shared CPU, no noisy neighbors. Each agent gets its own sandbox with guaranteed resources. The cloud platform handles scheduling, auto-scaling, and health checks automatically.

Linear scaling. Guaranteed isolation. Add agents without fear of the cliff.

Side-by-Side: Task Time at Scale

Agent CountHermes (Local)Gobii (Cloud)Delta
142 sec38 sec+4 sec
568 sec39 sec+29 sec
10147 sec40 sec+107 sec

At 10 agents, Hermes tasks take 3.7× longer than Gobii equivalents. The gap widens with every agent you add.

What About Cost?

A common counterargument: “But I already own the hardware.” True — until you factor in:

The scaling cliff isn’t just about performance — it’s about total cost of ownership when agent workloads grow beyond hobby scale.