The Experiment: 1 → 5 → 10 Concurrent Agents

We model a realistic workload: each agent runs a research loop (search → scrape → summarize) against 10 URLs, with a 2-second think time between steps. All agents start simultaneously.
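The workload described above can be sketched as a small harness. This is a minimal illustration, not the benchmark code actually used: the `search`, `scrape`, and `summarize` functions are hypothetical stubs, the example URLs are placeholders, and the think time is assumed to fall between the steps for each URL.

```python
import asyncio
import time

URLS_PER_AGENT = 10    # from the experiment description
THINK_TIME_SEC = 2.0   # pause between steps, from the experiment description

# Hypothetical stand-ins for the real steps; a real agent would call
# a search API, fetch pages, and invoke a model here.
async def search(url):
    return f"results for {url}"

async def scrape(results):
    return f"page text from {results}"

async def summarize(text):
    return f"summary of {text}"

async def run_agent(agent_id, urls, think_time=THINK_TIME_SEC):
    """Run one research loop (search -> scrape -> summarize) per URL."""
    start = time.perf_counter()
    summaries = []
    for url in urls:
        results = await search(url)
        await asyncio.sleep(think_time)   # think time between steps
        text = await scrape(results)
        await asyncio.sleep(think_time)
        summaries.append(await summarize(text))
    elapsed = time.perf_counter() - start
    return {"agent": agent_id, "summaries": summaries, "elapsed": elapsed}

async def run_experiment(agent_count, think_time=THINK_TIME_SEC):
    """Start all agents at the same moment and wait for every one to finish."""
    urls = [f"https://example.com/page/{i}" for i in range(URLS_PER_AGENT)]
    agents = (run_agent(i, urls, think_time) for i in range(agent_count))
    return await asyncio.gather(*agents)

if __name__ == "__main__":
    for n in (1, 5, 10):
        # think_time=0.0 keeps this dry run instant; use the default for realism
        results = asyncio.run(run_experiment(n, think_time=0.0))
        print(f"{n} agents finished, {len(results[0]['summaries'])} summaries each")
```

Because the stubs do no real work, this harness only shows the shape of the test: the contention measured below comes from the actual search, scrape, and model calls that replace them.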

Hermes: Local Resource Contention

Running on a single machine (Apple M3 Pro, 36 GB RAM), local Hermes agents compete for shared CPU cores, memory bandwidth, and disk I/O.

Agent Count	Avg CPU %	RAM Used	Avg Task Time	Success Rate
1	18%	2.1 GB	42 sec	100%
5	67%	9.8 GB	68 sec	96%
10	94%	18.4 GB	147 sec	81%

At 10 agents: CPU saturation causes a 3.5× task slowdown (147 sec vs. 42 sec with a single agent). Two agents were OOM-killed. Loading model weights concurrently creates disk I/O contention and pushes the machine into swap. This is the scaling cliff: a sudden, non-linear degradation where adding more agents makes everything slower and less reliable.

No isolation. No elasticity. You hit a hard wall at ~8–10 agents on consumer hardware.
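To make the non-linearity concrete, the short sketch below recomputes the slowdown from the Hermes table above. The numbers are copied from the table; the function names are illustrative only.

```python
# Average task time (seconds) per agent count, copied from the Hermes table.
HERMES_TASK_SEC = {1: 42, 5: 68, 10: 147}

def slowdown(task_sec, baseline_agents=1):
    """Task time at each agent count relative to the single-agent baseline."""
    base = task_sec[baseline_agents]
    return {n: t / base for n, t in task_sec.items()}

def marginal_sec_per_agent(task_sec):
    """Extra seconds of task time per added agent between measurements."""
    counts = sorted(task_sec)
    return [
        (task_sec[hi] - task_sec[lo]) / (hi - lo)
        for lo, hi in zip(counts, counts[1:])
    ]

factors = slowdown(HERMES_TASK_SEC)
for n, f in factors.items():
    print(f"{n:>2} agents: {f:.1f}x baseline")

print(marginal_sec_per_agent(HERMES_TASK_SEC))
```

Going from 1 to 5 agents costs 6.5 extra seconds per added agent; going from 5 to 10 costs 15.8. Under linear degradation those two figures would be equal.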

Gobii: Elastic Cloud Isolation

Each Gobii agent runs in an isolated gVisor-sandboxed pod with dedicated CPU and memory allocation. Adding agents adds pods — not contention.

Agent Count	Per-Agent CPU	Per-Agent RAM	Avg Task Time	Success Rate
1	2 vCPU	4 GB	38 sec	100%
5	2 vCPU each	4 GB each	39 sec	100%
10	2 vCPU each	4 GB each	40 sec	99.7%

At 10 agents: Task time stays nearly flat (38 sec with one agent, 40 sec with ten). No shared CPU, no noisy neighbors. Each agent gets its own sandbox with guaranteed resources. The cloud platform handles scheduling, auto-scaling, and health checks.

Linear scaling. Guaranteed isolation. Add agents without fear of the cliff.
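As an illustration of what "dedicated CPU and memory allocation" can look like, here is a sketch that builds a Kubernetes-style pod manifest as a Python dictionary. It is an assumption that Gobii pods resemble this: the image name, labels, and the `gvisor` runtime class name are hypothetical, and only the 2 vCPU / 4 GB figures come from the table above.

```python
import json

def agent_pod_manifest(agent_id, cpu="2", memory="4Gi", runtime_class="gvisor"):
    """Build a pod manifest with identical requests and limits.

    In Kubernetes, setting requests equal to limits for every container
    places the pod in the Guaranteed QoS class, so the scheduler reserves
    the full allocation instead of letting pods share it.
    """
    resources = {"cpu": cpu, "memory": memory}  # 4 GB in the table, written as 4Gi
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"agent-{agent_id}", "labels": {"app": "agent"}},
        "spec": {
            # Assumes the cluster defines a RuntimeClass with this name
            # that maps to the gVisor runtime.
            "runtimeClassName": runtime_class,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "agent",
                    "image": "example.com/agent:latest",  # hypothetical image
                    "resources": {
                        "requests": dict(resources),
                        "limits": dict(resources),
                    },
                }
            ],
        },
    }

# One manifest per agent: adding agents adds pods.
manifests = [agent_pod_manifest(i) for i in range(10)]
print(json.dumps(manifests[0], indent=2))
```

Each manifest is independent of the others, which is the property the table reflects: the per-agent allocation does not shrink as the agent count grows.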

Side-by-Side: Task Time at Scale

Agent Count	Hermes (Local)	Gobii (Cloud)	Delta (Hermes minus Gobii)
1	42 sec	38 sec	+4 sec
5	68 sec	39 sec	+29 sec
10	147 sec	40 sec	+107 sec

At 10 agents, Hermes tasks take about 3.7× as long as their Gobii equivalents (147 sec vs. 40 sec). The gap widens at every measured step, from 4 sec at 1 agent to 107 sec at 10.
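The delta column and the 3.7× figure can be reproduced from the side-by-side table with a few lines of arithmetic. The sketch below uses only the task times listed above; the names are illustrative.

```python
# (Hermes sec, Gobii sec) per agent count, copied from the side-by-side table.
TASK_SEC = {1: (42, 38), 5: (68, 39), 10: (147, 40)}

def compare(task_sec):
    """Return the absolute gap and the ratio for each agent count."""
    rows = []
    for agents, (hermes, gobii) in sorted(task_sec.items()):
        rows.append({
            "agents": agents,
            "delta_sec": hermes - gobii,   # Hermes minus Gobii
            "ratio": hermes / gobii,       # how many times as long Hermes takes
        })
    return rows

rows = compare(TASK_SEC)
for row in rows:
    print(f"{row['agents']:>2} agents: +{row['delta_sec']} sec, {row['ratio']:.1f}x")
```

The ratio moves from about 1.1× at one agent to about 3.7× at ten, so the relative gap grows along with the absolute one.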

What About Cost?

A common counterargument: “But I already own the hardware.” True — until you factor in:

Downtime cost: At 10 agents with an 81% success rate, roughly 2 out of every 10 tasks fail silently or crash. Manual intervention erases hardware savings.
Opportunity cost: Your M3 Pro is now pegged at 94% CPU. You can’t use it for anything else while agents run.
Scaling cost: To run 20 agents locally, you need a second machine — another $3,000+. Gobii scales to 20 with no hardware purchase.
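The downtime point above can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes a failed task costs the same average time as a successful one and is retried until it succeeds, with each attempt failing independently. Task times and success rates come from the tables earlier in this section.

```python
def expected_failures(total_tasks, success_rate):
    """Expected number of failed tasks out of total_tasks."""
    return total_tasks * (1 - success_rate)

def effective_sec_per_success(avg_task_sec, success_rate):
    """Average seconds spent per successful task, counting retries.

    Assumption: failed attempts take avg_task_sec and are retried until
    success, so the expected number of attempts is 1 / success_rate.
    """
    return avg_task_sec / success_rate

# 10-agent figures from the tables above.
hermes = {"task_sec": 147, "success": 0.81}
gobii = {"task_sec": 40, "success": 0.997}

for name, run in (("Hermes", hermes), ("Gobii", gobii)):
    fails = expected_failures(10, run["success"])
    eff = effective_sec_per_success(run["task_sec"], run["success"])
    print(f"{name}: {fails:.2f} failures per 10 tasks, {eff:.0f} sec per success")
```

Under these assumptions, the local setup spends about 181 sec per successful task against about 40 sec in the cloud, before counting any time a person spends noticing and restarting failed runs.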

The scaling cliff isn’t just about performance — it’s about total cost of ownership when agent workloads grow beyond hobby scale.
