Video: "Build an AI Team: Hermes Agent Swarm Mode!" by Julian Goldie on YouTube.

Swarm versus sequential: what the difference actually is

Most single-agent setups work in sequence. One agent does the keyword research, then it writes the brief, then it drafts the page, then it checks the output. That is still faster than doing it yourself, but it is linear — each step waits for the previous one to finish.

Swarm mode breaks the dependency. You define multiple agent profiles, each with its own system prompt, toolset, and responsibility. A research agent and a competitor analysis agent can run at the same time, both feeding their outputs into a shared workspace. The writer agent picks up both outputs and starts drafting while the reviewer is already scanning earlier sections. The job finishes significantly faster than sequential processing.

How you configure a swarm

Each agent in a Hermes swarm is defined by a profile — essentially a named configuration that sets what model the agent uses, what skills it can access, and what its primary objective is. You create these in the Hermes configuration, then tell the swarm launcher which profiles to start and with what initial context.

Profiles share a workspace directory. One agent writes output files; another reads them as context before it starts. The coordination is loose — there is no built-in arbitration if two agents try to write the same file — so task scoping at the profile level is important. Well-defined profiles that own distinct parts of the work produce clean results. Overlapping profiles produce confused output.

What it actually works well for

Content production at scale is the most obvious application. A swarm that simultaneously researches competitors, identifies keyword gaps, and drafts an outline cuts the front-end time on a content project substantially. The same logic applies to technical audits: one agent checks page speed, one checks structured data, one checks internal links — all running in parallel and writing their findings to a shared report file.

Anything with a clear division of labour between specialist tasks benefits from swarms. Anything where the tasks are tightly dependent — where the second step cannot start without the exact output of the first — is still better handled sequentially, or with the Kanban board approach covered in an earlier article.

The cost and the limits to consider

Running five agents at once means five simultaneous calls to your model API. If you are using a paid provider, the bill scales accordingly. This makes swarm mode most attractive when you are running local open-source models — you pay for compute, not per-token — or when the time saving on a large project genuinely justifies the API spend.

Profile design takes real thought. A vague profile that says "improve the content" will produce inconsistent results when running alongside three other agents with equally vague briefs. The discipline of writing precise, bounded agent profiles is the bulk of the actual work.

Where this connects to NordSys

Building a working swarm is not difficult once you understand the model; it is tedious and error-prone if you are doing it for the first time. Profile design, workspace structure, model selection, and cost management all need thinking through before a production run. We help businesses design and deploy multi-agent setups that fit their actual workflows. Our AI Agents service covers the full build.

See our AI Agents service →