PaperClip plus Hermes Agent and Gemma 4: what a three-layer open-source AI stack looks like

Video: "PaperClip + Hermes Agent + Gemma 4 is INSANE!" by Julian Goldie on YouTube.

What PaperClip actually is

PaperClip is an open-source multi-agent coordinator. It sits above individual agents and manages how they communicate, what tasks they hand off, and how outputs from one agent become inputs for the next. Think of it as the project manager layer: it does not do the work itself, but it decides which agent does what and in what order.

The key distinction from a simple agent chain is autonomy. In a basic chain, you define the sequence manually. PaperClip evaluates the current state of the task and routes work to the appropriate agent dynamically. If a step fails or produces unexpected output, it can reroute rather than halting the whole job.
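To make the difference from a fixed chain concrete, here is a minimal sketch of state-based routing with a reroute on failure. It is not PaperClip's actual API: `TaskState`, `next_step`, `run`, and the stub agents are all hypothetical names invented for illustration, and the "reroute" is simply a fallback agent chosen after a step has failed once.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TaskState:
    """Hypothetical record of where a job stands."""
    brief: str
    outputs: dict = field(default_factory=dict)   # step name -> output text
    failures: dict = field(default_factory=dict)  # step name -> failure count


Agent = Callable[[TaskState], str]


def next_step(state: TaskState, order: list, max_retries: int = 2) -> Optional[str]:
    """Return the first step without output, or None when the job is done."""
    for step in order:
        if step in state.outputs:
            continue
        if state.failures.get(step, 0) > max_retries:
            raise RuntimeError(f"step {step!r} failed too many times")
        return step
    return None


def run(state: TaskState, agents: dict, fallbacks: dict, order: list) -> TaskState:
    """Route each pending step to an agent, switching to a fallback after a failure."""
    while (step := next_step(state, order)) is not None:
        failed_before = state.failures.get(step, 0) > 0
        agent = fallbacks.get(step, agents[step]) if failed_before else agents[step]
        try:
            result = agent(state)
            if not result.strip():
                raise ValueError("empty output")
            state.outputs[step] = result
        except Exception:
            # Record the failure and let the loop pick a route again.
            state.failures[step] = state.failures.get(step, 0) + 1
    return state


# Stub agents standing in for real ones (assumptions, not real integrations).
def flaky_research(state: TaskState) -> str:
    raise TimeoutError("search tool unavailable")


def backup_research(state: TaskState) -> str:
    return f"notes on {state.brief}"


def writer(state: TaskState) -> str:
    return "draft based on " + state.outputs["research"]


final = run(
    TaskState("local AI stacks"),
    agents={"research": flaky_research, "write": writer},
    fallbacks={"research": backup_research},
    order=["research", "write"],
)
print(final.outputs["write"])
```

In a hand-wired chain, the failed research step would stop the job. Here the loop re-evaluates the state on every pass, so the failure only changes which agent is tried next.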

Where Hermes and Gemma 4 fit in

In Julian's setup, Hermes Agent handles the individual task execution — the actual steps of research, writing, file management, and tool calls. Gemma 4, Google's open-weight model, serves as the intelligence layer running inside Hermes. PaperClip sits above both, managing the higher-level workflow: which brief to work on next, whether the research output is complete enough to pass to the writer, when to trigger a review pass.
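The layering described above can be pictured with a small sketch. None of these classes come from PaperClip or Hermes: `Model`, `TaskAgent`, `Coordinator`, and `stub_model` are hypothetical stand-ins, and the stub model only echoes its prompt so the example runs without any model installed.

```python
from typing import Callable, Dict

# Bottom layer: a model is anything that turns a prompt into text.
Model = Callable[[str], str]


def stub_model(prompt: str) -> str:
    """Stand-in for a locally served model (assumption: echoes the prompt)."""
    return f"[model reply to: {prompt}]"


class TaskAgent:
    """Middle layer: carries out one kind of task using the model."""

    def __init__(self, role: str, model: Model) -> None:
        self.role = role
        self.model = model

    def execute(self, task: str) -> str:
        return self.model(f"As the {self.role}, {task}")


class Coordinator:
    """Top layer: decides which agent receives which task."""

    def __init__(self, agents: Dict[str, TaskAgent]) -> None:
        self.agents = agents

    def dispatch(self, role: str, task: str) -> str:
        return self.agents[role].execute(task)


coordinator = Coordinator({
    "researcher": TaskAgent("researcher", stub_model),
    "writer": TaskAgent("writer", stub_model),
})
print(coordinator.dispatch("writer", "draft an intro"))
```

The point of the separation is that each layer can be replaced on its own: a different model behind the agents, or different agents behind the coordinator, without touching the other two.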

All three components are free and run locally. There is no OpenAI, Anthropic, or Google API key in the stack. Your data stays on your machine, and the marginal cost per run is electricity.

What the stack can handle in practice

The demo showed a content production workflow: a brief comes in, PaperClip assigns a research agent to gather sources, passes the research to a writing agent, then routes the draft to a review agent before the final file is written. The whole thing ran without manual intervention between steps. That is a genuine shift from most agent setups in practice, where you are still nudging things along at each handoff.
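The shape of that workflow, stripped to its handoffs, might look like the sketch below. The three agent functions are hypothetical stubs that return canned text rather than calling a model, and the output filename `final.md` is an assumption; the video does not specify either.

```python
import tempfile
from pathlib import Path


def research_agent(brief: str) -> list:
    """Stub: a real agent would gather sources with tools."""
    return [f"source about {brief} #{i}" for i in (1, 2)]


def writing_agent(brief: str, sources: list) -> str:
    """Stub: turns the research into a draft."""
    lines = [f"# {brief}", ""] + [f"- {s}" for s in sources]
    return "\n".join(lines)


def review_agent(draft: str) -> str:
    """Stub: a real reviewer would edit the draft; this one only signs it off."""
    return draft.strip() + "\n\nReviewed: yes\n"


def produce(brief: str, out_dir: Path) -> Path:
    """Run brief -> research -> draft -> review, then write the final file."""
    sources = research_agent(brief)
    draft = writing_agent(brief, sources)
    final_text = review_agent(draft)
    path = out_dir / "final.md"
    path.write_text(final_text, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    result_path = produce("local AI stacks", Path(tmp))
    print(result_path.read_text(encoding="utf-8"))
```

Each function boundary here is a handoff. In the demo, those boundaries are where the coordinator takes over from a person copying output from one tool into the next.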

The same coordination model applies to technical workflows — code review pipelines, data processing sequences, report generation chains. Anything with a predictable sequence of specialist steps is a candidate for this kind of setup.

The real overhead: configuration and reliability

Three-layer stacks are not plug-and-play. PaperClip, Hermes, and Gemma 4 each have their own configuration requirements, and getting the handoffs between them reliable takes real setup time. If any layer misbehaves (a model output that does not match the expected format, a tool call that returns an error), the coordination layer needs to handle it gracefully rather than silently passing bad output downstream.
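One common way to avoid silent bad output is to validate every handoff before the next agent sees it. The sketch below assumes agents exchange JSON objects with a `summary` string and a `sources` list; that schema, `HandoffError`, and `parse_handoff` are illustrative assumptions rather than anything the three tools prescribe.

```python
import json


class HandoffError(Exception):
    """Raised when an agent's output cannot be passed on safely."""


# Hypothetical schema for a research handoff: field name -> expected type.
REQUIRED_FIELDS = {"summary": str, "sources": list}


def parse_handoff(raw: str, required: dict = REQUIRED_FIELDS) -> dict:
    """Parse model output as JSON and check required fields, or raise HandoffError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise HandoffError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise HandoffError("expected a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise HandoffError(f"missing field {key!r}")
        if not isinstance(data[key], expected_type):
            raise HandoffError(f"field {key!r} should be {expected_type.__name__}")
    return data


good = parse_handoff('{"summary": "Three tools, one stack", "sources": ["video"]}')
print(good["summary"])

try:
    parse_handoff("Sure! Here is the research you asked for...")
except HandoffError as err:
    print(f"rejected: {err}")
```

Raising a specific error gives the coordination layer something to act on, such as retrying the step or rerouting it, instead of handing chatty free text to the writer.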

Gemma 4, like other models in its class, is capable but not at the frontier on complex reasoning. For straightforward workflows it performs well. For tasks requiring nuanced judgement, such as tone decisions in writing or architecture decisions in code, you may need either to accept more variability or to swap in a stronger model where it matters most.
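Swapping in a stronger model "where it matters most" can be as simple as a per-step override table. The sketch below is an assumption about how such a choice could be expressed; the identifiers `gemma-4-local` and `stronger-model` are placeholders, not real model names or configuration keys from any of the tools.

```python
# Placeholder identifiers; substitute whatever your model server actually exposes.
DEFAULT_MODEL = "gemma-4-local"

# Only the judgement-heavy steps get the stronger (and likely slower) model.
MODEL_OVERRIDES = {
    "review": "stronger-model",
}


def model_for(step: str) -> str:
    """Return the model identifier to use for a workflow step."""
    return MODEL_OVERRIDES.get(step, DEFAULT_MODEL)


for step in ("research", "write", "review"):
    print(f"{step}: {model_for(step)}")
```

Keeping the default cheap and listing exceptions explicitly makes it easy to see which steps you decided needed more judgement.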

Where this connects to NordSys

Building a multi-agent stack that actually works for a real business workflow is a different problem from getting a demo to run. The configuration, error handling, model selection, and integration with existing tools all need thinking through. We help businesses design and deploy multi-agent setups that fit their actual operations — not just the tutorial version. Our AI Agents service covers the full build.

See our AI Agents service →