Hermes Agent + Comfy UI: what changes when your AI agent can generate images and video

Video: "Hermes + Comfy UI is Insane (FREE)!" by Julian Goldie on YouTube.

What Comfy UI is

Comfy UI is an open-source tool for building AI image, video, and audio generation workflows. Rather than a chat interface, it gives you a node-based graph editor: you connect individual processing steps (text conditioning, image sampling, upscaling, video encoding, and so on) into a visual pipeline that runs from left to right.

It became widely used in the Stable Diffusion community because it gives you precise control over every stage of the generation process. Want to run a specific checkpoint, apply a LoRA at a particular strength, pass the result through a custom upscaler, and then export a JPEG at an exact 1:1 aspect ratio? That is the kind of workflow Comfy UI is designed for. The tradeoff is a steep learning curve: you need to understand at least some of how diffusion models work to build useful pipelines.
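To make the node graph less abstract, here is a rough sketch of such a pipeline written in the JSON-style "API format" that Comfy UI can export, shown as a Python dict. Each key is a node id, `class_type` names the node, and an input such as `["1", 0]` means "output 0 of node 1". The node class names are stock Comfy UI nodes; the file names, prompt text, and sampler settings are placeholders, and `dangling_links` is a hypothetical helper written only for this illustration.

```python
# Sketch of a workflow graph: checkpoint -> LoRA -> prompts -> sampler -> save.
# File names and settings are placeholders, not recommendations.
WORKFLOW = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "example_checkpoint.safetensors"}},
    "2": {"class_type": "LoraLoader",
          "inputs": {"model": ["1", 0], "clip": ["1", 1],
                     "lora_name": "example_style.safetensors",
                     "strength_model": 0.8, "strength_clip": 0.8}},
    "3": {"class_type": "CLIPTextEncode",   # positive prompt
          "inputs": {"clip": ["2", 1], "text": "product photo, white background"}},
    "4": {"class_type": "CLIPTextEncode",   # negative prompt
          "inputs": {"clip": ["2", 1], "text": "blurry, watermark"}},
    "5": {"class_type": "EmptyLatentImage",  # square canvas, i.e. 1:1
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["2", 0], "positive": ["3", 0], "negative": ["4", 0],
                     "latent_image": ["5", 0], "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "7": {"class_type": "VAEDecode",
          "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "product"}},
}


def is_link(value):
    """A link is a two-item list: [source node id, output index]."""
    return (isinstance(value, list) and len(value) == 2
            and isinstance(value[0], str) and isinstance(value[1], int))


def dangling_links(workflow):
    """Return (node_id, input_name) pairs whose link points at a missing node."""
    missing = []
    for node_id, node in workflow.items():
        for name, value in node["inputs"].items():
            if is_link(value) and value[0] not in workflow:
                missing.append((node_id, name))
    return missing


# An empty list means every connection in the graph resolves to a real node.
print(dangling_links(WORKFLOW))
```

Reading the dict from node 1 to node 8 mirrors reading the visual graph from left to right, which is why understanding the stages matters even when you never open the editor.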

The tool runs locally and is free. You host it on your own machine, which means your generated assets do not pass through any external service. For a business with any sensitivity around imagery — product photography, branded visuals, anything that should not leak — that matters.

What changes when Hermes connects to it

Comfy UI exposes an HTTP API that other tools can call; it is served by the same local server that backs the graph editor. Hermes Agent can now use this API as a tool in its workflow, which means you can describe what you want in plain language and Hermes will construct the Comfy UI request, run the pipeline, and return the output.

In practice this looks something like: you ask Hermes to "generate three product images with a plain white background and consistent lighting" and it handles the Comfy UI side — selecting the workflow, injecting your text prompt, running the generation, and saving the results to a folder. You do not need to open Comfy UI or understand the node graph to get a result.
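The video does not show how Hermes builds its requests, so the following is only a sketch of what "injecting your text prompt" can reduce to at the API level. It assumes a workflow exported in API format, Comfy UI's default local address of `http://127.0.0.1:8188`, and its `/prompt` endpoint. `TEMPLATE`, `build_prompt_request`, and the node id `"3"` are hypothetical names chosen for the example.

```python
import copy
import json
import urllib.request

# Stand-in for an exported workflow, trimmed to the one node this example edits.
TEMPLATE = {
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["2", 1], "text": "PLACEHOLDER"}},
}


def build_prompt_request(workflow, prompt_text, text_node_id,
                         base_url="http://127.0.0.1:8188"):
    """Copy the workflow, set the prompt text, and wrap it in a POST to /prompt."""
    graph = copy.deepcopy(workflow)  # leave the saved template untouched
    graph[text_node_id]["inputs"]["text"] = prompt_text
    body = json.dumps({"prompt": graph}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prompt_request(
    TEMPLATE,
    "three product shots, plain white background, consistent lighting",
    "3",
)
print(request.get_method(), request.full_url)
# Sending it is one more call and needs a running Comfy UI server:
#   urllib.request.urlopen(request)
```

Copying the template before editing it is a deliberate choice: the same saved workflow can then be reused for every request without one prompt leaking into the next.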

This is the significant shift. Comfy UI on its own requires you to learn its interface and maintain your own workflows. Hermes acting as an intermediary means a less technical user can access the same generation capabilities through a natural language request, without touching the graph editor.

What it is genuinely useful for

The most immediate use case is routine image generation for content: blog post headers, social media thumbnails, product variations, background removal, style-consistent batch processing. These are tasks that currently go to a designer or a paid image generation API — this stack does them locally, at no per-image cost, once the setup is done.

For a small business producing content regularly, the economics are reasonable. The setup cost (time and some hardware) is real, but once it is working, generating 50 images costs nothing beyond electricity. A hosted API charging, say, $0.04 per image would bill about $2 for the same batch, and that bill keeps growing with volume.
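A quick way to see where local generation starts to pay off is to work out the break-even volume. The sketch below uses the $0.04 per-image figure from above; the $500 setup cost and the electricity cost per image are made-up inputs for illustration, so substitute your own numbers.

```python
import math


def api_cost(images, price_per_image=0.04):
    """Total spend on a hosted API that charges per image."""
    return images * price_per_image


def break_even_images(setup_cost, price_per_image=0.04, power_cost_per_image=0.0):
    """Number of images after which the local setup has paid for itself."""
    saving = price_per_image - power_cost_per_image
    if saving <= 0:
        raise ValueError("local generation never catches up at these prices")
    return math.ceil(setup_cost / saving)


print(f"50 images via a hosted API: ${api_cost(50):.2f}")
# Hypothetical $500 spent on setup, electricity ignored:
print("break-even volume:", break_even_images(500), "images")
```

The point of the calculation is that the answer depends on volume: a handful of images a month will take a long time to recover the setup cost, while a team producing hundreds of assets a week gets there much sooner.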

Video generation is included via Comfy UI's video nodes, though the hardware requirements are higher and generation times longer. Short clips — product animations, looping social assets — are feasible if you have a capable graphics card. Full-length video is not in scope.

What is still difficult

This is a technically involved setup. You need Hermes running correctly, Comfy UI installed with its server running and reachable, the two configured to communicate, and a Comfy UI workflow that does what you want. Each of those steps has its own failure modes. Julian Goldie showed it working smoothly in the video, which is how demonstrations tend to go: the troubleshooting phase is not the interesting part to film.
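When something in that chain fails, it helps to test the links one at a time. As a rough sketch, the check below covers only the Comfy UI side: it asks whether the server answers at all. It assumes the default local address and the server's `/system_stats` route; `check_comfy` and `http_get_json` are hypothetical helper names, and the Hermes side is not covered because the video does not describe its configuration.

```python
import json
import urllib.request


def http_get_json(url, timeout=5):
    """Default fetcher: GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_comfy(base_url="http://127.0.0.1:8188", fetch=http_get_json):
    """Return (ok, detail) describing whether the Comfy UI server answered."""
    url = base_url.rstrip("/") + "/system_stats"
    try:
        stats = fetch(url)
    except (OSError, ValueError) as error:  # connection problems or bad JSON
        return False, f"no usable response from {url}: {error}"
    if not isinstance(stats, dict):
        return False, f"unexpected response from {url}"
    return True, f"server answered at {url}"


# Demo with a stand-in fetcher so the example runs without a server.
# Call check_comfy() with no arguments to probe a real local instance.
print(check_comfy(fetch=lambda url: {"system": {}}))
```

Passing the fetcher in as a parameter keeps the check easy to test and makes it simple to point at a remote machine or a different port later.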

Model quality matters a great deal. Comfy UI generates nothing on its own: it needs AI models (checkpoints, LoRAs, VAEs) that you download and install separately. The results depend heavily on which models you run and how you have configured them. Good results for product photography require different models and settings than good results for abstract illustration.
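Before blaming a workflow for poor output, it is worth confirming which models are actually installed. The sketch below assumes the standard Comfy UI folder layout, with `models/checkpoints`, `models/loras`, and `models/vae` under the install directory, and treats a few common weight-file extensions as models. `list_models` is a hypothetical helper, and the path you pass in is your own install location.

```python
from pathlib import Path

# Category label -> folder name under <comfy_root>/models (assumed default layout).
MODEL_DIRS = {"checkpoints": "checkpoints", "loras": "loras", "vaes": "vae"}
MODEL_SUFFIXES = {".safetensors", ".ckpt", ".pt"}


def list_models(comfy_root):
    """Map each model category to the sorted model file names found for it."""
    found = {}
    for category, folder in MODEL_DIRS.items():
        directory = Path(comfy_root) / "models" / folder
        files = directory.iterdir() if directory.is_dir() else []
        found[category] = sorted(
            f.name for f in files
            if f.is_file() and f.suffix.lower() in MODEL_SUFFIXES
        )
    return found


# Run against the current directory; pass your Comfy UI folder instead.
for category, names in list_models(".").items():
    print(category, names if names else "(none found)")
```

An empty category is a useful early warning: a workflow that names a checkpoint or LoRA you have not downloaded will fail no matter how the request is phrased.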

To be fair: this is still meaningfully easier than building a Comfy UI pipeline from scratch. Having Hermes handle the API calls removes one layer of technical overhead, and for someone already running Hermes, adding Comfy UI is an extension rather than a separate project.

The broader pattern

Each Hermes update over the past few months has added another capability layer: memory, multi-agent communication, dashboards, observability, and now creative media generation. What is emerging is less a single agent and more a modular system where you add the tools relevant to your workflow.

That is both the appeal and the complexity. A business that just wants to run a few AI tasks does not need all of these layers. But for a team already invested in the Hermes stack, adding Comfy UI is a straightforward extension that meaningfully expands what the agent can produce without switching to a different system or paying for another API.

Where this connects to NordSys

Setting up a Hermes Agent workflow that includes Comfy UI for image generation — or any other creative media tool — is part of our AI agents service. We handle the installation, configuration, and workflow design, then show you how to run it for your specific use cases, whether that is product imagery, content thumbnails, or batch-processing for a content team. If you want a working local AI creative stack without spending weeks on setup, that is what we are here for.

See our AI agents service →