Hermes Agent: The Self-Evolving AI Agent That Learns From Every Conversation

In February 2026, Nous Research open-sourced Hermes Agent. By April it had 85,000 GitHub stars. By June, 187,000. OpenRouter recorded 291 billion tokens consumed through Hermes Agent in a single day in mid-May. Over a trillion tokens that week.
Those numbers reflect a real shift: developers are moving their AI workflows into a persistent agent layer that runs continuously. Not a CLI tool you open and close, but a background process that learns from every interaction.
Here is what Hermes Agent does, why its learning loop matters, how it compares to OpenClaw and Claude Code, and what to know before installing it.
What Hermes Agent actually is
Hermes Agent is an agent operating system: a persistent runtime that sits between you and whatever LLM provider you choose and maintains context, memory, and skills across sessions. It is not a model, and it is not a coding tool.
Most developers first encounter AI tools at Layer 2: coding agents like Claude Code and Codex that wrap a model with codebase awareness and editing capabilities. Hermes Agent lives at Layer 3, the orchestration layer. It coordinates tools, remembers preferences, schedules tasks, and dispatches specialized sub-agents (including Layer 2 coding agents) as worker processes.
Hermes Agent comes from the same research lab that builds the Hermes family of fine-tuned models, so it is designed with model behavior in mind. Its value, though, is architectural rather than a matter of inference quality. You point it at any model (Nous Portal, OpenAI, Anthropic, OpenRouter, NVIDIA NIM, DeepSeek, MiniMax, or a custom endpoint) and it handles context, memory, and skills from there.
One command switches providers: hermes model. No code changes, no config overhaul, no lock-in.
The learning loop: why Hermes is different
Most AI agents treat every conversation as a blank slate. You describe your project, your preferences, your stack. The agent responds. Next session, you do it again.
Hermes Agent does not do that.
Skills that write themselves
When Hermes sees a complex multi-step task recur (configuring a deployment pipeline, setting up a monitoring stack, debugging a distributed system issue), it can detect the pattern and generate a reusable skill file. That skill persists as a .md file in your skills directory. The next time you ask for something similar, the skill loads automatically, so you do not re-explain or start from scratch.
Skills also self-improve during use. If a skill turns out to be incomplete (missing an edge case, handling an error poorly), the system refines it the next time it runs. This is not marketing speak about “learning.” It is a filesystem full of markdown procedures that get better the more you use them.
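To make "a filesystem full of markdown procedures" concrete, here is a minimal sketch of saving a skill as a .md file, reading it back, and refining it after use. The file layout (a title plus numbered steps) and the function names `save_skill`, `load_steps`, and `refine_skill` are assumptions for illustration, not Hermes Agent's actual skill format or API.

```python
from pathlib import Path
import tempfile


def save_skill(skills_dir: Path, name: str, steps: list) -> Path:
    """Write a skill as a markdown file: a title line plus numbered steps."""
    skills_dir.mkdir(parents=True, exist_ok=True)
    path = skills_dir / f"{name}.md"
    lines = [f"# {name}", ""] + [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def load_steps(path: Path) -> list:
    """Read the numbered steps back out of a skill file."""
    steps = []
    for line in path.read_text(encoding="utf-8").splitlines():
        head, sep, rest = line.partition(". ")
        if sep and head.isdigit():
            steps.append(rest)
    return steps


def refine_skill(path: Path, extra_step: str) -> None:
    """Append a step learned during use, such as a missed edge case."""
    steps = load_steps(path)
    if extra_step not in steps:
        save_skill(path.parent, path.stem, steps + [extra_step])


# Hypothetical skill: the name and steps are invented for this example.
skills_dir = Path(tempfile.mkdtemp()) / "skills"
skill = save_skill(skills_dir, "deploy-pipeline", ["Build the image", "Push to the registry"])
refine_skill(skill, "Roll back if the health check fails")
print(load_steps(skill))
```

Because the skill is plain markdown on disk, you can read, diff, and hand-edit whatever the agent writes.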
Three-layer memory
The memory architecture has three tiers:
| Tier | Storage | What It Holds | Retrieval |
|---|---|---|---|
| Core | MEMORY.md / USER.md (~800 + 500 tokens) | Critical facts, persistent preferences | Always in context |
| Searchable | SQLite + FTS5 full-text index | Past conversations, decisions, task outcomes | On-demand FTS5 keyword search + LLM summarization |
| External | Honcho, Mem0 (pluggable) | Dialectic user modeling, long-term patterns | Provider-specific retrieval |
The core tier is always loaded: small enough not to bloat context, large enough for the essentials. Everything else is reachable but not crammed into every prompt. The system decides when to retrieve, using a combination of FTS5 keyword matching and LLM summarization.
Periodic nudges surface relevant memories at decision points. You do not have to remember to tell the agent about that architectural decision from three weeks ago. It nudges itself.
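The searchable tier is easy to picture with Python's built-in sqlite3 module. This is a minimal sketch of FTS5 keyword retrieval over past notes. The table name `memory`, its columns, and the sample rows are invented for illustration. Hermes Agent's real schema is not described here, and the LLM summarization step is left out.

```python
import sqlite3

# In-memory database for the example; a real agent would persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memory USING fts5(session_id, body)")

# Invented sample memories, purely illustrative.
conn.executemany(
    "INSERT INTO memory (session_id, body) VALUES (?, ?)",
    [
        ("session-1", "Decided to use Postgres instead of MySQL for the billing service"),
        ("session-2", "User prefers pytest over unittest"),
        ("session-3", "Deployed the monitoring stack with Grafana dashboards"),
    ],
)


def search_memory(query: str, limit: int = 3) -> list:
    """Return (session_id, body) rows matching the query, best match first."""
    rows = conn.execute(
        "SELECT session_id, body FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


hits = search_memory("postgres")
for session_id, body in hits:
    print(session_id, body)
```

FTS5's default tokenizer is case-insensitive, so a lowercase query still finds the "Postgres" note. This is pure keyword matching: a query for "database" would return nothing here, which is why a summarization or modeling layer on top is useful.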
The agent that remembers you
Hermes can use Honcho, one of the pluggable external memory providers, for dialectic user modeling: building a representation of how you work, what you prefer, and how you make decisions. You do not fill out a profile. The system infers it from your behavior across sessions.
An agent installed in March knows more about your preferences in June than it did on day one, not because someone programmed it to, but because the architecture was designed for accumulation.
Architecture: provider freedom, anywhere runtime, multi-platform
Run it anywhere
Hermes Agent supports six terminal backends:
| Backend | Best For |
|---|---|
| Local | Development, quick testing |
| Docker | Sandboxed isolation, reproducible environments |
| SSH | Remote VPS, dedicated agent machine |
| Singularity | HPC, academic clusters |
| Modal | Serverless GPU (hibernates when idle, costs nearly nothing) |
| Daytona | Serverless dev environments (same hibernation model) |
Modal and Daytona are the interesting ones. They let your agent hibernate when idle and wake on demand. Paired with a $5/month VPS as the host, the agent can feel always-on without paying for compute around the clock.
Any platform you use
The messaging gateway supports 23 platforms: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, DingTalk, Feishu, WeCom, WeChat, QQ Bot, Home Assistant, Microsoft Teams, Google Chat, ntfy, and more.
You start a conversation on Telegram during your commute, continue it on the CLI at your desk, and check results on Discord. Same agent across all of them. Same memory, same skills, same personality.
Provider pluggability
Hermes Agent is aggressively provider-agnostic. Beyond Nous Portal’s 300+ models with bundled search, image generation, TTS, and browser tooling, you can point it at OpenRouter (200+ models), OpenAI, Anthropic, Google Gemini, NVIDIA NIM (Nemotron family), DeepSeek, Kimi/Moonshot, MiniMax, Hugging Face inference endpoints, or any OpenAI-compatible custom endpoint.
Per-skill model routing means a skill can specify its preferred model. A creative writing skill might use Claude Opus. A code generation skill might use GPT-5.3-Codex. A quick classification task might use a cheap Haiku variant. You configure this once and the agent routes automatically.
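Per-skill routing boils down to a lookup with a fallback. The sketch below shows that idea only. The `Skill` class, the `route` function, and the model identifiers are hypothetical placeholders, not Hermes Agent's configuration format or real model names.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MODEL = "general-default"  # hypothetical model id


@dataclass(frozen=True)
class Skill:
    name: str
    preferred_model: Optional[str] = None  # None means "no preference"


def route(skill: Skill, available: set) -> str:
    """Use the skill's preferred model if it is available, else the default."""
    if skill.preferred_model in available:
        return skill.preferred_model
    return DEFAULT_MODEL


# Placeholder skills and model ids, invented for this example.
skills = [
    Skill("creative-writing", "creative-large"),
    Skill("codegen", "code-specialist"),
    Skill("triage"),
]
available = {"creative-large", "cheap-small", DEFAULT_MODEL}

choices = {s.name: route(s, available) for s in skills}
print(choices)
```

Falling back to a default means a skill keeps working when its preferred model is unavailable from the current provider.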
v0.16: desktop app, dashboard, and voice
The June 5 release (v0.16.0, “The Surface Release”) added three major pieces.
Hermes Desktop is a native Electron app for macOS, Windows, and Linux. It includes streaming chat with live tool-call visibility, drag-and-drop file upload, clipboard image paste, a side-by-side content preview rail, a built-in file browser, voice mode with ElevenLabs TTS, a Cmd+K command palette, and multi-profile support with concurrent sessions. Each profile can target a different remote host. Your laptop runs the thin GUI while heavy compute happens elsewhere.
The Web Dashboard grew from a session viewer into a full admin panel. You can configure messaging channels, manage API credentials, toggle tools and MCP servers, create webhooks, set up cron schedules, and view 30-day usage analytics with cost tracking. All from a browser. Pluggable authentication supports OIDC and username/password.
Voice Control (“Hermes Jarvis”) lets you talk to the agent and hear responses. It works across desktop, CLI, Telegram, and Discord.
The release also patched CVE-2026-48710, hardened SSRF protection, and stripped credentials from subprocess environments. Eight P0 security fixes in total.
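Stripping credentials from subprocess environments is a general hardening technique worth seeing in miniature. The sketch below filters secret-looking variables out of the environment before spawning a child process. The marker list and the `scrubbed_env` helper are assumptions for illustration. This is not the actual Hermes Agent patch.

```python
import os
import subprocess
import sys

# Substrings that mark a variable as sensitive (an assumed, non-exhaustive list).
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def scrubbed_env(env: dict) -> dict:
    """Return a copy of env without variables whose names look like secrets."""
    return {
        name: value
        for name, value in env.items()
        if not any(marker in name.upper() for marker in SENSITIVE_MARKERS)
    }


# Fake credential for the demonstration only.
os.environ["DEMO_API_KEY"] = "not-a-real-key"

# The child process reports whether it can see the variable.
child = subprocess.run(
    [sys.executable, "-c", "import os; print('DEMO_API_KEY' in os.environ)"],
    env=scrubbed_env(dict(os.environ)),
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```

A name-based denylist like this is blunt. An allowlist of variables the tool actually needs is stricter, at the cost of more configuration.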
Not competitors: how the stack actually fits together
The most common question about Hermes Agent is “should I use this instead of Claude Code?” The question itself gets the categories wrong.
These tools live at different layers:
| Layer | Tool | What It Does |
|---|---|---|
| Model | Claude Opus, GPT-5.3, Llama | Raw inference (generates tokens) |
| Coding Agent | Claude Code, Codex CLI | Codebase awareness, diff editing, PR automation |
| Agent OS | Hermes Agent, OpenClaw | Persistent context, cross-session memory, multi-channel, scheduling |
A coding agent edits files in your repo. An agent OS remembers why you edited them and can schedule follow-up work, notify you on Telegram when CI breaks, and dispatch the coding agent as a subprocess to fix it.
Within Layer 3, Hermes Agent and OpenClaw do compete directly. Here is how they split:
| Dimension | Hermes Agent | OpenClaw |
|---|---|---|
| Self-learning | Generates skills from repeated patterns | Static skills, manual creation |
| Memory model | Lean, search-first, context-transparent | Rich layers, prone to context bloat |
| Multi-agent | Parent + isolated sub-agents | Persistent agent teams with cross-talk |
| Deployment | VPS/serverless friendly | Local-machine oriented |
| Multi-channel | 23 platforms | 22 platforms |
| Marketplace | Growing (agentskills.io) | Mature (ClawHub, 5,700+ skills) |
| Migration | Imports OpenClaw config | Closed ecosystem |
| Model flexibility | Provider-agnostic, per-skill routing | Stable, harder to swap |
OpenClaw is a better control plane for managing fleets of agents across channels. Hermes Agent is a better self-improving runtime for personal automation that compounds over time. Neither is universally better. They optimize for different things.
For most individual developers, the deciding factor is the learning loop. If your workflows are repetitive and would benefit from accumulated pattern recognition, Hermes has an architectural advantage that OpenClaw does not currently match. If you need persistent agent teams with inter-agent communication and a mature skill marketplace, OpenClaw is further along.
Getting started in two minutes
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
source ~/.bashrc
hermes setup --portal # OAuth login, auto-configures model + tools
hermes # start chatting
The --portal flag logs you into Nous Portal via OAuth and configures the model, web search, image generation, TTS, and cloud browser under one subscription. No collecting five separate API keys.
For custom providers:
hermes model # interactive model picker
hermes tools # enable/disable specific tools
hermes gateway setup # configure Telegram, Discord, etc.
hermes gateway start # launch the messaging gateway
Migrating from OpenClaw:
hermes claw migrate --dry-run # preview first
hermes claw migrate # full migration
This imports your SOUL.md, MEMORY.md, USER.md, skills, API keys, messaging config, command allowlist, and TTS assets.
What nobody tells you
The learning loop needs repetition to kick in. A skill does not generate after one conversation. Hermes needs to see a pattern repeated (similar task, similar context, similar outcome) before it crystallizes into a reusable procedure. If you use it for wildly different tasks every day, the self-improving aspect stays dormant. If you have a stable set of recurring workflows, it accelerates fast.
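One way to picture "needs repetition to kick in" is a counter with a threshold. In this sketch a pattern is keyed on task kind and context, and only successful runs count. The `PatternTracker` class and the threshold of three are invented for illustration. How Hermes Agent actually decides when to generate a skill is not specified here.

```python
from collections import Counter


class PatternTracker:
    """Count successful runs of similar tasks and flag when a skill is warranted."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # assumed value, chosen for the example
        self.seen = Counter()

    def record(self, task_kind: str, context: str, outcome: str) -> bool:
        """Return True exactly once: when a pattern first reaches the threshold."""
        if outcome != "success":
            return False
        key = (task_kind, context)
        self.seen[key] += 1
        return self.seen[key] == self.threshold


tracker = PatternTracker()
signals = [tracker.record("deploy", "staging", "success") for _ in range(4)]
print(signals)
```

Different tasks never share a key, so a day of wildly varied work produces no signal, which matches the dormant behavior described above.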
Context management is still the hard problem. Even with three-tier memory and FTS5 retrieval, long-running agents accumulate context. The /compress command exists for a reason. Skill files and memory nudges help keep the active context lean, but they are mitigations, not solutions. Everything in AI agents eventually comes back to context window management.
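To show why compaction is a mitigation rather than a solution, here is a minimal sketch of a budget-based approach: keep the most recent messages that fit, and replace everything older with a placeholder. The four-characters-per-token estimate, the `compact` function, and the placeholder summary are assumptions. This is not how /compress is implemented, and a real system would generate the summary with a model.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token."""
    return max(1, len(text) // 4)


def compact(messages: list, budget: int) -> list:
    """Keep the newest messages within budget; summarize the rest as one line."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    dropped = len(messages) - len(kept)
    if dropped:
        # Placeholder only; its own token cost is not counted against the budget.
        kept.insert(0, f"[summary of {dropped} earlier messages]")
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
print(compact(history, budget=20))
```

Whatever lands in the summary line is all the agent retains of the dropped messages, so detail is lost every time compaction runs.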
The community is moving faster than the documentation. With 11,000+ commits, 321 contributors in v0.15 alone, and weekly releases, the official docs lag the actual feature surface. The Discord is effectively required reading if you want to know what the tool can actually do right now.
Free models are real but limited. NVIDIA’s Nemotron family is available through NIM at no cost, and Nous Portal includes free-tier access. These work well for memory consolidation, skill generation, and lightweight tasks. For complex reasoning or long agent loops, you will want a paid model. The architecture makes switching trivial: use free models for background work, paid models for critical paths.
Windows support is native now. The v0.16 PowerShell installer bundles a portable Git Bash (MinGit, ~45MB) with no admin rights required. CLI, gateway, TUI, and tools all run natively on Windows. The only feature that still needs WSL2 is the browser-based dashboard chat pane, which uses a POSIX PTY.
The bottom line
Hermes Agent has the numbers for a reason. The combination of a self-improving skill system, three-tier memory, provider agnosticism, and multi-platform reach is one no other agent framework currently ships.
It does not replace Claude Code or Codex. It sits above them: a persistent intelligence that remembers why you built something, schedules the work that follows, and gets better at helping you the longer you use it.
Install it, point it at a model, give it a recurring task, and check back in two weeks. An agent that resets every session feels different from one that accumulates over time. The gap is not subtle.