Hermes Agent: The Self-Evolving AI Agent That Learns From Every Conversation

In February 2026, Nous Research open-sourced Hermes Agent. By April it had 85,000 GitHub stars. By June, 187,000. OpenRouter recorded 291 billion tokens consumed through Hermes Agent in a single day in mid-May. Over a trillion tokens that week.

Those numbers reflect a real shift: developers are moving their AI workflows into a persistent agent layer that runs continuously. Not a CLI tool you open and close, but a background process that learns from every interaction.

Here is what Hermes Agent does, why its learning loop matters, how it compares to OpenClaw and Claude Code, and what to know before installing it.

What Hermes Agent actually is

Hermes Agent is an agent operating system: a persistent runtime that sits between you and whatever LLM provider you choose and maintains context, memory, and skills across sessions. It is not a model, and it is not a coding tool.

Most developers first encounter AI tools at Layer 2: coding agents like Claude Code and Codex that wrap a model with codebase awareness and editing capabilities. Hermes Agent lives at Layer 3, the orchestration layer. It coordinates tools, remembers preferences, schedules tasks, and dispatches specialized sub-agents (including Layer 2 coding agents) as worker processes.

Built by the same research lab behind the Hermes family of fine-tuned models, it inherits a deep understanding of model behavior. But the value is architectural, not about inference quality. You point it at any model (Nous Portal, OpenAI, Anthropic, OpenRouter, NVIDIA NIM, DeepSeek, MiniMax, or a custom endpoint) and it handles the rest.

One command switches providers: hermes model. No code changes, no config overhaul, no lock-in.

The learning loop: why Hermes is different

Most AI agents treat every conversation as a blank slate. You describe your project, your preferences, your stack. The agent responds. Next session, you do it again.

Hermes Agent does not do that.

Skills that write themselves

When Hermes completes a complex multi-step task (configuring a deployment pipeline, setting up a monitoring stack, debugging a distributed system issue), it can detect the pattern and generate a reusable skill file. That skill persists as a .md file in your skills directory. Next time you ask for something similar, the skill loads automatically. No re-explaining, no starting from scratch.

Skills also self-improve during use. If a skill turns out to be incomplete (missing an edge case, handling an error poorly), the system refines it the next time it runs. This is not marketing speak about “learning.” It is a filesystem full of markdown procedures that get better the more you use them.
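To make the "filesystem full of markdown procedures" idea concrete, here is a minimal sketch of how such a skill store could work. The file layout, the `triggers:` line, and the function names are assumptions made for illustration; they are not Hermes Agent's actual skill format or API.

```python
import tempfile
from pathlib import Path


def save_skill(skills_dir: Path, name: str, trigger_words: list, steps: list) -> Path:
    """Write a skill as a plain markdown file (this layout is hypothetical)."""
    lines = [f"# {name}", "", "triggers: " + ", ".join(trigger_words), ""]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    path = skills_dir / f"{name}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def find_skills(skills_dir: Path, request: str) -> list:
    """Return names of skills whose trigger words appear in the request."""
    words = set(request.lower().split())
    matches = []
    for path in sorted(skills_dir.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("triggers: "):
                raw = line[len("triggers: "):].split(",")
                triggers = {t.strip().lower() for t in raw}
                if triggers & words:
                    matches.append(path.stem)
                break  # only one triggers line per file in this sketch
    return matches


def refine_skill(path: Path, extra_step: str) -> None:
    """Append a step learned on a later run, such as a missed edge case."""
    existing = path.read_text(encoding="utf-8").splitlines()
    numbered = [line for line in existing if line[:1].isdigit()]
    existing.append(f"{len(numbered) + 1}. {extra_step}")
    path.write_text("\n".join(existing) + "\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skills = Path(tmp)
        skill = save_skill(skills, "deploy-pipeline", ["deploy", "pipeline"],
                           ["Build the image", "Push to the registry"])
        print(find_skills(skills, "please deploy the staging app"))
        refine_skill(skill, "Roll back if the health check fails")
        print(skill.read_text(encoding="utf-8"))
```

A real system would presumably match on more than literal keywords, but the point stands: a skill is an ordinary file you can read, diff, and edit by hand.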

Three-layer memory

The memory architecture has three tiers:

Tier	Storage	What It Holds	Retrieval
Core	MEMORY.md / USER.md (~800 and ~500 tokens, respectively)	Critical facts, persistent preferences	Always in context
Searchable	SQLite + FTS5 full-text index	Past conversations, decisions, task outcomes	On-demand keyword + semantic search
External	Honcho, Mem0 (pluggable)	Dialectic user modeling, long-term patterns	Provider-specific retrieval

The core tier is always loaded: small enough not to bloat the context, large enough for the essentials. Everything else is reachable but not crammed into every prompt. The system decides when to retrieve, using a combination of FTS5 keyword matching and LLM summarization.
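The split between an always-loaded core and an on-demand searchable tier can be sketched with Python's built-in `sqlite3` module and an FTS5 virtual table. The table name, columns, sample rows, and prompt layout below are assumptions made for illustration, not Hermes Agent's actual schema, and the sketch requires an SQLite build with FTS5 enabled.

```python
import sqlite3

# Searchable tier: a full-text index over past conversation notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(session, content)")
conn.executemany(
    "INSERT INTO memories (session, content) VALUES (?, ?)",
    [
        ("s1", "Decided to use Postgres instead of MySQL for the billing service"),
        ("s2", "CI pipeline failed because of a flaky integration test"),
        ("s3", "User prefers pytest over unittest"),
    ],
)


def search_memories(conn, query, limit=3):
    """Keyword search over stored memories, best match first.

    The query is passed to FTS5 as-is, so punctuation in it may be parsed
    as FTS5 query syntax; a real system would sanitize it first.
    """
    return conn.execute(
        "SELECT session, content FROM memories "
        "WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


def build_prompt(core_memory, retrieved, user_message):
    """Core memory always goes in; retrieved rows only when a search ran."""
    recalled = "\n".join(f"- ({session}) {content}" for session, content in retrieved)
    return f"{core_memory}\n\nRelevant history:\n{recalled}\n\nUser: {user_message}"


if __name__ == "__main__":
    hits = search_memories(conn, "postgres")
    print(build_prompt("User prefers short answers.", hits, "Which database did we pick?"))
```

The design choice worth noticing is that only the small core string is unconditional. Everything else costs tokens only when a search actually returns it.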

Periodic nudges surface relevant memories at decision points. You do not have to remember to tell the agent about that architectural decision from three weeks ago. It nudges itself.

The agent that remembers you

Under the hood, Hermes uses Honcho for dialectic user modeling: building a representation of how you work, what you prefer, and how you make decisions. You do not fill out a profile. The system infers it from your behavior across sessions.

An agent installed in March knows more about your preferences in June than it did on day one, not because someone programmed it to, but because the architecture was designed for accumulation.

Architecture: provider freedom, anywhere runtime, multi-platform

Run it anywhere

Hermes Agent supports six terminal backends:

Backend	Best For
Local	Development, quick testing
Docker	Sandboxed isolation, reproducible environments
SSH	Remote VPS, dedicated agent machine
Singularity	HPC, academic clusters
Modal	Serverless GPU (hibernates when idle, costs nearly nothing)
Daytona	Serverless dev environments (same hibernation model)

Modal and Daytona are the interesting ones. They let your agent hibernate when idle and wake on demand. A $5/month VPS can host an agent that feels always-on without actually running 24/7 compute.

Any platform you use

The messaging gateway supports 23 platforms: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, DingTalk, Feishu, WeCom, WeChat, QQ Bot, Home Assistant, Microsoft Teams, Google Chat, ntfy, and more.

You start a conversation on Telegram during your commute, continue it on the CLI at your desk, and check results on Discord. Same agent across all of them. Same memory, same skills, same personality.

Provider pluggability

Hermes Agent is aggressively provider-agnostic. Beyond Nous Portal’s 300+ models with bundled search, image generation, TTS, and browser tooling, you can point it at OpenRouter (200+ models), OpenAI, Anthropic, Google Gemini, NVIDIA NIM (Nemotron family), DeepSeek, Kimi/Moonshot, MiniMax, Hugging Face inference endpoints, or any OpenAI-compatible custom endpoint.

Per-skill model routing means a skill can specify its preferred model. A creative writing skill might use Claude Opus. A code generation skill might use GPT-5.3-Codex. A quick classification task might use a cheap Haiku variant. You configure this once and the agent routes automatically.
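Conceptually, per-skill routing is a lookup with a fallback. The sketch below assumes a hypothetical `Skill` record with an optional `preferred_model` field; the model strings are placeholder labels taken from the examples above, not real API identifiers or Hermes configuration keys.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MODEL = "default-model"  # placeholder for whatever the agent is set to


@dataclass(frozen=True)
class Skill:
    name: str
    preferred_model: Optional[str] = None  # None means "use the agent default"


def pick_model(skill: Skill, default: str = DEFAULT_MODEL) -> str:
    """Route to the skill's preferred model, falling back to the default."""
    return skill.preferred_model or default


if __name__ == "__main__":
    skills = [
        Skill("creative-writing", "claude-opus"),
        Skill("code-generation", "gpt-5.3-codex"),
        Skill("classification", "haiku"),
        Skill("misc"),
    ]
    for skill in skills:
        print(f"{skill.name} -> {pick_model(skill)}")
```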

v0.16: desktop app, dashboard, and voice

The June 5 release (v0.16.0, “The Surface Release”) added three major pieces.

Hermes Desktop is a native Electron app for macOS, Windows, and Linux. It includes streaming chat with live tool-call visibility, drag-and-drop file upload, clipboard image paste, a side-by-side content preview rail, a built-in file browser, voice mode with ElevenLabs TTS, a Cmd+K command palette, and multi-profile support with concurrent sessions. Each profile can target a different remote host. Your laptop runs the thin GUI while heavy compute happens elsewhere.

The Web Dashboard grew from a session viewer into a full admin panel. You can configure messaging channels, manage API credentials, toggle tools and MCP servers, create webhooks, set up cron schedules, and view 30-day usage analytics with cost tracking. All from a browser. Pluggable authentication supports OIDC and username/password.

Voice Control (“Hermes Jarvis”) lets you talk to the agent and hear responses. It works across desktop, CLI, Telegram, and Discord.

The release also patched CVE-2026-48710, hardened SSRF protection, and stripped credentials from subprocess environments. Eight P0 security fixes in total.
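Stripping credentials from subprocess environments is a general hardening technique, and a minimal version is easy to sketch. This is not the patch Hermes shipped; the marker list and function names are assumptions, and a substring filter like this one is deliberately blunt (it would also drop an unrelated variable such as `MONKEY_PATCH`).

```python
import os
import subprocess
import sys

# Hypothetical markers; a real deny-list would be tuned to the deployment.
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def scrubbed_env(env=None):
    """Return a copy of the environment without credential-looking variables."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        if not any(marker in name.upper() for marker in SENSITIVE_MARKERS)
    }


def run_tool(cmd, env=None):
    """Run a tool command so that it never sees the parent's credentials."""
    return subprocess.run(
        cmd, env=scrubbed_env(env), capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    parent_env = dict(os.environ, DEMO_API_KEY="not-a-real-key")
    check = "import os; print('DEMO_API_KEY' in os.environ)"
    print(run_tool([sys.executable, "-c", check], env=parent_env).stdout.strip())
```

The child process inherits `PATH` and everything else it needs to run, but a tool that misbehaves or gets prompt-injected cannot read API keys out of its own environment.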

Not competitors: how the stack actually fits together

The most common question about Hermes Agent is “should I use this instead of Claude Code?” The question itself gets the categories wrong.

These tools live at different layers:

Layer	Tool	What It Does
Model	Claude Opus, GPT-5.3, Llama	Raw inference (completes tokens)
Coding Agent	Claude Code, Codex CLI	Codebase awareness, diff editing, PR automation
Agent OS	Hermes Agent, OpenClaw	Persistent context, cross-session memory, multi-channel, scheduling

A coding agent edits files in your repo. An agent OS remembers why you edited them and can schedule follow-up work, notify you on Telegram when CI breaks, and dispatch the coding agent as a subprocess to fix it.

Within Layer 3, Hermes Agent and OpenClaw do compete directly. Here is how they split:

Dimension	Hermes Agent	OpenClaw
Self-learning	Generates skills from repeated patterns	Static skills, manual creation
Memory model	Lean, search-first, context-transparent	Rich layers, prone to context bloat
Multi-agent	Parent + isolated sub-agents	Persistent agent teams with cross-talk
Deployment	VPS/serverless friendly	Local-machine oriented
Multi-channel	23 platforms	22 platforms
Marketplace	Growing (agentskills.io)	Mature (ClawHub, 5,700+ skills)
Migration	Imports OpenClaw config	Closed ecosystem
Model flexibility	Provider-agnostic, per-skill routing	Stable, harder to swap

OpenClaw is a better control plane for managing fleets of agents across channels. Hermes Agent is a better self-improving runtime for personal automation that compounds over time. Neither is universally better. They optimize for different things.

For most individual developers, the deciding factor is the learning loop. If your workflows are repetitive and would benefit from accumulated pattern recognition, Hermes has an architectural advantage that OpenClaw does not currently match. If you need persistent agent teams with inter-agent communication and a mature skill marketplace, OpenClaw is further along.

Getting started in two minutes

curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
source ~/.bashrc
hermes setup --portal    # OAuth login, auto-configures model + tools
hermes                   # start chatting

The --portal flag logs you into Nous Portal via OAuth and configures the model, web search, image generation, TTS, and cloud browser under one subscription. No collecting five separate API keys.

For custom providers:

hermes model              # interactive model picker
hermes tools              # enable/disable specific tools
hermes gateway setup      # configure Telegram, Discord, etc.
hermes gateway start      # launch the messaging gateway

Migrating from OpenClaw:

hermes claw migrate              # full migration
hermes claw migrate --dry-run    # preview first

This imports your SOUL.md, MEMORY.md, USER.md, skills, API keys, messaging config, command allowlist, and TTS assets.

What nobody tells you

The learning loop needs repetition to kick in. A skill does not generate after one conversation. Hermes needs to see a pattern repeated (similar task, similar context, similar outcome) before it crystallizes into a reusable procedure. If you use it for wildly different tasks every day, the self-improving aspect stays dormant. If you have a stable set of recurring workflows, it accelerates fast.
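One way to picture "needs repetition to kick in" is a counter over task signatures that fires once a threshold is reached. How Hermes actually measures similarity is not documented here, so the signature (task kind plus the set of tools used) and the threshold of three are purely illustrative assumptions.

```python
from collections import Counter


class PatternTracker:
    """Counts similar tasks and signals when one is worth turning into a skill."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # hypothetical; the real trigger is unknown
        self.counts = Counter()

    def record(self, task_kind: str, tools_used: list) -> bool:
        """Record one completed task; return True exactly when it hits the threshold."""
        signature = (task_kind, tuple(sorted(tools_used)))
        self.counts[signature] += 1
        return self.counts[signature] == self.threshold


if __name__ == "__main__":
    tracker = PatternTracker(threshold=3)
    for day in range(1, 5):
        ready = tracker.record("deploy", ["docker", "ssh"])
        print(f"day {day}: generate skill = {ready}")
```

Under this model, a user whose tasks never share a signature never crosses the threshold, which is the dormancy described above.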

Context management is still the hard problem. Even with three-tier memory and FTS5 retrieval, long-running agents accumulate context. The /compress command exists for a reason. Skill files and memory nudges help keep the active context lean, but they are mitigations, not solutions. Everything in AI agents eventually comes back to context window management.
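A compression step of this kind can be sketched as "keep the newest turns that fit the budget, fold everything older into one summary." This is not the implementation behind /compress; the four-characters-per-token estimate, the budget, and the `summarize` callback (which would be an LLM call in practice) are all assumptions.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about four characters per token."""
    return max(1, len(text) // 4)


def compress_history(turns, budget, summarize):
    """Keep the most recent turns within budget; summarize the rest.

    The summary's own token cost is not counted against the budget here,
    which a real implementation would need to handle.
    """
    kept = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    older = turns[: len(turns) - len(kept)]
    if not older:
        return kept
    return [summarize(older)] + kept


if __name__ == "__main__":
    history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
    stub = lambda old: f"[summary of {len(old)} earlier turns]"
    print(compress_history(history, budget=20, summarize=stub))
```

Whatever detail the summary drops is gone from the active context, which is why this remains a mitigation and not a solution.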

The community is moving faster than the documentation. With 11,000+ commits, 321 contributors in v0.15 alone, and weekly releases, the official docs lag the actual feature surface. The Discord is effectively required reading if you want to know what the tool can actually do right now.

Free models are real but limited. NVIDIA’s Nemotron family is available through NIM at no cost, and Nous Portal includes free-tier access. These work well for memory consolidation, skill generation, and lightweight tasks. For complex reasoning or long agent loops, you will want a paid model. The architecture makes switching trivial: use free models for background work, paid models for critical paths.

Windows support is native now. The v0.16 PowerShell installer bundles a portable Git Bash (MinGit, ~45MB) with no admin rights required. CLI, gateway, TUI, and tools all run natively on Windows. The only feature that still needs WSL2 is the browser-based dashboard chat pane, which uses a POSIX PTY.

The bottom line

Hermes Agent has the numbers for a reason. A self-improving skill system, three-tier memory, provider agnosticism, and multi-platform reach add up to a combination no other agent framework currently ships.

It does not replace Claude Code or Codex. It sits above them: a persistent intelligence that remembers why you built something, schedules the work that follows, and gets better at helping you the longer you use it.

Install it, point it at a model, give it a recurring task, and check back in two weeks. An agent that resets every session feels different from one that accumulates over time. The gap is not subtle.