Configuration

All settings live in the ~/.hermes/ directory, so they are easy to manage in one place.

Directory Structure

~/.hermes/
├── config.yaml     # Settings (model, terminal, TTS, compression, etc.)
├── .env            # API keys and secrets
├── auth.json       # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md         # Primary agent identity (slot #1 in system prompt)
├── memories/       # Persistent memory (MEMORY.md, USER.md)
├── skills/         # Agent-created skills (managed via skill_manage tool)
├── cron/           # Scheduled jobs
├── sessions/       # Gateway sessions
└── logs/           # Logs (errors.log, gateway.log — secrets auto-redacted)

Managing Configuration

hermes config              # View current configuration
hermes config edit         # Open config.yaml in your editor
hermes config set KEY VAL  # Set a specific value
hermes config check        # Check for missing options (after updates)
hermes config migrate      # Interactively add missing options

# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-...  # Saves to .env

Tip

hermes config set writes each value to the right place automatically: API keys go to .env, and everything else goes to config.yaml.

Configuration Precedence

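Settings are resolved in the following order, from highest to lowest priority:

CLI arguments - for example, hermes chat --model anthropic/claude-sonnet-4; the override applies to the current invocation only
~/.hermes/config.yaml - the primary config file for all non-secret settings
~/.hermes/.env - fallback for environment variables; required for secrets (API keys, tokens, passwords)
Built-in defaults - safe defaults used when nothing else is set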

Rule of Thumb

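Secrets (API keys, bot tokens, passwords) go in .env. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in config.yaml. For non-secret settings, config.yaml wins when a value is set in both places.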
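The precedence order above can be sketched as a simple layered lookup. This is an illustrative sketch only, not Hermes's actual loader: the function name `resolve_setting` and the plain-dict layers are assumptions made for the example.

```python
from typing import Any, Mapping, Optional


def resolve_setting(
    key: str,
    cli_args: Mapping[str, Any],
    config_yaml: Mapping[str, Any],
    dotenv: Mapping[str, str],
    defaults: Mapping[str, Any],
) -> Optional[Any]:
    """Return the first value found, walking layers from highest to lowest priority.

    Hypothetical helper: each layer is modeled as a flat mapping.
    """
    for layer in (cli_args, config_yaml, dotenv, defaults):
        if key in layer:
            return layer[key]
    return None


# config.yaml beats .env for a non-secret setting set in both places.
model = resolve_setting(
    "model",
    cli_args={},
    config_yaml={"model": "anthropic/claude-opus-4"},
    dotenv={"model": "some-other-model"},
    defaults={"model": "built-in-default"},
)
```

A CLI argument, when present, would be found first and would shadow all three lower layers for that one invocation.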

Environment Variable Substitution

You can reference environment variables in config.yaml using the ${VAR_NAME} syntax:

auxiliary:
  vision:
    api_key: ${GOOGLE_API_KEY}
    base_url: ${CUSTOM_VISION_URL}

delegation:
  api_key: ${DELEGATION_KEY}

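A single value can contain multiple references, for example url: "${HOST}:${PORT}". If a variable is not set, the placeholder is left untouched (${UNDEFINED_VAR} stays as-is). Only the ${VAR} syntax is supported; bare $VAR is not expanded.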
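The substitution rules can be illustrated with a small regex-based sketch. It is an approximation written for this page, not the code Hermes runs; `substitute_env` and the accepted variable-name pattern are assumptions.

```python
import re

# Matches ${NAME} only; a bare $NAME is deliberately not matched.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env: dict[str, str]) -> str:
    """Expand ${VAR} references, leaving unknown placeholders untouched."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Fall back to the original placeholder text when the variable is unset.
        return env.get(name, match.group(0))

    return _PLACEHOLDER.sub(replace, value)


env = {"HOST": "localhost", "PORT": "8080"}
url = substitute_env("${HOST}:${PORT}", env)
unset = substitute_env("${UNDEFINED_VAR}", env)
bare = substitute_env("$HOST", env)
```

Here `url` becomes `localhost:8080`, while `unset` and `bare` come back unchanged.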

For AI provider configuration (OpenRouter, Anthropic, Copilot, custom endpoints, self-hosted LLMs, fallback models, and so on), see AI Providers.

Provider Timeouts

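Use providers.<id>.request_timeout_seconds to set a request timeout for an entire provider, or providers.<id>.models.<model>.timeout_seconds to override it for a single model. The setting applies to the primary-turn client on every transport (OpenAI-wire, native Anthropic, Anthropic-compatible), to the fallback chain, to client rebuilds after credential rotation, and to the per-request timeout parameter on OpenAI-wire, so it takes precedence over the legacy HERMES_API_TIMEOUT environment variable.

You can also set providers.<id>.stale_timeout_seconds to control the non-streaming stale-call detector, or providers.<id>.models.<model>.stale_timeout_seconds to override it for a single model. It takes precedence over the legacy HERMES_API_CALL_STALE_TIMEOUT.

If you leave these unset, the legacy defaults still apply (HERMES_API_TIMEOUT=1800s, HERMES_API_CALL_STALE_TIMEOUT=300s, 900s for native Anthropic). AWS Bedrock is not wired into this logic yet (both the bedrock_converse and AnthropicBedrock SDK paths rely on boto3's own timeout configuration). See the comments in cli-config.yaml.example for examples.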
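The lookup order for the request timeout can be sketched as follows. This is a simplified illustration under stated assumptions: `resolve_request_timeout` is a hypothetical name, the `providers` mapping mirrors the config keys described above, and the separate 900s native-Anthropic default is left out for brevity.

```python
import os
from typing import Any, Mapping

LEGACY_DEFAULT_SECONDS = 1800.0  # documented HERMES_API_TIMEOUT default


def resolve_request_timeout(
    providers: Mapping[str, Any],
    provider_id: str,
    model: str,
    environ: Mapping[str, str] = os.environ,
) -> float:
    """Per-model override, then provider-wide value, then legacy env var, then default."""
    provider = providers.get(provider_id) or {}
    model_cfg = (provider.get("models") or {}).get(model) or {}
    if "timeout_seconds" in model_cfg:
        return float(model_cfg["timeout_seconds"])
    if "request_timeout_seconds" in provider:
        return float(provider["request_timeout_seconds"])
    if "HERMES_API_TIMEOUT" in environ:
        return float(environ["HERMES_API_TIMEOUT"])
    return LEGACY_DEFAULT_SECONDS


# "slow-model" is a made-up model name used only for this example.
providers = {
    "openrouter": {
        "request_timeout_seconds": 600,
        "models": {"slow-model": {"timeout_seconds": 1200}},
    }
}
per_model = resolve_request_timeout(providers, "openrouter", "slow-model", {})
per_provider = resolve_request_timeout(providers, "openrouter", "other-model", {})
```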

Terminal Backend Configuration

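Hermes supports six terminal backends. They determine where the agent's shell commands actually run: on your local machine, in a Docker container, on a remote server over SSH, in a Modal cloud sandbox, in a Daytona workspace, or in a Singularity/Apptainer container.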

terminal:
  backend: local    # local | docker | ssh | modal | daytona | singularity
  cwd: "."          # Working directory ("." = current dir for local, "/root" for containers)
  timeout: 180      # Per-command timeout in seconds
  env_passthrough: []  # Env var names to forward to sandboxed execution (terminal + execute_code)
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"  # Container image for Singularity backend
  modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"                 # Container image for Modal backend
  daytona_image: "nikolaik/python-nodejs:python3.11-nodejs20"               # Container image for Daytona backend

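For cloud sandboxes such as Modal and Daytona, container_persistent: true only means that Hermes makes a best effort to preserve filesystem state. It does not guarantee the same live sandbox, PID space, or background processes.

Backend Overview

Backend	Where commands run	Isolation	Best for
local	Directly on your machine	None	Development, personal use
docker	Docker container	Full (namespaces, cap-drop)	Secure sandboxing, CI/CD
ssh	Remote server	Network boundary	Remote development, powerful hardware
modal	Modal cloud sandbox	Full (cloud VM)	Short-lived cloud compute, evals
daytona	Daytona workspace	Full (cloud container)	Managed cloud dev environments
singularity	Singularity / Apptainer container	Namespaces (`--containall`)	HPC clusters, shared machines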

Local Backend

This is the default backend. Commands run directly on your machine with no isolation and no extra setup.

terminal:
  backend: local

Warning

The agent has the same filesystem access as your user account. Use hermes tools to disable tools you do not want to expose, or switch to Docker for sandboxed isolation.

Docker Backend

Commands run inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).

terminal:
  backend: docker
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  docker_mount_cwd_to_workspace: false  # Mount launch dir into /workspace
  docker_forward_env:              # Env vars to forward into container
    - "GITHUB_TOKEN"
  docker_volumes:                  # Host directory mounts
    - "/home/user/projects:/workspace/projects"
    - "/home/user/data:/data:ro"   # :ro for read-only

  # Resource limits
  container_cpu: 1                 # CPU cores (0 = unlimited)
  container_memory: 5120           # MB (0 = unlimited)
  container_disk: 51200            # MB (requires overlay2 on XFS+pquota)
  container_persistent: true       # Persist /workspace and /root across sessions

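Requirements: Docker Desktop or Docker Engine must be installed and running. Hermes looks for Docker on $PATH and in common macOS install locations (/usr/local/bin/docker, /opt/homebrew/bin/docker, and the Docker Desktop app bundle).

Container lifecycle: Each session starts one long-lived container (docker run -d ... sleep 2h). Subsequent commands run through docker exec with a login shell, and the container is stopped and removed on cleanup.

Security hardening:

--cap-drop ALL, with only DAC_OVERRIDE, CHOWN, and FOWNER added back
--security-opt no-new-privileges
--pids-limit 256
Size-limited tmpfs mounts for /tmp (512MB), /var/tmp (256MB), and /run (64MB)

Credential forwarding: Environment variables listed in docker_forward_env are resolved from your current shell first, then from ~/.hermes/.env. Skills can also declare required_environment_variables, which are merged in automatically.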

SSH Backend

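Runs commands on a remote server over SSH. It uses ControlMaster to reuse the connection (kept alive for 5 minutes when idle) and enables a persistent shell by default, so state such as the working directory and environment variables carries over between commands.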

terminal:
  backend: ssh
  persistent_shell: true           # Keep a long-lived bash session (default: true)

Required environment variables:

TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=ubuntu

Optional:

Variable	Default	Description
`TERMINAL_SSH_PORT`	`22`	SSH port
`TERMINAL_SSH_KEY`	(system default)	Path to the SSH private key
`TERMINAL_SSH_PERSISTENT`	`true`	Enable the persistent shell

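How it works: The connection is established at init time with BatchMode=yes and StrictHostKeyChecking=accept-new. The persistent shell keeps a single long-lived bash -l process on the remote host and communicates through temporary files. Commands that need stdin_data or sudo automatically fall back to one-shot mode.

Modal Backend

Runs commands in a Modal cloud sandbox. Each task gets its own VM with configurable CPU, memory, and disk, and the filesystem can be snapshotted and restored across sessions.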

terminal:
  backend: modal
  container_cpu: 1                 # CPU cores
  container_memory: 5120           # MB (5GB)
  container_disk: 51200            # MB (50GB)
  container_persistent: true       # Snapshot/restore filesystem

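Requirements: Either the MODAL_TOKEN_ID and MODAL_TOKEN_SECRET environment variables, or a ~/.modal.toml config file.

Persistence: When enabled, the sandbox filesystem is snapshotted on cleanup and restored in the next session. Snapshots are tracked in ~/.hermes/modal_snapshots.json. Only filesystem state is preserved, not live processes, PID space, or background jobs.

Credential files: Mounted automatically from ~/.hermes/ (OAuth tokens and the like) and synced before each command.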

Daytona Backend

Runs commands in a Daytona managed workspace, with persistence through stop / resume.

terminal:
  backend: daytona
  container_cpu: 1                 # CPU cores
  container_memory: 5120           # MB → converted to GiB
  container_disk: 10240            # MB → converted to GiB (max 10 GiB)
  container_persistent: true       # Stop/resume instead of delete

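Requirements: The DAYTONA_API_KEY environment variable.

Persistence: When enabled, the sandbox is stopped rather than deleted on cleanup and resumed in the next session. Sandbox names follow the pattern hermes-{task_id}.

Disk limit: Daytona allows at most 10 GiB. Requests above the cap are clamped, with a warning.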

Singularity/Apptainer Backend

Runs commands in a Singularity/Apptainer container. Suited to HPC clusters and shared machines where Docker is unavailable.

terminal:
  backend: singularity
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"
  container_cpu: 1                 # CPU cores
  container_memory: 5120           # MB
  container_persistent: true       # Writable overlay persists across sessions

Requirements: An apptainer or singularity binary on $PATH.

Image handling: Docker URLs (docker://...) are converted to SIF files automatically and cached; existing .sif files are used directly.

Scratch directory resolution order: TERMINAL_SCRATCH_DIR → TERMINAL_SANDBOX_DIR/singularity → /scratch/$USER/hermes-agent (HPC convention) → ~/.hermes/sandboxes/singularity.

Isolation: Uses --containall --no-home for full namespace isolation without mounting the host home directory.
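The scratch directory resolution order listed above might look like this in code. It is a sketch under assumptions: `resolve_scratch_dir` is a hypothetical helper, and treating the /scratch step as "used only when /scratch exists" is a guess about how the HPC convention is detected.

```python
from pathlib import Path
from typing import Mapping


def resolve_scratch_dir(
    environ: Mapping[str, str],
    home: Path,
    scratch_root: Path = Path("/scratch"),
) -> Path:
    """Walk the documented candidates in order and return the first that applies."""
    if environ.get("TERMINAL_SCRATCH_DIR"):
        return Path(environ["TERMINAL_SCRATCH_DIR"])
    if environ.get("TERMINAL_SANDBOX_DIR"):
        return Path(environ["TERMINAL_SANDBOX_DIR"]) / "singularity"
    user = environ.get("USER")
    # Assumption: the HPC path is only chosen when the scratch root exists.
    if user and scratch_root.is_dir():
        return scratch_root / user / "hermes-agent"
    return home / ".hermes" / "sandboxes" / "singularity"


explicit = resolve_scratch_dir({"TERMINAL_SCRATCH_DIR": "/fast/tmp"}, Path("/home/user"))
sandbox = resolve_scratch_dir({"TERMINAL_SANDBOX_DIR": "/sb"}, Path("/home/user"))
```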

Common Terminal Backend Issues

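If terminal commands fail right away, or the terminal tool shows as disabled:

Local - No extra dependencies; the safest choice for getting started
Docker - Run docker version to check that Docker works. If it fails, fix Docker or run hermes config set terminal.backend local
SSH - Both TERMINAL_SSH_HOST and TERMINAL_SSH_USER must be set; Hermes reports a clear error if either is missing
Modal - Requires the MODAL_TOKEN_ID and MODAL_TOKEN_SECRET environment variables or ~/.modal.toml; try running hermes doctor
Daytona - Requires DAYTONA_API_KEY; the server URL is handled by the Daytona SDK
Singularity - Requires apptainer or singularity on $PATH; common on HPC clusters

When in doubt, switch terminal.backend back to local first and confirm that commands run locally.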

Docker Volume Mounts

With the Docker backend, docker_volumes lets you share host directories with the container. Each entry uses standard Docker -v syntax: host_path:container_path[:options].

terminal:
  backend: docker
  docker_volumes:
    - "/home/user/projects:/workspace/projects"   # Read-write (default)
    - "/home/user/datasets:/data:ro"              # Read-only
    - "/home/user/.hermes/cache/documents:/output" # Gateway-visible exports

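This is useful for:

Providing files to the agent (datasets, configs, reference code)
Receiving files from the agent (generated code, reports, exports)
Shared workspaces (you and the agent work on the same files)

If you use a messaging gateway and want the agent to send generated files via MEDIA:/..., mount a dedicated export directory that is visible on the host, for example /home/user/.hermes/cache/documents:/output.

Inside Docker, write files to /output/...
In MEDIA:, emit the host path, for example: MEDIA:/home/user/.hermes/cache/documents/report.txt
Do not emit /workspace/... or /output/... unless the gateway process can reach that exact same path on the host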

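Warning

Duplicate keys in YAML silently override earlier values. If you already have a docker_volumes: block, merge new mounts into the same list rather than adding a second docker_volumes: later in the file.

You can also set this through an environment variable: TERMINAL_DOCKER_VOLUMES='["/host:/container"]' (a JSON array).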
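To make the container-path versus host-path distinction for MEDIA: concrete, here is a small sketch that parses volume entries and maps a container path back to its host path. The helpers `parse_volume` and `container_to_host` are hypothetical and exist only for illustration; the simple colon split assumes POSIX-style paths without colons in them.

```python
import json
from pathlib import PurePosixPath
from typing import Optional


def parse_volume(spec: str) -> tuple[str, str, str]:
    """Split host_path:container_path[:options]; options default to "rw"."""
    parts = spec.split(":")
    options = parts[2] if len(parts) > 2 else "rw"
    return parts[0], parts[1], options


def container_to_host(path: str, volumes: list[str]) -> Optional[str]:
    """Translate a path inside the container to the matching host path, if mounted."""
    target = PurePosixPath(path)
    for spec in volumes:
        host, container, _ = parse_volume(spec)
        try:
            relative = target.relative_to(container)
        except ValueError:
            continue  # path is not under this mount
        return str(PurePosixPath(host) / relative)
    return None


# The same JSON-array shape that TERMINAL_DOCKER_VOLUMES accepts.
volumes = json.loads(
    '["/home/user/.hermes/cache/documents:/output", "/home/user/datasets:/data:ro"]'
)
host_path = container_to_host("/output/report.txt", volumes)
```

With these mounts, a file written to /output/report.txt in the container corresponds to /home/user/.hermes/cache/documents/report.txt on the host, which is the path to emit in MEDIA:.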

Docker Credential Forwarding

By default, Docker terminal sessions do not inherit arbitrary credentials from the host. If you do need a token inside the container, add it to terminal.docker_forward_env.

terminal:
  backend: docker
  docker_forward_env:
    - "GITHUB_TOKEN"
    - "NPM_TOKEN"

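Hermes resolves these variables from your current shell first, then falls back to values saved in ~/.hermes/.env via hermes config set.

Warning

Anything listed in docker_forward_env is visible to commands running inside the container. Only forward credentials you are comfortable exposing to the terminal session.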
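The shell-first, .env-second lookup for forwarded variables can be sketched like this. `collect_forwarded_env` is a hypothetical name, and both sources are modeled as plain dicts for the sake of the example.

```python
from typing import Iterable, Mapping


def collect_forwarded_env(
    names: Iterable[str],
    shell_env: Mapping[str, str],
    dotenv: Mapping[str, str],
) -> dict[str, str]:
    """Resolve each listed name from the shell first, then from ~/.hermes/.env."""
    forwarded: dict[str, str] = {}
    for name in names:
        if name in shell_env:
            forwarded[name] = shell_env[name]
        elif name in dotenv:
            forwarded[name] = dotenv[name]
        # Names found in neither source are simply not forwarded.
    return forwarded


forwarded = collect_forwarded_env(
    ["GITHUB_TOKEN", "NPM_TOKEN", "UNSET_TOKEN"],
    shell_env={"GITHUB_TOKEN": "from-shell", "HOME": "/home/user"},
    dotenv={"GITHUB_TOKEN": "from-dotenv", "NPM_TOKEN": "from-dotenv"},
)
```

Note that HOME is present in the shell but never reaches the container, because only explicitly listed names are considered.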

Optional: Mount the Launch Directory into `/workspace`

Docker sandboxes stay isolated by default. Hermes does not pass the current host working directory into the container unless you opt in explicitly.

Enable it in config.yaml:

terminal:
  backend: docker
  docker_mount_cwd_to_workspace: true

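When enabled:

If you launch Hermes from ~/projects/my-app, that host directory is bind-mounted at /workspace
The Docker backend starts in /workspace
File tools and terminal commands both see the same mounted project

When disabled, /workspace remains owned by the sandbox unless you mount something explicitly through docker_volumes.

The security tradeoff:

false - preserves the sandbox boundary
true - gives the sandbox direct access to the host directory you launched Hermes from

Enable it only when you deliberately want the container to operate on host files.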

Persistent Shell

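By default, each terminal command runs in its own subprocess, so the working directory, environment variables, and shell variables reset between commands. With the persistent shell enabled, a single long-lived bash process is kept alive across execute() calls so that this state carries over.

This is most useful for the SSH backend, where it also removes the per-command connection overhead. The persistent shell is enabled by default for SSH and disabled for the local backend.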

terminal:
  persistent_shell: true   # default — enables persistent shell for SSH

To disable it:

hermes config set terminal.persistent_shell false

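What persists between commands:

Working directory (for example, cd /tmp affects the next command)
Exported environment variables (for example, export FOO=bar)
Shell variables (for example, MY_VAR=hello)

Precedence: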

Level	Variable	Default
Config	`terminal.persistent_shell`	`true`
SSH override	`TERMINAL_SSH_PERSISTENT`	follows config
Local override	`TERMINAL_LOCAL_PERSISTENT`	`false`

Per-backend environment variables take the highest precedence. To enable the persistent shell on the local backend as well:

export TERMINAL_LOCAL_PERSISTENT=true

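Note

Commands that need stdin_data or sudo automatically fall back to one-shot mode, because the persistent shell's stdin is already occupied by the IPC protocol.

For more detail on each backend, see Code Execution and the Terminal section of the README.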

Skill Settings

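Skills can declare their own configuration settings in the frontmatter of their SKILL.md. These are non-secret values (paths, preferences, domain settings, and so on) stored under the skills.config namespace in config.yaml.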

skills:
  config:
    myplugin:
      path: ~/myplugin-data   # Example — each skill defines its own keys

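How skill settings work:

hermes config migrate scans all enabled skills, finds settings that are not configured yet, and prompts for them interactively
hermes config show lists all skill settings under a "Skill Settings" section
When a skill loads, its resolved config values are injected into the skill context automatically

Setting a value manually: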

hermes config set skills.config.myplugin.path ~/myplugin-data

To declare config settings in your own skill, see the Config Settings section of Creating Skills.

Memory Configuration

memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200   # ~800 tokens
  user_char_limit: 1375     # ~500 tokens

File Read Safety

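This setting caps how much content a single read_file call can return. A read that exceeds the limit fails with an error telling the agent to narrow the range with offset and limit, so that a single read of a minified JS bundle or a large data file cannot flood the context window.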

file_read_max_chars: 100000  # default — ~25-35K tokens

Raise it if you use a large-context model and often read big files; lower it if you use a small-context model, to keep reads efficient:

# Large context model (200K+)
file_read_max_chars: 200000

# Small local model (16K context)
file_read_max_chars: 30000

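Hermes also deduplicates file reads automatically: if the same file range is read twice while the file is unchanged, the second read returns a lightweight stub instead of resending the content. This cache resets after context compression, so the agent can re-read files once their content has been summarized away.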

Git Worktree Isolation

To run multiple agents in parallel on the same repository, enable git worktree isolation:

worktree: true    # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed

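When enabled, each CLI session creates a fresh worktree under .worktrees/ with its own branch. Agents can edit, commit, push, and open PRs independently without stepping on each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.

You can also list gitignored files to copy into each worktree in a .worktreeinclude file at the repository root: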

# .worktreeinclude
.env
.venv/
node_modules/

Context Compression

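Hermes compresses long conversations automatically to stay within the model's context window. The compression summarizer is a separate LLM call, and you can point it at any provider or endpoint.

All compression settings live in config.yaml; none of them use environment variables.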

Full reference

compression:
  enabled: true                                     # Toggle compression on/off
  threshold: 0.50                                   # Compress at this % of context limit
  target_ratio: 0.20                                # Fraction of threshold to preserve as recent tail
  protect_last_n: 20                                # Min recent messages to keep uncompressed

# The summarization model/provider is configured under auxiliary:
auxiliary:
  compression:
    model: "google/gemini-3-flash-preview"          # Model for summarization
    provider: "auto"                                # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
    base_url: null                                  # Custom OpenAI-compatible endpoint (overrides provider)

Legacy config migration

The legacy keys compression.summary_model, compression.summary_provider, and compression.summary_base_url are migrated to auxiliary.compression.* automatically on first load (config version 17). No manual action is needed.

Common setups

Default auto-detection, no configuration needed:

compression:
  enabled: true
  threshold: 0.50

This uses the first available provider (OpenRouter → Nous → Codex) and defaults to Gemini Flash.

Force a specific provider (OAuth or API key both work):

auxiliary:
  compression:
    provider: nous
    model: gemini-3-flash

Works with any provider, for example nous, openrouter, codex, anthropic, or main.

Custom endpoint (self-hosted, Ollama, zai, DeepSeek, and so on):

auxiliary:
  compression:
    model: glm-4.7
    base_url: https://api.z.ai/api/coding/paas/v4

This calls the custom OpenAI-compatible endpoint directly and authenticates with OPENAI_API_KEY.

How the three knobs interact

`auxiliary.compression.provider`	`auxiliary.compression.base_url`	Result
`auto` (default)	not set	Auto-detect the best available provider
`nous` / `openrouter` / etc.	not set	Force that provider and use its auth
any	set	Use the custom endpoint directly (provider is ignored)
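The table above reduces to a short decision function. This is only a sketch of the documented behavior; `route_compression_call` and its return strings are made up for the example.

```python
from typing import Optional


def route_compression_call(provider: str = "auto", base_url: Optional[str] = None) -> str:
    """Describe where the summarization call would go for a given provider/base_url pair."""
    if base_url:
        # A set base_url wins outright; provider is ignored.
        return f"custom endpoint: {base_url}"
    if provider == "auto":
        return "auto-detect provider"
    return f"forced provider: {provider}"


default_route = route_compression_call()
forced_route = route_compression_call(provider="nous")
custom_route = route_compression_call(provider="nous", base_url="http://localhost:1234/v1")
```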

Summary model context length requirement

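The summary model's context window must be at least as large as the main model's. The compressor sends the entire middle section of the conversation to the summary model; if that model's context window is smaller than the main model's, the call fails with a context-length error. When that happens, the middle section is dropped without a summary and the context is lost silently. If you override the summary model, confirm first that its context window is no smaller than the main model's.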

Context Engine

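The context engine controls how Hermes manages the conversation as it approaches the model's token limit. The built-in compressor engine uses lossy summarization (see Context Compression). Plugin engines can replace it with other strategies.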

context:
  engine: "compressor"    # default — built-in lossy summarization

With a plugin engine (for example, LCM):

context:
  engine: "lcm"          # must match the plugin's name

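Plugin engines are never enabled automatically; you must set context.engine to the plugin name explicitly. Browse and select available engines via hermes plugins → Provider Plugins → Context Engine.

The same single-select mechanism applies to memory plugins; see Memory Providers.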

Iteration Budget Pressure

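When the agent works on a complex task with many tool calls, it can burn through its iteration budget (90 turns by default) without noticing. Budget pressure warns the model automatically as it approaches the limit: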

Threshold	Level	What the model sees
70%	Caution	`[BUDGET: 63/90. 27 iterations left. Start consolidating.]`
90%	Warning	`[BUDGET WARNING: 81/90. Only 9 left. Respond NOW.]`

These warnings are not inserted as separate messages. They are written into the _budget_warning field of the last tool result's JSON, which keeps prompt caching intact.

agent:
  max_turns: 90                # Max iterations per conversation turn (default: 90)

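Budget pressure is enabled by default. The model sees the warnings as part of the tool results, which nudges it to consolidate its work and respond before the budget runs out.

When the budget is actually exhausted, the CLI tells the user: ⚠ Iteration budget reached (90/90) — response may be incomplete. If the budget runs out mid-task, the agent generates a summary of the work completed so far before stopping.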
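The two thresholds and the way the warning rides along in the tool result can be sketched as follows. The message formats are copied from the table above; the function names and the dict-shaped tool result are assumptions for illustration.

```python
from typing import Optional


def budget_warning(used: int, max_turns: int = 90) -> Optional[str]:
    """Return the warning text for the current iteration count, or None below 70%."""
    remaining = max_turns - used
    # Integer comparisons avoid floating-point edge cases at exactly 70% / 90%.
    if used * 10 >= max_turns * 9:
        return f"[BUDGET WARNING: {used}/{max_turns}. Only {remaining} left. Respond NOW.]"
    if used * 10 >= max_turns * 7:
        return f"[BUDGET: {used}/{max_turns}. {remaining} iterations left. Start consolidating.]"
    return None


def attach_warning(tool_result: dict, used: int, max_turns: int = 90) -> dict:
    """Add the warning as a _budget_warning field instead of a separate message."""
    warning = budget_warning(used, max_turns)
    if warning is None:
        return tool_result
    return {**tool_result, "_budget_warning": warning}


result = attach_warning({"output": "tests passed"}, used=63)
```

Because the warning is a field on an existing tool result rather than a new message, earlier messages in the conversation stay byte-identical, which is what keeps the prompt cache valid.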

API Timeouts

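Hermes applies separate timeout layers to streaming calls and a stale detector to non-streaming calls. The stale detectors auto-adjust for local providers only when you leave them at their defaults.

Timeout	Default	Local providers	Config / env
Socket read timeout	120s	Raised to 1800s automatically	`HERMES_STREAM_READ_TIMEOUT`
Stale stream detection	180s	Disabled automatically	`HERMES_STREAM_STALE_TIMEOUT`
Stale non-stream detection	300s	Disabled automatically unless configured explicitly	`providers.<id>.stale_timeout_seconds` or `HERMES_API_CALL_STALE_TIMEOUT`
API call (non-streaming)	1800s	Unchanged	`providers.<id>.request_timeout_seconds` / `timeout_seconds` or `HERMES_API_TIMEOUT`

Socket read timeout controls how long httpx waits for the next chunk of data from the provider. A local LLM can take several minutes to produce the first token while prefilling a large context, so Hermes raises this timeout to 30 minutes when it detects a local endpoint. If you set HERMES_STREAM_READ_TIMEOUT explicitly, that value is always used.

Stale stream detection kills connections that receive only SSE keep-alive pings and no real content. It is disabled entirely for local providers, because local models typically send no keep-alive pings during prefill.

Stale non-stream detection kills non-streaming calls that go too long without any response. By default, Hermes turns this off for local endpoints to avoid false positives during long prefills. If you set providers.<id>.stale_timeout_seconds, providers.<id>.models.<model>.stale_timeout_seconds, or HERMES_API_CALL_STALE_TIMEOUT explicitly, that value is honored even on local endpoints.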

Context Pressure Warnings

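Separate from iteration budget pressure, context pressure tracks how close the conversation is to the compression threshold, the point at which context compression kicks in. This gives both you and the agent a clearer sense of when the conversation is getting long.

Progress	Level	What happens
>= 60% to threshold	Info	CLI shows a cyan progress bar; the gateway sends an informational notice
>= 85% to threshold	Warning	CLI shows a bold yellow progress bar; the gateway warns that compression is imminent

In the CLI, it appears as a progress bar in the tool output stream: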

  ◐ context ████████████░░░░░░░░ 62% to compaction  48k threshold (50%) · approaching compaction

On messaging platforms, it is sent as a plain-text notification:

◐ Context: ████████████░░░░░░░░ 62% to compaction (threshold: 50% of window).

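If auto-compression is disabled, the notice warns that the context may be truncated instead.

Context pressure is fully automatic and needs no configuration. It is purely a user-facing notice: it does not modify the message stream or inject anything into the model's context.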
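The progress figure in the examples above is measured against the compression threshold, not the full window. A rough sketch of that arithmetic, assuming a 20-cell bar like the one shown and a hypothetical 96K-token window (so the 50% threshold is 48K):

```python
from typing import Optional


def context_pressure(
    tokens_used: int,
    context_window: int,
    threshold: float = 0.50,
    width: int = 20,
) -> tuple[Optional[str], str]:
    """Return (level, progress line); level is None below 60% of the threshold."""
    threshold_tokens = int(context_window * threshold)
    percent = min(100, tokens_used * 100 // threshold_tokens)
    if percent >= 85:
        level: Optional[str] = "warning"
    elif percent >= 60:
        level = "info"
    else:
        level = None
    filled = percent * width // 100
    bar = "█" * filled + "░" * (width - filled)
    return level, f"◐ context {bar} {percent}% to compaction"


level, line = context_pressure(tokens_used=29_760, context_window=96_000)
```

The function name, the integer rounding, and the exact line layout are assumptions; only the 60% / 85% levels and the "percent of threshold" meaning come from the text above.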

Credential Pool Strategies

When you have multiple API keys or OAuth tokens for the same provider, you can choose a rotation strategy:

credential_pool_strategies:
  openrouter: round_robin    # cycle through keys evenly
  anthropic: least_used      # always pick the least-used key

Available values: fill_first (default), round_robin, least_used, and random. See Credential Pools for details.

Auxiliary Models

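Hermes uses lightweight "auxiliary" models for side tasks such as image analysis, web page summarization, and browser screenshot analysis. By default these tasks use Gemini Flash through auto-detection, with no configuration needed.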

The universal config pattern

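Every model slot in Hermes, whether for auxiliary tasks, compression, or fallback, uses the same three knobs:

Key	What it does	Default
`provider`	Which provider to use for auth and routing	`"auto"`
`model`	Which model to request	The provider's default
`base_url`	Custom OpenAI-compatible endpoint (overrides provider)	not set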

When base_url is set, Hermes ignores provider and calls the endpoint directly (authenticating with api_key or OPENAI_API_KEY). When only provider is set, Hermes uses that provider's built-in auth and base URL.

Providers available for auxiliary tasks: auto, main, and every provider in the provider registry, such as openrouter, nous, openai-codex, copilot, copilot-acp, anthropic, gemini, google-gemini-cli, qwen-oauth, zai, kimi-coding, kimi-coding-cn, minimax, minimax-cn, deepseek, nvidia, xai, ollama-cloud, alibaba, bedrock, huggingface, arcee, xiaomi, kilocode, opencode-zen, opencode-go, and ai-gateway, plus any named provider you define in custom_providers (for example, provider: "beans").

"main" is for auxiliary tasks only

"main" means "use the same provider as the main agent." It is valid only in the auxiliary:, compression:, and fallback_model: sections and cannot be used for the top-level model.provider. If your main model is a custom OpenAI-compatible endpoint, set provider: custom under model:. See AI Providers for the full description of main-model providers.

Full auxiliary config reference

auxiliary:
  # Image analysis (vision_analyze tool + browser screenshots)
  vision:
    provider: "auto"           # "auto", "openrouter", "nous", "codex", "main", etc.
    model: ""                  # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
    base_url: ""               # Custom OpenAI-compatible endpoint (overrides provider)
    api_key: ""                # API key for base_url (falls back to OPENAI_API_KEY)
    timeout: 120               # seconds — LLM API call timeout; vision payloads need generous timeout
    download_timeout: 30       # seconds — image HTTP download; increase for slow connections

  # Web page summarization + browser page text extraction
  web_extract:
    provider: "auto"
    model: ""                  # e.g. "google/gemini-2.5-flash"
    base_url: ""
    api_key: ""
    timeout: 360               # seconds (6min) — per-attempt LLM summarization

  # Dangerous command approval classifier
  approval:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30                # seconds

  # Context compression timeout (separate from compression.* config)
  compression:
    timeout: 120               # seconds — compression summarizes long conversations, needs more time

  # Session search — summarizes past session matches
  session_search:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30
    max_concurrency: 3       # Limit parallel summaries to reduce request-burst 429s
    extra_body: {}           # Provider-specific OpenAI-compatible request fields

  # Skills hub — skill matching and search
  skills_hub:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

  # MCP tool dispatch
  mcp:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

  # Memory flush — summarizes conversation for persistent memory
  flush_memories:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

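Tip

Every auxiliary task has a configurable timeout. The defaults are: vision 120s, web_extract 360s, approval 30s, compression 120s. If you run auxiliary tasks on a slow local model, raise these values. vision also has a separate download_timeout (default 30s) for the image HTTP download; raise it for slow connections or self-hosted image servers.

Info

Context compression is configured in two places: compression: for behavior such as thresholds, and auxiliary.compression: for the model and provider. The fallback model uses its own fallback_model: block. All three follow the same provider / model / base_url pattern.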

Session Search Tuning

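If you use a reasoning-heavy model for auxiliary.session_search, Hermes provides two built-in controls:

auxiliary.session_search.max_concurrency: limits how many matched sessions are summarized at once
auxiliary.session_search.extra_body: passes provider-specific OpenAI-compatible request body fields through to the summarization request

Example: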

auxiliary:
  session_search:
    provider: "main"
    model: "glm-4.5-air"
    timeout: 60
    max_concurrency: 2
    extra_body:
      enable_thinking: false

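Use max_concurrency when your provider rate-limits request bursts and you are willing to trade parallelism for a more stable session_search.

extra_body only applies to providers that document support for OpenAI-compatible extension fields. Hermes passes the object through as-is and adds no logic of its own.

Warning

extra_body takes effect only when the target provider actually supports the field you send. If the provider exposes no native "disable reasoning" field, Hermes cannot synthesize one.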

Changing the Vision Model

To switch image analysis from Gemini Flash to GPT-4o:

auxiliary:
  vision:
    model: "openai/gpt-4o"

Or set it through an environment variable (in ~/.hermes/.env):

AUXILIARY_VISION_MODEL=openai/gpt-4o

Provider Options

These options apply to auxiliary task configuration (auxiliary:, compression:, fallback_model:), not to the main model's model.provider.

Provider	Description	Requirements
`"auto"`	Use the best available provider (default). Vision tries OpenRouter → Nous → Codex.	None
`"openrouter"`	Force OpenRouter, which can route to any model (Gemini, GPT-4o, Claude, and so on)	`OPENROUTER_API_KEY`
`"nous"`	Force Nous Portal	`hermes auth`
`"codex"`	Force Codex OAuth (ChatGPT account); supports vision (gpt-5.3-codex)	`hermes model` → Codex
`"main"`	Use your active custom / main endpoint. It can come from `OPENAI_BASE_URL` + `OPENAI_API_KEY` or from a custom endpoint configured via `hermes model` / `config.yaml`. Works with OpenAI, local models, or any OpenAI-compatible API. Auxiliary tasks only; not valid for `model.provider`.	Custom endpoint credentials + base URL

Common Setups

Use a custom endpoint directly (a clearer fit than provider: "main" for local / self-hosted APIs):

auxiliary:
  vision:
    base_url: "http://localhost:1234/v1"
    api_key: "local-key"
    model: "qwen2.5-vl"

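base_url takes precedence over provider, so this is the most explicit way to route an auxiliary task to a specific endpoint. For direct endpoint overrides like this, Hermes uses the configured api_key or falls back to OPENAI_API_KEY; it does not reuse OPENROUTER_API_KEY.

Vision with an OpenAI API key: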

# In ~/.hermes/.env:
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_API_KEY=sk-...

auxiliary:
  vision:
    provider: "main"
    model: "gpt-4o"       # or "gpt-4o-mini" for cheaper

Vision through OpenRouter:

auxiliary:
  vision:
    provider: "openrouter"
    model: "openai/gpt-4o"      # or "google/gemini-2.5-flash", etc.

Vision through Codex OAuth (ChatGPT Pro / Plus account, no API key needed):

auxiliary:
  vision:
    provider: "codex"     # uses your ChatGPT OAuth token
    # model defaults to gpt-5.3-codex (supports vision)

Vision with a local / self-hosted model:

auxiliary:
  vision:
    provider: "main"      # uses your active custom endpoint
    model: "my-local-model"

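provider: "main" means the auxiliary task uses the same provider as your everyday chat, whether that is a named custom provider (for example, beans), a built-in provider (for example, openrouter), or a legacy OPENAI_BASE_URL endpoint.

Tip

If your main model provider is already Codex OAuth, vision works automatically with no extra configuration. Codex is part of the auto-detection chain.

Warning

Vision requires a multimodal model. If you set provider: "main", make sure your main endpoint supports vision input; otherwise image analysis will fail.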

Environment Variables (legacy)

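Auxiliary models can also be set through environment variables, but config.yaml is preferred: it is easier to manage and supports the full set of options, including base_url and api_key.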

Setting	Environment Variable
Vision provider	`AUXILIARY_VISION_PROVIDER`
Vision model	`AUXILIARY_VISION_MODEL`
Vision endpoint	`AUXILIARY_VISION_BASE_URL`
Vision API key	`AUXILIARY_VISION_API_KEY`
Web extract provider	`AUXILIARY_WEB_EXTRACT_PROVIDER`
Web extract model	`AUXILIARY_WEB_EXTRACT_MODEL`
Web extract endpoint	`AUXILIARY_WEB_EXTRACT_BASE_URL`
Web extract API key	`AUXILIARY_WEB_EXTRACT_API_KEY`

Compression and fallback model settings are config.yaml only.

Tip

Run hermes config to see your current auxiliary model configuration. Only overrides that differ from the defaults are shown.

Reasoning Effort

Controls how much "thinking" the model does before it responds:

agent:
  reasoning_effort: ""   # empty = medium (default). Options: none, minimal, low, medium, high, xhigh (max)

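When unset, the default is medium, a balanced level that suits most tasks. An explicit value overrides the default. Higher levels usually give better results on complex tasks, at the cost of more tokens and higher latency.

You can also change it at runtime with /reasoning: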

/reasoning           # Show current effort level and display state
/reasoning high      # Set reasoning effort to high
/reasoning none      # Disable reasoning
/reasoning show      # Show model thinking above each response
/reasoning hide      # Hide model thinking

Tool-Use Enforcement

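Some models occasionally describe what they intend to do in prose instead of actually calling a tool, for example saying "I would run the tests..." without invoking the terminal. Tool-use enforcement injects extra guidance into the system prompt to steer the model back to making real tool calls.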

agent:
  tool_use_enforcement: "auto"   # "auto" | true | false | ["model-substring", ...]

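Value	Behavior
`"auto"` (default)	Enabled for models whose name contains `gpt`, `codex`, `gemini`, `gemma`, or `grok`; disabled for all others (Claude, DeepSeek, Qwen, and so on).
`true`	Always enabled, regardless of model. Useful for models that keep describing actions instead of performing them.
`false`	Always disabled, regardless of model.
`["gpt", "codex", "qwen", "llama"]`	Enabled only when the model name contains one of the listed substrings (case-insensitive).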
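The four value shapes in the table can be read as a single matching rule. The sketch below is an interpretation of the documented behavior, not Hermes's source; `enforcement_enabled` is a hypothetical name, and it assumes "auto" matching is case-insensitive like the list form.

```python
from typing import Sequence, Union

# Substrings that "auto" enables enforcement for, per the table above.
AUTO_SUBSTRINGS = ("gpt", "codex", "gemini", "gemma", "grok")


def enforcement_enabled(setting: Union[str, bool, Sequence[str]], model_name: str) -> bool:
    """Decide whether tool-use guidance is injected for this model."""
    if setting is True or setting is False:
        return setting
    patterns = AUTO_SUBSTRINGS if setting == "auto" else setting
    name = model_name.lower()
    return any(pattern.lower() in name for pattern in patterns)


auto_gpt = enforcement_enabled("auto", "openai/GPT-4o")
auto_claude = enforcement_enabled("auto", "anthropic/claude-opus-4")
custom_list = enforcement_enabled(["qwen", "llama"], "Qwen2.5-72B-Instruct")
```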

What it injects

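When enabled, up to three layers of guidance are added to the system prompt:

General tool-use enforcement (all matched models) - call tools immediately instead of describing intent; keep working until the task is done; do not end a turn with a promise such as "I will do X next."
OpenAI execution discipline (GPT and Codex only) - extra guidance for failure modes common in the GPT family, such as stopping at partial results, skipping prerequisite lookups, guessing instead of calling tools, and declaring the task done without verification.
Google operational guidance (Gemini and Gemma only) - emphasizes concise output, absolute paths, parallel tool calls, and verifying before editing.

The user never sees this guidance; it affects only the system prompt. Models that already call tools reliably, such as Claude, generally do not need it, which is why "auto" excludes them.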

When to turn it on

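If you use a model outside the default auto list and notice that it often describes what it would do instead of doing it, set tool_use_enforcement to true or add a substring of the model name to the list: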

agent:
  tool_use_enforcement: ["gpt", "codex", "gemini", "grok", "my-custom-model"]

TTS Configuration

tts:
  provider: "edge"              # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts"
  speed: 1.0                    # Global speed multiplier (fallback for all providers)
  edge:
    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
    speed: 1.0                  # Speed multiplier (converted to rate percentage, e.g. 1.5 → +50%)
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"              # alloy, echo, fable, onyx, nova, shimmer
    speed: 1.0                  # Speed multiplier (clamped to 0.25–4.0 by the API)
    base_url: "https://api.openai.com/v1"  # Override for OpenAI-compatible TTS endpoints
  minimax:
    speed: 1.0                  # Speech speed multiplier
    # base_url: ""              # Optional: override for OpenAI-compatible TTS endpoints
  mistral:
    model: "voxtral-mini-tts-2603"
    voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # Paul - Neutral (default)
  gemini:
    model: "gemini-2.5-flash-preview-tts"   # or gemini-2.5-pro-preview-tts
    voice: "Kore"               # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, etc.
  xai:
    voice_id: "eve"             # xAI TTS voice
    language: "en"              # ISO 639-1
    sample_rate: 24000
    bit_rate: 128000            # MP3 bitrate
    # base_url: "https://api.x.ai/v1"
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu

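This affects both the text_to_speech tool and spoken replies in voice mode (/voice tts in the CLI or the messaging gateway).

Speed fallback order: provider-specific speed (for example, tts.edge.speed) → global tts.speed → default 1.0. Set the global tts.speed to give all providers the same speed, or override per provider for finer control.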
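A minimal sketch of the speed fallback, with the Edge rate conversion mentioned in the config comments (1.5 → +50%). Both function names are hypothetical, and the exact rounding in `edge_rate` is an assumption.

```python
def resolve_tts_speed(tts_config: dict, provider: str) -> float:
    """Provider-specific speed, then global tts.speed, then 1.0."""
    provider_cfg = tts_config.get(provider) or {}
    if "speed" in provider_cfg:
        return float(provider_cfg["speed"])
    if "speed" in tts_config:
        return float(tts_config["speed"])
    return 1.0


def edge_rate(speed: float) -> str:
    """Convert a speed multiplier to a signed percentage string."""
    return f"{round((speed - 1.0) * 100):+d}%"


tts = {"speed": 1.2, "edge": {"speed": 1.5}, "openai": {"voice": "alloy"}}
edge_speed = resolve_tts_speed(tts, "edge")
openai_speed = resolve_tts_speed(tts, "openai")
rate = edge_rate(edge_speed)
```

In this example the Edge provider uses its own 1.5, while OpenAI has no speed of its own and inherits the global 1.2.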

Display Settings

display:
  tool_progress: all      # off | new | all | verbose
  tool_progress_command: false  # Enable /verbose slash command in messaging gateway
  tool_progress_overrides: {}  # Per-platform overrides (see below)
  interim_assistant_messages: true  # Gateway: send natural mid-turn assistant updates as separate messages
  skin: default           # Built-in or custom CLI skin (see user-guide/features/skins)
  personality: "kawaii"  # Legacy cosmetic field still surfaced in some summaries
  compact: false          # Compact output mode (less whitespace)
  resume_display: full    # full (show previous messages on resume) | minimal (one-liner only)
  bell_on_complete: false # Play terminal bell when agent finishes (great for long tasks)
  show_reasoning: false   # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
  streaming: false        # Stream tokens to terminal as they arrive (real-time output)
  show_cost: false        # Show estimated $ cost in the CLI status bar
  tool_preview_length: 0  # Max chars for tool call previews (0 = no limit, show full paths/commands)

Mode	What you see
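`off`	Silent; only the final reply is shown
`new`	An indicator appears only when the tool changes
`all`	Every tool call is shown with a short preview (default)
`verbose`	Full arguments, results, and debug logs

In the CLI, cycle through these modes with /verbose. To use /verbose on messaging platforms (Telegram, Discord, Slack, and so on) as well, set display.tool_progress_command to true. Once enabled, the command cycles the mode and saves it to the config.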

Per-platform progress overrides

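Platforms differ in how much verbosity they tolerate. Signal, for example, cannot edit messages, so every progress update becomes a new message and quickly gets noisy. Use tool_progress_overrides to set the display mode for individual platforms: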

display:
  tool_progress: all          # global default
  tool_progress_overrides:
    signal: 'off'             # silence progress on Signal
    telegram: verbose         # detailed progress on Telegram
    slack: 'off'              # quiet in shared Slack workspace

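Platforms without an override fall back to the global tool_progress. Valid platform keys: telegram, discord, slack, signal, whatsapp, matrix, mattermost, email, sms, homeassistant, dingtalk, feishu, wecom, weixin, bluebubbles, qqbot.

interim_assistant_messages applies to the gateway only. When enabled, Hermes sends completed mid-turn natural-language progress updates as separate messages. It is independent of tool_progress and does not require streaming.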

Privacy

privacy:
  redact_pii: false  # Strip PII from LLM context (gateway only)

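When redact_pii is true, the gateway redacts personally identifiable information from the system prompt on supported platforms before sending it to the LLM:

Field	Treatment
Phone numbers (user ID on WhatsApp/Signal)	Hashed to `user_<12-char-sha256>`
User IDs	Hashed to `user_<12-char-sha256>`
Chat IDs	Numeric part hashed, platform prefix kept (for example, `telegram:<hash>`)
Home channel IDs	Numeric part hashed
User names / usernames	Left as-is (user-chosen, publicly visible identifiers)

Platform support: Currently WhatsApp, Signal, and Telegram. Discord and Slack are not supported, because their mention syntax (<@user_id>) requires the real ID in the LLM context.

The hashes are deterministic: the same user always maps to the same hash, so the model can still tell users apart in a group chat. Routing and message delivery still use the original values.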
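The hashing scheme in the table can be approximated as below. This is a sketch under assumptions: it takes "12-char-sha256" to mean the first 12 hex characters of an unsalted SHA-256 digest, and the helper names are invented for the example. The real implementation may differ in what exactly is hashed.

```python
import hashlib


def _short_hash(value: str) -> str:
    """First 12 hex characters of the SHA-256 digest (assumed truncation)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def hash_user_id(raw_id: str) -> str:
    """Map a phone number or user ID to the user_<hash> form."""
    return f"user_{_short_hash(raw_id)}"


def hash_chat_id(chat_id: str) -> str:
    """Hash the part after the platform prefix, keeping the prefix readable."""
    prefix, sep, rest = chat_id.partition(":")
    if not sep:
        return _short_hash(chat_id)
    return f"{prefix}:{_short_hash(rest)}"


alice_first = hash_user_id("+15551234567")
alice_again = hash_user_id("+15551234567")
bob = hash_user_id("+15557654321")
chat = hash_chat_id("telegram:-1001234567890")
```

The same input always yields the same pseudonym, so `alice_first` and `alice_again` match while `bob` differs, which is the property that lets the model tell participants apart.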

Speech-to-Text (STT)

stt:
  provider: "local"            # "local" | "groq" | "openai" | "mistral"
  local:
    model: "base"              # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1"         # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
  # model: "whisper-1"         # Legacy fallback key still respected

Provider behavior:

local uses faster-whisper running on your machine; install it separately with pip install faster-whisper
groq uses Groq's Whisper-compatible endpoint and reads GROQ_API_KEY
openai uses the OpenAI speech API and reads VOICE_TOOLS_OPENAI_KEY

If the requested provider is unavailable, Hermes falls back automatically in the order local → groq → openai.

Model overrides for Groq and OpenAI use environment variables:

STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1

Voice Mode (CLI)

voice:
  record_key: "ctrl+b"         # Push-to-talk key inside the CLI
  max_recording_seconds: 120    # Hard stop for long recordings
  auto_tts: false               # Enable spoken replies automatically when /voice on
  beep_enabled: true            # Play record start/stop beeps in CLI voice mode
  silence_threshold: 200        # RMS threshold for speech detection
  silence_duration: 3.0         # Seconds of silence before auto-stop

In the CLI, use /voice on to enable microphone mode, record_key to start / stop recording, and /voice tts to toggle spoken replies. See Voice Mode for the full guide.

Streaming

Streams tokens to the terminal or messaging platform as they arrive, instead of waiting for the whole reply to be generated.

CLI Streaming

display:
  streaming: true         # Stream tokens to terminal in real-time
  show_reasoning: true    # Also stream reasoning/thinking tokens (optional)

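When enabled, replies appear token by token inside a streaming box. Tool calls are still handled silently in the background. If the current provider does not support streaming, Hermes falls back to the normal display automatically.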

Gateway Streaming (Telegram, Discord, Slack)

streaming:
  enabled: true           # Enable progressive message editing
  transport: edit         # "edit" (progressive message editing) or "off"
  edit_interval: 0.3      # Seconds between message edits
  buffer_threshold: 40    # Characters before forcing an edit flush
  cursor: " ▉"            # Cursor shown during streaming

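When enabled, the bot sends a message as soon as the first token arrives, then edits that message as more tokens come in. On platforms that do not support message editing (Signal, Email, Home Assistant), Hermes detects this on the first attempt and gracefully disables streaming for that session instead of flooding the chat.

If you want natural-language interim progress updates rather than token-by-token edits, set display.interim_assistant_messages: true.

Overflow handling: If the streamed content exceeds the platform's per-message length limit (roughly 4096 characters), the current message is finalized and a new one starts automatically.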
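
A minimal sketch of the overflow rule, assuming a hard cut at exactly 4096 characters. The function name is hypothetical, and the real gateway may choose split points differently (for example at word or line boundaries).

```python
def split_stream(text: str, limit: int = 4096) -> list[str]:
    """Cut text into consecutive messages of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [text[start:start + limit] for start in range(0, len(text), limit)]


parts = split_stream("x" * 5000)
print([len(p) for p in parts])  # [4096, 904]
```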

Note

Streaming is off by default. Enable it in ~/.hermes/config.yaml to get real-time output.

Group Chat Session Isolation

Controls whether a shared chat keeps one session per room or one session per participant:

group_sessions_per_user: true  # true = per-user isolation in groups/channels, false = one shared session per chat

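true is the default and recommended setting. In shared contexts such as Discord channels, Telegram groups, and Slack channels, each sender gets their own session whenever the platform provides a user ID.
false restores the legacy shared-room behavior. This is useful if you want Hermes to treat the whole channel as one collaborative conversation, but it also means users share context, token costs, and interrupt state.
Direct messages are unaffected. Hermes still keys DM sessions by chat / DM ID, as before.
Threads are always isolated from their parent channel regardless of this setting; when it is true, sessions inside a thread are further isolated per participant.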
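
The isolation rules above can be summarized as a session-key derivation. This is a sketch under assumptions: the key format and the `session_key` helper are hypothetical and only mirror the documented behavior.

```python
from typing import Optional


def session_key(
    platform: str,
    chat_id: str,
    user_id: Optional[str] = None,
    thread_id: Optional[str] = None,
    *,
    is_dm: bool = False,
    per_user: bool = True,
) -> str:
    """Build a hypothetical session key that follows the documented rules."""
    if is_dm:
        # DMs are always keyed by chat / DM ID, whatever the setting.
        return f"{platform}:{chat_id}"
    parts = [platform, chat_id]
    if thread_id is not None:
        # Threads never share a session with their parent channel.
        parts.append(f"thread:{thread_id}")
    if per_user and user_id is not None:
        # Per-user isolation needs a user ID from the platform.
        parts.append(f"user:{user_id}")
    return ":".join(parts)


print(session_key("discord", "chan1", "alice"))
print(session_key("discord", "chan1", "alice", per_user=False))
print(session_key("discord", "chan1", "alice", thread_id="t9"))
```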

See Sessions and the Discord guide for more behavior details and examples.

Unauthorized DM Behavior

Controls what Hermes does when an unknown user sends the bot a direct message:

unauthorized_dm_behavior: pair

whatsapp:
  unauthorized_dm_behavior: ignore

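pair is the default. Hermes denies access but replies in the DM with a one-time pairing code.
ignore silently drops unauthorized DMs.
Platform-level settings override the global default, so you can enable pairing on most platforms while keeping a specific platform quiet.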
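
The override order (platform setting, then global setting, then the built-in default) can be sketched like this. The config is represented as a plain dict shaped like the YAML above; the helper name is hypothetical.

```python
def resolve_dm_behavior(config: dict, platform: str) -> str:
    """Platform-level value wins, then the global value, then 'pair'."""
    platform_cfg = config.get(platform) or {}
    return (
        platform_cfg.get("unauthorized_dm_behavior")
        or config.get("unauthorized_dm_behavior")
        or "pair"
    )


config = {
    "unauthorized_dm_behavior": "pair",
    "whatsapp": {"unauthorized_dm_behavior": "ignore"},
}
print(resolve_dm_behavior(config, "whatsapp"))  # ignore
print(resolve_dm_behavior(config, "telegram"))  # pair
```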

Quick Commands

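Define custom commands that run shell commands directly without invoking the LLM. They consume zero tokens and respond almost instantly, which makes them well suited to quick status checks or small scripts on messaging platforms (Telegram, Discord, etc.).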

quick_commands:
  status:
    type: exec
    command: systemctl status hermes-agent
  disk:
    type: exec
    command: df -h /
  update:
    type: exec
    command: cd ~/.hermes/hermes-agent && git pull && pip install -e .
  gpu:
    type: exec
    command: nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv,noheader

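Usage: type /status, /disk, /update, or /gpu in the CLI or on any messaging platform. The command runs locally on the host and returns its output directly, without going through the LLM or consuming tokens.

30-second timeout - commands that run longer are killed and return an error
Higher priority - quick commands are checked before skill commands, so they can override a skill command with the same name
Not in autocomplete - quick commands are resolved at dispatch time and do not appear in the built-in slash-command completion table
Only the exec type is supported - it runs a shell command; any other type produces an error
Works on all platforms - CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant, and others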
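
A minimal sketch of how an exec-type quick command with a timeout could be run, using only the standard library. The function name, the error string, and the use of `shell=True` are assumptions; Hermes's actual dispatcher is not shown in this document.

```python
import subprocess


def run_quick_command(spec: dict, timeout: float = 30.0) -> str:
    """Run one quick-command entry (hypothetical helper) and return its output."""
    if spec.get("type") != "exec":
        # Only the exec type is documented as supported.
        raise ValueError(f"unsupported quick command type: {spec.get('type')!r}")
    try:
        result = subprocess.run(
            spec["command"],
            shell=True,           # commands are shell strings such as "df -h /"
            capture_output=True,
            text=True,
            timeout=timeout,      # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return f"error: command timed out after {timeout:g}s"
    return result.stdout + result.stderr


print(run_quick_command({"type": "exec", "command": "echo hello"}))
```

Because the command string goes straight to a shell, quick commands should only ever come from the operator's own config.yaml, never from chat input.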

Human Delay

Simulates a more human-like reply pace on messaging platforms:

human_delay:
  mode: "off"                  # off | natural | custom
  min_ms: 800                  # Minimum delay (custom mode)
  max_ms: 2500                 # Maximum delay (custom mode)

Code Execution

Configures the execute_code tool:

code_execution:
  mode: project                # project (default) | strict
  timeout: 300                 # Max execution time in seconds
  max_tool_calls: 50           # Max tool calls within code execution

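mode controls the working directory and Python interpreter used to run scripts:

project (default) - scripts run in the session's working directory using the Python of the currently active virtualenv / conda environment. Project dependencies (pandas, torch, the project's own packages) and relative paths (.env, ./data.csv) resolve the same way they do for terminal().
strict - scripts run in a temporary staging directory using sys.executable (Hermes's own Python). This is more reproducible, but project dependencies and relative paths are not resolved automatically.

Environment scrubbing (which strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD, *_CREDENTIAL, *_PASSWD, *_AUTH) and the tool whitelist are identical in both modes; switching modes does not weaken the security boundary.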
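
The scrubbing patterns above can be illustrated with a small filter. The pattern list is taken from the text; the matching approach (case-sensitive glob matching on variable names) and the function name are assumptions.

```python
from fnmatch import fnmatchcase

# Patterns listed in the documentation.
SENSITIVE_PATTERNS = [
    "*_API_KEY", "*_TOKEN", "*_SECRET", "*_PASSWORD",
    "*_CREDENTIAL", "*_PASSWD", "*_AUTH",
]


def scrub_env(env: dict) -> dict:
    """Return a copy of env without variables whose names match a pattern."""
    return {
        name: value
        for name, value in env.items()
        if not any(fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS)
    }


env = {
    "PATH": "/usr/bin",
    "HOME": "/home/me",
    "OPENROUTER_API_KEY": "sk-or-example",
    "GITHUB_TOKEN": "ghp-example",
}
print(sorted(scrub_env(env)))  # ['HOME', 'PATH']
```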

Web Search Backends

web_search, web_extract, and web_crawl support four backend providers. Configure the backend in config.yaml or through hermes tools:

web:
  backend: firecrawl    # firecrawl | parallel | tavily | exa

Backend	Env Var	Search	Extract	Crawl
Firecrawl (default)	`FIRECRAWL_API_KEY`	✔	✔	✔
Parallel	`PARALLEL_API_KEY`	✔	✔	—
Tavily	`TAVILY_API_KEY`	✔	✔	✔
Exa	`EXA_API_KEY`	✔	✔	—

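Backend selection: If web.backend is not set, Hermes auto-detects the backend from the available API keys. If only EXA_API_KEY is set, it uses Exa; if only TAVILY_API_KEY is set, Tavily; if only PARALLEL_API_KEY is set, Parallel; in every other case it defaults to Firecrawl.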
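
The selection rule can be sketched as below. It reads "only X is set" as "X is the sole web API key present among the four", which is an interpretation rather than a confirmed detail; the function name is hypothetical.

```python
KEY_TO_BACKEND = {
    "FIRECRAWL_API_KEY": "firecrawl",
    "PARALLEL_API_KEY": "parallel",
    "TAVILY_API_KEY": "tavily",
    "EXA_API_KEY": "exa",
}


def pick_web_backend(config: dict, env: dict) -> str:
    """Explicit web.backend wins; otherwise auto-detect from API keys."""
    explicit = (config.get("web") or {}).get("backend")
    if explicit:
        return explicit
    present = [backend for key, backend in KEY_TO_BACKEND.items() if env.get(key)]
    if len(present) == 1:
        return present[0]   # exactly one key set: use that backend
    return "firecrawl"      # no keys, or several keys: default


print(pick_web_backend({}, {"TAVILY_API_KEY": "tvly-example"}))                    # tavily
print(pick_web_backend({}, {"EXA_API_KEY": "a", "TAVILY_API_KEY": "b"}))           # firecrawl
print(pick_web_backend({"web": {"backend": "exa"}}, {"TAVILY_API_KEY": "b"}))      # exa
```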

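Self-hosted Firecrawl: Point FIRECRAWL_API_URL at your instance. With a custom URL set, the API key becomes optional (the server can disable authentication with USE_DB_AUTHENTICATION=false).

Parallel search mode: Use PARALLEL_SEARCH_MODE to control behavior; the options are fast, one-shot, and agentic (default).

Exa: Set EXA_API_KEY in ~/.hermes/.env. Supports category filters (company, research paper, news, people, personal site, pdf) as well as domain and date filters.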

Browser

Configures browser automation behavior:

browser:
  inactivity_timeout: 120        # Seconds before auto-closing idle sessions
  command_timeout: 30             # Timeout in seconds for browser commands (screenshot, navigate, etc.)
  record_sessions: false         # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
  camofox:
    managed_persistence: false   # When true, Camofox sessions persist cookies/logins across restarts

The browser toolset supports multiple providers. See the Browser feature page for details.

Timezone

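Overrides the server's local timezone with an IANA timezone string. This affects log timestamps, cron scheduling, and the time injected into the system prompt.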

timezone: "America/New_York"   # IANA timezone (default: "" = server-local time)

Any IANA timezone identifier is supported, for example America/New_York, Europe/London, Asia/Kolkata, or UTC. Leave it empty or omit it to use server-local time.

Discord

Configures Discord behavior in the messaging gateway:

discord:
  require_mention: true          # Require @mention to respond in server channels
  free_response_channels: ""     # Comma-separated channel IDs where bot responds without @mention
  auto_thread: true              # Auto-create threads on @mention in channels

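require_mention - when true (default), the bot responds in server channels only when mentioned with @BotName. DMs never require a mention.
free_response_channels - a comma-separated list of channel IDs where the bot replies to every message without needing a mention.
auto_thread - when true (default), a mention in a channel automatically creates a thread, keeping conversations tidy, similar to Slack threads.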
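
How require_mention and free_response_channels combine can be sketched as a single predicate. The function name and argument shapes are hypothetical; the logic follows the three descriptions above.

```python
def should_respond(cfg: dict, channel_id: str, *, is_dm: bool, mentioned: bool) -> bool:
    """Decide whether the bot answers a Discord message (illustrative only)."""
    if is_dm:
        return True  # DMs never require a mention
    raw = str(cfg.get("free_response_channels", ""))
    free_channels = {item.strip() for item in raw.split(",") if item.strip()}
    if channel_id in free_channels:
        return True  # free-response channel: reply to everything
    if not cfg.get("require_mention", True):
        return True  # mentions not required anywhere
    return mentioned


cfg = {"require_mention": True, "free_response_channels": "111, 222"}
print(should_respond(cfg, "111", is_dm=False, mentioned=False))  # True
print(should_respond(cfg, "333", is_dm=False, mentioned=False))  # False
print(should_respond(cfg, "333", is_dm=False, mentioned=True))   # True
```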

Security

Configures pre-execution security scanning and secret redaction:

security:
  redact_secrets: true           # Redact API key patterns in tool output and logs
  tirith_enabled: true           # Enable Tirith security scanning for terminal commands
  tirith_path: "tirith"          # Path to tirith binary (default: "tirith" in $PATH)
  tirith_timeout: 5              # Seconds to wait for tirith scan before timing out
  tirith_fail_open: true         # Allow command execution if tirith is unavailable
  website_blocklist:             # See Website Blocklist section below
    enabled: false
    domains: []
    shared_files: []

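redact_secrets - detects and masks content that looks like API keys, tokens, or passwords before tool output enters the context or logs
tirith_enabled - when true, terminal commands are passed to Tirith for scanning before execution to flag potentially dangerous operations
tirith_path - path to the tirith binary; set this if you installed it in a non-standard location
tirith_timeout - maximum time to wait for a tirith scan; on timeout the command proceeds
tirith_fail_open - when true (default), an unavailable or failing tirith does not block command execution; set it to false to block commands whenever tirith cannot verify them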

Website Blocklist

Blocks the agent's web and browser tools from accessing specific domains:

security:
  website_blocklist:
    enabled: false               # Enable URL blocking (default: false)
    domains:                     # List of blocked domain patterns
      - "*.internal.company.com"
      - "admin.example.com"
      - "*.local"
    shared_files:                # Load additional rules from external files
      - "/etc/hermes/blocked-sites.txt"

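When enabled, any URL that matches a domain rule is rejected before the web or browser tool runs. This applies to web_search, web_extract, browser_navigate, and every other tool that accesses URLs.

Domain rules support:

Exact domains: admin.example.com
Wildcard subdomains: *.internal.company.com
TLD wildcards: *.local

Shared files contain one rule per line; blank lines and comments starting with # are ignored. A missing or unreadable file only logs a warning and does not disable other web tools.

The policy is cached for 30 seconds, so changes take effect quickly without a restart.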
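
A minimal sketch of rule parsing and matching, assuming glob-style matching against the URL's hostname. The helper names are hypothetical, and whether Hermes also blocks the bare apex domain (internal.company.com) for a *.internal.company.com rule is not stated; this sketch does not.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def parse_rules(text: str) -> list[str]:
    """Read one rule per line, skipping blank lines and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(line.lower())
    return rules


def is_blocked(url: str, rules: list[str]) -> bool:
    """True when the URL's hostname matches any rule."""
    host = (urlsplit(url).hostname or "").lower()
    return any(fnmatchcase(host, rule) for rule in rules)


rules = parse_rules("""
# shared blocklist
*.internal.company.com
admin.example.com

*.local
""")
print(is_blocked("https://wiki.internal.company.com/page", rules))  # True
print(is_blocked("http://printer.local:8080/", rules))              # True
print(is_blocked("https://example.com/", rules))                    # False
```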

Smart Approvals

Controls how Hermes handles potentially dangerous commands:

approvals:
  mode: manual   # manual | smart | off

Mode	Behavior
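`manual` (default)	Prompts the user for confirmation on every flagged command. The CLI shows an interactive approval dialog; messaging platforms hold the command until the user replies.
`smart`	Uses an auxiliary LLM to judge whether the command is actually dangerous. Low-risk commands are auto-approved and remembered for the session; genuinely high-risk commands are escalated to the user.
`off`	Skips all approval checks, equivalent to `HERMES_YOLO_MODE=true`. Use with caution.

smart mode helps reduce approval fatigue: the agent can proceed more autonomously on safe operations while genuinely dangerous commands are still caught.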
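
The three modes can be compared with a small decision function. This is a sketch under assumptions: `judge_is_risky` stands in for the auxiliary LLM call, the session cache is a plain set keyed by the command string, and the return values are made-up labels.

```python
from typing import Callable


def decide(
    mode: str,
    command: str,
    is_flagged: bool,
    judge_is_risky: Callable[[str], bool],
    session_approved: set,
) -> str:
    """Return 'run' or 'ask_user' for a command (illustrative only)."""
    if mode == "off" or not is_flagged:
        return "run"  # off skips checks; unflagged commands need no approval
    if mode == "smart":
        if command in session_approved:
            return "run"  # approved earlier in this session
        if not judge_is_risky(command):
            session_approved.add(command)  # remember the auto-approval
            return "run"
    return "ask_user"  # manual mode, or smart mode judged the command risky


approved = set()
print(decide("manual", "rm -rf build", True, lambda c: False, approved))  # ask_user
print(decide("smart", "rm -rf build", True, lambda c: False, approved))   # run
print(decide("smart", "rm -rf /", True, lambda c: True, approved))        # ask_user
```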

Warning

Setting approvals.mode to off disables all terminal command safety checks. Use it only in a trusted sandbox environment.

Checkpoints

Automatically creates filesystem snapshots before destructive file operations. See Checkpoints & Rollback for details.

checkpoints:
  enabled: true                  # Enable automatic checkpoints (also: hermes --checkpoints)
  max_snapshots: 50              # Max checkpoints to keep per directory

Delegation

Configures subagent behavior for the delegate tool:

delegation:
  # model: "google/gemini-3-flash-preview"  # Override model (empty = inherit parent)
  # provider: "openrouter"                  # Override provider (empty = inherit parent)
  # base_url: "http://localhost:1234/v1"    # Direct OpenAI-compatible endpoint (takes precedence over provider)
  # api_key: "local-key"                    # API key for base_url (falls back to OPENAI_API_KEY)
  max_concurrent_children: 3                # Parallel children per batch (floor 1, no ceiling). Also via DELEGATION_MAX_CONCURRENT_CHILDREN env var.
  max_spawn_depth: 1                        # Delegation tree depth cap (1-3, clamped). 1 = flat (default): parent spawns leaves that cannot delegate. 2 = orchestrator children can spawn leaf grandchildren. 3 = three levels.
  orchestrator_enabled: true                # Global kill switch. When false, role="orchestrator" is ignored and every child is forced to leaf regardless of max_spawn_depth.

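Subagent provider:model override: By default, subagents inherit the parent agent's provider and model. Use delegation.provider and delegation.model to point them at a different provider:model pair, for example so subagents use a cheaper, faster model for narrowly scoped tasks while the main agent uses a more expensive reasoning model.

Direct endpoint override: To route explicitly through a custom endpoint, set delegation.base_url, delegation.api_key, and delegation.model. Subagents then call that OpenAI-compatible endpoint directly, and it takes precedence over delegation.provider. If delegation.api_key is not provided, the only fallback is OPENAI_API_KEY.

Provider resolution for delegation works the same way as at CLI / gateway startup. All configured providers are supported, for example openrouter, nous, copilot, zai, kimi-coding, minimax, and minimax-cn. Once provider is set, the correct base URL, API key, and API mode are resolved automatically.

Precedence: delegation.base_url → delegation.provider → parent provider. delegation.model → parent model. Setting only model without provider changes just the model name while credentials are still inherited from the parent agent, which is useful for switching models within the same provider.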
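
The precedence chain can be sketched as a resolver over plain dicts. The function name, the returned dict shape, and the `route` labels are hypothetical; only the ordering comes from the text above.

```python
def resolve_child(delegation: dict, parent: dict, env: dict) -> dict:
    """Pick the endpoint/provider and model for a subagent (illustrative only)."""
    # delegation.model → parent model
    model = delegation.get("model") or parent["model"]
    # delegation.base_url beats delegation.provider
    if delegation.get("base_url"):
        return {
            "route": "endpoint",
            "base_url": delegation["base_url"],
            # the only documented fallback for the key is OPENAI_API_KEY
            "api_key": delegation.get("api_key") or env.get("OPENAI_API_KEY"),
            "model": model,
        }
    # delegation.provider → parent provider
    provider = delegation.get("provider") or parent["provider"]
    return {"route": "provider", "provider": provider, "model": model}


parent = {"provider": "anthropic", "model": "anthropic/claude-opus-4"}
print(resolve_child({"model": "anthropic/claude-sonnet-4"}, parent, {}))
print(resolve_child({"base_url": "http://localhost:1234/v1", "provider": "openrouter"}, parent, {}))
```

In the first call only the model name changes and the parent's provider is kept; in the second, base_url wins even though a provider is also set.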

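Width and depth: max_concurrent_children controls how many subagents can be launched in parallel per batch (default 3, minimum 1, no upper limit); it can also be set through DELEGATION_MAX_CONCURRENT_CHILDREN. If the model submits a tasks array that exceeds the limit, delegate_task returns a tool error instead of silently truncating. max_spawn_depth controls the tree depth (clamped to 1 through 3). At the default of 1, delegation is flat: children cannot spawn grandchildren, and role="orchestrator" degrades to leaf. At 2, orchestrator children can spawn leaf grandchildren; at 3, three-level trees are allowed. With orchestrator_enabled: false, every child is a leaf regardless of the other settings. Cost grows multiplicatively with depth and parallelism: with max_spawn_depth: 3 and max_concurrent_children: 3, up to 27 (3 × 3 × 3) leaf agents can run concurrently. See Subagent Delegation → Depth Limit and Nested Orchestration.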
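
The worst-case leaf count follows from width raised to the power of depth. A small sketch, with hypothetical helper names, that applies the documented clamps (depth 1 to 3, width at least 1):

```python
def clamp_depth(depth: int) -> int:
    """max_spawn_depth is clamped to the range 1..3."""
    return max(1, min(3, depth))


def max_concurrent_leaves(width: int, depth: int) -> int:
    """Upper bound on concurrent leaf agents when every level is fully fanned out."""
    return max(1, width) ** clamp_depth(depth)


for depth in (1, 2, 3):
    print(depth, max_concurrent_leaves(3, depth))  # 1 3, then 2 9, then 3 27
```

This counts leaves only; orchestrator nodes above them run as well, so the total number of live agents is higher.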

Clarify

Configures how long Hermes waits on clarification prompts:

clarify:
  timeout: 120                 # Seconds to wait for user clarification response

Context Files (SOUL.md, AGENTS.md)

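Hermes uses two kinds of context files with different scopes:

File	Purpose	Scope
`SOUL.md`	Primary agent identity; defines who the agent is (slot #1 in the system prompt)	`~/.hermes/SOUL.md` or `$HERMES_HOME/SOUL.md`
`.hermes.md` / `HERMES.md`	Project-specific instructions (highest priority)	Walks up to the git root
`AGENTS.md`	Project-specific instructions, coding conventions	Recursive directory scan
`CLAUDE.md`	Claude Code context file (also recognized)	Working directory only
`.cursorrules`	Cursor IDE rules (also recognized)	Working directory only
`.cursor/rules/*.mdc`	Cursor rule files (also recognized)	Working directory only

SOUL.md is the agent's core identity. It occupies slot #1 in the system prompt and completely replaces the built-in default identity
If SOUL.md is missing, empty, or unreadable, Hermes falls back to the built-in default identity
Project context files are loaded by priority, and only one type is loaded at a time: .hermes.md → AGENTS.md → CLAUDE.md → .cursorrules. SOUL.md is always loaded independently
AGENTS.md supports hierarchical composition: AGENTS.md files found in subdirectories are loaded as well
If SOUL.md does not exist, Hermes creates a default one automatically
Every context file is capped at 20,000 characters, with smart truncation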
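
The "first match wins" priority and the size cap can be sketched as follows. This simplification looks in a single directory only (it does not walk up to the git root or scan subdirectories), and it truncates with a plain head cut because the details of the smart truncation are not described here. Function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Priority order from the text; HERMES.md is the alternate spelling from the table.
PRIORITY = [".hermes.md", "HERMES.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"]
MAX_CHARS = 20_000


def pick_project_context(directory: Path) -> Optional[Path]:
    """Return the highest-priority context file present in `directory`."""
    for name in PRIORITY:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def load_capped(path: Path, limit: int = MAX_CHARS) -> str:
    """Read a context file and keep at most `limit` characters."""
    return path.read_text(encoding="utf-8")[:limit]


chosen = pick_project_context(Path.cwd())
print(chosen.name if chosen else "no project context file found")
```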


Working Directory

Context	Default
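CLI (`hermes`)	The current directory where you run the command
Messaging gateway	Home directory `~` (override with `MESSAGING_CWD`)
Docker / Singularity / Modal / SSH	The user's home directory inside the container or on the remote machine

Override the working directory like this: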

# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects    # Gateway sessions
TERMINAL_CWD=/workspace                # All terminal sessions
