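API Server

The API server exposes Hermes Agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format, such as Open WebUI, LobeChat, LibreChat, NextChat, or ChatBox, can connect to Hermes Agent and use it as a backend.

Your agent handles each request with its full toolset (terminal, file operations, web search, memory, and skills) and returns the final response. With streaming enabled, tool progress is also returned inline so the frontend can show what the agent is currently doing.

Quick Start

1. Enable the API server

Add the following to ~/.hermes/.env: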

API_SERVER_ENABLED=true
API_SERVER_KEY=change-me-local-dev
# Optional: set only if a browser needs to call Hermes directly
# API_SERVER_CORS_ORIGINS=http://localhost:3000

2. Start the gateway

hermes gateway

You should see:

[API Server] API server listening on http://127.0.0.1:8642

3. Connect a frontend

Point any OpenAI-compatible client at http://localhost:8642/v1:

# Test with curl
curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'

You can also connect Open WebUI, LobeChat, or another frontend directly. See the Open WebUI integration guide for full step-by-step instructions.
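
For scripts that cannot shell out to curl, the same test request can be sketched with Python's standard library. This is a minimal illustration rather than an official client: the helper names build_chat_request and ask are hypothetical, and the URL and key are simply the quick-start values from above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8642/v1"   # quick-start address from this page
API_KEY = "change-me-local-dev"         # must match API_SERVER_KEY


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    body = {
        "model": "hermes-agent",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the assistant's final text."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the gateway running, ask("Hello!") should return the agent's reply as a string. The generous timeout is a guess that allows for tool calls, which can take a while.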

Endpoints

POST /v1/chat/completions

Standard OpenAI Chat Completions format. The endpoint is stateless: the full conversation is passed in the messages array on every request.

Request:

{
  "model": "hermes-agent",
  "messages": [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "Write a fibonacci function"}
  ],
  "stream": false
}

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "hermes-agent",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Here's a fibonacci function..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
}
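
Because this endpoint is stateless, the client has to carry the history forward itself. The sketch below shows one way to do that, assuming only the request and response shapes shown above; the helper names record_reply and next_request are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def record_reply(history: List[Message], response: dict) -> List[Message]:
    """Return a new history with the assistant reply from `response` appended."""
    reply = response["choices"][0]["message"]
    return history + [{"role": reply["role"], "content": reply["content"]}]


def next_request(history: List[Message], user_text: str) -> dict:
    """Build the next request body: the whole history plus the new user turn."""
    return {
        "model": "hermes-agent",
        "messages": history + [{"role": "user", "content": user_text}],
        "stream": False,
    }
```

Each turn then becomes: send next_request(...), pass the JSON reply to record_reply(...), and keep the returned list for the following turn.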

Inline image input: a user message's content can be an array of text and image_url parts. Both remote http(s) URLs and data:image/... URLs are supported:

{
  "model": "hermes-agent",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png", "detail": "high"}}
      ]
    }
  ]
}

Uploaded files (file / input_file / file_id) and non-image data: URLs are rejected with 400 unsupported_content_type.
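
A local image has to be sent inline as a data:image/... URL, since file uploads are rejected. The following sketch builds such a message from raw bytes; image_part and image_question are hypothetical helper names, and the MIME check mirrors the image-only rule described above.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Encode raw image bytes as an image_url content part with a data: URL."""
    if not mime_type.startswith("image/"):
        # Non-image data: URLs are documented to fail with 400.
        raise ValueError(f"not an image MIME type: {mime_type}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_question(text: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a Chat Completions body with one text part and one image part."""
    return {
        "model": "hermes-agent",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    image_part(image_bytes, mime_type),
                ],
            }
        ],
    }
```

Base64 inflates the payload by roughly a third, so a remote http(s) URL is usually the lighter option when the image is already hosted somewhere.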

Streaming ("stream": true): returns a token-by-token response stream over SSE. For Chat Completions, the stream uses standard chat.completion.chunk events plus a Hermes-specific hermes.tool.progress event that makes tool starts visible. For Responses, the stream uses OpenAI Responses event types such as response.created, response.output_text.delta, response.output_item.added, response.output_item.done, and response.completed.

Tool progress in streams:

Chat Completions: Hermes emits event: hermes.tool.progress to show that a tool has started, without polluting the persisted assistant text.
Responses: Hermes emits spec-native function_call and function_call_output output items in the SSE stream, so clients can render structured tool UI in real time.
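
On the client side, a Chat Completions stream can be split into assistant text and tool-progress events with a small SSE reader. The sketch below rests on several assumptions: the stream ends with the usual OpenAI data: [DONE] sentinel, text arrives in choices[].delta.content, and the hermes.tool.progress payload (whose shape this page does not document) is kept as a raw string. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs; events without `event:` are named "message"."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # a blank line ends one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):           # SSE comment / keep-alive
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data:                                 # stream ended without a blank line
        yield event, "\n".join(data)


def collect_stream(lines: Iterable[str]) -> Tuple[str, List[str]]:
    """Return (assistant_text, raw_tool_progress_payloads)."""
    text_parts: List[str] = []
    tool_events: List[str] = []
    for event, data in iter_sse(lines):
        if data == "[DONE]":                 # assumed OpenAI-style end marker
            break
        if event == "hermes.tool.progress":
            tool_events.append(data)         # payload shape undocumented: keep raw
            continue
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(text_parts), tool_events
```

Keeping tool events separate from the text mirrors the point made above: progress is for display only and should not end up in the stored assistant message.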

POST /v1/responses

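OpenAI Responses API format. Supports server-side conversation state through previous_response_id. The server stores the full conversation history, including tool calls and their results, so the client does not need to maintain multi-turn context itself.

Request: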

{
  "model": "hermes-agent",
  "input": "What files are in my project?",
  "instructions": "You are a helpful coding assistant.",
  "store": true
}

Response:

{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "hermes-agent",
  "output": [
    {"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
    {"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
  ],
  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
}
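
The output array interleaves tool calls, tool results, and the final message, so a client usually wants to pair them up. A possible sketch, assuming only the item types shown in the example above (split_output is a hypothetical name):

```python
import json
from typing import Dict, List, Tuple


def split_output(response: dict) -> Tuple[List[dict], str]:
    """Pair each function_call with its output and join the final text."""
    by_call_id: Dict[str, dict] = {}
    tool_runs: List[dict] = []
    texts: List[str] = []
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "function_call":
            run = {
                "name": item["name"],
                # `arguments` is a JSON-encoded string in the example above.
                "arguments": json.loads(item["arguments"]),
                "output": None,
            }
            by_call_id[item["call_id"]] = run
            tool_runs.append(run)
        elif kind == "function_call_output":
            run = by_call_id.get(item["call_id"])
            if run is not None:
                run["output"] = item["output"]
        elif kind == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
    return tool_runs, "".join(texts)
```

Applied to the response above, this yields one terminal run with its ls output, plus the assistant's text.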

Inline image input: input[].content can contain input_text and input_image parts. Both remote URLs and data:image/... URLs are supported:

{
  "model": "hermes-agent",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Describe this screenshot."},
        {"type": "input_image", "image_url": "data:image/png;base64,iVBORw0K..."}
      ]
    }
  ]
}

Uploaded files (input_file / file_id) and non-image data: URLs are rejected with 400 unsupported_content_type.

Multi-turn conversations with previous_response_id

You can chain responses to keep full context across turns, including tool calls:

{
  "input": "Now show me the README",
  "previous_response_id": "resp_abc123"
}

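The server rebuilds the full conversation from the stored response chain, preserving all earlier tool calls and results. Chained requests also share one session, so they appear as a single session entry in the dashboard and in session history.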
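
On the client, chaining only requires remembering the most recent response ID. The class below is a minimal sketch of that bookkeeping; ResponseChain is a hypothetical name, and sending "store": true on every turn is an assumption carried over from the first request example.

```python
from typing import Optional


class ResponseChain:
    """Tracks the latest response ID so each turn links to the previous one."""

    def __init__(self, model: str = "hermes-agent") -> None:
        self.model = model
        self.last_id: Optional[str] = None

    def next_payload(self, text: str) -> dict:
        """Build the body for the next POST /v1/responses call."""
        payload = {"model": self.model, "input": text, "store": True}
        if self.last_id is not None:
            payload["previous_response_id"] = self.last_id
        return payload

    def record(self, response: dict) -> None:
        """Remember the ID of the response the server just returned."""
        self.last_id = response["id"]
```

The first payload carries no previous_response_id; every later one points at the response recorded just before it.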

Named conversations

Instead of tracking response IDs yourself, you can pass the conversation parameter:

{"input": "Hello", "conversation": "my-project"}
{"input": "What's in src/?", "conversation": "my-project"}
{"input": "Run the tests", "conversation": "my-project"}

The server automatically chains each request to the latest response in that conversation, much like /title in a gateway session.

GET /v1/responses/{id}

Retrieves a stored response by ID.

DELETE /v1/responses/{id}

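Deletes a stored response.

GET /v1/models

Lists available models. The advertised model name defaults to the profile name; the default profile uses hermes-agent. Most frontends rely on this endpoint for model discovery.

GET /health

Health check; returns {"status": "ok"}. GET /v1/health is also supported for OpenAI-style clients that expect the /v1/ prefix.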
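
A startup script might poll /health before sending real traffic. The sketch below assumes the default address from the quick start; is_ok and wait_until_healthy are hypothetical helper names, and the retry count and delay are arbitrary.

```python
import json
import time
import urllib.error
import urllib.request


def is_ok(body: bytes) -> bool:
    """True only if the body is the documented {"status": "ok"} JSON object."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        return False      # not JSON, or JSON that is not an object


def wait_until_healthy(
    base: str = "http://127.0.0.1:8642", attempts: int = 10, delay: float = 1.0
) -> bool:
    """Poll GET /health until it reports ok or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base}/health", timeout=5) as resp:
                if is_ok(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass          # server not up yet; retry after the delay
        time.sleep(delay)
    return False
```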

GET /health/detailed

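Extended health check that also reports active sessions, running agents, and resource usage, for use by monitoring and observability systems.

Runs API (an alternative better suited to progress subscriptions)

In addition to /v1/chat/completions and /v1/responses, the server provides a runs API for long-running sessions. It is especially useful when a client wants to subscribe to progress events instead of managing a streaming connection itself.

POST /v1/runs

Creates a new agent run. Returns a run_id that you can then use to subscribe to progress events.

GET /v1/runs/{run_id}/events

Returns the run's SSE event stream, including tool-call progress, token deltas, and lifecycle events. This suits dashboards and thick clients that need to attach and detach at any time without interrupting the run's state.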
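
The attach step can be sketched as follows. Several details here are assumptions, because this page does not spell them out: that the create response is a JSON object with a top-level run_id key, and that the events endpoint accepts the same bearer token as the other endpoints. The request body for POST /v1/runs is not described on this page, so it is left out. Both function names are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def events_url(base_url: str, run_id: str) -> str:
    """URL of the SSE stream for one run (path taken from the heading above)."""
    return f"{base_url}/runs/{quote(run_id, safe='')}/events"


def events_request(
    create_response: dict,
    base_url: str = "http://localhost:8642/v1",
    api_key: str = "change-me-local-dev",
) -> urllib.request.Request:
    """Build the GET request that attaches to a run's event stream."""
    run_id = create_response["run_id"]       # assumed top-level key
    return urllib.request.Request(
        events_url(base_url, run_id),
        headers={
            "Authorization": f"Bearer {api_key}",   # assumed to apply here too
            "Accept": "text/event-stream",
        },
        method="GET",
    )
```

Opening that request with urllib.request.urlopen and iterating over the response yields raw SSE lines. Closing the connection corresponds to the detach described above.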

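Jobs API (scheduled background jobs)

The server also exposes a lightweight jobs CRUD interface for remotely managing scheduled jobs and background agent runs. All endpoints use the same bearer token authentication.

GET /api/jobs

Lists all scheduled jobs.

POST /api/jobs

Creates a new scheduled job. The request body has the same structure as hermes cron, including the prompt, schedule, skills, provider override, and delivery target.

GET /api/jobs/{job_id}

Gets a single job definition and the status of its most recent run.

PATCH /api/jobs/{job_id}

Updates some of a job's fields, such as the prompt or schedule; the supplied fields are merged into the existing definition.

DELETE /api/jobs/{job_id}

Deletes the job and cancels any instance of it that is still running.

POST /api/jobs/{job_id}/pause

Pauses the job without deleting it. The next scheduled run time is not computed until the job is resumed.

POST /api/jobs/{job_id}/resume

Resumes a previously paused job.

POST /api/jobs/{job_id}/run

Triggers the job immediately, skipping the wait for its schedule.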
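
The body-less job endpoints map naturally onto a small request builder. The sketch below covers only those, because the exact field names for creating or patching a job are not listed on this page. ACTIONS and job_request are hypothetical names, and the host and key are the quick-start values.

```python
import urllib.request
from urllib.parse import quote

# Method and path for each body-less jobs endpoint described above.
ACTIONS = {
    "list": ("GET", "/api/jobs"),
    "get": ("GET", "/api/jobs/{job_id}"),
    "delete": ("DELETE", "/api/jobs/{job_id}"),
    "pause": ("POST", "/api/jobs/{job_id}/pause"),
    "resume": ("POST", "/api/jobs/{job_id}/resume"),
    "run": ("POST", "/api/jobs/{job_id}/run"),
}


def job_request(
    action: str,
    job_id: str = "",
    host: str = "http://localhost:8642",
    api_key: str = "change-me-local-dev",
) -> urllib.request.Request:
    """Build an authenticated request for one jobs action."""
    method, template = ACTIONS[action]
    if "{job_id}" in template and not job_id:
        raise ValueError(f"action {action!r} needs a job_id")
    path = template.format(job_id=quote(job_id, safe=""))
    return urllib.request.Request(
        host + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method=method,
    )
```

For example, urllib.request.urlopen(job_request("pause", job_id)) would pause a job, and the same call with "resume" would bring it back.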

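System Prompt Handling

When a frontend sends a system message (Chat Completions) or an instructions field (Responses API), Hermes Agent layers it on top of its own core system prompt. Your agent keeps its existing tools, memory, and skills; the frontend's system prompt only adds extra instructions on top.

This means you can customize behavior per frontend without losing Hermes's capabilities:

An Open WebUI system prompt could read: "You are a Python expert. Always include type hints."
The agent still has the terminal, file tools, web search, memory, and all its other capabilities.

Authentication

Use a Bearer token in the Authorization header:

Authorization: Bearer ***

Configure the key with the API_SERVER_KEY environment variable. If browsers need to call Hermes directly, also set API_SERVER_CORS_ORIGINS to an explicit allowlist.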

Security

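The API server exposes Hermes Agent's full toolset, including terminal commands. API_SERVER_KEY is required when you bind the service to a non-loopback address such as 0.0.0.0. Also keep API_SERVER_CORS_ORIGINS as narrow as possible to limit which browser origins can reach it.

The default bind address, 127.0.0.1, is reachable only from the local machine. Browser access is off by default; enable it only for origins you explicitly trust.

Configuration

Environment variables

Variable	Default	Description
`API_SERVER_ENABLED`	`false`	Enables the API server
`API_SERVER_PORT`	`8642`	HTTP server port
`API_SERVER_HOST`	`127.0.0.1`	Bind address (localhost only by default)
`API_SERVER_KEY`	(none)	Bearer token for authentication
`API_SERVER_CORS_ORIGINS`	(none)	Allowed browser origins, comma-separated
`API_SERVER_MODEL_NAME`	(profile name)	Model name shown by `/v1/models`; defaults to the profile name, or `hermes-agent` for the default profile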
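
Before exposing the server, it can help to lint the .env file against the rules above. The sketch below is not part of Hermes: it is a hypothetical pre-flight check that flags a non-loopback API_SERVER_HOST without an API_SERVER_KEY. Flagging a * origin is an extra assumption, based on the advice to use an explicit allowlist.

```python
import ipaddress
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def is_loopback(host: str) -> bool:
    """True for localhost and loopback IPs; other hostnames count as exposed."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


def find_problems(env: Dict[str, str]) -> List[str]:
    """Return human-readable warnings for risky API server settings."""
    problems: List[str] = []
    host = env.get("API_SERVER_HOST", "127.0.0.1")      # documented default
    if not is_loopback(host) and not env.get("API_SERVER_KEY"):
        problems.append(f"API_SERVER_HOST={host} requires API_SERVER_KEY")
    origins = [o.strip() for o in env.get("API_SERVER_CORS_ORIGINS", "").split(",")]
    if "*" in origins:
        problems.append("API_SERVER_CORS_ORIGINS should be an explicit allowlist")
    return problems
```

An empty list from find_problems(parse_env(text)) means none of these particular checks fired; it is not a full security review.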

config.yaml

# Not supported yet; use environment variables.
# config.yaml support will be added in a future release.

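Security headers

All responses include the following security headers:

X-Content-Type-Options: nosniff (prevents MIME type sniffing)
Referrer-Policy: no-referrer (prevents referrer leakage)

CORS

The API server does not enable browser CORS by default.

If browsers need direct access, set an explicit allowlist:

API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000

With CORS enabled:

Preflight responses include Access-Control-Max-Age: 600 (cached for 10 minutes)
SSE streaming responses also carry CORS headers, so browser EventSource works correctly
Idempotency-Key is listed as an allowed request header; clients can use it for deduplication (responses are cached per key for 5 minutes)

Frontends such as Open WebUI usually connect server-to-server and do not need CORS.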
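
A client that retries after a dropped connection can reuse one Idempotency-Key so the retry is deduplicated rather than run twice. The sketch below assumes the header is honored on POST /v1/chat/completions and uses a random UUID as the key; both are illustrative choices, and the function names are hypothetical.

```python
import json
import urllib.request
import uuid


def new_idempotency_key() -> str:
    """Generate a fresh key; any unique string should do, a UUID is one option."""
    return str(uuid.uuid4())


def chat_request(
    body: dict,
    idempotency_key: str,
    base_url: str = "http://localhost:8642/v1",
    api_key: str = "change-me-local-dev",
) -> urllib.request.Request:
    """Build a chat request that carries the given Idempotency-Key."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body, sort_keys=True).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
        method="POST",
    )
```

Generate the key once per logical request and pass the same value to every retry. Since responses are cached per key for 5 minutes, a retry after that window would presumably run again.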

Compatible frontends

Any frontend that supports the OpenAI API format will work. Tested or documented integrations include:

Frontend	Stars	Connection method
Open WebUI	126k	Full guide available
LobeChat	73k	Custom LLM provider endpoint
LibreChat	34k	Custom endpoint in `librechat.yaml`
AnythingLLM	56k	Generic OpenAI LLM provider
NextChat	87k	`BASE_URL` environment variable
ChatBox	39k	API Host setting
Jan	26k	Remote model configuration
HF Chat-UI	8k	`OPENAI_BASE_URL`
big-AGI	7k	Custom endpoint
OpenAI Python SDK	n/a	`OpenAI(base_url="http://localhost:8642/v1")`
curl	n/a	Direct HTTP requests

Multi-user deployment with profiles

To give several users isolated Hermes instances (separate config, memory, and skills), use profiles:

# Create one profile per user
hermes profile create alice
hermes profile create bob

# Configure each profile's API server on its own port
hermes -p alice config set API_SERVER_ENABLED true
hermes -p alice config set API_SERVER_PORT 8643
hermes -p alice config set API_SERVER_KEY alice-secret

hermes -p bob config set API_SERVER_ENABLED true
hermes -p bob config set API_SERVER_PORT 8644
hermes -p bob config set API_SERVER_KEY bob-secret

# Start a gateway for each profile
hermes -p alice gateway &
hermes -p bob gateway &

Each profile's API server automatically advertises the profile name as its model ID:

http://localhost:8643/v1/models → model alice
http://localhost:8644/v1/models → model bob

In Open WebUI, add each one as a separate connection; the model dropdown will then list alice and bob as two isolated models. See the Open WebUI guide for details.

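Limitations

Response storage: stored responses (used for previous_response_id) are kept in SQLite and survive gateway restarts. At most 100 are kept, with LRU eviction.
No file uploads: both /v1/chat/completions and /v1/responses support inline images, but not uploaded files (file, input_file, file_id) or non-image document input.
The model field is cosmetic: the model field in a request is accepted, but the LLM actually used is determined by the server-side config.yaml.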
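
To make the storage limit concrete, the toy model below shows what a capacity-bounded LRU store does to old response IDs. It is purely illustrative and is not how Hermes implements its SQLite-backed store: LRUStore is a hypothetical in-memory class, and treating a read as a "use" is an assumption about what LRU means here.

```python
from collections import OrderedDict
from typing import Optional


class LRUStore:
    """In-memory stand-in for a response store that keeps at most `capacity` items."""

    def __init__(self, capacity: int = 100) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, dict]" = OrderedDict()

    def put(self, response_id: str, response: dict) -> None:
        """Store a response, evicting the least recently used ones if full."""
        self._items[response_id] = response
        self._items.move_to_end(response_id)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)      # drop the oldest entry

    def get(self, response_id: str) -> Optional[dict]:
        """Fetch a response and mark it as recently used; None if evicted."""
        if response_id not in self._items:
            return None
        self._items.move_to_end(response_id)
        return self._items[response_id]
```

The practical consequence for clients is that a previous_response_id from a long-idle chain may no longer resolve once enough newer responses have been stored, so long-lived integrations should be prepared to start a fresh chain.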

Proxy Mode

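The API server also serves as the backend for gateway proxy mode. When another Hermes gateway instance points at this API server through GATEWAY_PROXY_URL, it forwards all messages here instead of running its own agent locally. This suits split deployments, for example a Docker container that handles Matrix E2EE while an agent on the host does the actual processing.

See Matrix Proxy Mode for the full configuration.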
