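Image Generation

Hermes Agent generates images from text prompts through FAL.ai. Nine models are built in by default, each with its own trade-off between speed, quality, and cost. The active model is configured through hermes tools and persisted in config.yaml.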

Supported Models

Model	Speed	Strengths	Price
`fal-ai/flux-2/klein/9b` (default)	`<1s`	Fast, crisp text	$0.006/MP
`fal-ai/flux-2-pro`	~6s	Studio photorealism	$0.03/MP
`fal-ai/z-image/turbo`	~2s	Bilingual EN/CN, 6B parameters	$0.005/MP
`fal-ai/nano-banana-pro`	~8s	Gemini 3 Pro, stronger reasoning, better text rendering	$0.15/image (1K)
`fal-ai/gpt-image-1.5`	~15s	High prompt adherence	$0.034/image
`fal-ai/gpt-image-2`	~20s	Top-tier text rendering + CJK, stronger world knowledge	$0.04–0.06/image
`fal-ai/ideogram/v3`	~5s	Best typography	$0.03–0.09/image
`fal-ai/recraft/v4/pro/text-to-image`	~8s	Design work, brand systems, production-ready output	$0.25/image
`fal-ai/qwen-image`	~12s	LLM-based, complex text scenes	$0.02/MP

Prices are FAL's published rates at the time of writing; check fal.ai for current pricing.
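As a rough illustration of how the per-megapixel (MP) rates in the table translate into a per-image cost, the sketch below multiplies the pixel count by the listed rate. It assumes 1 MP = 1,000,000 pixels and no rounding; FAL's actual billing rules may differ, and `estimate_cost_usd` is a hypothetical helper, not part of Hermes.

```python
# Hypothetical helper: estimate the cost of one image from a $/MP rate.
# Assumption: 1 MP = 1,000,000 pixels, with no rounding by the provider.

# Per-megapixel rates copied from the model table (per-image models omitted).
PRICE_PER_MP = {
    "fal-ai/flux-2/klein/9b": 0.006,
    "fal-ai/flux-2-pro": 0.03,
    "fal-ai/z-image/turbo": 0.005,
    "fal-ai/qwen-image": 0.02,
}


def estimate_cost_usd(model: str, width: int, height: int) -> float:
    """Return the estimated USD cost of one width x height image."""
    megapixels = (width * height) / 1_000_000
    return megapixels * PRICE_PER_MP[model]


if __name__ == "__main__":
    cost = estimate_cost_usd("fal-ai/flux-2/klein/9b", 1024, 1024)
    print(f"${cost:.4f}")  # 1.048576 MP at $0.006/MP, about $0.0063
```

Under these assumptions a 1024×1024 image on the default model costs well under one cent, while the same image on `fal-ai/flux-2-pro` costs about five times as much.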

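Setup

Nous Subscribers

If you have a paid Nous Portal subscription, you can use image generation through the Tool Gateway without configuring a separate FAL API key. Your model selection is preserved on both paths.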

Get a FAL API Key

Sign up at fal.ai
Generate an API key in the dashboard

Configure and Pick a Model

Run:

hermes tools

Go to 🎨 Image Generation and choose a backend (Nous Subscription or FAL.ai). A column-aligned model table then appears. Navigate with the arrow keys and press Enter to confirm:

  Model                          Speed    Strengths                    Price
  fal-ai/flux-2/klein/9b         <1s      Fast, crisp text             $0.006/MP   ← currently in use
  fal-ai/flux-2-pro              ~6s      Studio photorealism          $0.03/MP
  fal-ai/z-image/turbo           ~2s      Bilingual EN/CN, 6B          $0.005/MP
  ...

Your selection is saved to config.yaml:

image_gen:
  model: fal-ai/flux-2/klein/9b
  use_gateway: false

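GPT-Image Quality Tier

Request quality for fal-ai/gpt-image-1.5 and fal-ai/gpt-image-2 is pinned to medium. The low and high tiers are not exposed to users, because they would make billing unpredictable under Nous Portal. For a cheaper option, pick Klein 9B or Z-Image Turbo; for higher quality, pick Nano Banana Pro or Recraft V4 Pro.

Usage

The agent-facing schema is intentionally minimal; the model simply uses whatever you have already configured: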

Generate an image of a serene mountain landscape with cherry blossoms

Create a square portrait of a wise old owl — use the typography model

Make me a futuristic cityscape, landscape orientation

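Aspect Ratios

From the agent's point of view, every model accepts the same three aspect ratios, which are translated internally to each model's native parameter: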

Agent input	image_size (flux/z-image/qwen/recraft/ideogram)	aspect_ratio (nano-banana-pro)	image_size (gpt-image-1.5)	image_size (gpt-image-2)
`landscape`	`landscape_16_9`	`16:9`	`1536x1024`	`landscape_4_3` (1024×768)
`square`	`square_hd`	`1:1`	`1024x1024`	`square_hd` (1024×1024)
`portrait`	`portrait_16_9`	`9:16`	`1024x1536`	`portrait_4_3` (768×1024)

GPT Image 2 uses the 4:3 presets rather than 16:9 because it has a higher minimum pixel count, so landscape_16_9 would be rejected.
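The mapping in the table can be sketched as a small lookup. This is only an illustration built from the table values; `native_size_param` and the dictionary names are hypothetical and do not reflect how `_build_fal_payload()` is actually written.

```python
from typing import Dict

# Values copied from the aspect-ratio table; all names here are illustrative.
PRESET_16_9 = {"landscape": "landscape_16_9", "square": "square_hd", "portrait": "portrait_16_9"}
PRESET_4_3 = {"landscape": "landscape_4_3", "square": "square_hd", "portrait": "portrait_4_3"}
PIXEL_SIZE = {"landscape": "1536x1024", "square": "1024x1024", "portrait": "1024x1536"}
RATIO = {"landscape": "16:9", "square": "1:1", "portrait": "9:16"}


def native_size_param(model: str, aspect: str) -> Dict[str, str]:
    """Translate the unified aspect name into one model-specific parameter."""
    if aspect not in RATIO:
        raise ValueError(f"unsupported aspect ratio: {aspect!r}")
    if model == "fal-ai/nano-banana-pro":
        return {"aspect_ratio": RATIO[aspect]}
    if model == "fal-ai/gpt-image-1.5":
        return {"image_size": PIXEL_SIZE[aspect]}
    if model == "fal-ai/gpt-image-2":
        # 4:3 presets, because the 16:9 preset falls below the minimum pixel count.
        return {"image_size": PRESET_4_3[aspect]}
    # flux / z-image / qwen / recraft / ideogram share the 16:9 presets.
    return {"image_size": PRESET_16_9[aspect]}


if __name__ == "__main__":
    print(native_size_param("fal-ai/gpt-image-2", "landscape"))
```

Keeping one table per parameter style means the agent only ever needs to know the three unified names.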

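Automatic Upscaling

Upscaling through FAL's Clarity Upscaler is controlled per model:

Model	Upscale?	Why
`fal-ai/flux-2-pro`	✓	Backward compatibility with the previous default behavior
All other models	✗	Fast models are not worth the speed cost, and high-resolution models rarely need it

If upscaling fails (for example, because of a network error or rate limiting), the original image is returned automatically.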
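The fallback behavior described above can be sketched as a simple wrapper. This is a minimal illustration, not the Hermes implementation: `upscale_or_original` is a hypothetical name, the upscaler is passed in as a plain callable, and catching every `Exception` is an assumption about how broadly failures are handled.

```python
from typing import Callable


def upscale_or_original(image_url: str, upscaler: Callable[[str], str]) -> str:
    """Return the upscaled image URL, or the original URL if upscaling fails."""
    try:
        return upscaler(image_url)
    except Exception:
        # Assumption: any failure (network error, rate limit, ...) falls back.
        return image_url


if __name__ == "__main__":
    def flaky_upscaler(url: str) -> str:
        # Stand-in for a real upscaler call that hits a rate limit.
        raise ConnectionError("rate limited")

    print(upscale_or_original("https://example.com/original.png", flaky_upscaler))
```

The caller always gets a usable URL back, so a failed upscale degrades quality rather than failing the whole request.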

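How It Works Internally

Model resolution: _resolve_fal_model() reads image_gen.model first, then FAL_IMAGE_MODEL, and finally falls back to fal-ai/flux-2/klein/9b
Payload building: _build_fal_payload() translates the unified aspect_ratio into each model's own format and filters out unsupported parameters
Request submission: _submit_fal_request() goes through direct FAL credentials or the managed gateway, depending on configuration
Upscaling: runs only when the model's metadata is flagged upscale: True
Delivery: the final image URL is returned, and the agent then emits MEDIA:<url>, which platform adapters convert into native media messages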
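The model resolution order in the first step can be sketched as follows. This is an illustration of the lookup order only: `resolve_model` is a hypothetical stand-in for `_resolve_fal_model()`, and it assumes config.yaml has already been parsed into a nested dictionary and that FAL_IMAGE_MODEL is an environment variable.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_MODEL = "fal-ai/flux-2/klein/9b"


def resolve_model(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the model: config value, then environment variable, then default."""
    env = os.environ if env is None else env
    # Assumption: `config` mirrors config.yaml, e.g. {"image_gen": {"model": "..."}}.
    configured = (config.get("image_gen") or {}).get("model")
    return configured or env.get("FAL_IMAGE_MODEL") or DEFAULT_MODEL


if __name__ == "__main__":
    print(resolve_model({"image_gen": {"model": "fal-ai/ideogram/v3"}}, {}))
    print(resolve_model({}, {"FAL_IMAGE_MODEL": "fal-ai/qwen-image"}))
    print(resolve_model({}, {}))
```

Passing the environment in as a parameter keeps the sketch easy to test without touching the real process environment.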

Debugging

Enable debug logging:

export IMAGE_TOOLS_DEBUG=true

Debug logs are written to ./logs/image_tools_debug_<session_id>.json.

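Platform Delivery

Platform	Delivery
CLI	Prints the image URL as markdown `![](url)`
Telegram	Sent as a photo message with the prompt as the caption
Discord	Embedded in the message
Slack	URL unfurled automatically by Slack
WhatsApp	Sent as a media message
Others	Sent as a plain-text URL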

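Limitations

Requires FAL credentials (a direct FAL_KEY or a Nous Subscription)
Text-to-image only; inpainting, img2img, and image editing are not supported
URLs are temporary; FAL-hosted links expire after hours or days
Constraints vary by model; unsupported parameters are silently dropped, which is expected behavior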
