ChatML 与 Harmony：了解 OpenAI 的新格式 🔍

社区文章发布于 2025 年 8 月 9 日

Jisoo Kim

kuotient

OpenAI 刚刚发布了他们的 **Harmony 格式**，与 gpt-oss 模型一起，引入了一种与我们一直用于 Qwen3 等模型的 ChatML 格式完全不同的结构化推理和工具调用的方法。

如果您正在使用推理模型或构建推理基础设施，了解这些格式至关重要。今天，我们将深入探讨这两种格式，进行并排比较，以帮助您了解发生了什么变化以及为什么它很重要！

什么是 ChatML？📝

ChatML（聊天标记语言）一直是许多开源模型（特别是 Qwen 系列）的首选格式。它本质上是一种受 XML 启发的方式来构建对话，它是在需要清晰区分对话不同部分的需求下演变而来的。

主要特点

使用特殊标记 <|im_start|> 和 <|im_end|> 来标记消息边界
简单的基于角色的结构（系统、用户、助手）
思考/推理包裹在 <think>...</think> 块中
工具定义和调用使用 XML 风格的标签
经过验证、久经考验的格式，在许多模型中得到应用

将 ChatML 视为“经典”方法——直接、类似 XML，并注重清晰度。

什么是 Harmony？🎭

Harmony 是 OpenAI 专门为 gpt-oss 模型设计的新响应格式。它不仅仅是一种提示格式，更是对模型如何构建输出（尤其是复杂推理和工具使用）的彻底重新思考。

主要创新

多通道架构：消息可以标记为 analysis（推理）、commentary（工具前言）或 final（面向用户）
角色层次结构：system > developer > user > assistant > tool（用于处理指令冲突）
消息路由：使用 to= 语法在组件之间定向消息
TypeScript 风格的工具定义：比 JSON 模式更具表达性和简洁性
每个通道独立的安全标准：analysis 通道不像 final 那样经过安全过滤

Harmony 将模型的输出视为多线程对话，其中不同类型的内容通过不同的通道流动。

Harmony 的 TypeScript 风格语法：为何重要？🚀

Harmony 最有趣的设计选择之一是使用 TypeScript 风格的工具定义，而不是 JSON 模式。这不仅仅是语法糖——其背后有充分的理由。

基于代码的定义的力量

正如最近对代码代理的研究所示，以代码表示动作具有以下几个优点：

简洁性：代码操作比 JSON 紧凑约 30%
并行性：需要 4 个并行流的 5 个操作？在 JSON 中，那是 20 个独立的数据块；在代码中，它是一个表达式
变量管理：轻松存储和引用结果 (rock_image = generate_image("rock"))
可读性：代理日志变得更加清晰
LLM 流畅性：模型训练数据中有大量的代码

工具定义比较

ChatML（JSON 模式）

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["city"]
    }
  }
}

Harmony（TypeScript 风格）

namespace functions {
  // Get weather for a city
  type get_weather = (_: {
    city: string,
    unit?: "celsius" | "fahrenheit", // default: celsius
  }) => any;
}

TypeScript 版本不仅更简洁，而且开发人员更熟悉，LLM 也更容易处理。

深入示例 🏊‍♂️

让我们看看这些格式如何处理真实世界的场景，从简单的聊天到使用工具的复杂多步推理。

1. 基本对话（无思考）

最简单的情况——只是提问，没有任何推理步骤。

ChatML (Qwen3)

<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
<think>

</think>

The capital of France is Paris.<|im_end|>

OpenAI Harmony

<|start|>user<|message|>What's the capital of France?<|end|>
<|start|>assistant<|channel|>final<|message|>The capital of France is Paris.<|return|>

正在发生什么： 即使禁用了思考，Qwen3 也会包含空的 <think></think> 标签作为结构提示。Harmony 则更直接——它直接跳转到 final 通道以处理面向用户的内容。

2. 带思考的对话

这里是哲学差异变得清晰的地方。

ChatML (Qwen3)

<|im_start|>user
If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|im_end|>
<|im_start|>assistant
<think>
Starting with 3 apples
Adding 5 more: 3 + 5 = 8
Giving away 2: 8 - 2 = 6
So the final answer is 6 apples.
</think>

You would have 6 apples. 

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|im_end|>

OpenAI Harmony

<|start|>user<|message|>If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User starts with 3 apples. Buys 5 more: 3 + 5 = 8. Gives away 2: 8 - 2 = 6. Final count is 6 apples.<|end|>
<|start|>assistant<|channel|>final<|message|>You would have 6 apples.

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|return|>

有什么不同

ChatML 将所有内容保留在一条消息中，并内联思考
Harmony 将推理（analysis 通道）和答案（final 通道）分成单独的消息
Harmony 的方法使得程序化地从面向用户的内容中过滤推理变得更容易

3. 带思考的多轮对话

这揭示了每种格式如何处理对话历史。

ChatML (Qwen3)

<|im_start|>user
What's 15% of 80?<|im_end|>
<|im_start|>assistant
15% of 80 is 12.

To calculate: 80 × 0.15 = 12<|im_end|>
<|im_start|>user
Now add 25 to that result<|im_end|>
<|im_start|>assistant
<think>
The previous result was 12.
Adding 25: 12 + 25 = 37
</think>

Adding 25 to 12 gives us 37.<|im_end|>

重要： Qwen3 删除了之前轮次的思考以节省标记！

OpenAI Harmony

<|start|>user<|message|>What's 15% of 80?<|end|>
<|start|>assistant<|channel|>analysis<|message|>Calculate 15% of 80: 80 × 0.15 = 12<|end|>
<|start|>assistant<|channel|>final<|message|>15% of 80 is 12.<|return|>
<|start|>user<|message|>Now add 25 to that result<|end|>
<|start|>assistant<|channel|>analysis<|message|>Previous result was 12. Adding 25: 12 + 25 = 37<|end|>
<|start|>assistant<|channel|>final<|message|>Adding 25 to 12 gives us 37.<|return|>

历史管理： 两种格式都修剪历史推理，但 Harmony 的通道系统使其更明确和可控。

4. 函数调用

在工具处理方面，两种格式存在显著差异。

ChatML (Qwen3)

<|im_start|>system
# Tools

You may call one or more functions to assist with the user query.

<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
</tools>

For each function call, return a json object within <tool_call></tool_call> XML tags.<|im_end|>
<|im_start|>user
What's the weather in Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User wants Tokyo weather. Need to call get_weather function.
</think>

I'll check the current weather in Tokyo for you.

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 22, "condition": "partly cloudy", "humidity": 65}
</tool_response><|im_end|>
<|im_start|>assistant
The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|im_end|>

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
// Get weather for a city
type get_weather = (_: {
  city: string,
  unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll check the current weather in Tokyo for you.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 22, "condition": "partly cloudy", "humidity": 65}<|end|>
<|start|>assistant<|channel|>final<|message|>The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|return|>

主要区别

工具定义：JSON 模式 vs TypeScript 风格类型
工具响应：ChatML 将其包装为用户消息；Harmony 使用专用工具角色
消息路由：Harmony 的 to= 语法明确显示了组件之间的流向
通道：Harmony 将工具交互（commentary）与最终答案分开

5. 带思维链的函数调用

复杂的场景真正展示了架构上的差异。

ChatML (Qwen3)

<|im_start|>system
# Tools
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}
{"type": "function", "function": {"name": "get_flight_status", "description": "Check flight status", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string"}}}}}
</tools><|im_end|>
<|im_start|>user
Should I pack an umbrella for my UA902 flight to Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User is asking about packing an umbrella for flight UA902 to Tokyo.
I need to:
1. Check the flight status to confirm it's going to Tokyo
2. Check Tokyo's weather to see if rain is expected
Let me start with the flight status.
</think>

I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.

<tool_call>
{"name": "get_flight_status", "arguments": {"flight_number": "UA902"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Flight confirmed to Tokyo, departing at 14:30. Now checking weather.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 18, "condition": "rainy", "precipitation": "80%"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Tokyo weather shows rainy conditions with 80% precipitation. Definitely need an umbrella.
</think>

Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|im_end|>

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
type get_weather = (_: {city: string}) => any;
type get_flight_status = (_: {flight_number: string}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>Should I pack an umbrella for my UA902 flight to Tokyo?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User asking about packing umbrella for UA902 to Tokyo. Need to: 1) Verify flight destination 2) Check Tokyo weather. Starting with flight status.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_flight_status<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>
<|start|>functions.get_flight_status to=assistant<|channel|>commentary<|message|>{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Flight confirmed to Tokyo, on time at 14:30. Now checking weather conditions.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 18, "condition": "rainy", "precipitation": "80%"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Tokyo has rainy conditions with 80% precipitation. Clear recommendation to pack umbrella.<|end|>
<|start|>assistant<|channel|>final<|message|>Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|return|>

在复杂场景中

ChatML 在同一对话回合中保留工具调用之间的思考
Harmony 通过单独的 analysis 消息明确跟踪每个推理步骤
Harmony 的 commentary 通道可以在工具执行前包含面向用户的“引言”
Harmony 中的消息路由（to=）创建了清晰的执行跟踪

这对生态系统意味着什么 🌍

Harmony 的引入代表了我们对模型输出思考方式的重大演变

基于通道的安全性：不同通道的不同安全标准对于推理模型来说是一个颠覆性变化
更好的可观察性：明确的路由和通道使得调试复杂的代理行为更容易
代码优先工具：TypeScript 语法可能成为工具定义的新标准
结构化推理：在格式层面而不是仅仅约定俗成地将思考与输出分离

主要收获 📌

ChatML 更简单，更成熟，并获得广泛的生态系统支持
Harmony 引入了强大的新概念，如通道和消息路由
两种格式都处理推理和工具，但采用的理念截然不同
Harmony 的 TypeScript 风格定义更简洁，更适合开发人员
Harmony 中的通道系统能够对用户看到的内容进行精细控制

随着生态系统的发展，了解这两种格式至关重要。虽然我们无法选择模型使用的格式（只有在训练基础模型时才能选择），但了解它们的工作原理有助于我们构建更好的推理基础设施，并充分利用这些强大的推理模型。

社区

通过拖放到文本输入框、粘贴或点击此处上传图片、音频和视频。

点击或粘贴此处以上传图片

· 注册或登录发表评论