ChatML 与 Harmony:了解 OpenAI 的新格式 🔍

社区文章 发布于 2025 年 8 月 9 日

OpenAI 刚刚发布了他们的 **Harmony 格式**,与 gpt-oss 模型一起,引入了一种与我们一直用于 Qwen3 等模型的 ChatML 格式完全不同的结构化推理和工具调用的方法。

如果您正在使用推理模型或构建推理基础设施,了解这些格式至关重要。今天,我们将深入探讨这两种格式,进行并排比较,以帮助您了解发生了什么变化以及为什么它很重要!

什么是 ChatML?📝

ChatML(聊天标记语言)一直是许多开源模型(特别是 Qwen 系列)的首选格式。它本质上是一种受 XML 启发的方式来构建对话,它是在需要清晰区分对话不同部分的需求下演变而来的。

主要特点

  • 使用特殊标记 <|im_start|><|im_end|> 来标记消息边界
  • 简单的基于角色的结构(系统、用户、助手)
  • 思考/推理包裹在 <think>...</think> 块中
  • 工具定义和调用使用 XML 风格的标签
  • 经过验证、久经考验的格式,在许多模型中得到应用

将 ChatML 视为“经典”方法——直接、类似 XML,并注重清晰度。

什么是 Harmony?🎭

Harmony 是 OpenAI 专门为 gpt-oss 模型设计的新响应格式。它不仅仅是一种提示格式,更是对模型如何构建输出(尤其是复杂推理和工具使用)的彻底重新思考。

主要创新

  • 多通道架构:消息可以标记为 analysis(推理)、commentary(工具前言)或 final(面向用户)
  • 角色层次结构:system > developer > user > assistant > tool(用于处理指令冲突)
  • 消息路由:使用 to= 语法在组件之间定向消息
  • TypeScript 风格的工具定义:比 JSON 模式更具表达性和简洁性
  • 每个通道独立的安全标准analysis 通道不像 final 那样经过安全过滤

Harmony 将模型的输出视为多线程对话,其中不同类型的内容通过不同的通道流动。

Harmony 的 TypeScript 风格语法:为何重要?🚀

Harmony 最有趣的设计选择之一是使用 TypeScript 风格的工具定义,而不是 JSON 模式。这不仅仅是语法糖——其背后有充分的理由。

基于代码的定义的力量

正如最近对代码代理的研究所示,以代码表示动作具有以下几个优点:

  1. 简洁性:代码操作比 JSON 紧凑约 30%
  2. 并行性:需要 4 个并行流的 5 个操作?在 JSON 中,那是 20 个独立的数据块;在代码中,它是一个表达式
  3. 变量管理:轻松存储和引用结果 (rock_image = generate_image("rock"))
  4. 可读性:代理日志变得更加清晰
  5. LLM 流畅性:模型训练数据中有大量的代码

工具定义比较

ChatML(JSON 模式)

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["city"]
    }
  }
}

Harmony(TypeScript 风格)

namespace functions {
  // Get weather for a city
  type get_weather = (_: {
    city: string,
    unit?: "celsius" | "fahrenheit", // default: celsius
  }) => any;
}

TypeScript 版本不仅更简洁,而且开发人员更熟悉,LLM 也更容易处理。

深入示例 🏊‍♂️

让我们看看这些格式如何处理真实世界的场景,从简单的聊天到使用工具的复杂多步推理。

1. 基本对话(无思考)

最简单的情况——只是提问,没有任何推理步骤。

ChatML (Qwen3)

<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
<think>

</think>

The capital of France is Paris.<|im_end|>

image/png

OpenAI Harmony

<|start|>user<|message|>What's the capital of France?<|end|>
<|start|>assistant<|channel|>final<|message|>The capital of France is Paris.<|return|>

image/png

正在发生什么: 即使禁用了思考,Qwen3 也会包含空的 <think></think> 标签作为结构提示。Harmony 则更直接——它直接跳转到 final 通道以处理面向用户的内容。

2. 带思考的对话

这里是哲学差异变得清晰的地方。

ChatML (Qwen3)

<|im_start|>user
If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|im_end|>
<|im_start|>assistant
<think>
Starting with 3 apples
Adding 5 more: 3 + 5 = 8
Giving away 2: 8 - 2 = 6
So the final answer is 6 apples.
</think>

You would have 6 apples. 

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|im_end|>

image/png

OpenAI Harmony

<|start|>user<|message|>If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User starts with 3 apples. Buys 5 more: 3 + 5 = 8. Gives away 2: 8 - 2 = 6. Final count is 6 apples.<|end|>
<|start|>assistant<|channel|>final<|message|>You would have 6 apples.

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|return|>

image/png

有什么不同

  • ChatML 将所有内容保留在一条消息中,并内联思考
  • Harmony 将推理(analysis 通道)和答案(final 通道)分成单独的消息
  • Harmony 的方法使得程序化地从面向用户的内容中过滤推理变得更容易

3. 带思考的多轮对话

这揭示了每种格式如何处理对话历史。

ChatML (Qwen3)

<|im_start|>user
What's 15% of 80?<|im_end|>
<|im_start|>assistant
15% of 80 is 12.

To calculate: 80 × 0.15 = 12<|im_end|>
<|im_start|>user
Now add 25 to that result<|im_end|>
<|im_start|>assistant
<think>
The previous result was 12.
Adding 25: 12 + 25 = 37
</think>

Adding 25 to 12 gives us 37.<|im_end|>

image/png

重要: Qwen3 删除了之前轮次的思考以节省标记!

OpenAI Harmony

<|start|>user<|message|>What's 15% of 80?<|end|>
<|start|>assistant<|channel|>analysis<|message|>Calculate 15% of 80: 80 × 0.15 = 12<|end|>
<|start|>assistant<|channel|>final<|message|>15% of 80 is 12.<|return|>
<|start|>user<|message|>Now add 25 to that result<|end|>
<|start|>assistant<|channel|>analysis<|message|>Previous result was 12. Adding 25: 12 + 25 = 37<|end|>
<|start|>assistant<|channel|>final<|message|>Adding 25 to 12 gives us 37.<|return|>

image/png

历史管理: 两种格式都修剪历史推理,但 Harmony 的通道系统使其更明确和可控。

4. 函数调用

在工具处理方面,两种格式存在显著差异。

ChatML (Qwen3)

<|im_start|>system
# Tools

You may call one or more functions to assist with the user query.

<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
</tools>

For each function call, return a json object within <tool_call></tool_call> XML tags.<|im_end|>
<|im_start|>user
What's the weather in Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User wants Tokyo weather. Need to call get_weather function.
</think>

I'll check the current weather in Tokyo for you.

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 22, "condition": "partly cloudy", "humidity": 65}
</tool_response><|im_end|>
<|im_start|>assistant
The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|im_end|>

image/png

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
// Get weather for a city
type get_weather = (_: {
  city: string,
  unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll check the current weather in Tokyo for you.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 22, "condition": "partly cloudy", "humidity": 65}<|end|>
<|start|>assistant<|channel|>final<|message|>The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|return|>

image/png

主要区别

  • 工具定义:JSON 模式 vs TypeScript 风格类型
  • 工具响应:ChatML 将其包装为用户消息;Harmony 使用专用工具角色
  • 消息路由:Harmony 的 to= 语法明确显示了组件之间的流向
  • 通道:Harmony 将工具交互(commentary)与最终答案分开

5. 带思维链的函数调用

复杂的场景真正展示了架构上的差异。

ChatML (Qwen3)

<|im_start|>system
# Tools
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}
{"type": "function", "function": {"name": "get_flight_status", "description": "Check flight status", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string"}}}}}
</tools><|im_end|>
<|im_start|>user
Should I pack an umbrella for my UA902 flight to Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User is asking about packing an umbrella for flight UA902 to Tokyo.
I need to:
1. Check the flight status to confirm it's going to Tokyo
2. Check Tokyo's weather to see if rain is expected
Let me start with the flight status.
</think>

I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.

<tool_call>
{"name": "get_flight_status", "arguments": {"flight_number": "UA902"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Flight confirmed to Tokyo, departing at 14:30. Now checking weather.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 18, "condition": "rainy", "precipitation": "80%"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Tokyo weather shows rainy conditions with 80% precipitation. Definitely need an umbrella.
</think>

Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|im_end|>

image/png

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
type get_weather = (_: {city: string}) => any;
type get_flight_status = (_: {flight_number: string}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>Should I pack an umbrella for my UA902 flight to Tokyo?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User asking about packing umbrella for UA902 to Tokyo. Need to: 1) Verify flight destination 2) Check Tokyo weather. Starting with flight status.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_flight_status<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>
<|start|>functions.get_flight_status to=assistant<|channel|>commentary<|message|>{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Flight confirmed to Tokyo, on time at 14:30. Now checking weather conditions.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 18, "condition": "rainy", "precipitation": "80%"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Tokyo has rainy conditions with 80% precipitation. Clear recommendation to pack umbrella.<|end|>
<|start|>assistant<|channel|>final<|message|>Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|return|>

image/png

在复杂场景中

  • ChatML 在同一对话回合中保留工具调用之间的思考
  • Harmony 通过单独的 analysis 消息明确跟踪每个推理步骤
  • Harmony 的 commentary 通道可以在工具执行前包含面向用户的“引言”
  • Harmony 中的消息路由(to=)创建了清晰的执行跟踪

这对生态系统意味着什么 🌍

Harmony 的引入代表了我们对模型输出思考方式的重大演变

  1. 基于通道的安全性:不同通道的不同安全标准对于推理模型来说是一个颠覆性变化
  2. 更好的可观察性:明确的路由和通道使得调试复杂的代理行为更容易
  3. 代码优先工具:TypeScript 语法可能成为工具定义的新标准
  4. 结构化推理:在格式层面而不是仅仅约定俗成地将思考与输出分离

主要收获 📌

  • ChatML 更简单,更成熟,并获得广泛的生态系统支持
  • Harmony 引入了强大的新概念,如通道和消息路由
  • 两种格式都处理推理和工具,但采用的理念截然不同
  • Harmony 的 TypeScript 风格定义更简洁,更适合开发人员
  • Harmony 中的通道系统能够对用户看到的内容进行精细控制

随着生态系统的发展,了解这两种格式至关重要。虽然我们无法选择模型使用的格式(只有在训练基础模型时才能选择),但了解它们的工作原理有助于我们构建更好的推理基础设施,并充分利用这些强大的推理模型。

社区

注册登录 发表评论