Understanding and Implementing the Tree of Thoughts Paradigm

Community Article · Published on March 26, 2025

Motivation

The Tree of Thoughts (ToT) paper (Yao et al.) shows how the reasoning abilities of large language models (LLMs) can be combined with a heuristic-guided tree search framework. But before diving into its implementation, let's first cover some background.

LLMs are designed for autoregressive text generation. This confines them, during inference, to a token-level, left-to-right decision process. According to the authors of the ToT paper, this is reminiscent of

  • the "System 1" (fast, automatic, unconscious) mode of human thinking.
  • the associative "model-free" paradigm in reinforcement learning.

Given the right kind of prompt, this autoregressive mechanism elicits chain-of-thought (CoT) reasoning (Wei et al.), enabling LLMs to tackle a wide variety of tasks that require mathematical, symbolic, commonsense, and knowledge reasoning.

However, generating reasoning traces in a left-to-right fashion falls short on tasks that require exploration, strategic lookahead, or in which initial decisions play a pivotal role (because future decisions depend on them).

In the ToT paper, the authors propose that left-to-right CoT reasoning may benefit from being augmented with a heuristic-guided tree search framework. This is reminiscent of

  • the "System 2" (slow, deliberate, conscious) mode of human thinking.
  • the deliberate "model-based" planning paradigm in reinforcement learning.

According to the authors, such a system has two key characteristics:

  1. The ability to maintain and explore diverse alternatives for the current, i.e., local (node-level), decision.
  2. The ability to evaluate each node and actively look ahead or backtrack in order to make global decisions.

Such a system can consider multiple different reasoning paths, self-evaluate choices to decide the next course of action, and look ahead or backtrack when necessary to make global choices.

Below is a comparison of the ToT paradigm with three other popular reasoning paradigms.

Our goals in this blog post are as follows:

  1. Understand the Tree of Thoughts (ToT) paradigm.
  2. Implement a reusable TreeOfThoughts class.

To achieve these goals, we'll work through two tasks in turn: Creative Writing and Game of 24. By understanding how ToT is applied to these tasks (one at a time), we'll incrementally build our reusable class.

Note: The ToT paper also covers three additional tasks: Mini Crosswords, GSM8k, and StrategyQA. For brevity, they are not covered in this blog post.

Setup

We'll need the following imports:

from openai import OpenAI
from huggingface_hub import InferenceClient
from google.colab import userdata # Only if you're using Colab.
from typing import Union, Optional, List
from collections.abc import Callable
import re
from collections import deque
from IPython.display import display, HTML

We'll try to make our TreeOfThoughts class compatible with both OpenAI's Chat Completions API and Hugging Face's Serverless Inference API.

If you'd like to use OpenAI, you can create the client as follows:

client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))

The ToT paper used GPT-4 for all of its experiments. If you want to reproduce the paper's results, stick with the OpenAI option.

However, if you prefer to use Hugging Face, you can create the client as follows:

# client = InferenceClient(provider="hf-inference", api_key=userdata.get('HF_TOKEN'), headers={'x-use-cache': "false"})

Note: You must turn off caching by passing in headers={"x-use-cache": "false"}. Otherwise, you won't be able to generate n independent and identically distributed (i.i.d.) responses. (As we'll see, generating n i.i.d. responses is required for the Creative Writing task.)

Now, let's create a bare-bones Preliminary class with a chat_completions method.

class Preliminary:
    def __init__(self, client: Union[OpenAI, InferenceClient], model: str = "gpt-4"):
        self.client = client
        self.model = model

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/models.py
    def chat_completions(
            self,
            prompt: str,
            temperature: float = 0.7,
            max_tokens: int = 1000,
            n: int = 1,
            stop: Optional[List[str]] = None,
            **kwargs
    ) -> List[str]:
        outputs = []
        messages = [{'role': "user", 'content': prompt}]
        if isinstance(self.client, OpenAI):
            response = self.client.chat.completions.create(
                messages=messages,
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
                n=n, # The `n` responses are i.i.d.
                stop=stop,
                **kwargs
            )
            outputs.extend([choice.message.content for choice in response.choices])
        else: # `self.client` is an instance of `InferenceClient`.
            # The Hugging Face API doesn't support the `n` argument. Hence, we need to use a loop to generate `n` i.i.d. responses.
            for _ in range(n):
                response = self.client.chat.completions.create(
                    messages=messages,
                    model=self.model,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    stop=stop,
                    **kwargs
                )
                outputs.append(response.choices[0].message.content)
        return outputs

Descriptions of the n and stop parameters (from the OpenAI API documentation)

Let's test this method.

prelim = Preliminary(client, model="gpt-4")
# prelim = Preliminary(client, model="meta-llama/Meta-Llama-3.1-8B-Instruct")
responses = prelim.chat_completions("Write a haiku about delicious food.", n=2)
for response in responses: # The two responses are i.i.d.
    print(response)
    print("---")
Savoring each bite,
Flavors dance on eager tongues,
Feast of joy and light.
---
Savory delight,
Flavors dance upon my tongue,
Feast in every bite.
---

Creative Writing

In the Creative Writing task, the LLM is given an input sequence of four random sentences. The task is to write a coherent passage of four paragraphs, each of which ends in one of the four random sentences.

Note: "#ToT steps" in the table above refers to the number of intermediate steps. As we'll see later, for the Creative Writing task there's only one intermediate step: generating a writing plan.

Here's an example input sequence:

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/data/text/data_100_random_text.txt
input_seq = """1. It isn't difficult to do a handstand if you just stand on your hands.
2. It caught him off guard that space smelled of seared steak.
3. When she didn’t like a guy who was trying to pick her up, she started using sign language.
4. Each person who knows you has a different perception of who you are."""

Before diving into ToT, let's see how this problem can be tackled using the zero-shot chain-of-thought (CoT) approach.

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/text.py
zero_shot_cot_prompt = f"""Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be:

{input_seq}

Make a plan then write. Your output should be of the following format:

Plan:
Your plan here.

Passage:
Your passage here.
"""
print(zero_shot_cot_prompt)
Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be:

1. It isn't difficult to do a handstand if you just stand on your hands.
2. It caught him off guard that space smelled of seared steak.
3. When she didn’t like a guy who was trying to pick her up, she started using sign language.
4. Each person who knows you has a different perception of who you are.

Make a plan then write. Your output should be of the following format:

Plan:
Your plan here.

Passage:
Your passage here.

Note: You may be wondering how the above qualifies as a zero-shot CoT prompt. After all, the famous sentence "Let's think step by step." (Kojima et al.) is missing from the prompt above. The answer is that the sentence "Make a plan then write." elicits chain-of-thought reasoning, namely the intermediate step of generating a plan.

responses = prelim.chat_completions(zero_shot_cot_prompt, n=1) # Since we're passing `n=1`, we'll get back only one response.
print(responses[0])
Plan:
My plan is to create a narrative that revolves around an astronaut's experience in space. The first paragraph will include the astronaut's training before the mission, focusing on physical fitness and particularly handstands. In the second paragraph, the astronaut will finally be in space and be surprised by the smell. The third paragraph will introduce a flashback of the astronaut's unique way of dealing with unwanted attention before the mission. The last paragraph will reflect on how these experiences shape different people's perceptions of the astronaut.

Passage:
As a child, John always had a knack for gymnastics. He was more comfortable in the world upside down, doing handstands and cartwheels, than others were walking on their two feet. His skills would later prove to be beneficial in his career as an astronaut, where physical fitness was a top priority. As he trained for his first mission, he found comfort in the old familiarity of handstands, an exercise that was part of their zero-gravity training. He would often tell his colleagues, "It isn't difficult to do a handstand if you just stand on your hands."

When John finally made it to space, it was nothing like he expected. The zero-gravity, the silence, and the view were all breathtaking. But what caught his attention the most was the smell. After removing his helmet inside the spacecraft, a strong, strange aroma filled his nostrils. It was almost like... seared steak. It caught him off guard that space smelled of seared steak.

Back on Earth, John was a quiet and reserved man. He disliked the attention he got from being an astronaut, especially from women who were more interested in his status than him. He found a clever way to deal with unwanted advances. He remembered a friend who was deaf and communicated through sign language. When she didn’t like a guy who was trying to pick her up, she started using sign language.

To his colleagues, John was a strong and capable astronaut. To the women he rebuffed, he was a strange man who suddenly became mute. To his old gymnastics coach, he was a talented gymnast who could've won medals. Each person had a different story about John, a different perception. Each person who knows you has a different perception of who you are.

The LLM generated a plan, followed by a passage.

Next, let's look at what the stop parameter does.

responses = []
stop_string = 'Passage:'
for step in range(1, 3):
    if step == 1:
        response = prelim.chat_completions(zero_shot_cot_prompt, n=1, stop=[stop_string])[0]
    else:
        response = prelim.chat_completions(zero_shot_cot_prompt, n=1)[0]
    responses.append(response)
    print(f"Step {step} output:\n---")
    print(response)
    print("---\n~~~")
Step 1 output:
---
Plan:
1. Introduce the main character, a gymnast, and describe their training routine.
2. Transition to the main character's dream of being an astronaut, introducing the unexpected aspects of that experience.
3. Introduce a secondary character and their attempts to flirt with the main character, and the main character's unique way of avoiding it.
4. Discuss the overall theme of individual perception and how it applies to the main character.


---
~~~
Step 2 output:
---
Plan:
1. Discuss the simplicity in achieving a seemingly complex task.
2. Introduce a character who is an astronaut and describe his surprising experience in space.
3. Introduce a female character with an unconventional approach to warding off unwanted attention.
4. Conclude with a philosophical reflection on identity and perception.

Passage:
In life, many tasks may seem daunting at first, but upon closer inspection, they are often much simpler than they first appear. A common example of this is a handstand. To many, the idea of balancing one's entire body weight on their hands seems nearly impossible. But when you break it down, it's all about finding your center of gravity and pushing off with the right amount of force. It isn't difficult to do a handstand if you just stand on your hands.

A similar principle can be applied to Robert, an astronaut who had trained for years to venture into the unknown of space. He had prepared for every possible scenario, or so he thought. There was one aspect of space that he hadn't expected. He was aware of the silence, the darkness, and the weightlessness, but he hadn't predicted the smell. It caught him off guard that space smelled of seared steak.

Meanwhile, back on Earth, a woman named Sarah was dealing with her own set of unique circumstances. Sarah had a knack for attracting attention, especially from men she had no interest in. Over the years, she developed a unique way of dissuading these unwanted suitors. Instead of simply telling them she wasn't interested, she would start communicating only in sign language. When she didn’t like a guy who was trying to pick her up, she started using sign language.

These three stories may seem disconnected, but they all highlight the individuality and unique perception inherent in every person. Each person's experiences and actions shape how others perceive them, and no two perceptions are exactly alike. Just as Sarah used sign language to express her disinterest, Robert was surprised by the smell of space, and you may find handstands easy once you try, people's perceptions of you are shaped by their own unique experiences and interpretations. Each person who knows you has a different perception of who you are.
---
~~~

Here's what happened:

  • In step 1, we passed in the stop string 'Passage:'. As soon as the LLM generated this stop string, text generation stopped. Passing in this stop string allowed us to generate only the plan.
  • In step 2, we didn't pass in any stop string. Hence, the LLM generated both a plan and a passage.

But notice that the plan in step 2 was generated from scratch. When using ToT, however, that's not what we want. Instead, we want step 2 to make use of the plan generated in step 1. How do we achieve that?

Well, we need to maintain state. But what is a state?

A state is the collection of all the thoughts generated so far. In practice, it's the concatenation of all the thoughts so far (separated by '\n').
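
For instance, in the Creative Writing task, the state after both steps is simply the plan followed by the passage. A minimal illustration with placeholder strings:

plan = "Plan:\nIntroduce a gymnast..."       # Thought generated in step 1.
passage = "Passage:\nMatthew had always..."  # Thought generated in step 2.
state = plan + '\n' + passage                # The state: all thoughts so far, separated by '\n'.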

We need to create a callable that dynamically builds the prompt by appending the state to a base prompt.

def get_thought_gen_prompt(input_seq: str, state: str) -> str:
    """Get thought generation prompt.

    Keyword arguments:
    input_seq -- the input sequence
    state -- concatenation of all the thoughts so far (separated by '\n')
    """
    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/text.py
    base_prompt = f"""Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be:

{input_seq}

Make a plan then write. Your output should be of the following format:

Plan:
Your plan here.

Passage:
Your passage here.
"""
    if state == '': # Root node; no thoughts have been generated yet.
        return base_prompt
    else:
        return base_prompt + '\n' + state

Now, let's simulate generating the plan in step 1 and then the passage in step 2, where the step 2 prompt makes use of the state from step 1.

states = ['']
thoughts = ['']
n_steps = 2 # 1 intermediate step + 1 output generation step.
for step in range(1, n_steps + 1):
    prompt = get_thought_gen_prompt(input_seq, states[-1])
    print(f"Step {step} prompt:\n---")
    print(f"{prompt}\n---")
    if step == 1:
        thought = prelim.chat_completions(prompt, n=1, stop=[stop_string])[0]
    else:
        thought = prelim.chat_completions(prompt, n=1)[0]
    thoughts.append(thought)
    if states[-1] == '':
        updated_state = thought
    else:
        updated_state = states[-1] + '\n' + thought
    states.append(updated_state)
    print(f"Step {step} updated state:\n---")
    print(states[-1])
    print("---\n~~~")
Step 1 prompt:
---
Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be:

1. It isn't difficult to do a handstand if you just stand on your hands.
2. It caught him off guard that space smelled of seared steak.
3. When she didn’t like a guy who was trying to pick her up, she started using sign language.
4. Each person who knows you has a different perception of who you are.

Make a plan then write. Your output should be of the following format:

Plan:
Your plan here.

Passage:
Your passage here.

---
Step 1 updated state:
---
Plan:
In the first paragraph, I'll introduce a character who is a gymnast. The second paragraph will shift to this character's dream of being an astronaut, and the surprising revelation he has while in space. The third paragraph will introduce a new character, a woman who cleverly avoids unwanted attention. The final paragraph will tie the two characters together, exploring their perspectives of each other.


---
~~~
Step 2 prompt:
---
Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be:

1. It isn't difficult to do a handstand if you just stand on your hands.
2. It caught him off guard that space smelled of seared steak.
3. When she didn’t like a guy who was trying to pick her up, she started using sign language.
4. Each person who knows you has a different perception of who you are.

Make a plan then write. Your output should be of the following format:

Plan:
Your plan here.

Passage:
Your passage here.

Plan:
In the first paragraph, I'll introduce a character who is a gymnast. The second paragraph will shift to this character's dream of being an astronaut, and the surprising revelation he has while in space. The third paragraph will introduce a new character, a woman who cleverly avoids unwanted attention. The final paragraph will tie the two characters together, exploring their perspectives of each other.


---
Step 2 updated state:
---
Plan:
In the first paragraph, I'll introduce a character who is a gymnast. The second paragraph will shift to this character's dream of being an astronaut, and the surprising revelation he has while in space. The third paragraph will introduce a new character, a woman who cleverly avoids unwanted attention. The final paragraph will tie the two characters together, exploring their perspectives of each other.


Passage:
Matthew had always been nimble, even as a boy. He had a knack for gymnastics, and his specialty was doing handstands. He would often say to his friends who marveled at his skill, "It's all about balance and strength. It isn't difficult to do a handstand if you just stand on your hands."

As Matthew grew older, his passion for gymnastics remained, but his dreams reached for the stars. He wanted to become an astronaut. He trained relentlessly, and finally, he found himself floating in the weightlessness of space. The first time he took off his helmet inside the spaceship, he was taken aback. It caught him off guard that space smelled of seared steak.

Back on earth, there was a woman named Emily. She was vibrant and witty, but often found herself the target of unwanted attention. Whenever a man tried to harass her or make her uncomfortable, she had a trick up her sleeve. When she didn’t like a guy who was trying to pick her up, she started using sign language.

Emily and Matthew were friends. They had met at a mutual friend's party and hit it off. Emily saw Matthew as a dreamer, always reaching for the stars, while Matthew saw Emily as a quick-witted, independent woman. They had different perceptions of each other, but that wasn't strange. After all, each person who knows you has a different perception of who you are.
---
~~~

It worked!

Now, let's dive into ToT. A node is defined as follows:

class TreeNode:
    def __init__(self, state: str, thought: str, value: float = None):
        self.state = state
        self.thought = thought
        self.value = value
        self.children = []

We'll implement ToT using a multi-way tree data structure. In other words, each node is allowed to have more than two children. Hence, the children attribute is a list. (Note: Although ToT doesn't strictly require an explicit data structure, it helps in understanding the algorithm and in visualizing the tree.)
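
As a quick illustration, here's how a tiny tree could be assembled by hand (the thoughts and values below are made up):

root = TreeNode(state='', thought='')
plan_a = TreeNode(state='Plan A...', thought='Plan A...', value=4)
plan_b = TreeNode(state='Plan B...', thought='Plan B...', value=1)
root.children.extend([plan_a, plan_b]) # A node can hold any number of children.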

But what exactly is a thought? From the paper:

While CoT samples thoughts coherently without explicit decomposition, ToT leverages problem properties to design and decompose intermediate thought steps.

In other words, we need to precisely define what an intermediate thought is and what the output is. For the Creative Writing task, an intermediate thought is a writing plan, and the output is the passage.

As mentioned earlier, a state is simply the concatenation of all the thoughts so far (separated by '\n').

A value is a heuristic score assigned to a particular state. Values are used to prune unpromising nodes.

For the Creative Writing task, a customized version of the breadth-first search (BFS) algorithm is used. Here's how it works:

  1. Execution starts from the root node. Here, the thought is an empty string, and the state is also an empty string (since no thoughts have been generated yet). The root node can be thought of as level 0 of the tree.
  2. Now it's time for step 1. A thought generator is used to generate n_candidates i.i.d. intermediate thoughts (plans). Each of these thoughts is a child of the root node. Together, these nodes form level 1 of the tree. (For the Creative Writing task, the authors chose to set n_candidates to 5.)
  3. A state evaluator is used to vote on these plans n_evals times. (For the Creative Writing task, the authors chose to set n_evals to 5.)
  4. A heuristic calculator is used to collate these votes and assign a heuristic value to each plan. (The heuristic value is simply the total number of votes a plan received.)
  5. Time for pruning. The breadth_limit parameter refers to the number of most promising states to retain (after pruning) at each level of the tree. (For the Creative Writing task, the authors chose to set breadth_limit to 1. Hence, only the best plan is retained.)
  6. Now it's time for step 2. In this step, execution proceeds exactly as in points 2, 3, 4, and 5 above. In other words, the thought generator generates 5 outputs (passages). Together, these nodes form level 2 of the tree. The state evaluator votes on them 5 times. The heuristic calculator collates the votes and assigns a value to each node. Pruning retains the winning passage. (See the pseudo-code sketch right after this list.)
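
Here's a compact pseudo-Python sketch of the loop just described (the helpers generate_thoughts, evaluate_states, and top_k are illustrative placeholders, not part of the actual class we'll build):

frontier = [root] # Level 0 of the tree.
for step in range(1, n_steps + 1):
    # Expand: sample n_candidates i.i.d. thoughts from every retained node.
    children = [child for node in frontier
                for child in generate_thoughts(node, n_candidates)]
    # Evaluate: vote n_evals times and assign a heuristic value to each child.
    values = evaluate_states([child.state for child in children], n_evals)
    # Prune: keep only the breadth_limit most promising children.
    frontier = top_k(children, values, k=breadth_limit)
# The highest-value node of the final level holds the winning output.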

Here's a pictorial summary of the above:

For the Creative Writing task, the thought generation strategy used is 'sample'. This means sampling n_candidates thoughts in an i.i.d. fashion. From the paper:

This works better when the thought space is rich (e.g., each thought is a paragraph), and i.i.d. samples lead to diversity.

(We'll see a different thought generation strategy, 'propose', in the Game of 24 task. More on that later...)
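
In code, the 'sample' strategy boils down to a single call with n > 1. For example, reusing the prelim helper and the zero-shot prompt from earlier (a sketch):

# Sample 5 i.i.d. plans by stopping generation at 'Passage:'.
plans = prelim.chat_completions(zero_shot_cot_prompt, n=5, stop=['Passage:'])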

For the Creative Writing task, the state evaluation strategy used is 'vote'. Quoting the paper:

When problem success is harder to directly value (e.g., passage coherency), it is natural to instead compare different partial solutions and vote for the most promising one.

(We'll see another state evaluation strategy, 'value', in the Game of 24 task. More on that later...)

As it turns out, the state evaluator is the LLM itself. However, it (obviously) requires a different prompt than the thought generator. The state evaluation prompt is given by the following callable:

def get_state_eval_prompt(input_seq: str, states: List[str]) -> str:
    """Get state evaluation prompt.

    Keyword arguments:
    input_seq -- the input sequence
    states -- the states to vote on
    """
    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/text.py
    vote_prompt = '''Given an instruction and several choices, decide which choice is most promising. Analyze each choice in detail, then conclude in the last line "The best choice is {s}", where s the integer id of the choice.'''
    instruction = f"""Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be:

{input_seq}

Make a plan then write. Your output should be of the following format:

Plan:
Your plan here.

Passage:
Your passage here.
"""
    prompt = vote_prompt + '\n\nInstruction:\n' + instruction + '\n'
    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/tasks/text.py
    for i, state in enumerate(states, start=1):
        prompt += f'Choice {i}:\n{state}\n'
    return prompt

Don't worry if the above function seems unclear. We'll examine the state evaluation prompt in detail below.

For the Creative Writing task, the heuristic calculator is given by the following callable:

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/tasks/text.py
def heuristic_calculator(states: List[str], state_evals: List[str]) -> List[int]:
    n_candidates = len(states)
    vote_results = [0] * n_candidates
    for j in range(len(state_evals)):
        pattern = r".*best choice is .*(\d+).*"
        match = re.match(pattern, state_evals[j], re.DOTALL)
        if match:
            vote = int(match.groups()[0]) - 1
            if vote in range(n_candidates):
                vote_results[vote] += 1
        else:
            print(f'Warning! Did not get a regex match for the following state evaluation:\n{state_evals[j]}')
    return vote_results

Again, don't worry if the above function seems cryptic. We'll explore it in detail below.

For now, let's move on and write the TreeOfThoughts class for the Creative Writing task. It contains the following methods:

  • __init__
  • chat_completions
  • thought_generator
  • state_evaluator
  • bfs
  • generate_html_tree (a utility for generating an HTML representation of the tree)
  • render_html_tree (a utility for drawing the HTML representation of the tree)

class TreeOfThoughts:
    def __init__(
            self,
            client: Union[OpenAI, InferenceClient],
            model: str,
            input_seq: str,
            get_thought_gen_prompt: Callable,
            get_state_eval_prompt: Callable,
            heuristic_calculator: Callable
    ):
        self.client = client
        self.model = model # e.g., "gpt-4" if using `OpenAI` and "meta-llama/Meta-Llama-3.1-8B-Instruct" if using `InferenceClient`.
        self.input_seq = input_seq # Note: `input_seq` contains the input sequence ("x" in the ToT paper), before wrapping it with a prompt.
        self.root = TreeNode(state='', thought='')
        self.n_steps = 2 # 1 intermediate step + 1 output generation step.
        # Note: The tree height is equal to `n_steps + 1`. That is, we include the root node when calculating the tree height.
        self.thought_gen_strategy = 'sample'
        self.get_thought_gen_prompt = get_thought_gen_prompt
        self.n_candidates = 5 # The number of candidates (thoughts) to generate from a particular node. Also referred to as "size limit" and "k" in the ToT paper.
        self.stop_string = 'Passage:'
        self.state_eval_strategy = 'vote'
        self.get_state_eval_prompt = get_state_eval_prompt
        self.n_evals = 5 # The number of times to vote on the states.
        self.heuristic_calculator = heuristic_calculator
        self.breadth_limit = 1 # The number of most promising states to retain (after pruning) - at each level of the tree.

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/models.py
    def chat_completions(
            self,
            prompt: str,
            temperature: float = 0.7,
            max_tokens: int = 1000,
            n: int = 1,
            stop: Optional[List[str]] = None,
            **kwargs
    ) -> List[str]:
        outputs = []
        messages = [{'role': "user", 'content': prompt}]
        if isinstance(self.client, OpenAI):
            response = self.client.chat.completions.create(
                messages=messages,
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
                n=n, # The `n` responses are i.i.d.
                stop=stop,
                **kwargs
            )
            outputs.extend([choice.message.content for choice in response.choices])
        else: # `self.client` is an instance of `InferenceClient`.
            # The Hugging Face API doesn't support the `n` argument. Hence, we need to use a loop to generate `n` i.i.d. responses.
            for _ in range(n):
                response = self.client.chat.completions.create(
                    messages=messages,
                    model=self.model,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    stop=stop,
                    **kwargs
                )
                outputs.append(response.choices[0].message.content)
        return outputs

    def thought_generator(self, state: str, stop_string: Optional[List[str]] = None) -> List[str]:
        if self.thought_gen_strategy == 'sample':
            prompt = self.get_thought_gen_prompt(self.input_seq, state)
            thoughts = self.chat_completions(prompt, n=self.n_candidates, stop=stop_string)
            return thoughts
        else: # `self.thought_gen_strategy` is equal to 'propose'.
            pass

    def state_evaluator(self, states: List[str]) -> List[float]:
        if self.state_eval_strategy == 'vote':
            prompt = self.get_state_eval_prompt(self.input_seq, states)
            state_evals = self.chat_completions(prompt, n=self.n_evals)
            vote_results = self.heuristic_calculator(states, state_evals)
            return vote_results
        else: # `self.state_eval_strategy` is equal to 'value'.
            pass

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/methods/bfs.py
    def bfs(self, verbose: bool = True) -> str:
        queue = deque()
        queue.append(self.root)

        for step in range(1, self.n_steps + 1):
            if verbose:
                print(f"Step {step} (corresponding to level {step} of the tree):-\n---")
            for i in range(len(queue)):
                node = queue.popleft()
                if verbose:
                    print(f"Node {i + 1} in level {step}:-")
                    if node.state != "":
                        print(f"State of current node:-\n{node.state}\n---")
                    else:
                        print("State of current node:-\n<EMPTY STRING> (root node; no thoughts generated yet)\n---")

                if step == 1:
                    thoughts = self.thought_generator(state=node.state, stop_string=[self.stop_string])
                else:
                    thoughts = self.thought_generator(state=node.state)
                if node.state == '':
                    updated_states = thoughts
                else:
                    updated_states = [node.state + '\n' + thought for thought in thoughts]
                for j in range(len(thoughts)):
                    if verbose:
                        print(f"Thought candidate {j + 1}:-\n{thoughts[j]}\n---")
                    child = TreeNode(state=updated_states[j], thought=thoughts[j])
                    node.children.append(child)
                    queue.append(child)
                if verbose:
                    print("Each of the above thought candidates has been added as a child of the current node.\n---")

            if verbose:
                print("Using the state evaluator to obtain values...\n---")
            states = [node.state for node in queue]
            values = self.state_evaluator(states=states)
            for i in range(len(queue)):
                queue[i].value = values[i]
                if verbose:
                    print(f"Element {i + 1} in queue:-\n")
                    print(f"Value: {queue[i].value}\n---")

            if verbose:
                print("Initiating pruning (using the values obtained from the state evaluator).")
                print(f"Number of elements in queue: {len(queue)}")
            sorted_nodes = sorted(queue, key=lambda node: node.value, reverse=True)
            if step == self.n_steps:
                if verbose:
                    print("Since this is the last step, setting the breadth limit to 1.")
                    print("In other words, retaining only the highest value element (in this last step).\n---")
                top_b_nodes = sorted_nodes[:1]
            else:
                if verbose:
                    print(f"Since this isn't the last step, leaving the breadth limit {self.breadth_limit} unchanged.\n---")
                top_b_nodes = sorted_nodes[:self.breadth_limit]
            top_b_states = [node.state for node in top_b_nodes]
            for i in range(len(queue)):
                node = queue.popleft()
                if verbose:
                    print(f"Element {i + 1} in queue:-\n")
                if node.state in top_b_states:
                    if verbose:
                        print(f"Retaining this element as it's in the top {len(top_b_states)} elements.\n---")
                    queue.append(node)
                else:
                    if verbose:
                        print(f"Dropping this element as it's not in the top {len(top_b_states)} elements.\n---")

            if verbose:
                print("~~~")

        # Return the thought of the highest value node (from the last step):
        node = queue.popleft()
        return node.thought

    def generate_html_tree(self, node: TreeNode) -> str:
        if node is None:
            return ""
        else:
            html = f"""<div class='node'>
<p>State:<br>{node.state}</p>
<hr>
<p>Thought:<br>{node.thought}</p>
<hr>
<p>Value:<br>{node.value}</p>"""
            for child in node.children:
                html += f"""<div class='child'>{self.generate_html_tree(child)}</div>"""
            html += """</div>"""
            return html

    def render_html_tree(self):
        html_tree = self.generate_html_tree(self.root)
        wrapped_html = f"""<!DOCTYPE html>
<html>
<head>
    <style>
        .node {{
            display: inline-block;
            border: 1px solid blue;
            padding: 10px;
            margin: 5px;
            text-align: center;
        }}
        .child {{
            display: flex;
        }}
    </style>
</head>
<body>
    {html_tree}
</body>
</html>"""
        display(HTML(wrapped_html))

Let's instantiate our class.

tot = TreeOfThoughts(client, "gpt-4", input_seq, get_thought_gen_prompt, get_state_eval_prompt, heuristic_calculator)

But before we run the BFS algorithm, let's slow down a bit and simulate thought generation in the first step.

state = tot.root.state
thoughts = tot.thought_generator(state=state, stop_string=[tot.stop_string])
if state == '':
    updated_states = thoughts
else:
    updated_states = [state + '\n' + thought for thought in thoughts]
len(updated_states)
5

Five thoughts have been generated. Let's take a look at them.

for j in range(len(thoughts)):
    print(f"Thought candidate {j + 1}:-")
    print(thoughts[j])
    print("---\n")
Thought candidate 1:-
Plan:
1. Introduce a young boy learning acrobats and his ease in performing handstands.
2. Transition to a different character, an astronaut who experiences the strange smell of space.
3. Shift the narrative to a woman in a bar who uses sign language to ward off unwanted attention.
4. Conclude with a reflection on the varying perceptions of these characters by the people in their lives.


---

Thought candidate 2:-
Plan:
1. The first paragraph will detail the narrator's attempt at learning how to do a handstand. 
2. The second paragraph will transition to the narrator's dream of becoming an astronaut and his experiences in a simulated environment.
3. The third paragraph will delve into a romantic situation involving a woman who uses sign language as a way to deflect unwanted attention.
4. The fourth paragraph will reflect on the different perceptions people have of the narrator, in light of his experiences and actions.


---

Thought candidate 3:-
Plan:
1. Begin with a discussion on a gymnastics class, focusing on the instructor teaching how to do a handstand.
2. Transition to a character's first experience in space, with an unexpected sensory experience.
3. Introduce a new character who has a unique way of dealing with unwanted attention.
4. Conclude with a reflection on the nature of personal identity and perception.


---

Thought candidate 4:-
Plan:
In this passage, I will begin by talking about a gymnastics class where the instructor is teaching how to do a handstand. Then, the passage will transition to a man who is experiencing space for the first time and is surprised by what he senses. The third paragraph will introduce a woman who has a unique way of dealing with unwanted attention. Lastly, I will conclude with a commentary on how everyone has their own unique perception of us.


---

Thought candidate 5:-
Plan:
1. Introduce a character who is trying to learn a handstand.
2. Transition to the character's dream of becoming an astronaut, leading to a surprising fact about space.
3. Introduce another character, a woman, who has developed an interesting strategy to avoid unwanted advances.
4. Conclude with a reflection on the nature of perception and identity.


---

Next, as promised, let's take a closer look at the state evaluation prompt.

prompt = tot.get_state_eval_prompt(tot.input_seq, updated_states)
print(prompt)
Given an instruction and several choices, decide which choice is most promising. Analyze each choice in detail, then conclude in the last line "The best choice is {s}", where s the integer id of the choice.

Instruction:
Write a coherent passage of 4 short paragraphs. The end sentence of each paragraph must be:

1. It isn't difficult to do a handstand if you just stand on your hands.
2. It caught him off guard that space smelled of seared steak.
3. When she didn’t like a guy who was trying to pick her up, she started using sign language.
4. Each person who knows you has a different perception of who you are.

Make a plan then write. Your output should be of the following format:

Plan:
Your plan here.

Passage:
Your passage here.

Choice 1:
Plan:
1. Introduce a young boy learning acrobats and his ease in performing handstands.
2. Transition to a different character, an astronaut who experiences the strange smell of space.
3. Shift the narrative to a woman in a bar who uses sign language to ward off unwanted attention.
4. Conclude with a reflection on the varying perceptions of these characters by the people in their lives.


Choice 2:
Plan:
1. The first paragraph will detail the narrator's attempt at learning how to do a handstand. 
2. The second paragraph will transition to the narrator's dream of becoming an astronaut and his experiences in a simulated environment.
3. The third paragraph will delve into a romantic situation involving a woman who uses sign language as a way to deflect unwanted attention.
4. The fourth paragraph will reflect on the different perceptions people have of the narrator, in light of his experiences and actions.


Choice 3:
Plan:
1. Begin with a discussion on a gymnastics class, focusing on the instructor teaching how to do a handstand.
2. Transition to a character's first experience in space, with an unexpected sensory experience.
3. Introduce a new character who has a unique way of dealing with unwanted attention.
4. Conclude with a reflection on the nature of personal identity and perception.


Choice 4:
Plan:
In this passage, I will begin by talking about a gymnastics class where the instructor is teaching how to do a handstand. Then, the passage will transition to a man who is experiencing space for the first time and is surprised by what he senses. The third paragraph will introduce a woman who has a unique way of dealing with unwanted attention. Lastly, I will conclude with a commentary on how everyone has their own unique perception of us.


Choice 5:
Plan:
1. Introduce a character who is trying to learn a handstand.
2. Transition to the character's dream of becoming an astronaut, leading to a surprising fact about space.
3. Introduce another character, a woman, who has developed an interesting strategy to avoid unwanted advances.
4. Conclude with a reflection on the nature of perception and identity.

With this prompt in hand, we can simulate state evaluation in step 1.

state_evals = tot.chat_completions(prompt, n=tot.n_evals)
for i, eval in enumerate(state_evals, start=1):
    print(f"Vote {i}:")
    print("---")
    print(eval)
    print("---\n~~~")
Vote 1:
---
Analysis:

Choice 1: This plan provides a clear and logical structure that will allow for smooth transitions between each paragraph and incorporates all the given sentences. It does not, however, maintain a single point of view or theme between the paragraphs, which may lead to a disjointed narrative.

Choice 2: This plan maintains a consistent narrative perspective, focusing on the experiences of a single narrator. This will likely result in a more coherent narrative, but it may be challenging to convincingly incorporate the given sentences into this narrative.

Choice 3: Like choice 1, this plan provides a clear and logical structure but does not maintain a single point of view or theme between the paragraphs. This may lead to a disjointed narrative.

Choice 4: This plan is similar to choices 1 and 3, but it provides a slightly more detailed outline of what each paragraph will discuss. This may help in ensuring a smooth transition between each paragraph.

Choice 5: This plan introduces two characters and maintains a consistent theme of perception and identity throughout. This may result in a more coherent narrative than choices 1, 3, and 4, but the transition between the first two paragraphs may be challenging.

The best choice is 2.
---
~~~
Vote 2:
---
Analyzing each choice:

Choice 1: The plan is well-structured and clearly addresses the sentence prompts. Using different characters for each paragraph can make the passage a bit disjointed, but it does offer a diverse range of scenarios.

Choice 2: This plan effectively integrates the sentence prompts into a single narrative. The transitions between the paragraphs are smooth and the passage maintains a coherent focus on the narrator.

Choice 3: Similar to Choice 1, this plan uses different characters for each paragraph. While it addresses the sentence prompts, the transitions between the paragraphs may be abrupt.

Choice 4: This plan is similar to Choice 3 but lacks the specificity and clear transitions that make a passage coherent and engaging.

Choice 5: This plan also integrates the sentence prompts into a single narrative. However, the transitions between paragraphs are not as smoothly outlined as in Choice 2, which may affect the coherence of the passage.

The best choice is 2.
---
~~~
Vote 3:
---
Analysis:

Choice 1: This plan seems well-structured and the transitions between paragraphs seem logical. However, it might be challenging to tie all these characters together coherently in just four paragraphs.

Choice 2: This plan follows the narrator through different stages of his life, which can make the passage more coherent and relatable. However, it might be hard to smoothly transition from a gymnastics class to space simulation.

Choice 3: This plan also introduces too many characters and it might be difficult to tie them all in a coherent story. However, it does follow the instructions closely.

Choice 4: This plan is similar to choice 3, but it lacks the detail that might make the passage engaging and interesting. It does follow the instructions closely.

Choice 5: This plan seems to introduce less characters and might be easier to execute in a coherent, engaging passage. It also follows the instructions closely.

The best choice is 5.
---
~~~
Vote 4:
---
Analyzing each choice:

Choice 1:
This plan offers a clear transition between the different characters and their experiences. It does not simply focus on one character, but rather tells the stories of several characters, providing a more diverse and interesting narrative. The conclusion brings all the narratives together, reflecting on the varying perceptions of these characters.

Choice 2:
This plan focuses on one character, the narrator, who experiences all the situations mentioned in the end sentences. This could provide a more in-depth exploration of a single character, but it may also limit the variety of experiences and perspectives.

Choice 3:
This plan offers a variety of experiences and perspectives, similar to Choice 1. However, it doesn't clearly specify how it will connect these different experiences and perspectives in the conclusion, which might lead to a less coherent narrative.

Choice 4:
This plan is very similar to Choice 3, but it does offer a clear conclusion that ties everything together. It also specifies that it will introduce a commentary on personal perception, which may provide a deeper exploration of the theme.

Choice 5:
This plan also offers a variety of experiences and characters, similar to Choice 1 and 3. However, it doesn't clearly specify how it will connect these different experiences and perspectives in the conclusion, which might lead to a less coherent narrative.

The best choice is 1. It provides a coherent plan for introducing multiple characters and experiences, while ensuring that these are tied together in the conclusion. It also offers the opportunity to explore a variety of perspectives, which may lead to a richer narrative.
---
~~~
Vote 5:
---
Analyzing each choice:

Choice 1: This plan is good as it gives a clear layout of the story. However, it may lack coherence as it jumps between three different characters. The transition between these characters might be difficult to make smoothly.

Choice 2: This plan maintains the same character throughout the passage, ensuring better coherence. It provides a clear transition between each paragraph and maintains a consistent narrative voice.

Choice 3: This plan is similar to Choice 1, with its use of different characters, which could potentially disrupt the coherence of the passage. It also doesn't clearly explain how the story will transition between characters.

Choice 4: This plan is also similar to Choice 1, but it lacks the explicit detail of how the story will transition between characters. It may also disrupt the coherence due to the abrupt shifts between characters.

Choice 5: This plan is also similar to Choice 1, and it suffers from the same potential issues. It does not clearly explain how the story will transition between characters.

The best choice is 2.
---
~~~

Next, as promised, let's examine the heuristic calculator in detail.

An array of zeros is initialized as follows:

n_candidates = len(updated_states)
vote_results = [0] * n_candidates
vote_results
[0, 0, 0, 0, 0]

The votes are then tallied using the following loop. In each iteration, a regular expression is used to find which choice the LLM voted for.

for j in range(len(state_evals)):
    pattern = r".*best choice is .*(\d+).*"
    match = re.match(pattern, state_evals[j], re.DOTALL)
    if match:
        vote = int(match.groups()[0]) - 1
        if vote in range(n_candidates):
            vote_results[vote] += 1
    else:
        print(f'Warning! Did not get a regex match for the following state evaluation:\n\n{state_evals[j]}')
vote_results
[1, 3, 0, 0, 1]

What about pruning? How does it work?

We've implemented the BFS algorithm using a queue. Suppose our queue contains the following objects (each representing a node).

queue = deque()
queue.append({'state': "q", 'value': 1})
queue.append({'state': "t", 'value': 5})
queue.append({'state': "w", 'value': 2})
queue.append({'state': "r", 'value': 4})
queue.append({'state': "e", 'value': 3})

Suppose the breadth_limit we've chosen is 3. In other words, we want to retain the 3 nodes with the highest values.

breadth_limit = 3
top_b_nodes = sorted(queue, key=lambda node: node['value'], reverse=True)[:breadth_limit]
top_b_nodes
[{'state': 't', 'value': 5},
 {'state': 'r', 'value': 4},
 {'state': 'e', 'value': 3}]

From the above, we can create a list containing the top 3 states.

top_b_states = [node['state'] for node in top_b_nodes]
top_b_states
['t', 'r', 'e']

Now, we'll use a loop to dequeue (popleft) each node. If the node's state is in top_b_states, we enqueue (append) it back.

for i in range(len(queue)):
    node = queue.popleft()
    if node['state'] in top_b_states:
        queue.append(node)

for node in queue:
    print(node)
{'state': 't', 'value': 5}
{'state': 'r', 'value': 4}
{'state': 'e', 'value': 3}

The pruning simulation is complete! (The pruning logic above is part of the bfs method.)

Finally, let's actually call the bfs method. By passing verbose=True, we can watch the BFS algorithm in action.

output = tot.bfs(verbose=True)
print(output)
Step 1 (corresponding to level 1 of the tree):-
---
Node 1 in level 1:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Thought candidate 1:-
Plan:
1. Introduce a character who likes to do handstands in his spare time and how he has mastered this skill.
2. The same character gets the opportunity to go to space and his surprising discovery there.
3. Introduce a female character who has her own unique way of dealing with unwanted attention.
4. Discuss how every individual has their own perception of a person based on their interactions and experiences with them.


---
Thought candidate 2:-
Plan:
In the first paragraph, introduce a gymnastics class where a trainer is teaching learners how to do a handstand. In the second paragraph, shift to the story of an astronaut on his first space mission. In the third paragraph, introduce a woman with a unique strategy to avoid unwanted attention in a bar. Finally, in the fourth paragraph, discuss how different people have different perceptions of the same person, reflecting on the earlier characters.


---
Thought candidate 3:-
Plan:
1. Introduce the protagonist's experience with gymnastics and how he found the handstand easy.
2. Shift to the protagonist's experience as an astronaut and his surprise at the smell of space.
3. Introduce a female character and her clever tactic to deal with unwanted attention.
4. Discuss the idea of perceptions and how it varies from person to person.


---
Thought candidate 4:-
Plan:
The first paragraph will establish the context of a gymnastics class where the protagonist is learning to do a handstand. The second paragraph will transition to a discussion about the protagonist's interests, specifically his fascination with space travel. The third paragraph will introduce a new character, who is a friend of the protagonist and has a unique way of deflecting unwanted attention. The last paragraph will elaborate on the protagonist's reflections about individual perceptions and how they shape our identity.


---
Thought candidate 5:-
Plan:
In this passage, we will start by introducing an adventurous protagonist who is always up for a challenge and loves to learn, using the example of a handstand as a metaphor for his approach to life. We will then move into the protagonist's journey into becoming an astronaut, where he experiences the unexpected smell of space. In the third paragraph, we will introduce a love interest who has a unique way of warding off unwanted suitors. The final paragraph will delve into the protagonist's introspective revelation about the nature of identity and perception.


---
Each of the above thought candidates has been added as a child of the current node.
---
Using the state evaluator to obtain values...
---
Element 1 in queue:-

Value: 0
---
Element 2 in queue:-

Value: 0
---
Element 3 in queue:-

Value: 0
---
Element 4 in queue:-

Value: 1
---
Element 5 in queue:-

Value: 4
---
Initiating pruning (using the values obtained from the state evaluator).
Number of elements in queue: 5
Since this isn't the last step, leaving the breadth limit 1 unchanged.
---
Element 1 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 2 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 3 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 4 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 5 in queue:-

Retaining this element as it's in the top 1 elements.
---
~~~
Step 2 (corresponding to level 2 of the tree):-
---
Node 1 in level 2:-
State of current node:-
Plan:
In this passage, we will start by introducing an adventurous protagonist who is always up for a challenge and loves to learn, using the example of a handstand as a metaphor for his approach to life. We will then move into the protagonist's journey into becoming an astronaut, where he experiences the unexpected smell of space. In the third paragraph, we will introduce a love interest who has a unique way of warding off unwanted suitors. The final paragraph will delve into the protagonist's introspective revelation about the nature of identity and perception.


---
Thought candidate 1:-
Passage:
The protagonist was always up for a challenge; he thrived on overcoming obstacles and learning new things. He was the kind of person who would see the world from a different angle, literally and metaphorically. One day, he watched a group of people doing handstands, and they made it look so easy. He decided to give it a try, and after a few failed attempts, he finally managed to stay upright. He laughed to himself as he thought, it isn't difficult to do a handstand if you just stand on your hands.

His adventurous spirit eventually led him into the field of space exploration. After years of rigorous training, he found himself floating in the vast expanse of space, looking at Earth from a completely different perspective. He took off his helmet inside the spaceship and was taken aback by the smell. It was not the sterile smell of the spaceship interior that he expected, but a rich, almost comforting aroma. It caught him off guard that space smelled of seared steak.

Back on Earth, he met a woman who was as unique and intriguing as the cosmos he had explored. She was not easily impressed and had her own ways of dealing with unwanted attention. Once, in a crowded bar, he saw her being approached by a man who was clearly not taking no for an answer. Instead of responding verbally, she started communicating with him in sign language. When she didn’t like a guy who was trying to pick her up, she started using sign language.

As he got to know her better, he started reflecting on his own life and how people perceived him. He realized that everyone he met saw a different version of him based on their interactions. His family saw him as a dreamer, his colleagues as a daredevil, and she saw him as a kindred spirit. Each person who knows you has a different perception of who you are.
---
Thought candidate 2:-
Passage:
Tom was a man of adventure, always seeking new challenges and experiences. He reveled in the thrill of uncertainty, the rush of adrenaline, and the pure joy that came with accomplishment. His friends often marveled at his fearlessness, to which Tom would laugh and say, "fear is simply a matter of perspective." He believed that the key to overcoming any obstacle was to approach it head on, and with a bit of creativity. After all, he would often quip, it isn't difficult to do a handstand if you just stand on your hands.

This adventurous spirit led Tom to pursue a career as an astronaut. The idea of exploring the unknown, of venturing into a place few have ever been, was tantalizing. When he finally made it into space, he was awed by the beauty of the cosmos. But one thing he hadn't expected was the smell. As he removed his helmet inside the space station, he was met with a scent that was unmistakably familiar yet oddly out of place. It caught him off guard that space smelled of seared steak.

Back on Earth, Tom met a woman named Maya, who was as intriguing as she was beautiful. She had a passion for learning, just like Tom, and had even picked up sign language as a hobby. Maya had a unique approach to dealing with unwanted advances. Instead of giving the usual excuses or ignoring the person, she would simply switch to sign language. When she didn’t like a guy who was trying to pick her up, she started using sign language.

As Tom got to know Maya, he began to reflect on the nature of identity and perception. He realized that while he saw himself as an adventurous astronaut, others might see him as a daredevil, a risk-taker, or even a reckless thrill-seeker. Maya, on the other hand, saw him as a man of curiosity, courage, and resilience. This led him to an important realization: each person who knows you has a different perception of who you are.
---
Thought candidate 3:-
Passage:
The protagonist, a young man of distinct courage and curiosity, had always been one for challenges. From a young age, he had a knack for trying new things, no matter how daunting they seemed. He had a way of simplifying complex tasks, breaking them down into manageable steps, and thereby making the impossible seem possible. When asked how he had mastered the art of doing a handstand, he'd always reply with a smirk, "It isn't difficult to do a handstand if you just stand on your hands."

As he grew older, his thirst for adventure and knowledge led him to pursue a career that was not only challenging but also out of this world, literally. He became an astronaut, a profession that took him to the stars and beyond. His first journey into space was nothing short of extraordinary. One experience, in particular, was unexpected. As he took off his helmet inside the spaceship, he was surprised by a distinct smell. It caught him off guard that space smelled of seared steak.

Back on earth, he met an intriguing woman who was as unique as she was beautiful. She had a whimsical sense of humor and a knack for dealing with unwanted attention. Whenever a man approached her with an intention she didn't appreciate, she had a unique way of warding them off. When she didn’t like a guy who was trying to pick her up, she started using sign language.

Throughout these experiences, he realized something profound about the nature of identity. He understood that the way he saw himself, the adventurous child, the daring astronaut, and the man smitten by love, was unique to him. Similarly, each person in his life, from his parents to his colleagues, and the woman he loved, saw a different version of him based on their interactions and experiences. Each person who knows you has a different perception of who you are.
---
Thought candidate 4:-
Passage:
Our protagonist has always been an adventurous soul, always up to conquer the impossible. Whether it's climbing the highest mountain or diving deep into the sea, he loves to challenge himself. He takes on every challenge as if it were as simple as standing on your hands. As he often says, "It isn't difficult to do a handstand if you just stand on your hands."

Having conquered most challenges on Earth, he set his eyes on the final frontier - space. Training to become an astronaut, he learned and experienced things beyond his wildest dreams. His first journey into space brought an unexpected surprise. As he floated in the vacuum of space, he opened his helmet visor and was hit with a smell that was strange yet oddly familiar. It caught him off guard that space smelled of seared steak.

Back on earth, amid all his adventures and achievements, he met a woman who was as unique and intriguing as space itself. She was not easily impressed, and she had a unique way of dealing with men she wasn't interested in. Once, when a rather persistent suitor tried his luck, she simply switched to sign language. When she didn’t like a guy who was trying to pick her up, she started using sign language.

As our protagonist navigated through these diverse experiences, he began to realize a profound truth about identity and perception. He was a different person to each individual he met - an adventurer to some, a reckless risk-taker to others, a hero to many, and a mystery to the woman he loved. He understood that each person who knows you has a different perception of who you are.
---
Thought candidate 5:-
Passage:
Our protagonist, John, was an adventurous spirit, always eager to take on new challenges. He had a knack for making the seemingly demanding tasks seem simple, much like his unique perspective on handstands. To John, handstands weren't about balance or strength, but a simple reorientation of perspective. After all, he'd say with a smirk, "It isn't difficult to do a handstand if you just stand on your hands."

His thirst for adventure eventually led John to become an astronaut. The training was rigorous, and the anticipation of the unknown was exhilarating. However, nothing prepared him for his first spacewalk. As he popped open his suit's visor, an unexpected smell wafted into his nostrils. It was a surreal moment, heightened by the realization that space smelled of seared steak. It caught him off guard.

Back on Earth, John met Lily, a vibrant woman with a sparkling wit and an unusual strategy for dealing with unwanted advances. Lily was deaf, but she didn't let that define her; instead, she used it to her advantage. When she didn’t like a guy who was trying to pick her up, she started using sign language, a tactic that left most suitors confused and quickly deterred.

John's travels and encounters with people like Lily made him realize that everyone he met had a different perception of him. Some saw him as the daring astronaut, others as the curious man who saw handstands in an unusual light, and to Lily, he was a patient man who took the time to learn sign language. It was a profound realization: each person who knows you has a different perception of who you are.
---
Each of the above thought candidates has been added as a child of the current node.
---
Using the state evaluator to obtain values...
---
Element 1 in queue:-

Value: 3
---
Element 2 in queue:-

Value: 0
---
Element 3 in queue:-

Value: 0
---
Element 4 in queue:-

Value: 0
---
Element 5 in queue:-

Value: 2
---
Initiating pruning (using the values obtained from the state evaluator).
Number of elements in queue: 5
Since this is the last step, setting the breadth limit to 1.
In other words, retaining only the highest value element (in this last step).
---
Element 1 in queue:-

Retaining this element as it's in the top 1 elements.
---
Element 2 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 3 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 4 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 5 in queue:-

Dropping this element as it's not in the top 1 elements.
---
~~~
Passage:
The protagonist was always up for a challenge; he thrived on overcoming obstacles and learning new things. He was the kind of person who would see the world from a different angle, literally and metaphorically. One day, he watched a group of people doing handstands, and they made it look so easy. He decided to give it a try, and after a few failed attempts, he finally managed to stay upright. He laughed to himself as he thought, it isn't difficult to do a handstand if you just stand on your hands.

His adventurous spirit eventually led him into the field of space exploration. After years of rigorous training, he found himself floating in the vast expanse of space, looking at Earth from a completely different perspective. He took off his helmet inside the spaceship and was taken aback by the smell. It was not the sterile smell of the spaceship interior that he expected, but a rich, almost comforting aroma. It caught him off guard that space smelled of seared steak.

Back on Earth, he met a woman who was as unique and intriguing as the cosmos he had explored. She was not easily impressed and had her own ways of dealing with unwanted attention. Once, in a crowded bar, he saw her being approached by a man who was clearly not taking no for an answer. Instead of responding verbally, she started communicating with him in sign language. When she didn’t like a guy who was trying to pick her up, she started using sign language.

As he got to know her better, he started reflecting on his own life and how people perceived him. He realized that everyone he met saw a different version of him based on their interactions. His family saw him as a dreamer, his colleagues as a daredevil, and she saw him as a kindred spirit. Each person who knows you has a different perception of who you are.

好的。是时候可视化这棵树了。

tot.render_html_tree()

注意: HTML 树在此博客文章中未能正确呈现。(但在 Colab/Jupyter 中完美呈现。)因此,我已将树保存为 HTML 文件,您可以在此处查看。以下是相同内容的截图

在上述可视化中,每个框代表一个节点。嵌套的框表示后代。(由于剪枝,并非所有节点都有子节点。)

希望您到目前为止喜欢这篇博客文章!下一节关于“24点游戏”任务有点长且细致入微。所以,如果您需要,现在是喝咖啡/茶的好时机:)

24点游戏

24点游戏任务中,LLM 会收到一个包含四个数字的输入序列。任务是生成一个方程(仅使用+-*/运算符),将这四个数字组合起来得到24。(每个数字在方程中只能使用一次。)

例如,如果输入序列是 "4 9 10 13",那么一个有效的输出是方程 "(13 - 9) * (10 - 4) = 24"
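为了更具体地感受这个任务,下面给出一个与 ToT 完全无关的暴力求解器最小草图(函数名 solve_24_left_assoc 是为演示而取的;它只枚举左结合的括号方式,因此并不能覆盖所有有解的输入):

# A minimal brute-force solver sketch, included only to illustrate the task itself (unrelated to ToT).
from itertools import permutations, product

def solve_24_left_assoc(numbers, target=24.0, eps=1e-6):
    # Note: for brevity, only the left-associative bracketing ((a o b) o c) o d is enumerated,
    # so this sketch misses inputs whose solutions need e.g. the (a o b) o (c o d) form.
    for a, b, c, d in permutations(numbers):
        for o1, o2, o3 in product('+-*/', repeat=3):
            expr = f'(({a} {o1} {b}) {o2} {c}) {o3} {d}'
            try:
                if abs(eval(expr) - target) < eps:
                    return expr
            except ZeroDivisionError:
                continue
    return None

print(solve_24_left_assoc([4, 9, 10, 13])) # Might print a solution such as '((9 + 10) - 13) * 4'.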

注意:上表中的“#ToT 步骤”指的是中间步骤的数量。对于24点游戏任务,有三个中间步骤:每个中间步骤都是一个中间方程(如上表所示)。

在深入了解 ToT 之前,让我们看看如何使用少量样本思维链 (CoT) 方法来解决这个问题。

我们来看下面的例子

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/data/24/24.csv
input_seq = '1 1 1 8'
# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
five_shot_cot_prompt = f'''Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number.
Input: 4 4 6 8
Steps:
4 + 8 = 12 (left: 4 6 12)
6 - 4 = 2 (left: 2 12)
2 * 12 = 24 (left: 24)
Answer: (6 - 4) * (4 + 8) = 24
Input: 2 9 10 12
Steps:
12 * 2 = 24 (left: 9 10 24)
10 - 9 = 1 (left: 1 24)
24 * 1 = 24 (left: 24)
Answer: (12 * 2) * (10 - 9) = 24
Input: 4 9 10 13
Steps:
13 - 10 = 3 (left: 3 4 9)
9 - 3 = 6 (left: 4 6)
4 * 6 = 24 (left: 24)
Answer: 4 * (9 - (13 - 10)) = 24
Input: 1 4 8 8
Steps:
8 / 4 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: (1 + 8 / 4) * 8 = 24
Input: 5 5 5 9
Steps:
5 + 5 = 10 (left: 5 9 10)
10 + 5 = 15 (left: 9 15)
15 + 9 = 24 (left: 24)
Answer: ((5 + 5) + 5) + 9 = 24
Input: {input_seq}
'''
print(five_shot_cot_prompt)
Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number.
Input: 4 4 6 8
Steps:
4 + 8 = 12 (left: 4 6 12)
6 - 4 = 2 (left: 2 12)
2 * 12 = 24 (left: 24)
Answer: (6 - 4) * (4 + 8) = 24
Input: 2 9 10 12
Steps:
12 * 2 = 24 (left: 9 10 24)
10 - 9 = 1 (left: 1 24)
24 * 1 = 24 (left: 24)
Answer: (12 * 2) * (10 - 9) = 24
Input: 4 9 10 13
Steps:
13 - 10 = 3 (left: 3 4 9)
9 - 3 = 6 (left: 4 6)
4 * 6 = 24 (left: 24)
Answer: 4 * (9 - (13 - 10)) = 24
Input: 1 4 8 8
Steps:
8 / 4 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: (1 + 8 / 4) * 8 = 24
Input: 5 5 5 9
Steps:
5 + 5 = 10 (left: 5 9 10)
10 + 5 = 15 (left: 9 15)
15 + 9 = 24 (left: 24)
Answer: ((5 + 5) + 5) + 9 = 24
Input: 1 1 1 8

在上述提示中,提供了五个“输入”、“步骤”和“答案”示例,然后是新的“输入”。希望 LLM 能够进行上下文学习,为新的“输入”生成适当的“步骤”和“答案”。让我们试一试。

responses = prelim.chat_completions(five_shot_cot_prompt, n=1)
print(responses[0])
Steps:
1 + 1 = 2 (left: 1 2 8)
2 * 8 = 16 (left: 1 16)
16 + 8 = 24 (left: 24)
Answer: ((1 + 1) * 8) + 1 = 24

少量样本 CoT 在这个例子上失败了:第 3 步重复使用了已经消耗掉的 8(当时剩下的数字是 1 和 16),而最终答案 ((1 + 1) * 8) + 1 实际上等于 17,而非 24。ToT 能做得更好吗?让我们拭目以待。
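顺带一提,我们可以用一小段代码程序化地验证这类最终答案。下面是一个最小草图(假设:答案行总是形如 "Answer: <表达式> = 24";函数 is_answer_correct 是为演示而写的,它只检查数值,不检查每个输入数字是否恰好使用一次):

# A minimal answer-checking sketch. Assumption: the answer line always looks like "Answer: <expression> = 24".
import re

def is_answer_correct(answer_line: str, target: float = 24.0) -> bool:
    expression = answer_line.lower().replace('answer: ', '').split('=')[0]
    # Only allow digits, whitespace, brackets and the four operators, so that `eval` is safe here.
    if not re.fullmatch(r'[\d\s+\-*/().]+', expression):
        return False
    try:
        # Note: this doesn't check that each input number is used exactly once.
        return abs(eval(expression) - target) < 1e-6
    except ZeroDivisionError:
        return False

print(is_answer_correct('Answer: ((1 + 1) * 8) + 1 = 24')) # False; the expression actually equals 17.
print(is_answer_correct('Answer: (1 + (1 + 1)) * 8 = 24')) # True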

我们首先需要一个合适的思维生成策略。

回想一下,在创意写作任务中,我们使用的是 'sample' 思想生成策略,即以独立同分布的方式采样 n_candidates 个思想。然而,当每个思想都非常短(例如只有一个词或一行)时,这种独立同分布的采样会产生大量重复的思想。

为了避免这个问题,作者在24点游戏任务中采用了不同的思想生成策略:'propose'。'propose' 策略使用建议提示(propose prompt)来顺序地生成思想,生成的思想之间以分隔符(如 '\n')隔开。论文中写道:

当思想空间更受约束(例如,每个思想只是一词或一行)时,此策略效果更好,因为在同一上下文中提出不同思想可以避免重复。

让我们通过一个例子来具体化这个想法。以下是中间步骤的建议提示:

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
remaining_numbers = input_seq
one_shot_propose_prompt = f'''Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: {remaining_numbers}
Possible next steps:
'''
print(one_shot_propose_prompt)
Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: 1 1 1 8
Possible next steps:

这是一个单样本提示,包含一个“输入”和“可能的下一步”的示例。希望 LLM 能够进行上下文学习,为新的“输入”(剩余数字)生成各种“可能的下一步”。(有趣的是,上面的提示不包含任务特定的指令,即它没有告诉 LLM 任何关于24点游戏任务的信息。)

让我们看看 LLM 生成了什么想法。

responses = prelim.chat_completions(one_shot_propose_prompt, n=1)
thoughts = responses[0].split('\n')
thoughts
['1 + 1 = 2 (left: 1 2 8)',
 '1 * 1 = 1 (left: 1 1 8)',
 '8 / 1 = 8 (left: 1 1 8)',
 '8 - 1 = 7 (left: 1 1 7)',
 '8 * 1 = 8 (left: 1 1 8)',
 '1 * 8 = 8 (left: 1 1 8)',
 '8 + 1 = 9 (left: 1 1 9)']

注意: 回想一下,使用 'sample' 策略时,我们指定了 n_candidates,即每个思想生成步骤中要生成的思想数量。而使用 'propose' 策略时,我们不指定 n_candidates,而是把候选思想的数量交给 LLM 来决定。

对于下一个思想生成步骤,我们需要处理剩余的数字(例如,"1 2 8")。因此,让我们编写一个函数来从思想中提取剩余的数字。

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/tasks/game24.py
def get_remaining_numbers(thought: str) -> str:
    return thought.split('left: ')[-1].split(')')[0]

我们来试一下上面其中一个想法。

print(thoughts[0])
remaining_numbers = get_remaining_numbers(thoughts[0])
remaining_numbers
1 + 1 = 2 (left: 1 2 8)
'1 2 8'

使用上述剩余数字,我们的单样本建议提示现在如下

one_shot_propose_prompt = f'''Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: {remaining_numbers}
Possible next steps:
'''
print(one_shot_propose_prompt)
Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: 1 2 8
Possible next steps:

让我们看看 LLM 生成了什么想法。

responses = prelim.chat_completions(one_shot_propose_prompt, n=1)
thoughts = responses[0].split('\n')
thoughts
['1 + 2 = 3 (left: 3 8)',
 '8 - 1 = 7 (left: 2 7)',
 '8 - 2 = 6 (left: 1 6)',
 '2 * 1 = 2 (left: 2 8)',
 '8 / 2 = 4 (left: 1 4)',
 '8 / 1 = 8 (left: 2 8)']

让我们从上述某个想法中提取剩余的数字。

print(thoughts[0])
remaining_numbers = get_remaining_numbers(thoughts[0])
remaining_numbers
1 + 2 = 3 (left: 3 8)
'3 8'

使用上述剩余数字,我们的单样本建议提示现在如下

one_shot_propose_prompt = f'''Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: {remaining_numbers}
Possible next steps:
'''
print(one_shot_propose_prompt)
Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: 3 8
Possible next steps:

让我们看看 LLM 生成了什么想法。

responses = prelim.chat_completions(one_shot_propose_prompt, n=1)
thoughts = responses[0].split('\n')
thoughts
['3 + 8 = 11 (left: 11)',
 '8 - 3 = 5 (left: 5)',
 '3 * 8 = 24 (left: 24)',
 '8 / 3 = 2.67 (left: 2.67)']

我们看到其中一个想法是 "3 * 8 = 24 (left: 24)"。换句话说,经由思想生成步骤 1、2 和 3,至少存在一条能够达到 24 的成功搜索路径。
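另外值得注意的是,LLM 提出的思想并不保证算术正确。下面是一个最小的健全性检查草图(假设:每个思想行都形如 "a op b = c (left: ...)";函数 is_thought_arithmetically_valid 是为演示而写的):

# A minimal sanity-check sketch. Assumption: every thought line looks like "a op b = c (left: ...)".
import re

def is_thought_arithmetically_valid(thought: str) -> bool:
    m = re.match(r'\s*([\d.]+)\s*([+\-*/])\s*([\d.]+)\s*=\s*([\d.]+)', thought)
    if m is None:
        return False
    a, op, b, c = float(m.group(1)), m.group(2), float(m.group(3)), float(m.group(4))
    if op == '+':
        result = a + b
    elif op == '-':
        result = a - b
    elif op == '*':
        result = a * b
    elif b != 0:
        result = a / b
    else:
        return False
    return abs(result - c) < 0.01 # Tolerate rounded intermediate results such as "8 / 3 = 2.67".

print(is_thought_arithmetically_valid('3 * 8 = 24 (left: 24)')) # True
print(is_thought_arithmetically_valid('16 + 8 = 23 (left: 23)')) # False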

既然中间思想已经生成,我们接下来需要生成最终输出。(这可以视为思想生成步骤 4。)

例如,假设特定节点的状态如下:

thoughts = ['1 + 1 = 2 (left: 1 2 8)', '1 + 2 = 3 (left: 3 8)', '3 * 8 = 24 (left: 24)']
state =  '\n'.join(thoughts)
print(state)
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)

上述状态包含了所有正确的中间思维,能够生成输出 "Answer: (1 + (1 + 1)) * 8 = 24"。由于这个输出生成任务与之前生成中间思维的任务非常不同,它的提示也会看起来非常不同。提示如下:

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
five_shot_cot_prompt = f'''Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number.
Input: 4 4 6 8
Steps:
4 + 8 = 12 (left: 4 6 12)
6 - 4 = 2 (left: 2 12)
2 * 12 = 24 (left: 24)
Answer: (6 - 4) * (4 + 8) = 24
Input: 2 9 10 12
Steps:
12 * 2 = 24 (left: 9 10 24)
10 - 9 = 1 (left: 1 24)
24 * 1 = 24 (left: 24)
Answer: (12 * 2) * (10 - 9) = 24
Input: 4 9 10 13
Steps:
13 - 10 = 3 (left: 3 4 9)
9 - 3 = 6 (left: 4 6)
4 * 6 = 24 (left: 24)
Answer: 4 * (9 - (13 - 10)) = 24
Input: 1 4 8 8
Steps:
8 / 4 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: (1 + 8 / 4) * 8 = 24
Input: 5 5 5 9
Steps:
5 + 5 = 10 (left: 5 9 10)
10 + 5 = 15 (left: 9 15)
15 + 9 = 24 (left: 24)
Answer: ((5 + 5) + 5) + 9 = 24
Input: {input_seq}
Steps:
{state}
'''
print(five_shot_cot_prompt)
Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number.
Input: 4 4 6 8
Steps:
4 + 8 = 12 (left: 4 6 12)
6 - 4 = 2 (left: 2 12)
2 * 12 = 24 (left: 24)
Answer: (6 - 4) * (4 + 8) = 24
Input: 2 9 10 12
Steps:
12 * 2 = 24 (left: 9 10 24)
10 - 9 = 1 (left: 1 24)
24 * 1 = 24 (left: 24)
Answer: (12 * 2) * (10 - 9) = 24
Input: 4 9 10 13
Steps:
13 - 10 = 3 (left: 3 4 9)
9 - 3 = 6 (left: 4 6)
4 * 6 = 24 (left: 24)
Answer: 4 * (9 - (13 - 10)) = 24
Input: 1 4 8 8
Steps:
8 / 4 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: (1 + 8 / 4) * 8 = 24
Input: 5 5 5 9
Steps:
5 + 5 = 10 (left: 5 9 10)
10 + 5 = 15 (left: 9 15)
15 + 9 = 24 (left: 24)
Answer: ((5 + 5) + 5) + 9 = 24
Input: 1 1 1 8
Steps:
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)

您可能已经注意到,这与之前的五样本 CoT 提示完全相同,只是我们也注入了中间步骤。这次 LLM 能够生成正确的输出吗?

responses = prelim.chat_completions(five_shot_cot_prompt, n=1)
thoughts = responses[0].split('\n')
thoughts
['Answer: (1 + 1 + 1) * 8 = 24']

是的,它能!

现在,上述工作流程引出了一个小问题。有两个独立的提示(一个用于生成中间思想,一个用于生成最终答案)。此外,我们正在使用一个外部函数 get_remaining_numbers 从中间思想中提取剩余数字。但是我们需要向我们的 TreeOfThoughts 类传递一个单一的可调用函数 get_thought_gen_prompt(它返回正确的提示)。我们如何做到这一点?

以下可调用函数可以完成此任务

def get_thought_gen_prompt(input_seq: str, state: str) -> str:
    """Get thought generation prompt.

    Keyword arguments:
    input_seq -- the input sequence (comprising four numbers, e.g., '1 1 1 8')
    state -- concatenation of all the thoughts so far (separated by '\n')
    """

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/tasks/game24.py
    def get_remaining_numbers(thought: str) -> str:
        return thought.split('left: ')[-1].split(')')[0]

    if state == '': # Root node; no thoughts have been generated yet.
        remaining_numbers = input_seq
    else:
        last_thought = state.strip().split('\n')[-1]
        remaining_numbers = get_remaining_numbers(last_thought)

    if remaining_numbers != '24': # Intermediate step.
        # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
        prompt = f'''Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: {remaining_numbers}
Possible next steps:
'''
    else: # Last (output generation) step.
        # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
        prompt = f'''Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number.
Input: 4 4 6 8
Steps:
4 + 8 = 12 (left: 4 6 12)
6 - 4 = 2 (left: 2 12)
2 * 12 = 24 (left: 24)
Answer: (6 - 4) * (4 + 8) = 24
Input: 2 9 10 12
Steps:
12 * 2 = 24 (left: 9 10 24)
10 - 9 = 1 (left: 1 24)
24 * 1 = 24 (left: 24)
Answer: (12 * 2) * (10 - 9) = 24
Input: 4 9 10 13
Steps:
13 - 10 = 3 (left: 3 4 9)
9 - 3 = 6 (left: 4 6)
4 * 6 = 24 (left: 24)
Answer: 4 * (9 - (13 - 10)) = 24
Input: 1 4 8 8
Steps:
8 / 4 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: (1 + 8 / 4) * 8 = 24
Input: 5 5 5 9
Steps:
5 + 5 = 10 (left: 5 9 10)
10 + 5 = 15 (left: 9 15)
15 + 9 = 24 (left: 24)
Answer: ((5 + 5) + 5) + 9 = 24
Input: {input_seq}
Steps:
{state}
'''
    return prompt

通过采用以下策略,我们已将所有内容封装到一个可调用函数中

  • get_remaining_numbers 现在是一个嵌套函数。
  • 通过在 '\n' 字符处分割,从状态中提取最后一个思想。(这之所以有效,是因为每个思想都各自独占一行。)
  • 使用 get_remaining_numbers 从最后一个思想中提取 remaining_numbers
  • 如果 remaining_numbers 不等于 '24',则返回用于生成中间思想的提示。否则,返回用于生成最终答案的提示。
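在继续之前,我们可以快速验证该函数在两种情形下分别返回哪种提示(断言中的子串取自上文的两个提示):

# A quick check: the same callable returns different kinds of prompts depending on the state.
root_prompt = get_thought_gen_prompt(input_seq, state='')
assert 'Possible next steps:' in root_prompt # Root node -> propose prompt (intermediate step).

final_state = '1 + 1 = 2 (left: 1 2 8)\n1 + 2 = 3 (left: 3 8)\n3 * 8 = 24 (left: 24)'
output_prompt = get_thought_gen_prompt(input_seq, state=final_state)
assert output_prompt.startswith('Use numbers and basic arithmetic operations') # Remaining numbers are '24' -> five-shot CoT prompt (output generation step).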

现在,“24点游戏”的 BFS 算法与“创意写作”的 BFS 算法非常相似,只是存在以下差异

  • n_steps 等于 43 个中间步骤 + 1 个输出生成步骤)。
  • 作者选择将 n_evals 设置为 3
  • 作者选择将 breadth_limit 设置为 5

对于24点游戏任务,采用的状态评估策略是'value'(而不是'vote')。论文中写道:

独立评估每个状态……一个值提示会针对状态 $s$ 进行推理,以生成一个标量值 $v$(例如 1-10)或一个分类(例如 sure/likely/impossible),这些都可以启发式地转换为一个值……这样的评估不需要完美,只需要对决策提供近似的帮助。

换句话说,'value' 策略不是对状态进行投票,而是独立评估每个状态。

事实证明,状态评估器就是 LLM 本身。然而,它(显然)需要一个与思想生成器不同的提示。我们来看看。

回想一下,LLM 的思想有两种不同的类型:(i) 中间思想和 (ii) 最终答案。因此,我们需要两个单独的状态评估提示:(1) 一个提示用于评估仅包含中间思想的状态,以及 (2) 另一个提示用于评估同时包含中间思想和最终答案的状态。

让我们从前者开始。假设目前已经生成了两个中间思想。

thoughts = ['1 + 1 = 2 (left: 1 2 8)', '1 + 2 = 3 (left: 3 8)']
state =  '\n'.join(thoughts)
print(state)
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)

我们可以通过在 '\n' 字符处分割来从状态中提取最后一个思想。

last_thought = state.strip().split('\n')[-1]
last_thought
'1 + 2 = 3 (left: 3 8)'

然后使用我们熟悉的 get_remaining_numbers 函数提取剩余数字。

remaining_numbers = get_remaining_numbers(last_thought)
remaining_numbers
'3 8'

以下八样本值提示用于评估剩余数字是否能达到 24

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
eight_shot_value_prompt = f'''Evaluate if given numbers can reach 24 (sure/likely/impossible)
10 14
10 + 14 = 24
sure
11 12
11 + 12 = 23
12 - 11 = 1
11 * 12 = 132
11 / 12 = 0.91
impossible
4 4 10
4 + 4 + 10 = 8 + 10 = 18
4 * 10 - 4 = 40 - 4 = 36
(10 - 4) * 4 = 6 * 4 = 24
sure
4 9 11
9 + 11 + 4 = 20 + 4 = 24
sure
5 7 8
5 + 7 + 8 = 12 + 8 = 20
(8 - 5) * 7 = 3 * 7 = 21
I cannot obtain 24 now, but numbers are within a reasonable range
likely
5 6 6
5 + 6 + 6 = 17
(6 - 5) * 6 = 1 * 6 = 6
I cannot obtain 24 now, but numbers are within a reasonable range
likely
10 10 11
10 + 10 + 11 = 31
(11 - 10) * 10 = 10
10 10 11 are all too big
impossible
1 3 3
1 * 3 * 3 = 9
(1 + 3) * 3 = 12
1 3 3 are all too small
impossible
{remaining_numbers}
'''
print(eight_shot_value_prompt)
Evaluate if given numbers can reach 24 (sure/likely/impossible)
10 14
10 + 14 = 24
sure
11 12
11 + 12 = 23
12 - 11 = 1
11 * 12 = 132
11 / 12 = 0.91
impossible
4 4 10
4 + 4 + 10 = 8 + 10 = 18
4 * 10 - 4 = 40 - 4 = 36
(10 - 4) * 4 = 6 * 4 = 24
sure
4 9 11
9 + 11 + 4 = 20 + 4 = 24
sure
5 7 8
5 + 7 + 8 = 12 + 8 = 20
(8 - 5) * 7 = 3 * 7 = 21
I cannot obtain 24 now, but numbers are within a reasonable range
likely
5 6 6
5 + 6 + 6 = 17
(6 - 5) * 6 = 1 * 6 = 6
I cannot obtain 24 now, but numbers are within a reasonable range
likely
10 10 11
10 + 10 + 11 = 31
(11 - 10) * 10 = 10
10 10 11 are all too big
impossible
1 3 3
1 * 3 * 3 = 9
(1 + 3) * 3 = 12
1 3 3 are all too small
impossible
3 8

让我们看看 LLM 的回应。

responses = prelim.chat_completions(eight_shot_value_prompt, n=1)
print(responses[0])
3 + 8 = 11
3 * 8 = 24
sure

现在,让我们考虑包含最终答案的状态。

thoughts = ['1 + 1 = 2 (left: 1 2 8)', '1 + 2 = 3 (left: 3 8)', '3 * 8 = 24 (left: 24)', 'Answer: ((1 + 1) + 1) * 8 = 24']
state =  '\n'.join(thoughts)
print(state)
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: ((1 + 1) + 1) * 8 = 24

首先,提取最后一行。

last_line = state.strip().split('\n')[-1]
last_line
'Answer: ((1 + 1) + 1) * 8 = 24'

如果字符串 'left: ' 不是 last_line 的子字符串,那么我们知道我们有了最终答案。在这种情况下,我们按如下方式提取方程

if 'left: ' not in last_line:
    ans = last_line.lower().replace('answer: ', '')
ans
'((1 + 1) + 1) * 8 = 24'

以下六样本值提示用于评估提取的方程是否正确

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
six_shot_value_last_step_prompt = f'''Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Given an input and an answer, give a judgement (sure/impossible) if the answer is correct, i.e. it uses each input exactly once and no other numbers, and reach 24.
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) = 24
Judge:
sure
Input: 2 9 10 12
Answer: 2 * 12 * (10 - 9) = 24
Judge:
sure
Input: 4 9 10 13
Answer: (13 - 9) * (10 - 4) = 24
Judge:
sure
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) + 1 = 25
Judge:
impossible
Input: 2 9 10 12
Answer: 2 * (12 - 10) = 24
Judge:
impossible
Input: 4 9 10 13
Answer: (13 - 4) * (10 - 9) = 24
Judge:
impossible
Input: {input_seq}
Answer: {ans}
Judge:'''
print(six_shot_value_last_step_prompt)
Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Given an input and an answer, give a judgement (sure/impossible) if the answer is correct, i.e. it uses each input exactly once and no other numbers, and reach 24.
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) = 24
Judge:
sure
Input: 2 9 10 12
Answer: 2 * 12 * (10 - 9) = 24
Judge:
sure
Input: 4 9 10 13
Answer: (13 - 9) * (10 - 4) = 24
Judge:
sure
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) + 1 = 25
Judge:
impossible
Input: 2 9 10 12
Answer: 2 * (12 - 10) = 24
Judge:
impossible
Input: 4 9 10 13
Answer: (13 - 4) * (10 - 9) = 24
Judge:
impossible
Input: 1 1 1 8
Answer: ((1 + 1) + 1) * 8 = 24
Judge:

这提示也太复杂了吧!

让我们看看 LLM 的回应。

responses = prelim.chat_completions(six_shot_value_last_step_prompt, n=1)
print(responses[0])
sure

再次,上述工作流程引出了一个小问题。有两个独立的提示。但是我们需要向我们的 TreeOfThoughts 类传递一个单一的可调用函数 get_state_eval_prompt(它返回正确的提示)。我们如何做到这一点?

以下可调用函数可以完成此任务

def get_state_eval_prompt(input_seq: str, state: str) -> str:
    """Get state evaluation prompt.

    Keyword arguments:
    input_seq -- the input sequence (comprising four numbers, e.g., '1 1 1 8')
    state -- concatenation of all the thoughts so far (separated by '\n')
    """

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/tasks/game24.py
    def get_remaining_numbers(thought: str) -> str:
        return thought.split('left: ')[-1].split(')')[0]

    last_line = state.strip().split('\n')[-1]

    if 'left: ' not in last_line: # Last (output generation) step.
        ans = last_line.lower().replace('answer: ', '')
        # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
        prompt = f'''Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Given an input and an answer, give a judgement (sure/impossible) if the answer is correct, i.e. it uses each input exactly once and no other numbers, and reach 24.
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) = 24
Judge:
sure
Input: 2 9 10 12
Answer: 2 * 12 * (10 - 9) = 24
Judge:
sure
Input: 4 9 10 13
Answer: (13 - 9) * (10 - 4) = 24
Judge:
sure
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) + 1 = 25
Judge:
impossible
Input: 2 9 10 12
Answer: 2 * (12 - 10) = 24
Judge:
impossible
Input: 4 9 10 13
Answer: (13 - 4) * (10 - 9) = 24
Judge:
impossible
Input: {input_seq}
Answer: {ans}
Judge:'''
    else: # Intermediate step.
        remaining_numbers = get_remaining_numbers(last_line)
        # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/prompts/game24.py
        prompt = f'''Evaluate if given numbers can reach 24 (sure/likely/impossible)
10 14
10 + 14 = 24
sure
11 12
11 + 12 = 23
12 - 11 = 1
11 * 12 = 132
11 / 12 = 0.91
impossible
4 4 10
4 + 4 + 10 = 8 + 10 = 18
4 * 10 - 4 = 40 - 4 = 36
(10 - 4) * 4 = 6 * 4 = 24
sure
4 9 11
9 + 11 + 4 = 20 + 4 = 24
sure
5 7 8
5 + 7 + 8 = 12 + 8 = 20
(8 - 5) * 7 = 3 * 7 = 21
I cannot obtain 24 now, but numbers are within a reasonable range
likely
5 6 6
5 + 6 + 6 = 17
(6 - 5) * 6 = 1 * 6 = 6
I cannot obtain 24 now, but numbers are within a reasonable range
likely
10 10 11
10 + 10 + 11 = 31
(11 - 10) * 10 = 10
10 10 11 are all too big
impossible
1 3 3
1 * 3 * 3 = 9
(1 + 3) * 3 = 12
1 3 3 are all too small
impossible
{remaining_numbers}
'''
    return prompt
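与之前类似,我们可以快速验证该函数针对两种状态分别返回哪种值提示(断言中的子串取自上文的两个值提示):

# A quick check: intermediate states and final-answer states get different value prompts.
mid_state = '1 + 1 = 2 (left: 1 2 8)\n1 + 2 = 3 (left: 3 8)'
assert get_state_eval_prompt(input_seq, mid_state).startswith('Evaluate if given numbers can reach 24')

final_state = mid_state + '\n3 * 8 = 24 (left: 24)\nAnswer: ((1 + 1) + 1) * 8 = 24'
assert 'Judge:' in get_state_eval_prompt(input_seq, final_state)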

我们需要的最后一个可调用函数是启发式计算器。启发式计算器的作用是将每个状态的多个评估整理成一个单一的启发式分数。(每个评估都是“sure”/“likely”/“impossible”。)我们来看看。

假设我们位于具有以下状态的节点上

# Say:
thoughts = ['1 + 1 = 2 (left: 1 2 8)', '1 + 2 = 3 (left: 3 8)']
state =  '\n'.join(thoughts)
print(state)
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)

如前所述,作者选择将 n_evals 设置为 3

n_evals = 3

让我们得到 3 个状态评估。

prompt = get_state_eval_prompt(input_seq, state)
state_evals = prelim.chat_completions(prompt, n=n_evals)
for eval in state_evals:
    print(eval)
    print("---")
3 + 8 = 11
3 * 8 = 24
sure
---
3 + 8 = 11
8 - 3 = 5
3 * 8 = 24
sure
---
3 + 8 = 11
3 * 8 = 24
sure
---

以下可调用函数将三个评估整理成一个单一的启发式分数。

# Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/tasks/game24.py
def heuristic_calculator(state: str, state_evals: List[str]) -> float:
    if len(state.strip().split('\n')) == 4 and 'answer' not in state.lower(): # Such a state is undesirable.
        return 0
    value_names = [_.split('\n')[-1].lower() for _ in state_evals] # A list containing 'impossible' / 'likely' / 'sure' values.
    value_map = {'impossible': 0.001, 'likely': 1, 'sure': 20} # Ad hoc.
    value = sum(value * value_names.count(name) for name, value in value_map.items())
    return value

简要说明

  • 如果一个状态包含 4 行,却没有任何包含 "Answer" 的行(即所有 4 个步骤都已用完,却仍未给出最终答案),则该状态的值被赋为 0,因为这样的状态是死路。
  • 否则,提取 3 个状态评估各自的最后一行。这会得到一个包含 'impossible'/'likely'/'sure' 值的列表。
  • 返回上述值的加权和,其中权重({'impossible': 0.001, 'likely': 1, 'sure': 20})是作者凭经验(ad hoc)选定的。
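为了看清这些权重如何组合,下面是一个纯手工构造的混合评估示例(其中的评估字符串是假设性的,但格式与 LLM 的真实回应一致):

# A hand-constructed mixed-evaluation example, to show how the weights combine (the evaluation strings are made up).
mixed_evals = ['3 + 8 = 11\nsure', 'numbers are within a reasonable range\nlikely', '3 8 are all too small\nimpossible']
print(heuristic_calculator('1 + 1 = 2 (left: 1 2 8)', mixed_evals)) # 20 * 1 + 1 * 1 + 0.001 * 1 = 21.001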

我们来试一下上面的状态评估。

heuristic_calculator(state, state_evals)
60.0

这是可能的最大值,因为所有 3 个状态评估都是 'sure'

现在,让我们尝试一个没有任何希望达到 24 的状态。

# Say:
thoughts = ['1 + 1 = 2 (left: 1 2 8)', '8 - 1 = 7 (left: 2 7)']
state =  '\n'.join(thoughts)
print(state)
1 + 1 = 2 (left: 1 2 8)
8 - 1 = 7 (left: 2 7)
prompt = get_state_eval_prompt(input_seq, state)
state_evals = prelim.chat_completions(prompt, n=n_evals)
for eval in state_evals:
    print(eval)
    print("---")
2 + 7 = 9
2 * 7 = 14
2 / 7 = 0.28
7 - 2 = 5
impossible
---
2 + 7 = 9
2 * 7 = 14
2 / 7 = 0.28
7 - 2 = 5
impossible
---
2 + 7 = 9
2 * 7 = 14
7 - 2 = 5
7 / 2 = 3.5
impossible
---
heuristic_calculator(state, state_evals)
0.003

分配了一个很低的值。很好。

接下来,我们考虑包含正确最终答案的状态。

# Say:
thoughts = ['1 + 1 = 2 (left: 1 2 8)', '1 + 2 = 3 (left: 3 8)', '3 * 8 = 24 (left: 24)', 'Answer: ((1 + 1) + 1) * 8 = 24']
state =  '\n'.join(thoughts)
print(state)
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: ((1 + 1) + 1) * 8 = 24
prompt = get_state_eval_prompt(input_seq, state)
state_evals = prelim.chat_completions(prompt, n=n_evals)
for eval in state_evals:
    print(eval)
    print("---")
sure
---
sure
---
sure
---
heuristic_calculator(state, state_evals)
60.0

完美。

最后,我们考虑包含错误最终答案的状态。

# Say:
thoughts = ['1 + 1 = 2 (left: 1 2 8)', '1 + 2 = 3 (left: 3 8)', '3 * 8 = 24 (left: 24)', 'Answer: ((1 + 1) + 1) - 8 = 24']
state =  '\n'.join(thoughts)
print(state)
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: ((1 + 1) + 1) - 8 = 24
prompt = get_state_eval_prompt(input_seq, state)
state_evals = prelim.chat_completions(prompt, n=n_evals)
for eval in state_evals:
    print(eval)
    print("---")
impossible
---
impossible
---
impossible
---
heuristic_calculator(state, state_evals)
0.003

太棒了。

希望您现在对启发式计算器有了很好的直觉。

最后,我们准备编写用于24点游戏任务的 TreeOfThoughts 类。

注意:除了 bfs 方法外,我们还添加了 dfs(深度优先搜索)方法——这是 ToT 论文中的另一种搜索算法。现在不用担心它。dfs 方法将在下面详细解释。

class TreeOfThoughts:
    def __init__(
            self,
            client: Union[OpenAI, InferenceClient],
            model: str,
            input_seq: str,
            get_thought_gen_prompt: Callable,
            get_state_eval_prompt: Callable,
            heuristic_calculator: Callable,
            max_per_state: Optional[int] = None
    ):
        self.client = client
        self.model = model # e.g., "gpt-4" if using `OpenAI` and "meta-llama/Meta-Llama-3.1-8B-Instruct" if using `InferenceClient`.
        self.input_seq = input_seq
        self.root = TreeNode(state='', thought='')
        self.n_steps = 4 # 3 intermediate steps + 1 output generation step.
        self.thought_gen_strategy = 'propose'
        self.get_thought_gen_prompt = get_thought_gen_prompt
        self.state_eval_strategy = 'value'
        self.get_state_eval_prompt = get_state_eval_prompt
        self.n_evals = 3 # The number of times to sample values for each state.
        self.heuristic_calculator = heuristic_calculator
        self.breadth_limit = 5 # Relevant only for the BFS search algorithm.
        self.heuristic_threshold = 3.0 # Relevant only for the DFS search algorithm; will be explained below.
        self.max_per_state = max_per_state # Relevant only for the DFS search algorithm; will be explained below.

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/models.py
    def chat_completions(
            self,
            prompt: str,
            temperature: float = 0.7,
            max_tokens: int = 1000,
            n: int = 1,
            stop: Optional[List[str]] = None,
            **kwargs
    ) -> List[str]:
        outputs = []
        messages = [{'role': "user", 'content': prompt}]
        if isinstance(self.client, OpenAI):
            response = self.client.chat.completions.create(
                messages=messages,
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
                n=n, # The `n` responses are i.i.d.
                stop=stop,
                **kwargs
            )
            outputs.extend([choice.message.content for choice in response.choices])
        else: # `self.client` is an instance of `InferenceClient`.
            # The Hugging Face API doesn't support the `n` argument. Hence, we need to use a loop to generate `n` i.i.d. responses.
            for _ in range(n):
                response = self.client.chat.completions.create(
                    messages=messages,
                    model=self.model,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    stop=stop,
                    **kwargs
                )
                outputs.append(response.choices[0].message.content)
        return outputs

    def thought_generator(self, state: str) -> List[str]:
        if self.thought_gen_strategy == 'sample':
            pass # The 'sample' strategy isn't used for the Game of 24 task; see the Creative Writing section.
        else: # `self.thought_gen_strategy` is equal to 'propose'.
            prompt = self.get_thought_gen_prompt(self.input_seq, state)
            responses = self.chat_completions(prompt, n=1)
            thoughts = responses[0].split('\n')
            return thoughts

    def state_evaluator(self, state: str) -> float:
        if self.state_eval_strategy == 'vote':
            pass # The 'vote' strategy isn't used for the Game of 24 task; see the Creative Writing section.
        else: # `self.state_eval_strategy` is equal to 'value'.
            prompt = self.get_state_eval_prompt(self.input_seq, state)
            state_evals = self.chat_completions(prompt, n=self.n_evals)
            value = self.heuristic_calculator(state, state_evals)
            return value

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/methods/bfs.py
    def bfs(self, verbose: bool = True) -> str:
        queue = deque()
        queue.append(self.root)

        for step in range(1, self.n_steps + 1):
            if verbose:
                print(f"Step {step} (corresponding to level {step} of the tree):-\n---")
            for i in range(len(queue)):
                node = queue.popleft()
                if verbose:
                    print(f"Node {i + 1} in level {step}:-")
                    if node.state != "":
                        print(f"State of current node:-\n{node.state}\n---")
                    else:
                        print("State of current node:-\n<EMPTY STRING> (root node; no thoughts generated yet)\n---")

                thoughts = self.thought_generator(state=node.state)
                if node.state == '':
                    updated_states = thoughts
                else:
                    updated_states = [node.state + '\n' + thought for thought in thoughts]
                for j in range(len(thoughts)):
                    if verbose:
                        print(f"Thought candidate {j + 1}:-\n{thoughts[j]}\n---")
                    child = TreeNode(state=updated_states[j], thought=thoughts[j])
                    node.children.append(child)
                    queue.append(child)
                if verbose:
                    print("Each of the above thought candidates has been added as a child of the current node.\n---")

            if verbose:
                print("Using the state evaluator to obtain values...\n---")
            for i in range(len(queue)):
                queue[i].value = self.state_evaluator(state=queue[i].state)
                if verbose:
                    print(f"Element {i + 1} in queue:-\n")
                    print(f"Value: {queue[i].value}\n---")

            if verbose:
                print("Initiating pruning (using the values obtained from the state evaluator).")
                print(f"Number of elements in queue: {len(queue)}")
            sorted_nodes = sorted(queue, key=lambda node: node.value, reverse=True)
            if step == self.n_steps:
                if verbose:
                    print("Since this is the last step, setting the breadth limit to 1.")
                    print("In other words, retaining only the highest value element (in this last step).\n---")
                top_b_nodes = sorted_nodes[:1]
            else:
                if verbose:
                    print(f"Since this isn't the last step, leaving the breadth limit {self.breadth_limit} unchanged.\n---")
                top_b_nodes = sorted_nodes[:self.breadth_limit]
            top_b_states = [node.state for node in top_b_nodes]
            for i in range(len(queue)):
                node = queue.popleft()
                if verbose:
                    print(f"Element {i + 1} in queue:-\n")
                if node.state in top_b_states:
                    if verbose:
                        print(f"Retaining this element as it's in the top {len(top_b_states)} elements.\n---")
                    queue.append(node)
                else:
                    if verbose:
                        print(f"Dropping this element as it's not in the top {len(top_b_states)} elements.\n---")

            if verbose:
                print("~~~")

        # Return the thought of the highest value node (from the last step):
        node = queue.popleft()
        return node.thought

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/scripts/crosswords/search_crosswords-dfs.ipynb
    def dfs(self, verbose: bool = True) -> str:
        dfs_output = None

        def dfs_func(node: TreeNode, step: int) -> bool:
            nonlocal dfs_output

            if step > self.n_steps: # Base case: successful search.
                dfs_output = node.state # Record the last (output generation) step's output in the nonlocal variable `dfs_output`.
                return True

            if verbose:
                print(f"Step: {step}\n---")
                if node.state != "":
                    print(f"State of current node:-\n{node.state}\n---")
                else:
                    print("State of current node:-\n<EMPTY STRING> (root node; no thoughts generated yet)\n---")

            thoughts = self.thought_generator(state=node.state)
            if len(thoughts) == 0:
                if verbose:
                    print("No thoughts were generated. It's a dead end. Backtracking to the parent node.\n~~~")
                return False
            if node.state == '':
                updated_states = thoughts
            else:
                updated_states = [node.state + '\n' + thought for thought in thoughts]
            for j in range(len(thoughts)):
                if verbose:
                    print(f"Thought candidate {j + 1}:-\n{thoughts[j]}\n---")
                child = TreeNode(state=updated_states[j], thought=thoughts[j])
                node.children.append(child)
            if verbose:
                print("Each of the above thought candidates has been added as a child of the current node.\n---")

            cnt_per_state = 0
            for child in node.children:
                if verbose:
                    print("Reminder:-")
                    if node.state != "":
                        print(f"State of current node:-\n{node.state}\n---")
                    else:
                        print("State of current node:-\n<EMPTY STRING> (root node; no thoughts generated yet)\n---")
                    print(f"Currently traversing child number: {cnt_per_state + 1}\n")
                    print(f"State of current child:-\n{child.state}\n")
                    print("Using the state evaluator to obtain value...\n")
                child.value = self.state_evaluator(state=child.state)
                if verbose:
                    print(f"Value of current child: {child.value}\n---")
                if child.value >= self.heuristic_threshold:
                # Note: If this `if` condition isn't met, the child node is pruned, i.e., a subtree of the child isn't grown.
                    if verbose:
                        print("Value exceeds heuristic threshold. Searching subtree.\n---\n~~~")
                    end_search = dfs_func(child, step + 1)
                    if end_search:
                        if verbose:
                            print(f"Searching the subtree was successful! Backtracking all the way up.\n~~~")
                        return True
                    else:
                        if verbose:
                            print(f"Back at step {step}. Searching the subtree was unsuccessful! Trying the next child.\n---")
                cnt_per_state += 1
                if self.max_per_state is not None and cnt_per_state >= self.max_per_state: # Guard: `max_per_state` defaults to `None` (i.e., no limit).
                    if verbose:
                        print(f"{self.max_per_state} children already searched for this node. Breaking the loop.\n---")
                    break
            if verbose:
                print(f"None of the child nodes led to success. Seems like a dead end. Backtracking to the parent node.\n~~~")
            return False

        dfs_func(node=self.root, step=1)
        return dfs_output

    def generate_html_tree(self, node: TreeNode) -> str:
        if node is None:
            return ""
        else:
            html = f"""<div class='node'>
<p>State:<br>{node.state}</p>
<hr>
<p>Thought:<br>{node.thought}</p>
<hr>
<p>Value:<br>{node.value}</p>"""
            for child in node.children:
                html += f"""<div class='child'>{self.generate_html_tree(child)}</div>"""
            html += """</div>"""
            return html

    def render_html_tree(self):
        html_tree = self.generate_html_tree(self.root)
        wrapped_html = f"""<!DOCTYPE html>
<html>
<head>
    <style>
        .node {{
            display: inline-block;
            border: 1px solid blue;
            padding: 10px;
            margin: 5px;
            text-align: center;
        }}
        .child {{
            display: flex;
        }}
    </style>
</head>
<body>
    {html_tree}
</body>
</html>"""
        display(HTML(wrapped_html))

让我们实例化我们的类,并运行 BFS 算法。

tot = TreeOfThoughts(client, "gpt-4", input_seq, get_thought_gen_prompt, get_state_eval_prompt, heuristic_calculator)
output = tot.bfs(verbose=True)
print(output)
Step 1 (corresponding to level 1 of the tree):-
---
Node 1 in level 1:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Thought candidate 1:-
1 + 1 = 2 (left: 1 2 8)
---
Thought candidate 2:-
1 * 1 = 1 (left: 1 1 8)
---
Thought candidate 3:-
8 - 1 = 7 (left: 1 1 7)
---
Thought candidate 4:-
8 / 1 = 8 (left: 1 1 8)
---
Thought candidate 5:-
1 + 1 + 1 = 3 (left: 3 8)
---
Thought candidate 6:-
8 - 1 - 1 = 6 (left: 1 6)
---
Thought candidate 7:-
8 / 1 / 1 = 8 (left: 1 8)
---
Each of the above thought candidates has been added as a child of the current node.
---
Using the state evaluator to obtain values...
---
Element 1 in queue:-

Value: 21.001
---
Element 2 in queue:-

Value: 0.003
---
Element 3 in queue:-

Value: 0.003
---
Element 4 in queue:-

Value: 0.003
---
Element 5 in queue:-

Value: 60.0
---
Element 6 in queue:-

Value: 0.003
---
Element 7 in queue:-

Value: 0.003
---
Initiating pruning (using the values obtained from the state evaluator).
Number of elements in queue: 7
Since this isn't the last step, leaving the breadth limit 5 unchanged.
---
Element 1 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 2 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 3 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 4 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 5 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 6 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 7 in queue:-

Dropping this element as it's not in the top 5 elements.
---
~~~
Step 2 (corresponding to level 2 of the tree):-
---
Node 1 in level 2:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
---
Thought candidate 1:-
1 + 2 = 3 (left: 3 8)
---
Thought candidate 2:-
2 * 1 = 2 (left: 2 8)
---
Thought candidate 3:-
8 - 1 = 7 (left: 2 7)
---
Thought candidate 4:-
8 - 2 = 6 (left: 1 6)
---
Thought candidate 5:-
8 / 1 = 8 (left: 2 8)
---
Thought candidate 6:-
2 * 8 = 16 (left: 1 16)
---
Thought candidate 7:-
8 / 2 = 4 (left: 1 4)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 2 in level 2:-
State of current node:-
1 * 1 = 1 (left: 1 1 8)
---
Thought candidate 1:-
1 + 1 = 2 (left: 2 8)
---
Thought candidate 2:-
1 * 1 = 1 (left: 1 8)
---
Thought candidate 3:-
8 - 1 = 7 (left: 1 7)
---
Thought candidate 4:-
8 / 1 = 8 (left: 1 8)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 3 in level 2:-
State of current node:-
8 - 1 = 7 (left: 1 1 7)
---
Thought candidate 1:-
1 + 1 = 2 (left: 2 7)
---
Thought candidate 2:-
7 - 1 = 6 (left: 1 6)
---
Thought candidate 3:-
7 / 1 = 7 (left: 1 7)
---
Thought candidate 4:-
1 * 1 = 1 (left: 1 7)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 4 in level 2:-
State of current node:-
8 / 1 = 8 (left: 1 1 8)
---
Thought candidate 1:-
1 + 1 = 2 (left: 2 8)
---
Thought candidate 2:-
1 * 1 = 1 (left: 1 8)
---
Thought candidate 3:-
8 - 1 = 7 (left: 1 7)
---
Thought candidate 4:-
8 / 1 = 8 (left: 1 8)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 5 in level 2:-
State of current node:-
1 + 1 + 1 = 3 (left: 3 8)
---
Thought candidate 1:-
3 + 8 = 11 (left: 11)
---
Thought candidate 2:-
8 / 3 = 2.67 (left: 2.67)
---
Thought candidate 3:-
8 - 3 = 5 (left: 5)
---
Thought candidate 4:-
3 * 8 = 24 (left: 24)
---
Each of the above thought candidates has been added as a child of the current node.
---
Using the state evaluator to obtain values...
---
Element 1 in queue:-

Value: 60.0
---
Element 2 in queue:-

Value: 0.003
---
Element 3 in queue:-

Value: 0.003
---
Element 4 in queue:-

Value: 0.003
---
Element 5 in queue:-

Value: 0.003
---
Element 6 in queue:-

Value: 0.003
---
Element 7 in queue:-

Value: 0.003
---
Element 8 in queue:-

Value: 0.003
---
Element 9 in queue:-

Value: 0.003
---
Element 10 in queue:-

Value: 0.003
---
Element 11 in queue:-

Value: 0.003
---
Element 12 in queue:-

Value: 0.003
---
Element 13 in queue:-

Value: 0.003
---
Element 14 in queue:-

Value: 0.003
---
Element 15 in queue:-

Value: 0.003
---
Element 16 in queue:-

Value: 0.003
---
Element 17 in queue:-

Value: 0.003
---
Element 18 in queue:-

Value: 0.003
---
Element 19 in queue:-

Value: 0.003
---
Element 20 in queue:-

Value: 0.003
---
Element 21 in queue:-

Value: 0.002
---
Element 22 in queue:-

Value: 0.003
---
Element 23 in queue:-

Value: 60.0
---
Initiating pruning (using the values obtained from the state evaluator).
Number of elements in queue: 23
Since this isn't the last step, leaving the breadth limit 5 unchanged.
---
Element 1 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 2 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 3 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 4 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 5 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 6 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 7 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 8 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 9 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 10 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 11 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 12 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 13 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 14 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 15 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 16 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 17 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 18 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 19 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 20 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 21 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 22 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 23 in queue:-

Retaining this element as it's in the top 5 elements.
---
~~~
Step 3 (corresponding to level 3 of the tree):-
---
Node 1 in level 3:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
---
Thought candidate 1:-
3 + 8 = 11 (left: 11)
---
Thought candidate 2:-
8 - 3 = 5 (left: 5)
---
Thought candidate 3:-
3 * 8 = 24 (left: 24)
---
Thought candidate 4:-
8 / 3 = 2.67 (left: 2.67)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 2 in level 3:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
2 * 1 = 2 (left: 2 8)
---
Thought candidate 1:-
2 + 8 = 10 (left: 10)
---
Thought candidate 2:-
2 * 8 = 16 (left: 16)
---
Thought candidate 3:-
8 - 2 = 6 (left: 6)
---
Thought candidate 4:-
8 / 2 = 4 (left: 4)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 3 in level 3:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
8 - 1 = 7 (left: 2 7)
---
Thought candidate 1:-
2 + 7 = 9 (left: 9)
---
Thought candidate 2:-
7 - 2 = 5 (left: 5)
---
Thought candidate 3:-
2 * 7 = 14 (left: 14)
---
Thought candidate 4:-
7 / 2 = 3.5 (left: 3.5)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 4 in level 3:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
8 - 2 = 6 (left: 1 6)
---
Thought candidate 1:-
1 + 6 = 7 (left: 7)
---
Thought candidate 2:-
6 - 1 = 5 (left: 5)
---
Thought candidate 3:-
1 * 6 = 6 (left: 6)
---
Thought candidate 4:-
6 / 1 = 6 (left: 6)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 5 in level 3:-
State of current node:-
1 + 1 + 1 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
---
Thought candidate 1:-
Answer: (1 + 1 + 1) * 8 = 24
---
Each of the above thought candidates has been added as a child of the current node.
---
Using the state evaluator to obtain values...
---
Element 1 in queue:-

Value: 0.003
---
Element 2 in queue:-

Value: 0.003
---
Element 3 in queue:-

Value: 60.0
---
Element 4 in queue:-

Value: 0.003
---
Element 5 in queue:-

Value: 0.003
---
Element 6 in queue:-

Value: 0.003
---
Element 7 in queue:-

Value: 20.002
---
Element 8 in queue:-

Value: 0.001
---
Element 9 in queue:-

Value: 0.003
---
Element 10 in queue:-

Value: 0.003
---
Element 11 in queue:-

Value: 0.003
---
Element 12 in queue:-

Value: 0.003
---
Element 13 in queue:-

Value: 0.001
---
Element 14 in queue:-

Value: 0.003
---
Element 15 in queue:-

Value: 0.003
---
Element 16 in queue:-

Value: 40.001
---
Element 17 in queue:-

Value: 60.0
---
Initiating pruning (using the values obtained from the state evaluator).
Number of elements in queue: 17
Since this isn't the last step, leaving the breadth limit 5 unchanged.
---
Element 1 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 2 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 3 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 4 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 5 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 6 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 7 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 8 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 9 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 10 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 11 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 12 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 13 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 14 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 15 in queue:-

Dropping this element as it's not in the top 5 elements.
---
Element 16 in queue:-

Retaining this element as it's in the top 5 elements.
---
Element 17 in queue:-

Retaining this element as it's in the top 5 elements.
---
~~~
Step 4 (corresponding to level 4 of the tree):-
---
Node 1 in level 4:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 + 8 = 11 (left: 11)
---
Thought candidate 1:-
There is no possible operation as there is only one number.
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 2 in level 4:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
---
Thought candidate 1:-
Answer: ((1 + 1) + 1) * 8 = 24
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 3 in level 4:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
2 * 1 = 2 (left: 2 8)
8 - 2 = 6 (left: 6)
---
Thought candidate 1:-
8 + 6 = 14 (left: 8 14 14)
---
Thought candidate 2:-
8 - 6 = 2 (left: 2 8 14)
---
Thought candidate 3:-
14 - 6 = 8 (left: 8 8 8)
---
Thought candidate 4:-
14 + 6 = 20 (left: 8 8 20)
---
Thought candidate 5:-
2 * 6 = 12 (left: 8 12 14)
---
Thought candidate 6:-
14 / 6 = ~2.33 (left: ~2.33 8 8) (not a valid step, as we are only considering integer solutions)
---
Thought candidate 7:-
8 / 6 = ~1.33 (left: ~1.33 8 14) (not a valid step, as we are only considering integer solutions)
---
Thought candidate 8:-
6 * 8 = 48 (left: 8 14 48)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 4 in level 4:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
8 - 2 = 6 (left: 1 6)
6 / 1 = 6 (left: 6)
---
Thought candidate 1:-
6 + 10 = 16 (left: 8 14 16)
---
Thought candidate 2:-
6 + 4 = 10 (left: 8 10 14)
---
Thought candidate 3:-
16 - 6 = 10 (left: 8 10 14)
---
Thought candidate 4:-
16 / 6 = 2.67 (left: 2.67 8 14)
---
Thought candidate 5:-
6 - 2 = 4 (left: 4 8 14)
---
Thought candidate 6:-
6 / 2 = 3 (left: 3 8 14)
---
Thought candidate 7:-
7 + 6 = 13 (left: 8 8 13)
---
Thought candidate 8:-
12 - 6 = 6 (left: 6 8 8)
---
Each of the above thought candidates has been added as a child of the current node.
---
Node 5 in level 4:-
State of current node:-
1 + 1 + 1 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: (1 + 1 + 1) * 8 = 24
---
Thought candidate 1:-
1 + 1 = 2 (left: 1 2)
---
Thought candidate 2:-
1 - 1 = 0 (left: 0 1)
---
Thought candidate 3:-
1 * 1 = 1 (left: 1 1)
---
Thought candidate 4:-
This input is incomplete, it is not possible to define the next steps without knowing the remaining part of the expression.
---
Each of the above thought candidates has been added as a child of the current node.
---
Using the state evaluator to obtain values...
---
Element 1 in queue:-

Value: 0
---
Element 2 in queue:-

Value: 60.0
---
Element 3 in queue:-

Value: 0
---
Element 4 in queue:-

Value: 0
---
Element 5 in queue:-

Value: 0
---
Element 6 in queue:-

Value: 0
---
Element 7 in queue:-

Value: 0
---
Element 8 in queue:-

Value: 0
---
Element 9 in queue:-

Value: 0
---
Element 10 in queue:-

Value: 0
---
Element 11 in queue:-

Value: 0
---
Element 12 in queue:-

Value: 0
---
Element 13 in queue:-

Value: 0
---
Element 14 in queue:-

Value: 0
---
Element 15 in queue:-

Value: 0
---
Element 16 in queue:-

Value: 0
---
Element 17 in queue:-

Value: 0
---
Element 18 in queue:-

Value: 0
---
Element 19 in queue:-

Value: 0.003
---
Element 20 in queue:-

Value: 0.003
---
Element 21 in queue:-

Value: 0.003
---
Element 22 in queue:-

Value: 0.003
---
Initiating pruning (using the values obtained from the state evaluator).
Number of elements in queue: 22
Since this is the last step, setting the breadth limit to 1.
In other words, retaining only the highest value element (in this last step).
---
Element 1 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 2 in queue:-

Retaining this element as it's in the top 1 elements.
---
Element 3 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 4 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 5 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 6 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 7 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 8 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 9 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 10 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 11 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 12 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 13 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 14 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 15 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 16 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 17 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 18 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 19 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 20 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 21 in queue:-

Dropping this element as it's not in the top 1 elements.
---
Element 22 in queue:-

Dropping this element as it's not in the top 1 elements.
---
~~~
Answer: ((1 + 1) + 1) * 8 = 24

是时候可视化这棵树了。

tot.render_html_tree()

为了避免 HTML 渲染问题,我已将树保存为 HTML 文件,您可以在此处查看。以下是相同内容的截图
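如果您也想把树保存为独立的 HTML 文件,下面是一个小草图(文件名 game24_tree.html 是假设性的;它只保存节点结构,样式表可以按照 render_html_tree 中的方式自行加上):

# A small sketch for saving the tree as a standalone HTML file (the file name is hypothetical;
# this saves only the node structure; the stylesheet can be added as in `render_html_tree`).
with open('game24_tree.html', 'w') as f:
    f.write(tot.generate_html_tree(tot.root))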

好的。是时候看看 dfs 方法了。

注意: ToT 论文并没有在“24点游戏”任务中演示 DFS(论文中该任务只用了 BFS;DFS 被用于迷你填字游戏任务)。但我们无论如何都会在这里演示它。

dfs 方法是深度优先搜索 (DFS) 算法的定制版本。其工作原理如下

  1. dfs 方法内部,有一个名为 dfs_output 的变量(初始值为 None)。如果搜索成功,搜索结果将记录在此变量中。如果搜索不成功,此变量的值将保持为 None
  2. DFS 最适合使用递归执行。因此,我们在 dfs 方法内部使用了嵌套的递归函数 dfs_func。这个嵌套函数返回一个布尔值:如果搜索成功则返回 True,否则返回 False
  3. 基本情况如下:if step > self.n_steps。但为什么呢?因为一旦当前步骤数超过解决问题所需的总步骤数,就说明所有步骤都已成功完成,搜索即告成功。例如,在24点游戏任务中,self.n_steps 始终等于 4(3 个中间步骤 + 1 个输出生成步骤)。因此,如果当前步骤超过 4,我们将输出记录在非局部变量 dfs_output 中,然后通过返回 True 来回溯。
  4. 在递归情况下,我们从当前节点生成思想候选。每个思想候选都作为当前节点的子节点添加。
  5. 现在,是时候遍历子节点了。对于每个子节点,我们从状态评估器获取一个值。
  6. 我们使用一个启发式阈值来决定是增长子树(从此子节点开始)还是剪枝。经过一番实验,我们发现 3.0 的启发式阈值对此任务效果很好。(这个阈值的具体含义,参见列表后的示例。)
  7. 如果子节点的值未能超过启发式阈值,则子节点被剪枝,即不会增长该子节点的子树。
  8. 否则,我们使用以下递归调用来增长和搜索子树:end_search = dfs_func(child, step + 1)
  9. 如果 end_search 恰好为 True,则表示搜索成功。在这种情况下,我们通过返回 True 来回溯。
  10. 如果 end_search 恰好为 False,我们不返回任何内容,而是继续尝试下一个子节点。
  11. 为了更好地控制搜索,使用了额外的超参数 max_per_state。此超参数指定了特定节点要探索的最大子节点数。如果探索的子节点数达到 max_per_state,我们将中断循环。
  12. 如果遍历子节点没有导致成功的搜索,那么当前节点似乎是一个死胡同。在这种情况下,我们回溯到父节点。搜索将继续...
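下面的小示例具体展示了在 n_evals 等于 3 的设定下,哪些评估组合能够达到 3.0 的阈值(其中的评估字符串是手工构造的):

# With `n_evals` = 3, a threshold of 3.0 roughly means: "at least one 'sure',
# or all three evaluations 'likely'"; otherwise the subtree is pruned.
for evals in [['impossible'] * 3,
              ['impossible', 'impossible', 'likely'],
              ['likely'] * 3,
              ['sure', 'impossible', 'impossible']]:
    value = heuristic_calculator('1 + 1 = 2 (left: 1 2 8)', evals)
    print(evals, '->', value, '(searched)' if value >= 3.0 else '(pruned)')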

好了,让我们实际调用 dfs 方法。通过传入 verbose=True,我们可以观察 DFS 算法的运行情况。

为了观察算法的行为,我们先把 max_per_state 设置为一个低得不合理的值:2。(由于每个节点允许探索的子节点太少,搜索将会失败。这是故意的:我们希望在搜索轨迹中看到回溯的作用。)

tot = TreeOfThoughts(client, "gpt-4", input_seq, get_thought_gen_prompt, get_state_eval_prompt, heuristic_calculator, max_per_state=2)
output = tot.dfs(verbose=True)
print("None" if output is None else output)
Step: 1
---
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Thought candidate 1:-
1 + 1 = 2 (left: 1 2 8)
---
Thought candidate 2:-
1 * 1 = 1 (left: 1 1 8)
---
Thought candidate 3:-
8 - 1 = 7 (left: 1 1 7)
---
Thought candidate 4:-
8 / 1 = 8 (left: 1 1 8)
---
Thought candidate 5:-
1 * 8 = 8 (left: 1 1 8)
---
Thought candidate 6:-
8 - 1 = 7 (left: 1 7 1)
---
Thought candidate 7:-
8 / 1 = 8 (left: 1 8 1)
---
Each of the above thought candidates has been added as a child of the current node.
---
Reminder:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Currently traversing child number: 1

State of current child:-
1 + 1 = 2 (left: 1 2 8)

Using the state evaluator to obtain value...

Value of current child: 22.0
---
Value exceeds heuristic threshold. Searching subtree.
---
~~~
Step: 2
---
State of current node:-
1 + 1 = 2 (left: 1 2 8)
---
Thought candidate 1:-
1 + 2 = 3 (left: 3 8)
---
Thought candidate 2:-
2 * 1 = 2 (left: 2 8)
---
Thought candidate 3:-
8 - 1 = 7 (left: 2 7)
---
Thought candidate 4:-
8 / 1 = 8 (left: 2 8)
---
Thought candidate 5:-
8 - 2 = 6 (left: 1 6)
---
Thought candidate 6:-
2 * 8 = 16 (left: 1 16)
---
Thought candidate 7:-
1 * 2 = 2 (left: 2 8)
---
Each of the above thought candidates has been added as a child of the current node.
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
---
Currently traversing child number: 1

State of current child:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)

Using the state evaluator to obtain value...

Value of current child: 60.0
---
Value exceeds heuristic threshold. Searching subtree.
---
~~~
Step: 3
---
State of current node:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
---
Thought candidate 1:-
3 + 8 = 11 (left: 11)
---
Thought candidate 2:-
8 - 3 = 5 (left: 5)
---
Thought candidate 3:-
8 / 3 = 2.67 (left: 2.67)
---
Thought candidate 4:-
3 * 8 = 24 (left: 24)
---
Each of the above thought candidates has been added as a child of the current node.
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
---
Currently traversing child number: 1

State of current child:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 + 8 = 11 (left: 11)

Using the state evaluator to obtain value...

Value of current child: 0.003
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
---
Currently traversing child number: 2

State of current child:-
1 + 1 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
8 - 3 = 5 (left: 5)

Using the state evaluator to obtain value...

Value of current child: 0.003
---
2 children already searched for this node. Breaking the loop.
---
None of the child nodes led to success. Seems like a dead end. Backtracking to the parent node.
~~~
Back at step 2. Searching the subtree was unsuccessful! Trying the next child.
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 1 2 8)
---
Currently traversing child number: 2

State of current child:-
1 + 1 = 2 (left: 1 2 8)
2 * 1 = 2 (left: 2 8)

Using the state evaluator to obtain value...

Value of current child: 0.003
---
2 children already searched for this node. Breaking the loop.
---
None of the child nodes led to success. Seems like a dead end. Backtracking to the parent node.
~~~
Back at step 1. Searching the subtree was unsuccessful! Trying the next child.
---
Reminder:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Currently traversing child number: 2

State of current child:-
1 * 1 = 1 (left: 1 1 8)

Using the state evaluator to obtain value...

Value of current child: 0.003
---
2 children already searched for this node. Breaking the loop.
---
None of the child nodes led to success. Seems like a dead end. Backtracking to the parent node.
~~~
None

Next, let's increase max_per_state to 10 (to raise the probability of a successful search) and see what happens.

tot = TreeOfThoughts(client, "gpt-4", input_seq, get_thought_gen_prompt, get_state_eval_prompt, heuristic_calculator, max_per_state=10)
output = tot.dfs(verbose=True)
print("None" if output is None else output)
Step: 1
---
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Thought candidate 1:-
1 + 1 = 2 (left: 1 2 8)
---
Thought candidate 2:-
1 * 1 = 1 (left: 1 1 8)
---
Thought candidate 3:-
8 - 1 = 7 (left: 1 1 7)
---
Thought candidate 4:-
8 / 1 = 8 (left: 1 1 8)
---
Thought candidate 5:-
8 - 1 = 7 (left: 1 7 1)
---
Thought candidate 6:-
1 + 1 = 2 (left: 2 1 8)
---
Thought candidate 7:-
8 / 1 = 8 (left: 1 8 1)
---
Thought candidate 8:-
1 * 1 = 1 (left: 1 8 1)
---
Each of the above thought candidates has been added as a child of the current node.
---
Reminder:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Currently traversing child number: 1

State of current child:-
1 + 1 = 2 (left: 1 2 8)

Using the state evaluator to obtain value...

Value of current child: 2.001
---
Reminder:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Currently traversing child number: 2

State of current child:-
1 * 1 = 1 (left: 1 1 8)

Using the state evaluator to obtain value...

Value of current child: 0.003
---
Reminder:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Currently traversing child number: 3

State of current child:-
8 - 1 = 7 (left: 1 1 7)

Using the state evaluator to obtain value...

Value of current child: 0.003
---
Reminder:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Currently traversing child number: 4

State of current child:-
8 / 1 = 8 (left: 1 1 8)

Using the state evaluator to obtain value...

Value of current child: 0.003
---
Reminder:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Currently traversing child number: 5

State of current child:-
8 - 1 = 7 (left: 1 7 1)

Using the state evaluator to obtain value...

Value of current child: 0.003
---
Reminder:-
State of current node:-
<EMPTY STRING> (root node; no thoughts generated yet)
---
Currently traversing child number: 6

State of current child:-
1 + 1 = 2 (left: 2 1 8)

Using the state evaluator to obtain value...

Value of current child: 21.001
---
Value exceeds heuristic threshold. Searching subtree.
---
~~~
Step: 2
---
State of current node:-
1 + 1 = 2 (left: 2 1 8)
---
Thought candidate 1:-
2 + 1 = 3 (left: 3 8)
---
Thought candidate 2:-
2 * 1 = 2 (left: 2 8)
---
Thought candidate 3:-
8 - 2 = 6 (left: 1 6)
---
Thought candidate 4:-
8 - 1 = 7 (left: 2 7)
---
Thought candidate 5:-
8 / 2 = 4 (left: 1 4)
---
Thought candidate 6:-
8 / 1 = 8 (left: 2 8)
---
Thought candidate 7:-
2 * 8 = 16 (left: 1 16)
---
Thought candidate 8:-
1 * 8 = 8 (left: 2 8)
---
Each of the above thought candidates has been added as a child of the current node.
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 2 1 8)
---
Currently traversing child number: 1

State of current child:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)

Using the state evaluator to obtain value...

Value of current child: 60.0
---
Value exceeds heuristic threshold. Searching subtree.
---
~~~
Step: 3
---
State of current node:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
---
Thought candidate 1:-
3 + 8 = 11 (left: 11)
---
Thought candidate 2:-
8 - 3 = 5 (left: 5)
---
Thought candidate 3:-
8 / 3 = 2.666667 (left: 2.666667)
---
Thought candidate 4:-
8 * 3 = 24 (left: 24)
---
Thought candidate 5:-
3 - 8 = -5 (left: -5)
---
Each of the above thought candidates has been added as a child of the current node.
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
---
Currently traversing child number: 1

State of current child:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
3 + 8 = 11 (left: 11)

Using the state evaluator to obtain value...

Value of current child: 0.002
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
---
Currently traversing child number: 2

State of current child:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
8 - 3 = 5 (left: 5)

Using the state evaluator to obtain value...

Value of current child: 0.002
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
---
Currently traversing child number: 3

State of current child:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
8 / 3 = 2.666667 (left: 2.666667)

Using the state evaluator to obtain value...

Value of current child: 0.002
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
---
Currently traversing child number: 4

State of current child:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
8 * 3 = 24 (left: 24)

Using the state evaluator to obtain value...

Value of current child: 60.0
---
Value exceeds heuristic threshold. Searching subtree.
---
~~~
Step: 4
---
State of current node:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
8 * 3 = 24 (left: 24)
---
Thought candidate 1:-
Answer: (1 + 1 + 1) * 8 = 24
---
Each of the above thought candidates has been added as a child of the current node.
---
Reminder:-
State of current node:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
8 * 3 = 24 (left: 24)
---
Currently traversing child number: 1

State of current child:-
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
8 * 3 = 24 (left: 24)
Answer: (1 + 1 + 1) * 8 = 24

Using the state evaluator to obtain value...

Value of current child: 60.0
---
Value exceeds heuristic threshold. Searching subtree.
---
~~~
Searching the subtree was successful! Backtracking all the way up.
~~~
Searching the subtree was successful! Backtracking all the way up.
~~~
Searching the subtree was successful! Backtracking all the way up.
~~~
Searching the subtree was successful! Backtracking all the way up.
~~~
1 + 1 = 2 (left: 2 1 8)
2 + 1 = 3 (left: 3 8)
8 * 3 = 24 (left: 24)
Answer: (1 + 1 + 1) * 8 = 24

The search was successful! Isn't the above search trace neat?
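
As a quick aside (this check isn't part of the original pipeline), we can verify the returned answer programmatically. The parsing below assumes the final line of output has the exact form "Answer: <expression> = 24":

# Hypothetical sanity check: extract the expression from the final answer line
# and verify that it evaluates to 24.
answer_line = output.strip().split('\n')[-1] # e.g., "Answer: (1 + 1 + 1) * 8 = 24"
expression = answer_line.replace("Answer:", "").split('=')[0]
assert abs(eval(expression) - 24) < 1e-6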

Okay. Let's visualize the tree.

tot.render_html_tree()

To avoid HTML rendering issues, I've saved the tree as an HTML file, which you can view here. Here's a screenshot of the same:
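
If you'd like to save your own tree, here's a minimal sketch (a hypothetical helper; the class itself only displays the tree inline via IPython):

# Save the rendered tree to a standalone HTML file.
html_tree = tot.generate_html_tree(tot.root)
with open("tot_tree.html", 'w') as f:
    f.write(f"<html><body>{html_tree}</body></html>")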

A reusable TreeOfThoughts class

In the sections above, the hyperparameters of ToT were hardcoded (corresponding to the values used in the paper for Creative Writing and Game of 24, respectively). To make the class reusable, however, we need to accept the hyperparameters as arguments in the constructor.

Here's a reusable TreeOfThoughts class:

class TreeOfThoughts:
    def __init__(
            self,
            client: Union[OpenAI, InferenceClient],
            model: str,
            input_seq: str,
            n_steps: int,
            thought_gen_strategy: str,
            get_thought_gen_prompt: Callable,
            state_eval_strategy: str,
            get_state_eval_prompt: Callable,
            n_evals: int,
            heuristic_calculator: Callable,
            n_candidates: Optional[int] = None,
            stop_string: Optional[str] = None,
            breadth_limit: Optional[int] = None,
            heuristic_threshold: Optional[float] = None,
            max_per_state: Optional[int] = None
    ):
        self.client = client
        self.model = model # e.g., "gpt-4" if using `OpenAI` and "meta-llama/Meta-Llama-3.1-8B-Instruct" if using `InferenceClient`.
        self.input_seq = input_seq
        self.root = TreeNode(state='', thought='')
        self.n_steps = n_steps # Equal to the number of intermediate steps + 1 output generation step.
        # Note: The tree height is equal to `n_steps + 1`. That is, we include the root node when calculating the tree height.
        if thought_gen_strategy in ['sample', 'propose']:
            self.thought_gen_strategy = thought_gen_strategy
        else:
            raise ValueError(f"The `thought_gen_strategy` argument must be either 'sample' or 'propose'. Couldn't recognize the following: '{thought_gen_strategy}'")
        self.get_thought_gen_prompt = get_thought_gen_prompt
        if state_eval_strategy in ['vote', 'value']:
            self.state_eval_strategy = state_eval_strategy
        else:
            raise ValueError(f"The `state_eval_strategy` argument must be either 'vote' or 'value'. Couldn't recognize the following: '{state_eval_strategy}'")
        self.get_state_eval_prompt = get_state_eval_prompt
        self.n_evals = n_evals # The number of times to either (i) vote on the states, or (ii) sample values for each state (depending on `state_eval_strategy`).
        self.heuristic_calculator = heuristic_calculator
        self.n_candidates = n_candidates # The number of thoughts to generate from a particular node. Relevant only for the 'sample' thought generation strategy.
        self.stop_string = stop_string # Relevant only for the 'sample' thought generation strategy.
        if self.thought_gen_strategy == 'sample':
            assert self.stop_string is not None, "For the 'sample' thought generation strategy, `stop_string` can't be `None` (due to the zero-shot CoT prompt template)."
            assert self.n_steps == 2, "For the 'sample' thought generation strategy, `n_steps` must be equal to 2 (due to the zero-shot CoT prompt template)."
        self.breadth_limit = breadth_limit # The number of most promising states to retain (after pruning) - at each level of the tree. Relevant only for BFS.
        self.heuristic_threshold = heuristic_threshold # Used to decide whether to grow/prune a subtree (starting at a particular child). Relevant only for DFS.
        self.max_per_state = max_per_state # The maximum number of children to explore for a particular node. Relevant only for DFS.

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/models.py
    def chat_completions(
            self,
            prompt: str,
            temperature: float = 0.7,
            max_tokens: int = 1000,
            n: int = 1,
            stop: Optional[List[str]] = None,
            **kwargs
    ) -> List[str]:
        outputs = []
        messages = [{'role': "user", 'content': prompt}]
        if isinstance(self.client, OpenAI):
            response = self.client.chat.completions.create(
                messages=messages,
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
                n=n, # The `n` responses are i.i.d.
                stop=stop,
                **kwargs
            )
            outputs.extend([choice.message.content for choice in response.choices])
        else: # `self.client` is an instance of `InferenceClient`.
            # The Hugging Face API doesn't support the `n` argument. Hence, we need to use a loop to generate `n` i.i.d. responses.
            for _ in range(n):
                response = self.client.chat.completions.create(
                    messages=messages,
                    model=self.model,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    stop=stop,
                    **kwargs
                )
                outputs.append(response.choices[0].message.content)
        return outputs

    def thought_generator(self, state: str, stop_string: Optional[List[str]] = None) -> List[str]:
        prompt = self.get_thought_gen_prompt(self.input_seq, state)
        if self.thought_gen_strategy == 'sample':
            thoughts = self.chat_completions(prompt, n=self.n_candidates, stop=stop_string)
            return thoughts
        else: # `self.thought_gen_strategy` is equal to 'propose'.
            responses = self.chat_completions(prompt, n=1)
            thoughts = responses[0].split('\n')
            return thoughts

    def state_evaluator(self, states: Optional[List[str]] = None, state: Optional[str] = None) -> Union[List[float], float]:
        if self.state_eval_strategy == 'vote':
            assert states is not None, "For the 'vote' state evaluation strategy, `states` can't be `None`."
            prompt = self.get_state_eval_prompt(self.input_seq, states)
            state_evals = self.chat_completions(prompt, n=self.n_evals)
            vote_results = self.heuristic_calculator(states, state_evals)
            return vote_results
        else: # `self.state_eval_strategy` is equal to 'value'.
            assert state is not None, "For the 'value' state evaluation strategy, `state` can't be `None`."
            prompt = self.get_state_eval_prompt(self.input_seq, state)
            state_evals = self.chat_completions(prompt, n=self.n_evals)
            value = self.heuristic_calculator(state, state_evals)
            return value

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/methods/bfs.py
    def bfs(self, verbose: bool = True) -> str:
        assert self.breadth_limit is not None, "For the BFS search algorithm, `breadth_limit` can't be `None`."

        queue = deque()
        queue.append(self.root)

        for step in range(1, self.n_steps + 1):
            if verbose:
                print(f"Step {step} (corresponding to level {step} of the tree):-\n---")
            for i in range(len(queue)):
                node = queue.popleft()
                if verbose:
                    print(f"Node {i + 1} in level {step}:-")
                    if node.state != "":
                        print(f"State of current node:-\n{node.state}\n---")
                    else:
                        print("State of current node:-\n<EMPTY STRING> (root node; no thoughts generated yet)\n---")

                if self.thought_gen_strategy == 'sample' and step == 1:
                    thoughts = self.thought_generator(state=node.state, stop_string=[self.stop_string])
                else:
                    thoughts = self.thought_generator(state=node.state)
                if node.state == '':
                    updated_states = thoughts
                else:
                    updated_states = [node.state + '\n' + thought for thought in thoughts]
                for j in range(len(thoughts)):
                    if verbose:
                        print(f"Thought candidate {j + 1}:-\n{thoughts[j]}\n---")
                    child = TreeNode(state=updated_states[j], thought=thoughts[j])
                    node.children.append(child)
                    queue.append(child)

            if verbose:
                print("Using the state evaluator to obtain values...\n---")
            if self.state_eval_strategy == 'vote':
                states = [node.state for node in queue]
                values = self.state_evaluator(states=states)
            for i in range(len(queue)):
                if self.state_eval_strategy == 'vote':
                    queue[i].value = values[i]
                else: # `self.state_eval_strategy` is equal to 'value'.
                    queue[i].value = self.state_evaluator(state=queue[i].state)
                if verbose:
                    print(f"Element {i + 1} in queue:-\n")
                    print(f"Value: {queue[i].value}\n---")

            if verbose:
                print("Initiating pruning (using the values obtained from the state evaluator).")
                print(f"Number of elements in queue: {len(queue)}")
            sorted_nodes = sorted(queue, key=lambda node: node.value, reverse=True)
            if step == self.n_steps:
                if verbose:
                    print("Since this is the last step, setting the breadth limit to 1.")
                    print("In other words, retaining only the highest value element (in this last step).\n---")
                top_b_nodes = sorted_nodes[:1]
            else:
                if verbose:
                    print(f"Since this isn't the last step, leaving the breadth limit {self.breadth_limit} unchanged.\n---")
                top_b_nodes = sorted_nodes[:self.breadth_limit]
            top_b_states = [node.state for node in top_b_nodes]
            for i in range(len(queue)):
                node = queue.popleft()
                if verbose:
                    print(f"Element {i + 1} in queue:-\n")
                if node.state in top_b_states:
                    if verbose:
                        print(f"Retaining this element as it's in the top {len(top_b_states)} elements.\n---")
                    queue.append(node)
                else:
                    if verbose:
                        print(f"Dropping this element as it's not in the top {len(top_b_states)} elements.\n---")

            if verbose:
                print("~~~")

        # Return the thought of the highest value node (from the last step):
        node = queue.popleft()
        return node.thought

    # Reference: https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/scripts/crosswords/search_crosswords-dfs.ipynb
    def dfs(self, verbose: bool = True) -> Optional[str]: # Returns `None` if the search fails.
        assert self.heuristic_threshold is not None and self.max_per_state is not None, "For the DFS search algorithm, `heuristic_threshold` and `max_per_state` can't be `None`."

        dfs_output = None

        def dfs_func(node: TreeNode, step: int) -> bool:
            nonlocal dfs_output

            if step > self.n_steps: # Base case: successful search.
                dfs_output = node.state # Record the last (output generation) step's output in the nonlocal variable `dfs_output`.
                return True

            if verbose:
                print(f"Step: {step}\n---")
                if node.state != "":
                    print(f"State of current node:-\n{node.state}\n---")
                else:
                    print("State of current node:-\n<EMPTY STRING> (root node; no thoughts generated yet)\n---")

            thoughts = self.thought_generator(state=node.state)
            if len(thoughts) == 0:
                if verbose:
                    print("No thoughts were generated. It's a dead end. Backtracking to the parent node.\n~~~")
                return False
            if node.state == '':
                updated_states = thoughts
            else:
                updated_states = [node.state + '\n' + thought for thought in thoughts]
            for j in range(len(thoughts)):
                if verbose:
                    print(f"Thought candidate {j + 1}:-\n{thoughts[j]}\n---")
                child = TreeNode(state=updated_states[j], thought=thoughts[j])
                node.children.append(child)
            if verbose:
                print("Each of the above thought candidates has been added as a child of the current node.\n---")

            cnt_per_state = 0
            for child in node.children:
                if verbose:
                    print("Reminder:-")
                    if node.state != "":
                        print(f"State of current node:-\n{node.state}\n---")
                    else:
                        print("State of current node:-\n<EMPTY STRING> (root node; no thoughts generated yet)\n---")
                    print(f"Currently traversing child number: {cnt_per_state + 1}\n")
                    print(f"State of current child:-\n{child.state}\n")
                    print("Using the state evaluator to obtain value...\n")
                child.value = self.state_evaluator(state=child.state)
                if verbose:
                    print(f"Value of current child: {child.value}\n---")
                # Note: If the following condition isn't met, the child node is pruned, i.e., a subtree of the child isn't grown.
                if child.value >= self.heuristic_threshold:
                    if verbose:
                        print("Value exceeds heuristic threshold. Searching subtree.\n---\n~~~")
                    end_search = dfs_func(child, step + 1)
                    if end_search:
                        if verbose:
                            print(f"Searching the subtree was successful! Backtracking all the way up.\n~~~")
                        return True
                    else:
                        if verbose:
                            print(f"Back at step {step}. Searching the subtree was unsuccessful! Trying the next child.\n---")
                cnt_per_state += 1
                if cnt_per_state >= self.max_per_state:
                    if verbose:
                        print(f"{self.max_per_state} children already searched for this node. Breaking the loop.\n---")
                    break
            if verbose:
                print(f"None of the child nodes led to success. Seems like a dead end. Backtracking to the parent node.\n~~~")
            return False

        dfs_func(node=self.root, step=1)
        return dfs_output

    def generate_html_tree(self, node: TreeNode) -> str:
        if node is None:
            return ""
        else:
            html = f"""<div class='node'>
<p>State:<br>{node.state}</p>
<hr>
<p>Thought:<br>{node.thought}</p>
<hr>
<p>Value:<br>{node.value}</p>"""
            for child in node.children:
                html += f"""<div class='child'>{self.generate_html_tree(child)}</div>"""
            html += """</div>"""
            return html

    def render_html_tree(self):
        html_tree = self.generate_html_tree(self.root)
        wrapped_html = f"""<!DOCTYPE html>
<html>
<head>
    <style>
        .node {{
            display: inline-block;
            border: 1px solid blue;
            padding: 10px;
            margin: 5px;
            text-align: center;
        }}
        .child {{
            display: flex;
        }}
    </style>
</head>
<body>
    {html_tree}
</body>
</html>"""
        display(HTML(wrapped_html))

To use the above class on a new task, we need to write three custom callables suited to that task:

  • A thought-generation prompt getter
  • A state-evaluation prompt getter
  • A heuristic calculator

The custom callables provide the flexibility needed to adapt the ToT framework to new tasks. (Skeletons of all three are sketched below.)
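
As a rough illustration, here's what the three callables might look like for a task using the 'propose' thought-generation strategy and the 'value' state-evaluation strategy. The prompt templates below are placeholders of my own, not from the paper; the sure/likely/impossible scoring is an assumption (it's consistent with the values seen in the traces above). Only the signatures are dictated by the class. (This assumes the typing imports from the setup section.)

def get_thought_gen_prompt(input_seq: str, state: str) -> str:
    # Placeholder template: ask the LLM to propose possible next thoughts, one per line.
    return f"Input: {input_seq}\nThoughts so far:\n{state}\nPropose possible next steps, one per line:"

def get_state_eval_prompt(input_seq: str, state: str) -> str:
    # Placeholder template: ask the LLM to judge how promising the given state is.
    return f"Input: {input_seq}\nCandidate state:\n{state}\nEvaluate this state as sure, likely or impossible:"

def heuristic_calculator(state: str, state_evals: List[str]) -> float:
    # Assumed scoring rule: map the LLM's judgments to a scalar, e.g., sure=20, likely=1, impossible=0.001.
    scores = {'sure': 20.0, 'likely': 1.0, 'impossible': 0.001}
    return sum(scores.get(ev.strip().split()[-1].lower(), 0.0) for ev in state_evals)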

Additionally, we need to set suitable hyperparameters for the task. (In particular, the hyperparameters need to strike a balance between (i) how exhaustive the search is and (ii) how much time it takes on average.) We should be able to find suitable hyperparameters by combining (1) our human knowledge/intuition about the task with (2) some experimentation.

With the above in place, we can apply the ToT paradigm to new tasks.
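
For instance, an instantiation in the spirit of the Game of 24 runs earlier in this post might look like the following. The hyperparameter values are illustrative, pieced together from the traces above rather than prescribed:

tot = TreeOfThoughts(
    client,
    model="gpt-4",
    input_seq="1 1 1 8",
    n_steps=4, # 3 intermediate steps + 1 output generation step.
    thought_gen_strategy='propose',
    get_thought_gen_prompt=get_thought_gen_prompt,
    state_eval_strategy='value',
    get_state_eval_prompt=get_state_eval_prompt,
    n_evals=3,
    heuristic_calculator=heuristic_calculator,
    heuristic_threshold=20.0, # Grow a subtree only if a child's value reaches this.
    max_per_state=10 # Explore at most 10 children per node.
)
output = tot.dfs(verbose=False)
print("None" if output is None else output)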

Conclusion

The ToT paper draws inspiration from the pioneering work on artificial intelligence by Newell, Shaw, and Simon in the 1950s. Newell et al. characterized problem solving as search through a combinatorial problem space, represented as a tree. But what's a combinatorial problem space? Quoting the ToT paper:

Research on human problem-solving suggests that people search through a combinatorial problem-space: a tree where the nodes represent partial solutions, and the branches correspond to operators that modify them. Which branch to take is determined by heuristics that help to navigate the problem-space and guide the problem-solver towards a solution.

In other words, humans perform heuristic-guided tree search (without realizing it) when solving everyday problems.

From Newell et al.:

A genuine problem-solving process involves the repeated use of available information to initiate exploration, which discloses, in turn, more information until a way to attain the solution is finally discovered.

The ToT paper takes inspiration from this, and demonstrates the power of combining the chain-of-thought (CoT) reasoning capability of LLMs with a heuristic-guided tree search framework.

How do the results of ToT compare with those of CoT?

创意写作任务中,进行了两种类型的评估:(i) 使用 GPT-4 零样本提示提供 1-10 的标量分数(LLM 作为评判者),以及 (ii) 使用人类判断来比较来自不同方法的成对输出。

  • 关于 (i):ToT(7.56)平均被认为比 CoT(6.93)生成更连贯的段落。
  • 关于 (ii):结果发现,在 100 对段落中,人类在 41 对中更喜欢 ToT 而非 CoT,而在 21 对中更喜欢 CoT 而非 ToT。另外 38 对被认为是“同样连贯”。

24点游戏任务中,虽然使用 CoT 提示的 GPT-4 只解决了 4% 的任务,但 ToT 的成功率达到了 74%。这是一个巨大的差异!

I hope this blog post makes it easier for you to understand and use the ToT paradigm. If you have any thoughts, please feel free to comment!

GitHub repository: https://github.com/sambitmukherjee/reasoning-paradigms

Acknowledgements: Many thanks to my colleagues Rishav Dash, Sandeep Dey and Rahim Khan for their valuable feedback on the Python code and on earlier drafts of this blog post.

References

  • J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903, 2022.
  • T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, Y. Iwasawa. Large Language Models are Zero-Shot Reasoners. arXiv preprint arXiv:2205.11916, 2022.
  • S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, K. Narasimhan. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv preprint arXiv:2305.10601, 2023.
  • Tree of Thoughts GitHub repository: https://github.com/princeton-nlp/tree-of-thought-llm
  • A. Newell, J. C. Shaw, and H. A. Simon. Report on a general problem solving program. In IFIP congress, volume 256, page 64. Pittsburgh, PA, 1959.
