
Using coding environments with Inference Providers

OpenEnv is an open standard for agentic environments in which AI models can interact with tasks such as writing code, browsing the web, playing games, or trading stocks.

In this guide, you'll learn how to pair Hugging Face **Inference Providers** with an OpenEnv coding environment to iteratively generate, run, and refine Python programs. You can use this pattern for any code-generation task, or try other environments for tasks such as browsing the web.

Why use the OpenEnv coding environment?

The OpenEnv coding environment is a lightweight HTTP service that executes untrusted Python code in an isolated interpreter. Each call returns `stdout`, `stderr`, and an `exit_code`. Pair that sandbox with a hosted LLM and you can build a closed-loop solver: the model proposes code, the coding environment executes it, and you feed the results back to the model until the task succeeds.

Project overview

We'll build a coding agent that uses the OpenEnv coding environment to execute LLM-generated code. The workflow follows these steps:

  1. Connect to an OpenEnv coding environment on the Hub.
  2. Generate Python code to solve the task using Inference Providers.
  3. Execute the code in the environment.
  4. Get result feedback from the environment.
  5. Send the feedback back to the model and repeat until the task succeeds.
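The loop in these steps can be sketched in miniature before wiring up the real model and environment. In the sketch below, `generate_code` and `run_code` are hypothetical stand-ins for the LLM call and the sandbox (the real versions appear later in this guide):

```python
import contextlib
import io


def generate_code(feedback):
    # Stand-in for the LLM call: the first attempt is wrong, the retry is right.
    if feedback is None:
        return "print('Result: 0')"
    return "print(f'Result: {sum(i * i for i in range(1, 101))}')"


def run_code(code):
    # Stand-in for env.step(CodeAction(code=code)): run locally, capture stdout.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code)
    return buf.getvalue()


def solve(expected, max_attempts=5):
    feedback = None
    for _ in range(max_attempts):
        stdout = run_code(generate_code(feedback))
        if expected in stdout:
            return True
        feedback = stdout  # feed the result back for the next attempt
    return False


print(solve("Result: 338350"))  # True (solved on the second attempt)
```

The real agent follows exactly this shape; the rest of the guide swaps the stubs for an Inference Providers client and an OpenEnv environment.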

If you'd rather follow this guide as a complete script, check out the example below.

Step 1: Prepare the environment

We need to install the OpenEnv library and connect to the coding environment.

Let's start by installing the OpenEnv library from GitHub.

pip install git+https://github.com/meta-pytorch/OpenEnv.git

With the OpenEnv library installed, we're ready to start coding. We can connect to the coding environment by creating a `CodingEnv` object.

from envs.coding_env import CodeAction, CodingEnv

env = CodingEnv(base_url="https://openenv-coding-env.hf.space")

This example connects to a coding environment hosted on Hugging Face Spaces. That's perfect for this guide, but if you need to run the environment locally, we recommend using `CodingEnv.from_hub(...)`.

Step 2: Configure the inference client

Set up a connection to Inference Providers. You can find your user access token on your settings page.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

Step 3: Define the task and helper functions

Let's define a coding task with clear success criteria.

SYSTEM_PROMPT = (
    "You are an expert Python programmer. Respond with valid Python code that "
    "solves the user's task. Always wrap your final answer in a fenced code "
    "block starting with ```python. Provide a complete script that can be "
    "executed as-is, with no commentary outside the code block."
)
CODING_TASK = (
    "Write Python code that prints the sum of squares of the integers from 1 "
    "to 100 inclusive. The final line must be exactly `Result: <value>` with "
    "the correct number substituted."
)
EXPECTED_SUBSTRING = "Result: 338350"
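As a quick sanity check on that expected value, the closed-form sum of squares, n(n+1)(2n+1)/6, agrees with a brute-force sum:

```python
n = 100
closed_form = n * (n + 1) * (2 * n + 1) // 6
brute_force = sum(i * i for i in range(1, n + 1))
print(closed_form, brute_force)  # 338350 338350
```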

The environment will execute the code and return the results. We need a way to check whether the task has been solved so that we can stop the loop.
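One straightforward check (a hypothetical helper mirroring the condition used in the main loop later in this guide) is that the run exited cleanly and printed the expected line:

```python
def is_task_solved(exit_code: int, stdout: str, expected: str) -> bool:
    """The run succeeded and its output contains the expected line."""
    return exit_code == 0 and expected in stdout


print(is_task_solved(0, "Result: 338350\n", "Result: 338350"))  # True
print(is_task_solved(1, "", "Result: 338350"))                  # False
```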

Next, we need utility functions to extract the Python code and format feedback for the language model.

For simplicity, we'll use plain string matching and vanilla Python to extract the code and format the feedback. You may prefer to build more sophisticated versions with your favorite prompt-engineering library.

import re

def extract_python_code(text: str) -> str:
    """Extract the first Python code block from the model output."""
    code_blocks = re.findall(
        r"```(?:python)?\s*(.*?)```",
        text,
        re.IGNORECASE | re.DOTALL,
    )
    if code_blocks:
        return code_blocks[0].strip()
    return text.strip()


def format_feedback(
    step: int,
    stdout: str,
    stderr: str,
    exit_code: int,
) -> str:
    """Generate feedback text describing the previous execution."""
    stdout_display = stdout if stdout.strip() else "<empty>"
    stderr_display = stderr if stderr.strip() else "<empty>"
    return (
        f"Execution feedback for step {step}:\n"
        f"exit_code={exit_code}\n"
        f"stdout:\n{stdout_display}\n"
        f"stderr:\n{stderr_display}\n"
        "If the task is not solved, return an improved Python script."
    )


def build_initial_prompt(task: str) -> str:
    """Construct the first user prompt for the coding task."""
    return (
        "You must write Python code to satisfy the following task. "
        "When executed, your script should behave exactly as described.\n\n"
        f"Task:\n{task}\n\n"
        "Reply with the full script in a single ```python code block."
    )

These functions will be used to extract the code and format feedback for the language model.
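To see the extraction in action, here's a quick check on a sample model reply (the regex repeats the pattern used in `extract_python_code`; the triple backticks are built programmatically only to keep this snippet's own code fence intact):

```python
import re

fence = "`" * 3  # a literal triple backtick
sample = f"Sure!\n{fence}python\nprint('hi')\n{fence}\nLet me know."
blocks = re.findall(
    fence + r"(?:python)?\s*(.*?)" + fence,
    sample,
    re.IGNORECASE | re.DOTALL,
)
print(blocks[0].strip())  # print('hi')
```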

Step 4: Implement the main solve loop

Now, let's implement the main solve loop. It will iteratively generate code, execute it, and format feedback for the language model.

Let's define a few constants for the loop.

MAX_ATTEMPTS = 5 # how many times to try to solve the task
MODEL = "openai/gpt-oss-120b:novita" # works great for this task
MAX_TOKENS = 2048 # the maximum number of tokens to generate
TEMPERATURE = 0.2 # the temperature to use for the model

Let's define the initial messages for the language model.

history = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": build_initial_prompt(CODING_TASK)},
]

We need to reset the environment and get the initial observation.

obs = env.reset().observation

Now, let's implement the main loop.

for step in range(1, MAX_ATTEMPTS + 1):
    # Get model response
    response = client.chat.completions.create(
        model=MODEL,
        messages=history,
        max_tokens=MAX_TOKENS,
        temperature=TEMPERATURE,
    )

    assistant_message = response.choices[0].message.content.strip()
    history.append({"role": "assistant", "content": assistant_message})

    # Extract and execute code
    code = extract_python_code(assistant_message)

    # act in the environment
    result = env.step(CodeAction(code=code))

    # get the feedback from the environment
    obs = result.observation

    # Check if task is solved
    solved = obs.exit_code == 0 and EXPECTED_SUBSTRING in obs.stdout

    if solved:
        break

    # Provide feedback for next iteration
    history.append(
        {
            "role": "user",
            "content": format_feedback(
                step,
                obs.stdout,
                obs.stderr,
                obs.exit_code,
            ),
        }
    )

Now, let's print the final result.

print(obs.stdout)

🎉 That's it! You've built a coding agent that uses the OpenEnv coding environment to execute LLM-generated code. You can now try it on your own tasks.

Complete working example

Here's a complete script that demonstrates all the concepts in this guide.

Click to view the full coding_env_inference.py script
#!/usr/bin/env python3
"""Solve a coding task with a hosted LLM via Hugging Face Inference.

This script mirrors ``textarena_wordle_inference.py`` but targets the Coding
environment. It launches the CodingEnv Docker image locally and asks an
OpenAI-compatible model served through Hugging Face's router to iteratively
produce Python code until the task is solved.

Prerequisites
-------------
1. Build the Coding environment Docker image::

       docker build \
           -f src/envs/coding_env/server/Dockerfile \
           -t coding-env:latest .

2. Set your Hugging Face token, or any other API key that is compatible with the OpenAI API:

       export HF_TOKEN=your_token_here
       export API_KEY=your_api_key_here

3. Run the script::

       python examples/coding_env_inference.py

The script keeps sending execution feedback to the model until it prints
``Result: 338350`` or reaches the configured step limit.
"""

from __future__ import annotations

import os
import re
from typing import List, Tuple

from openai import OpenAI

from envs.coding_env import CodeAction, CodingEnv


# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

API_BASE_URL = "https://router.huggingface.co/v1"
API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN")

MODEL = "openai/gpt-oss-120b:novita"
MAX_STEPS = 5
VERBOSE = True

CODING_TASK = (
    "Write Python code that prints the sum of squares of the integers from 1 "
    "to 100 inclusive. The final line must be exactly `Result: <value>` with "
    "the correct number substituted."
)
EXPECTED_SUBSTRING = "Result: 338350"

SYSTEM_PROMPT = (
    "You are an expert Python programmer. Respond with valid Python code that "
    "solves the user's task. Always wrap your final answer in a fenced code "
    "block starting with ```python. Provide a complete script that can be "
    "executed as-is, with no commentary outside the code block."
)


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def extract_python_code(text: str) -> str:
    """Extract the first Python code block from the model output."""

    code_blocks = re.findall(
        r"```(?:python)?\s*(.*?)```",
        text,
        re.IGNORECASE | re.DOTALL,
    )
    if code_blocks:
        return code_blocks[0].strip()
    return text.strip()


def format_feedback(
    step: int,
    stdout: str,
    stderr: str,
    exit_code: int,
) -> str:
    """Generate feedback text describing the previous execution."""

    stdout_display = stdout if stdout.strip() else "<empty>"
    stderr_display = stderr if stderr.strip() else "<empty>"
    return (
        f"Execution feedback for step {step}:\n"
        f"exit_code={exit_code}\n"
        f"stdout:\n{stdout_display}\n"
        f"stderr:\n{stderr_display}\n"
        "If the task is not solved, return an improved Python script."
    )


def build_initial_prompt(task: str) -> str:
    """Construct the first user prompt for the coding task."""

    return (
        "You must write Python code to satisfy the following task. "
        "When executed, your script should behave exactly as described.\n\n"
        f"Task:\n{task}\n\n"
        "Reply with the full script in a single ```python code block."
    )


# ---------------------------------------------------------------------------
# Gameplay
# ---------------------------------------------------------------------------

def solve_coding_task(
    env: CodingEnv,
    client: OpenAI,
) -> Tuple[bool, List[str]]:
    """Iteratively ask the model for code until the task is solved."""

    history = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": build_initial_prompt(CODING_TASK)},
    ]

    obs = env.reset().observation

    transcripts: List[str] = []

    for step in range(1, MAX_STEPS + 1):
        response = client.chat.completions.create(
            model=MODEL,
            messages=history,
            max_tokens=2048,
            temperature=0.2,
        )

        assistant_message = response.choices[0].message.content.strip()
        history.append({"role": "assistant", "content": assistant_message})

        code = extract_python_code(assistant_message)

        if VERBOSE:
            print(f"\n🛠️  Step {step}: executing model-produced code")
            print(code)

        result = env.step(CodeAction(code=code))
        obs = result.observation

        transcripts.append(
            (
                f"Step {step} | exit_code={obs.exit_code}\n"
                f"stdout:\n{obs.stdout}\n"
                f"stderr:\n{obs.stderr}\n"
            )
        )

        if VERBOSE:
            print("   ▶ exit_code:", obs.exit_code)
            if obs.stdout:
                print("   ▶ stdout:\n" + obs.stdout)
            if obs.stderr:
                print("   ▶ stderr:\n" + obs.stderr)

        solved = obs.exit_code == 0 and EXPECTED_SUBSTRING in obs.stdout
        if solved:
            return True, transcripts

        history.append(
            {
                "role": "user",
                "content": format_feedback(
                    step,
                    obs.stdout,
                    obs.stderr,
                    obs.exit_code,
                ),
            }
        )

        # Keep conversation history compact to avoid exceeding context limits
        if len(history) > 20:
            history = [history[0]] + history[-19:]

    return False, transcripts


# ---------------------------------------------------------------------------
# Entrypoint
# ---------------------------------------------------------------------------

def main() -> None:
    if not API_KEY:
        raise SystemExit(
            "HF_TOKEN (or API_KEY) must be set to query the model."
        )

    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

    env = CodingEnv.from_docker_image(
        "coding-env:latest",
        ports={8000: 8000},
    )

    try:
        success, transcripts = solve_coding_task(env, client)
    finally:
        env.close()

    print(
        "\n✅ Session complete"
        if success
        else "\n⚠️ Session finished without solving the task"
    )
    print("--- Execution transcripts ---")
    for entry in transcripts:
        print(entry)


if __name__ == "__main__":
    main()

Next steps

Now that you have a working coding agent, here are some ways to extend and improve it:

  • Experiment with different models and providers to see how they perform.
  • Try other environments, such as browsing the web or playing games.
  • Integrate the environment into your own applications.