Inference Providers documentation
Use coding environments with Inference Providers
OpenEnv is an open standard for agentic environments in which AI models can interact with tasks such as coding, browsing the web, playing games, or trading on the stock market.
In this guide, you'll learn how to pair Hugging Face **Inference Providers** with an OpenEnv coding environment to iteratively generate, run, and refine Python programs. You can apply this pattern to any code-generation task, or try other environments for tasks such as browsing the web.
Why use the OpenEnv coding environment?
The OpenEnv coding environment is a lightweight HTTP service that executes untrusted Python code in an isolated interpreter. Each call returns stdout, stderr, and an exit_code. Pairing that sandbox with a hosted LLM lets you build a closed-loop solver: the model proposes code, the coding environment executes it, and you feed the result back to the model until the task succeeds.
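Before wiring in a hosted model, the shape of this closed loop can be sketched with plain Python. The snippet below is only an illustration: `run_code` is a toy local stand-in for the OpenEnv sandbox (built on `exec`), and the list of canned attempts stands in for an LLM; neither is part of the OpenEnv API.

```python
import contextlib
import io
import traceback


def run_code(code: str):
    """Toy stand-in for the sandbox: execute code locally and
    capture stdout, stderr, and an exit code."""
    stdout, stderr = io.StringIO(), io.StringIO()
    exit_code = 0
    try:
        with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
            exec(code, {})
    except Exception:
        stderr.write(traceback.format_exc())
        exit_code = 1
    return stdout.getvalue(), stderr.getvalue(), exit_code


# Canned "model" output: the first attempt is off by one, the second is fixed.
attempts = [
    "print(sum(i * i for i in range(1, 100)))",  # misses 100
    "print(sum(i * i for i in range(1, 101)))",  # correct
]

for code in attempts:
    out, err, rc = run_code(code)
    if rc == 0 and out.strip() == "338350":
        print("solved:", out.strip())
        break
```

The rest of this guide replaces `run_code` with HTTP calls to the hosted environment and the canned attempts with live model completions.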
Project overview
We will build a coding agent that uses the OpenEnv coding environment to execute LLM-generated code. The workflow follows these steps:
- Connect to an OpenEnv coding environment on the Hub.
- Generate Python code to solve the task with Inference Providers.
- Execute the code in the environment.
- Collect the resulting feedback from the environment.
- Send the feedback back to the model and repeat until the task succeeds.
If you prefer to follow this guide as a complete script, check out the example below.
Step 1: Prepare the environment
We need to install the OpenEnv library and connect to the coding environment.
Let's start by installing the OpenEnv library from GitHub.
pip install git+https://github.com/meta-pytorch/OpenEnv.git
With the OpenEnv library installed, we're ready to start coding. We can connect to the coding environment by creating a CodingEnv object.
from envs.coding_env import CodeAction, CodingEnv
env = CodingEnv(base_url="https://openenv-coding-env.hf.space")

This example connects to a coding environment hosted on Hugging Face Spaces, which is perfect for this guide. If you need to run the environment locally, we recommend using CodingEnv.from_hub(...) instead.
Step 2: Configure the inference client
Set up the connection to Inference Providers. You can find your user access token on the settings page.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

Step 3: Define the task and helper functions
Let's define a coding task with a clear success criterion.
SYSTEM_PROMPT = (
    "You are an expert Python programmer. Respond with valid Python code that "
    "solves the user's task. Always wrap your final answer in a fenced code "
    "block starting with ```python. Provide a complete script that can be "
    "executed as-is, with no commentary outside the code block."
)

CODING_TASK = (
    "Write Python code that prints the sum of squares of the integers from 1 "
    "to 100 inclusive. The final line must be exactly `Result: <value>` with "
    "the correct number substituted."
)

EXPECTED_SUBSTRING = "Result: 338350"

The environment will execute the code and return the result. We need a way to check whether the task has been solved so that we can stop the loop.
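As a quick sanity check (not part of the guide's script), the expected number can be derived independently with the closed-form sum-of-squares formula:

```python
# Sum of k^2 for k = 1..n has the closed form n(n+1)(2n+1)/6.
n = 100
closed_form = n * (n + 1) * (2 * n + 1) // 6
brute_force = sum(k * k for k in range(1, n + 1))
assert closed_form == brute_force == 338350
print(f"Result: {closed_form}")  # -> Result: 338350
```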
Next, we need utility functions to extract Python code from the model output and to format feedback for the language model.
For simplicity, we'll use plain string matching and vanilla Python to extract code and format feedback. You may prefer to build more sophisticated functions with your favorite prompt-engineering library.
import re


def extract_python_code(text: str) -> str:
    """Extract the first Python code block from the model output."""
    code_blocks = re.findall(
        r"```(?:python)?\s*(.*?)```",
        text,
        re.IGNORECASE | re.DOTALL,
    )
    if code_blocks:
        return code_blocks[0].strip()
    return text.strip()


def format_feedback(
    step: int,
    stdout: str,
    stderr: str,
    exit_code: int,
) -> str:
    """Generate feedback text describing the previous execution."""
    stdout_display = stdout if stdout.strip() else "<empty>"
    stderr_display = stderr if stderr.strip() else "<empty>"
    return (
        f"Execution feedback for step {step}:\n"
        f"exit_code={exit_code}\n"
        f"stdout:\n{stdout_display}\n"
        f"stderr:\n{stderr_display}\n"
        "If the task is not solved, return an improved Python script."
    )


def build_initial_prompt(task: str) -> str:
    """Construct the first user prompt for the coding task."""
    return (
        "You must write Python code to satisfy the following task. "
        "When executed, your script should behave exactly as described.\n\n"
        f"Task:\n{task}\n\n"
        "Reply with the full script in a single ```python code block."
    )

These functions will be used to extract code and to format feedback for the language model.
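To see the extraction helper in action, here is a small self-contained demo. The function body is repeated so the snippet runs on its own, and the triple-backtick fence characters are built by string concatenation purely so the example displays cleanly; the behavior is identical to the guide's `extract_python_code`.

```python
import re

FENCE = "`" * 3  # literal ``` built programmatically for display safety


def extract_python_code(text: str) -> str:
    """Extract the first fenced Python code block from the model output."""
    pattern = FENCE + r"(?:python)?\s*(.*?)" + FENCE
    code_blocks = re.findall(pattern, text, re.IGNORECASE | re.DOTALL)
    if code_blocks:
        return code_blocks[0].strip()
    return text.strip()


# A typical model reply: commentary around one fenced code block.
reply = f"Sure, here you go:\n{FENCE}python\nprint('hello')\n{FENCE}\nDone!"
print(extract_python_code(reply))  # -> print('hello')

# Replies without a fence fall back to the stripped raw text.
print(extract_python_code("  print('hi')  "))  # -> print('hi')
```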
Step 4: Implement the main solve loop
Now let's implement the main solve loop. It will iteratively generate code, execute it, and format feedback for the language model.
Let's define some constants for the loop.
MAX_ATTEMPTS = 5  # how many times to try to solve the task
MODEL = "openai/gpt-oss-120b:novita"  # works great for this task
MAX_TOKENS = 2048  # the maximum number of tokens to generate
TEMPERATURE = 0.2  # the temperature to use for the model

Let's define the initial messages for the language model.
history = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": build_initial_prompt(CODING_TASK)},
]

We need to reset the environment and get the initial observation.
obs = env.reset().observation
Now, let's implement the main loop.
for step in range(1, MAX_ATTEMPTS + 1):
    # Get the model response
    response = client.chat.completions.create(
        model=MODEL,
        messages=history,
        max_tokens=MAX_TOKENS,
        temperature=TEMPERATURE,
    )
    assistant_message = response.choices[0].message.content.strip()
    history.append({"role": "assistant", "content": assistant_message})

    # Extract the code and act in the environment
    code = extract_python_code(assistant_message)
    result = env.step(CodeAction(code=code))

    # Get the feedback from the environment
    obs = result.observation

    # Check if the task is solved
    solved = obs.exit_code == 0 and EXPECTED_SUBSTRING in obs.stdout
    if solved:
        break

    # Provide feedback for the next iteration
    history.append(
        {
            "role": "user",
            "content": format_feedback(
                step,
                obs.stdout,
                obs.stderr,
                obs.exit_code,
            ),
        }
    )
Now, let's print the final result.
print(obs.stdout)

🎉 That's it! You've successfully built a coding agent that uses the OpenEnv coding environment to execute LLM-generated code. You can now try it on your own tasks.
Complete working example
Here is the complete script showcasing all the concepts covered in this guide.
Click to view the full coding_env_inference.py script
#!/usr/bin/env python3
"""Solve a coding task with a hosted LLM via Hugging Face Inference.

This script mirrors ``textarena_wordle_inference.py`` but targets the Coding
environment. It launches the CodingEnv Docker image locally and asks an
OpenAI-compatible model served through Hugging Face's router to iteratively
produce Python code until the task is solved.

Prerequisites
-------------
1. Build the Coding environment Docker image::

       docker build \
           -f src/envs/coding_env/server/Dockerfile \
           -t coding-env:latest .

2. Set your Hugging Face token, or any other API key that is compatible with
   the OpenAI API::

       export HF_TOKEN=your_token_here
       export API_KEY=your_api_key_here

3. Run the script::

       python examples/coding_env_inference.py

The script keeps sending execution feedback to the model until it prints
``Result: 338350`` or reaches the configured step limit.
"""

from __future__ import annotations

import os
import re
from typing import List, Tuple

from openai import OpenAI

from envs.coding_env import CodeAction, CodingEnv

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

API_BASE_URL = "https://router.huggingface.co/v1"
API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN")
MODEL = "openai/gpt-oss-120b:novita"
MAX_STEPS = 5
VERBOSE = True

CODING_TASK = (
    "Write Python code that prints the sum of squares of the integers from 1 "
    "to 100 inclusive. The final line must be exactly `Result: <value>` with "
    "the correct number substituted."
)
EXPECTED_SUBSTRING = "Result: 338350"

SYSTEM_PROMPT = (
    "You are an expert Python programmer. Respond with valid Python code that "
    "solves the user's task. Always wrap your final answer in a fenced code "
    "block starting with ```python. Provide a complete script that can be "
    "executed as-is, with no commentary outside the code block."
)

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def extract_python_code(text: str) -> str:
    """Extract the first Python code block from the model output."""
    code_blocks = re.findall(
        r"```(?:python)?\s*(.*?)```",
        text,
        re.IGNORECASE | re.DOTALL,
    )
    if code_blocks:
        return code_blocks[0].strip()
    return text.strip()


def format_feedback(
    step: int,
    stdout: str,
    stderr: str,
    exit_code: int,
) -> str:
    """Generate feedback text describing the previous execution."""
    stdout_display = stdout if stdout.strip() else "<empty>"
    stderr_display = stderr if stderr.strip() else "<empty>"
    return (
        f"Execution feedback for step {step}:\n"
        f"exit_code={exit_code}\n"
        f"stdout:\n{stdout_display}\n"
        f"stderr:\n{stderr_display}\n"
        "If the task is not solved, return an improved Python script."
    )


def build_initial_prompt(task: str) -> str:
    """Construct the first user prompt for the coding task."""
    return (
        "You must write Python code to satisfy the following task. "
        "When executed, your script should behave exactly as described.\n\n"
        f"Task:\n{task}\n\n"
        "Reply with the full script in a single ```python code block."
    )

# ---------------------------------------------------------------------------
# Gameplay
# ---------------------------------------------------------------------------

def solve_coding_task(
    env: CodingEnv,
    client: OpenAI,
) -> Tuple[bool, List[str]]:
    """Iteratively ask the model for code until the task is solved."""
    history = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": build_initial_prompt(CODING_TASK)},
    ]
    obs = env.reset().observation
    transcripts: List[str] = []

    for step in range(1, MAX_STEPS + 1):
        response = client.chat.completions.create(
            model=MODEL,
            messages=history,
            max_tokens=2048,
            temperature=0.2,
        )
        assistant_message = response.choices[0].message.content.strip()
        history.append({"role": "assistant", "content": assistant_message})

        code = extract_python_code(assistant_message)
        if VERBOSE:
            print(f"\n🛠️ Step {step}: executing model-produced code")
            print(code)

        result = env.step(CodeAction(code=code))
        obs = result.observation
        transcripts.append(
            f"Step {step} | exit_code={obs.exit_code}\n"
            f"stdout:\n{obs.stdout}\n"
            f"stderr:\n{obs.stderr}\n"
        )

        if VERBOSE:
            print(" ▶ exit_code:", obs.exit_code)
            if obs.stdout:
                print(" ▶ stdout:\n" + obs.stdout)
            if obs.stderr:
                print(" ▶ stderr:\n" + obs.stderr)

        solved = obs.exit_code == 0 and EXPECTED_SUBSTRING in obs.stdout
        if solved:
            return True, transcripts

        history.append(
            {
                "role": "user",
                "content": format_feedback(
                    step,
                    obs.stdout,
                    obs.stderr,
                    obs.exit_code,
                ),
            }
        )

        # Keep conversation history compact to avoid exceeding context limits
        if len(history) > 20:
            history = [history[0]] + history[-19:]

    return False, transcripts

# ---------------------------------------------------------------------------
# Entrypoint
# ---------------------------------------------------------------------------

def main() -> None:
    if not API_KEY:
        raise SystemExit(
            "HF_TOKEN (or API_KEY) must be set to query the model."
        )

    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
    env = CodingEnv.from_docker_image(
        "coding-env:latest",
        ports={8000: 8000},
    )

    try:
        success, transcripts = solve_coding_task(env, client)
    finally:
        env.close()

    print(
        "\n✅ Session complete"
        if success
        else "\n⚠️ Session finished without solving the task"
    )
    print("--- Execution transcripts ---")
    for entry in transcripts:
        print(entry)


if __name__ == "__main__":
    main()

Next steps
Now that you have a working coding agent, here are some ways to extend and improve it:
- Try different models and providers to see how they perform.
- Try other environments, such as browsing the web or playing games.
- Integrate the environment into your own applications.