Getting started:

To get started, download the project from here (click the download icon next to GDRL-IL-Project.zip). The zip file contains both a "Starter" and a "Complete" project.

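The game code is already implemented in the starter project, and the nodes are already configured. We will focus on:

implementing the code for the AIController node,
recording expert demonstrations,
training the agent and exporting an .onnx file that we can use for inference in Godot.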

Open the starter project in Godot

Extract the zip file, open Godot, click "Import", and navigate to the Starter\Godot folder inside the extracted archive.

Open the robot scene

You can search for "robot" in the FileSystem search.

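This scene contains several nodes. The robot node holds the robot's visual shape. The CameraXRotation node rotates the camera "up and down" with the mouse in human control mode; the AI agent does not control this node, because it is not needed for the learning task. The RaycastSensors node contains two raycast sensors that help the agent "sense" parts of the game world, such as the walls and the floor.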

Click the scroll icon next to AIController3D to open the script for editing

You may need to collapse the "robot" branch to find it more easily, or you can type aicontroller in the "Filter" box above the "Robot" node.

Replace the get_obs() and get_reward() methods with the following implementation:

func get_obs() -> Dictionary:
	var observations: Array[float] = []
	for raycast_sensor in raycast_sensors:
		observations.append_array(raycast_sensor.get_observation())

	var level_size = 16.0

	var chest_local = to_local(chest.global_position)
	var chest_direction = chest_local.normalized()
	var chest_distance = clampf(chest_local.length(), 0.0, level_size)
	
	var lever_local = to_local(lever.global_position)
	var lever_direction = lever_local.normalized()
	var lever_distance = clampf(lever_local.length(), 0.0, level_size)
		
	var key_local = to_local(key.global_position)
	var key_direction = key_local.normalized()
	var key_distance = clampf(key_local.length(), 0.0, level_size)
	
	var raft_local = to_local(raft.global_position)
	var raft_direction = raft_local.normalized()
	var raft_distance = clampf(raft_local.length(), 0.0, level_size)
	
	var player_speed = player.global_basis.inverse() * player.velocity.limit_length(5.0) / 5.0

	(
		observations
		.append_array(
			[
				chest_direction.x,
				chest_direction.y,
				chest_direction.z,
				chest_distance,
				lever_direction.x,
				lever_direction.y,
				lever_direction.z,
				lever_distance,
				key_direction.x,
				key_direction.y,
				key_direction.z,
				key_distance,
				raft_direction.x,
				raft_direction.y,
				raft_direction.z,
				raft_distance,
				raft.movement_direction_multiplier,
				float(player._is_lever_pulled),
				float(player._is_chest_opened),
				float(player._is_key_collected),
				float(player.is_on_floor()),
				player_speed.x,
				player_speed.y,
				player_speed.z,
			]
		)
	)
	return {"obs": observations}

func get_reward() -> float:
	return reward

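In get_obs(), we first read the observations from the two raycast sensors that were added to the AIController3D node in the inspector and append them to the observation array. Next, we compute the position of the chest, lever, key, and raft relative to the robot (in its local space, via to_local()), split each vector into a unit direction and a distance clamped to the level size of 16, and append those as well.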

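We also add other game-state information to the observations:

the raft's movement direction multiplier (raft.movement_direction_multiplier),
whether the lever has been pulled,
whether the key has been collected,
whether the chest has been opened,
whether the player is on the floor (which also determines whether the player can jump),
the player's normalized local velocity (capped at a length of 5, then divided by 5).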

We convert boolean values (such as _is_lever_pulled) to floats (0.0 or 1.0).
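To make the observation layout concrete, here is a minimal standalone GDScript sketch that reproduces the get_obs() encoding for a single target and for the player speed. It is an illustration under assumptions, not part of the project: the helper names encode_target() and encode_speed() and the file name are hypothetical. Because it extends SceneTree, it can be run on its own with Godot's `--headless --script` command-line options.

```gdscript
extends SceneTree

# Standalone sketch of the get_obs() encoding (not part of the project).
# Run with: godot --headless --script obs_encoding_sketch.gd (hypothetical file name)

const LEVEL_SIZE := 16.0  # same value as level_size in get_obs()
const MAX_SPEED := 5.0    # same cap as limit_length(5.0) in get_obs()


# Hypothetical helper: a target's offset in the robot's local space becomes
# 4 floats, a unit direction (x, y, z) plus a distance clamped to [0, LEVEL_SIZE].
static func encode_target(local_offset: Vector3) -> Array[float]:
	var direction := local_offset.normalized()
	var distance := clampf(local_offset.length(), 0.0, LEVEL_SIZE)
	var encoded: Array[float] = [direction.x, direction.y, direction.z, distance]
	return encoded


# Hypothetical helper: rotate the world-space velocity into the player's frame,
# cap its length at MAX_SPEED, then divide by MAX_SPEED.
static func encode_speed(player_basis: Basis, velocity: Vector3) -> Vector3:
	return player_basis.inverse() * velocity.limit_length(MAX_SPEED) / MAX_SPEED


func _init() -> void:
	# Offset (3, 0, -4): direction ≈ (0.6, 0.0, -0.8), distance = 5.0
	print(encode_target(Vector3(3.0, 0.0, -4.0)))
	# Offset 40 units away: the distance saturates at LEVEL_SIZE (16.0)
	print(encode_target(Vector3(0.0, 0.0, -40.0)))
	# Identity basis, speed 10 along +Z: capped to 5, then scaled to (0, 0, 1)
	print(encode_speed(Basis(), Vector3(0.0, 0.0, 10.0)))
	# Booleans become 1.0 / 0.0, as with float(player._is_lever_pulled)
	print(float(true), " ", float(false))
	quit()
```

Note that the distance is clamped to 16 rather than divided by it, so distance observations lie in [0, 16], while direction components lie in [-1, 1]. For a pure rotation basis, the encoded speed has a length of at most 1.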

In get_reward(), we simply return the current reward.

Replace the _physics_process() and reset() methods with the following implementation:

func _physics_process(delta: float) -> void:
	# Reset on timeout, this is implemented in parent class to set needs_reset to true,
	# we are re-implementing here to call player.game_over() that handles the game reset.
	n_steps += 1
	if n_steps > reset_after:
		player.game_over()

	# In training or onnx inference modes, this method will be called by sync node with actions provided,
	# For expert demo recording mode, it will be called without any actions (as we set the actions based on human input),
	# For human control mode the method will not be called, so we call it here without any actions provided.
	if control_mode == ControlModes.HUMAN:
		set_action()

	# Reset the game faster if the lever is not pulled.
	steps_without_lever_pulled += 1
	if steps_without_lever_pulled > 200 and (not player._is_lever_pulled):
		player.game_over()

func reset():
	super.reset()
	steps_without_lever_pulled = 0

Replace the get_action_space(), get_action(), and set_action() methods with the following implementation:

# Defines the actions for the AI agent ("size": 2 means 2 floats for this action)
func get_action_space() -> Dictionary:
	return {
		"movement": {"size": 2, "action_type": "continuous"},
		"rotation": {"size": 1, "action_type": "continuous"},
		"jump": {"size": 1, "action_type": "continuous"},
		"use_action": {"size": 1, "action_type": "continuous"}
	}

# We return the action values in the same order as defined in get_action_space() (important), but all in one array
# For actions of size 1, we return 1 float in the array, for size 2, 2 floats in the array, etc.
# set_action is called just before get_action by the sync node, so we can read the newly set values
func get_action():
	return [
		# "movement" action values
		player.requested_movement.x,
		player.requested_movement.y,
		# "rotation" action value
		player.requested_rotation.x,
		# "jump" action value (-1 if not requested, 1 if requested)
		-1.0 + 2.0 * float(player.jump_requested),
		# "use_action" action value (-1 if not requested, 1 if requested)
		-1.0 + 2.0 * float(player.use_action_requested)
	]

# Here we set human control and AI control actions to the robot
func set_action(action = null) -> void:
	# If there's no action provided, it means that AI is not controlling the robot (human control),
	if not action:
		# Only rotate if the mouse has moved since the last set_action call
		if previous_mouse_movement == mouse_movement:
			mouse_movement = Vector2.ZERO

		player.requested_movement = Input.get_vector(
			"move_left", "move_right", "move_forward", "move_back"
		)
		player.requested_rotation = mouse_movement

		var use_action = Input.is_action_pressed("requested_action")
		var jump = Input.is_action_pressed("requested_jump")

		player.use_action_requested = use_action
		player.jump_requested = jump

		previous_mouse_movement = mouse_movement	
	else:
		# If there is action provided, we set the actions received from the AI agent 
		player.requested_movement = Vector2(action.movement[0], action.movement[1])
		# The agent only rotates the robot along the Y axis, no need to rotate the camera along X axis
		player.requested_rotation = Vector2(action.rotation[0], 0.0)
		player.jump_requested = bool(action.jump[0] > 0)
		player.use_action_requested = bool(action.use_action[0] > 0)

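get_action() is only needed in demo recording mode. It must return the actions the agent should send when it encounters the same state. Two things matter: the values must be in the correct range (-1.0 to 1.0), which is why boolean states are encoded as -1 + 2 * variable, and they must be in the same order as defined in get_action_space().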

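In demo recording mode, set_action() is called without an action, because the action values must be set from human input. In training and inference modes, it is called with an action argument containing the values of all actions provided by the RL model. The if/else in set_action() handles both cases.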
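The range and ordering rules can be checked with a small standalone sketch. In the project, the conversion between the flat action array and named actions is handled by the Godot RL Agents framework rather than by the AIController script; the split_actions() helper below is hypothetical and only illustrates why get_action() must list its values in the same order as get_action_space(). encode_bool() and decode_bool() mirror the -1 + 2 * variable expression in get_action() and the > 0 threshold in set_action().

```gdscript
extends SceneTree

# Standalone sketch (not part of the project) of how action values are encoded.

# Same layout as get_action_space() in the tutorial; key order matters.
const ACTION_SPACE := {
	"movement": {"size": 2, "action_type": "continuous"},
	"rotation": {"size": 1, "action_type": "continuous"},
	"jump": {"size": 1, "action_type": "continuous"},
	"use_action": {"size": 1, "action_type": "continuous"},
}


# Same expression as in get_action(): false -> -1.0, true -> 1.0
static func encode_bool(value: bool) -> float:
	return -1.0 + 2.0 * float(value)


# Same threshold as in set_action(): any positive value means "requested"
static func decode_bool(value: float) -> bool:
	return value > 0.0


# Hypothetical helper: cut a flat array (as returned by get_action()) into
# per-action slices, walking ACTION_SPACE in its declared order.
static func split_actions(flat: Array) -> Dictionary:
	var result := {}
	var index := 0
	for action_name in ACTION_SPACE:
		var size: int = ACTION_SPACE[action_name]["size"]
		result[action_name] = flat.slice(index, index + size)
		index += size
	return result


func _init() -> void:
	# movement (2 floats), rotation (1), jump (1), use_action (1)
	var flat := [0.0, -1.0, 0.25, encode_bool(true), encode_bool(false)]
	var actions := split_actions(flat)
	print(actions)
	# jump decodes to true, use_action decodes to false
	print(decode_bool(actions["jump"][0]), " ", decode_bool(actions["use_action"][0]))
	quit()
```

If get_action() listed, say, jump before rotation, the slices would be assigned to the wrong action names, which is why the order has to match exactly.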

More details are in the code comments.

Replace the _input method with the following implementation:

# Record mouse movement for human and demo_record modes
# We don't directly rotate in input to allow for frame skipping (action_repeat setting) which
# will also be applied to the AI agent in training/inference modes.
func _input(event):
	if not (heuristic == "human" or heuristic == "demo_record"):
		return

	if event is InputEventMouseMotion:
		var movement_scale: float = 0.005
		mouse_movement.y = clampf(event.relative.y * movement_scale, -1.0, 1.0)
		mouse_movement.x = clampf(event.relative.x * movement_scale, -1.0, 1.0)

This code records mouse movement in human control and demo recording modes.
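With movement_scale = 0.005, a mouse delta of 200 pixels in a single input event already reaches the clamp limit of 1.0. The standalone sketch below applies the same scaling and clamping as _input() so you can see how pixel deltas map to rotation requests; scale_mouse_delta() is a hypothetical helper, not part of the project.

```gdscript
extends SceneTree

# Standalone sketch (not part of the project) of the scaling in _input().

const MOVEMENT_SCALE := 0.005  # same value as movement_scale in _input()


# Hypothetical helper: pixel deltas -> rotation request, each axis clamped to [-1, 1]
static func scale_mouse_delta(relative: Vector2) -> Vector2:
	return Vector2(
		clampf(relative.x * MOVEMENT_SCALE, -1.0, 1.0),
		clampf(relative.y * MOVEMENT_SCALE, -1.0, 1.0)
	)


func _init() -> void:
	print(scale_mouse_delta(Vector2(50.0, -20.0)))  # ≈ (0.25, -0.1)
	print(scale_mouse_delta(Vector2(400.0, 0.0)))   # x clamps to 1.0 (any delta >= 200 px does)
	quit()
```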

Finally, save the script. We are ready for the next step.

Open the demo recording scene and click the AIController3D node

You can search for "demo" in the FileSystem search, and for "aicontroller" in the scene's filter box.

You don't need to change anything, because everything is preset, but let's review what you would need to set up in your own environment.

The scene contains modified settings on the Level > Robot > AIController3D node:

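Control Mode is set to Record Expert Demos.
Expert Demo Save Path is filled in.
Action Repeat is set to the same value as the Sync node in training_scene and onnx_inference_scene, which means every action set by the agent is repeated for 3 physics frames. The AIController applies the same action repeat to human input so that the recorded behavior matches what the agent experiences; this introduces some input latency, but the value is low enough that the latency stays small. If you change this value, make sure to change it in all 3 places.
Remove Last Episode Key sets a key that deletes an episode that failed during recording, without restarting the whole session. For example, if the robot falls into the water and the game resets, you can press this key while recording the next episode to delete the previously recorded one. It is set to R, but you can change it to any key by clicking it and then clicking the Configure button.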
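Action repeat can be pictured as a schedule in which a new action is chosen only on every N-th physics frame and reused on the frames in between. The following is a minimal standalone sketch of that idea with a hypothetical applied_decisions() helper; it is not the AIController3D implementation, only an illustration of what an Action Repeat of 3 means.

```gdscript
extends SceneTree

# Standalone sketch (not the AIController3D implementation) of action repeat.


# Hypothetical helper: for each physics frame, return the index of the decision
# whose action is applied. A new decision is made only every `repeat` frames;
# the frames in between reuse the previous action.
static func applied_decisions(repeat: int, frames: int) -> Array[int]:
	var applied: Array[int] = []
	var decision := -1
	for frame in frames:
		if frame % repeat == 0:
			decision += 1  # a new action is requested on this frame
		applied.append(decision)
	return applied


func _init() -> void:
	print(applied_decisions(3, 7))  # [0, 0, 0, 1, 1, 1, 2]
	quit()
```

Assuming the project keeps Godot's default of 60 physics ticks per second, a repeat of 3 gives 20 decisions per second, which is why human input recorded with the same setting can feel slightly delayed.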

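Another way to make recording easier in challenging environments is to slow the environment down while recording. You can do this by clicking the Sync node in the scene and adjusting its Speed Up property (set to 1 by default).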

Let's record some demos:

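Note that the demos are saved only after you have recorded at least one complete episode and closed the game window by clicking "X" or pressing ALT+F4. Stopping the game with the Stop button in the Godot editor does not save the demos. It's a good idea to record a single episode first and then check that "expert_demos.json" appears in the FileSystem dock or in the Godot project folder.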
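If you prefer to check from a script rather than the FileSystem dock, a standalone sketch like the one below reports whether the demo file exists and parses as JSON. The path res://expert_demos.json is an assumption: replace it with whatever Expert Demo Save Path is set to in your AIController3D. The sketch does not inspect the file's internal structure, which this tutorial does not describe, and demo_file_status() is a hypothetical helper.

```gdscript
extends SceneTree

# Standalone sketch: check that a demo file was written and is valid JSON.
# Run from the project folder: godot --headless --path . --script check_demos.gd
# ASSUMPTION: the path below; use your own Expert Demo Save Path value.
const DEMO_PATH := "res://expert_demos.json"


# Hypothetical helper: returns "missing", "invalid_json" or "ok"
static func demo_file_status(path: String) -> String:
	if not FileAccess.file_exists(path):
		return "missing"
	var text := FileAccess.get_file_as_string(path)
	if JSON.parse_string(text) == null:
		return "invalid_json"
	return "ok"


func _init() -> void:
	print(DEMO_PATH, ": ", demo_file_status(DEMO_PATH))
	quit()
```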

Make sure you are still in demo_record_scene, then press F6 to start recording.

Controls:

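the mouse controls the camera (to adjust mouse sensitivity, open the robot scene, click the Robot node, and change Rotation Speed; keep the same value for demo recording, training, and inference),
WASD moves the player,
SPACE jumps,
E activates the lever and opens the chest.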

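You may want to practice a bit first to get familiar with the environment. If you'd rather skip recording, pre-recorded demos are included in the Complete project; you can use its expert_demos.json file instead.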

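The recorded demos should contain at least 22-24 complete, successful episodes. Multiple demo files can be used during training, so you don't have to record everything in one session (you can change the file name with the Expert Demo Save Path property mentioned earlier).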

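Recording 23 episodes took me about 10 minutes. Because the key alternates between 2 spawn positions, 22 or 24 episodes would give the demos an even distribution of key positions, but 23 is very close. When approaching the lever or the chest, I pressed and held E a little longer to make sure the action was recorded for several steps near those objects. I also deleted a few episodes that I did not complete successfully by pressing R during the following episode.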
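The remark about 22 or 24 episodes is simple parity arithmetic. The sketch below assumes the key strictly alternates between its 2 spawn points from one kept episode to the next (deleting episodes with R may break that assumption); spawn_counts() is a hypothetical helper.

```gdscript
extends SceneTree

# Standalone sketch: how many recorded episodes use each key spawn point,
# ASSUMING the spawn point alternates strictly between 2 positions.


# Hypothetical helper: counts[0] / counts[1] = episodes using spawn point 0 / 1
static func spawn_counts(episodes: int) -> Array[int]:
	var counts: Array[int] = [0, 0]
	for episode in episodes:
		counts[episode % 2] += 1
	return counts


func _init() -> void:
	# 22 -> [11, 11], 23 -> [12, 11], 24 -> [12, 12]
	for n in [22, 23, 24]:
		print(n, " episodes -> ", spawn_counts(n))
	quit()
```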

Here is a sped-up video of the demo recording process:

Exporting the game for training:

You can export the game from Godot using Project > Export.


Deep RL Course