Note

This example is compatible with Gymnasium version 1.2.0.

Load custom quadruped robot environments

In this tutorial you will create a MuJoCo quadruped walking environment from a model file (ending in .xml), without writing a new environment class.

Steps

  1. Get your robot's MJCF (or URDF) model file.
    • Create your own model (see the MuJoCo guide), or,

    • Find a ready-made one (in this tutorial, we will use a model from the MuJoCo Menagerie collection).

  2. Load the model with the xml_file argument.

  3. Tweak the environment parameters to get the desired behavior.
    1. Tweak the environment simulation parameters.

    2. Tweak the environment termination parameters.

    3. Tweak the environment reward parameters.

    4. Tweak the environment observation parameters.

  4. Train an agent to move your robot.

# The reader is expected to be familiar with the `Gymnasium` API & library, the basics of robotics,
# and the included `Gymnasium/MuJoCo` environments with the robot model they use.
# Familiarity with the **MJCF** file model format and the `MuJoCo` simulator is not required but is recommended.

Setup

We require gymnasium>=1.0.0.

import numpy as np

import gymnasium as gym


# Make sure Gymnasium is properly installed
# You can run this in your terminal:
# pip install "gymnasium>=1.0.0"

Step 0.1 - Download the robot model

In this tutorial we will load the Unitree Go1 robot from the excellent MuJoCo Menagerie robot model collection. Go1 is a quadruped robot, and controlling it to move is a significant learning problem, much harder than the Gymnasium/MuJoCo/Ant environment.

Note: The original tutorial includes an image of the Unitree Go1 robot in a flat-terrain scene. You can view the image at the following link: https://github.com/google-deepmind/mujoco_menagerie/blob/main/unitree_go1/go1.png?raw=true

# You can download the whole MuJoCo Menagerie collection (which includes `Go1`):
# git clone https://github.com/google-deepmind/mujoco_menagerie.git

# You can use any other quadruped robot with this tutorial, just adjust the environment parameter values for your robot.
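
# A minimal sanity check (not part of the original tutorial) that the scene file is where
# the rest of this tutorial expects it, assuming the repository was cloned into the current
# working directory as in the git command above:
from pathlib import Path

scene_path = Path("./mujoco_menagerie/unitree_go1/scene.xml")
if not scene_path.exists():
    raise FileNotFoundError(
        f"{scene_path} not found - clone mujoco_menagerie first (see the git command above)."
    )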

Step 1 - Load the model

To load the model, we simply pass the xml_file argument to the Ant-v5 framework.

# Basic loading (uncomment to use)
# env = gym.make('Ant-v5', xml_file='./mujoco_menagerie/unitree_go1/scene.xml')

# Although this is enough to load the model, we will need to tweak some environment parameters
# to get the desired behavior for our environment, so we will also explicitly set the simulation,
# termination, reward and observation arguments, which we will tweak in the next step.

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0,
    frame_skip=1,
    max_episode_steps=1000,
)
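
# As a quick sanity check (not part of the original tutorial), you can print the space shapes
# to confirm that the Go1 model was loaded instead of the default Ant model:
print("Observation space:", env.observation_space.shape)
print("Action space:", env.action_space.shape)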

Step 2 - Tweak the environment parameters

Tweaking the environment parameters is essential to obtain the desired learning behavior. In the following subsections the reader is encouraged to consult the documentation of the arguments for more detailed information.

Step 2.1 - Tweak the environment simulation parameters

The arguments of interest are `frame_skip`, `reset_noise_scale` and `max_episode_steps`.

# We want to tweak the `frame_skip` parameter to get `dt` to an acceptable value
# (typical values are `dt` ∈ [0.01, 0.1] seconds),

# Reminder: dt = frame_skip × model.opt.timestep, where `model.opt.timestep` is the integrator
# time step selected in the MJCF model file.

# The `Go1` model we are using has an integrator timestep of `0.002`, so by selecting
# `frame_skip=25` we can set the value of `dt` to `0.05s`.

# To avoid overfitting the policy, `reset_noise_scale` should be set to a value appropriate
# to the size of the robot; we want the value to be as large as possible without the initial
# distribution of states being invalid (`Terminal` regardless of control actions).
# For `Go1` we choose a value of `0.1`.

# And `max_episode_steps` determines the number of steps per episode before `truncation`;
# here we set it to 1000 to be consistent with the standard `Gymnasium/MuJoCo` environments,
# but you can set it higher if you need longer episodes.

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,  # set to avoid policy overfitting
    frame_skip=25,  # set dt=0.05
    max_episode_steps=1000,  # kept at 1000
)
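
# A quick check (not part of the original tutorial): the underlying MuJoCo environment exposes
# the integrator timestep and the resulting control timestep `dt` (dt = frame_skip * model.opt.timestep),
# so we can verify the values discussed above.
print("integrator timestep:", env.unwrapped.model.opt.timestep)  # 0.002 for the Go1 scene
print("dt:", env.unwrapped.dt)  # 0.002 * 25 = 0.05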

Step 2.2 - Tweak the environment termination parameters

Termination is important for robot environments to avoid sampling "useless" time steps.

# The arguments of interest are `terminate_when_unhealthy` and `healthy_z_range`.

# We want to set `healthy_z_range` to terminate the environment when the robot falls over
# or jumps really high; here we have to choose a value that is logical for the height of the robot.
# For `Go1` we choose `(0.195, 0.75)`.
# Note: `healthy_z_range` checks the absolute height of the robot,
# so if your scene contains different levels of elevation it should be set to `(-np.inf, np.inf)`.

# We could also set `terminate_when_unhealthy=False` to disable termination altogether,
# which is not desirable in the case of `Go1`.

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(
        0.195,
        0.75,
    ),  # set to avoid sampling steps where the robot has fallen or jumped too high
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)

# Note: If you need a different termination condition, you can write your own `TerminationWrapper`
# (see the documentation).
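
# Gymnasium does not ship a dedicated `TerminationWrapper` class, so a minimal sketch of a custom
# termination condition subclasses `gym.Wrapper` instead. The threshold below is only an example
# value mirroring the lower bound of `healthy_z_range`, and indexing `obs[2]` assumes
# `exclude_current_positions_from_observation=False` so the torso height is the third observation element.
class HeightTermination(gym.Wrapper):
    """Illustrative sketch of a custom termination condition via `gym.Wrapper`."""

    def __init__(self, env, min_height=0.195):  # example threshold, not a tuned value
        super().__init__(env)
        self.min_height = min_height

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # obs[2] is the torso z position when the full qpos is included in the observation
        if obs[2] < self.min_height:
            terminated = True
        return obs, reward, terminated, truncated, info

# Usage sketch: env = HeightTermination(env)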

Step 2.3 - Tweak the environment reward parameters

The arguments of interest are `forward_reward_weight`, `ctrl_cost_weight`, `contact_cost_weight`, `healthy_reward` and `main_body`.

# For the arguments `forward_reward_weight`, `ctrl_cost_weight`, `contact_cost_weight` and `healthy_reward`
# we have to pick values that make sense for our robot; you can use the default `MuJoCo/Ant`
# parameters for reference and tweak them if a change is needed for your environment.
# In the case of `Go1` we only change the `ctrl_cost_weight` since it has a higher actuator force range.

# For the argument `main_body` we have to choose which body part is the main body
# (usually called something like "torso" or "trunk" in the model file) for the calculation
# of the `forward_reward`, in the case of `Go1` it is the `"trunk"`
# (Note: in most cases including this one, it can be left at the default value).

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=1,  # kept the same as the 'Ant' environment
    ctrl_cost_weight=0.05,  # changed because of the stronger motors of `Go1`
    contact_cost_weight=5e-4,  # kept the same as the 'Ant' environment
    healthy_reward=1,  # kept the same as the 'Ant' environment
    main_body=1,  # represents the "trunk" of the `Go1` robot
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)

# Note: If you need a different reward function, you can write your own `RewardWrapper`
# (see the documentation).
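
# As a minimal sketch of such a custom reward, Gymnasium's `gym.RewardWrapper` only requires
# overriding `reward()`; the scaling factor below is an arbitrary example value, not a tuned setting.
class ScaledReward(gym.RewardWrapper):
    """Illustrative sketch: rescale the reward returned by the wrapped environment."""

    def __init__(self, env, scale=0.1):  # example factor, not a tuned value
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return self.scale * reward

# Usage sketch: env = ScaledReward(env)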

Step 2.4 - Tweak the environment observation parameters

The arguments of interest are `include_cfrc_ext_in_observation` and `exclude_current_positions_from_observation`.

# Here for `Go1` we have no particular reason to change them.

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=1,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1,
    main_body=1,
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,  # kept the same as the 'Ant' environment
    exclude_current_positions_from_observation=False,  # kept the same as the 'Ant' environment
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)


# Note: If you need additional observation elements (such as additional sensors),
# you can write your own `ObservationWrapper` (see the documentation).
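
# As a minimal sketch of how an extra observation element could be appended with
# `gym.ObservationWrapper`: the appended value here is just the simulation time as a stand-in for a
# real sensor reading (a real use case would read e.g. `env.unwrapped.data.sensordata`), and the
# space bounds below are illustrative.
from gymnasium.spaces import Box

class ExtraObservation(gym.ObservationWrapper):
    """Illustrative sketch: append one extra element to the observation vector."""

    def __init__(self, env):
        super().__init__(env)
        low = np.concatenate([env.observation_space.low, [0.0]])
        high = np.concatenate([env.observation_space.high, [np.inf]])
        self.observation_space = Box(low=low, high=high, dtype=np.float64)

    def observation(self, observation):
        sim_time = self.env.unwrapped.data.time  # stand-in for a real sensor value
        return np.concatenate([observation, [sim_time]])

# Usage sketch: env = ExtraObservation(env)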

Step 3 - Train your agent

Finally, we are done, and we can use an RL algorithm to train an agent to make the Go1 robot walk/run. Note: If you have followed this guide with your own robot model, you may find during training that some environment parameter values are not what you want; feel free to go back to step 2 and change whatever is needed.

def main():
    """Run the final Go1 environment setup."""
    # Note: The original tutorial includes an image showing the Go1 robot in the environment.
    # The image is available at: https://github.com/Kallinteris-Andreas/Gymnasium-kalli/assets/30759571/bf1797a3-264d-47de-b14c-e3c16072f695

    env = gym.make(
        "Ant-v5",
        xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
        forward_reward_weight=1,
        ctrl_cost_weight=0.05,
        contact_cost_weight=5e-4,
        healthy_reward=1,
        main_body=1,
        healthy_z_range=(0.195, 0.75),
        include_cfrc_ext_in_observation=True,
        exclude_current_positions_from_observation=False,
        reset_noise_scale=0.1,
        frame_skip=25,
        max_episode_steps=1000,
        render_mode="rgb_array",  # Change to "human" to visualize
    )

    # Example of running the environment for a few steps
    obs, info = env.reset()

    for _ in range(100):
        action = env.action_space.sample()  # Replace with your agent's action
        obs, reward, terminated, truncated, info = env.step(action)

        if terminated or truncated:
            obs, info = env.reset()

    env.close()
    print("Environment tested successfully!")

    # Now you would typically:
    # 1. Set up your RL algorithm
    # 2. Train the agent
    # 3. Evaluate the agent's performance
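

if __name__ == "__main__":
    main()


# For the training itself (points 1-3 listed above), a rough sketch using the third-party
# `stable-baselines3` library could look like the following; it is not a requirement of this
# tutorial, and the algorithm choice and hyperparameters are placeholders (uncomment to use):
# from stable_baselines3 import PPO
#
# train_env = gym.make("Ant-v5", xml_file="./mujoco_menagerie/unitree_go1/scene.xml")  # plus the arguments chosen above
# model = PPO("MlpPolicy", train_env, verbose=1)
# model.learn(total_timesteps=1_000_000)
# model.save("go1_ppo")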

Epilogue

You can follow this guide to create most quadruped environments. To create humanoid/bipedal robots, you can also follow this guide using the Gymnasium/MuJoCo/Humanoid-v5 framework.

Note: The original tutorial includes a video demonstration of the trained Go1 robot walking; according to the manufacturer's data, the robot has a top speed of 4.7 m/s. In the original tutorial, the video is embedded from: https://odysee.com/$/embed/@Kallinteris-Andreas:7/video0-step-0-to-step-1000:1?r=6fn5jA9uZQUZXGKVpwtqjz1eyJcS3hj3

# Author: @kallinteris-andreas (https://github.com/Kallinteris-Andreas)