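Wrappers
- class gymnasium.vector.VectorWrapper(env: VectorEnv)
Wraps the vectorized environment to allow a modular transformation.
This class is the base class for all wrappers for vectorized environments. A subclass can override some methods to change the behavior of the original vectorized environment without touching the original code.
Note
If the subclass overrides __init__(), remember to call super().__init__(env).
- Parameters:
env – The environment to wrap
- step(actions: ActType) -> tuple[ObsType, ArrayType, ArrayType, ArrayType, dict[str, Any]]
Steps through all environments using the actions and returns the batched data.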
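- class gymnasium.vector.VectorObservationWrapper(env: VectorEnv)
Wraps the vectorized environment to allow a modular transformation of the observation.
Equivalent to gymnasium.ObservationWrapper for vectorized environments.
- Parameters:
env – The vector environment to wrap
- class gymnasium.vector.VectorActionWrapper(env: VectorEnv)
Wraps the vectorized environment to allow a modular transformation of the actions.
Equivalent to gymnasium.ActionWrapper for vectorized environments.
- Parameters:
env – The environment to wrap
- class gymnasium.vector.VectorRewardWrapper(env: VectorEnv)
Wraps the vectorized environment to allow a modular transformation of the reward.
Equivalent to gymnasium.RewardWrapper for vectorized environments.
- Parameters:
env – The environment to wrap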
Vector-only wrappers
- class gymnasium.wrappers.vector.DictInfoToList(env: VectorEnv)
Converts the infos of a vectorized environment from dict to List[dict].
This wrapper converts the info format of a vector environment from a dictionary to a list of dictionaries. It is intended to be used around vectorized environments. If other wrappers that operate on info are used, such as RecordEpisodeStatistics, this wrapper needs to be the outermost one,
i.e. DictInfoToList(RecordEpisodeStatistics(vector_env))
Example
>>> import numpy as np
>>> dict_info = {
...     "k": np.array([0., 0., 0.5, 0.3]),
...     "_k": np.array([False, False, True, True])
... }
>>> list_info = [{}, {}, {"k": 0.5}, {"k": 0.3}]
- Example for vector environments
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import DictInfoToList
>>> envs = gym.make_vec("CartPole-v1", num_envs=3)
>>> obs, info = envs.reset(seed=123)
>>> info
{}
>>> envs = DictInfoToList(envs)
>>> obs, info = envs.reset(seed=123)
>>> info
[{}, {}, {}]
- Another example for vector environments
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import DictInfoToList
>>> envs = gym.make_vec("HalfCheetah-v4", num_envs=2)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> _, _, _, _, infos = envs.step(envs.action_space.sample())
>>> infos
{'x_position': array([0.03332211, 0.10172355]), '_x_position': array([ True, True]), 'x_velocity': array([-0.06296527, 0.89345848]), '_x_velocity': array([ True, True]), 'reward_run': array([-0.06296527, 0.89345848]), '_reward_run': array([ True, True]), 'reward_ctrl': array([-0.24503504, -0.21944423], dtype=float32), '_reward_ctrl': array([ True, True])}
>>> envs = DictInfoToList(envs)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> _, _, _, _, infos = envs.step(envs.action_space.sample())
>>> infos
[{'x_position': np.float64(0.0333221090036294), 'x_velocity': np.float64(-0.06296527291998574), 'reward_run': np.float64(-0.06296527291998574), 'reward_ctrl': np.float32(-0.24503504)}, {'x_position': np.float64(0.10172354684460168), 'x_velocity': np.float64(0.8934584807363618), 'reward_run': np.float64(0.8934584807363618), 'reward_ctrl': np.float32(-0.21944423)}]
- Change logs
v0.24.0 - Initially added as VectorListInfo
v1.0.0 - Renamed to DictInfoToList
- Parameters:
env (Env) – The environment to apply the wrapper to
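The dict-to-list conversion described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `dict_info_to_list` is a hypothetical helper, it uses lists instead of NumPy arrays, and it does not handle nested info dictionaries.

```python
from typing import Any


def dict_info_to_list(info: dict[str, Any], num_envs: int) -> list[dict[str, Any]]:
    """Split a batched info dict into one dict per sub-environment.

    Hypothetical helper: keys starting with "_" are treated as boolean masks
    that say which sub-environments actually reported the matching key.
    """
    list_info: list[dict[str, Any]] = [{} for _ in range(num_envs)]
    for key, values in info.items():
        if key.startswith("_"):
            continue  # masks are only read below, never copied
        mask = info.get(f"_{key}", [True] * num_envs)
        for i in range(num_envs):
            if mask[i]:
                list_info[i][key] = values[i]
    return list_info


dict_info = {
    "k": [0.0, 0.0, 0.5, 0.3],
    "_k": [False, False, True, True],
}
list_info = dict_info_to_list(dict_info, num_envs=4)
print(list_info)  # [{}, {}, {'k': 0.5}, {'k': 0.3}]
```

Sub-environments whose mask entry is False end up with an empty dict, which is why the wrapped CartPole reset returns `[{}, {}, {}]`.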
- class gymnasium.wrappers.vector.VectorizeTransformObservation(env: VectorEnv, wrapper: type[TransformObservation], **kwargs: Any)
Vectorizes a single-agent transform observation wrapper for vector environments.
Most of the lambda observation wrappers for single-agent environments have vectorized implementations, and users are advised to import and use those directly from gymnasium.wrappers.vector…. The following examples illustrate a use case where a custom lambda observation wrapper is required.
- Example - The normal observation
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531, 0.02858594, 0.0469136 , 0.02480598],
       [ 0.03517495, -0.000635 , -0.01098382, -0.03203924]], dtype=float32)
- Example - Applying a custom lambda observation wrapper that duplicates the observation from the environment
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers import TransformObservation
>>> from gymnasium.wrappers.vector import VectorizeTransformObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> old_space = envs.single_observation_space
>>> new_space = Box(low=np.array([old_space.low, old_space.low]), high=np.array([old_space.high, old_space.high]))
>>> envs = VectorizeTransformObservation(envs, wrapper=TransformObservation, func=lambda x: np.array([x, x]), observation_space=new_space)
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
array([[[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
        [ 0.01823519, -0.0446179 , -0.02796401, -0.03156282]],
       [[ 0.02852531, 0.02858594, 0.0469136 , 0.02480598],
        [ 0.02852531, 0.02858594, 0.0469136 , 0.02480598]],
       [[ 0.03517495, -0.000635 , -0.01098382, -0.03203924],
        [ 0.03517495, -0.000635 , -0.01098382, -0.03203924]]], dtype=float32)
- Parameters:
env – The vector environment to wrap
wrapper – The wrapper to vectorize
**kwargs – Keyword arguments for the wrapper
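Conceptually, vectorizing a single-observation transform means applying it to each sub-environment's observation and re-batching the results. The sketch below shows that idea with plain lists; `apply_per_env` and `duplicate` are hypothetical names, and the real wrapper works on Gymnasium spaces rather than lists.

```python
from typing import Any, Callable


def apply_per_env(batch: list[Any], single_func: Callable[[Any], Any]) -> list[Any]:
    """Apply a single-observation function to every entry of a batch."""
    return [single_func(obs) for obs in batch]


def duplicate(obs: list[float]) -> list[list[float]]:
    """Single-observation transform: stack the observation with itself."""
    return [obs, obs]


# Three sub-environments, each with a 2-element observation (made-up values).
batch = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
stacked = apply_per_env(batch, duplicate)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 3 2 2
```

The batch axis stays first, matching the `(3, 2, 4)` result in the CartPole example, where each of the three observations is duplicated.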
- class gymnasium.wrappers.vector.VectorizeTransformAction(env: VectorEnv, wrapper: type[TransformAction], **kwargs: Any)
Vectorizes a single-agent transform action wrapper for vector environments.
- Example - Without action transformation
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01, 9.8971417e-05],
       [-4.4488689e-01, -1.9375233e-03],
       [-4.3118435e-01, -1.5342437e-03]], dtype=float32)
- Example - Adding a transform that applies a ReLU to the action
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformAction
>>> from gymnasium.wrappers.vector import VectorizeTransformAction
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = VectorizeTransformAction(envs, wrapper=TransformAction, func=lambda x: (x > 0.0) * x, action_space=envs.single_action_space)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01, 9.8971417e-05],
       [-4.4354835e-01, -5.9898634e-04],
       [-4.3034542e-01, -6.9532328e-04]], dtype=float32)
- Parameters:
env – The vector environment to wrap
wrapper – The wrapper to vectorize
**kwargs – Keyword arguments for the wrapper
- class gymnasium.wrappers.vector.VectorizeTransformReward(env: VectorEnv, wrapper: type[TransformReward], **kwargs: Any)
Vectorizes a single-agent transform reward wrapper for vector environments.
- An example that applies a ReLU to the reward
>>> import gymnasium as gym
>>> from gymnasium.wrappers import TransformReward
>>> from gymnasium.wrappers.vector import VectorizeTransformReward
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = VectorizeTransformReward(envs, wrapper=TransformReward, func=lambda x: (x > 0.0) * x)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> rew
array([-0., -0., -0.])
- Parameters:
env – The vector environment to wrap
wrapper – The wrapper to vectorize
**kwargs – Keyword arguments for the wrapper
Vectorized common wrappers
- class gymnasium.wrappers.vector.RecordEpisodeStatistics(env: VectorEnv, buffer_length: int = 100, stats_key: str = 'episode')
This wrapper keeps track of cumulative rewards and episode lengths.
At the end of any episode within the vectorized environment, the statistics of that episode are added to info under the key episode, and the _episode key indicates the indices of the environments whose episode terminated or was truncated.
>>> infos = {
...     ...
...     "episode": {
...         "r": "<array of cumulative reward for each done sub-environment>",
...         "l": "<array of episode length for each done sub-environment>",
...         "t": "<array of elapsed time since beginning of episode for each done sub-environment>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
Moreover, the most recent returns and episode lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue respectively.
- Variables:
return_queue – The cumulative rewards of the last buffer_length episodes
length_queue – The lengths of the last buffer_length episodes
Example
>>> from pprint import pprint
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import RecordEpisodeStatistics
>>> envs = gym.make_vec("CartPole-v1", num_envs=3)
>>> envs = RecordEpisodeStatistics(envs)
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> end = False
>>> while not end:
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...     end = term.any() or trunc.any()
...
>>> envs.close()
>>> pprint(info)
{'_episode': array([ True, False, False]),
 '_final_info': array([ True, False, False]),
 '_final_observation': array([ True, False, False]),
 'episode': {'l': array([11, 0, 0], dtype=int32),
             'r': array([11., 0., 0.], dtype=float32),
             't': array([0.007812, 0. , 0. ], dtype=float32)},
 'final_info': array([{}, None, None], dtype=object),
 'final_observation': array([array([ 0.11448676, 0.9416149 , -0.20946532, -1.7619033 ], dtype=float32), None, None], dtype=object)}
- Parameters:
env (Env) – The environment to apply the wrapper to
buffer_length – The size of the buffers return_queue, length_queue and time_queue
stats_key – The info key under which the data is stored
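The bookkeeping described above (per-environment return and length counters, plus bounded buffers of finished episodes) can be sketched as follows. This is an illustrative stand-in, not the wrapper's code: `EpisodeTracker` is a hypothetical class, it omits elapsed time, and it resets counters immediately when an episode ends, which may differ from how the wrapper interacts with autoreset.

```python
from collections import deque


class EpisodeTracker:
    """Hypothetical tracker of episode returns and lengths for several sub-environments."""

    def __init__(self, num_envs: int, buffer_length: int = 100):
        self.returns = [0.0] * num_envs
        self.lengths = [0] * num_envs
        # Bounded buffers: the oldest episodes are dropped once buffer_length is reached.
        self.return_queue = deque(maxlen=buffer_length)
        self.length_queue = deque(maxlen=buffer_length)

    def update(self, rewards, terminations, truncations):
        """Accumulate one step and return a mask of sub-environments that finished."""
        done_mask = []
        for i, (reward, term, trunc) in enumerate(zip(rewards, terminations, truncations)):
            self.returns[i] += reward
            self.lengths[i] += 1
            done = bool(term or trunc)
            done_mask.append(done)
            if done:
                self.return_queue.append(self.returns[i])
                self.length_queue.append(self.lengths[i])
                self.returns[i], self.lengths[i] = 0.0, 0
        return done_mask


tracker = EpisodeTracker(num_envs=2, buffer_length=3)
tracker.update([1.0, 1.0], [False, False], [False, False])
mask = tracker.update([1.0, 1.0], [True, False], [False, False])
print(mask, list(tracker.return_queue), list(tracker.length_queue))
# [True, False] [2.0] [2]
```

The returned mask plays the role of the `_episode` key: only entries where it is True carry meaningful episode statistics.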
Implemented observation wrappers
- class gymnasium.wrappers.vector.TransformObservation(env: VectorEnv, func: Callable[[ObsType], Any], observation_space: Space | None = None, single_observation_space: Space | None = None)
Transforms an observation via a function provided to the wrapper.
This wrapper allows the vector-observation function to be specified manually, along with the single-observation space. This is desirable when, for example, the vector observations can be processed in parallel or by some other more optimized method. Otherwise, VectorizeTransformObservation should be used instead, where only the single-observation function needs to be defined.
- Example - Without observation transformation
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531, 0.02858594, 0.0469136 , 0.02480598],
       [ 0.03517495, -0.000635 , -0.01098382, -0.03203924]], dtype=float32)
>>> envs.close()
- Example - With observation transformation
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers.vector import TransformObservation
>>> def scale_and_shift(obs):
...     return (obs - 1.0) * 2.0
...
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> new_obs_space = Box(low=envs.observation_space.low, high=envs.observation_space.high)
>>> envs = TransformObservation(envs, func=scale_and_shift, observation_space=new_obs_space)
>>> obs, info = envs.reset(seed=123)
>>> obs
array([[-1.9635296, -2.0892358, -2.055928 , -2.0631256],
       [-1.9429494, -1.9428282, -1.9061728, -1.9503881],
       [-1.9296501, -2.00127 , -2.0219676, -2.0640786]], dtype=float32)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
func – A function that transforms the vector observation. If the transformed observation falls outside env.observation_space, provide an observation_space.
observation_space – The observation space of the wrapper. If None, it is computed from single_observation_space. If single_observation_space is not provided either, it is assumed to be the same as env.observation_space.
single_observation_space – The observation space of the non-vectorized environment. If None, it is assumed to be the same as env.single_observation_space.
- class gymnasium.wrappers.vector.FilterObservation(env: VectorEnv, filter_keys: Sequence[str | int])
Vector wrapper for filtering dict or tuple observation spaces.
- Example - Create a vectorized environment with a Dict space to demonstrate how to filter keys
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.spaces import Dict, Box
>>> from gymnasium.wrappers import TransformObservation
>>> from gymnasium.wrappers.vector import VectorizeTransformObservation, FilterObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> make_dict = lambda x: {"obs": x, "junk": np.array([0.0])}
>>> new_space = Dict({"obs": envs.single_observation_space, "junk": Box(low=-1.0, high=1.0)})
>>> envs = VectorizeTransformObservation(env=envs, wrapper=TransformObservation, func=make_dict, observation_space=new_space)
>>> envs = FilterObservation(envs, ["obs"])
>>> obs, info = envs.reset(seed=123)
>>> envs.close()
>>> obs
{'obs': array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282],
       [ 0.02852531, 0.02858594, 0.0469136 , 0.02480598],
       [ 0.03517495, -0.000635 , -0.01098382, -0.03203924]], dtype=float32)}
- Parameters:
env – The vector environment to wrap
filter_keys – The subspaces to keep: a list of strings for Dict spaces or a list of integers for Tuple spaces
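The effect of filter_keys on a single observation can be illustrated with ordinary containers. `filter_observation` below is a hypothetical helper written for this sketch; the actual wrapper also filters the observation space, which is not shown here.

```python
def filter_observation(obs, filter_keys):
    """Keep only the requested entries of a dict or tuple observation."""
    if isinstance(obs, dict):
        # Dict observations are filtered by string key.
        return {key: obs[key] for key in filter_keys}
    # Tuple observations are filtered by integer index.
    return tuple(obs[index] for index in filter_keys)


dict_obs = {"obs": [0.018, -0.044], "junk": [0.0]}
tuple_obs = ([0.018, -0.044], [0.0], "extra")

kept_dict = filter_observation(dict_obs, ["obs"])
kept_tuple = filter_observation(tuple_obs, [0, 2])
print(kept_dict)   # {'obs': [0.018, -0.044]}
print(kept_tuple)  # ([0.018, -0.044], 'extra')
```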
- class gymnasium.wrappers.vector.FlattenObservation(env: VectorEnv)
Observation wrapper that flattens the observation.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import FlattenObservation
>>> envs = gym.make_vec("CarRacing-v3", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = FlattenObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 27648)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
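The flattened width in the example is simply the product of the per-environment observation dimensions, while the leading batch axis is preserved. A quick check of that arithmetic, using the shapes from the example:

```python
import math

single_shape = (96, 96, 3)  # one CarRacing frame: height, width, channels
num_envs = 3

flat_size = math.prod(single_shape)   # number of elements in one observation
batched_shape = (num_envs, flat_size)  # the batch axis is kept in front
print(batched_shape)  # (3, 27648)
```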
- class gymnasium.wrappers.vector.GrayscaleObservation(env: VectorEnv, keep_dim: bool = False)
Observation wrapper that converts an RGB image to grayscale.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import GrayscaleObservation
>>> envs = gym.make_vec("CarRacing-v3", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = GrayscaleObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
keep_dim – Whether to keep the channel dimension in the observation. If True, a single (unbatched) observation has len(obs.shape) == 3, otherwise len(obs.shape) == 2
- class gymnasium.wrappers.vector.ResizeObservation(env: VectorEnv, shape: tuple[int, ...])
Resizes image observations to a given shape using OpenCV.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ResizeObservation
>>> envs = gym.make_vec("CarRacing-v3", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = ResizeObservation(envs, shape=(28, 28))
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 28, 28, 3)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
shape – The resized observation shape
- class gymnasium.wrappers.vector.ReshapeObservation(env: VectorEnv, shape: int | tuple[int, ...])
Reshapes array-based observations to a given shape.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ReshapeObservation
>>> envs = gym.make_vec("CarRacing-v3", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 96, 96, 3)
>>> envs = ReshapeObservation(envs, shape=(9216, 3))
>>> obs, info = envs.reset(seed=123)
>>> obs.shape
(3, 9216, 3)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
shape – The shape of a single observation after reshaping
- class gymnasium.wrappers.vector.RescaleObservation(env: VectorEnv, min_obs: floating | integer | ndarray, max_obs: floating | integer | ndarray)
Linearly rescales the observation to lie between a minimum and a maximum value.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import RescaleObservation
>>> envs = gym.make_vec("MountainCar-v0", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.min()
np.float32(-0.46352962)
>>> obs.max()
np.float32(0.0)
>>> envs = RescaleObservation(envs, min_obs=-5.0, max_obs=5.0)
>>> obs, info = envs.reset(seed=123)
>>> obs.min()
np.float32(-0.90849805)
>>> obs.max()
np.float32(0.0)
>>> envs.close()
- Parameters:
env – The vector environment to wrap
min_obs – The new minimum observation bound
max_obs – The new maximum observation bound
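The numbers in the example follow from a plain linear map between the old and new bounds. The sketch below reproduces them by hand; `rescale` is a hypothetical helper, and the MountainCar-v0 bounds used here (position in [-1.2, 0.6], velocity in [-0.07, 0.07]) are stated as an assumption about that environment rather than read from the wrapper.

```python
def rescale(value: float, low: float, high: float, new_low: float, new_high: float) -> float:
    """Linearly map value from [low, high] to [new_low, new_high]."""
    scale = (new_high - new_low) / (high - low)
    return new_low + (value - low) * scale


# Assumed MountainCar-v0 observation bounds (position, velocity).
position = rescale(-0.46352962, low=-1.2, high=0.6, new_low=-5.0, new_high=5.0)
velocity = rescale(0.0, low=-0.07, high=0.07, new_low=-5.0, new_high=5.0)
print(position, velocity)
```

Under these assumed bounds, the position maps to roughly -0.9085 and a zero velocity stays at roughly zero, in line with the min and max shown in the example.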
- class gymnasium.wrappers.vector.DtypeObservation(env: VectorEnv, dtype: Any)
Observation wrapper that converts the data type of the observation.
Example
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import DtypeObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> obs.dtype
dtype('float32')
>>> envs = DtypeObservation(envs, dtype=np.float64)
>>> obs, info = envs.reset(seed=123)
>>> obs.dtype
dtype('float64')
>>> envs.close()
- Parameters:
env – The vector environment to wrap
dtype – The new data type of the observation
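- class gymnasium.wrappers.vector.NormalizeObservation(env: VectorEnv, epsilon: float = 1e-8)
This wrapper normalizes observations so that each coordinate is centered and has unit variance.
The property _update_running_mean allows freezing or resuming the running-mean calculation of the observation statistics. If True (default), the RunningMeanStd is updated on every step and reset call. If False, the statistics computed so far are used but no longer updated; this can be used during evaluation.
Note
The normalization depends on past trajectories, so observations will not be normalized correctly if the wrapper was newly instantiated or the policy changed recently.
- Example without the normalize observation wrapper
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> for _ in range(100):
...     obs, *_ = envs.step(envs.action_space.sample())
...
>>> np.mean(obs)
np.float32(0.024251968)
>>> np.std(obs)
np.float32(0.62259156)
>>> envs.close()
- Example with the normalize observation wrapper
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import NormalizeObservation
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> envs = NormalizeObservation(envs)
>>> obs, info = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> for _ in range(100):
...     obs, *_ = envs.step(envs.action_space.sample())
...
>>> np.mean(obs)
np.float32(-0.2359734)
>>> np.std(obs)
np.float32(1.1938739)
>>> envs.close()
- Parameters:
env (Env) – The environment to apply the wrapper to
epsilon – A stability parameter used when scaling the observations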
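A running mean and variance can be maintained incrementally from batches, and an observation is then normalized as (obs - mean) / sqrt(var + epsilon). The sketch below shows one common way to do this for a scalar coordinate; `RunningMeanVar` and `normalize` are hypothetical names, and the initialization details are an assumption that may differ from the library's RunningMeanStd.

```python
import math


class RunningMeanVar:
    """Hypothetical running mean/variance tracker updated one batch at a time."""

    def __init__(self):
        self.mean = 0.0
        self.var = 1.0
        self.count = 0

    def update(self, batch: list[float]) -> None:
        batch_count = len(batch)
        batch_mean = sum(batch) / batch_count
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / batch_count
        total = self.count + batch_count
        delta = batch_mean - self.mean
        # Merge the two sets of moments (parallel variance formula).
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean += delta * batch_count / total
        self.var = m2 / total
        self.count = total


def normalize(value: float, stats: RunningMeanVar, epsilon: float = 1e-8) -> float:
    """Center and scale a value with the current running statistics."""
    return (value - stats.mean) / math.sqrt(stats.var + epsilon)


stats = RunningMeanVar()
stats.update([1.0, 2.0, 3.0])
stats.update([4.0, 5.0, 6.0])
print(stats.mean, stats.var)   # mean 3.5, variance about 2.9167
print(normalize(3.5, stats))   # 0.0
```

Freezing the statistics for evaluation corresponds to calling `normalize` without calling `update`, which is the behaviour the _update_running_mean flag controls.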
Implemented action wrappers
- class gymnasium.wrappers.vector.TransformAction(env: VectorEnv, func: Callable[[ActType], Any], action_space: Space | None = None, single_action_space: Space | None = None)
Transforms an action via a function provided to the wrapper.
The function func is applied to all vector actions. If the actions produced by func fall outside the bounds of the env's action space, provide an action_space that specifies the action space of the vectorized environment.
- Example - Without action transformation
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...
>>> envs.close()
>>> obs
array([[-0.46553135, -0.00142543],
       [-0.498371 , -0.00715587],
       [-0.46515748, -0.00624371]], dtype=float32)
- Example - With action transformation
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers.vector import TransformAction
>>> def shrink_action(act):
...     return act * 0.3
...
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> new_action_space = Box(low=shrink_action(envs.action_space.low), high=shrink_action(envs.action_space.high))
>>> envs = TransformAction(env=envs, func=shrink_action, action_space=new_action_space)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
...
>>> envs.close()
>>> obs
array([[-0.48468155, -0.00372536],
       [-0.47599354, -0.00545912],
       [-0.46543318, -0.00615723]], dtype=float32)
- Parameters:
env – The vector environment to wrap
func – A function that transforms the action. If the transformed action falls outside env.action_space, provide an action_space.
action_space – The action space of the wrapper. If None, it is computed from single_action_space. If single_action_space is not provided either, it is assumed to be the same as env.action_space.
single_action_space – The action space of the non-vectorized environment. If None, it is assumed to be the same as env.single_action_space.
- class gymnasium.wrappers.vector.ClipAction(env: VectorEnv)
Clips continuous actions to the bounds of the valid Box action space.
- Example - Passing an out-of-bounds action to the environment to be clipped
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ClipAction
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = ClipAction(envs)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(np.array([5.0, -5.0, 2.0]))
>>> envs.close()
>>> obs
array([[-0.4624777 , 0.00105192],
       [-0.44504836, -0.00209899],
       [-0.42884544, 0.00080468]], dtype=float32)
- Parameters:
env – The vector environment to wrap
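Clipping replaces each out-of-bounds action with the nearest bound. The sketch below applies that to the actions from the example, assuming the MountainCarContinuous-v0 action bounds are [-1.0, 1.0]; `clip` is a hypothetical helper standing in for an array clip.

```python
def clip(value: float, low: float, high: float) -> float:
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


# Assumed action bounds for MountainCarContinuous-v0.
low, high = -1.0, 1.0
raw_actions = [5.0, -5.0, 2.0]
clipped = [clip(action, low, high) for action in raw_actions]
print(clipped)  # [1.0, -1.0, 1.0]
```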
- class gymnasium.wrappers.vector.RescaleAction(env: VectorEnv, min_action: float | int | ndarray, max_action: float | int | ndarray)
Affinely rescales the continuous action space of the environment to the range [min_action, max_action].
- Example - Without action scaling
>>> import numpy as np
>>> import gymnasium as gym
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> obs
array([[-0.44799727, 0.00266526],
       [-0.4351738 , 0.00133522],
       [-0.42683297, 0.00048403]], dtype=float32)
- Example - With action scaling
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import RescaleAction
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = RescaleAction(envs, 0.0, 1.0)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> obs
array([[-0.48657528, -0.00395268],
       [-0.47377947, -0.00529102],
       [-0.46546045, -0.00614867]], dtype=float32)
- Parameters:
env (Env) – The vector environment to wrap
min_action (float, int or np.ndarray) – The minimum value for each action. This may be a numpy array or a scalar.
max_action (float, int or np.ndarray) – The maximum value for each action. This may be a numpy array or a scalar.
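The two examples differ because the same agent action 0.5 lands on a different environment action once the range is rescaled. A minimal sketch of the affine map, assuming the underlying MountainCarContinuous-v0 action bounds are [-1.0, 1.0]; `rescale_action` is a hypothetical helper, not the wrapper's code:

```python
def rescale_action(action: float, min_action: float, max_action: float,
                   low: float, high: float) -> float:
    """Map an agent action in [min_action, max_action] to the env range [low, high]."""
    fraction = (action - min_action) / (max_action - min_action)
    return low + fraction * (high - low)


# With RescaleAction(envs, 0.0, 1.0), the agent acts in [0, 1].
env_action = rescale_action(0.5, min_action=0.0, max_action=1.0, low=-1.0, high=1.0)
print(env_action)  # 0.0
```

So without the wrapper the car is pushed with force 0.5 at every step, while with the wrapper the midpoint 0.5 corresponds to no push at all.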
Implemented reward wrappers
- class gymnasium.wrappers.vector.TransformReward(env: VectorEnv, func: Callable[[ArrayType], ArrayType])
A reward wrapper that lets a custom function modify the step reward.
- Example with reward transformation
>>> import gymnasium as gym
>>> from gymnasium.spaces import Box
>>> from gymnasium.wrappers.vector import TransformReward
>>> def scale_and_shift(rew):
...     return (rew - 1.0) * 2.0
...
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = TransformReward(env=envs, func=scale_and_shift)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> obs, rew, term, trunc, info = envs.step(envs.action_space.sample())
>>> envs.close()
>>> obs
array([[-4.6343064e-01, 9.8971417e-05],
       [-4.4488689e-01, -1.9375233e-03],
       [-4.3118435e-01, -1.5342437e-03]], dtype=float32)
- Parameters:
env (Env) – The vector environment to wrap
func – (Callable): The function to apply to the reward
- class gymnasium.wrappers.vector.ClipReward(env: VectorEnv, min_reward: float | ndarray | None = None, max_reward: float | ndarray | None = None)
A wrapper that clips the rewards of an environment between a lower and an upper bound.
- Example with clipped rewards
>>> import numpy as np
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ClipReward
>>> envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
>>> envs = ClipReward(envs, 0.0, 2.0)
>>> _ = envs.action_space.seed(123)
>>> obs, info = envs.reset(seed=123)
>>> for _ in range(10):
...     obs, rew, term, trunc, info = envs.step(0.5 * np.ones((3, 1)))
...
>>> envs.close()
>>> rew
array([0., 0., 0.])
- Parameters:
env – The vector environment to wrap
min_reward – The minimum reward for each step
max_reward – The maximum reward for each step
- class gymnasium.wrappers.vector.NormalizeReward(env: VectorEnv, gamma: float = 0.99, epsilon: float = 1e-8)
This wrapper scales rewards so that their exponential moving average has an approximately fixed variance.
The property _update_running_mean allows freezing or resuming the running-mean calculation of the reward statistics. If True (default), the RunningMeanStd is updated every time self.normalize() is called. If False, the statistics computed so far are used but no longer updated; this can be used during evaluation.
Note
The scaling depends on past trajectories, so rewards will not be scaled correctly if the wrapper was newly instantiated or the policy changed recently.
- Example without the normalize reward wrapper
>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("MountainCarContinuous-v0", 3)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> episode_rewards = []
>>> for _ in range(100):
...     observation, reward, *_ = envs.step(envs.action_space.sample())
...     episode_rewards.append(reward)
...
>>> envs.close()
>>> np.mean(episode_rewards)
np.float64(-0.03359492141887935)
>>> np.std(episode_rewards)
np.float64(0.029028230434438706)
- Example with the normalize reward wrapper
>>> import gymnasium as gym
>>> import numpy as np
>>> from gymnasium.wrappers.vector import NormalizeReward
>>> envs = gym.make_vec("MountainCarContinuous-v0", 3)
>>> envs = NormalizeReward(envs)
>>> _ = envs.reset(seed=123)
>>> _ = envs.action_space.seed(123)
>>> episode_rewards = []
>>> for _ in range(100):
...     observation, reward, *_ = envs.step(envs.action_space.sample())
...     episode_rewards.append(reward)
...
>>> envs.close()
>>> np.mean(episode_rewards)
np.float64(-0.1598639586606745)
>>> np.std(episode_rewards)
np.float64(0.27800309628058434)
- Parameters:
env (Env) – The environment to apply the wrapper to
epsilon (float) – A stability parameter
gamma (float) – The discount factor used in the exponential moving average
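One way to picture the scaling is: keep a discounted running return per environment, track the variance of that return, and divide each reward by its standard deviation. The single-environment sketch below follows that idea under stated assumptions; `RewardScaler` is a hypothetical class, it uses Welford's online variance, it leaves the very first reward unscaled, and its reset-on-termination ordering is a simplification rather than the wrapper's exact behaviour.

```python
import math


class RewardScaler:
    """Hypothetical single-environment reward scaler based on a discounted return."""

    def __init__(self, gamma: float = 0.99, epsilon: float = 1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        self.discounted_return = 0.0
        # Welford accumulators for the variance of the discounted return.
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def __call__(self, reward: float, terminated: bool = False) -> float:
        self.discounted_return = self.gamma * self.discounted_return + reward
        self.count += 1
        delta = self.discounted_return - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.discounted_return - self.mean)
        # Assumption: use unit variance until at least two samples are available.
        var = self.m2 / self.count if self.count > 1 else 1.0
        if terminated:
            self.discounted_return = 0.0  # start a fresh return for the next episode
        return reward / math.sqrt(var + self.epsilon)


scaler = RewardScaler(gamma=0.5)
first = scaler(1.0)
second = scaler(1.0)
print(first, second)
```

Only the scale of the reward changes; its sign is preserved because the divisor is always positive.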
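Implemented data conversion wrappers
- class gymnasium.wrappers.vector.ArrayConversion(env: VectorEnv, env_xp: ModuleType | str, target_xp: ModuleType | str, env_device: Any | None = None, target_device: Any | None = None)
Wraps a vector environment that returns Array API compatible arrays so that it can be interacted with through a specific framework.
Popular Array API frameworks include numpy, torch, jax.numpy, cupy, etc. With this wrapper, the outputs of the environment can be converted to any of these frameworks. Conversely, actions are automatically mapped back to the environment's framework, where possible without moving the data or transferring it between devices.
Note
A vectorized version of gymnasium.wrappers.ArrayConversion.
Example
>>> import numpy as np
>>> import jax.numpy as jnp
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import ArrayConversion
>>> envs = gym.make_vec("JaxEnv-vx", 3)
>>> envs = ArrayConversion(envs, env_xp=jnp, target_xp=np)
- Parameters:
env – The Array API compatible environment to wrap
env_xp – The Array API framework the environment is based on
target_xp – The Array API framework to convert to
env_device – The device the environment is on
target_device – The device on which arrays should be returned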
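- class gymnasium.wrappers.vector.JaxToNumpy(env: VectorEnv)
Wraps a jax vector environment so that it can be interacted with through numpy arrays.
Note
A vectorized version of gymnasium.wrappers.JaxToNumpy.
Actions must be provided as numpy arrays, and observations, rewards, terminations and truncations are returned as numpy arrays.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import JaxToNumpy
>>> envs = gym.make_vec("JaxEnv-vx", 3)
>>> envs = JaxToNumpy(envs)
- Parameters:
env – The vector jax environment to wrap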
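- class gymnasium.wrappers.vector.JaxToTorch(env: VectorEnv, device: str | device | None = None)
Wraps a Jax-based vector environment so that it can be interacted with through PyTorch tensors.
Actions must be provided as PyTorch tensors, and observations, rewards, terminations and truncations are returned as PyTorch tensors.
Example
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import JaxToTorch
>>> envs = gym.make_vec("JaxEnv-vx", 3)
>>> envs = JaxToTorch(envs)
- Parameters:
env – The Jax-based vector environment to wrap
device – The device the torch tensors should be moved to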
- class gymnasium.wrappers.vector.NumpyToTorch(env: VectorEnv, device: str | device | None = None)
Wraps a numpy-based vector environment so that it can be interacted with through PyTorch tensors.
Example
>>> import torch
>>> import gymnasium as gym
>>> from gymnasium.wrappers.vector import NumpyToTorch
>>> envs = gym.make_vec("CartPole-v1", 3)
>>> envs = NumpyToTorch(envs)
>>> obs, _ = envs.reset(seed=123)
>>> type(obs)
<class 'torch.Tensor'>
>>> action = torch.tensor(envs.action_space.sample())
>>> obs, reward, terminated, truncated, info = envs.step(action)
>>> envs.close()
>>> type(obs)
<class 'torch.Tensor'>
>>> type(reward)
<class 'torch.Tensor'>
>>> type(terminated)
<class 'torch.Tensor'>
>>> type(truncated)
<class 'torch.Tensor'>
- Parameters:
env – The NumPy-based vector environment to wrap
device – The device the torch tensors should be moved to