Learn From DI-zoo

What is DI-zoo

DI-zoo is a collection of reinforcement learning environments wrapped with DI-engine. It covers most commonly used reinforcement learning environments, from basic ones such as OpenAI Gym to more complex ones such as SMAC. In addition, for each environment, DI-zoo provides entry files for different algorithms together with their tuned configurations.

The structure of DI-zoo

For a given environment/policy pair, DI-zoo mainly provides two files for running an RL experiment in DI-engine: the config.py file, which contains the key configuration as well as the entry point of the experiment, and the env.py file, which contains the DI-engine wrapper of the environment.

Note

In addition, some environment/policy pairs also have a main.py entry file, which is a training pipeline kept from an earlier version of DI-engine.

Here we briefly show the structure of DI-zoo, taking the CartPole environment and the DQN algorithm as an example.

dizoo/
  classic_control/
    cartpole/
      config/cartpole_dqn_config.py # Config
      entry/cartpole_dqn_main.py  # Main
      envs/cartpole_env.py  # Env

How to use DI-zoo

You can directly execute the config.py file provided by DI-zoo to train a given environment/policy pair. For CartPole/DQN, you can launch the RL experiment with the following command:

python dizoo/classic_control/cartpole/config/cartpole_dqn_config.py

DI-engine also provides a CLI tool. You can check that it is installed correctly by typing the following command in your terminal:

ding -v

If the terminal prints the version information correctly, you can use this CLI tool for common training and evaluation, and you can type ding -h for further usage.

To train CartPole/DQN, you can directly type the following command in the terminal:

ding -m serial -c cartpole_dqn_config.py -s 0

where -m serial means the training pipeline to call is serial_pipeline, -c cartpole_dqn_config.py specifies cartpole_dqn_config.py as the config file, and -s 0 sets the random seed to 0.

Customization of DI-zoo

You can customize your training process or tune the performance of your RL experiment by changing the configuration in config.py.

Here we use cartpole_dqn_config.py as an example:

from easydict import EasyDict

cartpole_dqn_config = dict(
    exp_name='cartpole_dqn_seed0',
    env=dict(
        collector_env_num=8,
        evaluator_env_num=5,
        n_evaluator_episode=5,
        stop_value=195,
        replay_path='cartpole_dqn_seed0/video',
    ),
    policy=dict(
        cuda=False,
        load_path='cartpole_dqn_seed0/ckpt/ckpt_best.pth.tar',  # necessary for eval
        model=dict(
            obs_shape=4,
            action_shape=2,
            encoder_hidden_size_list=[128, 128, 64],
            dueling=True,
        ),
        nstep=1,
        discount_factor=0.97,
        learn=dict(
            batch_size=64,
            learning_rate=0.001,
        ),
        collect=dict(n_sample=8),
        eval=dict(evaluator=dict(eval_freq=40, )),
        other=dict(
            eps=dict(
                type='exp',
                start=0.95,
                end=0.1,
                decay=10000,
            ),
            replay_buffer=dict(replay_buffer_size=20000, ),
        ),
    ),
)
cartpole_dqn_config = EasyDict(cartpole_dqn_config)
main_config = cartpole_dqn_config
cartpole_dqn_create_config = dict(
    env=dict(
        type='cartpole',
        import_names=['dizoo.classic_control.cartpole.envs.cartpole_env'],
    ),
    env_manager=dict(type='base'),
    policy=dict(type='dqn'),
    replay_buffer=dict(
        type='deque',
        import_names=['ding.data.buffer.deque_buffer_wrapper']
    ),
)
cartpole_dqn_create_config = EasyDict(cartpole_dqn_create_config)
create_config = cartpole_dqn_create_config

if __name__ == "__main__":
    # or you can enter `ding -m serial -c cartpole_dqn_config.py -s 0`
    from ding.entry import serial_pipeline
    serial_pipeline((main_config, create_config), seed=0)

The two dictionary objects cartpole_dqn_config and cartpole_dqn_create_config contain the key configurations required for CartPole/DQN training. You can change the behavior of the training pipeline by editing them. For example, by changing cartpole_dqn_config.policy.cuda, you can choose whether to run the entire training process on a CUDA device.
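For instance, here is a minimal sketch of tweaking the configuration programmatically and then launching training. It assumes you run it from the DI-engine repository root so that the dizoo package is importable; the learning rate value is purely illustrative, not a tuned setting.

# Minimal sketch: import the CartPole/DQN config, tweak a few fields,
# and launch the serial training pipeline.
from ding.entry import serial_pipeline
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config

main_config.policy.cuda = True  # run training on a CUDA device if available
main_config.policy.learn.learning_rate = 3e-4  # illustrative value, not a tuned setting
serial_pipeline((main_config, create_config), seed=0)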

If you want to use other training pipelines provided by DI-engine, or your own customized training pipeline, you only need to change the __main__ block of config.py that calls the training pipeline. For example, you can change serial_pipeline in the example above to parallel_pipeline to call the parallel training pipeline, as sketched below.
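A sketch of the rewritten __main__ block is shown here. It assumes parallel_pipeline accepts the same (config, seed) arguments as serial_pipeline; depending on the DI-engine version, the parallel pipeline may require additional configuration fields.

if __name__ == "__main__":
    # Sketch: call the parallel training pipeline instead of the serial one.
    # Extra parallel/system config fields may be required depending on the version.
    from ding.entry import parallel_pipeline
    parallel_pipeline((main_config, create_config), seed=0)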

For the CLI tool ding, you can also change the previous CLI command to

ding -m parallel -c cartpole_dqn_config.py -s 0

to call parallel_pipeline.

Note

To customize the training pipeline, you can refer to serial_pipeline, or refer to the DQN example, which builds the pipeline with the middleware provided by DI-engine. A rough sketch of such a middleware-based pipeline follows.
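The sketch below is modeled on the DQN example; the exact middleware names and signatures may differ between DI-engine versions, so treat it as an outline rather than a drop-in script.

# Rough sketch of a middleware-based training pipeline for CartPole/DQN,
# modeled on DI-engine's DQN example. Names and signatures may differ
# between DI-engine versions.
import gym
from ding.config import compile_config
from ding.envs import DingEnvWrapper, BaseEnvManagerV2
from ding.data import DequeBuffer
from ding.model import DQN
from ding.policy import DQNPolicy
from ding.framework import task
from ding.framework.context import OnlineRLContext
from ding.framework.middleware import (
    OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, eps_greedy_handler
)
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config


def main():
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
    with task.start(async_mode=False, ctx=OnlineRLContext()):
        # Vectorized environments for data collection and evaluation
        collector_env = BaseEnvManagerV2(
            env_fn=[lambda: DingEnvWrapper(gym.make('CartPole-v0')) for _ in range(cfg.env.collector_env_num)],
            cfg=cfg.env.manager
        )
        evaluator_env = BaseEnvManagerV2(
            env_fn=[lambda: DingEnvWrapper(gym.make('CartPole-v0')) for _ in range(cfg.env.evaluator_env_num)],
            cfg=cfg.env.manager
        )
        model = DQN(**cfg.policy.model)
        policy = DQNPolicy(cfg.policy, model=model)
        buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)

        # Each middleware handles one stage of the training loop
        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
        task.use(eps_greedy_handler(cfg))
        task.use(StepCollector(cfg, policy.collect_mode, collector_env))
        task.use(data_pusher(cfg, buffer_))
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
        task.run()


if __name__ == '__main__':
    main()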

If you want to use your own environment in DI-engine, you can simply inherit the BaseEnv class implemented by DI-engine. For this part, you can refer to How to migrate your environment to DI-engine. A minimal sketch is shown below.
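In this sketch, MyTask-v0 is a hypothetical gym environment id used only for illustration; the exact set of methods and properties to implement is described in the linked documentation.

# Minimal sketch of a custom environment that inherits BaseEnv.
# 'MyTask-v0' is a hypothetical gym environment id used only for illustration.
import gym
import numpy as np
from ding.envs import BaseEnv, BaseEnvTimestep


class MyEnv(BaseEnv):

    def __init__(self, cfg: dict = {}) -> None:
        self._cfg = cfg
        self._env = gym.make('MyTask-v0')

    def seed(self, seed: int, dynamic_seed: bool = True) -> None:
        self._seed = seed

    def reset(self) -> np.ndarray:
        if hasattr(self, '_seed'):
            self._env.seed(self._seed)
        self._eval_episode_return = 0.
        obs = self._env.reset()
        return np.asarray(obs, dtype=np.float32)

    def step(self, action: np.ndarray) -> BaseEnvTimestep:
        obs, rew, done, info = self._env.step(action)
        # Accumulate the episode return so the evaluator can read it at episode end
        self._eval_episode_return += rew
        if done:
            info['eval_episode_return'] = self._eval_episode_return
        return BaseEnvTimestep(
            np.asarray(obs, dtype=np.float32), np.asarray([rew], dtype=np.float32), done, info
        )

    def close(self) -> None:
        self._env.close()

    @property
    def observation_space(self) -> gym.spaces.Space:
        return self._env.observation_space

    @property
    def action_space(self) -> gym.spaces.Space:
        return self._env.action_space

    @property
    def reward_space(self) -> gym.spaces.Space:
        return gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(1, ), dtype=np.float32)

    def __repr__(self) -> str:
        return "DI-engine MyEnv"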

List of algorithms and environments supported by DI-zoo

The algorithm documentation of DI-engine

The environment documentation of DI-engine

List of supported algorithms

List of supported environments
