Developing effective locomotion policies for quadrupeds is a significant challenge in robotics because of the complex dynamics involved. Training quadrupeds to walk up and down stairs in the real world can damage the equipment and the environment. Simulators therefore play a key role in keeping the learning process safe and within practical time limits, according to the NVIDIA Technical Blog.
Leveraging Deep Reinforcement Learning
Training robots with deep reinforcement learning (RL) in a simulated environment lets them learn complex tasks more effectively and safely. However, this approach introduces a new challenge: ensuring that a policy trained in simulation transfers seamlessly to the real world. In other words, how can the simulation-to-reality (sim-to-real) gap be closed?
Closing the sim-to-real gap requires a high-fidelity, physics-based simulator for training, a high-performance AI computer such as NVIDIA Jetson, and a robot with joint-level controls. The Reinforcement Learning Researcher Kit, developed jointly by Boston Dynamics, NVIDIA, and The AI Institute, brings these capabilities together for seamless deployment of quadrupeds from the virtual to the real world.
Isaac Lab: Bridging the Sim-to-Real Gap
Isaac Lab is a lightweight reference application built on the NVIDIA Isaac Sim platform specifically optimized for robot learning at scale. It leverages GPU-based parallelization for massively parallel physics-based simulation to improve final policy performance and reduce the training time of RL in robotics. With its high-fidelity physics and domain randomization capabilities, Isaac Lab bridges the sim-to-real gap, enabling seamless deployment of trained models onto physical robots, zero-shot.
This post explains how a locomotion RL policy is created for Spot in Isaac Sim and Isaac Lab and deployed on the hardware using the components from the RL Researcher Kit.
Training Quadruped Locomotion in Isaac Lab
Goal
Train the Spot robot to track target x, y, and yaw base velocities while walking on flat terrain.
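The article does not give the reward function, but a velocity-tracking goal like this is often expressed as a reward that decays with the squared error between the target and measured base velocities. The following is a minimal sketch under that assumption; the function name, the single combined error term, and the `sigma` tolerance are all hypothetical and not taken from the Researcher Kit.

```python
import math


def velocity_tracking_reward(target, measured, sigma=0.25):
    """Return a reward in (0, 1] that peaks when measured equals target.

    target, measured: (vx, vy, yaw_rate) tuples expressed in the base frame.
    sigma: hypothetical tolerance, not a value from the article.
    """
    squared_error = sum((t - m) ** 2 for t, m in zip(target, measured))
    return math.exp(-squared_error / sigma)


# A robot that matches the command exactly versus one that lags behind it.
perfect = velocity_tracking_reward((1.0, 0.0, 0.5), (1.0, 0.0, 0.5))
lagging = velocity_tracking_reward((1.0, 0.0, 0.5), (0.6, 0.1, 0.2))
print(perfect, lagging)
```

An exponential kernel keeps the reward bounded, so large early errors do not dominate the learning signal.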
Observation and Action Space
The target velocities are randomized at each reset and provided alongside other observations. The action space includes only the 12 DOF joint positions, which are passed to the low-level joint controller as the reference joint positions.
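To illustrate how a 12-dimensional action could become reference joint positions for a low-level controller, here is a rough sketch. It assumes the action is scaled and added to a default standing pose, and that the joint controller is a simple PD loop; `action_scale`, `kp`, `kd`, and the default pose are placeholder values, not parameters from the article or from Spot's joint-level API.

```python
NUM_JOINTS = 12  # from the article: the action is 12 DOF joint positions


def action_to_joint_targets(action, default_positions, action_scale=0.2):
    """Map a raw policy action to reference joint positions (radians).

    Assumption: targets are offsets around a default pose, scaled by a constant.
    """
    if len(action) != NUM_JOINTS or len(default_positions) != NUM_JOINTS:
        raise ValueError("expected 12 values for action and default pose")
    return [d + action_scale * a for a, d in zip(action, default_positions)]


def pd_torques(targets, positions, velocities, kp=60.0, kd=1.5):
    """Hypothetical PD joint controller that tracks the reference positions."""
    return [kp * (t - q) - kd * qd
            for t, q, qd in zip(targets, positions, velocities)]


default_pose = [0.0] * NUM_JOINTS          # placeholder standing pose
targets = action_to_joint_targets([0.5] * NUM_JOINTS, default_pose)
torques = pd_torques(targets, [0.0] * NUM_JOINTS, [0.0] * NUM_JOINTS)
print(targets[0], torques[0])
```

Letting the policy output position references rather than torques leaves the fast inner loop to the joint controller, which is the split the paragraph above describes.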
Domain Randomization
Various parameters are randomized at key stages of training. This process, called domain randomization, helps the policy remain robust when it is deployed in the real world.
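As a rough illustration of domain randomization, the sketch below draws a fresh set of physical parameters for each episode. The article does not list which parameters are randomized or over what ranges, so the names and ranges here are purely hypothetical.

```python
import random

# Hypothetical parameters and ranges for illustration only;
# the article does not specify which quantities are randomized.
RANDOMIZATION_RANGES = {
    "ground_friction": (0.4, 1.2),
    "added_base_mass_kg": (-2.0, 2.0),
    "push_velocity_mps": (0.0, 0.5),
}


def sample_domain(rng):
    """Draw one set of simulation parameters, uniformly within each range."""
    return {name: rng.uniform(low, high)
            for name, (low, high) in RANDOMIZATION_RANGES.items()}


rng = random.Random(0)  # seeded so the sampled episodes are reproducible
episodes = [sample_domain(rng) for _ in range(3)]
for params in episodes:
    print(params)
```

Because the policy never sees the same physics twice, it has to learn behavior that works across the whole range, which is what makes the real robot look like just another sample.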
Network Architecture and RL Algorithm Details
The locomotion policy is a multilayer perceptron (MLP) with three hidden layers of [512, 256, 128] neurons. It was trained using the Proximal Policy Optimization (PPO) algorithm from RSL-rl, which is optimized for GPU computation.
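To give a sense of the network's size, the following pure-Python sketch builds an MLP with the stated layer widths and runs one forward pass. The observation size of 48 and the ELU activation are assumptions, since the article gives neither; the random weights stand in for a trained policy, and a real implementation would use a tensor library on the GPU.

```python
import math
import random

HIDDEN_SIZES = [512, 256, 128]  # from the article
NUM_ACTIONS = 12                # from the article: 12 joint positions
NUM_OBSERVATIONS = 48           # assumption: not stated in the article


def init_mlp(sizes, rng):
    """Create (weights, biases) per layer with small uniform random weights."""
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def elu(x):
    """ELU activation (an assumed choice, not confirmed by the article)."""
    return x if x > 0.0 else math.exp(x) - 1.0


def forward(layers, observation):
    """Run the MLP; hidden layers use ELU, the output layer is linear."""
    activations = observation
    for index, (weights, biases) in enumerate(layers):
        activations = [sum(w * a for w, a in zip(row, activations)) + b
                       for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            activations = [elu(v) for v in activations]
    return activations


sizes = [NUM_OBSERVATIONS, *HIDDEN_SIZES, NUM_ACTIONS]
policy = init_mlp(sizes, random.Random(0))
action = forward(policy, [0.0] * NUM_OBSERVATIONS)
parameter_count = sum(len(w) * len(w[0]) + len(b) for w, b in policy)
print(len(action), parameter_count)
```

Under these assumptions the network has fewer than 200,000 parameters, small enough for inference on an embedded computer at control-loop rates.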
Deploying the Trained RL Policy on Spot with Jetson Orin
Deploying models trained in simulation to the real world for robotic applications poses several challenges, including real-time control, safety constraints, and other real-world conditions. The accurate physics and domain randomization features of Isaac Lab enable the policy trained in simulation to be deployed zero-shot to the real Spot robot on Jetson Orin, achieving similar performance in both the virtual and real world.
Transferring the trained model to the Spot robot requires deploying the model to the edge and controlling the robot with low latency and high frequency. The high-performance computing capabilities and low-latency AI processing of NVIDIA Jetson AGX Orin ensure rapid inference and response times, which are crucial for real-world robotics applications. Policies trained in simulation can be deployed directly for inference, simplifying the deployment process.
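The requirement for low latency and high frequency usually translates into a fixed-rate control loop on the edge computer. The sketch below shows one generic way to structure such a loop; it is not the Researcher Kit's deployment code. The 50 Hz rate and the `policy_step` callback are hypothetical stand-ins for reading observations, running inference, and sending joint commands.

```python
import time


def run_control_loop(policy_step, rate_hz, num_steps):
    """Call policy_step at a fixed rate and count missed deadlines."""
    period = 1.0 / rate_hz
    next_deadline = time.monotonic()
    overruns = 0
    for _ in range(num_steps):
        policy_step()  # observe, infer, and command in a real deployment
        next_deadline += period
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        else:
            # The step took longer than one period: record it and resynchronize.
            overruns += 1
            next_deadline = time.monotonic()
    return overruns


calls = []
overruns = run_control_loop(lambda: calls.append(time.monotonic()),
                            rate_hz=50.0, num_steps=5)
print(len(calls), overruns)
```

Scheduling against absolute deadlines from a monotonic clock avoids the drift that accumulates when each iteration simply sleeps for a fixed duration.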
Hardware and Network Setup on Jetson Orin
- Install SDK Manager on an external PC with Ubuntu 22.04.
- Flash Jetson Orin with JetPack 6 using the SDK Manager and follow the instructions for setup.
- Connect Jetson Orin to a display, keyboard, and mouse.
- Log in to Jetson Orin and set up the wired network configuration manually for the Ethernet port.
Software Setup on Jetson
First, convert the policy trained in simulation from .pt to .onnx and export the environment config. Do this on the PC used for training.
On the training PC, create a folder and copy the env.yaml file and .onnx file to the folder. Then, copy the folder to Jetson Orin using SSH. Ensure the PC and Jetson are on the same network.
Next, run the following commands on Orin’s terminal from the home directory:
mkdir spot-rl-deployment && cd spot-rl-deployment && mkdir models
git clone https://github.com/boston-dynamics/spot-rl-example.git
cd spot-rl-example && mkdir external && cd external && mkdir spot_python_sdk
Download the Spot Python SDK with the joint-level API and unzip the content into the spot_python_sdk folder. Install the deployment code dependencies and convert the env.yaml file to an env_cfg.json file.
Running the Policy
Power up Spot and ensure the Jetson Orin is powered on. Open the Spot app on the Spot tablet controller and release control to run the policy. Connect the PC to Spot's local Wi-Fi and SSH into Orin. Connect the wireless gamepad to Orin using bluetoothctl and then run the RL policy.
Video 2 shows the real Spot robot in action after being trained in simulation.
The codebase provided in the Spot RL Researcher Kit is a starting point for creating custom RL tasks in simulation and then deploying them to hardware. For detailed guidance on how to use Isaac Lab to train a policy for specific tasks, see the documentation. Deployment of the trained policy on other robots is specific to the robot architecture; however, Spot users can modify the current deployment code if additional observations are needed for their application.
Get your Reinforcement Learning Researcher Kit and Spot robot and start developing your custom application. Learn more about Isaac Lab, built on Isaac Sim.