Robotic manipulation remains one of the most important research directions in embodied AI. While traditional kinematics-based controllers provide stable motion execution, they often struggle in unstructured environments where adaptability is required.
Recent advances in Reinforcement Learning (RL) have enabled robotic arms to learn task-oriented behaviors directly from interaction, making it possible to achieve robust control policies without manually designing every motion strategy.
In this project, we extend the original SO-ARM101 Isaac Lab implementation by integrating the AgileX Dual-Arm Nero Manipulator, allowing developers to quickly train and validate dual-arm RL policies using NVIDIA Isaac Lab.
Reposity
Open-source implementation:
GitHub - smalleha/isaac_so_arm101: Isaac Lab external project for SO-ARM100/101 arm robot. · GitHub
Project Structure
├── robots
│ ├── dual_nero
│ │ ├── dual_nero.py
│ │ ├── __init__.py
│ │ ├── meshes
│ │ └── urdf
│ │ └── dual_nero.urdf
├── scripts
│ ├── rl_games
│ │ ├── play.py
│ │ └── train.py
│ └── zero_agent.py
├── tasks
│ ├── __init__.py
│ └── reach
│ ├── agents
│ ├── dual_nero_joint_pos_env_cfg.py
│ ├── dual_nero_reach_env_cfg.py
│ ├── __init__.py
│ └── mdp
└── ui_extension_example.py
Key additions include:
| Column 1 |
Column 2 |
| Component |
Description |
| rl_games |
RL-Games training framework |
| dual_nero_reach_env_cfg.py |
Reach task environment definition |
| dual_nero_joint_pos_env_cfg.py |
Joint position control configuration |
| dual_nero |
Dual-arm robot model |
3. Importing the Robot into Isaac Lab
3.1 Preparing the URDF
Before importing the robot into Isaac Lab, mesh references inside the URDF should be converted to relative paths.
For example:
<link name="base_link">
<inertial>
<origin rpy="0 0 0" xyz="-0.0592395620981769 -0.068642440505388 0.0562764736144042"/>
<mass value="4.46524857458863"/>
<inertia ixx="0.0608289280191989" ixy="-2.54649959130438E-06" ixz="1.11851948046933E-07" iyy="0.0218722514454004" iyz="-0.000689477252402357" izz="0.0680524540174318"/>
</inertial>
<visual>
<origin rpy="0 0 0" xyz="0 0 0"/>
<geometry>
<mesh filename="../meshes/base_link.dae"/>
</geometry>
<material name="">
<color rgba="0.776470588235294 0.756862745098039 0.737254901960784 1"/>
</material>
</visual>
<collision>
<origin rpy="0 0 0" xyz="0 0 0"/>
<geometry>
<mesh filename="../meshes/base_link.dae"/>
</geometry>
</collision>
</link>
This ensures that the asset loader can correctly locate mesh resources during simulation.
3.2 Creating the Robot Configuration
Next, create a robot configuration file:
isaac_so_arm101/robots/dual_nero/dual_nero.py
This file defines:
The resulting DUAL_NERO_CFG object becomes the robot asset used by Isaac Lab during training.
from pathlib import Path
import isaaclab.sim as sim_utils
from isaaclab.actuators import ImplicitActuatorCfg
from isaaclab.assets.articulation import ArticulationCfg
TEMPLATE_ASSETS_DATA_DIR = Path(__file__).resolve().parent
DUAL_NERO_CFG = ArticulationCfg(
spawn=sim_utils.UrdfFileCfg(
fix_base=True,
merge_fixed_joints=False,
replace_cylinders_with_capsules=True,
asset_path=f"{TEMPLATE_ASSETS_DATA_DIR}/urdf/dual_nero.urdf",
activate_contact_sensors=False, # set as false while waiting for capsule implementation
rigid_props=sim_utils.RigidBodyPropertiesCfg(
disable_gravity=False,
max_depenetration_velocity=5.0,
),
articulation_props=sim_utils.ArticulationRootPropertiesCfg(
enabled_self_collisions=True,
solver_position_iteration_count=8,
solver_velocity_iteration_count=0,
),
joint_drive=sim_utils.UrdfConverterCfg.JointDriveCfg(
gains=sim_utils.UrdfConverterCfg.JointDriveCfg.PDGainsCfg(stiffness=0, damping=0)
),
),
init_state=ArticulationCfg.InitialStateCfg(
rot=(1.0, 0.0, 0.0, 0.0),
joint_pos={
"left_joint.*": 0.0,
"right_joint.*": 0.0,
"left_gripper_joint.*": 0.0,
"right_gripper_joint.*": 0.0,
},
# Set initial joint velocities to zero
joint_vel={".*": 0.0},
),
actuators={
"arm": ImplicitActuatorCfg(
joint_names_expr=["left_joint.*", "right_joint.*"],
effort_limit=25.0,
velocity_limit=1.5,
stiffness={
"left_joint1": 200.0,
"left_joint2": 170.0,
"left_joint3": 120.0,
"left_joint4": 80.0,
"left_joint5": 50.0,
"left_joint6": 20.0,
"left_joint7": 10.0,
"right_joint1": 200.0,
"right_joint2": 170.0,
"right_joint3": 120.0,
"right_joint4": 80.0,
"right_joint5": 50.0,
"right_joint6": 20.0,
"right_joint7": 10.0
},
damping={
"left_joint1": 100.0,
"left_joint2": 60.0,
"left_joint3": 70.0,
"left_joint4": 24.0,
"left_joint5": 20.0,
"left_joint6": 10.0,
"left_joint7": 5,
"right_joint1": 100.0,
"right_joint2": 60.0,
"right_joint3": 70.0,
"right_joint4": 24.0,
"right_joint5": 20.0,
"right_joint6": 10.0,
"right_joint7": 5,
},
),
"gripper": ImplicitActuatorCfg(
joint_names_expr=["left_gripper_joint.*","right_gripper_joint.*"],
effort_limit_sim=22, # Increased from 1.9 to 2.5 for stronger grip
velocity_limit_sim=1.5,
stiffness=800.0, # Increased from 25.0 to 60.0 for more reliable closing
damping=20.0, # Increased from 10.0 to 20.0 for stability
),
},
soft_joint_pos_limit_factor=0.9,
)
Create an init.py file inside the dual_nero directory so that Python recognizes it as a package.
4. Building the Reach Environment
Two environment configuration files are required:
tasks/reach/
├── dual_nero_joint_pos_env_cfg.py
└── dual_nero_reach_env_cfg.py
4.1 Joint Position Environment Configuration
dual_nero_joint_pos_env_cfg.py specifies:
-
Controlled joints
-
End-effector links
-
Action spaces
-
Command targets
import math
import isaaclab_tasks.manager_based.manipulation.reach.mdp as mdp
from isaaclab.utils import configclass
from isaac_so_arm101.robots import DUAL_NERO_CFG # noqa: F401
from isaac_so_arm101.tasks.reach.dual_nero_reach_env_cfg import Dual_NeroReachEnvCfg
from isaaclab.assets.articulation import ArticulationCfg
@configclass
class Dual_Nero_ReachEnvCfg(Dual_NeroReachEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# switch robot to OpenArm
self.scene.robot = DUAL_NERO_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot",)
# override rewards
self.rewards.left_end_effector_position_tracking.params["asset_cfg"].body_names = ["left_gripper_base"]
self.rewards.left_end_effector_position_tracking_fine_grained.params["asset_cfg"].body_names = [
"left_gripper_base"
]
self.rewards.left_end_effector_orientation_tracking.params["asset_cfg"].body_names = ["left_gripper_base"]
self.rewards.right_end_effector_position_tracking.params["asset_cfg"].body_names = ["right_gripper_base"]
self.rewards.right_end_effector_position_tracking_fine_grained.params["asset_cfg"].body_names = [
"right_gripper_base"
]
self.rewards.right_end_effector_orientation_tracking.params["asset_cfg"].body_names = ["right_gripper_base"]
# override actions
self.actions.left_arm_action = mdp.JointPositionActionCfg(
asset_name="robot",
joint_names=[
"left_joint.*",
],
scale=0.5,
use_default_offset=True,
)
self.actions.right_arm_action = mdp.JointPositionActionCfg(
asset_name="robot",
joint_names=[
"right_joint.*",
],
scale=0.5,
use_default_offset=True,
)
# override command generator body
# end-effector is along z-direction
self.commands.left_ee_pose.body_name = "left_gripper_base"
self.commands.right_ee_pose.body_name = "right_gripper_base"
@configclass
class Dual_Nero_ReachEnvCfg_PLAY(Dual_Nero_ReachEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# make a smaller scene for play
self.scene.num_envs = 50
self.scene.env_spacing = 2.5
# disable randomization for play
self.observations.policy.enable_corruption = False
4.2 Reach Task Definition
dual_nero_reach_env_cfg.py contains the full RL environment definition.
This includes:
1.Scene Configuration
-
Ground plane
-
Lighting
-
Robot asset
2.Command Generation
3.Observation Space
4.Reward Design
5.Curriculum Learning
6.Episode Settings
import math
from dataclasses import MISSING
import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, AssetBaseCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import ActionTermCfg as ActionTerm
from isaaclab.managers import CurriculumTermCfg as CurrTerm
from isaaclab.managers import EventTermCfg as EventTerm
from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.managers import TerminationTermCfg as DoneTerm
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.utils import configclass
from isaaclab.utils.noise import AdditiveUniformNoiseCfg as Unoise
import isaaclab_tasks.manager_based.manipulation.reach.mdp as mdp
##
# Scene definition
##
@configclass
class Dual_NeroReachSceneCfg(InteractiveSceneCfg):
"""Configuration for the scene with a robotic arm."""
# world
ground = AssetBaseCfg(
prim_path="/World/ground",
spawn=sim_utils.GroundPlaneCfg(),
init_state=AssetBaseCfg.InitialStateCfg(pos=(0.0, 0.0, 0)),
)
# robots
robot: ArticulationCfg = MISSING
# lights
light = AssetBaseCfg(
prim_path="/World/light",
spawn=sim_utils.DomeLightCfg(color=(0.75, 0.75, 0.75), intensity=2500.0),
)
##
# MDP settings
##
@configclass
class CommandsCfg:
"""Command terms for the MDP."""
left_ee_pose = mdp.UniformPoseCommandCfg(
asset_name="robot",
body_name=MISSING,
resampling_time_range=(4.0, 4.0),
debug_vis=True,
ranges=mdp.UniformPoseCommandCfg.Ranges(
pos_x=(0.5, 0.5),
pos_y=(0.15, 0.25),
pos_z=(0.3, 0.5),
roll=(-math.pi / 6, math.pi / 6),
pitch=(3 * math.pi / 2, 3 * math.pi / 2),
yaw=(8 * math.pi / 9, 10 * math.pi / 9),
),
)
right_ee_pose = mdp.UniformPoseCommandCfg(
asset_name="robot",
body_name=MISSING,
resampling_time_range=(4.0, 4.0),
debug_vis=True,
ranges=mdp.UniformPoseCommandCfg.Ranges(
pos_x=(0.5, 0.5),
pos_y=(-0.25, -0.15),
pos_z=(0.3, 0.5),
roll=(-math.pi / 6, math.pi / 6),
pitch=(3 * math.pi / 2, 3 * math.pi / 2),
yaw=(8 * math.pi / 9, 10 * math.pi / 9),
),
)
@configclass
class ActionsCfg:
"""Action specifications for the MDP."""
left_arm_action: ActionTerm = MISSING
right_arm_action: ActionTerm = MISSING
@configclass
class ObservationsCfg:
"""Observation specifications for the MDP."""
@configclass
class PolicyCfg(ObsGroup):
"""Observations for policy group."""
# observation terms (order preserved)
left_joint_pos = ObsTerm(
func=mdp.joint_pos_rel,
params={
"asset_cfg": SceneEntityCfg(
"robot",
joint_names=[
"left_joint.*",
],
)
},
noise=Unoise(n_min=-0.01, n_max=0.01),
)
right_joint_pos = ObsTerm(
func=mdp.joint_pos_rel,
params={
"asset_cfg": SceneEntityCfg(
"robot",
joint_names=[
"right_joint.*",
],
)
},
noise=Unoise(n_min=-0.01, n_max=0.01),
)
left_joint_vel = ObsTerm(
func=mdp.joint_vel_rel,
params={
"asset_cfg": SceneEntityCfg(
"robot",
joint_names=[
"left_joint.*",
],
)
},
noise=Unoise(n_min=-0.01, n_max=0.01),
)
right_joint_vel = ObsTerm(
func=mdp.joint_vel_rel,
params={
"asset_cfg": SceneEntityCfg(
"robot",
joint_names=[
"right_joint.*",
],
)
},
noise=Unoise(n_min=-0.01, n_max=0.01),
)
left_pose_command = ObsTerm(func=mdp.generated_commands, params={"command_name": "left_ee_pose"})
right_pose_command = ObsTerm(func=mdp.generated_commands, params={"command_name": "right_ee_pose"})
left_actions = ObsTerm(func=mdp.last_action, params={"action_name": "left_arm_action"})
right_actions = ObsTerm(func=mdp.last_action, params={"action_name": "right_arm_action"})
def __post_init__(self):
self.enable_corruption = True
self.concatenate_terms = True
# observation groups
policy: PolicyCfg = PolicyCfg()
@configclass
class EventCfg:
"""Configuration for events."""
reset_robot_joints = EventTerm(
func=mdp.reset_joints_by_scale,
mode="reset",
params={
"position_range": (0.5, 1.5),
"velocity_range": (0.0, 0.0),
},
)
@configclass
class RewardsCfg:
"""Reward terms for the MDP."""
left_end_effector_position_tracking = RewTerm(
func=mdp.position_command_error,
weight=-0.2,
params={
"asset_cfg": SceneEntityCfg("robot", body_names=MISSING),
"command_name": "left_ee_pose",
},
)
right_end_effector_position_tracking = RewTerm(
func=mdp.position_command_error,
weight=-0.25,
params={
"asset_cfg": SceneEntityCfg("robot", body_names=MISSING),
"command_name": "right_ee_pose",
},
)
left_end_effector_position_tracking_fine_grained = RewTerm(
func=mdp.position_command_error_tanh,
weight=0.1,
params={
"asset_cfg": SceneEntityCfg("robot", body_names=MISSING),
"std": 0.1,
"command_name": "left_ee_pose",
},
)
right_end_effector_position_tracking_fine_grained = RewTerm(
func=mdp.position_command_error_tanh,
weight=0.2,
params={
"asset_cfg": SceneEntityCfg("robot", body_names=MISSING),
"std": 0.1,
"command_name": "right_ee_pose",
},
)
left_end_effector_orientation_tracking = RewTerm(
func=mdp.orientation_command_error,
weight=-0.1,
params={
"asset_cfg": SceneEntityCfg("robot", body_names=MISSING),
"command_name": "left_ee_pose",
},
)
right_end_effector_orientation_tracking = RewTerm(
func=mdp.orientation_command_error,
weight=-0.1,
params={
"asset_cfg": SceneEntityCfg("robot", body_names=MISSING),
"command_name": "right_ee_pose",
},
)
# action penalty
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-0.0001)
left_joint_vel = RewTerm(
func=mdp.joint_vel_l2,
weight=-0.0001,
params={
"asset_cfg": SceneEntityCfg(
"robot",
joint_names=[
"left_joint.*",
],
)
},
)
right_joint_vel = RewTerm(
func=mdp.joint_vel_l2,
weight=-0.0001,
params={
"asset_cfg": SceneEntityCfg(
"robot",
joint_names=[
"right_joint.*",
],
)
},
)
@configclass
class TerminationsCfg:
"""Termination terms for the MDP."""
time_out = DoneTerm(func=mdp.time_out, time_out=True)
@configclass
class CurriculumCfg:
"""Curriculum terms for the MDP."""
action_rate = CurrTerm(
func=mdp.modify_reward_weight,
params={"term_name": "action_rate", "weight": -0.005, "num_steps": 4500},
)
left_joint_vel = CurrTerm(
func=mdp.modify_reward_weight,
params={"term_name": "left_joint_vel", "weight": -0.001, "num_steps": 4500},
)
right_joint_vel = CurrTerm(
func=mdp.modify_reward_weight,
params={"term_name": "right_joint_vel", "weight": -0.001, "num_steps": 4500},
)
##
# Environment configuration
##
@configclass
class Dual_NeroReachEnvCfg(ManagerBasedRLEnvCfg):
"""Configuration for the reach end-effector pose tracking environment."""
# Scene settings
scene: Dual_NeroReachSceneCfg = Dual_NeroReachSceneCfg(num_envs=4096, env_spacing=2.5)
# Basic settings
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
commands: CommandsCfg = CommandsCfg()
# MDP settings
rewards: RewardsCfg = RewardsCfg()
terminations: TerminationsCfg = TerminationsCfg()
events: EventCfg = EventCfg()
curriculum: CurriculumCfg = CurriculumCfg()
def __post_init__(self):
"""Post initialization."""
# general settings
self.decimation = 2
self.sim.render_interval = self.decimation
self.episode_length_s = 24.0
self.viewer.eye = (3.5, 3.5, 3.5)
# simulation settings
self.sim.dt = 1.0 / 60.0
\`\`\`
**### 4.3 Registering the Environment**
Register the task inside:\`src/isaac_so_arm101/tasks/reach/\__init_\_.py\`
\`\`\`PYTHON
gym.register(
id="Isaac-Dual-Nero-Reach-v0",
entry_point="isaaclab.envs:ManagerBasedRLEnv",
kwargs={
"env_cfg_entry_point":f"{\__name_\_}.dual_nero_joint_pos_env_cfg:Dual_Nero_ReachEnvCfg",
"rsl_rl_cfg_entry_point": f"{agents.\__name_\_}.rsl_rl_ppo_cfg:ReachPPORunnerCfg",
"rl_games_cfg_entry_point": f"{agents.\__name_\_}:rl_games_ppo_cfg.yaml",
},
disable_env_checker=True,
)
5. Training the Reach Policy
Step 1.Activate the Isaac Lab Environment
conda activate env_isaaclab
Step 2.Navigate to the project directory:
cd isaac_so_arm101
Step 3.Train the whole project
Option 1:RSL-RL
uv run train \
--task Isaac-Dual-Nero-Reach-v0 \
--headless
uv run play \
--task Isaac-Dual-Nero-Reach-v0
dual_nero_rsl_rl.gif

Option 2: RL-Games
python3 scripts/rl_games/train.py \
--task Isaac-Dual-Nero-Reach-v0 \
--headless
python3 scripts/rl_games/play.py \
--task Isaac-Dual-Nero-Reach-v0
dual_nero_rl_games.gif

6. Results and Observations
Both frameworks successfully learn the dual-arm reaching task.
However, in our experiments:
-
RL-Games converges faster
-
Motion trajectories appear smoother
-
Final reaching accuracy is generally higher
For relatively complex robot morphologies such as dual-arm manipulators, RL-Games currently provides more stable performance and is recommended as the default training backend.
FAQ
Q1: Why is my Dual-Nero robot collapsing or shaking violently after loading the URDF?
This is usually caused by incorrect actuator parameters or unrealistic inertial properties.
Common causes include:
-
Joint stiffness set too high
-
Damping values too low
-
Incorrect mass distribution in the URDF
-
Self-collision configuration issues
-
Unstable simulation timestep
Before starting RL training, verify that the robot can remain stable under gravity using only PD control.
Quick check: If the robot cannot stand still without RL, the issue is likely in the robot model rather than the training algorithm.
Q2: Why does the reward improve, but the robot never reaches the target accurately?
A rising reward does not always indicate successful task completion.
Typical reasons include:
-
Reward weights are unbalanced
-
Orientation rewards dominate position rewards
-
Action penalties are too strong
-
Command sampling range is too large
-
End-effector link is incorrectly configured
In most reach tasks, incorrect reward shaping is the primary reason for poor final accuracy.
Q3: How do I verify that the end-effector link is configured correctly?
One of the most common mistakes in Isaac Lab reach tasks is assigning the wrong end-effector body.
For Dual-Nero, the target link should be:
left_gripper_base
right_gripper_base
Symptoms of an incorrect configuration include:
Always verify the body name in Isaac Sim before launching large-scale training.
Q4: Why does RL-Games perform better than RSL-RL for this task?
Both frameworks are PPO-based, but their implementations differ.
For large-scale manipulation environments:
RL-Games generally scales better with thousands of parallel environments
PPO updates are often more stable
Training throughput is higher on modern GPUs
For Dual-Nero reach experiments, RL-Games typically achieves smoother trajectories and faster convergence.
However, results may vary depending on reward design and task complexity.
Q5: My policy works in simulation but fails on the real robot. Why?
This is the most common Sim-to-Real issue.
Possible causes include:
-
Joint friction mismatch
-
Encoder noise
-
Latency differences
-
Payload variations
-
Inaccurate motor models
-
Unmodeled cable effects
To improve transfer performance:
-
Apply domain randomization
-
Add observation noise
-
Randomize dynamics parameters
-
Validate trajectories at low speed first
Successful simulation training is only the first step toward real-world deployment.
Have Question?
If you encounter any issues with environment installation, parameter configuration, or RL training, feel free to leave your questions for further discussion.
1 post - 1 participant
Read full topic