SLAP: Additional Experiments

Experiment Setup

We conducted two experiments to evaluate the performance of SLAP. In Experiment I (SLAP ablations on RLBench tasks), we trained and evaluated five ablated variants of SLAP's APM and compared them with PerAct on the same RLBench tasks. In Experiment II (SLAP ablations on a held-out test set collected from the real robot), we evaluated seven ablated versions of SLAP's APM on a held-out test set collected from a real robot. The APM ablations and the PerAct model in Experiment I share the same training and test data, which was pre-collected from RLBench and held in memory. Note that the training time reported for PerAct in [3] is typically around two weeks, whereas our 20-epoch training of PerAct took approximately 45 minutes, the same as the SLAP training budget. Each model was trained for the same number of steps (corresponding to 20 epochs) to establish a fair comparison. We provided ground-truth interaction points to the ablations where applicable.

Details for Experiment I.
Ablation ID Description
A0 DR: depth noise, positional and rotational randomization as well as color jitter
A1 DR: depth noise, positional and rotational randomization; No color jitter
A2 Without any DR
A3 Interaction hotspot is provided, without any cropping of PCD; No DR
A4 No interaction hotspots, without any cropping of PCD; No DR
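
To make the domain randomization (DR) settings in the table concrete, the following is a minimal sketch of how the A0 to A2 variants could differ at data-loading time. It is not SLAP's implementation: the function `augment`, the `ABLATION_DR` mapping, and every magnitude (noise standard deviation, shift range, yaw range, jitter range) are hypothetical placeholders, and per-point Gaussian noise is used here only as a stand-in for depth noise.

```python
import math
import random

# DR flags per ablation, read off the table above (hypothetical encoding).
ABLATION_DR = {
    "A0": {"use_dr": True, "color_jitter": True},
    "A1": {"use_dr": True, "color_jitter": False},
    "A2": {"use_dr": False, "color_jitter": False},
}


def augment(xyz, rgb, rng, use_dr, color_jitter,
            noise_std=0.002, max_shift=0.05,
            max_yaw=math.pi / 6, max_jitter=0.05):
    """Return a randomized copy of one point cloud.

    xyz: list of (x, y, z) in metres; rgb: list of (r, g, b) in [0, 1].
    All default magnitudes are illustrative assumptions.
    """
    if not use_dr:
        return list(xyz), list(rgb)

    # Rotational randomization: one random yaw about the z axis per cloud.
    yaw = rng.uniform(-max_yaw, max_yaw)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Positional randomization: one random translation per cloud.
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]

    new_xyz = []
    for x, y, z in xyz:
        rotated = (cos_y * x - sin_y * y, sin_y * x + cos_y * y, z)
        # Per-point Gaussian noise stands in for sensor depth noise.
        new_xyz.append(tuple(v + s + rng.gauss(0.0, noise_std)
                             for v, s in zip(rotated, shift)))

    new_rgb = list(rgb)
    if color_jitter:
        # One random offset per channel, clamped back into [0, 1].
        offset = [rng.uniform(-max_jitter, max_jitter) for _ in range(3)]
        new_rgb = [tuple(min(1.0, max(0.0, c + o))
                         for c, o in zip(color, offset))
                   for color in rgb]
    return new_xyz, new_rgb


# Usage: the A1 setting perturbs geometry but leaves colors untouched.
rng = random.Random(0)
xyz = [(0.4, 0.0, 0.2), (0.5, 0.1, 0.2)]
rgb = [(0.8, 0.1, 0.1), (0.2, 0.2, 0.9)]
a1_xyz, a1_rgb = augment(xyz, rgb, rng, **ABLATION_DR["A1"])
```

Keeping the flags in a single mapping makes it easy to check that each ablation differs from its neighbour in exactly one setting.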

Experiment I Results: Ablations and PerAct on RLBench

Positional error per task
Fig 1. Positional error (in m) for each chosen RLBench task
Positional error averaged across tasks
Fig 2. Positional error (in m) per ablation averaged over tasks in RLBench

Ablations A0 through A3 behave similarly on simulation data; however, not providing the interaction hotspot (A4) leads to a significant degradation in the performance of the action prediction module (APM). In Fig. 1, we observe that PerAct generally does better than our worst ablation, yet worse than our other models. Because of a high positional error on one task, PerAct's average in Fig. 2 ends up closer to the overall error of our worst model.
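
The effect of a single high-error task on the cross-task average can be sketched as follows. This assumes positional error is the Euclidean distance between predicted and ground-truth positions and that Fig. 2 is an unweighted mean of per-task means; both are assumptions about the evaluation, and the helper names and all numbers below are made up for illustration, not measured results.

```python
import math
from statistics import mean


def positional_error(pred, target):
    """Euclidean distance (in m) between two 3D positions."""
    return math.dist(pred, target)


def per_task_error(samples):
    """samples maps a task name to a list of (pred, target) pairs."""
    return {task: mean(positional_error(p, t) for p, t in pairs)
            for task, pairs in samples.items()}


def average_over_tasks(task_errors):
    """Unweighted mean of the per-task errors."""
    return mean(task_errors.values())


# Made-up per-task errors: the model is accurate on two tasks
# and far off on a third.
illustrative = {"task_a": 0.01, "task_b": 0.01, "task_c": 0.10}
overall = average_over_tasks(illustrative)  # dominated by task_c
```

Because the mean is sensitive to outliers, the per-task view in Fig. 1 and the averaged view in Fig. 2 can rank the same model differently.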

Experiment II: Real-world Data

This experiment studies whether the ablation performance seen in simulation correlates with performance on real-world data. The real-world data has a larger variation in object position and orientation, as well as considerable clutter from objects on the table that are not involved in the given task. If the correlation holds, we should see ablations B0 through B3 perform similarly on real-world data. The ablations studied in this experiment are defined as follows:

Details for Experiment II.
Ablation ID Description
B0 (APM as in paper) DR: depth noise, positional and rotational randomization as well as color jitter
B1 DR: depth noise, positional and rotational randomization; No color jitter
B2 Without any DR
B3 Interaction hotspot is provided, without any cropping of PCD; No DR
B4 No interaction hotspots, without any cropping of PCD; No DR
B5 B3 with DR; No color jitter
B6 B3 with DR and color jitter

We see that the trends observed in simulation correlate with APM performance in the real world. Our observations suggest that the primary design choice driving SLAP's performance gains is the division of the prediction problem into two parts: first predicting a reliable interaction point, which then informs the action pose.
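
One way to picture how an interaction point can inform the action pose is to crop the point cloud (PCD) around that point and express the remaining points relative to it, so that the pose predictor only sees local geometry. The sketch below illustrates that idea only. The function name, the spherical crop, the 0.1 m radius, and the re-centering are assumptions, not details confirmed by the text above.

```python
import math


def crop_around_interaction_point(points, interaction_point, radius=0.1):
    """Keep points within `radius` metres of the interaction point and
    shift them so the interaction point becomes the origin.

    points: list of (x, y, z) tuples in metres.
    The radius is an illustrative placeholder.
    """
    cropped = []
    for p in points:
        if math.dist(p, interaction_point) <= radius:
            cropped.append(tuple(a - b for a, b in zip(p, interaction_point)))
    return cropped


# Usage: a distant clutter point is dropped, nearby points are kept.
scene = [(0.50, 0.00, 0.20), (0.55, 0.00, 0.20), (1.50, 1.00, 0.20)]
local = crop_around_interaction_point(scene, (0.50, 0.00, 0.20))
```

Under this reading, ablations without cropping (such as B3 and B4) would pass the full scene to the pose predictor, including clutter unrelated to the task.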

Positional error per task on real-world data
Fig 3. Positional error (in cm) for each real-world task

Mobile Manipulator Demonstrations

SLAP running on a mobile manipulator with an egocentric camera. With very few demonstrations, SLAP is able to grasp the bottle.

Pick up bottle - Position 1

Pick up bottle - Position 2