# Benchmarks & Debugging
**Why this matters:** If you can't measure it, you can't improve it. Benchmarking and debugging are what take a robot from "works sometimes" to "production ready."
## Introduction: The Path to Reliability
Every robotics team has experienced the pain of a demo that works 9 times and fails spectacularly on the 10th. Systematic benchmarking transforms this chaos into engineering discipline.
## Performance Metrics

### What to Measure
| Category | Metric | Target |
|---|---|---|
| Control | Control loop frequency | 1000 Hz |
| Control | Tracking error (RMS) | under 1 cm |
| Perception | Detection latency | under 33 ms |
| Perception | Detection accuracy | over 95% |
| Navigation | Path following error | under 5 cm |
| System | CPU utilization | under 80% |
| System | Memory usage | under 4 GB |
| Battery | Runtime | over 4 hours |
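Targets like these are easiest to enforce when they live in code rather than in a document. Below is a minimal sketch of an automated check; the thresholds mirror the table above, while the metric names and the `measurements` dict are hypothetical placeholders for your own telemetry:

```python
# Thresholds from the table above (hypothetical harness; adapt names and units)
TARGETS = {
    "control_loop_hz":    ("min", 1000.0),  # Hz
    "tracking_error_rms": ("max", 0.01),    # m
    "detection_latency":  ("max", 0.033),   # s
    "detection_accuracy": ("min", 0.95),    # fraction
    "path_error":         ("max", 0.05),    # m
    "cpu_utilization":    ("max", 0.80),    # fraction
}

def check_targets(measurements: dict) -> bool:
    """Compare each measured value against its (direction, bound) target."""
    all_ok = True
    for name, (direction, bound) in TARGETS.items():
        value = measurements[name]
        ok = value >= bound if direction == "min" else value <= bound
        all_ok = all_ok and ok
        print(f"{'PASS' if ok else 'FAIL'} {name}: {value} (target {direction} {bound})")
    return all_ok
```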
### Measurement Infrastructure
```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BenchmarkResult:
    name: str
    samples: List[float] = field(default_factory=list)

    @property
    def mean(self) -> float:
        return sum(self.samples) / len(self.samples)

    @property
    def p99(self) -> float:
        # 99th percentile: sort once, index at the 99% mark (clamped for small n)
        sorted_samples = sorted(self.samples)
        idx = min(int(0.99 * len(sorted_samples)), len(sorted_samples) - 1)
        return sorted_samples[idx]


class Profiler:
    def __init__(self):
        self.results: Dict[str, BenchmarkResult] = {}

    def profile(self, name: str) -> "ProfilerContext":
        return ProfilerContext(self, name)


class ProfilerContext:
    def __init__(self, profiler: Profiler, name: str):
        self.profiler = profiler
        self.name = name

    def __enter__(self):
        self.start = time.perf_counter_ns()
        return self

    def __exit__(self, *args):
        elapsed = (time.perf_counter_ns() - self.start) / 1e6  # ms
        # Create the result bucket on first use, then append the sample
        result = self.profiler.results.setdefault(self.name, BenchmarkResult(self.name))
        result.samples.append(elapsed)
```
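Usage is a `with` block wrapped around the code under test; `compute_control` below is a hypothetical stand-in for one control update:

```python
def compute_control():
    pass  # hypothetical stand-in for one control update

profiler = Profiler()
for _ in range(10_000):
    with profiler.profile("control_step"):
        compute_control()

stats = profiler.results["control_step"]
print(f"mean: {stats.mean:.4f} ms  p99: {stats.p99:.4f} ms")
```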
## Debugging Strategies

### 1. Logging
Structured logging is essential:
```python
import json
import logging
import time


class RobotLogger:
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.DEBUG)
        # Attach a handler so DEBUG/INFO records are actually emitted
        if not self.logger.handlers:
            self.logger.addHandler(logging.StreamHandler())

    def log_state(self, state, level: str = "info"):
        # One JSON object per record: easy to grep, parse, and post-process
        msg = json.dumps({
            "timestamp": time.time(),
            "joint_positions": state.q.tolist(),
            "joint_velocities": state.dq.tolist(),
            "is_stable": state.stable,
        })
        getattr(self.logger, level)(msg)
```
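A usage sketch; `RobotState` here is a hypothetical container with the fields `log_state` reads:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RobotState:  # hypothetical state container
    q: np.ndarray
    dq: np.ndarray
    stable: bool


log = RobotLogger("balance_controller")
log.log_state(RobotState(q=np.zeros(12), dq=np.zeros(12), stable=True))
log.log_state(RobotState(q=np.zeros(12), dq=np.ones(12), stable=False),
              level="warning")
```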
### 2. Visualization
RViz2 markers for debugging:
```python
from visualization_msgs.msg import Marker


def publish_debug_marker(publisher, position, color="red"):
    marker = Marker()
    marker.header.frame_id = "world"
    marker.type = Marker.SPHERE
    # Cast explicitly: ROS 2 message fields are type-checked floats
    marker.pose.position.x = float(position[0])
    marker.pose.position.y = float(position[1])
    marker.pose.position.z = float(position[2])
    marker.scale.x = marker.scale.y = marker.scale.z = 0.1
    colors = {"red": (1.0, 0.0, 0.0), "green": (0.0, 1.0, 0.0), "blue": (0.0, 0.0, 1.0)}
    marker.color.r, marker.color.g, marker.color.b = colors[color]
    marker.color.a = 1.0
    publisher.publish(marker)
```
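Wiring this into a node takes one publisher; a minimal sketch, where the topic name `/debug_markers` is an arbitrary choice you point RViz2's Marker display at:

```python
import rclpy
from rclpy.node import Node
from visualization_msgs.msg import Marker

rclpy.init()
node = Node("debug_marker_node")
pub = node.create_publisher(Marker, "/debug_markers", 10)
publish_debug_marker(pub, (0.5, 0.0, 1.0), color="green")
node.destroy_node()
rclpy.shutdown()
```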
### 3. Replay
Record and replay for debugging:
```bash
# Record all topics
ros2 bag record -a -o debug_session

# Play back at half speed
ros2 bag play debug_session --rate 0.5

# Play specific topics
ros2 bag play debug_session --topics /robot_state /camera/image
```
## Common Failure Modes

### Timing Issues
```mermaid
graph TD
    A[Control Loop - Expected: 1 ms] --> B{Actual Time}
    B -->|under 1 ms| C[OK: Sleep remainder]
    B -->|over 1 ms| D[OVERRUN: Log warning]
    D --> E{over 5 ms?}
    E -->|Yes| F[CRITICAL: Safety stop]
```
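A minimal sketch of this policy as a fixed-rate loop; the 1 ms period and 5 ms cutoff mirror the diagram, and `step_controller`/`safety_stop` are hypothetical hooks into your stack:

```python
import logging
import time

PERIOD_S = 0.001    # expected loop period (1 ms)
CRITICAL_S = 0.005  # safety-stop threshold (5 ms)


def control_loop(step_controller, safety_stop):
    while True:
        start = time.perf_counter()
        step_controller()                    # hypothetical: one control update
        elapsed = time.perf_counter() - start
        if elapsed < PERIOD_S:
            time.sleep(PERIOD_S - elapsed)   # OK: sleep the remainder
        elif elapsed > CRITICAL_S:
            logging.critical("loop took %.2f ms, stopping", elapsed * 1e3)
            safety_stop()                    # hypothetical e-stop hook
            return
        else:
            logging.warning("loop overrun: %.2f ms", elapsed * 1e3)
```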
### State Estimation Drift
Track estimation error against ground truth over time:
```python
import time

import numpy as np


class DriftMonitor:
    def __init__(self, threshold=0.5):
        self.threshold = threshold  # position error threshold in meters
        self.checkpoints = []

    def check(self, estimated_pose, ground_truth):
        # Position-only drift: Euclidean distance between the xyz components
        error = np.linalg.norm(estimated_pose[:3] - ground_truth[:3])
        self.checkpoints.append({'time': time.time(), 'error': error})
        if error > self.threshold:
            self.trigger_relocalization()

    def trigger_relocalization(self):
        # Placeholder: hook up your localization reset here
        raise NotImplementedError
```
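A usage sketch, assuming ground truth from an external source such as motion capture:

```python
monitor = DriftMonitor(threshold=0.5)

# Inside an estimator callback, with both poses as [x, y, z, ...] arrays:
estimated = np.array([1.02, 0.48, 0.0])
mocap = np.array([1.00, 0.50, 0.0])
monitor.check(estimated, mocap)  # error ~0.028 m, well below threshold
```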
## Stress Testing

### Monte Carlo Testing
Run thousands of randomized scenarios:
```python
def monte_carlo_test(sim, policy, n_trials=1000):
    results = []
    for _ in range(n_trials):
        # Randomize initial conditions so every trial sees fresh conditions
        sim.reset()
        sim.randomize_terrain()
        sim.randomize_obstacles()
        # Run one episode and record the boolean outcome
        success = sim.run_episode(policy, max_steps=1000)
        results.append(success)
    success_rate = sum(results) / n_trials
    print(f"Success rate: {success_rate * 100:.1f}%")
    return success_rate
```
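When comparing policies, keep the binomial noise floor in mind; a short sketch of the standard error of a measured success rate:

```python
import math


def success_rate_stderr(p: float, n: int) -> float:
    # Standard error of a binomial proportion: sqrt(p * (1 - p) / n)
    return math.sqrt(p * (1 - p) / n)


# 95% measured over 1000 trials carries roughly +/- 0.7% of noise
print(success_rate_stderr(0.95, 1000))  # ~0.0069
```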
### Adversarial Testing

Push the robot to its limits; a dropout-injection sketch follows the list:
- Maximum speed transitions
- Edge-case joint configurations
- Sudden obstacle appearance
- Communication dropouts
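A minimal sketch of one such test, injecting random command dropouts; `sim.set_command_dropout` is a hypothetical simulator hook, not a real API:

```python
import random


def dropout_test(sim, policy, n_trials=100, max_dropout_s=0.5):
    failures = 0
    for _ in range(n_trials):
        sim.reset()
        # Hypothetical hook: drop outgoing commands for a random window mid-episode
        sim.set_command_dropout(
            start_s=random.uniform(1.0, 5.0),
            duration_s=random.uniform(0.05, max_dropout_s),
        )
        if not sim.run_episode(policy, max_steps=1000):
            failures += 1
    print(f"Failures under dropout: {failures}/{n_trials}")
```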
## Continuous Integration

### Robot CI Pipeline
```yaml
# .github/workflows/robot-ci.yml
name: Robot CI
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - name: Setup ROS 2
        uses: ros-tooling/setup-ros@v0.7
        with:
          required-ros-distributions: humble
      - name: Build
        run: |
          source /opt/ros/humble/setup.bash
          colcon build
      - name: Test
        run: |
          source install/setup.bash
          colcon test
          colcon test-result --verbose
```
## Key Takeaways
- **Measure everything**: control frequency, tracking error, latency
- **Structured logging** enables post-mortem analysis
- **Visualization** makes abstract states concrete
- **Replay** is your best debugging tool
- **Monte Carlo testing** finds edge cases
- **CI/CD** catches regressions early
## Further Reading
- Chapter 4.1: Digital Twins
- Chapter 4.3: Responsible Deployment
- Chapter 3.2: Control Stack