Robot B-Side Boot Chain

This directory contains the robot-side boot and recovery scripts.

Normal usage is:

sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target

After installation, blitz-robot.target is enabled and will start automatically on reboot.

To stop the chain now and disable boot-time autostart for future reboots:

sudo bash scripts/boot/disable-systemd.sh

Current Startup Order

The current cold-start chain is:

  1. blitz-boot-gate.service
  2. blitz-5g-dial.service
  3. blitz-ros-receiver.service
  4. blitz-b-side-omnid.service
  5. blitz-watchdog.service

There is no longer any automatic time-sync step in the boot chain.
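
To see how far a cold start has progressed, it can help to query the units in the same order as the chain above. The following is a minimal sketch, not part of the shipped scripts: `chain_report` and the `STATE_CMD` hook are hypothetical names introduced here so the loop can also be run on a machine without systemd.

```shell
#!/usr/bin/env bash
# Units in the cold-start order listed above.
CHAIN_UNITS="blitz-boot-gate.service blitz-5g-dial.service blitz-ros-receiver.service blitz-b-side-omnid.service blitz-watchdog.service"

# Print "<unit> <state>" for each unit, in chain order.
# STATE_CMD is a hypothetical override; by default "systemctl is-active" is used.
chain_report() {
  local unit state
  for unit in $CHAIN_UNITS; do
    # is-active exits non-zero for units that are not active, so ignore the exit code.
    state="$(${STATE_CMD:-systemctl is-active} "$unit" 2>/dev/null || true)"
    printf '%s %s\n' "$unit" "${state:-unknown}"
  done
}
```

Calling `chain_report` on the robot prints one line per unit. The first unit that is not `active` is usually the place to start reading startup.log.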

What Each Script Does

  • robot-boot.env: default boot configuration
  • robot-boot.env.local: machine-local overrides
  • common.sh: shared env loading, logging, and helper functions
  • boot-gate.sh: fixed startup delay gate
  • 5g-dial.sh: brings up the 5G modem path and verifies routing
  • start-ros-receiver-service.sh: boot wrapper for ROS receiver
  • wait-for-unix-socket.sh: waits for the ROS receiver unix socket
  • start-b-side-omnid-service.sh: boot wrapper for b_side_omnid
  • blitz-watchdog.sh: runtime health watchdog and recovery orchestrator
  • blitz-fault-inject.sh: fault injection entrypoint
  • install-systemd.sh: installs systemd units into /etc/systemd/system
  • disable-systemd.sh: stops the boot chain and disables autostart
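
As an illustration of what a socket wait such as wait-for-unix-socket.sh involves, here is a minimal polling sketch. It is not the script's actual implementation: the function name `wait_for_unix_socket` is hypothetical, and the 20-second default simply mirrors the `BLITZ_ROS_SOCKET_WAIT_SEC` value shown in this README.

```shell
#!/usr/bin/env bash
# Wait until the given path exists as a unix socket, polling once per second.
# Returns 0 as soon as the socket appears, 1 if it is still absent at timeout.
wait_for_unix_socket() {
  local sock_path="$1"
  local timeout_sec="${2:-20}"   # assumed default, mirrors BLITZ_ROS_SOCKET_WAIT_SEC="20"
  local waited=0
  while [ "$waited" -lt "$timeout_sec" ]; do
    [ -S "$sock_path" ] && return 0
    sleep 1
    waited=$((waited + 1))
  done
  # Final check, which also covers a timeout of 0.
  [ -S "$sock_path" ]
}
```

The `-S` test is true only for sockets, so a stale regular file at the same path does not count as ready.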

Important Configuration

Most machine-specific overrides should go into:

scripts/boot/robot-boot.env.local
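
The split between robot-boot.env and robot-boot.env.local suggests a layered load: defaults first, machine-local overrides second. The sketch below shows that pattern under the assumption that both files contain plain shell assignments. The real loading logic lives in common.sh and may differ; `load_boot_env` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Source the defaults, then the machine-local overrides if they exist.
# Hypothetical helper; the shipped common.sh may load these files differently.
load_boot_env() {
  local boot_dir="$1"
  # The defaults file is required, so fail loudly when it is missing.
  if [ ! -r "$boot_dir/robot-boot.env" ]; then
    echo "missing $boot_dir/robot-boot.env" >&2
    return 1
  fi
  . "$boot_dir/robot-boot.env"
  # Local overrides are optional and win because they are sourced last.
  if [ -r "$boot_dir/robot-boot.env.local" ]; then
    . "$boot_dir/robot-boot.env.local"
  fi
}
```

With this ordering, a value set in robot-boot.env.local replaces the default, and any variable the local file does not mention keeps its default.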

Typical settings:

BLITZ_BOOT_DELAY_SEC="30"
BLITZ_LOG_FILE="/var/log/blitz-robot/startup.log"
BLITZ_RUNTIME_DIR="/run/blitz-robot"

BLITZ_5G_DIAL_DIR="${OMNISOCKETGO_ROOT}/scripts/boot"
BLITZ_5G_SERIAL_PORT="/dev/ttyUSB2"
BLITZ_5G_INTERFACE=""
BLITZ_5G_MODEM_SUBNET="192.168.224.0/22"
BLITZ_5G_GATEWAY="192.168.225.1"
BLITZ_5G_REMOVE_DEFAULT_ROUTE="1"
BLITZ_5G_ROUTE_TARGETS="106.55.173.235"
BLITZ_5G_INFO_JSON="${OMNISOCKETGO_ROOT}/scripts/boot/modem_network_info.json"

BLITZ_TIME_SERVER_IP="81.70.156.140"

BLITZ_ROS_USER="nvidia"
BLITZ_ROS_SOCKET_WAIT_SEC="20"
BLITZ_WATCHDOG_INTERVAL_SEC="5"
BLITZ_HEALTH_STALE_SEC="15"
BLITZ_OMNID_THREAD_HEARTBEAT_TIMEOUT_SEC="15"
BLITZ_NETWORK_FAIL_THRESHOLD="3"
BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC="30"
BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="0"

BLITZ_TIME_SERVER_IP is still used, but only as the 5G route/ping health-check target. It is no longer used for automatic clock synchronization.

If BLITZ_TIME_SERVER_IP is left empty, the scripts fall back to the host part of ROBOT_SIDE_OMNISOCKET_SERVER_ADDR.
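
The fallback described above can be sketched as follows. This assumes `ROBOT_SIDE_OMNISOCKET_SERVER_ADDR` has the form `host:port`; the function name `health_check_target` is hypothetical, and the actual scripts may parse the address differently.

```shell
#!/usr/bin/env bash
# Print the 5G route/ping health-check target.
# Hypothetical helper illustrating the documented fallback.
health_check_target() {
  local addr="${ROBOT_SIDE_OMNISOCKET_SERVER_ADDR:-}"
  if [ -n "${BLITZ_TIME_SERVER_IP:-}" ]; then
    printf '%s\n' "$BLITZ_TIME_SERVER_IP"
  else
    # Assumes a "host:port" value; strip the trailing ":port" to get the host part.
    printf '%s\n' "${addr%:*}"
  fi
}
```

Note that `${addr%:*}` removes only the last colon-separated field, so this sketch would need adjusting for a bare IPv6 address.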

Install Or Upgrade

Run:

sudo bash scripts/boot/install-systemd.sh
sudo systemctl daemon-reload
sudo systemctl restart blitz-robot.target

install-systemd.sh will also remove any old blitz-time-sync.service unit left over from earlier versions.

Disable Autostart

To stop the currently running services and disable autostart for future reboots:

sudo bash scripts/boot/disable-systemd.sh

To re-enable later:

sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target

Logs

All boot-chain and watchdog logs are appended to the file set by BLITZ_LOG_FILE, which in the settings above is:

/var/log/blitz-robot/startup.log

Follow the log live:

sudo tail -f /var/log/blitz-robot/startup.log

Check service state:

sudo systemctl status blitz-robot.target
sudo systemctl status blitz-5g-dial.service
sudo systemctl status blitz-ros-receiver.service
sudo systemctl status blitz-b-side-omnid.service
sudo systemctl status blitz-watchdog.service

Check systemd journal:

sudo journalctl -u blitz-robot.target -u blitz-5g-dial.service \
  -u blitz-ros-receiver.service -u blitz-b-side-omnid.service \
  -u blitz-watchdog.service -f

Runtime Status Files

The runtime status directory is:

/run/blitz-robot

Key files:

  • b-side-omnid.status.json
  • ros-receiver.status.json
  • watchdog.status.json

Pretty-print them:

sudo python3 -m json.tool /run/blitz-robot/watchdog.status.json
sudo python3 -m json.tool /run/blitz-robot/b-side-omnid.status.json
sudo python3 -m json.tool /run/blitz-robot/ros-receiver.status.json
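
For a quick freshness check, the age of each status file can be compared against `BLITZ_HEALTH_STALE_SEC`. The sketch below uses file modification time and GNU `stat`, both of which are assumptions: the watchdog may judge staleness differently, for example from a timestamp inside the JSON. `status_ages` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print "<file> <age-in-seconds>" or "<file> missing" for each key status file.
# Hypothetical helper; staleness here is based on mtime only.
status_ages() {
  local runtime_dir="${1:-/run/blitz-robot}"
  local now name f mtime
  now="$(date +%s)"
  for name in b-side-omnid ros-receiver watchdog; do
    f="$runtime_dir/$name.status.json"
    if [ -e "$f" ]; then
      mtime="$(stat -c %Y "$f")"   # GNU stat: modification time as epoch seconds
      printf '%s %s\n' "$name.status.json" "$((now - mtime))"
    else
      printf '%s missing\n' "$name.status.json"
    fi
  done
}
```

Running `status_ages` with no argument inspects /run/blitz-robot. An age well above the configured stale threshold, or a `missing` entry, points at the component to investigate.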

Fault Injection

Available test commands:

sudo bash scripts/boot/blitz-fault-inject.sh bside-crash
sudo bash scripts/boot/blitz-fault-inject.sh bside-process-freeze
sudo bash scripts/boot/blitz-fault-inject.sh bside-video-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh bside-control-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh ros-crash
sudo bash scripts/boot/blitz-fault-inject.sh ros-freeze

For synthetic network fault injection, first enable it in robot-boot.env.local:

BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="1"

Then restart the watchdog so it picks up the new setting, turn the synthetic fault on, and turn it off again when done:

sudo systemctl restart blitz-watchdog.service
sudo bash scripts/boot/blitz-fault-inject.sh network-down on
sudo bash scripts/boot/blitz-fault-inject.sh network-down off

Recovery Behavior Summary

  • If b_side_omnid dies or its status file goes stale, watchdog first tries a targeted b_side restart.
  • If ROS receiver dies, loses its socket, or its heartbeat goes stale, watchdog performs an ordered full restart:
    • stop b_side
    • restart ROS receiver
    • wait for unix socket
    • start b_side
  • If network checks fail repeatedly, watchdog stops b_side, runs 5g-dial.sh, waits for route recovery, and then restores services.
  • Camera disappearance is logged as a degraded state. When the camera reappears, the watchdog restarts b_side once the device is stable.
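
The "fail repeatedly" condition for network recovery is governed by `BLITZ_NETWORK_FAIL_THRESHOLD` and `BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC`. This README does not spell out how blitz-watchdog.sh combines them, so the following is only one plausible reading: recovery starts after the threshold number of consecutive failures, and not more often than once per cooldown period. The function and variable names other than the two `BLITZ_` settings are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a consecutive-failure counter with a recovery cooldown.
# This is an assumed interpretation, not the watchdog's actual logic.
NETWORK_FAIL_THRESHOLD="${BLITZ_NETWORK_FAIL_THRESHOLD:-3}"
NETWORK_RECOVERY_COOLDOWN_SEC="${BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC:-30}"
fail_count=0
last_recovery_ts=0
network_action=""

# Usage: record_network_check <ok|fail> <now-epoch-seconds>
# Sets network_action to "ok", "wait", or "recover".
record_network_check() {
  local result="$1" now="$2"
  if [ "$result" = "ok" ]; then
    fail_count=0              # any success resets the streak
    network_action="ok"
    return 0
  fi
  fail_count=$((fail_count + 1))
  if [ "$fail_count" -ge "$NETWORK_FAIL_THRESHOLD" ] &&
     [ $((now - last_recovery_ts)) -ge "$NETWORK_RECOVERY_COOLDOWN_SEC" ]; then
    fail_count=0
    last_recovery_ts="$now"
    network_action="recover"  # caller would stop b_side and run 5g-dial.sh here
  else
    network_action="wait"
  fi
}
```

The function records its decision in a variable rather than printing it, so that the counters survive between calls instead of being lost in a command-substitution subshell.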

Notes

  • time-sync.sh and blitz-time-sync.service are intentionally removed from the automatic boot path.
  • b_side_omnid must already be built before boot-time startup.
  • A missing bin/b_side_omnid binary, a missing ROS environment, or a missing modem script is reported in startup.log.