Robot B-Side Boot Chain

This directory contains the robot-side boot and recovery scripts.

Normal usage is:

sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target

After installation, blitz-robot.target is enabled and will start automatically on reboot.

To stop the chain now and disable boot-time autostart for future reboots:

sudo bash scripts/boot/disable-systemd.sh

Current Startup Order

The current cold-start chain is:

  1. blitz-boot-gate.service
  2. blitz-5g-dial.service
  3. blitz-ros-receiver.service
  4. blitz-b-side-omnid.service
  5. blitz-watchdog.service

There is no longer any automatic time-sync step in the boot chain.

What Each Script Does

  • robot-boot.env: default boot configuration
  • robot-boot.env.local: machine-local overrides
  • common.sh: shared env loading, logging, and helper functions
  • boot-gate.sh: fixed startup delay gate
  • 5g-dial.sh: brings up the 5G modem path and verifies routing
  • start-ros-receiver-service.sh: boot wrapper for ROS receiver
  • wait-for-unix-socket.sh: waits for the ROS receiver unix socket
  • start-b-side-omnid-service.sh: boot wrapper for b_side_omnid
  • blitz-watchdog.sh: runtime health watchdog and recovery orchestrator
  • blitz-fault-inject.sh: fault injection entrypoint
  • install-systemd.sh: installs systemd units into /etc/systemd/system
  • disable-systemd.sh: stops the boot chain and disables autostart

Important Configuration

Most machine-specific overrides should go into:

scripts/boot/robot-boot.env.local

Typical settings:

BLITZ_BOOT_DELAY_SEC="30"
BLITZ_LOG_FILE="/var/log/blitz-robot/startup.log"
BLITZ_RUNTIME_DIR="/run/blitz-robot"

BLITZ_5G_DIAL_DIR="${OMNISOCKETGO_ROOT}/scripts/boot"
BLITZ_5G_SERIAL_PORT="/dev/ttyUSB2"
BLITZ_5G_INTERFACE=""
BLITZ_5G_MODEM_SUBNET="192.168.224.0/22"
BLITZ_5G_GATEWAY="192.168.225.1"
BLITZ_5G_REMOVE_DEFAULT_ROUTE="1"
BLITZ_5G_ROUTE_TARGETS="106.55.173.235"
BLITZ_5G_INFO_JSON="${OMNISOCKETGO_ROOT}/scripts/boot/modem_network_info.json"

BLITZ_TIME_SERVER_IP="81.70.156.140"

BLITZ_ROS_USER="nvidia"
BLITZ_ROS_SOCKET_WAIT_SEC="20"
BLITZ_WATCHDOG_INTERVAL_SEC="5"
BLITZ_HEALTH_STALE_SEC="15"
BLITZ_OMNID_THREAD_HEARTBEAT_TIMEOUT_SEC="15"
BLITZ_NETWORK_FAIL_THRESHOLD="3"
BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC="30"
BLITZ_GPS_MONITOR_ENABLED="1"
BLITZ_GPS_DEVICE_GLOB="/dev/ttyCH341USB*"
BLITZ_GPS_CHECK_INTERVAL_SEC="10"
BLITZ_GPS_RESTART_UNITS="gpsd.socket gpsd.service"
BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="0"
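
Settings like these are layered: robot-boot.env supplies defaults and robot-boot.env.local overrides them. A minimal sketch of that layering follows, assuming both files are plain shell assignments that are sourced in order. The helper name load_boot_env is hypothetical and may not match what common.sh actually does.

```shell
# Sketch only: source defaults first, then machine-local overrides.
# load_boot_env is a hypothetical helper name.
load_boot_env() {
  local boot_dir="$1"
  local env_file

  for env_file in "$boot_dir/robot-boot.env" "$boot_dir/robot-boot.env.local"; do
    if [ -f "$env_file" ]; then
      set -a          # export every variable assigned while sourcing
      . "$env_file"   # later files overwrite earlier values
      set +a
    fi
  done
}
```

Because the local file is sourced last, any variable it sets replaces the default, and variables it does not mention keep their default values.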

BLITZ_TIME_SERVER_IP is still used, but only as the 5G route/ping health-check target. It is no longer used for automatic clock synchronization.

If BLITZ_TIME_SERVER_IP is left empty, the scripts fall back to the host part of ROBOT_SIDE_OMNISOCKET_SERVER_ADDR.
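
The fallback can be sketched as follows. The function name health_check_target is hypothetical, and the sketch assumes ROBOT_SIDE_OMNISOCKET_SERVER_ADDR has the form host:port; a bracketed IPv6 address would need extra handling.

```shell
# Sketch only: pick the 5G health-check target.
# health_check_target is a hypothetical helper name.
health_check_target() {
  local time_server_ip="${BLITZ_TIME_SERVER_IP:-}"
  local server_addr="${ROBOT_SIDE_OMNISOCKET_SERVER_ADDR:-}"

  # An explicit setting always wins
  if [ -n "$time_server_ip" ]; then
    printf '%s\n' "$time_server_ip"
    return 0
  fi

  # Nothing to fall back to
  if [ -z "$server_addr" ]; then
    return 1
  fi

  # Assumes host:port; strip the trailing ":port" to keep the host part
  printf '%s\n' "${server_addr%:*}"
}
```

With BLITZ_TIME_SERVER_IP empty and a made-up server address of 192.0.2.10:9000, the function prints 192.0.2.10.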

Install Or Upgrade

Run:

sudo bash scripts/boot/install-systemd.sh
sudo systemctl daemon-reload
sudo systemctl restart blitz-robot.target

install-systemd.sh will also remove any old blitz-time-sync.service unit left over from earlier versions.

Disable Autostart

To stop the currently running services and disable autostart for future reboots:

sudo bash scripts/boot/disable-systemd.sh

To re-enable later:

sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target

Logs

All boot-chain and watchdog logs are appended to:

/var/log/blitz-robot/startup.log

Follow the log live:

sudo tail -f /var/log/blitz-robot/startup.log

Check service state:

sudo systemctl status blitz-robot.target
sudo systemctl status blitz-5g-dial.service
sudo systemctl status blitz-ros-receiver.service
sudo systemctl status blitz-b-side-omnid.service
sudo systemctl status blitz-watchdog.service

Check systemd journal:

sudo journalctl -u blitz-robot.target -u blitz-5g-dial.service \
  -u blitz-ros-receiver.service -u blitz-b-side-omnid.service \
  -u blitz-watchdog.service -f

Runtime Status Files

The runtime status directory is:

/run/blitz-robot

Key files:

  • b-side-omnid.status.json
  • ros-receiver.status.json
  • watchdog.status.json

watchdog.status.json now also records gps_ok and gps_device_present so you can quickly tell whether the GPS USB serial node is currently visible and whether the last gpsd reconnect attempt succeeded.

Pretty-print them:

sudo python3 -m json.tool /run/blitz-robot/watchdog.status.json
sudo python3 -m json.tool /run/blitz-robot/b-side-omnid.status.json
sudo python3 -m json.tool /run/blitz-robot/ros-receiver.status.json
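
To read a single field rather than the whole file, a small wrapper around python3 is enough. This is a sketch: the helper name status_field is hypothetical, and it assumes that fields such as gps_ok and gps_device_present are top-level keys of the JSON object, which this README does not state.

```shell
# Sketch only: print one top-level field of a status JSON file.
# status_field is a hypothetical helper name.
status_field() {
  local file="$1"
  local key="$2"

  # Prints the value as JSON (true, false, a number, a quoted string),
  # or null when the key is absent.
  python3 -c 'import json, sys
with open(sys.argv[1]) as fh:
    data = json.load(fh)
print(json.dumps(data.get(sys.argv[2])))' "$file" "$key"
}
```

For example, sudo may be needed to read the runtime directory, as in the commands above: run the function from a root shell with /run/blitz-robot/watchdog.status.json and gps_ok as arguments.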

Fault Injection

Available test commands:

sudo bash scripts/boot/blitz-fault-inject.sh bside-crash
sudo bash scripts/boot/blitz-fault-inject.sh bside-process-freeze
sudo bash scripts/boot/blitz-fault-inject.sh bside-video-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh bside-control-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh ros-crash
sudo bash scripts/boot/blitz-fault-inject.sh ros-freeze

For synthetic network fault injection, first enable it in robot-boot.env.local:

BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="1"

Then restart the watchdog, inject the fault, and clear it again:

sudo systemctl restart blitz-watchdog.service
sudo bash scripts/boot/blitz-fault-inject.sh network-down on
sudo bash scripts/boot/blitz-fault-inject.sh network-down off

Recovery Behavior Summary

  • If b_side_omnid dies or its status file goes stale, watchdog first tries a targeted b_side restart.
  • If ROS receiver dies, loses its socket, or its heartbeat goes stale, watchdog performs an ordered full restart:
    • stop b_side
    • restart ROS receiver
    • wait for unix socket
    • start b_side
  • If network checks fail repeatedly, watchdog stops b_side, runs 5g-dial.sh, waits for route recovery, and then restores services.
  • While 5G is healthy, watchdog keeps every host route listed by BLITZ_TIME_SERVER_IP and BLITZ_5G_ROUTE_TARGETS pinned to the resolved 5G interface. When 5G becomes unhealthy, watchdog deletes those host routes so traffic can fall back to the remaining default network path. If that fallback path is still reachable, watchdog keeps b_side_omnid running instead of treating it as a full network outage.
  • Whenever watchdog changes or restores those host routes, it logs route-path lines for each target so you can see which interface Linux currently chooses for 81.70.156.140, 106.55.173.235, and any other configured 5G-pinned target.
  • If GPS monitoring is enabled, watchdog checks BLITZ_GPS_DEVICE_GLOB every BLITZ_GPS_CHECK_INTERVAL_SEC seconds. When the GPS serial device disappears and later reappears, watchdog restarts the units in BLITZ_GPS_RESTART_UNITS so gpsd can bind to the new device node again.
  • Camera disappearance is logged as a degraded state. When the camera reappears, watchdog restarts b_side once the device is stable.
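
The GPS rule in the list above (restart only when the device disappears and later reappears) amounts to edge detection on the glob match. The following is a minimal sketch under stated assumptions, not the watchdog's actual code: the names gps_device_present (as a function), gps_was_present, and check_gps_once are hypothetical, and the device is assumed to be present at startup.

```shell
# Sketch only: detect the "GPS device reappeared" transition.
# All names below are hypothetical.

# Succeeds when the glob matches at least one existing path.
gps_device_present() {
  local path
  for path in $1; do          # unquoted on purpose so the glob expands
    [ -e "$path" ] && return 0
  done
  return 1
}

gps_was_present=1             # assumption: treat the device as present at startup

# Returns 0 only on the absent -> present transition.
check_gps_once() {
  local glob="$1"
  if gps_device_present "$glob"; then
    if [ "$gps_was_present" -eq 0 ]; then
      gps_was_present=1
      return 0                # device just came back: caller should restart gpsd units
    fi
    return 1                  # still present: nothing to do
  fi
  gps_was_present=0           # remember that the device went away
  return 1
}

# Possible use inside a polling loop (not executed here):
#   check_gps_once "$BLITZ_GPS_DEVICE_GLOB" && systemctl restart $BLITZ_GPS_RESTART_UNITS
```

The function must be called in the current shell rather than a subshell, otherwise the remembered state in gps_was_present is lost between checks.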

Notes

  • time-sync.sh and blitz-time-sync.service are intentionally removed from the automatic boot path.
  • b_side_omnid must already be built before boot-time startup.
  • A missing bin/b_side_omnid binary, a missing ROS environment, or a missing modem script is reported in startup.log.