Robot B-Side Boot Chain
This directory contains the robot-side boot and recovery scripts.
Normal usage is:
sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target
After installation, blitz-robot.target is enabled and will start automatically on reboot.
To stop the chain now and disable boot-time autostart for future reboots:
sudo bash scripts/boot/disable-systemd.sh
Current Startup Order
The current cold-start chain is:
- blitz-boot-gate.service
- blitz-5g-dial.service
- blitz-ros-receiver.service
- blitz-b-side-omnid.service
- blitz-watchdog.service
There is no longer any automatic time-sync step in the boot chain.
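To confirm that every unit in the chain above came up, a short loop over `systemctl is-active` is usually enough. This is a minimal sketch and not one of the shipped scripts; the function name `check_boot_chain` and the variable `BLITZ_CHAIN_UNITS` are hypothetical, and only the unit names come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the state of each unit in the cold-start chain,
# in the order listed above.
BLITZ_CHAIN_UNITS="blitz-boot-gate.service blitz-5g-dial.service blitz-ros-receiver.service blitz-b-side-omnid.service blitz-watchdog.service"

check_boot_chain() {
  local unit failed=0
  for unit in $BLITZ_CHAIN_UNITS; do
    if systemctl is-active --quiet "$unit"; then
      echo "ok       $unit"
    else
      echo "inactive $unit"
      failed=1
    fi
  done
  # Non-zero when at least one unit is not active.
  return "$failed"
}
```

Depending on how a unit is defined, it may legitimately report inactive once it has finished its work (for example a one-shot delay gate); this README does not say which units behave that way, so treat an `inactive` line as a prompt to check `systemctl status` rather than as a definite failure.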
What Each Script Does
- robot-boot.env: default boot configuration
- robot-boot.env.local: machine-local overrides
- common.sh: shared env loading, logging, and helper functions
- boot-gate.sh: fixed startup delay gate
- 5g-dial.sh: brings up the 5G modem path and verifies routing
- start-ros-receiver-service.sh: boot wrapper for ROS receiver
- wait-for-unix-socket.sh: waits for the ROS receiver unix socket
- start-b-side-omnid-service.sh: boot wrapper for b_side_omnid
- blitz-watchdog.sh: runtime health watchdog and recovery orchestrator
- blitz-fault-inject.sh: fault injection entrypoint
- install-systemd.sh: installs systemd units into /etc/systemd/system
- disable-systemd.sh: stops the boot chain and disables autostart
Important Configuration
Most machine-specific overrides should go into:
scripts/boot/robot-boot.env.local
Typical settings:
BLITZ_BOOT_DELAY_SEC="30"
BLITZ_LOG_FILE="/var/log/blitz-robot/startup.log"
BLITZ_RUNTIME_DIR="/run/blitz-robot"
BLITZ_5G_DIAL_DIR="${OMNISOCKETGO_ROOT}/scripts/boot"
BLITZ_5G_SERIAL_PORT="/dev/ttyUSB2"
BLITZ_5G_INTERFACE=""
BLITZ_5G_MODEM_SUBNET="192.168.224.0/22"
BLITZ_5G_GATEWAY="192.168.225.1"
BLITZ_5G_REMOVE_DEFAULT_ROUTE="1"
BLITZ_5G_ROUTE_TARGETS="106.55.173.235"
BLITZ_5G_INFO_JSON="${OMNISOCKETGO_ROOT}/scripts/boot/modem_network_info.json"
BLITZ_TIME_SERVER_IP="81.70.156.140"
BLITZ_ROS_USER="nvidia"
BLITZ_ROS_SOCKET_WAIT_SEC="20"
BLITZ_WATCHDOG_INTERVAL_SEC="5"
BLITZ_HEALTH_STALE_SEC="15"
BLITZ_OMNID_THREAD_HEARTBEAT_TIMEOUT_SEC="15"
BLITZ_NETWORK_FAIL_THRESHOLD="3"
BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC="30"
BLITZ_GPS_MONITOR_ENABLED="1"
BLITZ_GPS_DEVICE_GLOB="/dev/ttyCH341USB*"
BLITZ_GPS_CHECK_INTERVAL_SEC="10"
BLITZ_GPS_RESTART_UNITS="gpsd.socket gpsd.service"
BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="0"
BLITZ_TIME_SERVER_IP is still used, but only as the 5G route/ping health-check target. It is no longer used for automatic clock synchronization.
If BLITZ_TIME_SERVER_IP is left empty, the scripts fall back to the host part of ROBOT_SIDE_OMNISOCKET_SERVER_ADDR.
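The fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the code in common.sh: the function name `resolve_health_target` is hypothetical, and ROBOT_SIDE_OMNISOCKET_SERVER_ADDR is assumed to have the form `host:port` with an IPv4 address or hostname.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the 5G route/ping health-check target.
# Assumes ROBOT_SIDE_OMNISOCKET_SERVER_ADDR looks like "host:port".
resolve_health_target() {
  local addr="${ROBOT_SIDE_OMNISOCKET_SERVER_ADDR:-}"
  if [ -n "${BLITZ_TIME_SERVER_IP:-}" ]; then
    printf '%s\n' "$BLITZ_TIME_SERVER_IP"
  else
    # Strip the trailing ":port" and keep only the host part.
    # A value without a colon is returned unchanged.
    printf '%s\n' "${addr%:*}"
  fi
}
```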
Install Or Upgrade
Run:
sudo bash scripts/boot/install-systemd.sh
sudo systemctl daemon-reload
sudo systemctl restart blitz-robot.target
install-systemd.sh will also remove any old blitz-time-sync.service unit left over from earlier versions.
Disable Autostart
To stop the currently running services and disable autostart for future reboots:
sudo bash scripts/boot/disable-systemd.sh
To re-enable later:
sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target
Logs
All boot-chain and watchdog logs are appended to:
/var/log/blitz-robot/startup.log
Follow the log live:
sudo tail -f /var/log/blitz-robot/startup.log
Check service state:
sudo systemctl status blitz-robot.target
sudo systemctl status blitz-5g-dial.service
sudo systemctl status blitz-ros-receiver.service
sudo systemctl status blitz-b-side-omnid.service
sudo systemctl status blitz-watchdog.service
Check systemd journal:
sudo journalctl -u blitz-robot.target -u blitz-5g-dial.service \
-u blitz-ros-receiver.service -u blitz-b-side-omnid.service \
-u blitz-watchdog.service -f
Runtime Status Files
The runtime status directory is:
/run/blitz-robot
Key files:
- b-side-omnid.status.json
- ros-receiver.status.json
- watchdog.status.json
watchdog.status.json now also records gps_ok and gps_device_present so you can quickly tell whether the GPS USB serial node is currently visible and whether the last gpsd reconnect attempt succeeded.
Pretty-print them:
sudo python3 -m json.tool /run/blitz-robot/watchdog.status.json
sudo python3 -m json.tool /run/blitz-robot/b-side-omnid.status.json
sudo python3 -m json.tool /run/blitz-robot/ros-receiver.status.json
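When only one field is of interest, for example gps_ok, a small wrapper around python3 can print it directly. This is a minimal sketch: the function name `status_field` is hypothetical, the status file is assumed to hold a single JSON object, and because this README does not specify the value types, the raw JSON value is printed as-is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one top-level field of a status file as raw JSON.
# Prints "null" when the field is absent.
status_field() {
  python3 - "$1" "$2" <<'PY'
import json
import sys

path, key = sys.argv[1], sys.argv[2]
with open(path) as f:
    data = json.load(f)
print(json.dumps(data.get(key)))
PY
}
```

For example, `status_field /run/blitz-robot/watchdog.status.json gps_device_present` prints the current value of that field; run it as root if the status files are not readable by your user.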
Fault Injection
Available test commands:
sudo bash scripts/boot/blitz-fault-inject.sh bside-crash
sudo bash scripts/boot/blitz-fault-inject.sh bside-process-freeze
sudo bash scripts/boot/blitz-fault-inject.sh bside-video-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh bside-control-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh ros-crash
sudo bash scripts/boot/blitz-fault-inject.sh ros-freeze
For synthetic network fault injection, first enable it in robot-boot.env.local:
BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="1"
Then restart watchdog and inject:
sudo systemctl restart blitz-watchdog.service
sudo bash scripts/boot/blitz-fault-inject.sh network-down on
sudo bash scripts/boot/blitz-fault-inject.sh network-down off
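During testing it is easy to leave the synthetic fault switched on by accident. The wrapper below is a sketch that holds the fault for a fixed time and then clears it. The names `with_network_fault` and `BLITZ_FAULT_INJECT_CMD` are hypothetical. The default hold time assumes the watchdog runs one network check per BLITZ_WATCHDOG_INTERVAL_SEC and reacts after BLITZ_NETWORK_FAIL_THRESHOLD consecutive failures (3 x 5 = 15 seconds with the typical settings), plus an arbitrary 5-second margin; neither assumption is confirmed by this README.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the network-down fault.
# BLITZ_FAULT_INJECT_CMD is an illustrative override, not a real setting.
BLITZ_FAULT_INJECT_CMD="${BLITZ_FAULT_INJECT_CMD:-sudo bash scripts/boot/blitz-fault-inject.sh}"

with_network_fault() {
  local default_hold rc=0
  # Assumed minimum time for the watchdog to notice: threshold * interval, plus margin.
  default_hold=$(( ${BLITZ_NETWORK_FAIL_THRESHOLD:-3} * ${BLITZ_WATCHDOG_INTERVAL_SEC:-5} + 5 ))
  local hold_sec="${1:-$default_hold}"

  # Word splitting of the command string is intentional here.
  $BLITZ_FAULT_INJECT_CMD network-down on || return 1
  sleep "$hold_sec" || rc=$?
  # Always attempt to clear the fault, even if the sleep failed.
  $BLITZ_FAULT_INJECT_CMD network-down off || rc=$?
  return "$rc"
}
```

Call `with_network_fault 60` to hold the fault for a minute while following startup.log in another terminal.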
Recovery Behavior Summary
- If b_side_omnid dies or its status file goes stale, watchdog first tries a targeted b_side restart.
- If ROS receiver dies, loses its socket, or its heartbeat goes stale, watchdog performs an ordered full restart:
  - stop b_side
  - restart ROS receiver
  - wait for unix socket
  - start b_side
- If network checks fail repeatedly, watchdog stops b_side, runs 5g-dial.sh, waits for route recovery, and then restores services.
- While 5G is healthy, watchdog keeps every host route listed by BLITZ_TIME_SERVER_IP and BLITZ_5G_ROUTE_TARGETS pinned to the resolved 5G interface. When 5G becomes unhealthy, watchdog deletes those host routes so traffic can fall back to the remaining default network path. If that fallback path is still reachable, watchdog keeps b_side_omnid running instead of treating it as a full network outage.
- Whenever watchdog changes or restores those host routes, it logs route-path lines for each target so you can see which interface Linux currently chooses for 81.70.156.140, 106.55.173.235, and any other configured 5G-pinned target.
- If GPS monitoring is enabled, watchdog checks BLITZ_GPS_DEVICE_GLOB every BLITZ_GPS_CHECK_INTERVAL_SEC seconds. When the GPS serial device disappears and later reappears, watchdog restarts the units in BLITZ_GPS_RESTART_UNITS so gpsd can bind to the new device node again.
- Camera disappearance is logged as degraded state. Reappearance triggers a b_side restart after the device is stable.
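The ordered full restart for a ROS receiver failure can be sketched as below. This is an illustration of the four steps listed above, not the code in blitz-watchdog.sh. The function names are hypothetical, the socket path is passed in by the caller because this README does not name it, and the mapping of b_side and ROS receiver to blitz-b-side-omnid.service and blitz-ros-receiver.service is an assumption based on the unit names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a unix socket exists, polling once per second.
# The timeout defaults to BLITZ_ROS_SOCKET_WAIT_SEC (20 if unset).
wait_for_unix_socket() {
  local path="$1" timeout="${2:-${BLITZ_ROS_SOCKET_WAIT_SEC:-20}}" waited=0
  while [ ! -S "$path" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# Hypothetical helper: stop b_side, restart ROS receiver, wait for the socket,
# then start b_side. Stops at the first failing step.
ordered_full_restart() {
  local socket_path="$1"
  systemctl stop blitz-b-side-omnid.service || return 1
  systemctl restart blitz-ros-receiver.service || return 1
  # Do not start b_side until the receiver socket exists again.
  wait_for_unix_socket "$socket_path" || return 1
  systemctl start blitz-b-side-omnid.service
}
```

The ordering matters because b_side is only started once the receiver socket is back; if the socket never appears within the timeout, the sketch returns non-zero and leaves b_side stopped.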
Notes
- time-sync.sh and blitz-time-sync.service are intentionally removed from the automatic boot path.
- b_side_omnid must already be built before boot-time startup.
- bin/b_side_omnid missing, ROS env missing, or modem script missing will all show up in startup.log.