# Robot B-Side Boot Chain

This directory contains the robot-side boot and recovery scripts.

Normal usage is:

```bash
sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target
```

After installation, `blitz-robot.target` is enabled and will start automatically on reboot.

To stop the chain now and disable boot-time autostart for future reboots:

```bash
sudo bash scripts/boot/disable-systemd.sh
```

## Current Startup Order

The current cold-start chain is:

1. `blitz-boot-gate.service`
2. `blitz-5g-dial.service`
3. `blitz-ros-receiver.service`
4. `blitz-b-side-omnid.service`
5. `blitz-watchdog.service`

There is no longer any automatic time-sync step in the boot chain.

## What Each Script Does

- `robot-boot.env`: default boot configuration
- `robot-boot.env.local`: machine-local overrides
- `common.sh`: shared env loading, logging, and helper functions
- `boot-gate.sh`: fixed startup delay gate
- `5g-dial.sh`: brings up the 5G modem path and verifies routing
- `start-ros-receiver-service.sh`: boot wrapper for the ROS receiver
- `wait-for-unix-socket.sh`: waits for the ROS receiver unix socket
- `start-b-side-omnid-service.sh`: boot wrapper for `b_side_omnid`
- `blitz-watchdog.sh`: runtime health watchdog and recovery orchestrator
- `blitz-fault-inject.sh`: fault injection entrypoint
- `install-systemd.sh`: installs systemd units into `/etc/systemd/system`
- `disable-systemd.sh`: stops the boot chain and disables autostart

## Important Configuration

Most machine-specific overrides should go into:

```text
scripts/boot/robot-boot.env.local
```

Typical settings:

```bash
BLITZ_BOOT_DELAY_SEC="30"
BLITZ_LOG_FILE="/var/log/blitz-robot/startup.log"
BLITZ_RUNTIME_DIR="/run/blitz-robot"
BLITZ_5G_DIAL_DIR="${OMNISOCKETGO_ROOT}/scripts/boot"
BLITZ_5G_SERIAL_PORT="/dev/ttyUSB2"
BLITZ_5G_INTERFACE=""
BLITZ_5G_MODEM_SUBNET="192.168.224.0/22"
BLITZ_5G_GATEWAY="192.168.225.1"
BLITZ_5G_REMOVE_DEFAULT_ROUTE="1"
BLITZ_5G_ROUTE_TARGETS="106.55.173.235"
BLITZ_5G_INFO_JSON="${OMNISOCKETGO_ROOT}/scripts/boot/modem_network_info.json"
BLITZ_TIME_SERVER_IP="81.70.156.140"
BLITZ_ROS_USER="nvidia"
BLITZ_ROS_SOCKET_WAIT_SEC="20"
BLITZ_WATCHDOG_INTERVAL_SEC="5"
BLITZ_HEALTH_STALE_SEC="15"
BLITZ_OMNID_THREAD_HEARTBEAT_TIMEOUT_SEC="15"
BLITZ_NETWORK_FAIL_THRESHOLD="3"
BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC="30"
BLITZ_GPS_MONITOR_ENABLED="1"
BLITZ_GPS_DEVICE_GLOB="/dev/ttyCH341USB*"
BLITZ_GPS_CHECK_INTERVAL_SEC="10"
BLITZ_GPS_RESTART_UNITS="gpsd.socket gpsd.service"
BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="0"
```

`BLITZ_TIME_SERVER_IP` is still used, but only as the 5G route/ping health-check target. It is no longer used for automatic clock synchronization.

If `BLITZ_TIME_SERVER_IP` is left empty, the scripts fall back to the host part of `ROBOT_SIDE_OMNISOCKET_SERVER_ADDR`.

## Install Or Upgrade

Run:

```bash
sudo bash scripts/boot/install-systemd.sh
sudo systemctl daemon-reload
sudo systemctl restart blitz-robot.target
```

`install-systemd.sh` also removes any old `blitz-time-sync.service` unit left over from earlier versions.
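The empty-`BLITZ_TIME_SERVER_IP` fallback described under Important Configuration can be sketched in bash as follows. This is an illustration, not the code in `common.sh`: the function name `resolve_health_target` is hypothetical, and the sketch assumes `ROBOT_SIDE_OMNISOCKET_SERVER_ADDR` has the form `host:port`, optionally prefixed with a scheme such as `tcp://`.

```bash
# Hypothetical helper: pick the 5G route/ping health-check target.
# Assumption: ROBOT_SIDE_OMNISOCKET_SERVER_ADDR looks like "host:port",
# optionally with a "scheme://" prefix. IPv6 literals are not handled.
resolve_health_target() {
  local target="${BLITZ_TIME_SERVER_IP:-}"
  if [[ -z "$target" ]]; then
    local addr="${ROBOT_SIDE_OMNISOCKET_SERVER_ADDR:-}"
    addr="${addr#*://}"   # drop an optional scheme prefix
    addr="${addr%%/*}"    # drop any trailing path
    target="${addr%:*}"   # drop the ":port" suffix, if present
  fi
  if [[ -z "$target" ]]; then
    echo "no health-check target configured" >&2
    return 1
  fi
  printf '%s\n' "$target"
}
```

With `BLITZ_TIME_SERVER_IP` set, the helper prints it unchanged; with it empty and `ROBOT_SIDE_OMNISOCKET_SERVER_ADDR="host:port"`, it prints `host`.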
## Disable Autostart

To stop the currently running services and disable autostart for future reboots:

```bash
sudo bash scripts/boot/disable-systemd.sh
```

To re-enable later:

```bash
sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target
```

## Logs

All boot-chain and watchdog logs are appended to:

```text
/var/log/blitz-robot/startup.log
```

Follow the log live:

```bash
sudo tail -f /var/log/blitz-robot/startup.log
```

Check service state:

```bash
sudo systemctl status blitz-robot.target
sudo systemctl status blitz-5g-dial.service
sudo systemctl status blitz-ros-receiver.service
sudo systemctl status blitz-b-side-omnid.service
sudo systemctl status blitz-watchdog.service
```

Check the systemd journal:

```bash
sudo journalctl -u blitz-robot.target -u blitz-5g-dial.service \
  -u blitz-ros-receiver.service -u blitz-b-side-omnid.service \
  -u blitz-watchdog.service -f
```

## Runtime Status Files

The runtime status directory is:

```text
/run/blitz-robot
```

Key files:

- `b-side-omnid.status.json`
- `ros-receiver.status.json`
- `watchdog.status.json`

`watchdog.status.json` now also records `gps_device_present` (whether the GPS USB serial node is currently visible) and `gps_ok` (whether the last `gpsd` reconnect attempt succeeded), so you can check GPS health at a glance.
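One plausible staleness test for the status files listed above is sketched below. It assumes "stale" means the file's modification time is older than `BLITZ_HEALTH_STALE_SEC` seconds; the real watchdog may instead compare a timestamp stored inside the JSON. `status_file_is_stale` is a hypothetical name, and `stat -c %Y` is the GNU coreutils form.

```bash
# Hypothetical helper: succeed (exit 0) if a status file is stale.
# Assumption: staleness is judged by file mtime versus BLITZ_HEALTH_STALE_SEC.
# Optional second argument overrides "now" (epoch seconds), e.g. for testing.
status_file_is_stale() {
  local file="$1"
  local max_age="${BLITZ_HEALTH_STALE_SEC:-15}"
  local now="${2:-$(date +%s)}"
  local mtime
  # A missing status file counts as stale.
  [[ -f "$file" ]] || return 0
  # GNU stat: modification time in epoch seconds; treat errors as stale.
  mtime="$(stat -c %Y "$file")" || return 0
  # Arithmetic test: exit 0 when the file is older than the threshold.
  (( now - mtime > max_age ))
}
```

For example, `status_file_is_stale /run/blitz-robot/b-side-omnid.status.json && echo stale` prints `stale` when the file has not been updated within the threshold.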
Pretty-print the status files:

```bash
sudo python3 -m json.tool /run/blitz-robot/watchdog.status.json
sudo python3 -m json.tool /run/blitz-robot/b-side-omnid.status.json
sudo python3 -m json.tool /run/blitz-robot/ros-receiver.status.json
```

## Fault Injection

Available test commands:

```bash
sudo bash scripts/boot/blitz-fault-inject.sh bside-crash
sudo bash scripts/boot/blitz-fault-inject.sh bside-process-freeze
sudo bash scripts/boot/blitz-fault-inject.sh bside-video-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh bside-control-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh ros-crash
sudo bash scripts/boot/blitz-fault-inject.sh ros-freeze
```

For synthetic network fault injection, first enable it in `robot-boot.env.local`:

```bash
BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="1"
```

Then restart the watchdog and toggle the fault:

```bash
sudo systemctl restart blitz-watchdog.service
sudo bash scripts/boot/blitz-fault-inject.sh network-down on
sudo bash scripts/boot/blitz-fault-inject.sh network-down off
```

## Recovery Behavior Summary

- If `b_side_omnid` dies or its status file goes stale, the watchdog first tries a targeted `b_side` restart.
- If the ROS receiver dies, loses its socket, or its heartbeat goes stale, the watchdog performs an ordered full restart:
  - stop `b_side`
  - restart the ROS receiver
  - wait for the unix socket
  - start `b_side`
- If network checks fail repeatedly, the watchdog stops `b_side`, runs `5g-dial.sh`, waits for route recovery, and then restores services.
- If GPS monitoring is enabled, the watchdog checks `BLITZ_GPS_DEVICE_GLOB` every `BLITZ_GPS_CHECK_INTERVAL_SEC` seconds. When the GPS serial device disappears and later reappears, the watchdog restarts the units in `BLITZ_GPS_RESTART_UNITS` so `gpsd` can bind to the new device node.
- Camera disappearance is logged as a degraded state. Reappearance triggers a `b_side` restart once the device is stable.

## Notes

- `time-sync.sh` and `blitz-time-sync.service` are intentionally removed from the automatic boot path.
- `b_side_omnid` must already be built before boot-time startup.
- A missing `bin/b_side_omnid`, a missing ROS env, or a missing modem script is reported in `startup.log`.
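Because a missing binary only surfaces in `startup.log` after the chain starts, a quick check before running `install-systemd.sh` can catch it earlier. The sketch below assumes the binary lives at `bin/b_side_omnid` under the repository root (for example `$OMNISOCKETGO_ROOT`); `check_b_side_binary` is a hypothetical helper, not one of the shipped scripts.

```bash
# Hypothetical pre-install check: confirm b_side_omnid has been built.
# Assumption: the binary is at "<repo root>/bin/b_side_omnid".
check_b_side_binary() {
  local root="$1"
  local bin="${root}/bin/b_side_omnid"
  # -x requires the file to exist and be executable.
  if [[ ! -x "$bin" ]]; then
    echo "missing or not executable: ${bin} (build b_side_omnid first)" >&2
    return 1
  fi
  echo "ok: ${bin}"
}
```

For example, run `check_b_side_binary /path/to/repo && sudo bash scripts/boot/install-systemd.sh` so installation only proceeds once the binary exists.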