# Robot B-Side Boot Chain

This directory contains the robot-side boot and recovery scripts.

Normal usage is:

```bash
sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target
```

After installation, `blitz-robot.target` is enabled and will start automatically on reboot.

To stop the chain now and disable boot-time autostart for future reboots:

```bash
sudo bash scripts/boot/disable-systemd.sh
```

## Current Startup Order

The current cold-start chain is:

1. `blitz-boot-gate.service`
2. `blitz-5g-dial.service`
3. `blitz-ros-receiver.service`
4. `blitz-b-side-omnid.service`
5. `blitz-watchdog.service`

There is no longer any automatic time-sync step in the boot chain.

## What Each Script Does

- `robot-boot.env`: default boot configuration
- `robot-boot.env.local`: machine-local overrides
- `common.sh`: shared env loading, logging, and helper functions
- `boot-gate.sh`: fixed startup delay gate
- `5g-dial.sh`: brings up the 5G modem path and verifies routing
- `start-ros-receiver-service.sh`: boot wrapper for ROS receiver
- `wait-for-unix-socket.sh`: waits for the ROS receiver unix socket
- `start-b-side-omnid-service.sh`: boot wrapper for `b_side_omnid`
- `blitz-watchdog.sh`: runtime health watchdog and recovery orchestrator
- `blitz-fault-inject.sh`: fault injection entrypoint
- `install-systemd.sh`: installs systemd units into `/etc/systemd/system`
- `disable-systemd.sh`: stops the boot chain and disables autostart
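
To illustrate the kind of helper listed above, here is a minimal sketch of a socket wait loop. It is not the actual `wait-for-unix-socket.sh`; the function name `wait_for_unix_socket` and its arguments are hypothetical, and the default of 20 seconds simply mirrors the `BLITZ_ROS_SOCKET_WAIT_SEC="20"` setting shown in this README.

```bash
# Hypothetical sketch, not the real wait-for-unix-socket.sh.
# wait_for_unix_socket PATH [TIMEOUT_SEC]
# Returns 0 once PATH exists and is a unix socket, non-zero on timeout.
wait_for_unix_socket() {
  local socket_path="$1"
  local timeout_sec="${2:-20}"   # assumed default, mirrors BLITZ_ROS_SOCKET_WAIT_SEC
  local waited=0

  while [ "$waited" -lt "$timeout_sec" ]; do
    if [ -S "$socket_path" ]; then
      return 0
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done

  # Final check covers a timeout of 0 and a socket created during the last sleep.
  [ -S "$socket_path" ]
}
```

A boot wrapper could call it as `wait_for_unix_socket "$socket_path" "$BLITZ_ROS_SOCKET_WAIT_SEC" || exit 1`, where `socket_path` is whatever path the ROS receiver is configured to use.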

## Important Configuration

Most machine-specific overrides should go into:

```text
scripts/boot/robot-boot.env.local
```
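
As a rough picture of how defaults and local overrides can be layered, the sketch below sources `robot-boot.env` first and `robot-boot.env.local` second, so the local file wins. This is an assumption about the loading order, not a copy of `common.sh`; the function name `load_boot_env` is hypothetical.

```bash
# Hypothetical sketch of layered env loading; the real logic lives in common.sh.
# load_boot_env DIR
# Sources DIR/robot-boot.env, then DIR/robot-boot.env.local if it exists.
load_boot_env() {
  local boot_dir="$1"
  local defaults="${boot_dir}/robot-boot.env"
  local overrides="${boot_dir}/robot-boot.env.local"

  if [ ! -r "$defaults" ]; then
    echo "missing defaults file: $defaults" >&2
    return 1
  fi

  set -a                 # export every variable assigned while sourcing
  . "$defaults"
  if [ -r "$overrides" ]; then
    . "$overrides"       # sourced last, so its values replace the defaults
  fi
  set +a
}
```

With this ordering, a single line in `robot-boot.env.local` is enough to change one value while every other setting keeps its default.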

Typical settings:

```bash
BLITZ_BOOT_DELAY_SEC="30"
BLITZ_LOG_FILE="/var/log/blitz-robot/startup.log"
BLITZ_RUNTIME_DIR="/run/blitz-robot"

BLITZ_5G_DIAL_DIR="${OMNISOCKETGO_ROOT}/scripts/boot"
BLITZ_5G_SERIAL_PORT="/dev/ttyUSB2"
BLITZ_5G_INTERFACE=""
BLITZ_5G_MODEM_SUBNET="192.168.224.0/22"
BLITZ_5G_GATEWAY="192.168.225.1"
BLITZ_5G_REMOVE_DEFAULT_ROUTE="1"
BLITZ_5G_ROUTE_TARGETS="106.55.173.235"
BLITZ_5G_INFO_JSON="${OMNISOCKETGO_ROOT}/scripts/boot/modem_network_info.json"

BLITZ_TIME_SERVER_IP="81.70.156.140"

BLITZ_ROS_USER="nvidia"
BLITZ_ROS_SOCKET_WAIT_SEC="20"
BLITZ_WATCHDOG_INTERVAL_SEC="5"
BLITZ_HEALTH_STALE_SEC="15"
BLITZ_OMNID_THREAD_HEARTBEAT_TIMEOUT_SEC="15"
BLITZ_NETWORK_FAIL_THRESHOLD="3"
BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC="30"
BLITZ_GPS_MONITOR_ENABLED="1"
BLITZ_GPS_DEVICE_GLOB="/dev/ttyCH341USB*"
BLITZ_GPS_CHECK_INTERVAL_SEC="10"
BLITZ_GPS_RESTART_UNITS="gpsd.socket gpsd.service"
BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="0"
```

`BLITZ_TIME_SERVER_IP` is still used, but only as the 5G route/ping health-check target. It is no longer used for automatic clock synchronization.

If `BLITZ_TIME_SERVER_IP` is left empty, the scripts fall back to the host part of `ROBOT_SIDE_OMNISOCKET_SERVER_ADDR`.
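
The fallback described above can be sketched as follows. This assumes `ROBOT_SIDE_OMNISOCKET_SERVER_ADDR` has the form `host:port` (bracketed IPv6 literals are not handled); the function name `resolve_health_check_target` is hypothetical and the real scripts may parse the address differently.

```bash
# Hypothetical sketch of the health-check target fallback.
# Prints BLITZ_TIME_SERVER_IP if set, otherwise the host part of
# ROBOT_SIDE_OMNISOCKET_SERVER_ADDR (assumed to look like "host:port").
resolve_health_check_target() {
  local target="${BLITZ_TIME_SERVER_IP:-}"
  local addr="${ROBOT_SIDE_OMNISOCKET_SERVER_ADDR:-}"

  if [ -z "$target" ]; then
    target="${addr%:*}"   # strip the trailing ":port"; unchanged if there is no colon
  fi

  if [ -z "$target" ]; then
    echo "no health-check target configured" >&2
    return 1
  fi
  printf '%s\n' "$target"
}
```

For example, with `BLITZ_TIME_SERVER_IP=""` and an address such as `106.55.173.235:9000` (the port here is made up for illustration), the function prints `106.55.173.235`.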

## Install Or Upgrade

Run:

```bash
sudo bash scripts/boot/install-systemd.sh
sudo systemctl daemon-reload
sudo systemctl restart blitz-robot.target
```

`install-systemd.sh` will also remove any old `blitz-time-sync.service` unit left over from earlier versions.

## Disable Autostart

To stop the currently running services and disable autostart for future reboots:

```bash
sudo bash scripts/boot/disable-systemd.sh
```

To re-enable later:

```bash
sudo bash scripts/boot/install-systemd.sh
sudo systemctl start blitz-robot.target
```

## Logs

All boot-chain and watchdog logs are appended to:

```text
/var/log/blitz-robot/startup.log
```

Follow the log live:

```bash
sudo tail -f /var/log/blitz-robot/startup.log
```

Check service state:

```bash
sudo systemctl status blitz-robot.target
sudo systemctl status blitz-5g-dial.service
sudo systemctl status blitz-ros-receiver.service
sudo systemctl status blitz-b-side-omnid.service
sudo systemctl status blitz-watchdog.service
```

Check the systemd journal:

```bash
sudo journalctl -u blitz-robot.target -u blitz-5g-dial.service \
  -u blitz-ros-receiver.service -u blitz-b-side-omnid.service \
  -u blitz-watchdog.service -f
```

## Runtime Status Files

The runtime status directory is:

```text
/run/blitz-robot
```

Key files:

- `b-side-omnid.status.json`
- `ros-receiver.status.json`
- `watchdog.status.json`

`watchdog.status.json` now also records `gps_ok` and `gps_device_present` so you can quickly tell whether the GPS USB serial node is currently visible and whether the last `gpsd` reconnect attempt succeeded.

Pretty-print them:

```bash
sudo python3 -m json.tool /run/blitz-robot/watchdog.status.json
sudo python3 -m json.tool /run/blitz-robot/b-side-omnid.status.json
sudo python3 -m json.tool /run/blitz-robot/ros-receiver.status.json
```
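
For intuition about what a field like `gps_device_present` may reflect, the sketch below checks whether any path matches a device glob such as `BLITZ_GPS_DEVICE_GLOB` and detects the "absent, then present again" transition. Both function names are hypothetical, and the actual watchdog may derive these values differently.

```bash
# Hypothetical sketch of a GPS device presence check.
# gps_device_present GLOB
# Returns 0 if at least one existing path matches GLOB.
gps_device_present() {
  local candidate
  # $1 is left unquoted on purpose so the shell expands the glob.
  # This assumes the pattern contains no spaces, which holds for /dev/ttyCH341USB*.
  for candidate in $1; do
    if [ -e "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# gps_needs_restart PREVIOUS CURRENT
# PREVIOUS and CURRENT are "1" (present) or "0" (absent).
# Returns 0 only when the device was absent and has now reappeared.
gps_needs_restart() {
  [ "$1" = "0" ] && [ "$2" = "1" ]
}
```

A watchdog tick could remember the previous presence value, call `gps_device_present "$BLITZ_GPS_DEVICE_GLOB"`, and restart the configured units only when `gps_needs_restart` succeeds.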

## Fault Injection

Available test commands:

```bash
sudo bash scripts/boot/blitz-fault-inject.sh bside-crash
sudo bash scripts/boot/blitz-fault-inject.sh bside-process-freeze
sudo bash scripts/boot/blitz-fault-inject.sh bside-video-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh bside-control-thread-stall
sudo bash scripts/boot/blitz-fault-inject.sh ros-crash
sudo bash scripts/boot/blitz-fault-inject.sh ros-freeze
```

For synthetic network fault injection, first enable it in `robot-boot.env.local`:

```bash
BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION="1"
```

Then restart watchdog and inject:

```bash
sudo systemctl restart blitz-watchdog.service
sudo bash scripts/boot/blitz-fault-inject.sh network-down on
sudo bash scripts/boot/blitz-fault-inject.sh network-down off
```
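
The opt-in flag suggests a guard along the following lines. This is only a sketch of how such a gate could look; the function name `fault_injection_allowed` is hypothetical, and how the watchdog actually consumes the flag is not documented here.

```bash
# Hypothetical sketch of the opt-in gate for synthetic network faults.
# Returns 0 only when BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION is exactly "1".
# An unset or empty variable is treated as "0" (disabled).
fault_injection_allowed() {
  [ "${BLITZ_WATCHDOG_ALLOW_FAULT_INJECTION:-0}" = "1" ]
}
```

Treating every value other than `"1"` as disabled keeps the default safe on machines where the variable was never set.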

## Recovery Behavior Summary

- If `b_side_omnid` dies or its status file goes stale, watchdog first tries a targeted `b_side` restart.
- If ROS receiver dies, loses its socket, or its heartbeat goes stale, watchdog performs an ordered full restart:
  1. stop `b_side`
  2. restart ROS receiver
  3. wait for unix socket
  4. start `b_side`
- If network checks fail repeatedly, watchdog stops `b_side`, runs `5g-dial.sh`, waits for route recovery, and then restores services.
- If GPS monitoring is enabled, watchdog checks `BLITZ_GPS_DEVICE_GLOB` every `BLITZ_GPS_CHECK_INTERVAL_SEC` seconds. When the GPS serial device disappears and later reappears, watchdog restarts the units in `BLITZ_GPS_RESTART_UNITS` so `gpsd` can bind to the new device node again.
- Camera disappearance is logged as degraded state. Reappearance triggers a `b_side` restart after the device is stable.
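
The "fail repeatedly" rule for network checks can be pictured with a small counter. The sketch below assumes that `BLITZ_NETWORK_FAIL_THRESHOLD` counts consecutive failures and that `BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC` is the minimum gap between two recoveries; both interpretations, the function name `record_network_check`, and the state variables are assumptions rather than the watchdog's actual code.

```bash
# Hypothetical sketch of a consecutive-failure threshold with a cooldown.
# State kept between watchdog ticks:
network_fail_count=0
last_recovery_epoch=0

# record_network_check RESULT NOW_EPOCH
# RESULT is "ok" or "fail"; NOW_EPOCH is the current time in seconds.
# Returns 0 when a network recovery should be triggered, 1 otherwise.
# Call it directly (not in a subshell) so the state variables persist.
record_network_check() {
  local result="$1"
  local now="$2"
  local threshold="${BLITZ_NETWORK_FAIL_THRESHOLD:-3}"
  local cooldown="${BLITZ_NETWORK_RECOVERY_COOLDOWN_SEC:-30}"

  if [ "$result" = "ok" ]; then
    network_fail_count=0          # any success resets the streak
    return 1
  fi

  network_fail_count=$(( network_fail_count + 1 ))
  if [ "$network_fail_count" -lt "$threshold" ]; then
    return 1                      # not enough consecutive failures yet
  fi
  if [ $(( now - last_recovery_epoch )) -lt "$cooldown" ]; then
    return 1                      # still cooling down from the last recovery
  fi

  network_fail_count=0
  last_recovery_epoch="$now"
  return 0
}
```

With the typical settings from this README (threshold 3, cooldown 30), the third consecutive failure triggers a recovery, and further failures are ignored until 30 seconds have passed since that recovery.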

## Notes

- `time-sync.sh` and `blitz-time-sync.service` have been intentionally removed from the automatic boot path.
- `b_side_omnid` must already be built before boot-time startup.
- A missing `bin/b_side_omnid` binary, missing ROS environment, or missing modem script is reported in `startup.log`.