Let's look at the device high-temperature incident and how to monitor it in Prisma
SD-WAN.
There are four temperature sensors in the ION 1200-S-5G device for CPU, ACPITZ,
Cellular Modem, and PSE. An incident DEVICEHW_TEMPERATURE_SENSOR is raised when one
or more thermal sensors report temperatures beyond the operationally safe threshold
value. This incident helps you monitor temperature sensor trends on a device. If a
high thermal condition persists, the device is shut down in the following cases:
- If the device has a high CPU temperature, an incident is raised. The system will
monitor the temperatures every 5 minutes. If the high CPU temperature persists
for 3 continuous readings, the system will log an error and trigger a system
shutdown.
- If any 2 temperature sensors other than the CPU cross their defined thresholds,
the system will be shut down. The system will monitor the temperatures every
5 minutes. If any 2 sensors cross the threshold 3 times in a set of 5 continuous
readings, the system will be shut down.
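The two shutdown conditions above can be sketched in Python as follows. This is an
illustrative model only, not the device firmware. The threshold values, the sensor
key names, and the ThermalMonitor class are hypothetical, and the sketch assumes
that "cross the threshold 3 times in a set of 5 continuous readings" means that each
of at least 2 non-CPU sensors is over its threshold in at least 3 of the last 5
readings.

```python
from collections import deque
from dataclasses import dataclass, field

CPU_CONSECUTIVE_LIMIT = 3   # from the text: 3 continuous high CPU readings
WINDOW = 5                  # from the text: a set of 5 continuous readings
HITS_IN_WINDOW = 3          # from the text: threshold crossed 3 times in the set
MIN_HOT_SENSORS = 2         # from the text: any 2 sensors other than the CPU

# Hypothetical thresholds in degrees Celsius; the real values are not in the text.
THRESHOLDS_C = {"CPU": 95.0, "ACPITZ": 90.0, "CELLULAR_MODEM": 85.0, "PSE": 100.0}


def _empty_history() -> dict:
    # One sliding window of over-threshold flags per non-CPU sensor.
    return {name: deque(maxlen=WINDOW) for name in THRESHOLDS_C if name != "CPU"}


@dataclass
class ThermalMonitor:
    """Hypothetical model of the shutdown rules; call record() once per reading."""
    cpu_streak: int = 0
    history: dict = field(default_factory=_empty_history)

    def record(self, reading: dict) -> dict:
        # Flag each sensor that is above its threshold in this reading.
        over = {name: reading[name] > limit for name, limit in THRESHOLDS_C.items()}

        # CPU rule: count consecutive high readings; reset on a normal reading.
        self.cpu_streak = self.cpu_streak + 1 if over["CPU"] else 0

        # Non-CPU rule: count high readings per sensor over the last WINDOW readings.
        for name, window in self.history.items():
            window.append(over[name])
        hot_others = [n for n, w in self.history.items() if sum(w) >= HITS_IN_WINDOW]

        shutdown = (self.cpu_streak >= CPU_CONSECUTIVE_LIMIT
                    or len(hot_others) >= MIN_HOT_SENSORS)
        # "incident" only reflects the current reading; the delay before an
        # incident is cleared is not modeled here.
        return {"incident": any(over.values()), "shutdown": shutdown}
```

Calling record() once per 5-minute reading with a dictionary such as
{"CPU": 99, "ACPITZ": 50, "CELLULAR_MODEM": 50, "PSE": 50} reports a shutdown on the
third consecutive high CPU reading. A single hot non-CPU sensor never triggers a
shutdown in this model.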
The incident is cleared only when all sensors are within the threshold. After all
sensors return to within the threshold value, it may take up to 25 minutes to clear
the incident (if multiple sensors reported high temperatures before falling within
the threshold). When a shutdown is system-initiated, the following occurs:
- The system shuts down to prevent any further damage to the device or its
surroundings.
- The potential reboot reason is set to Thermal condition shutdown.
- After the system is shut down, you must manually bring the device back up.
When operating in high temperatures, monitor the device temperature activity on the
Prisma SD-WAN web interface regularly. If the high-temperature condition persists
for 15 to 30 minutes, the device will be shut down, and you must bring it back up
manually.
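As a rough check on these durations, the sketch below multiplies the reading counts
by the 5-minute monitoring interval. Mapping each reading to one full interval is an
assumption made here for illustration, and the function name is hypothetical. The
documented 15 to 30 minute range is the authoritative figure.

```python
POLL_INTERVAL_MIN = 5  # from the text: temperatures are monitored every 5 minutes


def window_minutes(readings: int, interval_min: int = POLL_INTERVAL_MIN) -> int:
    """Approximate time covered by a number of readings (one interval per reading)."""
    return readings * interval_min


cpu_rule_minutes = window_minutes(3)     # 3 continuous high CPU readings
sensor_rule_minutes = window_minutes(5)  # a set of 5 continuous readings

print(cpu_rule_minutes, sensor_rule_minutes)  # prints "15 25"
```

Under this assumption, both shutdown rules fall within the 15 to 30 minute range.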