I found myself fighting an X9DRD-7LN4F motherboard with a pair of Noctua NH-D9DX i4 3U HSFs that kept triggering a Lower Non-Recoverable assertion. This kept happening even after I set the threshold values with ipmitool, simply because the board doesn't reliably read the lowest speed of the Noctua fan and instead thinks it is spinning at 0 RPM. I found a solution that I haven't seen anyone else try.
Setting sensor "FAN1" Lower Non-Recoverable threshold to 0.000
Setting sensor "FAN1" Lower Critical threshold to 100.000
Setting sensor "FAN1" Lower Non-Critical threshold to 200.000
Setting sensor "FAN1" Upper Non-Critical threshold to 1700.000
Setting sensor "FAN1" Upper Critical threshold to 1800.000
Setting sensor "FAN1" Upper Non-Recoverable threshold to 1900.000
Sensor ID : FAN1 (0x41)
Entity ID : 29.1
Sensor Type (Threshold) : Fan
Sensor Reading : 1875 (+/- 0) RPM
Status : Upper Non-Recoverable
Lower Non-Recoverable : 0.000
Lower Critical : 75.000
Lower Non-Critical : 225.000
Upper Non-Critical : 1725.000
Upper Critical : 1800.000
Upper Non-Recoverable : 1875.000
Positive Hysteresis : 75.000
Negative Hysteresis : 75.000
Assertion Events :
Assertions Enabled : lcr- lnr- unc+ ucr+ unr+
Deassertions Enabled : lcr- lnr- unc+ ucr+ unr+
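For what it's worth, the readback above differs from the requested values in a regular way: every stored threshold is a multiple of 75 RPM. Here is a minimal sketch of that rounding, assuming each threshold is stored as a whole number of 75 RPM steps, rounded to the nearest step. The step size is inferred from the output above, and `quantize_rpm` is a hypothetical helper, not part of ipmitool.

```shell
# Assumption: thresholds are stored as a whole number of 75 RPM steps,
# rounded to the nearest step. The step size is inferred from the readback.
STEP=75

# quantize_rpm is a hypothetical helper, not an ipmitool command.
quantize_rpm() {
    # Integer round-to-nearest; valid for non-negative inputs only.
    echo $(( ( ($1 + $STEP / 2) / $STEP ) * $STEP ))
}

# Print "requested -> stored" for the six values set above.
for requested in 0 100 200 1700 1800 1900; do
    echo "$requested -> $(quantize_rpm "$requested")"
done
```

Under that assumption the six requested values come back as 0, 75, 225, 1725, 1800 and 1875, which matches the readback. It also shows why the sensor went straight to Upper Non-Recoverable: the requested 1900 was stored as 1875, exactly the fan's current reading.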
The first thing I had to realize was that the Hysteresis values were seemingly causing me grief by altering the values I was setting (every stored threshold lands on a multiple of 75, the same number reported as the hysteresis). Once I bumped the upper thresholds I was setting, I ran into the fact that the RPM isn't read properly at low speed, so the board kicks the fans to full speed again and again. There is no way to turn off the Lower Non-Recoverable assertion, and setting it to 0 does nothing. Setting it to -1 merely gets rounded to 0, but then I realized I needed to get past that Hysteresis issue again.
Setting sensor "FAN1" Lower Non-Recoverable threshold to -100.000
Setting sensor "FAN1" Lower Critical threshold to -100.000
Setting sensor "FAN1" Lower Non-Critical threshold to -100.000
Sensor ID : FAN1 (0x41)
Entity ID : 29.1
Sensor Type (Threshold) : Fan
Sensor Reading : 0 (+/- 0) RPM
Status : Lower Non-Recoverable
Lower Non-Recoverable : 19125.000
Lower Critical : 19125.000
Lower Non-Critical : 19125.000
Upper Non-Critical : 1875.000
Upper Critical : 1950.000
Upper Non-Recoverable : 2100.000
Positive Hysteresis : 75.000
Negative Hysteresis : 75.000
Assertion Events :
Assertions Enabled : lcr- lnr- unc+ ucr+ unr+
Deassertions Enabled : lcr- lnr- unc+ ucr+ unr+
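The 19125 in the readback is consistent with the negative value wrapping around in an 8-bit raw threshold: -100 RPM is roughly -1 step of 75 RPM, -1 stored as an unsigned byte is 255, and 255 x 75 = 19125. This is a sketch under that assumption, not a statement of what ipmitool or the BMC actually does internally. The 75 RPM step is inferred from the output, and `stored_rpm` is a hypothetical helper.

```shell
# Assumptions: 75 RPM per raw step, raw threshold kept as an unsigned byte.
STEP=75

# stored_rpm is a hypothetical helper, not an ipmitool command.
stored_rpm() {
    rpm=$1
    # Round to the nearest whole step (shell division truncates toward zero).
    if [ "$rpm" -ge 0 ]; then
        steps=$(( ($rpm + $STEP / 2) / $STEP ))
    else
        steps=$(( ($rpm - $STEP / 2) / $STEP ))
    fi
    # Keep only the low 8 bits, the way an unsigned byte would.
    raw=$(( $steps & 255 ))
    echo $(( $raw * $STEP ))
}

stored_rpm -1     # prints 0: too small to reach a full negative step
stored_rpm -100   # prints 19125: -1 step wraps to 255 steps
```

This would also explain why -1 had no effect while -100 did: the requested value has to be large enough in magnitude to round to at least one full negative step before it wraps.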
Now the server stays quiet after running ipmitool sensor thresh FAN1 lower -100 -100 -100, though I suppose you could simply set a high value directly instead of relying on the overflow. The sensor is now permanently in the Lower Non-Recoverable state, and the board simply ignores it.