2022-12-21

Solving a Noctua fan reading 0 RPM in Supermicro IPMI

I found myself fighting an X9DRD-7LN4F motherboard with a pair of Noctua NH-D9DX i4 3U HSFs that kept tripping a Lower Non-Recoverable assertion. This kept happening even after I set the threshold values with ipmitool, simply because the board doesn't reliably read the Noctua fan at its lowest speed and instead thinks it is spinning at 0 RPM. I found a solution that I haven't seen anyone else try.

Setting sensor "FAN1" Lower Non-Recoverable threshold to 0.000
Setting sensor "FAN1" Lower Critical threshold to 100.000
Setting sensor "FAN1" Lower Non-Critical threshold to 200.000
Setting sensor "FAN1" Upper Non-Critical threshold to 1700.000
Setting sensor "FAN1" Upper Critical threshold to 1800.000
Setting sensor "FAN1" Upper Non-Recoverable threshold to 1900.000

Sensor ID              : FAN1 (0x41)
 Entity ID             : 29.1
 Sensor Type (Threshold)  : Fan
 Sensor Reading        : 1875 (+/- 0) RPM
 Status                : Upper Non-Recoverable
 Lower Non-Recoverable : 0.000
 Lower Critical        : 75.000
 Lower Non-Critical    : 225.000
 Upper Non-Critical    : 1725.000
 Upper Critical        : 1800.000
 Upper Non-Recoverable : 1875.000
 Positive Hysteresis   : 75.000
 Negative Hysteresis   : 75.000
 Assertion Events      :
 Assertions Enabled    : lcr- lnr- unc+ ucr+ unr+
 Deassertions Enabled  : lcr- lnr- unc+ ucr+ unr+
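
Comparing the requested thresholds with what reads back above, every stored value is a multiple of 75, which also happens to be the hysteresis value. Here is a minimal Python sketch of that snapping. The 75 RPM step and the round-to-nearest rule are assumptions inferred from the output above, not taken from Supermicro documentation, and the function name is made up for illustration.

```python
import math

# Assumption: one raw sensor count equals 75 RPM, inferred from the readback
# values above (75, 225, 1725, 1800, 1875 are all multiples of 75).
RPM_PER_COUNT = 75


def snap_to_step(requested_rpm, step=RPM_PER_COUNT):
    """Round a requested threshold to the nearest multiple of `step`."""
    return math.floor(requested_rpm / step + 0.5) * step


# The six values that were requested for FAN1, in the order they were set.
requested = [0, 100, 200, 1700, 1800, 1900]
stored = [snap_to_step(r) for r in requested]

for r, s in zip(requested, stored):
    print(f"requested {r:>5} -> stored {s:>5}")
```

Under those assumptions the sketch reproduces the readback: 1900 comes back as 1875, the same number the fan reports at full speed, which would explain the Upper Non-Recoverable status.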

The first thing I had to realize was that the Hysteresis values were seemingly causing me grief by altering the values I was setting: the thresholds read back in steps of 75, so the 1900 I asked for came back as 1875, exactly the fan's reading at full speed. Once I bumped the upper numbers, I ran into the fact that the RPM isn't read properly at low speed, so the board keeps failing into full-speed fans again and again. There is no way to turn off the Lower Non-Recoverable assertion, and setting the threshold to 0 does nothing. Setting it to -1 merely gets rounded to 0, but then I realized I needed to get past that Hysteresis issue again, so I went further below zero.

Setting sensor "FAN1" Lower Non-Recoverable threshold to -100.000
Setting sensor "FAN1" Lower Critical threshold to -100.000
Setting sensor "FAN1" Lower Non-Critical threshold to -100.000

Sensor ID              : FAN1 (0x41)
 Entity ID             : 29.1
 Sensor Type (Threshold)  : Fan
 Sensor Reading        : 0 (+/- 0) RPM
 Status                : Lower Non-Recoverable
 Lower Non-Recoverable : 19125.000
 Lower Critical        : 19125.000
 Lower Non-Critical    : 19125.000
 Upper Non-Critical    : 1875.000
 Upper Critical        : 1950.000
 Upper Non-Recoverable : 2100.000
 Positive Hysteresis   : 75.000
 Negative Hysteresis   : 75.000
 Assertion Events      :
 Assertions Enabled    : lcr- lnr- unc+ ucr+ unr+
 Deassertions Enabled  : lcr- lnr- unc+ ucr+ unr+

Now the server stays quiet after running ipmitool sensor thresh FAN1 lower -100 -100 -100, though I suppose you could simply set a high value directly instead of encouraging the overflow. The sensor is always in a state of Lower Non-Recoverable assertion, and the board simply ignores it.
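
The 19125 in the readback is consistent with an 8-bit wraparound. IPMI carries each threshold as a single raw byte, so -100 rounds to -1 counts, wraps to 255, and 255 x 75 = 19125. A rough Python sketch of that arithmetic follows. The 75 RPM step and the rounding rule are again assumptions inferred from the output, and `stored_threshold` is a hypothetical helper, not part of ipmitool.

```python
import math

# Assumption: one raw sensor count equals 75 RPM (inferred from the readbacks).
RPM_PER_COUNT = 75


def stored_threshold(requested_rpm, step=RPM_PER_COUNT):
    """Model a threshold being rounded to raw counts and kept in one byte."""
    raw = math.floor(requested_rpm / step + 0.5)  # nearest whole count
    raw &= 0xFF                                   # keep only the low 8 bits
    return raw * step


print(stored_threshold(-1))     # too close to zero, rounds to 0 counts
print(stored_threshold(-100))   # -1 counts wraps around to 255 counts
print(stored_threshold(19125))  # asking for the high value directly
```

If the model holds, requesting 19125 outright lands on the same raw value as -100, which is the "just set a high value" alternative without relying on the overflow.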