Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dell Server - Cycles without Settling #201

Open
tjbennett opened this issue Feb 6, 2023 · 4 comments
Open

Dell Server - Cycles without Settling #201

tjbennett opened this issue Feb 6, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@tjbennett
Copy link

tjbennett commented Feb 6, 2023

Describe the bug
Dell Server 3930 - has 3 fans all of which can be controlled by a temperature sensor (or many sensors). Attempting to replace fancontrol with an easier, smoother method.

Install fan2go and tried to build a yaml file to represent server conditions

Use a definition of single cpu fan and representative sensor (cpu temp sensor).

In Dell 3930, /sys/class/hwmon/hwmon3/pwm1 represents as 128 even when in slow cycles. Therefore reading back a PWM setting is fraught with issues. A PWM setting should represent an RPM value and instead of reading back the PWM settings a relation to the RPM would be more representative of the PWM setting.

To Reproduce

Install fan2go, using go 1.20, and set yaml as following:

dbPath: "/etc/fan2go/fan2go.db"

runFanInitializationInParallel: false
maxRpmDiffForSettledFan: 250

tempSensorPollingRate: 2000ms
tempRollingWindowSize: 10

rpmPollingRate: 1s
rpmRollingWindowSize: 10

controllerAdjustmentTickRate: 200ms

fans:
  - id: cpu
    hwmon:
      platform: dell_smm-virtual-0
      index: 1
    neverStop: true
    curve: cpu_curve
    minPwm: 128
    startPwm: 128
    maxPwm: 255
    pwmMap:
      0 : 128
      170: 138
      255: 255

  - id: pcie
    hwmon:
      platform: dell_smm-virtual-0
      index: 2
    neverStop: true
    curve: pcie_curve
    minPwm: 128
    startPwm: 128
    maxPwm: 255
    pwmMap:
      0 : 128
      170: 138
      255: 255

  - id: psup
    hwmon:
      platform: dell_smm-virtual-0
      index: 3
    neverStop: true
    curve: psup_curve
    minPwm: 128
    startPwm: 128
    maxPwm: 255
    pwmMap:
      0 : 128
      170: 138
      255: 255

sensors:
  - id: cpu_package
    hwmon:
      platform: dell_smm-virtual-0
      index: 1


curves:
  - id: cpu_curve
    linear:
      sensor: cpu_package
      steps:
        - 40: 128
        - 50: 150
        - 80: 255

  - id: pcie_curve
    linear:
      sensor: cpu_package
      steps:
        - 40: 128
        - 50: 150
        - 80: 255

  - id: psup_curve
    linear:
      sensor: cpu_package
      steps:
        - 40: 128
        - 50: 150
        - 80: 255

  - id: case_avg_curve
    function:
      type: average
      curves:
        - cpu_curve
        - pcie_curve
        - psup_curve
  1. Start fan2go - in this instance used superuser "# go run main.go -c "fan2go.yaml""
  2. Observe system fail as writing "0" to pwm does not result in pwm reading as "0"
  3. Remove the check on matching values and get the system building the curve information
  4. Once that is done, let the system run. It will go high (RPM set to 255) then drop to low (128) in a cyclic manner, not in any way related to the temp sensor
  5. Run "stress --cpu 6" temperature rises and system does not adjust to cool system

Expected behavior
Expect the system to ramp up fan speed when temperature increases, slowly ramp fan system down as system cools.

Desktop (please complete the following information):

  • Distro: Ubuntu 20.04 server version

  • uname -a: Linux SG30d042e20c0a 5.4.0-137-generic Bugfix/#153 pwm settings improvements #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  • sensors -v: sensors version 3.6.0 with libsensors version 3.6.0

  • fan2go version: (dev) b3f19b3

Additional context
There may be a great misunderstanding of the configuration or the intent of this program that has created this report.

@tjbennett tjbennett added the bug Something isn't working label Feb 6, 2023
@markusressel
Copy link
Owner

markusressel commented Feb 7, 2023

Hey @tjbennett thx for the report 🚀

FYI: For better readability, I have removed the comments from your configuration.

Observe system fail as writing "0" to pwm does not result in pwm reading as "0"

fan2go should not fail because of this, it should issue warnings about it though.

Remove the check on matching values and get the system building the curve information

If you set minPwm and maxPwm, building the curve is (currently) without use, since the curve is only used to determine these values. If they are already set in the config, the curve is not build because it is not necessary.

Once that is done, let the system run. It will go high (RPM set to 255) then drop to low (128) in a cyclic manner, not in any way related to the temp sensor

You specify a custom pwmMap definition for all of your fans:

  minPwm: 128
  startPwm: 128
  maxPwm: 255
  pwmMap:
    0 : 128
    170: 138
    255: 255

How did you get to these values? Are there no possible PWM values other than the three you have specified?

Setting a minPwm of 128 in combination with the given pwmMap will result in the fan always beeing set to at least 170, which will be mapped to a PWM value of 138 due to your config. There might be a misunderstanding in how the pwmMap config option works. Check out #161 (caution, very long thread) to get a more in-depth understanding of what the pwmMap config option is doing.

Increasing the value of tempSensorPollingRate to a 10 times greater value than default might also play into this. Can you test if reverting this to the default 200ms fixes the cycling?

@tjbennett
Copy link
Author

@markusressel Your response is helpful, I will go through the link you sent and using more knowledge attempt this again... hopefully by tomorrow.

@tjbennett
Copy link
Author

So @markusressel I have done that reading and found the similar pattern in the Dell server as a PWM input/RPM output.
I had completely misread/misunderstood the PWM settings and pwmMap.

The 3930 has a step functionality:

[0-64] -> idling fan, speed not truly constant
[65-191] -> speed step to ~4000RPM
[192-255] -> speed step to ~1490 RPM

Graph RPM v PWM ![annotated](https://user-images.githubusercontent.com/375072/218115263-31cda5e5-58df-4820-8533-bf1200bc0b4a.jpg)

The PWM value read back from /sys/class/hwmon/hwmon3/pwm1, is always 128 until the value written to pwm1 hits 192 and there is a pause for 6 seconds

I simplified the yaml file and get an "invalid argument" panic.

Graph RPM v PWM dbPath: "/etc/fan2go/fan2go.db"

runFanInitializationInParallel: false
maxRpmDiffForSettledFan: 250

tempSensorPollingRate: 200ms
tempRollingWindowSize: 10

rpmPollingRate: 1s
rpmRollingWindowSize: 10

controllerAdjustmentTickRate: 200ms

fans:

  • id: cpu
    hwmon:
    platform: dell_smm-isa-0000
    index: 1
    curve: cpu_curve
    pwmMap:
    0: 128
    68: 128
    192: 255

sensors:

  • id: cpu_package
    hwmon:
    platform: dell_smm-isa-0000
    index: 1

curves:

  • id: cpu_curve
    linear:
    sensor: cpu_package
    steps:
    - 40: 0
    - 65: 68
    - 85: 192

Panic output: using sudo fan2go -c "./fan2go.yaml"`

INFO Using configuration file at: ./fan2go.yaml
INFO Gathering sensor data for cpu...
INFO Loading fan curve data for fan 'cpu'...
WARNING Fan 'cpu' has not yet been analyzed, starting initialization sequence...
INFO Using pwm map override from config...
DEBUG  Distinct PWM value targets of fan cpu: [0 192]
INFO Measuring RPM curve...
DEBUG  Fan cpu: Actual PWM value differs from requested one, skipping. Requested: 0 Actual: 128
DEBUG  Fan cpu: Actual PWM value differs from requested one, skipping. Requested: 1 Actual: 128
ERROR  Cant attach empty fan curve data to fan cpu
ERROR  Failed to attach fan curve data to fan cpu: invalid argument
INFO Fan controller for fan cpu stopped.
WARNING Cannot send notification, missing env variable 'DISPLAY'!

Panic output when running sudo fan2go fan init -c "./fan2go.yaml" -i cpu

INFO Using configuration file at: ./fan2go.yaml
INFO Using persistence at: /etc/fan2go/fan2go.db
INFO Deleting existing data for fan 'cpu'...
INFO Using pwm map override from config...
INFO Measuring RPM curve...
  ERROR  Cant attach empty fan curve data to fan cpu
  ERROR  Failed to attach fan curve data to fan cpu: invalid argument

Any thoughts on what I have missed here?

@markusressel
Copy link
Owner

Firstly I would suggest to set minPwm, startPwmand maxPwm values, to skip the fan curve analysis. If the fan can only be set to three states, it's not possible to accurately measure it's curve, and the algorithm will fail (apparently in a spectacular manner by panicking 😅 ).

Given your PWM map is accurate, min and start can be set to 0, max can be set to 192. Fan control should then start even with a "fresh install" (or removed fan curve).

The fact that fan2go is panicking is not great though, I will see if there is something we can do about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants