Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are DallasSensor temperature readings averaged or median'ed #2543

Open
CraigMarkwardt opened this issue Oct 5, 2022 · 5 comments
Open

Are DallasSensor temperature readings averaged or median'ed #2543

CraigMarkwardt opened this issue Oct 5, 2022 · 5 comments
Labels

Comments

@CraigMarkwardt
Copy link
Contributor

Device

DS18B20

Version

1.15.0-dev

Question

I have a group of four DS18B20 sensors attached to a Wemos D1 Mini. These are all embedded in the ice of an ice rink. Occasionally, maybe two times per day, sometimes 3-4 times and sometimes never in a day, one of the sensors will report spurious values. I am requesting values every six seconds, and "averaging" 5 samples.
image
(the larger variations at the beginning of the sample set are when the temperature control loop was open loop)

Are the "every 5" values averaged or median'ed? When I look at the sensor.cpp code I see the default filter for temperature data is median, rather than a running average, but the code is complicated enough that it's hard to say.

When I look at the spurious filtered values, they are exact multiple of the correct value.
(spurious value) = (true value) x factor
where factor is 0.5, 0.75, 0.833, etc. The "true value" is found by looking by the near neighbors in time.

Here is a plot of X=(true value) and Y=(spurious value)/(true value) and the multiples are shown with dashed lines.
image

My thought is that the averaging is not being done properly, or a spurious sensor value of "0" is being inserted into an averaging filter and propagating through. I see that checksums seem to be enabled for DallasSensor.h so in principle the raw values inserted into the filter should be correct. Also, typically "median" filtering should not produce such a multiple effect, and instead should ignore such an outlier. Just trying to figure out where this could come from.

@CraigMarkwardt
Copy link
Contributor Author

And, I think from reading the code, there is a way to change a setting to determine which filter is used, but I don't see how this should be done.

@mcspr
Copy link
Collaborator

mcspr commented Oct 6, 2022

1.15.0-dev

This does not tell much :) How do you build? You should be able to see git hash throughout our build (WebUI, boot message, info in terminal), or when using any desktop client UI or by running git rev-parse HEAD while PWD is anywhere in the repo clone directory.

I am requesting values every six seconds, and "averaging" 5 samples.

Which means you have sensor reading every 6 seconds and sensor report set to 5 readings? For example, if you pull data from HTTP API and also use real-time setting, value is really the last reading not the value from the filter. iirc it is the same thing with Prometheus and Influx (...but I would need to re-check that)

Are the "every 5" values averaged or median'ed? When I look at the sensor.cpp code I see the default filter for temperature data is median, rather than a running average, but the code is complicated enough that it's hard to say.

Filter itself is set up on boot (check out get tmpFilter0 in the terminal), and then we only control its capacity (get snsReport / SENSOR_REPORT_EVERY flag).

if (!magnitude.filter) {
magnitude.filter_type = getSetting(
settings::keys::get(magnitude, settings::suffix::Filter),
magnitude::defaultFilter(magnitude));
magnitude.filter = magnitude::makeFilter(magnitude.filter_type);
}
// Some filters must be able store up to a certain amount of readings.
if (magnitude.filter->capacity() != reportEvery()) {
magnitude.filter->resize(reportEvery());
}

static size_t report_count { 0 };

report_count = (report_count + 1) % reportEvery();

(which by the looks of it may cause some wrong overall data as it happens before the value is actually read, and it is also shared between all measurements. plus the fact that we may report based on delta settings causing a filter reset)

From the description above, we allocate 5+1 slots for values in the filter and push values every reading. Since there is a +1, latest value is kept for the next report loop.

magnitude.last = value.raw;
magnitude.filter->update(value.raw);

void update(double value) override {
if (_values.size() < _values.capacity()) {
_values.push_back(value);
}
}

void resize(size_t capacity) override {
if (_capacity != capacity) {
_capacity = capacity;
_values.clear();
_values.reserve(_capacity + 1);
}
}

double value() const override {
double sum { 0.0 };
if (_values.size() > 2) {
auto median = [](double previous, double current, double next) {
if (previous < current) {
if (current < next) {
return current;
} else if (previous < next) {
return next;
} else {
return previous;
}
} else if (previous < next) {
return previous;
} else if (current < next) {
return next;
}
return current;
};
for (auto prev = _values.begin(); prev != (_values.end() - 2); ++prev) {
sum += median(*prev, *std::next(prev, 1), *std::next(prev, 2));
}
sum /= (_values.size() - 2);
} else if (_values.size() > 0) {
sum = _values.front();
}
return sum;
}

if (report) {
value.filtered = magnitude::process(magnitude, magnitude.filter->value());
// Make sure that report value is calculated using every read value before it
magnitude.filter->reset();

void reset() override {
if (_values.size() > 2) {
_values[0] = _values.back();
_values.resize(1);
} else {
_values.clear();
}
}

You should be able to pick both base filter and the filter code headers and build them using host CXX. Or, use the code/test/unit facilities to add another test case, if we have some sample data that we could compare with.

@CraigMarkwardt
Copy link
Contributor Author

Hi @mcspr , I know I'm responding a year later. Illness and family issues intervened and then this issue got pushed far down the stack. But I'm replying because I have found some bugs that can be easily fixed. Also, a lot has changed in one year so I was able to apply fixes to the current codebase.

These are a few simple diffs. Hopefully they can be easily applied. I don't frequently develop with github and I know I'll spend hours doing all the dancing to get a pull request prepared, so I hope you can just paste these in.

Results up front

Before the fixes ("noise spikes" due to invalid CRC data being reported as 0-temperature)
image

After the fixes (no invalid CRCs survive to reporting)
image

BUG 1: some sensors don't know their status until value()

The current code checks the sensor's status() after reading but before calling value(). DallasSensor only computes the checksum during value() so it doesn't know about errors until then.

This fix puts a check after value() as well.

diff --git a/code/espurna/sensor.cpp b/code/espurna/sensor.cpp
index 0d5504d5..666bebf9 100644
--- a/code/espurna/sensor.cpp
+++ b/code/espurna/sensor.cpp
@@ -3969,6 +3970,10 @@ void loop() {

             value.raw = magnitude.sensor->value(magnitude.slot);

+           if (!magnitude.sensor->status()) {
+                continue;
+            }
+
             // But, completely remove spurious values if relay is OFF
 #if RELAY_SUPPORT && SENSOR_POWER_CHECK_STATUS
             switch (magnitude.type) {

BUG 2: DallasSensor can report values before first reading

The DallasSensor technique is to pre-allocate zeroed memory for each sensor's scratchpad. It turns out that "all zeroes" is a legitimate scratchpad, including CRC, so it passes the CRC check. Until the first reading is successfully pulled from the actual sensor, DallasSensor is happy to return "0" temperature values. This usually happens for the first 1 or two samples, depending on snsRead.

The following fix adds some salt to the buffers to prevent a CRC clash until a true read can occur. Also the _error status is set to something not-OK, though I'm not sure that's very useful.

diff --git a/code/espurna/sensors/DallasSensor.h b/code/espurna/sensors/DallasSensor.h
index 6a4415a4..92e4ff67 100644
--- a/code/espurna/sensors/DallasSensor.h
+++ b/code/espurna/sensors/DallasSensor.h
@@ -140,9 +140,16 @@ class DallasSensor : public BaseSensor {
                 _previous = _gpio;
             }

+           // Make sure the buffers have invalid CRCs
+           for (size_t index = 0; index < _devices.size(); ++index) {
+             _devices[index].data[0] ^= 0xcc;
+             _devices[index].data[1] ^= 0xcc;
+           }
+
             _last_reading = TimeSource::now();
             _ready = true;
             _dirty = false;
+           _error = SENSOR_ERROR_NOT_READY;

         }

BUG 3: MovingAverage filter is a sum, not an average

The current MovingAverage filter computes a sum but never actually divides by the count to get an average. The result is the incorrect value for the average.

diff --git a/code/espurna/filters/MovingAverageFilter.h b/code/espurna/filters/MovingAverageFilter.h
index f255745b..96de5931 100644
--- a/code/espurna/filters/MovingAverageFilter.h
+++ b/code/espurna/filters/MovingAverageFilter.h
@@ -15,6 +15,7 @@ public:
         _sum = _sum + value - _values[_sample];
         _values[_sample] = value;
         _sample = (_sample + 1) % _values.capacity();
+       if (_count < _values.capacity() ) _count ++;
     }

     size_t capacity() const override {
@@ -22,12 +23,18 @@ public:
     }

     double value() const override {
-        return _sum;
+        if ( _count == 0 ) return 0;
+       return (double) _sum / _count;
+    }
+
+    bool status() const override {
+      return ( _count > 0 );
     }

     void resize(size_t size) override {
         _sum = 0.0;
         _sample = 0;
+        _count = 0;
         _values.clear();
         _values.resize(size, 0.0);
     }
@@ -35,5 +42,6 @@ public:
 private:
     std::vector<double> _values {{0.0}};
     size_t _sample = 0;
+    size_t _count = 0;
     double _sum = 0;
 };

As you will see, I added the status() method so the filter can indicate whether the filter has been properly initialized. That's optional, and below I have a patch for all the filters to implement status(). And then sensor.cpp has been modified to prevent reporting until the filter has been initialized.

SIDE NOTE: I don't think the MedianFilter is actually doing a median filter, but more like some kind of moving average. I'm not sure what was intended there, so it's hard to suggest a fix.

Changes to filters

diff --git a/code/espurna/filters/BaseFilter.h b/code/espurna/filters/BaseFilter.h
index 9e5cd672..19c0b0af 100644
--- a/code/espurna/filters/BaseFilter.h
+++ b/code/espurna/filters/BaseFilter.h
@@ -29,4 +29,8 @@ public:

     // Return filtered value
     virtual double value() const = 0;
+
+    virtual bool status() const {
+      return true;
+    }
 };
diff --git a/code/espurna/filters/LastFilter.h b/code/espurna/filters/LastFilter.h
index 45cc50f6..192445d7 100644
--- a/code/espurna/filters/LastFilter.h
+++ b/code/espurna/filters/LastFilter.h
@@ -11,10 +11,16 @@ class LastFilter : public BaseFilter {
 public:
     void update(double value) override {
         _value = value;
+       _status = true;
     }

     void reset() override {
         _value = 0;
+       _status = false;
+    }
+
+    bool status() const override {
+        return _status;
     }

     double value() const override {
diff --git a/code/espurna/filters/MaxFilter.h b/code/espurna/filters/MaxFilter.h
index f24a6a63..3057aae3 100644
--- a/code/espurna/filters/MaxFilter.h
+++ b/code/espurna/filters/MaxFilter.h
@@ -13,10 +13,16 @@ class MaxFilter : public BaseFilter {
 public:
     void update(double value) override {
         _value = std::max(value, _value);
+       _status = true;
     }

     void reset() override {
         _value = 0;
+       _status = false;
+    }
+
+    bool status() const override {
+        return _status;
     }

     double value() const {
@@ -25,4 +31,5 @@ public:

 private:
     double _value = 0;
+    bool _status = false;
 };
diff --git a/code/espurna/filters/MedianFilter.h b/code/espurna/filters/MedianFilter.h
index bf977856..5386a4dd 100644
--- a/code/espurna/filters/MedianFilter.h
+++ b/code/espurna/filters/MedianFilter.h
@@ -27,6 +27,11 @@ public:
         }
     }

+
+    bool status() const override {
+        return ( _values.size() > 0 );
+    }
+
     double value() const override {
         double sum { 0.0 };

and to allow sensors to take advantage of this

diff --git a/code/espurna/sensor.cpp b/code/espurna/sensor.cpp
index 0d5504d5..666bebf9 100644
--- a/code/espurna/sensor.cpp
+++ b/code/espurna/sensor.cpp
@@ -4009,6 +4014,11 @@ void loop() {
             // Initial status or after report counter overflows
             bool report { ready_to_report() };

+           // If the filter is not yet fully initialized, don't report
+           if (! magnitude.filter->status() ) {
+             report = false;
+           }
+
             // In case magnitude was configured with ${name}MaxDelta, override report check
             // when the value change is greater than the delta
             if (!std::isnan(magnitude.reported) && (magnitude.max_delta > build::DefaultMaxDelta)) {

Thanks for your help!

mcspr added a commit that referenced this issue Feb 8, 2024
mcspr added a commit that referenced this issue Feb 8, 2024
sensor loop assumes value(index) value is ok
ref. #2543

move all checks to pre(), value() returns pre-read value
- tick() and pre() are allowed to fail specific device
- pre() would fail the sensor if any of devices fail
- value(index) is never reached unless tick() and pre() succeed
@mcspr
Copy link
Collaborator

mcspr commented Feb 8, 2024

BUG 1: some sensors don't know their status until value()

True, but that was the intention behind sensor->pre() which is followed by sensor error check in sensor loop. I have changed most of the sensors to report value() from a pre-read value to avoid this issue specifically, dallas was overlooked so far but the commit above have moved things around.

edit: although... it can be also handled by adding a check for 'sensor->error(index)', so sensor does not lose everything on the wire because of N faulty readings. By default, simply reading 'sensor->-error()`. If sensor is 'multi'-sensor, readings can be discarded based on specific device issue (and dallas code right now already includes part of the per-device error reporting)

BUG 2: DallasSensor can report values before first reading

Should be fixed, since pre() would not allow value readings?

BUG 3: MovingAverage filter is a sum, not an average
SIDE NOTE: I don't think the MedianFilter is actually doing a median filter, but more like some kind of moving average. I'm not sure what was intended there, so it's hard to suggest a fix.

See https://github.com/xoseperez/espurna/blob/dev/code/test/unit/src/filters/filters.cpp
This is how it works right now, with the expected values from all filters. I would expect this would be much easier to run through some specific values compared to just waiting for the device to manifest them.

espurna/ci_script.sh

Lines 9 to 15 in 37d2837

# runs unit tests, using the host compiler and the esp8266 mock framework
# - https://github.com/esp8266/Arduino/blob/master/tests/host/Makefile
# - https://github.com/ThrowTheSwitch/Unity
pushd test/unit
cmake -B build
cmake --build build
cmake --build build --target test

mcspr added a commit to mcspr/espurna that referenced this issue Mar 2, 2024
ref. xoseperez#2543
search the bus preemptively and create sensor instances based on that
more clean-up for error handling, allow specific device to fail independently

also adds runtime settings
- `dallasIntvl` to change default 2s reading interval
- `dallasParasite` to change from non-powered mode
- `dallasPin` to swap onewire pin
mcspr added a commit to mcspr/espurna that referenced this issue Mar 17, 2024
ref. xoseperez#2543
search the bus preemptively and create sensor instances based on that
more clean-up for error handling, allow specific device to fail independently

most of the sensor was re-written
- general one-wire operations are part of the system driver now
- split class definitions for digital and temperature sensors
- system timer instead of polling, enforce ordering by forcing specific
  instance to handle conversion request, while notifying every other ones
- synchronize readings and conversion requests with the sensor reading
  interval, instead of relying on internal polling. now, it happens
  right after begin() and pre() instead of previosly (almost) random tick() calls
- auto-detect conversion time based on device resolution / resolution setting

initialization also adds runtime settings (read on boot)
- `dallasPin` to set current one-wire pin
- `dallasParasite` to change from non-powered mode (default is still on)
- `dallasResolution` to force sensor to be operated using 9...12bit res
  (0 to keep default one)
mcspr added a commit to mcspr/espurna that referenced this issue Mar 17, 2024
ref. xoseperez#2543
search the bus preemptively and create sensor instances based on that
more clean-up for error handling, allow specific device to fail independently

most of the sensor was re-written
- general one-wire operations are part of the system driver now
- split class definitions for digital and temperature sensors
- system timer instead of polling, enforce ordering by forcing specific
  instance to handle conversion request, while notifying every other ones
- synchronize readings and conversion requests with the sensor reading
  interval, instead of relying on internal polling. now, it happens
  right after begin() and pre() instead of previosly (almost) random tick() calls
- auto-detect conversion time based on device resolution / resolution setting

initialization also adds runtime settings (read on boot)
- `dallasPin` to set current one-wire pin
- `dallasParasite` to change from non-powered mode (default is still on)
- `dallasResolution` to force sensor to be operated using 9...12bit res
  (0 to keep default one)
mcspr added a commit to mcspr/espurna that referenced this issue Mar 19, 2024
ref. xoseperez#2543
search the bus preemptively and create sensor instances based on that
more clean-up for error handling, allow specific device to fail independently

most of the sensor was re-written
- general one-wire operations are part of the system driver now
- split class definitions for digital and temperature sensors
- system timer instead of polling, enforce ordering by forcing specific
  instance to handle conversion request, while notifying every other ones
- synchronize readings and conversion requests with the sensor reading
  interval, instead of relying on internal polling. now, it happens
  right after begin() and pre() instead of previosly (almost) random tick() calls
- auto-detect conversion time based on device resolution / resolution setting

initialization also adds runtime settings (read on boot)
- `dallasPin` to set current one-wire pin
- `dallasParasite` to change from non-powered mode (default is still on)
- `dallasResolution` to force sensor to be operated using 9...12bit res
  (0 to keep default one)
@mcspr
Copy link
Collaborator

mcspr commented May 15, 2024

test/dev is merged with dev, so these are no longer lingering commits...
5e599ec 3c61604 b580dd4 e92be04 6c382d7

'sensor->error(index)', 'sensor->value(index)' etc. is solved similar to other sensors - using separate class instance for each real sensor.

Test suggestion still stands, should the filter implementation(s) need a change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants