Alert CTR Score - Research discussion #13638
-
I'm noticing that the model is overfitting a bit on the alarm values themselves. I'll have to go back to the drawing board a little to "round" or "bin" the alarm values in some sensible way so that the model has less chance to overfit on really specific alarm values. Some logic to try to control for this, for example rounding % values to the nearest 5%, along with more data, will help here. Will iterate a bit more on this as it's a problem I need to resolve.
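For illustration, here's a minimal sketch of the kind of binning logic described above. The field names and unit handling are assumptions for the example, not Netdata's actual alert schema:

```python
import math

# Coarsen raw alarm values before they reach the model, so it cannot
# memorize overly specific values. Units handling here is illustrative.
def bin_alarm_value(value: float, units: str) -> float:
    """Round an alarm value to a coarse bucket to reduce overfitting."""
    if units == "%":
        # Round percentages to the nearest 5%, as suggested above.
        return round(value / 5) * 5
    if value == 0:
        return 0.0
    # For other units, keep roughly one significant figure of precision.
    magnitude = 10 ** math.floor(math.log10(abs(value)))
    return round(value / magnitude) * magnitude

print(bin_alarm_value(87.3, "%"))    # -> 85
print(bin_alarm_value(1234.0, "ms")) # -> 1000.0
```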
-
Cross-linking, as this discussion has graduated into issue netdata/netdata-cloud#760.
-
This is a discussion around some internal research we are doing related to Alert CTR prediction that could end up as a feature in Netdata.
Idea
Build a model that will score each alert based on the probability of a click. This "Alert CTR Score" can then be used to rank, sort, and filter alerts based on which ones the model has learned tend to be more or less likely (than average) to result in a click.
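As an illustration of how such a score could be consumed, here's a hedged sketch assuming a trained scikit-learn-style classifier (`model`) and a DataFrame of pending alerts; all names are hypothetical:

```python
import pandas as pd

def rank_alerts_by_ctr_score(model, alerts: pd.DataFrame,
                             feature_cols: list[str]) -> pd.DataFrame:
    """Score each alert and sort highest CTR score first."""
    scored = alerts.copy()
    # predict_proba[:, 1] is the model's probability of the positive
    # (clicked) class, used here as the Alert CTR Score.
    scored["ctr_score"] = model.predict_proba(scored[feature_cols])[:, 1]
    # Downstream UIs could sort/filter/threshold on this column.
    return scored.sort_values("ctr_score", ascending=False)
```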
Approach
Take all the clicks from alert emails sent by Netdata as our positive examples, and randomly sample a similar number of alert emails that did not result in clicks as negatives. This becomes the training data for a binary classification model. The model can then be used to score new alerts; those with a high score should, on average, be more likely to solicit a click or response from the user than those with a low score.
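The post doesn't say which classifier is used; below is a rough sketch of the described data construction, with a gradient-boosted tree from scikit-learn as one plausible stand-in. The column names and the balanced negative sampling details are assumptions:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# clicked:     alert emails that resulted in a click (positives)
# not_clicked: alert emails with no click (pool of negatives)
def build_ctr_model(clicked: pd.DataFrame, not_clicked: pd.DataFrame,
                    feature_cols: list[str]):
    # Randomly sample a similar number of negatives to balance the classes.
    negatives = not_clicked.sample(n=len(clicked), random_state=42)
    data = pd.concat([clicked.assign(label=1), negatives.assign(label=0)])
    X_train, X_test, y_train, y_test = train_test_split(
        data[feature_cols], data["label"], test_size=0.2, random_state=42
    )
    model = GradientBoostingClassifier()
    model.fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))
    return model
```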
User Value
A decent "Alert CTR Score" can then be just another Lego block that users could use in deciding how to filter/sort/respond to alerts.
Obviously, a low Alert CTR Score does not mean the alert "does not matter"; it just means that, on average, users tend to click on such alerts less often than they might on one with a higher score.
Pros
Cons
Initial Research Results
If we train a model on a full month of August data and then use a sample of emails from September, we see a plot like the one below. Here we took a random sample of 250,000 email alerts sent by Netdata in September (never seen or trained on by the model), scored each one with the model, and then sorted all those alerts into 10 deciles. So, for example, decile 9 is the top 10% of scored alerts. In this group, the average alert CTR score ("true prob mean") was 66.81% and the actual alert CTR rate ("true true mean") was 0.768%. This is (0.00768 / 0.002484) = 3.09 times higher (the "uplift factor") than the average across the full sample of alerts (a "no model" benchmark, i.e. what you would get by guessing randomly). The gap between the lowest and highest deciles is even bigger: (0.00768 / 0.0008) = 9.6 times uplift when comparing the lowest-scored alerts to the highest-scored ones. This makes sense, as the model seems to have learned which sorts of alerts very rarely get clicked on and which ones have a much higher likelihood of getting clicked. It's also nice to see that the actual alert CTR rate follows the alert CTR score deciles, as one would hope.
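Here's a sketch of how such a decile/uplift table could be computed, assuming arrays of model scores and observed 0/1 click outcomes for the holdout alerts (illustrative, not the actual analysis code):

```python
import numpy as np
import pandas as pd

def decile_uplift(scores: np.ndarray, clicked: np.ndarray) -> pd.DataFrame:
    """Group holdout alerts into score deciles and compute uplift vs. average."""
    df = pd.DataFrame({"score": scores, "clicked": clicked})
    # Decile 9 = top 10% of scored alerts, matching the convention above.
    # Ranking first breaks ties so qcut always yields 10 equal buckets.
    df["decile"] = pd.qcut(df["score"].rank(method="first"), 10,
                           labels=range(10))
    overall_ctr = df["clicked"].mean()  # the "no model" benchmark
    summary = df.groupby("decile", observed=True).agg(
        pred_prob_mean=("score", "mean"),
        true_ctr=("clicked", "mean"),
    )
    summary["uplift_factor"] = summary["true_ctr"] / overall_ctr
    return summary
```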
If we repeat a similar exercise 50 times on random samples of 50,000 alerts from the holdout data and plot the same lines, the plot below shows the stability of this result.
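And a sketch of that repeated-sampling stability check, reusing the `decile_uplift` helper from the previous sketch (again illustrative, assuming a holdout DataFrame with "score" and "clicked" columns):

```python
import pandas as pd

def stability_curves(holdout: pd.DataFrame, n_runs: int = 50,
                     sample_size: int = 50_000) -> pd.DataFrame:
    """Repeat the decile analysis on random holdout samples."""
    runs = []
    for i in range(n_runs):
        sample = holdout.sample(n=sample_size, random_state=i)
        summary = decile_uplift(sample["score"].to_numpy(),
                                sample["clicked"].to_numpy())
        runs.append(summary["true_ctr"].rename(f"run_{i}"))
    # One column per run; plotting these lines against the decile index
    # shows how stable the decile-level CTR curve is across samples.
    return pd.concat(runs, axis=1)
```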