Don't clear clusters on each step of retraining. #12308
andrewm4894
started this conversation in
Ideas
Replies: 1 comment
-
We should do a validation of the anomaly bits when the new option is enabled vs. disabled. Also, @andrewm4894 you should post a |
-
(@shyamvalsan as fyi)
@vkalintiris hello - following on from @underhood's comment in here somewhere, iirc.
Would it be an easy change to initialise the cluster centroids at each retraining with the previous centroids?
e.g. just delete the `ClusterCenters.clear();` call and either pass the previous centers to `pick_initial_centers()` or even directly to `find_clusters_using_kmeans()`, avoiding `pick_initial_centers()` altogether when you already have a previously fitted model.
From an ML perspective this is a no-brainer, and actually something I was kicking myself for not spotting as part of the initial implementation (thank you @underhood).
I'm fairly confident it would also help reduce resource usage: on average we would get better initial candidate centroids (or just use the last centroids as the candidates and so do less work), so the algo would need fewer iterations to converge in the typical case where things change only a little.
I feel that, averaged over all dimensions (where a lot of the models will not change much at each retraining), there could be some real impact here, and it feels like it could be an easy enough optimization to make (you can correct me if I'm wrong on that one, though).
How feasible would it be to try this out and implement it as a param so we could just try it and see? Thoughts?
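To make the idea concrete, here is a rough, self-contained sketch of what I mean by warm-starting. It is not the actual agent code and does not use dlib: `lloyd_kmeans()`, `retrain()` and the `warm_start` flag are hypothetical names, and the naive "first k samples" seeding only stands in for whatever `pick_initial_centers()` does. The function returns the iteration count so the two modes can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
static double sq_dist(const Point &a, const Point &b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Plain Lloyd's k-means. `centers` is both the initial guess and the result.
// Returns the number of iterations run until assignments stopped changing.
std::size_t lloyd_kmeans(const std::vector<Point> &samples,
                         std::vector<Point> &centers,
                         std::size_t max_iter = 1000) {
    assert(!samples.empty() && !centers.empty());
    const std::size_t k = centers.size();
    const std::size_t dim = samples[0].size();
    // Start with an invalid label so the first pass always assigns.
    std::vector<std::size_t> label(samples.size(), k);
    std::size_t iter = 0;
    while (iter < max_iter) {
        ++iter;
        bool changed = false;
        // Assignment step: attach each sample to its nearest center.
        for (std::size_t i = 0; i < samples.size(); ++i) {
            std::size_t best = 0;
            double best_d = std::numeric_limits<double>::infinity();
            for (std::size_t c = 0; c < k; ++c) {
                const double d = sq_dist(samples[i], centers[c]);
                if (d < best_d) { best_d = d; best = c; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        if (!changed) break;
        // Update step: move each non-empty center to the mean of its samples.
        std::vector<Point> sum(k, Point(dim, 0.0));
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < samples.size(); ++i) {
            for (std::size_t j = 0; j < dim; ++j)
                sum[label[i]][j] += samples[i][j];
            ++count[label[i]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (count[c] == 0) continue;
            for (std::size_t j = 0; j < dim; ++j)
                centers[c][j] = sum[c][j] / static_cast<double>(count[c]);
        }
    }
    return iter;
}

// Hypothetical retraining step. With warm_start set and k previous centers
// available, they are reused as the initial guess; otherwise we reseed.
std::size_t retrain(const std::vector<Point> &samples,
                    std::vector<Point> &centers,
                    std::size_t k, bool warm_start) {
    assert(k > 0 && samples.size() >= k);
    if (!warm_start || centers.size() != k) {
        // Naive seeding (assumption): take the first k samples.
        centers.assign(samples.begin(),
                       samples.begin() + static_cast<std::ptrdiff_t>(k));
    }
    return lloyd_kmeans(samples, centers);
}
```

Calling `retrain(samples, centers, k, true)` on consecutive training windows keeps `centers` alive between calls, which is the whole change. On a toy series whose clusters shift only slightly between windows, the warm-started call should converge in fewer iterations than a reseeded one while ending at the same centers. Whether that carries over to the real feature vectors is exactly what the param would let us measure.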