
Are the "efficient-kan" and "official-kan" equivalent in terms of algorithms? #18

Closed
yuedajiong opened this issue May 11, 2024 · 12 comments

Comments

@yuedajiong

As the title asks.

@Indoxer

Indoxer commented May 12, 2024

As far as I know they are almost the same; only the official version appears to have an additional bias after each layer. Also, I am not sure whether the initialization is the same. In addition, the regularization loss is changed because of the optimizations.
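To make the bias difference concrete, here is a minimal, hypothetical PyTorch sketch (not code from either repository) of a KAN-style layer whose output is the sum of a base branch and a spline branch, plus the extra per-layer bias that the official version is described as having:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyKANLayer(nn.Module):
    """Illustrative only: a stand-in layer showing where the extra bias would sit."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.base_weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        # Placeholder for the B-spline branch; the real layers evaluate learned splines here.
        self.spline_branch = nn.Linear(in_features, out_features, bias=False)
        # The additional per-layer bias the official implementation applies.
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        base = F.linear(F.silu(x), self.base_weight)   # base activation branch
        spline = self.spline_branch(x)                 # spline branch (placeholder)
        return base + spline + self.bias               # drop `+ self.bias` to mimic the pre-fix efficient-kan
```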

@yuedajiong
Author

@Indoxer Thanks, you are so kind.

@WhatMelonGua

No, I'm not quite sure.
I tried the official tutorial at the following link: Tutorial

*Including the use of the official LBFGS training strategy.
The results showed that after completing all the training in a single run, the model was almost identical to the official one.
But if training is conducted in phases, it cannot be fitted perfectly (the model is still effective, just slightly underperforming).
[figure: official KAN result]
[figure: Eff-KAN result]
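For readers trying to reproduce this comparison, the sketch below illustrates, under assumed placeholder names (model, x_train, y_train), what "one-time" training versus phased training with torch.optim.LBFGS could look like; the chunking is only one possible reading of "training in phases" and is not the official tutorial code.

```python
import torch

def fit_lbfgs(model, x, y, steps=20):
    """Run a plain LBFGS fit on (x, y) with an MSE objective."""
    opt = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20,
                            history_size=10, line_search_fn="strong_wolfe")
    loss_fn = torch.nn.MSELoss()
    for _ in range(steps):
        def closure():
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            return loss
        opt.step(closure)

# One-time training: a single LBFGS run over the whole dataset.
#   fit_lbfgs(model, x_train, y_train)
#
# Phased training: several shorter runs, e.g. over successive data chunks,
# which is where the thread reports a slightly worse fit.
#   for x_chunk, y_chunk in zip(x_train.chunk(4), y_train.chunk(4)):
#       fit_lbfgs(model, x_chunk, y_chunk, steps=5)
```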

@WhatMelonGua

WhatMelonGua commented May 13, 2024

I think this is acceptable; after all, the model is very efficient, and some loss is normal. It would be strange if there were no loss at all. It effectively retains the characteristics of the official model while also incorporating training optimizations.

@Indoxer

Indoxer commented May 13, 2024

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you use the same parameters in the LBFGS optimizer (number of steps, etc.)?
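For reference, freezing those two parameter groups might look like the sketch below; the substrings spline_scaler and base_weight follow the names used in this thread, and the actual attribute names may differ between implementations.

```python
import torch

def freeze_spline_scaler_and_base(model):
    """Disable gradients for parameters whose names match the thread's wording."""
    frozen = []
    for name, param in model.named_parameters():
        if "spline_scaler" in name or "base_weight" in name:
            param.requires_grad = False
            frozen.append(name)
    return frozen

# Build the optimizer only over the parameters that remain trainable, e.g.:
#   trainable = [p for p in model.parameters() if p.requires_grad]
#   optimizer = torch.optim.LBFGS(trainable, lr=1.0)
```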

@Indoxer

Indoxer commented May 13, 2024

[figure: spline_scaler not trained, base_weights not trained]
[figure: spline_scaler trained, base_weights trained]

(I am using my modified version (but the same algorithm as efficient kan), so I am not sure)

@WhatMelonGua

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you use the same parameters in the LBFGS optimizer (number of steps, etc.)?

Oh, yes, forgive me for forgetting.
There are no such parameters, so for that reg_ variable (I don't know what it is) I simply took the default value of 1 and fixed many errors (perhaps I was fixing it blindly, just making it work).
The result was that the official "LBFGS" training routine cannot be migrated here directly.

@WhatMelonGua

[figure: spline_scaler not trained, base_weights not trained]
[figure: spline_scaler trained, base_weights trained]

(I am using my modified version (but the same algorithm as efficient kan), so I am not sure)

It seems our approaches are similar.
What a coincidence! 🤗

@Indoxer

Indoxer commented May 13, 2024

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you use the same parameters in the LBFGS optimizer (number of steps, etc.)?

Oh, yes, forgive me for forgetting. There are no such parameters, so for that reg_ variable (I don't know what it is) I simply took the default value of 1 and fixed many errors (perhaps I was fixing it blindly, just making it work). The result was that the official "LBFGS" training routine cannot be migrated here directly.

reg_ is the regularization loss: loss = train_loss + lamb * reg_. For continual learning, lamb = 0.0, so loss = train_loss.
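A minimal sketch of that objective, assuming a model that exposes a regularization_loss() method in the style of efficient-kan (adapt the call to whatever your implementation provides):

```python
import torch
import torch.nn.functional as F

def total_loss(model, pred, target, lamb=0.0):
    """loss = train_loss + lamb * reg_; with lamb = 0.0 it reduces to the plain training loss."""
    train_loss = F.mse_loss(pred, target)
    reg_ = model.regularization_loss() if lamb != 0.0 else pred.new_zeros(())
    return train_loss + lamb * reg_
```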

@Indoxer

Indoxer commented May 13, 2024

Here are my results and code, so you can compare

@Blealtan
Owner

AFAIK the only difference is that the "efficient" regularization loss is different from the official one. But I'm not sure whether the parallel associativity will introduce numerical error large enough to break some important features.
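For context, here is a rough, illustrative sketch of the two regularization styles being contrasted: the official KAN penalizes the mean absolute spline activation over the input samples (plus an entropy term), while efficient-kan approximates this with an L1/entropy penalty on the spline coefficients themselves, avoiding the need to materialize per-sample activations. Tensor shapes and names below are assumptions, not library code.

```python
import torch

def activation_based_reg(spline_acts):
    """Official-style: spline_acts has shape (batch, out_features, in_features)."""
    l1 = spline_acts.abs().mean(dim=0)          # mean |activation| per edge
    p = l1 / (l1.sum() + 1e-12)
    entropy = -(p * (p + 1e-12).log()).sum()
    return l1.sum() + entropy

def weight_based_reg(spline_weight):
    """Efficient-kan-style approximation: spline_weight has shape (out_features, in_features, n_coeffs)."""
    l1 = spline_weight.abs().mean(dim=-1)       # mean |coefficient| per edge
    p = l1 / (l1.sum() + 1e-12)
    entropy = -(p * (p + 1e-12).log()).sum()
    return l1.sum() + entropy
```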

@Blealtan
Copy link
Owner

Blealtan commented May 20, 2024

Just found that I missed the bias term after each layer. Will update that soon.

I scanned over this long thread a few days ago and totally missed the comment by @Indoxer lol

Repository owner locked and limited conversation to collaborators May 20, 2024
@Blealtan Blealtan converted this issue into discussion #35 May 20, 2024

This issue was moved to a discussion. You can continue the conversation there.
