[python-package] How to refit a classifier? #6461

Open
TheLegendAli opened this issue May 19, 2024 · 4 comments

@TheLegendAli

I have time-series data that I need to fit a classifier on, and I would like to retrain it every month as new data comes in. I would like to keep some consistency, so I would prefer to warm-start from the previously boosted trees and refit on the new data, or just run a few more iterations to get the results I need. In other words, I want the tree structure to stay more or less the same.

It seems like the refit function is the best way to do this, but unfortunately it doesn't seem to be available in the scikit-learn API, only on Boosters. What is the best way to approach this? I have done the following so far:

import lightgbm as lgb

# Hyperparameters passed to the scikit-learn wrapper
params = {'objective': 'multiclassova',
 'num_class': 6,
 'boosting_type': 'gbdt',
 'reg_alpha': 0.0,
 'reg_lambda': 0.0,
 'num_leaves': 103,
 'feature_fraction': 0.9311573062675359,
 'bagging_fraction': 0.9568372729883741,
 'bagging_freq': 1,
 'min_child_samples': 54,
 'learning_rate': 0.056834238901176865,
 'max_depth': 99,
 'min_data_in_leaf': 2,
 'min_gain_to_split': 2.272224629201629,
 'drop_rate': 0.6143619198733702,
 'n_estimators': 232,
 'force_col_wise': True,
 'class_weight': {1: 1.2466666666666666,
  5: 0.29088888888888886,
  3: 0.3177184466019417,
  0: 0.6233333333333333,
  4: 1.3357142857142859,
  2: 3.85},
 'seed': 42,
 'random_state': 42,
 'verbose': -1,
 'eval_metric': 'precision'}

# Fit the scikit-learn classifier on the current training data
classifier_obj = lgb.LGBMClassifier(**params)
classifier_obj.fit(X_train, y_train, categorical_feature=categorical_data + binary_data)

# Try to continue from the fitted classifier for one more boosting round
booster = lgb.train(
    params={},
    train_set=data_set['train_data'],
    categorical_feature=categorical_data + binary_data,
    init_model=classifier_obj,
    keep_training_booster=True,
    num_boost_round=1,
)

# Refit the resulting booster on the training data
booster.refit(data=X_train, label=y_train)

I'm sure this is wrong by just looking at the outputs. Can anyone point me in the right direction? Thanks.

jameslamb changed the title from "How to refit a classifier?" to "[python-package] How to refit a classifier?" on May 20, 2024
@jameslamb
Collaborator

jameslamb commented May 26, 2024

Thanks for using LightGBM.

Is it absolutely necessary to "refit" (modify the values of the leaf nodes without changing the total number of trees)? Or would it be acceptable to add more trees, trained on the newly-arrived data? If you clarify that precisely, it would help us to offer some advice.

Please also see this explanation: https://stackoverflow.com/questions/73664093/lightgbm-train-vs-update-vs-refit/73669068#73669068

@jameslamb
Collaborator

jameslamb commented May 26, 2024

Also...I see that you double-posted this here and on Stack Overflow (link). Please do not do that.

Maintainers here also monitor the [lightgbm] tag on Stack Overflow. I could have been spending time preparing an answer here while another maintainer was spending time answering your Stack Overflow post, which would have been a waste of maintainers' limited attention that could otherwise have been spent improving this project. Double-posting also makes it less likely that others with a similar question will find the relevant discussion and answer.

@TheLegendAli
Author

TheLegendAli commented May 26, 2024

Hi James, thanks for the response. It would be best if we can use the same exact "refit", if not I can use the update function?

Also, I literally posted on Stack Overflow about 30 minutes ago. I waited a few days in case this got backlogged and would take longer than expected to get a response. Out of respect to you, I will link this to Stack Overflow. Thanks in advance.

@jameslamb
Collaborator

It would be best if we can use the same exact "refit", if not I can use the update function?

What is preventing you from using Booster.refit()? You showed an example using that and said "this is wrong by just looking at the outputs", but didn't share those outputs or explain what is "wrong" about them.

We would be happy to help but need your help to understand what specifically you are looking for.
