
How to use hydra_lm_train.py #216

Open
apg0001 opened this issue Jan 25, 2024 · 0 comments
apg0001 commented Jan 25, 2024

❓ Questions & Help

Thank you for your work. I am a student studying speech recognition, and I have some questions about a problem I ran into while using openspeech.

  1. When I run hydra_lm_train.py, the error below occurs. It says tokenizer is an unexpected keyword, yet if I do not specify a tokenizer it tells me to specify one. Could this be a problem with my environment?
  2. What is the role of hydra_lm_train.py? Does it train a language model on its own, or does it attach a language model to an acoustic model to build a new model? If neither, is there a way to attach a different language model to an acoustic model?
  3. Could I see example code for using hydra_lm_train.py?

Thank you.
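Regarding question 2, one common way to combine an acoustic model with an external language model (not necessarily what hydra_lm_train.py does) is shallow fusion at decoding time: the acoustic model's log-probabilities are interpolated with the LM's. The sketch below is illustrative only; the function name and `lm_weight` are assumptions, not openspeech API.

```python
# Hedged sketch of shallow fusion: at each decoding step, add a weighted
# LM log-probability to the acoustic model's log-probability per token.
# All names here are hypothetical, not openspeech's actual interface.
def shallow_fusion(am_log_probs, lm_log_probs, lm_weight=0.3):
    """Combine per-token log-probs from an acoustic model and an LM."""
    return [a + lm_weight * l for a, l in zip(am_log_probs, lm_log_probs)]

# Example: two candidate tokens, LM weight 0.5.
combined = shallow_fusion([-1.0, -2.0], [-0.5, -3.0], lm_weight=0.5)
# combined == [-1.25, -3.5]
```

In practice the weight is tuned on a development set; the token with the highest combined score is selected (or fed into beam search).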

Details

-- Input
python ./openspeech_cli/hydra_lm_train.py dataset=ksponspeech dataset.dataset_path=C:\Users\lab1080\Desktop\openspeech\KsponSpeech dataset.test_dataset_path=C:\Users\lab1080\Desktop\openspeech\KsponSpeech_eval dataset.test_manifest_dir=C:\Users\lab1080\Desktop\openspeech\KsponSpeech_scripts dataset.manifest_file_path=C:\Users\lab1080\Desktop\openspeech\KSPONSPEECH_AUTO_MANIFEST model=listen_attend_spell lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu criterion=cross_entropy tokenizer=kspon_character
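The command above uses Hydra's `key=value` override syntax, where dotted keys address nested config fields. As a pure-Python illustration (not Hydra's actual implementation), dotted overrides map onto a nested config dict like this:

```python
# Illustrative sketch of Hydra-style dotted overrides applied to a nested
# dict. This mimics the CLI syntax above; it is not Hydra's real code.
def apply_overrides(config, overrides):
    for item in overrides:
        dotted, value = item.split("=", 1)   # e.g. "model.model_name=..."
        keys = dotted.split(".")
        node = config
        for k in keys[:-1]:                  # walk/create intermediate dicts
            node = node.setdefault(k, {})
        node[keys[-1]] = value               # set the leaf value
    return config

cfg = apply_overrides({}, ["dataset=ksponspeech",
                           "model.model_name=listen_attend_spell"])
# cfg == {"dataset": "ksponspeech",
#         "model": {"model_name": "listen_attend_spell"}}
```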

-- Output
[2024-01-25 13:34:14,597][openspeech.utils][INFO] - dataset:
  dataset: ksponspeech
  dataset_path: C:\Users\lab1080\Desktop\openspeech\KsponSpeech
  test_dataset_path: C:\Users\lab1080\Desktop\openspeech\KsponSpeech_eval
  manifest_file_path: C:\Users\lab1080\Desktop\openspeech\KSPONSPEECH_AUTO_MANIFEST
  test_manifest_dir: C:\Users\lab1080\Desktop\openspeech\KsponSpeech_scripts
  preprocess_mode: phonetic
criterion:
  criterion_name: cross_entropy
  reduction: mean
lr_scheduler:
  lr: 0.0001
  scheduler_name: warmup_reduce_lr_on_plateau
  lr_patience: 1
  lr_factor: 0.3
  peak_lr: 0.0001
  init_lr: 1.0e-10
  warmup_steps: 4000
model:
  model_name: listen_attend_spell
  num_encoder_layers: 3
  num_decoder_layers: 2
  hidden_state_dim: 512
  encoder_dropout_p: 0.3
  encoder_bidirectional: true
  rnn_type: lstm
  joint_ctc_attention: false
  max_length: 128
  num_attention_heads: 1
  decoder_dropout_p: 0.2
  decoder_attn_mechanism: dot
  teacher_forcing_ratio: 1.0
  optimizer: adam
trainer:
  seed: 1
  accelerator: dp
  accumulate_grad_batches: 1
  num_workers: 4
  batch_size: 32
  check_val_every_n_epoch: 1
  gradient_clip_val: 5.0
  logger: wandb
  max_epochs: 20
  save_checkpoint_n_steps: 10000
  auto_scale_batch_size: binsearch
  sampler: else
  name: gpu
  device: gpu
  use_cuda: true
  auto_select_gpus: true
tokenizer:
  sos_token: <sos>
  eos_token: <eos>
  pad_token: <pad>
  blank_token: <blank>
  encoding: utf-8
  unit: kspon_character
  vocab_path: ../../../aihub_labels.csv

[2024-01-25 13:34:14,606][openspeech.utils][INFO] - Operating System : Windows 10
[2024-01-25 13:34:14,606][openspeech.utils][INFO] - Processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
[2024-01-25 13:34:14,607][openspeech.utils][INFO] - CUDA is available : False
[2024-01-25 13:34:14,607][openspeech.utils][INFO] - PyTorch version : 1.13.1+cpu
wandb: Currently logged in as: apg0001 (dguyanglab). Use wandb login --relogin to force relogin
wandb: wandb version 0.16.2 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.15.12
wandb: Run data is saved locally in C:\Users\lab1080\Desktop\openspeech\outputs\2024-01-25\13-34-14\wandb\run-20240125_133417-5gndlwut
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run listen_attend_spell-ksponspeech
wandb: View project at https://wandb.ai/dguyanglab/listen_attend_spell-ksponspeech
wandb: View run at https://wandb.ai/dguyanglab/listen_attend_spell-ksponspeech/runs/5gndlwut
Traceback (most recent call last):
File "./openspeech_cli/hydra_lm_train.py", line 45, in hydra_main
data_module.setup(tokenizer=tokenizer)
TypeError: setup() got an unexpected keyword argument 'tokenizer'
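This `TypeError` is plain Python behavior: the script passes `tokenizer=` to a `setup()` whose signature does not declare that parameter. A minimal stand-alone sketch (the `DataModule` class here is a hypothetical stand-in, not the openspeech class):

```python
# Sketch reproducing the TypeError above: Python raises it whenever a
# keyword argument is passed that the callee's signature does not accept.
class DataModule:
    def setup(self, stage=None):  # note: no 'tokenizer' parameter
        self.stage = stage

dm = DataModule()
try:
    dm.setup(tokenizer="kspon_character")
except TypeError as err:
    print(err)  # e.g. "... got an unexpected keyword argument 'tokenizer'"
```

So the likely cause is a signature mismatch between the installed data-module code and the version of the CLI script being run, rather than the CLI arguments themselves.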
