
I want to load a trained model and run prediction on a single audio file #205

Open
youngchannelforyou opened this issue Aug 1, 2023 · 1 comment

Comments


youngchannelforyou commented Aug 1, 2023

❓ Questions & Help

I would like to load a trained model and run prediction on a single audio file. The functions in the package seem to be designed for evaluating large datasets. I have been piecing something together while reading the code, but I keep running into errors, so I am asking here. Is there a way to test with just one wav file and see the predicted transcript? Which part of the package should I look at? Sorry for the trouble.

Details

Here is the source code.

import torch
import torchaudio
import torchaudio.transforms as T
from omegaconf import DictConfig

from openspeech.models import MODEL_REGISTRY
from openspeech.tokenizers import TOKENIZER_REGISTRY


def hydra_main(configs: DictConfig) -> None:
    use_cuda = configs.eval.use_cuda and torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    tokenizer = TOKENIZER_REGISTRY[configs.tokenizer.unit](configs)

    model = MODEL_REGISTRY[configs.model.model_name]
    model = model.load_from_checkpoint(
        configs.eval.checkpoint_path, configs=configs, tokenizer=tokenizer
    )
    model.to(device)

    audio_path = "/home/net/바탕화면/tool/sample/test.wav"
    waveform, sample_rate = torchaudio.load(audio_path)

    # Process audio
    mel_transform = T.MelSpectrogram(sample_rate=sample_rate, n_mels=80)
    mel_spectrogram = mel_transform(waveform)
    input_tensor = mel_spectrogram.unsqueeze(0)

    # Compute input_lengths
    input_lengths = torch.tensor([mel_spectrogram.shape[1]])

    # Run inference
    with torch.no_grad():
        input_tensor = input_tensor.transpose(
            1, 0
        )  # (batch size, sequence length, feature dim) -> (sequence length, batch size, feature dim)
        outputs = model(input_tensor, input_lengths=input_lengths)

    # Convert predicted tokens to text using decoder
    predicted_tokens = outputs["predictions"][0].argmax(dim=-1).tolist()
    predicted_sentence = tokenizer.decode(predicted_tokens)

    # Print predicted sentence
    print(f"Predicted Sentence: {predicted_sentence}")

Here is the error output.
Traceback (most recent call last):
  File "/home/net/바탕화면/tool/tool_stt_test.py", line 147, in
    hydra_main(configs)
  File "/home/net/바탕화면/tool/tool_stt_test.py", line 45, in hydra_main
    outputs = model(input_tensor, input_lengths=input_lengths)
  File "/home/net/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/net/바탕화면/supspeaker/tool/openspeech/models/openspeech_encoder_decoder_model.py", line 136, in forward
    encoder_outputs, encoder_logits, encoder_output_lengths = self.encoder(inputs, input_lengths)
  File "/home/net/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/net/바탕화면/supspeaker/tool/openspeech/encoders/lstm_encoder.py", line 121, in forward
    conv_outputs = nn.utils.rnn.pack_padded_sequence(inputs.transpose(0, 1), input_lengths.cpu())
  File "/home/net/.local/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 263, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: Expected len(lengths) to be equal to batch_size, but got 1 (batch_size=2)


DevTae commented Oct 30, 2023

@youngchannelforyou There is code covering exactly this in issue #162!
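
For reference, the batch_size=2 in the traceback most likely comes from the wav file being stereo: torchaudio.load returns a (channels, samples) tensor, so the two channels end up being treated as the batch dimension, while input_lengths is built from mel_spectrogram.shape[1], which is the number of mel bins (80) rather than the number of frames. Below is a minimal single-file inference sketch under those assumptions; it presumes the model's forward takes batch-first (batch, frames, n_mels) inputs and that outputs["predictions"] can be decoded as in the snippet above. The helper predict_single_file is illustrative, not part of OpenSpeech.

import torch
import torchaudio
import torchaudio.transforms as T


def predict_single_file(model, tokenizer, audio_path: str, device: str = "cpu") -> str:
    waveform, sample_rate = torchaudio.load(audio_path)

    # Mix down to mono; a stereo file would otherwise contribute a channel
    # dimension of 2 that later gets interpreted as the batch size.
    waveform = waveform.mean(dim=0, keepdim=True)  # (1, samples)

    mel = T.MelSpectrogram(sample_rate=sample_rate, n_mels=80)(waveform)  # (1, 80, frames)
    mel = mel.squeeze(0).transpose(0, 1)  # (frames, 80)

    inputs = mel.unsqueeze(0).to(device)  # (1, frames, 80), batch-first
    input_lengths = torch.tensor([mel.size(0)])  # number of frames, not n_mels

    model.eval()
    with torch.no_grad():
        outputs = model(inputs, input_lengths=input_lengths)

    # Decoding outputs["predictions"] this way follows the original snippet
    # and is an assumption about the model's output format.
    predicted_tokens = outputs["predictions"][0].argmax(dim=-1).tolist()
    return tokenizer.decode(predicted_tokens)

The feature extraction (mel settings, normalization) has to match what the checkpoint was trained with, so the code in issue #162 is the better reference for that part.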
