Incorrect transformer mask size #2344
Hello @egaznep, thanks for opening this issue! Could you please have a look, @TParcollet? Thanks :)
Hello @egaznep, I am not sure I understand the issue here. Could you provide a code snippet showing the error explicitly? The function length_to_mask() is expected to produce masks that account for padding, i.e. the real size of each sequence in the input tensor. Could you detail a bit more what you are trying to achieve?
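For readers unfamiliar with the helper, here is a quick illustration of what length_to_mask returns; when max_len is not given, the mask width defaults to the longest length (this default is what becomes relevant later in the thread):

```python
import torch
from speechbrain.dataio.dataio import length_to_mask

# Three sequences with true (unpadded) lengths 2, 3 and 1.
mask = length_to_mask(torch.tensor([2, 3, 1]))
print(mask.shape)  # torch.Size([3, 3]): width defaults to max(lengths)
# Row i is non-zero exactly at the valid (non-padding) positions of sequence i.
```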
@TParcollet Here is a minimal working (or in this case crashing) example:

```python
import torch
import torch.nn as nn
from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR

# Instantiate the TransformerASR model
model = TransformerASR(
    tgt_vocab=720,
    input_size=80,
    d_model=512,
    nhead=1,
    num_encoder_layers=1,
    num_decoder_layers=1,
)

# Generate some dummy inputs with different lengths
input_lengths = torch.tensor([l for l in range(10, 101, 10)])
input_data = [torch.randn(length, 80) for length in input_lengths]
input_targets = [
    torch.randint(low=0, high=720, size=(length.item(),))
    for length in input_lengths
]

# Pad the sequences to a common length and make the lengths relative
input_data = nn.utils.rnn.pad_sequence(input_data, batch_first=True)
input_targets = nn.utils.rnn.pad_sequence(input_targets, batch_first=True)
input_lengths = input_lengths / 100.0

print(input_data.shape, input_targets.shape, input_lengths.shape)
output = model.forward(input_data, input_targets, wav_len=input_lengths)  # works
output = model.forward(input_data[:-1], input_targets[:-1], wav_len=input_lengths[:-1])  # fails
```

The first call to model.forward (marked `# works`) runs without error. The second call (marked `# fails`), however, raises the mask/data size mismatch described in the bug report below.
Hello, thanks. SpeechBrain padding is relative to the batch, not the dataset: the maximum of wav_lens corresponds to the longest sequence in the batch.
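To make that convention concrete, here is a minimal sketch (illustrative variable names; the round(wav_lens * T) step mirrors what the transformer lobe does per the permalink in the description below) of how relative lengths map back to absolute lengths, and why slicing a pre-padded tensor breaks the assumption:

```python
import torch

# Padded batch: 3 sequences, padded to the batch max of 50 frames.
feats = torch.randn(3, 50, 80)
abs_lens = torch.tensor([30, 40, 50])

# SpeechBrain-style relative lengths: fractions of the batch max.
wav_lens = abs_lens / abs_lens.max()              # tensor([0.6, 0.8, 1.0])

# Recovering absolute lengths inside the model:
recovered = torch.round(wav_lens * feats.shape[1])  # 30, 40, 50 -- matches padding

# If you slice a tensor that was padded against a *larger* batch,
# the longest relative length is < 1.0, so the recovered maximum (40)
# no longer matches the padded dim (50) -> mask/data size mismatch.
sliced = feats[:-1]                                  # still padded to 50 frames
bad = torch.round(wav_lens[:-1] * sliced.shape[1])   # max is 40, not 50
```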
I had this error while training a model using
Hello @egaznep, any news on your side, please?
I was swamped with some projects until now, and I'm out of office this week. I will try to reproduce the issue when I am back, but I suspect it is more likely a problem with that specific project than with SpeechBrain internals. Thank you for reminding me.
Describe the bug
PyTorch's multi_head_attention code enforces that the data size and the mask size match:
https://github.com/pytorch/pytorch/blob/df4e3d9d08f3d5d5439c3626be4bf29659488cdf/torch/nn/functional.py#L5442-L5444
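For reference, a minimal, self-contained sketch of the check in question: passing a key_padding_mask whose time dimension does not match the source length makes nn.MultiheadAttention raise a shape error (the exact message varies by PyTorch version):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=1, batch_first=True)
x = torch.randn(2, 10, 8)                        # (batch, time=10, feat)
bad_mask = torch.zeros(2, 9, dtype=torch.bool)   # time=9 does not match 10

# Raises a shape/assertion error: the mask must cover every time step.
out, _ = mha(x, x, x, key_padding_mask=bad_mask)
```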
However, the speechbrain code generates masks according to the longest wav_len in the batch, which can be shorter than the padded time dimension of the input tensor, resulting in the exception hit in the reproduction above.
speechbrain/speechbrain/lobes/models/transformer/TransformerASR.py, line 146 in f6e297e
Solution:
The function length_to_mask accepts an optional argument max_len, which could be used here. Should I open a PR?
speechbrain/speechbrain/dataio/dataio.py, lines 758 to 772 in f6e297e
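A hedged sketch of what the proposed fix could look like; this is not the actual patch, and the helper name below is illustrative, but the length_to_mask call with max_len mirrors the code linked above:

```python
import torch
from speechbrain.dataio.dataio import length_to_mask

def padding_mask_sketch(src, wav_len):
    """Build a key padding mask whose width always matches src.

    src:     (batch, time, feat) padded features
    wav_len: relative lengths in [0, 1]
    """
    abs_len = torch.round(wav_len * src.shape[1])
    # Passing max_len pins the mask width to the padded time dimension,
    # so it matches what multi_head_attention expects even when the
    # longest sequence in the batch is shorter than the padding.
    return ~length_to_mask(abs_len, max_len=src.shape[1]).bool()
```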
Expected behaviour
The mask should be generated with the size PyTorch expects; instead, an exception is thrown.
To Reproduce
No response
Environment Details
Relevant Log Output
No response
Additional Context
No response