Speech Translation using Coqui.ai on Intel Arc GPU 770 takes 23 seconds, compared to 3 seconds on CPU. Why? #620
Comments
What's your TTS version?
These are my measured times, calling `wavs = synthesizer.tts(text, speaker_name=speaker_name)` repeatedly several times.
You can warm the model up by running inference several times first. Version:
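The warm-up advice above can be sketched with a small timing helper. This is a generic sketch, not part of Coqui TTS or IPEX; the `synthesizer` and `text` names in the usage comment refer to the script quoted later in this thread.

```python
import time

def timed(fn, warmup=3, iters=5):
    """Average latency of fn after a warm-up phase.

    The first GPU calls pay one-time costs (kernel compilation, memory
    allocation), so steady-state timings are only meaningful after warm-up.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical usage with the synthesizer from this thread:
# avg = timed(lambda: synthesizer.tts(text, speaker_name=speaker_name))
# print(f"average latency: {avg:.2f} s")
```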
@feng-intel: Can you please share the code you used to run this?
text = "Alright, listen up. Tyres are still a bit cold, but they're getting there. Keep the pace steady and focus on getting them up to temp. We need those pressures closer to 30 psi, so keep an eye on that. Once the tyres are ready, we'll be good to go. Now get out there and give it everything you've got" ['Alright, listen up.', "Tyres are still a bit cold, but they're getting there.", 'Keep the pace steady and focus on getting them up to temp.', 'We need those pressures closer to 30 psi, so keep an eye on that.', "Once the tyres are ready, we'll be good to go.", "Now get out there and give it everything you've got"]
`(tts) spandey2@imu-nex-nuc13x2-arc770-dut:~/tts$ pip list | grep torch`
You can follow this page to install the latest Intel Extension for PyTorch. This is my version:
`pip install torchaudio==2.1.0.post2+cxx11.abi --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/`
`(tts) spandey2@imu-nex-nuc13x2-arc770-dut:`
This is from an Intel employee. Let's discuss internally for more info.
Describe the issue
I am trying to run speech synthesis with TTS:
https://docs.coqui.ai/en/latest/
I have managed to run the TTS code below on XPU, but it takes 23 seconds, while the same run takes 3 seconds on CPU.
Can we please check what the issue is on XPU?
```
(tts) spandey2@imu-nex-nuc13x2-arc770-dut:~/tts$ pip list | grep torch
intel-extension-for-pytorch  2.1.10+xpu
torch                        2.1.0a0+cxx11.abi
torchaudio                   2.1.0a0+cxx11.abi
torchvision                  0.16.0a0+cxx11.abi
```
`(tts) spandey2@imu-nex-nuc13x2-arc770-dut:~/tts$ cat andrej_code_tts.py`

```python
# In case of proxies, remove .intel.com from no_proxy:
import os

# IPEX install (run once; kept commented out):
#import subprocess
#subprocess.run(["python", "-m", "pip", "install", "torch==2.1.0.post2", "torchvision==0.16.0.post2", "torchaudio==2.1.0.post2",
#                "intel-extension-for-pytorch==2.1.30+xpu", "oneccl_bind_pt==2.1.300+xpu",
#                "--extra-index-url", "https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"])

# TTS dependency. Install it in the terminal:
#   sudo apt install espeak-ng

# Let's also check that Python can see the device
import torch
import intel_extension_for_pytorch as ipex
print(torch.__version__)
print(ipex.__version__)
[print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())]

from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer
#from IPython.display import Audio
import numpy as np
import soundfile as sf

model_manager = ModelManager()
model_path, config_path, model_item = model_manager.download_model("tts_models/en/vctk/vits")
synthesizer = Synthesizer(model_path, config_path, use_cuda=False)

# Move the model to GPU and optimize it using IPEX
synthesizer.tts_model.to('xpu')
synthesizer.tts_model.eval()  # Set the model to evaluation mode for inference
synthesizer.tts_model = ipex.optimize(synthesizer.tts_model, dtype=torch.float32)

speaker_manager = synthesizer.tts_model.speaker_manager
speaker_names = list(speaker_manager.name_to_id.keys())
print("Available speaker names:", speaker_names)
speaker_name = "p229"  # Replace with the actual speaker name you want to use

text = "Your last lap time was 117.547 seconds. That's a bit slower than your best, but you're still doing well. Keep pushing, a really good lap is around 100 seconds. You've got this, let's keep improving."

# Run inference with autocast for potential mixed precision
with torch.no_grad(), torch.xpu.amp.autocast(enabled=False):
    wavs = synthesizer.tts(text, speaker_name=speaker_name)

if isinstance(wavs, list):
    # Convert each NumPy array or scalar in the list to a PyTorch tensor
    tensor_list = [torch.tensor(wav, dtype=torch.float32).unsqueeze(0) if np.isscalar(wav) else torch.tensor(wav, dtype=torch.float32) for wav in wavs]
    # Concatenate the tensor list into a single tensor
    wav_concatenated = torch.cat(tensor_list, dim=0)
else:
    # If 'wavs' is already a tensor, use it directly
    wav_concatenated = wavs

# Move the tensor to CPU and convert to NumPy array
wav_concatenated = wav_concatenated.cpu().numpy()

# Save the output to a WAV file
output_path = "output_vctk_vits.wav"
sf.write(output_path, wav_concatenated, synthesizer.tts_config.audio['sample_rate'])
```
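One caveat when comparing the CPU and XPU numbers from the script above: GPU-style backends execute asynchronously, so a plain wall-clock measurement should synchronize the device before stopping the timer. Below is a generic sketch, not part of Coqui TTS or IPEX; `torch.xpu.synchronize` is the synchronization call Intel Extension for PyTorch provides for the XPU device.

```python
import time

def measure(fn, synchronize=None, warmup=2, iters=5):
    """Average wall-clock latency of fn, with an optional device sync.

    For XPU (or CUDA) runs, pass a synchronize callable such as
    torch.xpu.synchronize so queued kernels finish before the clock stops;
    asynchronous execution can otherwise misattribute GPU time.
    """
    for _ in range(warmup):
        fn()
        if synchronize:
            synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
        if synchronize:
            synchronize()
    return (time.perf_counter() - start) / iters

# Hypothetical usage, assuming the script above has already run:
# avg = measure(lambda: synthesizer.tts(text, speaker_name=speaker_name),
#               synchronize=torch.xpu.synchronize)
```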