XiplusChenyu/Vocal-Track-Extraction-Deep-Learning

Vocal Track Extraction

Author: Chenyu Xi (cx2219)

Introduction

This project implements four models: a Deep Clustering Model, a Hybrid Deep Clustering Model, a U-net Model, and a UH-net Model. All models are trained on the DSD100 dataset, and the project is built with PyTorch.

Scripts

  • Data preprocessing:

    • Build_Dataset.ipynb: generate dataset from DSD100
    • config.py: define project-level parameters
    • data_loader.py: define torch loader
    • mel_dealer.py: convert audio files to mel spectrograms and convert spectrograms back to audio
  • Model definition:

    • unet_model.py: define U-net Model and UH-net Model
    • cluster_model.py: define Deep Clustering Model
    • hybrid_model.py: define Hybrid Deep Clustering Model
  • Model training:

    • utils.py: define loss functions
    • unet_train.py: training functions for the U-net / UH-net models
    • hd_train.py: training functions for the Hybrid Deep Clustering Model
    • dc_train.py: training functions for the Deep Clustering Model
    • train_dc.ipynb, train_hybrid.ipynb, and train_unet.ipynb: notebooks that run model training
  • Model evaluation:

    • evaluation.py: define evaluation functions
    • music_decoder.py: reconstruct audio files from model outputs

Current Sample Outputs

Audios

Original Music (Vocal Track)
==> Hybrid Deep Clustering Model
==> U-net Model
==> UH-net Model

Masks

  • Masked Power Spectrograms:

  • Generated Masks:
