Developed a music generation deep learning model using WGAN-GP and self-attention, aimed at creating melodic compositions.

From Bytes to Beats: Music Generation Using WGAN-GP and Self-Attention Mechanism

This repository implements a music generation model based on a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), extended with a self-attention mechanism, to create melodic compositions. The project explores what GANs can achieve in music generation. Trained on the Lakh Pianoroll Dataset, the model aims to generate coherent musical sequences.

Model Architecture

Generator Network (GenConvNet)

  • Input: Random noise vector.
  • Layers:
    • Six transposed convolutional layers.
    • Batch normalization and PReLU activation.
    • Self-attention mechanism after the fourth transposed convolutional layer.
    • Final layer uses a sigmoid activation function, so every output value lies between 0 and 1.
  • Output: Generated music sample with the same dimensions as the real training samples.
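To show how six transposed convolutions can grow a noise vector into a full-size sample, the sketch below traces the spatial size layer by layer using the output-size formula documented for PyTorch's `nn.ConvTranspose2d`. The layer settings (a 1 × 1 starting map built from the noise vector, then kernel 4, stride 2, padding 1) and the 128 × 128 target (for example, 128 time steps by 128 MIDI pitches) are hypothetical choices for illustration, not values taken from this repository.

```python
# Hypothetical generator layer settings: (kernel_size, stride, padding, output_padding).
# These are illustrative assumptions, not the repository's actual values.
GEN_LAYERS = [(4, 1, 0, 0)] + [(4, 2, 1, 0)] * 5  # six transposed-conv layers


def conv_transpose_out(size, kernel, stride, padding, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial axis
    (the formula PyTorch documents for nn.ConvTranspose2d)."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def generator_sizes(start=1, layers=GEN_LAYERS):
    """Spatial size before the first layer and after each layer."""
    sizes = [start]
    for kernel, stride, padding, out_pad in layers:
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding, out_pad))
    return sizes


if __name__ == "__main__":
    sizes = generator_sizes()
    print(sizes)  # [1, 4, 8, 16, 32, 64, 128]
    # sizes[4] is the output of the fourth layer, where self-attention is applied.
    print("self-attention input:", sizes[4], "x", sizes[4])
```

Under these assumptions the first layer expands 1 × 1 to 4 × 4 and each later layer doubles the size, so the final sigmoid layer emits a 128 × 128 map and the self-attention block sees a 32 × 32 feature map.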

Discriminator Network (DiscConvNet)

  • Input: Music sample (either real or generated).
  • Layers:
    • Five convolutional layers with PReLU activation and batch normalization.
    • Self-attention mechanism after the third convolutional layer.
    • Dropout layers for regularization.
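    • Final linear layer that maps the flattened features to a single score.
  • Output: Scalar critic score indicating how realistic the input sample appears; higher scores mean the critic judges it closer to real data.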
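The critic works in the opposite direction, shrinking the sample until a final linear layer can reduce it to one score. This sketch applies the output-size formula documented for PyTorch's `nn.Conv2d` to five hypothetical layers (kernel 4, stride 2, padding 1) on an assumed 128 × 128 input, and counts the inputs of the final linear layer for an assumed 512 channels. None of these numbers come from the repository.

```python
# Hypothetical critic layer settings: (kernel_size, stride, padding).
DISC_LAYERS = [(4, 2, 1)] * 5  # five conv layers, each halving the spatial size


def conv_out(size, kernel, stride, padding, dilation=1):
    """Output length of a standard convolution along one axis
    (the formula PyTorch documents for nn.Conv2d)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def critic_feature_sizes(start=128, layers=DISC_LAYERS):
    """Spatial size of the input and after each conv layer."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(conv_out(sizes[-1], kernel, stride, padding))
    return sizes


def linear_in_features(final_size, channels):
    """Number of inputs to the final linear layer after flattening."""
    return channels * final_size * final_size


if __name__ == "__main__":
    sizes = critic_feature_sizes()
    print(sizes)  # [128, 64, 32, 16, 8, 4]
    # sizes[3] is the output of the third layer, where self-attention is applied.
    print("self-attention positions:", sizes[3] * sizes[3])  # 256
    print("linear layer inputs:", linear_in_features(sizes[-1], channels=512))  # 8192
```

With these settings the self-attention block after the third layer attends over a 16 × 16 map (256 positions), which keeps the N × N attention matrix small.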

Self-Attention Module

  • Implemented as a separate SelfAttention class.
  • Utilizes query, key, and value convolutions.
  • Applies a softmax over query-key similarity scores to obtain attention weights, and scales the attended output by a learnable parameter gamma.
  • Lets every position attend to every other position, capturing long-range dependencies that local convolutions alone miss.
  • Intended to improve the coherence of the generated music.
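The query/key/value mechanism can be sketched in plain Python on a tiny flattened feature map. This follows the SAGAN-style layout (1 × 1 projections, softmax over positions, output `gamma * attended + input`); the residual connection and the weight shapes are assumptions, since the README only names the query, key, and value convolutions and the gamma scale. In SAGAN, gamma starts at 0, so the block initially passes its input through unchanged.

```python
import math


def matvec(weights, x):
    """Apply a 1x1 convolution at a single position: weights is (out_ch x in_ch)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features, w_query, w_key, w_value, gamma):
    """SAGAN-style self-attention over N positions of a flattened feature map.

    features: N vectors with C channels each.
    w_query, w_key: (C' x C) projections; w_value: (C x C) projection.
    Returns (outputs, attention), with outputs[i] = gamma * attended[i] + features[i].
    """
    queries = [matvec(w_query, x) for x in features]
    keys = [matvec(w_key, x) for x in features]
    values = [matvec(w_value, x) for x in features]
    channels = len(values[0])

    attention, outputs = [], []
    for i, q in enumerate(queries):
        # Row i: how strongly position i attends to every position j (sums to 1).
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        attention.append(weights)
        # Weighted sum of the value vectors over all positions.
        attended = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(channels)]
        # Residual connection scaled by the learnable gamma.
        outputs.append([gamma * a + x for a, x in zip(attended, features[i])])
    return outputs, attention


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, 2 channels
    out, attn = self_attention(feats, [[1.0, 0.5]], [[0.5, 1.0]],
                               [[1.0, 0.0], [0.0, 1.0]], gamma=0.5)
    for row in attn:
        print([round(w, 3) for w in row])
```

Because every position is compared with every other one, the cost grows with N², which is one reason such blocks are usually placed at intermediate resolutions rather than on the full-size output.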


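WGAN-GP Loss

  • Uses the Wasserstein distance as the training objective, which gives more stable GAN training than the original minimax loss.
  • Adds a gradient penalty term that softly enforces the 1-Lipschitz constraint on the critic, in place of the weight clipping used by the original WGAN.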
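For reference, the standard WGAN-GP losses from Gulrajani et al. (2017), "Improved Training of Wasserstein GANs", are given below. Here D is the critic, G the generator, P_r the data distribution, x̃ = G(z) a generated sample drawn from P_g, and x̂ a random interpolate between a real and a generated sample. The penalty weight λ = 10 is the value used in that paper; the coefficient used in this repository is not stated.

$$
\begin{aligned}
L_D &= \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big] + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
L_G &= -\mathbb{E}_{z}\big[D(G(z))\big], \qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim \mathcal{U}[0, 1].
\end{aligned}
$$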

Training Details

  • Wasserstein loss with gradient penalty for stable training.
  • Separate Adam optimizers for the generator and the discriminator.
  • A step learning-rate scheduler for each network, which multiplies the learning rate by a fixed factor at fixed epoch intervals.
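The loss terms and the learning-rate schedule can be sketched in plain Python. This is a hypothetical illustration, not the repository's code: the critic is any Python function of a flat vector, and the gradient inside the penalty is estimated with central finite differences because plain Python has no autograd (in PyTorch this is normally done with `torch.autograd.grad` and `create_graph=True`). `step_lr` reproduces the closed-form rule of a step scheduler such as `torch.optim.lr_scheduler.StepLR`; the values of `lam`, `step_size`, and `gamma` are assumptions.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def critic_loss(real_scores, fake_scores, gp, lam=10.0):
    """WGAN-GP critic loss: E[D(fake)] - E[D(real)] + lambda * GP."""
    return _mean(fake_scores) - _mean(real_scores) + lam * gp


def generator_loss(fake_scores):
    """The generator tries to raise the critic's score on generated samples."""
    return -_mean(fake_scores)


def numerical_grad(f, x, h=1e-5):
    """Central finite-difference gradient of a scalar function f at vector x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def gradient_penalty(critic, real_batch, fake_batch, rng=random):
    """Mean of (||grad D(x_hat)||_2 - 1)^2 over random real/fake interpolates."""
    total = 0.0
    for real, fake in zip(real_batch, fake_batch):
        eps = rng.random()  # one interpolation coefficient per sample
        x_hat = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
        norm = math.sqrt(sum(g * g for g in numerical_grad(critic, x_hat)))
        total += (norm - 1.0) ** 2
    return total / len(real_batch)


def step_lr(base_lr, epoch, step_size, gamma):
    """Step schedule: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    def critic(x):
        # Toy linear critic D(x) = 3*x0 + 4*x1 (gradient norm 5 everywhere).
        return 3.0 * x[0] + 4.0 * x[1]

    real = [[1.0, 0.0], [0.5, 0.5]]
    fake = [[0.0, 1.0], [0.2, 0.1]]
    gp = gradient_penalty(critic, real, fake, rng=random.Random(0))
    d_loss = critic_loss([critic(x) for x in real], [critic(x) for x in fake], gp)
    print(round(gp, 6), round(d_loss, 6))
    print([step_lr(1e-3, e, step_size=10, gamma=0.5) for e in (0, 10, 20)])
```

With the toy linear critic the gradient norm is 5 at every point, so the penalty is (5 − 1)² = 16; a critic whose weight vector has unit norm incurs no penalty, which is exactly the 1-Lipschitz behaviour the term pushes toward.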


Dataset

  • Lakh Pianoroll Dataset (LPD): a collection of multitrack pianorolls derived from the Lakh MIDI Dataset, well suited to training music generation models.
  • Dataset details: Lakh Pianoroll Dataset
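A pianoroll is a time-by-pitch matrix in which each cell records whether a pitch is sounding at a given time step, typically spanning the 128 MIDI pitches. The hypothetical helper below builds a binary (note on/off) pianoroll from hand-written note events, only to show the representation the model learns from; loading the actual LPD files is normally done with the `pypianoroll` package, and the notes and step counts here are made up.

```python
NUM_PITCHES = 128  # MIDI pitch numbers 0-127


def notes_to_pianoroll(notes, num_steps, num_pitches=NUM_PITCHES):
    """Build a binary pianoroll (time steps x pitches) from note events.

    notes: iterable of (pitch, start_step, end_step), with end_step exclusive.
    """
    roll = [[0] * num_pitches for _ in range(num_steps)]
    for pitch, start, end in notes:
        if not 0 <= pitch < num_pitches:
            raise ValueError(f"pitch {pitch} out of range")
        for t in range(max(start, 0), min(end, num_steps)):
            roll[t][pitch] = 1  # pitch is sounding at time step t
    return roll


if __name__ == "__main__":
    # Hypothetical C major arpeggio (C4=60, E4=64, G4=67), then the full chord.
    notes = [(60, 0, 4), (64, 4, 8), (67, 8, 12),
             (60, 12, 16), (64, 12, 16), (67, 12, 16)]
    roll = notes_to_pianoroll(notes, num_steps=16)
    print(len(roll), len(roll[0]))  # 16 128
    print([p for p in range(NUM_PITCHES) if roll[12][p]])  # [60, 64, 67]
```

Since the cells are 0 or 1, a generator whose final layer is a sigmoid produces values in the same range, which can then be thresholded into notes.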

