Recently, I came across a YouTube video featuring voice replacement. Although the replacement effect is not very good., the separation of voice effect is still good, so I decided to save it. Since this is a GUI-based program, I can run this in docker via VNC。
About running GPU acceleration in NVIDIA Docker, after multiple attempts, I found that using Nvidia docker nvcr.io/nvidia/pytorch 22.12-py3 is working, while 23.08-py3 does not work. I haven’t tried other versions. In addition to running the GPU acceleration, you also need to install a desktop environment and VNC server in Docker. I will explain this later.
Basic Environment Setup
Some basic environment (such as anaconda and shared scripts) have already been set up in【 Shared Operations 】 article, and you can refer to it to ensure that all instructions work correctly.
Create conda environment
Since the dependencies of each project are different, a separate environment will be created for each case here.
1 2 |
conda create -n uvr python=3.8 conda activate uvr |
UVR – Ultimate Vocal Remover
UVR The project provides the function to remove the human voice and other background sounds, which is the main part of this article to be described.
Install the following package.
1 2 3 4 5 |
git clone https://github.com/Anjok07/ultimatevocalremovergui cd ultimatevocalremovergui # in requirements.txt, remove DORA if error occur pip3 install -r requirements.txt echo "conda activate uvr" > env.sh |
Start UVR and model download
Enter GUI environment and start UVR program
1 |
python UVR.py |
Click the “Start Processing” left-side icon to start downloading the model.First, download the VR model
Next, download the Demusc model
Speech separation
Afterward, go back to the main screen, select the music file you want to separate, and select the model.
Choose the input file and output directory. Choose the model DEMUCS, method is UVR_MODEL_1, open the GPU acceleration, and then you can start the speech separation process. The separation will generate two files, one for the instrument and one for the voice.
Further separation
Next, you need to remove some echoes from the voice part. Use the following figure and method.
Installation inside Docker
Here is the construction process of the environment in the NVIDIA Docker image “nvcr.io/nvidia/pytorch:22.12-py3”, with the same installation steps as before. It should be noted that you need to use tigervnc-standalone-server and not tightvnc because it does not support xrandr.
1 2 3 4 5 |
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it -p 5901:5901 nvcr.io/nvidia/pytorch:22.12-py3 apt-get update apt-get install xfce4 xfce4-goodies xorg dbus-x11 x11-xserver-utils tigervnc-standalone-server net-tools vim firefox sudo python3-tk mkdir ~/.vnc vim ~/.vnc/xstartup |
In ~/.vnc/xstartup, enter the following content
1 2 3 4 5 6 7 8 |
#!/bin/sh unset SESSION_MANAGER unset DBUS_SESSION_BUS_ADDRESS startxfce4 & [ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup [ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources xsetroot -solid grey |
Then run the command to start the VNC Server
1 2 |
chmod a+x ~/.vnc/xstartup vncserver :1 -localhost no |
Thus, you can use your VNC client to connect to the docker.
Conclusion
This is the prerequisite part of the voice replacement process, where the voices are separated first. Since the results of the voice replacement are not very good, it may not be necessary to explain it in detail.