RVC's Realtime AI Voice Changer - Is It Any Good?

Published 2024-03-03

Today, you will learn how to use RVC's free AI Voice Changer - FREE & Realtime! Transform your voice into your favorite YouTuber, VTuber, Anime Character, and more! We'll also talk about whether this is better than W-Okada's voice changer or whether you should just stick with that one.

Go to this link to install the Voice Changer:
github.com/RVC-Project/Retrieval-based-Voice-Conve…

How to get it to work with Discord and other apps:
   • How to Change Your Voice in Realtime ...

How to find your own models:
   • Where To Find AI Voice Models? (RIP A...

How to train your own model:
   • Video

W-Okada's Voice Changer:
   • How to Sound Like an Anime Girl With ...

Search the most complete list of AI Tools, also available in 中文, español, 日本語:
ai-search.io/

DISCLAIMER:
Please do not use these models for malicious, harmful, or deceitful things. Please use them to have fun and experience this new technological age.

~~~~~~~~~~~~Timecodes~~~~~~~~~~~~
Intro - 0:00
Installation Tutorial - 0:23
Using the Software - 3:42
Is it better than W-Okada? - 9:44
Wrapping up - 10:59
~~~~~~~~~~~~Timecodes~~~~~~~~~~~~

Here's our equipment, in case you're wondering:

GPU: RTX 4080 amzn.to/3OCOJ8e
Secondary GPU: GTX 1080 (too old, would not recommend)
Mic: Shure SM7B amzn.to/3DErjt1
Secondary mic: Maono PD400x amzn.to/3Klhwvu
CPU: i9 11900K amzn.to/3KmYs0b

If you found this helpful, consider supporting me here. Hopefully I can turn this from a side-hustle into a full-time thing!
ko-fi.com/aisearch

All Comments (21)

@MarioManTV 4 months ago

I've been experimenting with this for a bit, and I'm disappointed by how vague and incomplete the English documentation on these settings is. In an effort to remedy this, here's my breakdown of each setting:
Response threshold: Controls the noise gate. Any sound below the threshold is suppressed. This is used to prevent background noise and hiss from being turned into strange mumbling. Equivalent to "S. Threshold" in w-okada. Not applicable in RVC WebUI.
Pitch settings: Applies a pitch offset to your input voice. Every multiple of 12 raises or lowers the voice by an octave. Adjustments by 1 raise or lower it by a semitone. Whole-octave shifts are mainly used to ensure you can sing in the same key. Equivalent to "TUNE" in w-okada. Equivalent to "Transpose" in RVC WebUI.
Index rate: When an index file is provided, this slider augments the target voice by preserving more of its accent and less of the input voice (to reduce tone leakage). This is particularly useful for voices trained with a low epoch count (around 200 or fewer). If set too high, it can cause strange pronunciation artifacts. I usually find something around 0.30 to sound good, but it varies by voice model. Equivalent to "INDEX" in w-okada. Equivalent to "Search feature ratio" in RVC WebUI.
Loudness factor: Controls how little of the input performance's loudness is preserved. At 0, the loudness of the cloned voice should match the loudness of the input voice. At 1, the cloned voice will always be at full loudness. 0 is useful if you want to distinguish between whispers, talking, screaming, etc. 1 is useful to have the cloned voice always speak loudly and clearly, as loud as the loudest things it was trained on (which can have artifacts such as mic clipping depending on the training set). Values in between provide partial volume control, biased toward being louder the closer you get to 1. There is no equivalent in w-okada. Equivalent to "volume envelope scaling" in RVC WebUI.
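To make the pitch and loudness settings above concrete, here is a minimal Python sketch. It is not taken from RVC's source: the function names are made up for illustration, the pitch mapping assumes standard equal temperament (12 semitones per octave), and the loudness blend assumes a single gain computed from whole-chunk RMS, which is likely a simplification of what the tool does internally.

```python
import math


def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch offset in semitones (equal temperament)."""
    return 2.0 ** (semitones / 12.0)


def rms(samples):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_loudness_factor(input_chunk, converted_chunk, factor):
    """Blend the converted chunk's loudness toward the input's loudness.

    Assumed behaviour (hypothetical, for illustration only):
      factor = 0 -> output RMS matches the input RMS
      factor = 1 -> converted chunk is left at its own loudness
    """
    eps = 1e-9  # avoids division by zero on a silent converted chunk
    gain = (rms(input_chunk) / (rms(converted_chunk) + eps)) ** (1.0 - factor)
    return [s * gain for s in converted_chunk]
```

Under these assumptions, a pitch setting of 12 doubles the frequency and -12 halves it, while a loudness factor of 0 rescales the converted chunk to the input's level and a factor of 1 leaves it untouched.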
Pitch detection algorithm: Different algorithms are better at different things. rmvpe is the current state-of-the-art and works fastest and usually with the highest quality. Equivalent to "F0 Det." in w-okada. Equivalent to "pitch extraction algorithm" in RVC WebUI.
Sample length: The realtime voice changer works by sending small chunks of audio for quick conversion, then stitching them together. Longer sample lengths feed in longer chunks, making the stitches less obvious and reducing GPU requirements but increasing output latency. On a low end GPU, setting this too low will make the GPU unable to keep up and produces stutters. On a high end GPU, setting this too low will cause warbling as an artifact of stitching many overly-short chunks together. Equivalent to "CHUNK" in w-okada. Not applicable in RVC WebUI.
Number of CPUs: Self explanatory. Note, however, that rmvpe is a GPU-based pitch extractor and should be relatively unaffected by this setting. There is no equivalent in w-okada. Not applicable in RVC WebUI.
Fade length: The length between chunks to crossfade together. Longer may reduce warbling. Equivalent to "overlap" in w-okada advanced settings. Not applicable in RVC WebUI.
Extra inference time: How much old audio to load into each chunk. The extra context usually improves voice quality for the generated chunk but is more demanding for the GPU. Equivalent to "EXTRA" in w-okada. Not applicable in RVC WebUI.
Input noise reduction: Attempts to remove non-speech background noise from the input to prevent sounds from being turned into strange mumbling. Equivalent to "NOISE" in w-okada. Not applicable in RVC WebUI.
Output noise reduction: Applies the same noise reduction to the output voice. Possibly good for poorly trained voices with lots of background noise. There is no equivalent in w-okada, but the usefulness of this setting is dubious. Not applicable in RVC WebUI.
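The chunk-and-stitch behaviour described above (sample length and fade length) can be sketched roughly as follows. This is an illustration only: `crossfade` and `stitch` are hypothetical helpers, the fade is linear for simplicity, and the real tool may use a different window shape and buffering scheme.

```python
def crossfade(tail, head):
    """Linearly blend the end of the previous chunk into the start of the next.

    tail and head must have the same length; the weight of the new chunk
    ramps up across the overlap.
    """
    n = len(tail)
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming chunk
        out.append(tail[i] * (1.0 - w) + head[i] * w)
    return out


def stitch(chunks, fade_len):
    """Join converted chunks, overlapping each boundary by fade_len samples.

    Assumes every chunk holds at least fade_len samples and that the first
    fade_len samples of a chunk cover the same time span as the last
    fade_len samples of the chunk before it.
    """
    out = list(chunks[0])
    for chunk in chunks[1:]:
        if fade_len > 0:
            out[-fade_len:] = crossfade(out[-fade_len:], chunk[:fade_len])
        out.extend(chunk[fade_len:])
    return out
```

A longer fade spreads each seam over more samples, which is one way to read the comment's note that a longer fade length may reduce warbling.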
Input voice monitor: Lets you hear the voice audio being passed in to the voice changer, sent to the target output device. Useful to ensure you are passing in the audio you actually want or to pass your audio through without voice changing. Comparable to "monitor" settings in w-okada. Not applicable in RVC WebUI.
Output converted voice: Outputs the voice conversion to the target output device.
Main features RVC realtime has that w-okada doesn't:
- Loudness factor controls. W-okada seems to always use a value of 0.
- Significantly lower CPU usage at equivalent performance settings, in my experience.
What RVC realtime is missing compared to w-okada:
- No system to save model presets.
- Input/output gain is missing.
- Input noise reduction is less robust compared to w-okada, which offers echo reduction and multiple noise suppression techniques.
- Unlike w-okada, you cannot pass through to the input mic, instead requiring the use of a virtual audio cable to pass the cloned voice into voice calls and microphone recording programs.
- In w-okada, when the mic loudness falls below the response threshold, the tool is paused until speech is once again loud enough, saving GPU and CPU resources. RVC realtime always passes audio whenever it is running.
- Unlike w-okada, you cannot monitor the cloned voice while outputting it. You can work around this by using the "listen" feature in the Windows sounds panel on a virtual audio cable instead.
- No built-in recording functionality.
- Missing most of the settings in the w-okada "advanced settings" menu.
- No way to choose which GPU to run the voice model on. You can get around this by setting CUDA_VISIBLE_DEVICES=# in a terminal before launching the tool from there, where # is the index of your target GPU (0, 1, 2, etc.).
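The CUDA_VISIBLE_DEVICES workaround above can also be scripted. The sketch below is one possible way to do it from Python instead of a terminal: `env_for_gpu` and `launch_on_gpu` are hypothetical helper names, and the launch command is whatever you normally use to start the tool (it is not specified here).

```python
import os
import subprocess


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment with only the chosen GPU visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_on_gpu(command, gpu_index):
    """Start the given command (a list of arguments) restricted to one GPU.

    The variable has to be set before the child process initialises CUDA,
    which is why it is passed in the environment at launch time.
    """
    return subprocess.Popen(command, env=env_for_gpu(gpu_index))
```

For example, `launch_on_gpu([...your usual launch command...], 1)` would start the tool so that it only sees the GPU with index 1.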
@Hollarite 4 months ago

these ai voices are scarily accurate, even the markiplier one
@DominicFlynn 4 months ago

you can use it in a real time environment like Zoom or Teams. You just need to delay the video the same length as what the AI voice is delayed. Use OBS for that.
@user-fx6ws4uc2h 5 months ago

hey can anyone help me? when i try to load, it's just showing a terminal with two lines, and the gui is not opening
@adad1817 21 days ago

it just won't work (no response / stopped working when I run it). any ideas what I could possibly have done wrong? The code can't even run, it just shows the input and output device and then 'cuda_is_available: True', then nothing more
@The_Spooky_Boi 5 months ago

I rlly do appreciate the effort u put in ur content i rlly do
@rjay3073 1 month ago

Your content can still be easily understood by a non-native English speaker. Thank u <3.🥰
@jimbarrofficial 3 months ago

Question, how do you convert mp3 or wav files to the correct format to use other clones?
@liangzx 5 months ago

Can you recommend a good text to speech AI that i can change the voice using RVC?
@TheDkmariolink 3 months ago

Anyone else having issues with this particular application and working in games? It seems like it does not work in the background?
@valermo3471 4 months ago

Kinda waiting and hoping for ElevenLabs to release a voice changer one day.
@iconsumepizza 2 months ago

it says 'runtime\python.exe' is not recognized as an internal or external command, operable program or batch file.
@aboodymk5179 5 months ago

Thank you for your efforts ❤ Could you clarify if the real-time live voice changer can be connected to platforms like TikTok Live or Messenger calls, apart from Discord?
@le-pink-boi 2 months ago

it says AttributeError: 'RVC' object has no attribute 'tgt_sr' and crashes
@Heldn100 4 months ago

thanks this was so helpful. why was the video on training our own model deleted?
@MrGridStrom 2 months ago

I tried on both Linux and Windows, and the latest releases are not working on either OS. Either I've missed something, or there is something I need to do but don't understand.
@lucodk 6 days ago

when i turn it on i can only hear it on one side of my headphones and it sounds really deepfried😞
@0rrchids 5 months ago

How to uninstall RVC or Okada software if we want to update? is it okay to just delete the folder?
@CharmandrigoGG 21 days ago

Does this create a virtual microphone driver to use with other apps like vrchat?
@smilingchihuahua. 2 months ago

What graphics card is recommended for the RVC voice changer so that it won't be choppy?