Bark: Text-to-Speech AI Voice Cloning & Text-Prompted Generative Audio

Bark is a revolutionary text-to-audio model created by Suno. Built on GPT-style models, it can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects.

With Bark, users can also produce nonverbal sounds like laughing, sighing, and crying, making it a versatile tool for a variety of applications.

Bark is a cutting-edge text-to-speech (TTS) technology that has taken the AI world by storm. Unlike typical TTS engines that sound robotic and mechanical, Bark offers human-like voices that sound realistic and natural.

Bark uses GPT-style models to generate speech with minimal tweaking, producing expressive, emotive voices that capture nuances such as tone, pitch, and rhythm. The results can leave you wondering whether you're listening to a real person.

Notably, Bark supports multiple languages and can generate speech in Mandarin, French, Italian, Spanish, and other languages with impressive clarity and accuracy. You can even switch languages within a single prompt and still get high-quality output.

Bark is not only intelligent but also intuitive, making it an ideal tool for individuals and businesses looking to create high-quality voice content for their platforms. Whether you’re looking to create podcasts, audiobooks, video game sounds, or any other form of voice content, Bark has you covered.

So, if you’re looking for a revolutionary text-to-speech technology that can elevate your voice content, Bark is the way to go!

The new SERP AI Bark fine-tune:

Multilingual Support

Bark supports various languages out of the box and automatically determines the language from the input text. When prompted with code-switched text, Bark will attempt to use the native accent for each language. English quality is currently the best; other languages are expected to improve with further scaling.

Music Generation

Bark can generate all types of audio, including music; in principle, it does not distinguish between speech and music. Bark sometimes chooses on its own to render text as music, and users can encourage this by adding music notes (♪) around their lyrics.

Voice/Audio Cloning

Bark has the capability to fully clone voices, including tone, pitch, emotion, and prosody. The model also attempts to preserve music, ambient noise, etc., from input audio.

To mitigate misuse of this technology, audio history prompts are limited to a set of Suno-provided, fully synthetic options for each language.

Speaker Prompts

Users can provide certain speaker prompts such as NARRATOR, MAN, WOMAN, etc. However, these prompts are not always respected, especially if a conflicting audio history prompt is given.

Hardware and Inference Speed

Bark has been tested and works on both CPU and GPU (PyTorch 2.0+, CUDA 11.7, and CUDA 12.0). Running Bark means running transformer models with more than 100M parameters. On modern GPUs with PyTorch nightly, Bark can generate audio in roughly real time. On older GPUs, the default Colab runtime, or CPU, inference might be 10–100x slower.
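To show what the 10–100x figure means in practice, the sketch below multiplies a clip's length by a slowdown factor, where 1.0 means roughly real time. The function name and the 10-second clip length are illustrative assumptions. Real timings depend on hardware, model size, and the prompt.

```python
def estimated_generation_seconds(audio_seconds: float, slowdown: float) -> float:
    """Rough wall-clock estimate: clip length times a slowdown factor.

    slowdown = 1.0 means roughly real time (modern GPU); 10-100 is the
    range quoted for older GPUs, the default Colab runtime, or CPU.
    """
    if audio_seconds < 0 or slowdown <= 0:
        raise ValueError("audio_seconds must be >= 0 and slowdown > 0")
    return audio_seconds * slowdown


clip_seconds = 10.0  # illustrative clip length, not a Bark limit
for label, factor in [("roughly real time", 1.0), ("10x slower", 10.0), ("100x slower", 100.0)]:
    print(f"{label}: ~{estimated_generation_seconds(clip_seconds, factor):.0f} s")
```

Under these assumptions, a 10-second clip that takes about 10 seconds on a modern GPU could take roughly 100 to 1,000 seconds on a CPU.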

Details

Bark uses GPT-style models to generate audio from scratch, and the initial text prompt is embedded into high-level semantic tokens without the use of phonemes. This allows Bark to generalize beyond speech to arbitrary instructions that occur in the training data, such as music lyrics, sound effects, or other non-speech sounds.

A second model then converts the generated semantic tokens into audio codec tokens to produce the full waveform. So that the community can use Bark via public code, Facebook's EnCodec codec serves as the audio representation.

Bark Examples

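```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio

# Download and load all models (the first run fetches the weights)
preload_models()

# Generate audio from text; [laughs] is a non-speech cue
text_prompt = """
Hello, my name is Suno. And, uh - and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.
"""
audio_array = generate_audio(text_prompt)

# Play the result in a notebook
Audio(audio_array, rate=SAMPLE_RATE)
```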
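`Audio(...)` only plays the clip inside a notebook. To keep the result, you can write the array to a WAV file. The sketch below uses only Python's standard `wave` and `struct` modules. It assumes the samples are floats in [-1.0, 1.0] and that `SAMPLE_RATE` is 24,000 Hz. When Bark is installed, import `SAMPLE_RATE` from `bark` rather than hard-coding it. `write_wav` is a hypothetical helper name, not part of Bark.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: the value of bark.SAMPLE_RATE


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM.

    `target` may be a file path or a binary file-like object.
    """
    # Clip out-of-range values, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for Bark output: one second of a 440 Hz tone at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
```

With real Bark output, `write_wav("bark_generation.wav", audio_array)` would save the clip. If SciPy is available, `scipy.io.wavfile.write(path, rate, data)` is a shorter route and can store the float array directly.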

Foreign Language

Bark supports various languages out of the box and automatically determines the language from the input text. When prompted with code-switched text, Bark will attempt to use the native accent for each language. English quality is best for the time being, and we expect other languages to improve with further scaling.

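```python
from bark import generate_audio

# Code-switched prompt: Spanish ("Good morning Miguel. Your colleague
# thinks your German is extremely bad.") followed by English
text_prompt = """
Buenos días Miguel. Tu colega piensa que tu alemán es extremadamente malo.
But I suppose your english isn't terrible.
"""
audio_array = generate_audio(text_prompt)
```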

Music

Bark can generate all types of audio and, in principle, doesn't see a difference between speech and music. Sometimes Bark chooses to generate text as music, but you can help it out by adding music notes around your lyrics.

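```python
from bark import generate_audio

# Music notes around the lyrics nudge Bark toward singing
text_prompt = """
♪ In the jungle, the mighty jungle, the lion barks tonight ♪
"""
audio_array = generate_audio(text_prompt)
```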
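If your lyrics come from a file or another program, a small helper can add the music notes consistently. `as_lyrics` is a hypothetical convenience function, not part of Bark, and it only formats the prompt string. As noted above, the notes encourage singing but do not guarantee it.

```python
def as_lyrics(lines: list[str]) -> str:
    """Wrap each non-empty lyric line in ♪ markers for a Bark text prompt."""
    note = "\u266a"  # the ♪ character used in the example prompt
    return "\n".join(f"{note} {line.strip()} {note}" for line in lines if line.strip())


prompt = as_lyrics(["In the jungle, the mighty jungle,", "the lion barks tonight"])
```

You can then pass the result to `generate_audio(prompt)` like any other text prompt.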


Voice Presets and Voice/Audio Cloning

Bark has the capability to fully clone voices, including tone, pitch, emotion, and prosody. The model also attempts to preserve music, ambient noise, etc. from input audio. However, to mitigate misuse of this technology, we limit the audio history prompts to a set of Suno-provided, fully synthetic options to choose from for each language. Specify a preset following the pattern `{lang_code}_speaker_{0-9}`.

```python
from bark import generate_audio

text_prompt = """
I have a silky smooth voice, and today I will tell you about
the exercise regimen of the common sloth.
"""
# Use Suno's synthetic English preset, speaker 1
audio_array = generate_audio(text_prompt, history_prompt="en_speaker_1")
```
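Preset names follow a fixed pattern, so a small helper can build and check them before a long generation run. `speaker_preset` and `SUPPORTED_LANGS` are hypothetical names. The language codes come from this article's Supported Languages table, so the check is only as current as that list.

```python
# Language codes listed in the "Supported Languages" table of this article.
SUPPORTED_LANGS = {"en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh"}


def speaker_preset(lang_code: str, index: int) -> str:
    """Build a preset name following the {lang_code}_speaker_{0-9} pattern."""
    if lang_code not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {lang_code!r}")
    if not 0 <= index <= 9:
        raise ValueError(f"speaker index must be between 0 and 9, got {index}")
    return f"{lang_code}_speaker_{index}"
```

For example, `generate_audio(text_prompt, history_prompt=speaker_preset("de", 3))` would request German preset speaker 3.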


Speaker Prompts

You can provide certain speaker prompts such as NARRATOR, MAN, WOMAN, etc. Please note that these are not always respected, especially if a conflicting audio history prompt is given.

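```python
from bark import generate_audio

# Speaker tags bias the voice, but Bark does not always respect them
text_prompt = """
WOMAN: I would like an oatmilk latte please.
MAN: Wow, that's expensive!
"""
audio_array = generate_audio(text_prompt)
```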
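For longer dialogues, building the tagged prompt from a list of turns keeps the `SPEAKER:` prefixes consistent. `speaker_script` is a hypothetical helper that only formats text. The caveat above still applies: Bark may ignore the tags, especially when given a conflicting `history_prompt`.

```python
def speaker_script(turns: list[tuple[str, str]]) -> str:
    """Join (speaker, line) pairs into 'SPEAKER: line' rows for a Bark prompt."""
    return "\n".join(f"{speaker.strip().upper()}: {line.strip()}" for speaker, line in turns)


dialogue = speaker_script([
    ("woman", "I would like an oatmilk latte please."),
    ("man", "Wow, that's expensive!"),
])
```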


Hardware and Inference Speed

Bark has been tested and works on both CPU and GPU (`pytorch 2.0+`, CUDA 11.7, and CUDA 12.0). Running Bark requires running transformer models with more than 100M parameters. On modern GPUs with PyTorch nightly, Bark can generate audio in roughly real time. On older GPUs, the default Colab runtime, or CPU, inference might be 10–100x slower.

If you don’t have new hardware available or if you want to play with bigger versions of our models, you can also sign up for early access to our model playground [here](https://3os84zs17th.typeform.com/suno-studio).


⚙️ Details

Similar to [Vall-E](https://arxiv.org/abs/2301.02111) and some other amazing work in the field, Bark uses GPT-style models to generate audio from scratch. Different from Vall-E, the initial text prompt is embedded into high-level semantic tokens without the use of phonemes.

It can therefore generalize to arbitrary instructions beyond speech that occur in the training data, such as music lyrics, sound effects, or other non-speech sounds. A second model is then used to convert the generated semantic tokens into audio codec tokens to generate the full waveform. To enable the community to use Bark via public code, we used the fantastic [EnCodec codec](https://github.com/facebookresearch/encodec) from Facebook as the audio representation.
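The pipeline described here can be sketched as three composed steps: text to semantic tokens, semantic tokens to codec tokens, and codec tokens to samples. The stages below are deliberately trivial, hypothetical stand-ins, not Bark's models. The sketch shows only the data flow and the types passed between stages. The Bark package's `bark.api` module has `text_to_semantic` and `semantic_to_waveform` functions that follow a similar split; check the version you have installed.

```python
from typing import Callable, List

# Type aliases for the three hops described in the text.
TextToSemantic = Callable[[str], List[int]]
SemanticToCodec = Callable[[List[int]], List[int]]
CodecToWaveform = Callable[[List[int]], List[float]]


def two_stage_tts(text: str,
                  to_semantic: TextToSemantic,
                  to_codec: SemanticToCodec,
                  decode: CodecToWaveform) -> List[float]:
    semantic = to_semantic(text)  # stage 1: GPT-style model, text -> semantic tokens (no phonemes)
    codec = to_codec(semantic)    # stage 2: semantic tokens -> audio codec tokens
    return decode(codec)          # codec decoder (EnCodec in Bark): codec tokens -> waveform


# Toy stand-ins so the sketch runs; the real stages are large neural models.
def toy_semantic(text: str) -> List[int]:
    return [ord(ch) % 64 for ch in text]          # one fake "token" per character


def toy_codec(tokens: List[int]) -> List[int]:
    return [t for t in tokens for _ in range(2)]  # pretend each semantic token yields 2 codec tokens


def toy_decode(codes: List[int]) -> List[float]:
    return [c / 63.0 * 2 - 1 for c in codes]      # map codes into [-1.0, 1.0] as fake samples


samples = two_stage_tts("hi", toy_semantic, toy_codec, toy_decode)
```

Keeping each stage behind a plain function signature makes it easy to swap a real model in for any one of them without touching the others.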

Below is a list of some known non-speech sounds, but we are finding more every day.

- `[laughter]`
- `[laughs]`
- `[sighs]`
- `[music]`
- `[gasps]`
- `[clears throat]`
- `—` or `...` for hesitations
- `♪` for song lyrics
- CAPITALIZATION for emphasis of a word
- `MAN/WOMAN:` to bias toward a speaker
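Because this list is open-ended, a quick check can flag bracketed cues in a prompt that are not among those listed here. Those cues may be worth testing before a long generation. `bracket_cues` and `KNOWN_CUES` are hypothetical names. A cue missing from the list is not necessarily unsupported.

```python
import re

# Bracketed cues from the list above; Bark may respond to others as well.
KNOWN_CUES = {"laughter", "laughs", "sighs", "music", "gasps", "clears throat"}


def bracket_cues(prompt: str) -> dict[str, bool]:
    """Map each [cue] in a prompt to whether it appears in KNOWN_CUES."""
    found = re.findall(r"\[([^\[\]]+)\]", prompt)
    return {cue: cue.strip().lower() in KNOWN_CUES for cue in found}
```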

Supported Languages

| Language | Status |
|--------------------------|-------------|
| English (en) | ✅ |
| German (de) | ✅ |
| Spanish (es) | ✅ |
| French (fr) | ✅ |
| Hindi (hi) | ✅ |
| Italian (it) | ✅ |
| Japanese (ja) | ✅ |
| Korean (ko) | ✅ |
| Polish (pl) | ✅ |
| Portuguese (pt) | ✅ |
| Russian (ru) | ✅ |
| Turkish (tr) | ✅ |
| Chinese, simplified (zh)| ✅ |
| Arabic | Coming soon!|
| Bengali | Coming soon!|
| Telugu | Coming soon!|

BARK Release

We’ve got some exciting news for you! Remember Bark, the new Text2Speech model that was released recently? 🐶🔊

Well, guess what? We’ve managed to reverse engineer it! 🕵️‍♂️🔧

Introducing Bark: Text2Speech Voice Cloning 🐶

We know that Bark’s creators restricted voice cloning and added “allowed prompts” for safety reasons.

But we believe in freedom and creativity! 🌟

✊ So, we’ve cracked open the code and removed those pesky limitations! 🚫🔓

Now introducing: Bark Unleashed! 🎉🐾

A set of easy-to-use Jupyter notebooks that’ll have you cloning audio in no time from audio/text pair samples just 5–10 seconds long! 🎙️📝

Get ready to revolutionize your audio game with Bark Unleashed! 🚀

🎧 Just follow our simple instructions and let your imagination run wild! 🌈🧠

Happy cloning, folks! 🤩🔊


Now get BARKin!

Join our [Discord](https://serp.ly/@serpai/discord) to get help setting yours up!

How to Use Bark AI (Tutorials):

More awesome stuff:

🎁 SEO, Digital Marketing Resources: https://serp.ly/@devin/stuff
💌 SEO, Digital Marketing Insider Info: https://serp.ly/@devin/email

🎁 Artificial Intelligence Tools & Resources: https://serp.ly/@serpai/stuff
💌 Artificial Intelligence Insider Info: https://serp.ly/@serpai/email

👨‍👩‍👧‍👦 Join the SERP Community: https://serp.ly/@serp/discord
🔰 Join the AI Alliance: https://serp.ly/@serpai/badge-generators/alliance