Audio nodes

Spaces includes a set of audio nodes that generate voiceovers, music, and sound effects from text descriptions, plus a node that mixes the results with video. Use them to add narration, soundtracks, and ambient audio to your video workflows without recording or licensing anything.

Available audio nodes

Node	What it does
Voiceover	Converts text into natural-sounding speech using AI voices
Music Generator	Composes original music tracks from text descriptions
Sound Effects	Generates AI-powered foley and ambient sounds from text
Video Audio Mix	Combines multiple audio tracks with video

All audio nodes are new additions to Spaces.

Voiceover, Music Generator, and Sound Effects are being rolled out gradually and may not be available in your account yet.

Voiceover

The Voiceover node converts text into natural-sounding speech. Choose from hundreds of AI voices across multiple providers to narrate scripts, create dialogue, or produce voice content.

How to use the Voiceover node

Add the node
Add a Voiceover node to your Space.
Enter your script
Type or paste your script into the text field — or connect a Text node to the input port.
Select a model
Choose ElevenLabs v2, ElevenLabs v3, or Gemini 2.5 Pro.
Pick a voice
Click the voice chip to open the voice library — browse and preview voices before selecting one.
Adjust parameters
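If needed, adjust the model's parameters: Speed, Stability, and Similarity Boost for the ElevenLabs models, or Temperature, System Instruction, and Language for Gemini 2.5 Pro.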
Generate
Set the number of generations from 1 to 10 and run the node.

Voice parameters

Parameter	Range	What it does
Speed	0.7x to 1.2x	Speaking rate — lower is slower, higher is faster
Stability	0 to 1	Voice consistency — lower is more expressive, higher is more stable
Similarity Boost	0 to 1	How closely the output matches the selected voice
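The ElevenLabs parameter ranges above, together with the 1 to 10 generation count from the steps, can be written down as a small validated settings object. This is handy if you plan batches of voiceovers outside Spaces before entering the values on the card. It is a minimal sketch: `VoiceoverSettings` and its field names are hypothetical and not a Spaces API; only the numeric ranges come from this page.

```python
from dataclasses import dataclass

# Hypothetical settings object for the Voiceover node (ElevenLabs models).
# Field names are illustrative; only the allowed ranges come from the docs.
@dataclass(frozen=True)
class VoiceoverSettings:
    speed: float             # speaking rate, 0.7x to 1.2x
    stability: float         # 0 = more expressive, 1 = more stable
    similarity_boost: float  # 0 to 1, how closely output matches the voice
    generations: int         # outputs per run, 1 to 10

    def __post_init__(self) -> None:
        if not isinstance(self.generations, int):
            raise TypeError("generations must be a whole number")
        # Reject any value outside its documented range.
        limits = {
            "speed": (self.speed, 0.7, 1.2),
            "stability": (self.stability, 0.0, 1.0),
            "similarity_boost": (self.similarity_boost, 0.0, 1.0),
            "generations": (self.generations, 1, 10),
        }
        for name, (value, low, high) in limits.items():
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")


# A more expressive read: slightly slower, lower stability, three takes to compare.
draft = VoiceoverSettings(speed=0.9, stability=0.3, similarity_boost=0.8, generations=3)
```

Values such as `speed=1.5` or `generations=11` raise `ValueError`, mirroring the limits the node enforces in its controls.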

Input and output

The Voiceover node accepts a Text input (your script) and outputs Audio (the generated voiceover). You can type directly on the card or connect a Text node for dynamic scripts.

Use cases

Video narration — Write a script, generate a voiceover, then combine it with your video using Video Audio Mix.
Podcast creation — Generate voiceovers for individual segments and combine them into a full episode.
Character dialogue — Use multiple Voiceover nodes with different voices, then mix them together for conversation scenes.
Lip-sync video — Generate a voiceover, then connect the audio output to a lip-sync video model so your character speaks in sync.

Voiceover models

Three AI models are available for voice generation. Each has different strengths — pick the one that matches your project.

Model	Provider	Speed	Quality	Best for
ElevenLabs v2 Turbo	ElevenLabs	Fast	Good	Quick narration, batch processing
ElevenLabs v3	ElevenLabs	Moderate	High	Final production, emotional narration
Gemini 2.5 Pro	Google	Moderate	High	Multi-language content, conversational tone

ElevenLabs v2 is the fastest option. Use it when you need to generate many voiceovers quickly or iterate on scripts during production. It supports Speed, Stability, and Similarity Boost parameters.

ElevenLabs v3 delivers more natural prosody and intonation than v2 — pacing, emphasis, and emotional tone feel closer to a human read. Use it for final output when quality matters most. Same parameters as v2.

Gemini 2.5 Pro excels at multi-language content and conversational delivery. It supports Temperature, System Instruction, and Language selection. Multi-speaker configuration is possible for dialogue-style output.

Gemini 2.5 Pro is in gradual rollout and may not be available to all users yet.

Quick guide: which model to choose

You need...	Use this
Fast turnaround	ElevenLabs v2
Best audio quality	ElevenLabs v3
Multiple languages	Gemini 2.5 Pro
Emotional, expressive narration	ElevenLabs v3
Conversational or dialogue tone	Gemini 2.5 Pro
Batch processing many clips	ElevenLabs v2

Music Generator

The Music Generator node creates original music from text descriptions. Describe the mood, genre, tempo, and instruments — the AI composes a unique track.

How to use the Music Generator node

Add the node
Add a Music Generator node to your Space.
Describe your music
Write a description of the music you want in the prompt field — or connect a Text node to the input port.
Select a model
Choose Google Lyria or ElevenLabs Music.
Set the duration
Up to 30 seconds for Google Lyria, or up to 10 seconds for ElevenLabs Music.
Generate
Set the number of generations from 1 to 10 and run the node.

Input and output

The Music Generator node accepts a Text input (your music description) and outputs Audio (the generated track).

Use cases

Video soundtrack — Describe a mood and generate a background track, then combine it with your video using Video Audio Mix.
Podcast intro music — Generate a short, branded intro with ElevenLabs Music.
Background music — Create lo-fi, ambient, or genre-specific loops for content.
Compare styles — Write several genre descriptions, generate them all, and pick the best fit for your project.

Music Generator models

Two AI models are available for music generation, each optimized for different use cases.

Model	Provider	Max duration	Best for
Google Lyria	Google	30 seconds	Background music, soundtracks, ambient, varied genres
ElevenLabs Music	ElevenLabs	10 seconds	Jingles, intros, sound logos, short loops

Google Lyria excels at longer compositions with natural musical structure. It handles a wide range of genres and can produce pieces with evolving arrangement — intros, builds, and transitions.

ElevenLabs Music is optimized for short, concentrated pieces. The output is clean and well-defined — ideal for branding elements, transitions, and loop-ready clips.
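The maximum durations in the table above are the one hard constraint when choosing a music model. The sketch below lists which models can cover a requested length; an empty list means no single generation is long enough. `MAX_MUSIC_SECONDS` and `eligible_music_models` are hypothetical names for illustration, not part of Spaces; only the 30-second and 10-second limits come from this page.

```python
# Maximum track length per Music Generator model, in seconds (from the table).
MAX_MUSIC_SECONDS = {
    "Google Lyria": 30,
    "ElevenLabs Music": 10,
}


def eligible_music_models(seconds: float) -> list[str]:
    """Return the models whose maximum duration covers the requested length."""
    return [model for model, limit in MAX_MUSIC_SECONDS.items() if 0 < seconds <= limit]


print(eligible_music_models(8))   # ['Google Lyria', 'ElevenLabs Music']
print(eligible_music_models(20))  # ['Google Lyria']
```

Duration only decides which models are possible. Among the eligible ones, use the quick guide to pick by purpose, for example Google Lyria for a soundtrack even when it is short.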

Quick guide: which model to choose

You need...	Use this
More than 10 seconds	Google Lyria
Short, punchy audio	ElevenLabs Music
Varied instrumentation	Google Lyria
Quick generation	ElevenLabs Music
Soundtrack or background music	Google Lyria
Jingle, intro, or sound logo	ElevenLabs Music

Sound Effects

The Sound Effects node generates AI-powered audio from text descriptions. Describe any sound — from rain on a tin roof to a spaceship engine humming — and the AI creates it. Use it to add atmosphere and foley to video projects.

How to use the Sound Effects node

Add the node
Add a Sound Effects node to your Space.
Describe the sound
Write a description in the prompt field — or connect a Text node to the input port.
Set the duration
Choose the desired length for the sound effect.
Enable Loop if needed
Turn on Loop for sounds that need to play continuously, like ambient or background audio.
Generate
Set the number of generations from 1 to 10 and run the node.

Input and output

The Sound Effects node accepts a Text input (your sound description) and outputs Audio (the generated effect).

Use cases

Video foley — Describe scene sounds, generate them, and layer them onto your video with Video Audio Mix.
Ambient loops — Create continuous background audio like coffee shop ambiance or rain, with Loop enabled.
Podcast intros — Create unique audio branding with dramatic stings or transition sounds.
Game audio — Generate UI sounds, environmental ambience, or action effects for game prototyping.

Video Audio Mix

The Video Audio Mix node combines multiple audio tracks with a video. Use it as the final step in your audio workflow — connect your voiceover, music, and sound effects, then mix them with your generated or uploaded video.

Typical audio workflow

A common pattern for producing narrated video with a soundtrack in Spaces:

Write your script
Use a Text node or type directly into the Voiceover node.
Generate voiceover
The Voiceover node converts your script to natural speech.
Add music
The Music Generator creates a background track from a mood description.
Layer sound effects
The Sound Effects node generates ambient audio or foley.
Mix everything
Connect all audio outputs and your video to a Video Audio Mix node.

You can connect as many audio sources as you need before combining them with the final video.
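Conceptually, combining several audio sources means adding their samples together, usually with a volume (gain) per track, and keeping the result inside the valid range. The sketch below illustrates that idea on plain lists of floating-point samples between -1.0 and 1.0 at a shared sample rate. It is a generic illustration of audio mixing under those assumptions, not a description of how Video Audio Mix works internally; `mix_tracks` and its per-track gains are hypothetical.

```python
def mix_tracks(tracks: list[list[float]], gains: list[float]) -> list[float]:
    """Sum equally sampled tracks with a gain per track.

    Shorter tracks are treated as silence once they end, and the
    result is hard-clipped to the range [-1.0, 1.0].
    """
    if len(tracks) != len(gains):
        raise ValueError("need exactly one gain per track")
    length = max((len(t) for t in tracks), default=0)
    mixed = []
    for i in range(length):
        # Tracks that have already ended contribute nothing at sample i.
        total = sum(g * t[i] for t, g in zip(tracks, gains) if i < len(t))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


voice = [0.5, 0.5, 0.5]
music = [0.25, 0.25]  # background track, mixed at half volume below
print(mix_tracks([voice, music], [1.0, 0.5]))  # [0.625, 0.625, 0.5]
```

The clipping step is why background music is usually mixed at a lower gain than narration: loud tracks summed at full volume quickly exceed the valid range.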

Prompting tips

Good prompts lead to better audio. Here is what works for each node.

Voiceover

Your voiceover prompt is the script itself — write it exactly as you want it spoken. Keep sentences natural and conversational. If you need specific pacing or emphasis, choose the right model and adjust the voice parameters rather than trying to encode delivery instructions in the text.

Music Generator

Include genre, mood, tempo, and instruments for the most control.

Cinematic orchestral piece, dramatic, building tension, strings and brass, 120 BPM

Acoustic folk guitar, warm and nostalgic, fingerpicking style, 90 BPM

For ElevenLabs Music, which produces shorter pieces, keep descriptions focused on a single mood or purpose.

Upbeat electronic jingle, happy, synth lead

Dark ambient drone, eerie, low frequency hum
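The music prompts above share a pattern: comma-separated descriptors for genre, mood and style, instruments, and tempo. If you write several variations at once, for example to compare styles, a small helper keeps them consistent. `music_prompt` is a hypothetical helper, and the ordering of the parts is a convention taken from these examples, not a requirement of either model.

```python
from typing import Optional


def music_prompt(genre: str, descriptors: list[str], instruments: list[str],
                 bpm: Optional[int] = None) -> str:
    """Join genre, mood/style descriptors, instruments, and an optional tempo
    into one comma-separated prompt line."""
    parts = [genre, *descriptors, *instruments]
    if bpm is not None:
        parts.append(f"{bpm} BPM")  # a concrete tempo target
    # Drop empty entries so optional pieces can be left blank.
    return ", ".join(p.strip() for p in parts if p.strip())


print(music_prompt("Cinematic orchestral piece", ["dramatic", "building tension"],
                   ["strings and brass"], bpm=120))
# Cinematic orchestral piece, dramatic, building tension, strings and brass, 120 BPM

# Compare styles: one prompt per genre, everything else held constant.
for genre in ["Lo-fi hip hop", "Ambient electronic", "Acoustic folk guitar"]:
    print(music_prompt(genre, ["relaxed", "warm"], [], bpm=85))
```

For ElevenLabs Music, pass fewer descriptors so the prompt stays focused on a single mood.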

Sound Effects

Be descriptive and specific. The more detail you include, the more closely the result will match the sound you have in mind.

Heavy rain on a tin roof with distant thunder (works better than just "rain")

Busy coffee shop ambiance with quiet conversation and espresso machine (works better than "cafe sounds")
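Both sound-effect examples share a shape: the main sound, where it happens, and what else can be heard. A helper like the hypothetical `sfx_prompt` below makes it easy to keep that level of detail when you write many prompts. The structure is a convention read off these two examples, not something the Sound Effects node requires.

```python
from typing import Optional


def sfx_prompt(sound: str, setting: Optional[str] = None,
               details: tuple[str, ...] = ()) -> str:
    """Build a specific sound description: the main sound, where it happens,
    and any other audible details."""
    prompt = sound if setting is None else f"{sound} {setting}"
    if details:
        # Natural-language list: "a", "a and b", "a, b and c".
        if len(details) == 1:
            listed = details[0]
        else:
            listed = ", ".join(details[:-1]) + " and " + details[-1]
        prompt += f" with {listed}"
    return prompt


print(sfx_prompt("Heavy rain", "on a tin roof", ("distant thunder",)))
# Heavy rain on a tin roof with distant thunder
print(sfx_prompt("Busy coffee shop ambiance",
                 details=("quiet conversation", "espresso machine")))
# Busy coffee shop ambiance with quiet conversation and espresso machine
```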

Tips and best practices

Preview voices before committing. The voice library includes sample playback for every voice — listen before you generate.

Lower stability for expression. If your voiceover sounds too flat or robotic, try reducing the Stability parameter. This adds more variation and emotional range to the delivery.

Generate multiple variations. Audio generation — especially music and sound effects — is highly variable. Generate several versions of the same prompt and pick the best one.

Use Loop for ambient sounds. Enable the Loop toggle on the Sound Effects node when you need continuous background audio like rain, traffic, or office ambiance.

Draft with v2, finish with v3. Use ElevenLabs v2 for fast iteration on scripts, then switch to v3 for the final voiceover when quality matters.

Be specific about tempo in music prompts. Adding a BPM value like 120 BPM gives the AI a concrete target and produces more consistent results.

Use multiple Voiceover nodes for dialogue. Add several Voiceover nodes with different voices to create conversation scenes, then combine them with Video Audio Mix.

Combine Sound Effects for rich soundscapes. Layer multiple Sound Effects nodes — for example, rain plus distant traffic plus indoor echo — and mix them together for depth.
