Add Sound To Videos with MMAudio! - ComfyUI Workflow & Windows One Click Install
Add sounds to your videos using the MMAudio Video to Audio Synthesis models in ComfyUI. MMAudio models analyze video content and generate high-quality, synchronized sound effects, ambient noise, or natural audio that match the scene—enabling effortless audio creation for cinematic, creative, or narrative projects, without manual audio editing.
Preloaded Models within the Installer (Low VRAM)
apple_DFN5B-CLIP-ViT-H-14-384_fp16.safetensors (ComfyUI\models\mmaudio)
Downloaded from: Kijai/MMAudio_safetensors
mmaudio_large_44k_v2_fp16.safetensors (ComfyUI\models\mmaudio)
Downloaded from: Kijai/MMAudio_safetensors
mmaudio_synchformer_fp16.safetensors (ComfyUI\models\mmaudio)
Downloaded from: Kijai/MMAudio_safetensors
mmaudio_vae_44k_fp16.safetensors (ComfyUI\models\mmaudio)
Downloaded from: Kijai/MMAudio_safetensors
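If nodes fail to load, it can help to confirm that the four preloaded files actually landed in ComfyUI\models\mmaudio. The following is a minimal sketch, not part of the installer: the function name missing_models and the "ComfyUI" root path are placeholders chosen for illustration, and only the file names and folder come from the list above.

```python
from pathlib import Path

# File names and target folder as given in the preloaded model list.
EXPECTED_MODELS = [
    "apple_DFN5B-CLIP-ViT-H-14-384_fp16.safetensors",
    "mmaudio_large_44k_v2_fp16.safetensors",
    "mmaudio_synchformer_fp16.safetensors",
    "mmaudio_vae_44k_fp16.safetensors",
]


def missing_models(comfyui_root):
    """Return the expected model files that are absent from models/mmaudio."""
    model_dir = Path(comfyui_root) / "models" / "mmaudio"
    return [name for name in EXPECTED_MODELS if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at the ComfyUI folder of your install.
    missing = missing_models("ComfyUI")
    if missing:
        print("Missing model files:")
        for name in missing:
            print("  " + name)
    else:
        print("All preloaded MMAudio models found.")
```

An empty result means every listed file is present; it does not verify that the files are complete or uncorrupted.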
The standard MMAudio diffusion models (FP32) are not included in the installer. These higher-fidelity models are available via the Kijai Hugging Face repository and can be manually placed in your ComfyUI/models/diffusion_models folder:
Kijai FP32 Diffusion Models
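For anyone who prefers scripting the manual download, here is a rough sketch using the huggingface_hub Python package (which must be installed separately). It assumes that any .safetensors file in the Kijai/MMAudio_safetensors repository without "_fp16" in its name is one of the higher-precision variants; that naming rule is a guess, so check the file list on the repository page before downloading. The helper names are hypothetical.

```python
REPO_ID = "Kijai/MMAudio_safetensors"  # repository named in the model list


def non_fp16_files(filenames):
    """Keep .safetensors files whose names lack the _fp16 marker.

    Assumption: such files are the FP32 variants. Verify on the repo page.
    """
    return [
        name for name in filenames
        if name.endswith(".safetensors") and "_fp16" not in name
    ]


def list_candidate_fp32_files(repo_id=REPO_ID):
    # Imported lazily so the filter above works without huggingface_hub.
    from huggingface_hub import list_repo_files
    return non_fp16_files(list_repo_files(repo_id))


def download_to(filename, target_dir, repo_id=REPO_ID):
    """Download one file from the repo into target_dir and return its path."""
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)


if __name__ == "__main__":
    # Needs network access; prints candidates without downloading anything.
    for name in list_candidate_fp32_files():
        print(name)
```

To fetch a file, call download_to with one of the printed names and the folder mentioned above, for example download_to(name, "ComfyUI/models/diffusion_models").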
Speed
Generating audio for a 5-second video typically takes about 2–3 minutes on an RTX 4050 with 6GB VRAM. Faster GPUs should reduce processing time.
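For rough planning, the figure above can be turned into a back-of-the-envelope estimate. This sketch assumes that processing time scales linearly with video length, which the measurement above does not confirm; treat the output as a loose guide for similar hardware only. The function name estimate_minutes is made up for this example.

```python
def estimate_minutes(video_seconds, minutes_per_5s=(2.0, 3.0)):
    """Scale the quoted 2-3 minutes per 5 seconds to another video length.

    Assumption: time grows linearly with duration (not verified).
    """
    low, high = minutes_per_5s
    scale = video_seconds / 5.0
    return (low * scale, high * scale)


if __name__ == "__main__":
    low, high = estimate_minutes(10)
    print(f"10-second video: roughly {low:.0f} to {high:.0f} minutes")
```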
System Requirements
Nvidia RTX 30XX, 40XX, or 50XX series GPU (FP16 support required; GTX 10XX and RTX 20XX not tested)
CUDA-compatible GPU with at least 6GB VRAM
Windows OS
At least 30GB free storage
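The storage, OS, and VRAM requirements above can be checked before installing with a short script. This is only a sketch: it relies on the nvidia-smi tool that ships with Nvidia drivers, checks free space on the current drive, and uses a VRAM threshold slightly below 6144 MiB because 6GB cards usually report a little less than that. The threshold value and all function names are assumptions made for this example.

```python
import platform
import shutil
import subprocess

MIN_FREE_GB = 30     # from the requirements list
MIN_VRAM_MIB = 6000  # assumption: 6GB cards often report just under 6144 MiB


def free_disk_gb(path="."):
    """Free space in GiB on the drive that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def gpu_vram_mib():
    """Largest total VRAM reported by nvidia-smi, or None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        values = [int(line) for line in out.splitlines() if line.strip()]
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None
    return max(values) if values else None


def evaluate(os_name, free_gb, vram_mib):
    """Compare measured values against the listed requirements."""
    return {
        "windows": os_name == "Windows",
        "storage": free_gb >= MIN_FREE_GB,
        # None means the VRAM could not be determined.
        "vram": None if vram_mib is None else vram_mib >= MIN_VRAM_MIB,
    }


if __name__ == "__main__":
    result = evaluate(platform.system(), free_disk_gb(), gpu_vram_mib())
    for name, ok in result.items():
        print(f"{name}: {'unknown' if ok is None else 'ok' if ok else 'NOT met'}")
```

The script does not check the GPU series or FP16 support; those still need to be confirmed by hand.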
What’s Included
Portable ComfyUI Windows Installer, fully pre-configured for MMAudio Audio Generation
Custom workflow for video-to-audio synthesis
Automatic downloads for all required nodes and models
Usage Notes
VERY IMPORTANT NOTE:
If nodes still show up red in the workflow, or if some libraries (like ftfy) fail to import properly, do not just restart ComfyUI normally. Instead, open the ComfyUI Manager using the icon in the top right corner of the interface and restart ComfyUI through the Manager. A standard restart does not clear these issues; only the Manager restart ensures proper library import and node initialization.
Upload a reference video using the "Load Video" node.
Describe the sound you want in the prompt field; MMAudio will generate audio that matches the video content and your description (e.g., car engine noise, ocean waves, city ambience).
You may need to tweak your prompt and experiment with different settings to achieve the desired result.
MMAudio does not synthesize speech; it works best for sound effects and ambient or nonverbal audio.

