MP3 to MP4 — Combine Audio + Cover Art into Video
Drop any audio file and an optional cover image to create an MP4 video — ready for YouTube, Instagram, TikTok, and podcasts
How to Create an MP4 from Audio + Image
Drag an audio file onto the first drop zone: MP3, WAV, FLAC, AAC, OGG, M4A, OPUS, WMA, or AIFF. Files stay on your device; nothing is uploaded.
Drop one or more images. Multiple images play in order, each shown for equal time — like a slideshow synced to your audio.
Pick 720p (YouTube standard), 1080p (Full HD), or Original (keeps your image's own dimensions). Select audio bitrate: 128 kbps for podcasts, 192 kbps for music, 256 kbps for high quality.
Click Create MP4. FFmpeg.wasm encodes the video in your browser with real-time progress. Preview it, then download — YouTube-ready.
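For readers curious what the encoding step might look like, here is a minimal sketch of an FFmpeg argument list that pairs one image with one audio file. The page does not publish the converter's actual command, so the function name, the file names, and the exact choice of flags are assumptions for illustration; the flags themselves are standard FFmpeg options.

```python
from typing import List


def build_still_image_args(image: str, audio: str, output: str,
                           height: int = 720, audio_kbps: int = 192) -> List[str]:
    """Return an FFmpeg argument list for a still-image video (illustrative only)."""
    return [
        "-loop", "1",                  # repeat the single input image indefinitely
        "-framerate", "1",             # read the image at 1 frame per second
        "-i", image,
        "-i", audio,
        "-vf", f"scale=-2:{height}",   # target height; width follows aspect ratio, kept even
        "-c:v", "libx264",             # H.264 video
        "-tune", "stillimage",         # x264 tuning preset for static content
        "-pix_fmt", "yuv420p",         # pixel format most players expect
        "-c:a", "aac",                 # AAC audio
        "-b:a", f"{audio_kbps}k",      # chosen audio bitrate
        "-shortest",                   # stop when the audio ends
        output,
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown as a command line for readability.
    print("ffmpeg " + " ".join(build_still_image_args("cover.jpg", "track.mp3", "out.mp4")))
```

The same list could be passed to a native `ffmpeg` binary or to an FFmpeg.wasm build that includes libx264; whether this tool's build does is not stated here.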
Why Turn Audio into an MP4 Video?
YouTube, Instagram Reels, TikTok, Facebook, and most social platforms require video files — they don't accept raw MP3 uploads. If you've recorded a podcast, an original song, a lecture, or any audio content and want to share it anywhere beyond Spotify or SoundCloud, you need an MP4 wrapper around it.
- 🎵 Music releases — upload your track to YouTube with album art as the visual. Reach listeners who discover music on video platforms
- 🎙️ Podcast videos — share episodes on YouTube, Instagram, and TikTok. Show cover art + waveform or just keep it simple with a logo background
- 📚 Course audio — turn recorded lectures or guided meditations into videos for course platforms that require MP4 format
- 📱 Instagram Stories & Reels — Stories and Reels only accept video. Wrap your audio clip in an image to share it
- 🎤 Voice memos and interviews — share a recorded interview, speech, or voice memo as a video with a photo of the speaker
- 🎧 Audiobooks & narration — publish spoken-word content on YouTube where the audience is 2 billion monthly users
Best Image Size and Format for Audio Videos
The image you use becomes the entire visual of your video. Choosing the right dimensions for your target platform matters.
| Platform | Resolution | Aspect Ratio | Recommended Format | Notes |
|---|---|---|---|---|
| YouTube | 1920×1080 or 1280×720 | 16:9 widescreen | JPG or PNG | Standard for music, podcasts, talks. 1080p recommended. |
| YouTube Music | 3000×3000 → scaled to 720p | 1:1 square | JPG | Square album art centered with black bars. Use 720p output. |
| Instagram Reels | 1080×1920 | 9:16 portrait | JPG or PNG | Vertical video. Use "Original" resolution setting. |
| TikTok | 1080×1920 | 9:16 portrait | JPG or PNG | Vertical video preferred. 720p or Original both work. |
| | 1280×720 or 1920×1080 | 16:9 or 1:1 | JPG or PNG | Both landscape and square work well. |
| Podcast platforms | 1400×1400 to 3000×3000 | 1:1 square | JPG | Use "Original" resolution to preserve your cover art size. |
Tip: You can upload multiple images at once. They display in order, each for an equal share of your audio length. For a 10-second audio clip with 2 images, each shows for 5 seconds.
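The table mentions square album art being centered with black bars inside a 16:9 frame. A rough sketch of that arithmetic follows, assuming a scale-then-pad approach; the function names are hypothetical, and the converter's real filter chain is not documented on this page.

```python
from typing import Tuple


def fit_inside(src_w: int, src_h: int, frame_w: int, frame_h: int) -> Tuple[int, int, int, int]:
    """Scale an image to fit a frame without distortion.

    Returns (scaled_w, scaled_h, x_offset, y_offset). Inputs are assumed positive.
    """
    # Integer cross-multiplication avoids floating-point rounding surprises.
    if src_w * frame_h >= src_h * frame_w:
        # Image is relatively wider than the frame: width is the limit.
        w, h = frame_w, src_h * frame_w // src_w
    else:
        # Image is relatively taller than the frame: height is the limit.
        w, h = src_w * frame_h // src_h, frame_h
    # H.264 with yuv420p needs even dimensions, so round down to even.
    w -= w % 2
    h -= h % 2
    return w, h, (frame_w - w) // 2, (frame_h - h) // 2


def scale_pad_filter(src_w: int, src_h: int, frame_w: int, frame_h: int) -> str:
    """Build an FFmpeg -vf string using the scale and pad filters."""
    w, h, x, y = fit_inside(src_w, src_h, frame_w, frame_h)
    return f"scale={w}:{h},pad={frame_w}:{frame_h}:{x}:{y}:black"


if __name__ == "__main__":
    # 3000x3000 cover art into a 1280x720 frame.
    print(scale_pad_filter(3000, 3000, 1280, 720))
```

For 3000×3000 art in a 1280×720 frame, this gives a 720×720 image with a 280-pixel black bar on each side.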
Supported Audio Input Formats
| Format | Common Use | Quality Note | Output in MP4 |
|---|---|---|---|
| MP3 | Music, podcasts, voice | Lossy — re-encoded to AAC | AAC at chosen bitrate |
| WAV | Studio recordings, DAW export | Lossless → re-encoded to AAC | AAC at chosen bitrate — some quality loss |
| FLAC | Lossless music, archival | Lossless → re-encoded to AAC | 256 kbps AAC preserves most quality |
| AAC / M4A | iPhone recordings, Apple Music | Re-encoded at new bitrate | AAC at chosen bitrate |
| OGG / OPUS | Web audio, Discord | Re-encoded to AAC | AAC at chosen bitrate |
| WMA | Windows recordings | Re-encoded to AAC | AAC at chosen bitrate |
Frequently Asked Questions
What does "MP3 to MP4" actually mean?
It means combining your audio file with a static image to create a video. The image becomes the visual background for the entire duration of the audio, like YouTube music videos that show only album art. The output is a standard H.264/AAC MP4 file that plays on virtually every device and is accepted by the major video platforms.
Can I use audio formats other than MP3?
Yes. The converter accepts MP3, WAV, FLAC, AAC, M4A, OGG, OPUS, WMA, and AIFF. Any audio format that FFmpeg supports works. The audio is re-encoded to AAC inside the MP4 container for maximum compatibility.
Can I use multiple images as a slideshow?
Yes. Drop as many images as you want into the image zone. They will display in the order you add them, with each image shown for an equal slice of the audio duration — so a 60-second track with 3 images shows each for 20 seconds. You can reorder by removing and re-adding images, and the time labels update live once the audio loads.
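The equal-slice timing described above can be sketched in a few lines. This is an illustration of the arithmetic only; `image_schedule` is a hypothetical helper, not the converter's code.

```python
from typing import List, Tuple


def image_schedule(audio_seconds: float, image_count: int) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each image, in display order."""
    if image_count < 1:
        raise ValueError("at least one image is required")
    slice_len = audio_seconds / image_count  # every image gets the same share
    return [(i * slice_len, (i + 1) * slice_len) for i in range(image_count)]


if __name__ == "__main__":
    # A 60-second track with 3 images: each image is shown for 20 seconds.
    for index, (start, end) in enumerate(image_schedule(60, 3), start=1):
        print(f"image {index}: {start:.0f}s to {end:.0f}s")
```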
Which resolution should I choose for YouTube?
1080p (1920×1080) gives the best results on YouTube: it qualifies for HD playback, and a static image scales cleanly on larger screens, including 4K displays. For Instagram Reels and TikTok (vertical 9:16 format), choose "Original" and use a portrait image. For standard sharing where file size matters, 720p is a good choice.
How long does it take to convert a 1-hour audio file?
Typically 3–8 minutes for a 1-hour file at 720p on a modern laptop. The converter uses 1 fps (one video frame per second) because the image is completely static, so the encoder handles 24× fewer frames than it would at 24 fps, with almost no visible difference. 1080p takes about 30–40% longer than 720p because each frame is larger.
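The saving from 1 fps comes down to frame counts. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def frame_count(duration_seconds: float, fps: float) -> int:
    """Number of video frames needed for a given duration and frame rate."""
    return round(duration_seconds * fps)


if __name__ == "__main__":
    one_hour = 3600
    static = frame_count(one_hour, 1)    # 1 fps, as the converter uses
    typical = frame_count(one_hour, 24)  # a common video frame rate
    print(static, typical, typical // static)  # prints: 3600 86400 24
```

The frame count drops by a factor of 24, but wall-clock time will not necessarily drop by exactly the same factor, since audio encoding and startup costs are unaffected by the frame rate.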
Will the output MP4 be much larger than the original MP3?
Not dramatically. A 1-hour MP3 at 192 kbps is ~86 MB. Adding a static 720p image at 1 fps produces roughly a 90–120 MB MP4; the video track stays small because there are only 3,600 near-identical frames. At 1080p the video track is somewhat larger. The audio track is re-encoded to AAC, so at the same bitrate it is almost identical in size to your original.
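The size estimate can be checked with simple bitrate arithmetic. In the sketch below the helper names are hypothetical, and the 20 kbps video bitrate is purely an illustrative assumption, not a measured figure for this converter; container overhead is ignored.

```python
def stream_size_mb(kbps: float, seconds: float) -> float:
    """Size of a constant-bitrate stream in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return kbps * 1000 * seconds / 8 / 1_000_000


def estimate_mp4_mb(seconds: float, audio_kbps: float, video_kbps: float) -> float:
    """Rough MP4 size: audio track plus video track, ignoring container overhead."""
    return stream_size_mb(audio_kbps, seconds) + stream_size_mb(video_kbps, seconds)


if __name__ == "__main__":
    one_hour = 3600
    # One hour of 192 kbps audio: about 86.4 MB (roughly 82 MiB).
    print(f"audio only: {stream_size_mb(192, one_hour):.1f} MB")
    # Assumed 20 kbps average for the still-image video track (illustrative).
    print(f"with video: {estimate_mp4_mb(one_hour, 192, 20):.1f} MB")
```

Under that assumed video bitrate the total comes to about 95 MB, which sits inside the 90–120 MB range quoted above.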
Does my audio or image get uploaded anywhere?
No. Everything happens in your browser using FFmpeg.wasm — a full WebAssembly port of the professional FFmpeg tool. Your audio and image are read locally, processed locally, and the MP4 is saved locally. Nothing is sent to any server. 100% private.
Can I use an animated GIF as the background?
Yes. An animated GIF will loop through its frames while your audio plays — useful for simple motion backgrounds, waveform animations, or looping visuals. FFmpeg handles animated GIFs as input automatically. For a completely static background, use JPG or PNG instead.
Why is the audio re-encoded to AAC and not kept as MP3?
AAC is the standard audio codec for MP4. The container can also carry MP3 audio and some players accept it, but AAC is the most widely supported choice and what YouTube, Instagram, and other major platforms expect. Audio re-encoded at 192 kbps AAC is nearly indistinguishable from a 192 kbps MP3 source in virtually all listening situations.