Zoom Recording Download: Export Meeting Video and Audio via API
Downloading a Zoom recording manually works for one meeting. For hundreds of meetings across multiple accounts, you need an API. But Zoom's native Cloud Recording API ties recordings to account ownership: you can retrieve recordings from meetings hosted on accounts you control, with Server-to-Server OAuth credentials for each account. If your users are on different Zoom accounts, or if you don't control the accounts hosting the meetings, the native path doesn't work.
The alternative is capturing recordings at the point of the meeting itself, using a bot participant that records from inside the call. The bot produces video and audio artifacts available via API endpoint, regardless of which Zoom account hosted the meeting and without requiring credentials for that account.
This guide covers the complete workflow for downloading Zoom meeting video (MP4) and audio via the MeetStream bot API: deploying the recording bot, handling the webhook events that signal artifact availability, and retrieving the files programmatically. Python examples throughout.
In this guide, we'll cover the video and audio recording configuration, the webhook payload structure for video.processed and audio.processed, the retrieval endpoints, and format considerations for downstream processing. Let's get into it.
Why recording download via bot API differs from the Zoom Cloud Recording API
It's worth being precise about the architectural difference. Zoom's Cloud Recording API stores recordings on Zoom's servers after the meeting ends. Retrieving them requires the recording:read OAuth scope on the meeting host's account. The host must have a paid Zoom plan (free accounts don't get cloud recording). And the API only returns recordings made by Zoom's own recording mechanism.
The bot API captures recordings from inside the meeting as a participant. The recording is produced by MeetStream's infrastructure, stored in MeetStream's system, and delivered to you via webhook and API endpoint. The meeting host's Zoom account type is irrelevant. The host's OAuth credentials are not involved. You only need the meeting URL.
This is why the bot approach scales across diverse customer environments. Your users can be on any Zoom plan, any Zoom account, and the recording workflow is identical.
MeetStream requires a one-time Zoom integration configuration, documented at docs.meetstream.ai. After that, every recording job is a single API call.
Configuring the bot for video and audio capture
The video_required parameter controls whether the bot captures video. Set it to true for MP4 output. The recording_config block controls retention and transcription. Here's a complete create_bot request for video and audio export:

curl -X POST https://api.meetstream.ai/api/v1/bots/create_bot \
-H "Authorization: Token YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"meeting_link": "https://zoom.us/j/123456789",
"bot_name": "Recording Bot",
"callback_url": "https://yourapp.com/webhooks/meetstream",
"video_required": true,
"recording_config": {
"transcript": {
"provider": {
"name": "deepgram"
}
},
"retention": {
"type": "timed",
"hours": 72
}
},
"automatic_leave": {
"waiting_room_timeout": 300,
"everyone_left_timeout": 60,
"voice_inactivity_timeout": 1800
}
}'The initial response:
{
"bot_id": "bot_abc123",
"transcript_id": "txn_xyz789",
"meeting_url": "https://zoom.us/j/123456789",
"status": "Joining"
}Save bot_id and transcript_id immediately. These are your retrieval keys for all artifacts. The bot_id is used for audio and video endpoints. The transcript_id is used for the transcript endpoint.
For audio-only recording (faster processing, smaller storage footprint, lower cost), set video_required: false. The audio.processed event still fires and audio is still downloadable. Only the video artifact is skipped.
The artifact webhook events
Three processing events signal that artifacts are ready for download. They fire in this order after bot.stopped:
// audio.processed, audio file is ready for download
{
"event": "audio.processed",
"bot_id": "bot_abc123",
"transcript_id": "txn_xyz789",
"timestamp": "2026-04-02T15:03:20Z"
}
// video.processed, MP4 file is ready for download
{
"event": "video.processed",
"bot_id": "bot_abc123",
"transcript_id": "txn_xyz789",
"timestamp": "2026-04-02T15:05:10Z"
}
// transcription.processed, transcript is ready for retrieval
{
"event": "transcription.processed",
"bot_id": "bot_abc123",
"transcript_id": "txn_xyz789",
"status": "completed",
"timestamp": "2026-04-02T15:07:45Z"
}Audio processing typically completes before video processing since audio is a subset of the video data and requires less encoding time. Transcription processing depends on audio processing (the transcription provider needs the audio file) and takes the longest due to the ASR computation.
Do not attempt to download artifacts before receiving the corresponding event. The files may not be finalized yet, and you'll get either a 404 or an incomplete file.
Downloading video and audio: Python implementation
import requests
import os
from flask import Flask, request, jsonify
app = Flask(__name__)
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.meetstream.ai/api/v1"
HEADERS = {"Authorization": f"Token {API_KEY}"}
OUTPUT_DIR = "/data/recordings"
os.makedirs(OUTPUT_DIR, exist_ok=True)
@app.route('/webhooks/meetstream', methods=['POST'])
def webhook_handler():
data = request.json
event = data.get('event')
bot_id = data.get('bot_id')
transcript_id = data.get('transcript_id')
if event == 'audio.processed':
download_audio(bot_id)
elif event == 'video.processed':
download_video(bot_id)
elif event == 'transcription.processed':
download_transcript(transcript_id)
elif event == 'bot.stopped':
bot_status = data.get('bot_status')
if bot_status != 'Stopped':
# Non-clean exit, log and alert
print(f"Bot {bot_id} ended with status: {bot_status}")
return jsonify({'received': True}), 200
def download_audio(bot_id: str) -> str:
"""
Download audio artifact (MP3) and save to disk.
Returns: path to saved file
"""
url = f"{BASE_URL}/bots/{bot_id}/audio"
response = requests.get(url, headers=HEADERS, stream=True)
if response.status_code != 200:
print(f"Audio download failed: {response.status_code}")
return None
output_path = os.path.join(OUTPUT_DIR, f"{bot_id}_audio.mp3")
with open(output_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=8192):
f.write(chunk)
size_mb = os.path.getsize(output_path) / (1024 * 1024)
print(f"Audio saved: {output_path} ({size_mb:.1f} MB)")
return output_path
def download_video(bot_id: str) -> str:
"""
Download video artifact (MP4) and save to disk.
Returns: path to saved file
"""
url = f"{BASE_URL}/bots/{bot_id}/video"
response = requests.get(url, headers=HEADERS, stream=True)
if response.status_code != 200:
print(f"Video download failed: {response.status_code}")
return None
output_path = os.path.join(OUTPUT_DIR, f"{bot_id}_video.mp4")
with open(output_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=8192):
f.write(chunk)
size_mb = os.path.getsize(output_path) / (1024 * 1024)
print(f"Video saved: {output_path} ({size_mb:.1f} MB)")
return output_path
def download_transcript(transcript_id: str) -> dict:
"""
Fetch structured transcript with speaker attribution.
Returns: transcript dict with segments array
"""
url = f"{BASE_URL}/transcript/{transcript_id}/get_transcript"
response = requests.get(url, headers=HEADERS)
if response.status_code != 200:
print(f"Transcript fetch failed: {response.status_code}")
return None
transcript = response.json()
segments = transcript.get('segments', [])
print(f"Transcript fetched: {len(segments)} segments")
return transcript
The stream=True parameter in the requests calls is important for large files. It downloads the response body in chunks rather than loading the entire file into memory. For a two-hour meeting, the MP4 might be 500MB or more. Streaming download keeps memory usage constant regardless of file size.
MP4 vs raw audio: format considerations
The video endpoint returns an MP4 file. This is a composite recording: the meeting's active speaker view with audio mixed to a single stereo or mono track. It's the complete meeting as the bot experienced it, suitable for archival or sharing.
The audio endpoint returns an MP3 file. This is the audio extracted from the same recording, at standard MP3 quality. For transcription pipelines, summarization, or audio analysis tasks, using the audio file directly rather than extracting audio from the MP4 is cleaner and avoids an unnecessary ffmpeg step.
For per-speaker audio (separate audio tracks per participant), use the real-time WebSocket stream via live_audio_required rather than the post-call artifacts. Post-call artifacts are mixed-down recordings. Speaker separation in the post-call workflow comes from transcript diarization, not separate audio files.
File sizes scale roughly with meeting length and video resolution. A one-hour meeting at standard video quality produces roughly 200 to 400MB of video. If storage cost is a concern, set video_required: false for workflows where you only need audio and transcript, and only enable video when you specifically need the visual recording.
Managing retention and artifact availability
The recording_config.retention block controls how long artifacts remain available for download. The timed type with hours: 72 means artifacts are downloadable for 72 hours after the meeting ends. After that, they're deleted from MeetStream's storage.
Design your pipeline to download and store artifacts in your own system within the retention window. If your processing pipeline has an outage during the retention window, you may lose access to the artifacts. For critical recordings, either extend the retention hours or download and store files immediately when the processed events fire.
If you're building a system where users request recording exports asynchronously (for example, a "download recording" button in your application), store the bot_id and trigger the download on user action rather than immediately on the webhook event. Just make sure the user's action happens within the retention window.
Tradeoffs and what to watch for
Large video files take time to download. For a two-hour meeting on a reasonable server connection, expect 30 to 60 seconds for the video download. Build this into your UX expectations. If you're triggering a download in response to a user action, show appropriate progress indicators.

The bot records from its participant perspective. If the meeting uses a gallery view, the bot sees the gallery. If participants use virtual backgrounds or video filters, those are captured as-is. Screen shares are captured as the active view when someone is sharing. The recording is equivalent to what a human participant would see, not an isolated stream per participant.
Zoom meetings sometimes have variable connection quality. If participants have poor network connections, the bot's audio and video quality reflects those issues. The transcription provider processes whatever audio was captured, so a meeting with significant connection issues will have lower transcription accuracy.
How MeetStream fits in
MeetStream handles bot deployment, recording capture, artifact processing, and storage. Your application receives webhook events, downloads files from the API endpoints, and stores them in your own system. If you're also recording Google Meet or Teams calls, the same create_bot request structure and the same artifact download endpoints work identically across all three platforms.
Conclusion
Downloading Zoom recordings via API is a three-phase workflow: deploy a bot with video_required: true and a retention window in recording_config, handle the audio.processed and video.processed webhook events, and retrieve files via GET /bots/{bot_id}/video and GET /bots/{bot_id}/audio. Use streaming downloads for large files. Download artifacts within the retention window and store them in your own system. For audio-only workflows, set video_required: false to reduce processing time and storage. The transcript is a separate artifact retrieved via the transcript_id using GET /transcript/{id}/get_transcript after the transcription.processed event fires.
Get started free at meetstream.ai or see the full API reference at docs.meetstream.ai.
Frequently Asked Questions
How do I download a Zoom recording via API without host credentials?
Zoom's native Cloud Recording API requires credentials for the account that hosted the meeting. Using a recording bot API, you capture the meeting as a participant without needing the host's credentials. Deploy a bot with video_required set to true, wait for the video.processed webhook, then call GET /bots/{bot_id}/video to download the MP4. The only requirement is the meeting URL, not the host's Zoom account access.
What format is the Zoom recording download from the MeetStream API?
Video recordings are downloaded as MP4 files via the GET /bots/{bot_id}/video endpoint. Audio recordings are available as MP3 files via GET /bots/{bot_id}/audio. Both files represent the full meeting as captured from the bot's participant perspective. For transcription use cases, use the audio endpoint directly rather than extracting audio from the MP4 to avoid unnecessary conversion.
How long are Zoom recordings available for download?
The retention window is controlled by the recording_config.retention.hours parameter in your create_bot request. Set this to a value that covers your processing pipeline's needs. After the retention window expires, artifacts are deleted from MeetStream's storage. Download recordings to your own storage system promptly after receiving the processed webhook events, and set your retention window generously to handle processing delays.
How large are the Zoom recording files from the API?
File size depends on meeting length, video resolution, and compression. A one-hour meeting typically produces 200 to 400MB of MP4 video and 20 to 40MB of MP3 audio. For large files, use streaming downloads in your retrieval code rather than loading the entire file into memory. The Python example above shows how to use requests with stream=True and iter_content to handle large files efficiently.
Can I download recordings from Zoom meetings I didn't host?
Yes. The recording bot joins as a participant and captures the meeting from its own perspective. Participant-level recording does not require host credentials or host account access. The bot needs to be admitted to the meeting (either directly or from a waiting room), but no host Zoom account permissions are required. This is the key advantage over Zoom's native Cloud Recording API, which requires access to the hosting account's recordings.
