Transcript to VTT Converter
How to convert a transcript to WebVTT
To generate a WebVTT caption file, paste a transcript that includes
timestamps in the format HH:MM:SS. Each timestamp marks
the start of a new caption cue, and the text that follows will appear
on screen until the next timestamp is reached.
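The parsing rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual code: the function name `parse_transcript`, the regular expression, and the choice to treat anything after the timestamp on the same line as a speaker label are all hypothetical.

```python
import re

# Assumption: a cue starts on a line that begins with HH:MM:SS,
# optionally followed by a speaker label on the same line.
TIMESTAMP_LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})(?:\s+(.*))?$")


def parse_transcript(text):
    """Return a list of (start_seconds, label, caption_text) tuples."""
    cues = []
    current = None  # [start_seconds, label, list_of_text_lines]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TIMESTAMP_LINE.match(line)
        if match:
            # A new timestamp closes the cue that was being collected.
            if current is not None:
                cues.append((current[0], current[1], " ".join(current[2])))
            hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
            start = hours * 3600 + minutes * 60 + seconds
            current = [start, match.group(4) or "", []]
        elif current is not None:
            # Text after a timestamp belongs to the current cue.
            current[2].append(line)
    if current is not None:
        cues.append((current[0], current[1], " ".join(current[2])))
    return cues
```

Text that appears before the first timestamp is ignored in this sketch, since it has no start time to attach to.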
Once your transcript is pasted into the input area, choose whether you
want to remove speaker labels such as Speaker 1 or Speaker 2. You can also adjust the final cue padding, which controls how long
the last caption remains visible after the final timestamp.
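As a rough sketch of these two options, the Python below strips a leading speaker label and computes cue end times with a padding on the final cue. The label pattern (`Speaker` followed by a number and an optional colon), the function names, and the default padding of 3 seconds are assumptions made for illustration; the tool's own rules may differ.

```python
import re

# Assumption: labels look like "Speaker 1" or "Speaker 2:" at the start of the text.
SPEAKER_LABEL = re.compile(r"^Speaker \d+:?\s*")


def strip_speaker_label(text):
    """Remove one leading speaker label, if present."""
    return SPEAKER_LABEL.sub("", text, count=1)


def cue_end_times(starts, final_padding=3):
    """Each cue ends when the next one starts.

    The last cue has no following timestamp, so it ends
    final_padding seconds after its own start.
    """
    ends = list(starts[1:])
    if starts:
        ends.append(starts[-1] + final_padding)
    return ends
```

With start times of 4 and 17 seconds and a padding of 5, the first cue would run from 4 to 17 and the last from 17 to 22.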
Click Generate VTT to convert the transcript into a properly formatted WebVTT caption file. The generated output will appear in the preview panel, where you can review the cues before downloading the file.
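For readers curious about what the generated output looks like, here is a minimal Python sketch that writes cues in WebVTT form: a `WEBVTT` header line, then one block per cue with a `start --> end` timing line followed by the caption text. The function names and the `(start, end, text)` tuple layout are hypothetical and are not taken from the tool.

```python
def format_timestamp(seconds):
    """Format whole seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.000"


def build_vtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    blocks = ["WEBVTT"]  # required first line of every WebVTT file
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_vtt([
        (4, 17, "So hello and welcome..."),
        (17, 20, "The Michigan State University..."),
    ]))
```

Because the input timestamps have one-second resolution, the millisecond field is always written as `000` in this sketch.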
When you're ready, click Download .vtt to save the caption file. The resulting WebVTT file can be used with HTML5 video players, podcast players, learning platforms, and other accessibility workflows that support WebVTT subtitles or caption tracks.
Most likely transcript sources
Microsoft Teams transcript exports
Teams often exports transcripts with timestamps and speaker labels in a format like this:
00:00:04 Speaker 1
So hello and welcome...
00:00:17 Speaker 1
The Michigan State University...
Zoom transcript exports (.txt)
Zoom cloud recordings can export a .txt transcript with
timestamps and speaker labels that look very similar.
Otter.ai / AI meeting transcription tools
Otter, Fireflies, Fathom, and similar tools often export plain text transcripts with timestamps and speaker labels that follow the same pattern.
Whisper / OpenAI transcription pipelines
If a pipeline outputs a timestamped text transcript rather than a caption file, the result often looks like this:
00:00:01
Alright.
00:00:04 Speaker 1
So hello and welcome...
Podcast or lecture transcription tools
Some university lecture capture systems and podcast transcription tools also export transcripts in this style.
Frequently asked questions
What transcript format does this tool accept?
The transcript should contain timestamps formatted as
HH:MM:SS. Each timestamp starts a new caption cue,
and the text following it becomes the caption content until the
next timestamp.
What is a WebVTT file?
WebVTT (Web Video Text Tracks) is a caption and subtitle format used by HTML5 media players. It defines when text should appear on screen and how long it should remain visible during playback.
While WebVTT is commonly used for video subtitles, it is also widely used for audio transcripts and podcast caption tracks. Many accessibility workflows use VTT files to display synchronized text alongside spoken content so users can read along with audio or video playback.
WebVTT files work with the HTML <track> element and
are supported by modern browsers, streaming platforms, learning management
systems, and other media players that support caption tracks.
Can I use this VTT file with HTML video?
Yes. The generated file works with the HTML
<track> element and most modern video players that
support WebVTT captions.
Why remove speaker labels?
Meeting transcripts often include labels like
Speaker 1 or Speaker 2. Removing them
helps captions read more naturally on screen and avoids repeating
speaker identifiers in every cue.
What's the difference between VTT and SRT?
Both VTT and SRT are subtitle formats used to display timed captions. SRT is older and widely supported across many video platforms, while WebVTT was designed for the web and works directly with HTML5 media players.
WebVTT supports additional features such as styling, positioning,
and metadata tracks. For modern web video and audio players,
WebVTT is often the preferred format because it integrates
directly with the HTML
<track> element.
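At the file level, two visible differences are that SRT has no `WEBVTT` header and numbers each cue, and that SRT timestamps use a comma before the milliseconds where WebVTT uses a period. The Python sketch below converts a simple WebVTT file to SRT on that basis. It assumes cues without identifiers, settings, or styling, and the function name `vtt_to_srt` is hypothetical.

```python
import re

# Matches HH:MM:SS.mmm so the period can be swapped for a comma.
VTT_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")


def vtt_to_srt(vtt_text):
    """Convert simple WebVTT text to SRT.

    Assumption: every cue block starts with its timing line
    (no cue identifiers, cue settings, NOTE or STYLE blocks).
    """
    blocks = [block.strip() for block in vtt_text.strip().split("\n\n")]
    # Keep only cue blocks; this drops the WEBVTT header block.
    cue_blocks = [block for block in blocks if "-->" in block]
    out = []
    for index, block in enumerate(cue_blocks, start=1):
        timing, *text_lines = block.splitlines()
        timing = VTT_TIMING.sub(r"\1,\2", timing)
        out.append("\n".join([str(index), timing, *text_lines]))
    return "\n\n".join(out) + "\n"
```

Features that exist only in WebVTT, such as positioning and styling, have no SRT equivalent and would need to be dropped by a fuller converter.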
Related tools: Caption Converter