How to convert a transcript to WebVTT

To generate a WebVTT caption file, paste a transcript that includes timestamps in the format HH:MM:SS. Each timestamp marks the start of a new caption cue, and the text that follows will appear on screen until the next timestamp is reached.
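
The cue-splitting rule described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the names TIMESTAMP_LINE and parse_cues are hypothetical, and the sketch assumes each timestamp sits at the start of its own line.

```python
import re

# Assumption: a cue starts on any line that begins with HH:MM:SS.
# Anything after the timestamp on that line (often a speaker label) is kept as text.
TIMESTAMP_LINE = re.compile(r"^\s*(\d{2}):(\d{2}):(\d{2})\b(.*)$")


def parse_cues(transcript: str) -> list[tuple[int, str]]:
    """Return one (start_seconds, text) pair per timestamp in the transcript."""
    cues: list[tuple[int, list[str]]] = []
    for line in transcript.splitlines():
        match = TIMESTAMP_LINE.match(line)
        if match:
            hours, minutes, seconds, rest = match.groups()
            start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            rest = rest.strip()
            cues.append((start, [rest] if rest else []))
        elif cues and line.strip():
            # Non-timestamp lines belong to the most recent cue.
            cues[-1][1].append(line.strip())
    return [(start, " ".join(parts)) for start, parts in cues]
```

Lines that appear before the first timestamp are ignored in this sketch, since there is no cue to attach them to.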

Once your transcript is pasted into the input area, choose whether you want to remove speaker labels such as Speaker 1 or Speaker 2. You can also adjust the final cue padding, which controls how long the last caption remains visible after the final timestamp.
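
As a rough illustration of these two options, the following Python sketch strips a leading speaker label and computes cue end times. The function names, the label pattern, and the default padding of 4 seconds are all assumptions made for this example, not the tool's actual settings.

```python
import re

# Assumption: labels look like "Speaker 1" or "Speaker 2:", at the start of the cue text.
SPEAKER_LABEL = re.compile(r"^\s*Speaker\s+\d+\s*:?\s*", re.IGNORECASE)


def strip_speaker_label(text: str) -> str:
    """Remove one leading speaker label, if present."""
    return SPEAKER_LABEL.sub("", text, count=1)


def cue_end_times(starts: list[int], final_padding: int = 4) -> list[int]:
    """Each cue ends when the next one starts; the last cue ends after the padding."""
    ends = starts[1:]
    if starts:
        ends = ends + [starts[-1] + final_padding]
    return ends
```

With start times of 4 and 17 seconds and a padding of 5, the first cue would end at 17 seconds and the last at 22 seconds.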

Click Generate VTT to convert the transcript into a properly formatted WebVTT caption file. The generated output will appear in the preview panel, where you can review the cues before downloading the file.
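
For readers curious about what the generated file contains, here is a minimal Python sketch of the output step. The helper names are hypothetical and the tool's own output may differ in detail. The sketch assumes cues arrive as (start_seconds, end_seconds, text) and that the text never contains the sequence "-->", which WebVTT does not allow inside cue text.

```python
def format_timestamp(total_seconds: int) -> str:
    """Render whole seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.000"


def to_webvtt(cues: list[tuple[int, int, str]]) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) cues."""
    blocks = ["WEBVTT"]  # every WebVTT file starts with this header line
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

Each cue becomes a timing line followed by its text, with a blank line between cues.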

When you're ready, click Download .vtt to save the caption file. The resulting WebVTT file can be used with HTML5 video players, podcast players, learning platforms, and other tools that support WebVTT subtitle or caption tracks.

Common transcript sources

Microsoft Teams transcript exports

Teams often exports transcripts with timestamps and speaker labels in a format like this:

00:00:04 Speaker 1
So hello and welcome...
00:00:17 Speaker 1
The Michigan State University...


Zoom transcript exports (.txt)

Zoom cloud recordings can export a .txt transcript with timestamps and speaker labels that look very similar.

Otter.ai / AI meeting transcription tools

Otter, Fireflies, Fathom, and similar tools often export plain text transcripts with timestamps and speaker labels that follow the same pattern.

Whisper / OpenAI transcription pipelines

If your pipeline writes a plain text transcript with timestamps instead of a caption file, the output often looks like this:

00:00:01
Alright.
00:00:04 Speaker 1
So hello and welcome...
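
The sample above mixes two kinds of timestamp line: one with only a time, and one with a time followed by a speaker label. A parser needs to accept both. The Python sketch below shows one way to do that. The names HEADER and parse_header are hypothetical, and the pattern is an assumption based only on the samples shown here.

```python
import re
from typing import Optional

# Assumption: a header line is HH:MM:SS, optionally followed by a speaker label.
HEADER = re.compile(r"^\s*(\d{2}):(\d{2}):(\d{2})(?:\s+(.+?))?\s*$")


def parse_header(line: str) -> Optional[tuple[int, Optional[str]]]:
    """Return (start_seconds, speaker_or_None) for a header line, else None."""
    match = HEADER.match(line)
    if match is None:
        return None  # ordinary caption text, not a timestamp line
    hours, minutes, seconds, speaker = match.groups()
    start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return start, speaker
```

Leading whitespace is tolerated, which helps when a transcript has been pasted with inconsistent indentation.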

Podcast or lecture transcription tools

Some university lecture capture systems and podcast transcription tools also export transcripts in this style.

Frequently asked questions

What transcript format does this tool accept?

The transcript should contain timestamps formatted as HH:MM:SS. Each timestamp starts a new caption cue, and the text following it becomes the caption content until the next timestamp.

What is a WebVTT file?

WebVTT (Web Video Text Tracks) is a caption and subtitle format used by HTML5 media players. It defines when text should appear on screen and how long it should remain visible during playback.

While WebVTT is most often used for video subtitles, it also works for audio transcripts and podcast caption tracks. Many accessibility workflows use VTT files to display synchronized text alongside spoken content so users can read along with audio or video playback.

WebVTT files work with the HTML <track> element and are supported by modern browsers, streaming platforms, learning management systems, and other media players that support caption tracks.

Can I use this VTT file with HTML video?

Yes. The generated file works with the HTML <track> element and most modern video players that support WebVTT captions.

Why remove speaker labels?

Meeting transcripts often include labels like Speaker 1 or Speaker 2. Removing them helps captions read more naturally on screen and avoids repeating speaker identifiers in every cue.

What's the difference between VTT and SRT?

Both VTT and SRT are subtitle formats used to display timed captions. SRT is older and widely supported across many video platforms, while WebVTT was designed for the web and works directly with HTML5 media players.

WebVTT supports additional features such as styling, positioning, and metadata tracks. For modern web video and audio players, WebVTT is often the preferred format because it integrates directly with the HTML <track> element.
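
At the file level, the most visible differences are that a WebVTT file begins with a WEBVTT header line and writes milliseconds after a period, while SRT uses a comma. The Python sketch below illustrates a minimal SRT-to-VTT conversion based on just those two differences. The function name srt_to_vtt is hypothetical, and the sketch does not handle SRT-specific styling or malformed input.

```python
import re

# SRT timing lines use a comma before the milliseconds: 00:00:04,000
SRT_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT by fixing timings and adding the header."""
    lines = []
    for line in srt_text.splitlines():
        if "-->" in line:
            # Only timing lines are rewritten; caption text is left untouched.
            line = SRT_TIMING.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines).strip() + "\n"
```

The numeric counters that SRT places above each timing line can stay as they are, because WebVTT treats a line before the timing line as an optional cue identifier.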

Related tools: Caption Converter
