How do I add captions to my course narration?

If you wrote the script before you recorded the audio, you already have your captions. You just need a tool that exports them in the right file format with the right timestamps. VoiceOverAndOver writes SRT and WebVTT files in the same step as the audio export, so a separate caption pass is not needed.

Three common approaches

Auto-transcribe after the fact. Pay a service or run a model that listens to the audio and produces captions. Cheap, but prone to errors on names, jargon, and homophones.
Hand-time captions in a caption editor. Slow. Painful. Error-prone. Avoid.
Use your existing script as the caption source. Free and accurate: the captions already match what you said because the script was the source of truth.

The third approach is the one to use whenever your audio was scripted.

How VoiceOverAndOver handles it

Every paragraph in your project carries both the text and the audio. When you merge, the app builds a timeline: paragraph 1 from 0 to 4.2 seconds, paragraph 2 from 4.2 to 9.8 seconds, and so on. From that timeline it writes:

SRT - the format most video editors, YouTube, and most LMS platforms accept.
WebVTT - the format Vimeo and HTML5 video want.
Premiere/Resolve CSV markers - drop these into your NLE and every paragraph becomes a labeled marker on the timeline.
Audacity label track - if you import the audio into Audacity to do extra editing, you get every paragraph as a labeled region.
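
To make the timeline idea concrete, here is a minimal Python sketch of how per-paragraph durations could become SRT, WebVTT, and Audacity label output. It is an illustration under assumptions, not VoiceOverAndOver's actual code: the Cue class and every function name are hypothetical, and times are kept as whole milliseconds so paragraph boundaries add up exactly.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    # Hypothetical structure: one cue per paragraph, times in whole milliseconds.
    start_ms: int
    end_ms: int
    text: str


def build_timeline(paragraphs):
    """Lay (text, duration_ms) pairs end to end, so each cue starts where the last ended."""
    cues, cursor = [], 0
    for text, duration_ms in paragraphs:
        cues.append(Cue(cursor, cursor + duration_ms, text))
        cursor += duration_ms
    return cues


def timestamp(ms, decimal_mark):
    """Format milliseconds as HH:MM:SS<mark>mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{decimal_mark}{millis:03d}"


def to_srt(cues):
    """SRT: numbered cues, comma before the milliseconds, blank line between cues."""
    return "\n".join(
        f"{n}\n{timestamp(c.start_ms, ',')} --> {timestamp(c.end_ms, ',')}\n{c.text}\n"
        for n, c in enumerate(cues, start=1)
    )


def to_vtt(cues):
    """WebVTT: a WEBVTT header line, then cues with a dot before the milliseconds."""
    body = "\n".join(
        f"{timestamp(c.start_ms, '.')} --> {timestamp(c.end_ms, '.')}\n{c.text}\n"
        for c in cues
    )
    return "WEBVTT\n\n" + body


def to_audacity_labels(cues):
    """Audacity label track: start and end in seconds plus the label, tab-separated."""
    return "".join(
        f"{c.start_ms / 1000:.3f}\t{c.end_ms / 1000:.3f}\t{c.text}\n" for c in cues
    )


cues = build_timeline([
    ("Welcome to the course.", 4200),    # paragraph 1: 0 to 4.2 s
    ("Today we cover captions.", 5600),  # paragraph 2: 4.2 to 9.8 s
])
print(to_srt(cues))
```

The same list of cues feeds all three writers, which is why the sidecar formats cannot drift apart from one another. The example paragraph texts are made up.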

Note

Long paragraphs are automatically split into sub-cues so a single caption never sits on screen for too long. The last sub-cue of a paragraph snaps exactly to the paragraph boundary, so paragraph timing stays a perfect resync point even when individual sub-cues are estimated.
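
One plausible way to do this kind of splitting is sketched below in Python. The details are assumptions, since the app's real rule is not documented here: the function name, the 80-character limit, and the choice to estimate sub-cue timing in proportion to character count are all hypothetical. The part that mirrors the note is the last step, where the final sub-cue ends exactly on the paragraph boundary.

```python
import textwrap


def split_into_subcues(text, start_ms, end_ms, max_chars=80):
    """Split one paragraph into (start_ms, end_ms, text) sub-cues.

    Inner boundaries are estimates based on character counts (an assumed
    heuristic); the last sub-cue is snapped to the paragraph's own end time.
    """
    # Break at word boundaries; keep a single chunk if the text is empty.
    chunks = textwrap.wrap(text, width=max_chars) or [text]
    total_chars = sum(len(chunk) for chunk in chunks)
    duration = end_ms - start_ms

    subcues, cursor, chars_seen = [], start_ms, 0
    for i, chunk in enumerate(chunks):
        chars_seen += len(chunk)
        if i == len(chunks) - 1:
            chunk_end = end_ms  # snap exactly to the paragraph boundary
        else:
            chunk_end = start_ms + round(duration * chars_seen / total_chars)
        subcues.append((cursor, chunk_end, chunk))
        cursor = chunk_end
    return subcues


# A paragraph that runs from 4.2 s to 9.8 s, split with a tiny limit for the demo.
subs = split_into_subcues("one two three four five six", 4200, 9800, max_chars=10)
for start, end, chunk in subs:
    print(start, end, chunk)
```

Because each sub-cue starts where the previous one ended and the last one ends on the measured boundary, any estimation error stays inside the paragraph and does not accumulate into the next one.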

Workflow for an LMS

Record the narration in VoiceOverAndOver.
On the merge screen, tick SRT (or VTT, whichever your platform prefers).
Export. You get the audio file and the caption sidecar in the same folder.
Upload both to your course platform.

If the platform supports VTT only, export VTT. If it accepts both, SRT is the safer default - more tools understand it.

Fixing typos after the fact

If a learner spots a typo in a caption, fix the text on that row in the project, re-export, and replace the sidecar file. The audio does not need to be rebuilt. That is the upside of keeping text and audio tied per paragraph.
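
The reason a text fix is cheap can be shown with a small Python sketch. It assumes a hypothetical project model in which each row stores its text next to timings measured from the audio; the row fields and the render_srt helper are illustrative names, not the app's real format. Editing a row's text and rendering again changes the caption text only, and the timing lines come out identical.

```python
def render_srt(rows):
    """Render rows (dicts with start_ms, end_ms, text) as SRT text."""
    def ts(ms):
        hours, rest = divmod(ms, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

    return "\n".join(
        f"{n}\n{ts(row['start_ms'])} --> {ts(row['end_ms'])}\n{row['text']}\n"
        for n, row in enumerate(rows, start=1)
    )


def timing_lines(srt):
    """Return only the 'start --> end' lines of an SRT string."""
    return [line for line in srt.splitlines() if "-->" in line]


# Hypothetical project rows; the second one contains a typo.
rows = [
    {"start_ms": 0, "end_ms": 4200, "text": "Welcome to the course."},
    {"start_ms": 4200, "end_ms": 9800, "text": "Today we cover captoins."},
]
before = render_srt(rows)

rows[1]["text"] = "Today we cover captions."  # fix the typo; timings untouched
after = render_srt(rows)

print(timing_lines(before) == timing_lines(after))
```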
