
WebVTT format – Web Video Text Tracks format

17 Sep 2021


>> A review with notes and thoughts for LeanBack Player <<

WebVTT File specifications

  • Encoding: UTF-8
  • MIME type: text/vtt
  • Line terminator: \r, \n or \r\n

WebVTT Format specifications

⇒ File header

WEBVTT

[cue]
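The file-level rules above (UTF-8 encoding, three possible line terminators, a WEBVTT header line) can be checked before any cue is parsed. The following is a minimal Python sketch, not a full parser. The helper names normalize_newlines and has_webvtt_header are hypothetical, and skipping a leading byte order mark is an assumption about typical files rather than something stated in this article.

```python
def normalize_newlines(raw: bytes) -> str:
    """Decode UTF-8 and turn CRLF and lone CR terminators into LF."""
    text = raw.decode("utf-8")
    return text.replace("\r\n", "\n").replace("\r", "\n")


def has_webvtt_header(text: str) -> bool:
    """True if the first line is the WEBVTT signature (optionally followed by text)."""
    # Assumption: a leading byte order mark may be present and is skipped.
    first_line = text.lstrip("\ufeff").split("\n", 1)[0]
    return first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t"))


sample = b"WEBVTT\r\n\r\n00:00:15.000 --> 00:00:18.000\r\nHello\r\n"
print(has_webvtt_header(normalize_newlines(sample)))  # True
```

Normalizing the line terminators first means later steps only have to deal with a single newline character.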

⇒ Cue [cue] format

[one or more characters not containing the substring “-->” or \r, \n, \r\n]

[hh…:]mm:ss.msmsms --> [hh…:]mm:ss.msmsms [settings]
First line
Second line

  • [hh…:] for hour declarations is optional
  • [settings] for cue setting declarations is optional
  • Milliseconds separator is a full stop (.)
  • Cues have to be separated by one (or more) blank lines
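The timing line described above can be matched with a regular expression. This is a sketch under stated assumptions: the names TIMESTAMP, TIMING_LINE, to_ms and parse_timing_line are hypothetical, the arrow is written as the ASCII sequence "-->", and times are returned as integer milliseconds to avoid floating-point rounding.

```python
import re

# [hh...:]mm:ss.ttt with an optional hours part of two or more digits
TIMESTAMP = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
TIMING_LINE = re.compile(
    rf"^{TIMESTAMP}[ \t]+-->[ \t]+{TIMESTAMP}(?:[ \t]+(.*))?$"
)


def to_ms(hours, minutes, seconds, millis) -> int:
    """Convert the captured timestamp parts to milliseconds."""
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_timing_line(line: str):
    """Return (start_ms, end_ms, settings_string) for one timing line."""
    match = TIMING_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a timing line: {line!r}")
    groups = match.groups()
    return to_ms(*groups[0:4]), to_ms(*groups[4:8]), groups[8] or ""


print(parse_timing_line("00:00:15.000 --> 00:00:18.000 A:start L:10"))
# (15000, 18000, 'A:start L:10')
```

The settings part is returned untouched so that it can be interpreted separately.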

 

  • Notes, thoughts:
    • from specs: “A WebVTT timestamp representing the start time offset of the cue. The time represented by this WebVTT timestamp must be greater than or equal to the start time offsets of all previous cues in the file.”
    • thoughts: it would be useful to allow cues with the same start time offset as previous cues in a file, e.g. when there are two speakers and each needs different settings (see the Cue settings part below)
Example:

WEBVTT

1
00:00:15.000 --> 00:00:18.000
At the left we can see…

2
00:00:18.167 --> 00:00:20.083
At the right we can see the…

3
00:00:20.083 --> 00:00:22.000
…the head-snarlers
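A file like the example above can be split into cue blocks on blank lines, and the ordering rule quoted from the specs (each start time must be greater than or equal to all previous ones) can then be checked. This is a rough sketch: split_cues, start_ms and starts_are_ordered are hypothetical names, the arrow is assumed to be the ASCII "-->", and blank lines are assumed to be truly empty (no stray spaces).

```python
import re

TIMING = re.compile(r"^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})[ \t]+-->")


def start_ms(timing_line: str) -> int:
    """Start time offset of a timing line, in milliseconds."""
    match = TIMING.match(timing_line)
    if match is None:
        raise ValueError(f"bad timing line: {timing_line!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def split_cues(text: str):
    """Yield (identifier, timing_line, payload_lines) per cue block."""
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    for block in re.split(r"\n{2,}", text)[1:]:  # first block is the header
        lines = block.split("\n")
        identifier = None
        if "-->" not in lines[0]:
            # A first line without the arrow is the optional cue identifier.
            identifier, lines = lines[0], lines[1:]
        if lines and "-->" in lines[0]:
            yield identifier, lines[0], lines[1:]


def starts_are_ordered(text: str) -> bool:
    """True if no cue starts earlier than the cue before it."""
    starts = [start_ms(timing) for _, timing, _ in split_cues(text)]
    return all(a <= b for a, b in zip(starts, starts[1:]))
```

Because the comparison is "less than or equal", two cues with an identical start time pass the check.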

⇒ Cue settings [settings]

  • Settings are added right after the timing, on the same line, separated by one or more spaces or tabs
  • Settings can be combined
Example:

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start L:10
At the left we can see…

2
00:00:18.167 --> 00:00:20.083 A:end S:75%
At the right we can see the…

3
00:00:20.083 --> 00:00:22.000 A:middle T:50%
…the head-snarlers
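The settings part of a timing line, as used in the example above, is a whitespace-separated list of key:value pairs. A small sketch of how it might be turned into a dictionary follows. parse_settings is a hypothetical name, and silently skipping malformed tokens is an assumption, not a rule taken from this article.

```python
def parse_settings(settings: str) -> dict:
    """Split 'A:start L:10' style settings into {'A': 'start', 'L': '10'}."""
    result = {}
    for token in settings.split():  # splits on spaces and tabs alike
        key, separator, value = token.partition(":")
        if not separator or not key or not value:
            continue  # assumption: ignore tokens that are not key:value
        result[key] = value
    return result


print(parse_settings("A:end S:75%"))  # {'A': 'end', 'S': '75%'}
```

Values are kept as strings here because each setting has its own value syntax.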

 

Clarification:

  • Hints:
    • images show the video; green-bordered images show dir="ltr", red-bordered images show dir="rtl"
    • [text track cue] shown with gray background
    • [text track cue line] blocks shown with blue border

 

  • General notes, thoughts:
    • [text track cue] size:
      • from specs: “If the [text track cue] writing direction is horizontal, then let width be ‘size vw’ and height be ‘auto’. Otherwise, let width be ‘auto’ and height be ‘size vh’.” (vw ≘ viewport’s width, vh ≘ viewport’s height)
      • this document differs from the specs in that the [text track cue] is only as wide (for horizontal; as high for vertical) as the widest (for horizontal; highest for vertical) [text track cue line] within it
    • the following settings should be implemented as default [text track cue] settings so they can be omitted from [cue]:
      • horizontal: by default a [text track cue] is positioned at the bottom center of the video with setting L:100% T:50%; text within a [text track cue] is aligned to the center with setting A:middle
      • vertical: by default a [text track cue] is positioned at the top right (for dir="ltr", bottom right for dir="rtl") of the video with setting L:0% T:0%; text within a [text track cue] is vertically aligned to the top (for dir="ltr", bottom for dir="rtl") with setting A:start

    • style sheet:
      • from specs: “No style sheets are associated with nodes. (The nodes are subsequently restyled using style sheets after their boxes are generated, …)”
      • thoughts: most developers will provide a style sheet for the subtitles/captions. If no cue settings are provided, the default settings mentioned in the notes above should apply first, and a developer's style sheet should be able to override them. If cue settings are provided, no developer style sheets should be used.
    • for cues positioned at the bottom, the player's control bar will be shown above them
    • not yet clear what to do if the width and/or height of the [text track cue] box exceeds the video
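The default settings proposed in the notes above could be merged with whatever a cue declares explicitly. This sketch only restates those proposed defaults. The names HORIZONTAL_DEFAULTS, VERTICAL_DEFAULTS and with_defaults are hypothetical, and treating a cue as vertical whenever a D setting is present is an assumption.

```python
# Defaults as proposed in the notes above (not taken from the specs).
HORIZONTAL_DEFAULTS = {"A": "middle", "L": "100%", "T": "50%"}
VERTICAL_DEFAULTS = {"A": "start", "L": "0%", "T": "0%"}


def with_defaults(settings: dict) -> dict:
    """Fill in omitted settings; explicit cue settings always win."""
    # Assumption: any D setting (D:vertical or D:vertical-lr) means vertical.
    defaults = VERTICAL_DEFAULTS if "D" in settings else HORIZONTAL_DEFAULTS
    return {**defaults, **settings}


print(with_defaults({"A": "end"}))
# {'A': 'end', 'L': '100%', 'T': '50%'}
```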

1.1) Text alignment: A:[start|middle|end]
  • Hints:
    • where [start|middle|end] means:
      • if direction is “LTR”: start ≘ left; end ≘ right
      • if direction is “RTL”: start ≘ right; end ≘ left
Text alignment: cue examples for A:start, A:middle and A:end

A:start

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start
Hello
everybody

A:middle

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:middle
Hello
everybody

A:end

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:end
Hello
everybody

  • Notes, thoughts:
    • A:[start|middle|end] is only for aligning the [text track cue line] blocks within the [text track cue]
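The direction-dependent meaning of start and end given in the hints above can be written as a small lookup. This is an illustrative sketch: resolve_alignment is a hypothetical name, and returning "center" for middle (the CSS text-align keyword) is an assumption of the sketch.

```python
def resolve_alignment(value: str, direction: str = "ltr") -> str:
    """Map A:start|middle|end to a physical side for the given direction."""
    if value == "middle":
        return "center"  # assumption: expressed as the CSS keyword
    if value not in ("start", "end"):
        raise ValueError(f"unknown alignment: {value!r}")
    left_to_right = direction.lower() == "ltr"
    if value == "start":
        return "left" if left_to_right else "right"
    return "right" if left_to_right else "left"


print(resolve_alignment("start", "rtl"))  # right
```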

1.2) Text position: T:[number]%
  • Hints:
    • where [number] is a positive integer
Text position: cue examples for T:0%, T:50% (see A:middle above) and T:100%

T:0%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 T:0%
Hello
everybody

T:50%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 T:50%
Hello
everybody

T:100%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 T:100%
Hello
everybody

Text alignment and text position: cue examples for A:start T:50%, A:middle T:50% (see T:50% above) and A:end T:50%

A:start T:50%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start T:50%
Hello
everybody

A:middle T:50%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:middle T:50%
Hello
everybody

A:end T:50%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:end T:50%
Hello
everybody

  • Notes, thoughts:
    • for block positioning the (upcoming) CSS3 Images property “object-position” could be very useful here if browsers already supported it

1.3) Line position: L:[number]%
  • Hints:
    • where [number] is a positive integer when “%” (percentage) is present, OR a positive or negative integer when “%” is not present
    • L:[number]% represents a specific position of the [text track cue] box relative to the bottom of the video
    • L:[number] represents a line number
Line position: cue examples for L:0%, L:50% and L:100% (see A:middle T:50% above)

L:0%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 L:0%
Hello
everybody

L:50%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 L:50%
Hello
everybody

L:100%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 L:100%
Hello
everybody

Line position combined with text alignment and text position: cue examples for A:start T:0% L:100%, A:middle T:50% L:50% (see L:50% above) and A:end T:100% L:0%

A:start T:0% L:100%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start T:0% L:100%
Hello
everybody

A:middle T:50% L:50%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:middle T:50% L:50%
Hello
everybody

A:end T:100% L:0%

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:end T:100% L:0%
Hello
everybody

  • Notes, thoughts:
    • for block positioning the (upcoming) CSS3 Images property “object-position” could be very useful here if browsers already supported it
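The two forms of the L setting described in the hints of this section (a percentage, or a signed line number without "%") can be told apart by the trailing percent sign. A minimal sketch follows. parse_line_setting is a hypothetical name, and limiting percentages to the range 0 to 100 is an assumption, since the hints only say "positive integer".

```python
def parse_line_setting(value: str):
    """Return ('percent', n) for 'n%' or ('line', n) for a bare integer."""
    if value.endswith("%"):
        number = int(value[:-1])
        if not 0 <= number <= 100:  # assumption: percentages stay within 0..100
            raise ValueError(f"percentage out of range: {value!r}")
        return ("percent", number)
    return ("line", int(value))  # may be negative, e.g. '-2'


print(parse_line_setting("50%"))  # ('percent', 50)
print(parse_line_setting("-2"))   # ('line', -2)
```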

1.4) Cue size: S:[number]%
  • Hints:
    • where [number] is in the range 0 ≤ number ≤ 100
    • S:[number]% represents the [text track cue] size as a percentage; values below 100 decrease it

 

  • Notes, thoughts:
    • default size of [text track cue] box is 100%
    • size value does not change the text size but the width (when horizontal, height when vertical) of the [text track cue] box
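Combined with the rule quoted from the specs earlier (width "size vw" and height "auto" for horizontal cues, the reverse with "vh" for vertical ones), the size setting could be translated into CSS lengths roughly as follows. cue_box_size is a hypothetical name, and the returned dictionary is only one possible way to represent the result.

```python
def cue_box_size(size_percent: int, vertical: bool = False) -> dict:
    """CSS width/height for a cue box, following the quoted spec rule."""
    if not 0 <= size_percent <= 100:
        raise ValueError(f"size out of range: {size_percent}")
    if vertical:
        return {"width": "auto", "height": f"{size_percent}vh"}
    return {"width": f"{size_percent}vw", "height": "auto"}


print(cue_box_size(75))  # {'width': '75vw', 'height': 'auto'}
```

Only the box dimension changes; the text size is left alone, in line with the note above.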

1.5) Vertical alignment: D:vertical OR D:vertical-lr
  • Hints:
    • D:vertical represents vertically aligned text that grows from right to left
    • D:vertical-lr represents vertically aligned text that grows from left to right
    • Default cue setting for text alignment is A:middle and for text position is T:50% (if omitted from the cue settings)

 

Vertical alignment: cue examples for A:start T:0% D:vertical and A:start T:0% D:vertical-lr

A:start T:0% D:vertical

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start T:0% D:vertical
Hello
everybody

A:start T:0% D:vertical-lr

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start T:0% D:vertical-lr
Hello
everybody

  • Notes, thoughts:
    • for text orientation the (upcoming) CSS3 Writing Modes property “text-orientation” could be very useful here if browsers already supported it

⇒ Cue text

→ Cue text replacements
  • & has to be replaced with &amp;
  • < has to be replaced with &lt;
  • > has to be replaced with &gt;
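When cue text is generated from plain strings, the three replacements above can be applied in one helper. This is a small sketch with a hypothetical name, escape_cue_text. The ampersand has to be replaced first, otherwise the ampersands introduced by the other two replacements would be escaped again.

```python
def escape_cue_text(text: str) -> str:
    """Apply the three cue text replacements, ampersand first."""
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


print(escape_cue_text("a < b & c > d"))  # a &lt; b &amp; c &gt; d
```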

 

→ Cue text tags
  • Bold: <b>bold text</b>
  • Italic: <i>italic text</i>
  • Underline: <u>underlined text</u>
  • Ruby annotations: <ruby>text<rt>annotation</rt></ruby>

 

  • Hint: put intermediate timestamps (wrapped in <…>) in the cue text to make it appear step by step, karaoke style

 

  • Notes, thoughts:
    • Ruby annotations are not yet supported by most modern browsers
Example:

00:01:07.395 --> 00:01:10.246
One… <00:01:08.350>Two… <00:01:09.125>Three…
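A player has to know when each part of such a karaoke-style cue becomes visible. One possible way to split the cue text at its intermediate timestamps is sketched here. INLINE_TIMESTAMP and split_karaoke are hypothetical names, and the first segment is assumed to appear at the cue's own start time.

```python
import re

# An intermediate timestamp such as <00:01:08.350>; the capture keeps the time.
INLINE_TIMESTAMP = re.compile(r"<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>")


def split_karaoke(payload: str, cue_start: str):
    """Return [(timestamp, text), ...] in the order the text should appear."""
    # re.split with one capturing group yields text, time, text, time, text ...
    parts = INLINE_TIMESTAMP.split(payload)
    segments = [(cue_start, parts[0])]
    for index in range(1, len(parts), 2):
        segments.append((parts[index], parts[index + 1]))
    return segments


for when, text in split_karaoke(
    "One… <00:01:08.350>Two… <00:01:09.125>Three…", "00:01:07.395"
):
    print(when, repr(text))
```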

→ Cue class tags
  • <c.CLASSName1.CLASSName2>styled text</c> has to be replaced with <span class="CLASSName1 CLASSName2">styled text</span>
Example:

00:01:07.395 --> 00:01:10.246
<c.vtt_black.vtt_uppercase>Hey!</c>
<c.vtt_uppercase>Hello!</c>
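The class tag replacement can be done with a regular expression when tags are not nested. The sketch below makes several assumptions: class_tags_to_html is a hypothetical name, the class names go into the HTML class attribute (the attribute HTML uses for class names), nested <c> tags are not handled, and the text inside the tag is passed through unchanged.

```python
import html
import re

# <c.one.two>text</c>; group 1 is ".one.two" (possibly empty), group 2 the text
CLASS_TAG = re.compile(r"<c((?:\.[^.\s<>]+)*)>(.*?)</c>", re.DOTALL)


def class_tags_to_html(cue_text: str) -> str:
    """Replace non-nested <c.a.b>...</c> tags with <span class="a b">...</span>."""

    def replace(match: re.Match) -> str:
        names = " ".join(match.group(1).split(".")[1:])
        attribute = f' class="{html.escape(names, quote=True)}"' if names else ""
        return f"<span{attribute}>{match.group(2)}</span>"

    return CLASS_TAG.sub(replace, cue_text)


print(class_tags_to_html("<c.vtt_black.vtt_uppercase>Hey!</c>"))
# <span class="vtt_black vtt_uppercase">Hey!</span>
```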

→ Cue voice tag
  • <v Voice Name>voice text</v> has to be replaced with <q title="Voice Name">voice text</q>

 

  • Notes, thoughts:
    • this differs from the specs in that opened <v> tags should be closed with </v>
Example:

00:01:07.395 --> 00:01:10.246
<v John Doe>Hey!</v>
<v Jane Doe>Hello!</v>
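Because this document requires every <v> tag to be closed, the voice tag replacement can also be sketched with a regular expression. voice_tags_to_html is a hypothetical name, and the sketch assumes non-nested tags and a voice name that does not contain ">".

```python
import html
import re

# <v Voice Name>text</v>; group 1 is the voice name, group 2 the spoken text
VOICE_TAG = re.compile(r"<v[ \t]+([^>]+)>(.*?)</v>", re.DOTALL)


def voice_tags_to_html(cue_text: str) -> str:
    """Replace closed <v Name>...</v> tags with <q title="Name">...</q>."""

    def replace(match: re.Match) -> str:
        name = html.escape(match.group(1).strip(), quote=True)
        return f'<q title="{name}">{match.group(2)}</q>'

    return VOICE_TAG.sub(replace, cue_text)


print(voice_tags_to_html("<v Jane Doe>Hello!</v>"))
# <q title="Jane Doe">Hello!</q>
```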

References

  • WHATWG WebVTT (draft) Specification
  • W3C Resources
    • HTML5 Specification – 10.3.2 Timed text tracks
    • Web Media Text Tracks Community Group, Web Media Text Tracks Community Group Wiki
  • “Understanding WebVTT file format (draft)” by Julien Villetorte
