# SCC Reader

## Overview

The SCC reader (`ttconv/scc/reader.py`) converts [SCC](https://docs.inqscribe.com/2.2/format_scc.html) documents into
the [data model](./data-model.md).

## Usage

The SCC reader accepts as input a [Scenarist Closed
Caption](https://www.govinfo.gov/content/pkg/CFR-2007-title47-vol1/pdf/CFR-2007-title47-vol1-sec15-119.pdf) document that conforms
to the [CEA-608](https://shop.cta.tech/products/line-21-data-services) encoding specification and returns a `model.ContentDocument`
object.

```python
import ttconv.scc.reader as scc_reader

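# to_model() takes the SCC document itself as a string, so the file is read first
with open("src/test/resources/scc/pop-on.scc", encoding="utf-8") as f:
    doc = scc_reader.to_model(f.read())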
# doc can then be manipulated and written out using any of the writer modules
```

## Architecture

The input SCC document is read line by line. For each line, the time code prefix and the CEA-608 codes that follow it (see the
`ttconv/scc/codes` package) are processed to generate `SccCaptionParagraph` instances. Each paragraph associates a time and a region
with the text (including line breaks) it contains (see the definition in `ttconv/scc/content.py`). Each paragraph is then converted to
a `model.P` element of the output `model.ContentDocument` (see the `SccCaptionParagraph.to_paragraph()` method in
`ttconv/scc/paragraph.py`), following the recommendations specified in [SMPTE RP
2052-10:2013](https://ieeexplore.ieee.org/document/7289645).
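
As a rough illustration of the first step described above, the sketch below splits one SCC line into its time code prefix and its list of two-byte CEA-608 words. The names `SccLine` and `parse_scc_line` and the regular expression are hypothetical and are not taken from `ttconv`; the actual reader may structure this step differently.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern: HH:MM:SS:FF (non-drop-frame) or HH:MM:SS;FF (drop-frame),
# followed by whitespace-separated 4-digit hexadecimal words.
_LINE_PATTERN = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})\s+((?:[0-9a-fA-F]{4}\s*)+)$"
)


@dataclass
class SccLine:
    hours: int
    minutes: int
    seconds: int
    frames: int
    drop_frame: bool
    words: List[int]  # each entry is one 16-bit CEA-608 word


def parse_scc_line(line: str) -> Optional[SccLine]:
    """Returns the parsed line, or None for the header, blank lines and other non-caption lines."""
    match = _LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    hh, mm, ss, separator, ff, codes = match.groups()
    return SccLine(
        int(hh), int(mm), int(ss), int(ff),
        separator == ";",
        [int(word, 16) for word in codes.split()],
    )


parsed = parse_scc_line("00:00:00:22 9425 9425 94ad 94ad 9470 9470 4c6f 7265")
print(parsed.frames, [format(word, "04x") for word in parsed.words[:3]])
```

Returning `None` for lines that do not start with a time code keeps the caller's loop simple: the file header and empty lines are skipped without special cases.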

The paragraph generation is based on the buffer-based mechanism defined in the CEA-608 format: a buffer of caption
content is filled while some other content is displayed. The buffering and display processes can be synchronous or
asynchronous, depending on the caption style (see `ttconv/scc/style.py`).
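
The buffering idea can be pictured with the toy model below. It is only a sketch: `CaptionBuffers` and its methods are hypothetical names, not part of `ttconv`, and it assumes the usual CEA-608 behaviour in which pop-on text is loaded into a non-displayed memory until an End Of Caption (EOC) code swaps the two memories, while paint-on text is written straight to the displayed memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaptionBuffers:
    """Toy model of the displayed and non-displayed caption memories (hypothetical)."""
    displayed: List[str] = field(default_factory=list)
    non_displayed: List[str] = field(default_factory=list)
    pop_on: bool = True  # True: pop-on style, False: paint-on style

    def write(self, text: str) -> None:
        # Pop-on captions are buffered out of sight; paint-on text appears immediately.
        target = self.non_displayed if self.pop_on else self.displayed
        target.append(text)

    def end_of_caption(self) -> None:
        # EOC swaps the two memories: the buffered caption becomes visible.
        self.displayed, self.non_displayed = self.non_displayed, self.displayed

    def erase_displayed(self) -> None:
        # EDM (Erase Displayed Memory) clears what is currently shown.
        self.displayed = []


buffers = CaptionBuffers(pop_on=True)
buffers.write("Lorem ipsum")
print(buffers.displayed)  # still empty while the caption is being buffered
buffers.end_of_caption()
print(buffers.displayed)  # the buffered caption is now the displayed one
```

In this picture, pop-on corresponds to the asynchronous case (content is buffered first and displayed later) and paint-on to the synchronous one.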

`ttconv/scc/utils.py` contains utility functions that convert geometrical dimensions between units,
and `ttconv/scc/disassembly.py` handles the conversion of CEA-608 codes to the _disassembly_ format.

## Disassembly

The SCC reader can dump SCC content in the [Disassembly](http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML#ccd)
format, which is an ad hoc, human-readable description of the SCC content.

```python
import ttconv.scc.reader as scc_reader

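# to_disassembly() takes the SCC document itself as a string, so the file is read first
with open("src/test/resources/scc/pop-on.scc", encoding="utf-8") as f:
    print(scc_reader.to_disassembly(f.read()))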
```

For instance, the following SCC line:

```
00:00:00:22 9425 9425 94ad 94ad 9470 9470 4c6f 7265 6d20 6970 7375 6d20 646f 6c6f 7220 7369 7420 616d 6574 2c80
```

is converted to:

```
00:00:00:22 {RU2}{RU2}{CR}{CR}{1500}{1500}Lorem ipsum dolor sit amet,
```

This is useful for debugging.
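
To make the mapping in the example above concrete, here is a minimal sketch of how such a line could be disassembled. It is not the code in `ttconv/scc/disassembly.py`: the function name `disassemble_line` and the three-entry code table are hypothetical, the table covers only the control codes that appear in the example, and basic characters are assumed to map directly to ASCII.

```python
# Illustrative subset of control codes, with parity bits removed; a real table is much larger.
_CONTROL_CODES = {
    0x1425: "RU2",   # Roll-Up Captions, 2 rows
    0x142D: "CR",    # Carriage Return
    0x1470: "1500",  # Preamble Address Code: row 15, column 00
}


def disassemble_line(line: str) -> str:
    """Converts 'timecode word word ...' into a disassembly-like string (simplified)."""
    time_code, *words = line.split()
    out = []
    for word in words:
        # Drop the parity bit (most significant bit) of each of the two bytes.
        value = int(word, 16) & 0x7F7F
        high, low = value >> 8, value & 0xFF
        if value in _CONTROL_CODES:
            out.append("{" + _CONTROL_CODES[value] + "}")
        elif 0x10 <= high <= 0x1F:
            # Control code outside the illustrative subset: keep its hex value.
            out.append("{?" + format(value, "04x") + "}")
        else:
            # Assumption: basic characters are plain ASCII; padding bytes (< 0x20) are dropped.
            out.extend(chr(byte) for byte in (high, low) if byte >= 0x20)
    return time_code + " " + "".join(out)


print(disassemble_line(
    "00:00:00:22 9425 9425 94ad 94ad 9470 9470 4c6f 7265 6d20 6970 "
    "7375 6d20 646f 6c6f 7220 7369 7420 616d 6574 2c80"
))
```

Control codes are typically transmitted twice for redundancy, which is why `{RU2}`, `{CR}` and `{1500}` each appear twice in the disassembled line.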

## Tests

Sample SCC files can be found in the `src/test/resources/scc` directory.