> **Design proposal** — not yet implemented. Current implementation uses
> `vod/scripts/getVideo.sh` (yt-dlp) to download MP4s directly, with
> FFmpeg invoked by `live/scripts/start_live_stream.sh` for live HLS output.

# Encoding Pipeline Design
## Ingest, YouTube Intake, and HLS Packaging

## 1. Purpose

The encoding pipeline transforms church-owned source videos into Roku-compatible HLS assets.

It supports:

- manual MP4 uploads
- archive imports
- staff-managed YouTube intake for church-owned videos

Its job is to produce encoded assets once, store them, and make them available for scheduling.

## 2. Pipeline Overview

```text
Video Upload / YouTube Download
            │
            ▼
Azure Blob Storage (/source)
            │
            ▼
Encoding Queue
            │
            ▼
FFmpeg Worker
            │
            ▼
Azure Blob Storage (/hls)
            │
            ▼
Metadata Extractor
            │
            ▼
Video Catalog Update
```

## 3. Supported Input Sources

### Manual Upload

Church staff uploads MP4 files from local recording workflows.

### Archive Import

The existing church media library is copied into `/source`.

### YouTube Intake

Church-owned videos are downloaded automatically using the two-script pipeline
in `scripts/`:

1. `extract_required_videos.sh` — scans `schedule.json` and all package files,
   cross-checks `../videos/` for existing `.mp4` files, and writes only the
   **missing** video IDs to `scripts/requiredVideos.txt`.
2. `download_from_youtube.sh` — reads that list and fetches each video via
   `yt-dlp`, writing `{videoId}.mp4`, `{videoId}.jpg`, and `{videoId}.json`
   into `../videos/`. A `cookies.txt` (Netscape format) placed at the project
   root is used automatically; a read-only copy is passed to yt-dlp so the
   original is never modified.

Operational rule:

Use YouTube only for **church-owned** content, and keep the local `videos/`
directory as the system of record.
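
The missing-ID computation in `extract_required_videos.sh` can be sketched as follows. This is a simplified stand-in, not the real script: the `"videoId"` JSON field name is an assumption, and the real script also scans package files.

```shell
# Minimal sketch of the "missing IDs only" logic.
# Assumes the schedule stores IDs as "videoId": "<id>" pairs (hypothetical field name).
compute_missing_ids() {
  local schedule="$1" videos_dir="$2" out="$3"
  local wanted have
  wanted=$(mktemp); have=$(mktemp)

  # IDs referenced by the schedule.
  grep -o '"videoId"[[:space:]]*:[[:space:]]*"[^"]*"' "$schedule" \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u > "$wanted"

  # IDs already downloaded (one per .mp4 in the videos directory).
  ls "$videos_dir" 2>/dev/null | sed -n 's/\.mp4$//p' | sort -u > "$have"

  # Emit only IDs that are wanted but not yet present.
  comm -23 "$wanted" "$have" > "$out"
  rm -f "$wanted" "$have"
}
```

Usage would look like `compute_missing_ids schedule.json ../videos scripts/requiredVideos.txt`.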

## 4. Recommended Process

```text
Record once
    ↓
Store master in Azure
    ↓
Encode for Roku
    ↓
Schedule package use
    ↓
Optionally publish to YouTube or other platforms
```

This avoids building your system around re-downloading from public platforms.

## 5. Ingest Workflow

### Manual Upload Workflow

1. staff uploads MP4
2. file is stored under `/source`
3. encoding job is queued
4. worker processes file
5. HLS assets are written to `/hls`
6. metadata is saved to video catalog
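
Steps 2 and 3 above can be sketched with a simple file-backed queue, a Phase 1 stand-in for Azure Storage Queue. The function name and queue format are illustrative assumptions, not an existing interface.

```shell
# Hypothetical ingest helper: stage the upload and enqueue an encoding job.
ingest_upload() {
  local upload="$1" source_dir="$2" queue_file="$3"
  local name; name=$(basename "$upload")

  mkdir -p "$source_dir"
  cp "$upload" "$source_dir/$name"                 # step 2: store the source master
  printf '%s\n' "${name%.mp4}" >> "$queue_file"    # step 3: enqueue by videoId
}
```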

### YouTube Intake Workflow

1. run `extract_required_videos.sh` — produces `scripts/requiredVideos.txt`
   containing only IDs not yet downloaded
2. run `download_from_youtube.sh` — fetches each missing video from YouTube
   into `videos/{videoId}.mp4` with thumbnail and metadata sidecar files
3. encoding job is queued
4. worker processes file
5. HLS assets are stored
6. metadata is saved

Cron entry (daily at 2 am):

```
0 2 * * * cd /var/www/html/tv/scripts && bash extract_required_videos.sh && bash download_from_youtube.sh >/dev/null 2>&1
```

Crontab does not support backslash line continuations, so the entry must stay on a single line. Note that the redirect silences only the download script; wrap the whole chain in parentheses to silence both.

## 6. HLS Output Structure

Each source video is encoded into one HLS package.

Example:

```text
/source/sermon-2026-03-01.mp4
    ->
/hls/sermon-2026-03-01/
    master.m3u8
    /1080p
      index.m3u8
      seg0001.ts
    /720p
      index.m3u8
      seg0001.ts
    /480p
      index.m3u8
      seg0001.ts
```
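
The `master.m3u8` above references each rendition playlist. A sketch of its contents, with assumed bandwidth values:

```text
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5500000,RESOLUTION=1920x1080
1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3200000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1600000,RESOLUTION=854x480
480p/index.m3u8
```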

## 7. FFmpeg Example

Example single-rendition command:

```bash
ffmpeg -i sermon.mp4 \
  -c:v libx264 \
  -c:a aac \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_filename "seg%04d.ts" \
  index.m3u8
```

### Notes

- use AAC audio
- keep segment duration consistent
- generate stable naming
- preserve source masters
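
Applied to the three-rendition layout in the HLS output structure above, the notes can be sketched as a per-rendition encode loop that also writes the master playlist. The heights, bitrates, and BANDWIDTH values are illustrative assumptions, not tuned recommendations.

```shell
# Per-rendition encode loop: one ffmpeg run per ladder step, plus a
# hand-written master playlist referencing each rendition.
encode_ladder() {
  local src="$1" outdir="$2"
  # name width height video-bitrate playlist-BANDWIDTH (illustrative values)
  local renditions=(
    "1080p 1920 1080 5000k 5500000"
    "720p  1280  720 2800k 3200000"
    "480p   854  480 1400k 1600000"
  )

  mkdir -p "$outdir"
  printf '#EXTM3U\n' > "$outdir/master.m3u8"

  local r name w h vbr bw
  for r in "${renditions[@]}"; do
    read -r name w h vbr bw <<< "$r"
    mkdir -p "$outdir/$name"
    ffmpeg -y -i "$src" \
      -vf "scale=${w}:${h}" -c:v libx264 -b:v "$vbr" \
      -c:a aac -b:a 128k \
      -hls_time 6 -hls_playlist_type vod \
      -hls_segment_filename "$outdir/$name/seg%04d.ts" \
      "$outdir/$name/index.m3u8"
    printf '#EXT-X-STREAM-INF:BANDWIDTH=%s,RESOLUTION=%sx%s\n%s/index.m3u8\n' \
      "$bw" "$w" "$h" "$name" >> "$outdir/master.m3u8"
  done
}
```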

## 8. Metadata to Capture

For each encoded asset, store:

- `videoId`
- `title`
- `durationSec`
- `sourcePath`
- `hlsMasterPath`
- available variants
- segment count per variant
- thumbnail path
- encoding status
- created / updated timestamps

Example:

```json
{
  "videoId": "sermon-2026-03-01",
  "title": "Sunday Sermon (2026-03-01)",
  "durationSec": 3120,
  "sourcePath": "/source/sermon-2026-03-01.mp4",
  "hlsMasterPath": "/hls/sermon-2026-03-01/master.m3u8",
  "variants": ["1080p", "720p", "480p"],
  "thumbnailPath": "/images/sermon-2026-03-01.jpg",
  "status": "ready",
  "createdAt": "2026-03-01T14:00:00Z",
  "updatedAt": "2026-03-01T14:25:00Z"
}
```

## 9. Worker State Model

Recommended states:

- `queued`
- `processing`
- `ready`
- `failed`

This is enough for a first version.

## 10. Failure Handling

If encoding fails:

- leave the source master untouched
- mark the job as `failed`
- capture FFmpeg output logs
- allow retry
- notify administrators if needed

If partial output exists, either clean it up or mark it clearly as incomplete.
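
A worker step honoring these rules might look like the sketch below (single rendition for brevity; function name and paths are illustrative): encode into a temporary directory, publish atomically on success, and keep the log and the source master on failure.

```shell
# Sketch of one worker step: never expose partial output, always keep logs.
process_job() {
  local src="$1" final_dir="$2" log="$3"
  local tmp_dir="${final_dir}.partial"
  rm -rf "$tmp_dir" && mkdir -p "$tmp_dir"

  if ffmpeg -y -i "$src" -c:v libx264 -c:a aac \
       -hls_time 6 -hls_playlist_type vod \
       -hls_segment_filename "$tmp_dir/seg%04d.ts" \
       "$tmp_dir/index.m3u8" > "$log" 2>&1; then
    mv "$tmp_dir" "$final_dir"   # only complete output becomes visible
    echo ready
  else
    rm -rf "$tmp_dir"            # clean up partial output
    echo failed                  # caller marks the job failed; retry allowed
  fi
}
```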

## 11. Queueing

Recommended options:

| Component | Suggested Azure Option |
|---|---|
| Queue | Azure Storage Queue or Service Bus |
| Worker | Azure Container Apps or VM |
| Source storage | Azure Blob Storage |
| Output storage | Azure Blob Storage |

For a small church-scale system, Azure Storage Queue plus one worker is usually enough.
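
The queue message itself can stay small. A sketch with assumed field names follows; with Azure Storage Queue the JSON body would typically be base64-encoded and enqueued, for example via the Azure CLI's `az storage message put`.

```shell
# Hypothetical encoding-job message; the field names are assumptions.
make_job_message() {
  local video_id="$1" source_path="$2"
  printf '{"videoId":"%s","sourcePath":"%s","requestedAt":"%s"}' \
    "$video_id" "$source_path" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
```

Usage: `make_job_message sermon-2026-03-01 /source/sermon-2026-03-01.mp4`.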

## 12. Thumbnail Generation

If useful for admin UI, extract a thumbnail from each video during the encoding workflow.

Possible output:

```text
/images/sermon-2026-03-01.jpg
```

This helps with package editing and schedule management later.
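
A single-frame grab with FFmpeg is enough here. The 30-second offset is an arbitrary assumption (early enough to exist in any video, late enough to skip title cards), and the wrapper function is hypothetical.

```shell
# Grab one frame ~30 s in; -ss before -i seeks without decoding everything first.
make_thumbnail() {
  local src="$1" out="$2"
  ffmpeg -y -ss 30 -i "$src" -frames:v 1 -q:v 2 "$out"
}
```

Usage: `make_thumbnail /source/sermon-2026-03-01.mp4 /images/sermon-2026-03-01.jpg`.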

## 13. Why Not Encode on Demand

Do not transcode when Roku asks for playback.

Reasons:

- playback becomes dependent on CPU availability
- startup becomes slower
- system becomes more fragile
- costs become less predictable
- debugging becomes harder

The correct model is:

```text
encode once → reuse many times
```

## 14. Operational Guidance

Recommended practices:

- keep source masters permanently
- encode immediately after upload
- validate output before exposing asset to packages
- do not allow schedules to reference assets still in `processing`
- use stable naming conventions

## 15. Naming Suggestions

Examples:

```text
sermon-2026-03-01
worship-2026-03-05
devotional-2026-03-06-am
```

Keep names predictable and date-friendly.

## 16. Minimal Phase 1 Implementation

Phase 1 can stay simple:

- one upload path
- one encoding worker
- HLS output to Blob Storage
- metadata saved in JSON or simple DB
- package assignment after encoding completes

Later phases can add:

- multiple workers
- automatic thumbnails
- multi-bitrate tuning
- admin UI

## 17. Summary

The encoding pipeline turns church-owned source video into Roku-ready streaming assets.

Its role is straightforward:

- ingest source files
- encode once
- store durable HLS output
- publish metadata for scheduling

That makes the rest of the system possible.
