himmi

ytdld: Building a Live TUI on Top of a Library That Wasn't Built for One

15 min read
Tags: engineering, deep-dive, python, textual, cli, ytdld

ytdld is a terminal app for downloading YouTube content as MP3, MP4, or FLAC. Paste some links, flip a few switches, watch every track stream in parallel, done — powered by yt-dlp underneath, but built so you never have to remember a flag. The pitch is simple. The interesting part is everything yt-dlp doesn't give you for free: yt-dlp's download() call is synchronous and blocking, it renders its own progress straight to stdout, and it has no built-in concept of "please stop now." None of that is a flaw in yt-dlp — it's a single-purpose extraction/download library, not a UI toolkit. But it means a live, 8-way-parallel, cancellable, retryable dashboard has to be built entirely around those constraints rather than with any help from them.

This is a deep dive into that architecture: one UI-agnostic core driving two different front ends, a link resolver that trusts nothing about how a user might paste a URL, and the specific mechanism — raising an exception from inside a callback — that gives a blocking library a cancel button it was never designed to have.

The problem

Three things make "just wrap yt-dlp in a nice UI" harder than it sounds:

  1. yt-dlp owns the terminal by default. Left alone, it prints its own progress bars directly to stdout. A Textual app owns the terminal too — via the alternate screen buffer and its own render loop. Two things writing raw ANSI to the same terminal at once doesn't degrade gracefully; it corrupts the display.
  2. YoutubeDL.download() is one blocking call per invocation. There's no async def download(), no cancellation token, no "pause." Once you call it, you get control back only when it finishes or raises; until then, nothing.
  3. A batch of tracks means N independent downloads, each with fine-grained progress, but a "download all" batch has to look like one coherent operation to the user — one overall ETA, one pass/fail count, one retry action for just what broke.

ytdld's answer to all three is the same shape: don't fight yt-dlp's synchronous, single-purpose API — wrap it in thin, well-placed seams (a progress hook, a thread pool, an event dataclass) and put all the real complexity in what happens around those seams.

Architecture

The package is eight focused modules, split cleanly along the UI-agnostic / UI-specific line:

src/yt_dld/
├── core.py       # link parsing, yt-dlp option building, the download primitive — no UI imports
├── config.py     # TOML config: the single source of defaults for both front ends
├── cli.py        # verb routing: tui | doctor | update | config | completions | headless
├── headless.py   # the scripting fast-path — plain scrolling rich output
├── picker.py     # questionary checkbox picker, for headless --select
├── doctor.py     # environment diagnostics (python/ffmpeg/yt-dlp/PATH)
├── notify.py     # best-effort desktop notification (osascript / notify-send)
└── tui/app.py    # the three-screen Textual application

core.py's module docstring states the rule directly: "This module is deliberately UI-agnostic: both the Textual TUI and the headless flag path drive it. Nothing here imports rich or textual." That's not a stylistic preference — it's what makes it possible to add a third front end later (a web API, say) without touching a single line of download logic, and it's what let both existing front ends be built and tested against the exact same LinkProcessor, DownloadOptions, and download_entry primitives instead of two divergent implementations quietly drifting apart.

Dependency                             Role
yt-dlp                                 The actual extraction/download engine; everything else is orchestration around it
textual                                The three-screen TUI, its widget model, and the async event loop that drives it
rich                                   Headless-mode progress bars, tables, and styled console output
questionary                            The checkbox picker for headless --select
concurrent.futures.ThreadPoolExecutor  The actual parallelism primitive, for downloads and for parallel metadata enrichment
tomllib (stdlib)                       Reading ~/.config/ytdld/config.toml; no third-party TOML dependency needed

CI runs the 16-test suite plus a CLI smoke test (--version, doctor, config, completions zsh) across Python 3.11, 3.12, and 3.13 on every push, alongside a separate sdist/wheel build job — a small matrix, but it's the one that actually matters for a tool installed as a global CLI: does it import and run cleanly on every supported interpreter.

Design decisions and tradeoffs

One core, two front ends — not two implementations

The most consequential structural decision is the one described above: core.py has no idea whether it's being called from a Textual worker thread or a plain headless loop. DownloadOptions is a single dataclass both front ends populate; download_entry is a single function both front ends call; LinkProcessor.sanitize/expand_playlists are the single implementation both use to turn pasted text into real tracks. The alternative — a TUI-specific download path and a separate headless one — is the more common shape for CLI tools that grew a UI later, and it's also how "the headless flag works but the TUI silently does something slightly different" bugs get introduced. Keeping exactly one implementation of "what does downloading a track actually mean" means a fix or a new option (a new SponsorBlock mode, say) is correct in both places by construction, not by remembering to update two files.

A regex-first sanitizer, because pasted YouTube links are never clean

_VIDEO_PATTERNS = [
    r"(?:v=|v%3D)([a-zA-Z0-9_-]{11})",
    r"youtu\.be/([a-zA-Z0-9_-]{11})",
    r"youtube\.com/(?:embed|shorts|v)/([a-zA-Z0-9_-]{11})",
]
_LIST_PATTERN = r"(?:list=|list%3D)([a-zA-Z0-9_-]{18,34})"

LinkProcessor.sanitize doesn't hand raw pasted text to yt-dlp and hope for the best — it extracts every valid video and playlist ID out of arbitrary text with these patterns, rebuilds clean canonical URLs, and de-duplicates while preserving order, before yt-dlp ever sees a URL. That's what makes the "paste a 200-track playlist link mixed with a couple of standalone youtu.be links and some url-encoded copy-paste from a chat app" case actually work: a v%3D (URL-encoded v=) pattern exists specifically because copy-pasting a link out of some clients percent-encodes the query string, and without that second pattern those links would silently vanish rather than resolve. Trusting yt-dlp's own URL parser to handle "whatever a human pastes" would work for a clean single URL and fail unpredictably for the messy multi-line paste that's the actual common case in a links-textarea UI.
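The extract, rebuild, de-duplicate flow described above can be sketched roughly as follows. This is a minimal illustration built on the two pattern constants from the excerpt, not the actual LinkProcessor.sanitize; the function name, the canonical URL shapes, and the choice to order results by position in the pasted text are all assumptions.

```python
import re

# Patterns copied from the excerpt above.
_VIDEO_PATTERNS = [
    r"(?:v=|v%3D)([a-zA-Z0-9_-]{11})",
    r"youtu\.be/([a-zA-Z0-9_-]{11})",
    r"youtube\.com/(?:embed|shorts|v)/([a-zA-Z0-9_-]{11})",
]
_LIST_PATTERN = r"(?:list=|list%3D)([a-zA-Z0-9_-]{18,34})"


def sanitize_sketch(text: str) -> list[str]:
    """Hypothetical stand-in for LinkProcessor.sanitize (assumed behavior)."""
    hits: list[tuple[int, str]] = []  # (position in text, canonical URL)
    for pattern in _VIDEO_PATTERNS:
        for m in re.finditer(pattern, text):
            hits.append((m.start(), f"https://www.youtube.com/watch?v={m.group(1)}"))
    for m in re.finditer(_LIST_PATTERN, text):
        hits.append((m.start(), f"https://www.youtube.com/playlist?list={m.group(1)}"))
    hits.sort(key=lambda hit: hit[0])  # restore the order the user pasted in
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(url for _, url in hits))
```

Feeding it a messy paste that contains the same video twice, once as a youtu.be link and once percent-encoded, should yield a single canonical watch URL for that video.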

Two-phase playlist resolution: resolve cheap, enrich only what's missing


Resolving a playlist with extract_flat="in_playlist" is one cheap request that returns every entry's title for free — no need to hit each video individually. Standalone URLs pasted directly don't get that for free, so expand_playlists only enriches the entries that still have title == url after the flat pass, and only those get a full per-video metadata fetch, run in parallel over a thread pool sized to min(8, len(todo)). The alternative — resolving every single entry individually regardless of source — would turn pasting one big playlist into hundreds of redundant network requests for data the flat extraction already had. Doing the cheap thing first and only paying for enrichment where it's actually needed keeps "paste a playlist and hit resolve" fast regardless of playlist size.
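A rough sketch of the second phase, under stated assumptions: entries are modeled here as plain dicts with url and title keys, and fetch_title is a hypothetical callable standing in for the per-video metadata fetch. Neither is taken from the real expand_playlists; only the title == url test and the min(8, len(todo)) pool size come from the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def enrich_missing(entries: list[dict], fetch_title: Callable[[str], str]) -> list[dict]:
    """Fill in titles only for entries the flat pass left as title == url."""
    todo = [e for e in entries if e["title"] == e["url"]]
    if not todo:
        return entries  # the flat pass already supplied every title
    with ThreadPoolExecutor(max_workers=min(8, len(todo))) as pool:
        urls = [e["url"] for e in todo]
        # pool.map preserves input order, so titles line up with todo.
        for entry, title in zip(todo, pool.map(fetch_title, urls)):
            entry["title"] = title  # updates the entry in place
    return entries
```

Entries that arrived from a flat playlist pass never reach fetch_title, which is the whole point: the expensive call is paid only where the cheap one came up empty.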

ThreadPoolExecutor, not asyncio — because yt-dlp's actual concurrency primitive is a thread

Textual itself runs on an asyncio event loop, but the download engine underneath it is built on concurrent.futures.ThreadPoolExecutor in all three places parallelism happens: metadata enrichment in core.py, the headless batch loop in headless.py, and the TUI's _run_batch in tui/app.py. This isn't an accident of not reaching for asyncio: YoutubeDL.download() is a blocking, synchronous call with no async variant, so the only way to run several downloads concurrently is to run several blocking calls on separate OS threads. Wrapping a blocking call in asyncio.to_thread at the call site would get you the same result with more ceremony; going straight to ThreadPoolExecutor matches the actual concurrency primitive yt-dlp exposes instead of layering an async abstraction over a library that was never going to honor it. Textual's own @work(thread=True, exclusive=True) decorator is the seam that reconciles the two worlds: it runs the entire batch inside one background thread relative to Textual's event loop, and self.app.call_from_thread(...) is the only path back onto the UI thread. Every progress update, every row change, every log line crosses that one bridge.
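The shape of that arrangement can be illustrated with the standard library alone. In this sketch, blocking_download is a stand-in for a blocking YoutubeDL.download() call and a queue.Queue plays the role of Textual's call_from_thread bridge; both are assumptions made so the example runs without yt-dlp or Textual installed. For brevity the queue is drained after the pool finishes, whereas a live UI would consume events as they arrive.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def blocking_download(name: str, post: Callable[[tuple[str, int]], None]) -> str:
    """Stand-in for a blocking download call (hypothetical)."""
    for pct in (50, 100):
        time.sleep(0.01)   # pretend to transfer data
        post((name, pct))  # the only channel a worker has back to the UI
    return name


def run_batch(names: list[str], max_workers: int = 8) -> tuple[list[str], list[tuple[str, int]]]:
    events: queue.Queue = queue.Queue()  # stand-in for call_from_thread
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(blocking_download, n, events.put) for n in names]
        finished = [f.result() for f in futures]  # submission order
    seen = []
    while not events.empty():  # safe here: all workers have exited
        seen.append(events.get())
    return finished, seen
```

Workers never touch UI state directly; they only post messages through the one thread-safe channel, which mirrors the single-bridge rule described above.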

Config as the single source of defaults, not two configs that can drift

config.py defines one DEFAULTS dict and reads/writes one ~/.config/ytdld/config.toml. The TUI's ConfigScreen.compose() seeds every Select/Switch/Input initial value from self.app.cfg, which is that same loaded config; the headless parser's _build_headless_parser(cfg) sets every argparse flag's default from the identical dict. Hitting Download in the TUI persists the current control values right back to that file, so tomorrow's headless run and tomorrow's TUI session both start from what you actually used last, not from two independently-drifting sets of hardcoded defaults. It's a small piece of plumbing, but it's what makes "the TUI and the flags agree with each other" true by construction instead of by discipline.

Postprocessor pipeline order is not incidental

if opts.sponsorblock in ("mark", "skip"):
    pps.append({"key": "SponsorBlock", ...})
if opts.sponsorblock == "skip":
    pps.append({"key": "ModifyChapters", "remove_sponsor_segments": SPONSOR_CATEGORIES})
if opts.mode == "mp3":
    pps.append({"key": "FFmpegExtractAudio", ...})
...
if opts.embed_metadata:
    pps.append({"key": "FFmpegMetadata"})
if opts.embed_thumbnail:
    pps.append({"key": "EmbedThumbnail"})

yt-dlp runs postprocessors in list order, and that order is a real dependency chain here: SponsorBlock has to fetch chapter data before ModifyChapters can cut segments out of the file, and both have to happen before FFmpegExtractAudio re-encodes the container, because operating on the merged final audio file instead of the source would mean re-deriving segment boundaries against different byte offsets. Metadata and thumbnail embedding go last because they're additive tags on whatever file already exists at that point — order-independent relative to each other, but still after the format conversion that produces the file they're tagging. test_mp4_full_options in the test suite pins keys[0] == "SponsorBlock" specifically to keep this ordering from silently regressing.
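One way to pin that dependency chain in a test is to check relative rank instead of exact position. The sketch below is an assumption-laden simplification: it tracks only the key of each postprocessor, omits the branches elided in the excerpt, and uses hypothetical helper names. FFmpegMetadata and EmbedThumbnail share a rank because they are order-independent relative to each other.

```python
# Lower rank must come first; equal ranks may appear in either order.
_RANK = {
    "SponsorBlock": 0,
    "ModifyChapters": 1,
    "FFmpegExtractAudio": 2,
    "FFmpegMetadata": 3,
    "EmbedThumbnail": 3,
}


def postprocessor_keys(mode: str, sponsorblock: str,
                       embed_metadata: bool, embed_thumbnail: bool) -> list[str]:
    """Ordered 'key' values only; real entries carry more options."""
    keys = []
    if sponsorblock in ("mark", "skip"):
        keys.append("SponsorBlock")
    if sponsorblock == "skip":
        keys.append("ModifyChapters")
    if mode == "mp3":
        keys.append("FFmpegExtractAudio")
    if embed_metadata:
        keys.append("FFmpegMetadata")
    if embed_thumbnail:
        keys.append("EmbedThumbnail")
    return keys


def order_is_valid(keys: list[str]) -> bool:
    """True when no key appears before something it depends on."""
    ranks = [_RANK[k] for k in keys if k in _RANK]
    return ranks == sorted(ranks)
```

A rank-based check like this keeps passing when an optional stage is switched off, which an exact-list comparison would not.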

Hard problems, solved

yt-dlp writing straight to stdout would corrupt the live dashboard

build_ydl_opts sets quiet, no_warnings, and noprogress unconditionally, with a comment in the source explaining exactly why: "We render our own progress (TUI / headless); silence yt-dlp's so it never writes to stdout and corrupts the Textual display." Without this, yt-dlp's own progress renderer and Textual's alternate-screen render loop would both be writing raw terminal control sequences at once — not a graceful degradation, a genuinely corrupted display. Every byte of progress information ytdld actually needs still gets through, just via a different channel entirely: progress_hooks, which yt-dlp calls as plain Python function calls rather than terminal writes, carrying a status dict (downloading, finished, bytes, speed, ETA) that download_entry's hook() closure turns into a UI-agnostic ProgressEvent dataclass. Silencing the renderer and rerouting the data it was rendering are two different moves, and doing both is what lets ytdld build its own dashboard without losing any information yt-dlp was already computing.
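Both moves can be sketched together. The option names (quiet, no_warnings, noprogress, progress_hooks) and the status-dict keys are yt-dlp's own; the ProgressEvent field names and the helper names make_hook and build_opts_sketch are assumptions, since the real dataclass and build_ydl_opts are not reproduced in this post.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProgressEvent:
    # Field names are assumed for illustration.
    status: str
    downloaded: int
    total: Optional[int]
    speed: Optional[float]
    eta: Optional[float]


def make_hook(on_event: Callable[[ProgressEvent], None]) -> Callable[[dict], None]:
    """Wrap a UI callback as a yt-dlp style progress hook."""
    def hook(status: dict) -> None:
        on_event(ProgressEvent(
            status=status["status"],
            downloaded=status.get("downloaded_bytes") or 0,
            # total_bytes may be absent; fall back to the estimate.
            total=status.get("total_bytes") or status.get("total_bytes_estimate"),
            speed=status.get("speed"),
            eta=status.get("eta"),
        ))
    return hook


def build_opts_sketch(on_event: Callable[[ProgressEvent], None]) -> dict:
    return {
        "quiet": True,        # silence yt-dlp's own output...
        "no_warnings": True,
        "noprogress": True,
        "progress_hooks": [make_hook(on_event)],  # ...and reroute the data
    }
```

Because the hook receives a plain dict and emits a plain dataclass, either front end can supply its own on_event without the core importing anything UI-related.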

Giving a blocking call a cancel button it doesn't have

YoutubeDL.download() blocks until it finishes or raises. There's no cancellation token to pass in. ytdld's cancel mechanism is a threading.Event checked inside the progress hook itself:

def on_event(ev: ProgressEvent):
    if cancel.is_set():
        raise _Cancelled()
    ...

This works because of a specific, verifiable fact about yt-dlp's internals: FileDownloader._hook_progress calls every registered hook with a plain ph(status) — no try/except around that call. An exception raised inside a progress hook is not swallowed; it propagates straight back up through the downloader and through YoutubeDL.download() itself, uncaught, until something outside catches it. download_entry's own try/except Exception as exc is that catch point, turning the raised _Cancelled into an ordinary (False, "...") result the caller already knows how to handle. Pressing q mid-batch sets the event; every worker's next hook callback (typically within a fraction of a second, since yt-dlp calls the hook continuously during a transfer) raises, unwinds yt-dlp's own call stack, and the thread pool's one() function checks cancel.is_set() before reporting anything, so no download that was mid-abort gets misreported as failed. No monkey-patching yt-dlp, no forked dependency, no polling loop watching a flag from the outside — just relying on the one seam yt-dlp actually exposes (the hook) and the one property of that seam (exceptions propagate) that makes cancellation possible at all.
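The mechanism can be demonstrated end to end without yt-dlp. In the sketch below, fake_blocking_download is a hypothetical stand-in that calls its hooks with no try/except, mirroring the propagation behavior described above; download_entry_sketch and its trigger_at parameter are likewise invented for the demonstration, and the three-element return value is a simplification of the real result tuple.

```python
import threading
from typing import Optional


class _Cancelled(Exception):
    """Raised from inside the hook to unwind the blocking call."""


def fake_blocking_download(hooks, chunks: int = 5) -> None:
    """Stand-in for a blocking download: hook exceptions are not caught."""
    for i in range(chunks):
        for ph in hooks:
            ph({"status": "downloading", "chunk": i})


def download_entry_sketch(cancel: threading.Event, trigger_at: Optional[int] = None):
    seen = []

    def hook(status: dict) -> None:
        if cancel.is_set():
            raise _Cancelled()  # propagates up through the blocking call
        seen.append(status["chunk"])
        if status["chunk"] == trigger_at:
            cancel.set()  # simulates the user pressing q mid-transfer

    try:
        fake_blocking_download([hook])
    except Exception as exc:  # the single catch point
        return False, type(exc).__name__, seen
    return True, "ok", seen
```

Setting the event does nothing by itself; the abort happens on the next hook call, which is why cancellation latency is bounded by how often the downloader reports progress.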

Retry-failed is not a separate code path

def action_retry(self) -> None:
    if not self.done or not self.failed:
        return
    retry_entries = [self.entries[i] for i in self.failed]
    self.app.push_screen(ProgressScreen(retry_entries, self.opts, self.do_notify))

Pressing r after a batch finishes doesn't invoke some dedicated "retry" function — it constructs a brand-new ProgressScreen with the same opts and just the subset of entries whose index landed in self.failed, and pushes it exactly the way the original download flow does. There is no second implementation of "run a batch of downloads and show progress" to keep in sync with the first; retry is a normal download run, scoped to a smaller list. test_retry_failed_spawns_new_screen in the TUI test suite verifies this end to end — three entries where one is engineered to fail, retry pushes a screen with exactly that one entry, and it downloads through the identical path as everything else.
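Stripped of the Textual screen machinery, the same idea fits in a few lines. This is a simplified model, not the TUI code: run_batch and retry_failed are hypothetical names, and download is any callable that returns True on success.

```python
from typing import Callable


def run_batch(entries: list[str], download: Callable[[str], bool]) -> list[int]:
    """Run every entry; return the indices that failed."""
    return [i for i, entry in enumerate(entries) if not download(entry)]


def retry_failed(entries: list[str], failed: list[int],
                 download: Callable[[str], bool]) -> tuple[list[str], list[int]]:
    """Retry is just run_batch again over the failed subset."""
    retry_entries = [entries[i] for i in failed]
    return retry_entries, run_batch(retry_entries, download)
```

Note that the indices returned by the retry run refer to the smaller retry list, not the original batch, just as a fresh ProgressScreen tracks its own entries.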

A Select.Changed handler, not a validation error, for the MP4-only Quality field

The Quality control (best/2160p/.../480p) is meaningless for MP3 or FLAC — there's no video stream to cap. Rather than accepting an invalid combination and rejecting it at submit time, ConfigScreen disables the Quality Select outright whenever Mode isn't mp4, wired through a Select.Changed handler on #mode that calls _sync_quality_enabled() on every change and once on mount. It's a small interaction detail, but it's the difference between a form that tells you a combination doesn't apply and one that lets you set it and then explains why it didn't work after the fact — the kind of thing that's easy to skip in a CLI tool with argparse flags (where an irrelevant flag is just silently ignored) but costs nothing to get right once the surface is a real UI with visible controls.

What I'd do differently

Cancellation depends on yt-dlp continuing to let hook exceptions propagate, and that is a real, acknowledged fragility: it's correct today (verified directly against the installed yt_dlp/downloader/common.py), but it's an implementation detail of a third-party library, not a documented contract. A more defensive version would wrap the download call in a way that doesn't depend on that propagation behavior holding across yt-dlp upgrades, which is worth revisiting if a future yt-dlp release ever starts catching hook exceptions internally. Windows support is the other open gap: read_clipboard() and notify() both branch on POSIX tools (pbpaste/wl-paste/xclip, osascript/notify-send) with no Windows path, which is an honest reflection of where the tool is actually used today rather than an oversight, but it's the natural next platform if that changes.

Stack at a glance

Language         Python 3.11+, eight focused modules under src/yt_dld/
Download engine  yt-dlp, driven through a UI-agnostic core.py
Concurrency      concurrent.futures.ThreadPoolExecutor, up to 8 workers
TUI              Textual: three screens (compose / select / live dashboard), bridged via call_from_thread
Config           TOML at ~/.config/ytdld/config.toml, one DEFAULTS dict for both front ends
Formats          MP3 (VBR ~320 kbps), MP4 (quality-capped), FLAC (lossless), all with embedded metadata + art
Extras           SponsorBlock mark/skip, download-archive idempotency, browser-cookie passthrough
Tests            16 tests (pytest + pytest-asyncio via Textual's Pilot harness), green on Python 3.11–3.13 in CI
Distribution     curl | bash one-liner, uv tool install, shell completions, doctor/update/config verbs

More deep dives in this series: BroSki, a Rust task runner whose transactional output promotion answers the same kind of question ytdld's cancellation model does — what happens to in-flight work when something stops it partway through — and SuperSay, an on-device TTS app that solves a different flavor of the exact problem here: keeping a UI responsive and honest while a slow, blocking, single-purpose engine does the real work underneath it.