chore(release): bump version to 2.3.0

fix(subs): handle WebVTT cue identifiers and overlapping multi-line cues
Some services use WebVTT files with: - Cue identifiers (Q0, Q1, etc.) before timing lines that pysubs2/pycaption incorrectly parses as subtitle text - Multi-line subtitles split into separate cues with 1ms offset times and different line: positions (e.g., line:77% for top, line:84% for bottom) Added detection and sanitization functions: - has_webvtt_cue_identifiers(): detects cue identifiers before timing - sanitize_webvtt_cue_identifiers(): removes problematic cue identifiers - has_overlapping_webvtt_cues(): detects overlapping cues needing merge - merge_overlapping_webvtt_cues(): merges cues sorted by line position
2026-03-09 16:09:01 +00:00 · 2026-01-18 19:03:14 +00:00 · 2026-01-18 04:44:08 +00:00 · 2026-01-17 13:36:57 +00:00 · 2026-01-16 14:16:47 +00:00 · 2026-01-16 13:43:50 +00:00
11 changed files with 372 additions and 53 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,39 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [2.3.0] - 2026-01-18
+
+### Added
+
+- **Unicode Filenames Option**: New `unicode_filenames` config option to preserve native characters
+  - Allows disabling ASCII transliteration in filenames
+  - Preserves Korean, Japanese, Chinese, and other native language characters
+  - Closes #49
+
+### Fixed
+
+- **WebVTT Cue Handling**: Handle WebVTT cue identifiers and overlapping multi-line cues
+  - Added detection and sanitization for cue identifiers (Q0, Q1, etc.) before timing lines
+  - Added merging of overlapping cues with different line positions into multi-line subtitles
+  - Fixes parsing issues with pysubs2/pycaption on certain WebVTT files
+- **Widevine PSSH Filtering**: Filter Widevine PSSH by system ID instead of sorting
+  - Fixes KeyError crash when unsupported DRM systems are present in init segments
+- **TTML Negative Values**: Handle negative values in multi-value TTML attributes
+  - Fixes pycaption parse errors for attributes like `tts:extent="-5% 7.5%"`
+  - Closes #47
+- **ASS Font Names**: Strip whitespace from ASS font names
+  - Handles ASS subtitle files with spaces after commas in Style definitions
+  - Fixes #57
+- **Shaka-Packager Error Messages**: Include shaka-packager binary path in error messages
+- **N_m3u8DL-RE Merge and Decryption**: Handle merge and decryption properly
+  - Prevents audio corruption ("Box 'OG 2' size is too large") with DASH manifests
+  - Fixes duplicate init segment writing when using N_m3u8DL-RE
+- **DASH Placeholder KIDs**: Handle placeholder KIDs and improve DRM init from segments
+  - Detects and replaces placeholder/test KIDs in Widevine PSSH
+  - Adds CENC namespace support for kid/default_KID attributes
+- **PlayReady PSSH Comparison**: Correct PSSH system ID comparison in PlayReady
+  - Removes erroneous `.bytes` accessor from PSSH.SYSTEM_ID comparisons
+
 ## [2.2.0] - 2026-01-15

 ### Added
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "unshackle"
-version = "2.2.0"
+version = "2.3.0"
 description = "Modular Movie, TV, and Music Archival Software."
 authors = [{ name = "unshackle team" }]
 requires-python = ">=3.10,<3.13"
--- a/unshackle/commands/dl.py
+++ b/unshackle/commands/dl.py
@@ -1567,7 +1567,7 @@ class dl:
                        if subtitle.codec == Subtitle.Codec.SubStationAlphav4:
                            for line in subtitle.path.read_text("utf8").splitlines():
                                if line.startswith("Style: "):
-                                    font_names.append(line.removesuffix("Style: ").split(",")[1])
+                                    font_names.append(line.removeprefix("Style: ").split(",")[1].strip())

                    font_count, missing_fonts = self.attach_subtitle_fonts(
                        font_names, title, temp_font_files
--- a/unshackle/core/init.py
+++ b/unshackle/core/init.py
@@ -1 +1 @@
-__version__ = "2.2.0"
+__version__ = "2.3.0"
--- a/unshackle/core/config.py
+++ b/unshackle/core/config.py
@@ -95,6 +95,7 @@ class Config:
        self.update_check_interval: int = kwargs.get("update_check_interval", 24)
        self.scene_naming: bool = kwargs.get("scene_naming", True)
        self.series_year: bool = kwargs.get("series_year", True)
+        self.unicode_filenames: bool = kwargs.get("unicode_filenames", False)

        self.title_cache_time: int = kwargs.get("title_cache_time", 1800)  # 30 minutes default
        self.title_cache_max_retention: int = kwargs.get("title_cache_max_retention", 86400)  # 24 hours default
--- a/unshackle/core/drm/playready.py
+++ b/unshackle/core/drm/playready.py
@@ -168,7 +168,7 @@ class PlayReady:
            pssh_boxes.extend(list(get_boxes(init_data, b"pssh")))
            tenc_boxes.extend(list(get_boxes(init_data, b"tenc")))

-        pssh = next((b for b in pssh_boxes if b.system_ID == PSSH.SYSTEM_ID.bytes), None)
+        pssh = next((b for b in pssh_boxes if b.system_ID == PSSH.SYSTEM_ID), None)
        if not pssh:
            raise PlayReady.Exceptions.PSSHNotFound("PSSH was not found in track data.")

@@ -197,7 +197,7 @@ class PlayReady:
                if enc_key_id:
                    kid = UUID(bytes=base64.b64decode(enc_key_id))

-        pssh = next((b for b in pssh_boxes if b.system_ID == PSSH.SYSTEM_ID.bytes), None)
+        pssh = next((b for b in pssh_boxes if b.system_ID == PSSH.SYSTEM_ID), None)
        if not pssh:
            raise PlayReady.Exceptions.PSSHNotFound("PSSH was not found in track data.")

@@ -415,7 +415,7 @@ class PlayReady:
            p.wait()

            if p.returncode != 0 or had_error:
-                raise subprocess.CalledProcessError(p.returncode, arguments)
+                raise subprocess.CalledProcessError(p.returncode, [binaries.ShakaPackager, *arguments])

            path.unlink()
            if not stream_skipped:
--- a/unshackle/core/drm/widevine.py
+++ b/unshackle/core/drm/widevine.py
@@ -100,9 +100,7 @@ class Widevine:
            pssh_boxes.extend(list(get_boxes(init_data, b"pssh")))
            tenc_boxes.extend(list(get_boxes(init_data, b"tenc")))

-        pssh_boxes.sort(key=lambda b: {PSSH.SystemId.Widevine: 0, PSSH.SystemId.PlayReady: 1}[b.system_ID])
-
-        pssh = next(iter(pssh_boxes), None)
+        pssh = next((b for b in pssh_boxes if b.system_ID == PSSH.SystemId.Widevine), None)
        if not pssh:
            raise Widevine.Exceptions.PSSHNotFound("PSSH was not found in track data.")

@@ -141,9 +139,7 @@ class Widevine:
                if enc_key_id:
                    kid = UUID(bytes=base64.b64decode(enc_key_id))

-        pssh_boxes.sort(key=lambda b: {PSSH.SystemId.Widevine: 0, PSSH.SystemId.PlayReady: 1}[b.system_ID])
-
-        pssh = next(iter(pssh_boxes), None)
+        pssh = next((b for b in pssh_boxes if b.system_ID == PSSH.SystemId.Widevine), None)
        if not pssh:
            raise Widevine.Exceptions.PSSHNotFound("PSSH was not found in track data.")

@@ -371,7 +367,7 @@ class Widevine:
            p.wait()

            if p.returncode != 0 or had_error:
-                raise subprocess.CalledProcessError(p.returncode, arguments)
+                raise subprocess.CalledProcessError(p.returncode, [binaries.ShakaPackager, *arguments])

            path.unlink()
            if not stream_skipped:
--- a/unshackle/core/manifests/dash.py
+++ b/unshackle/core/manifests/dash.py
@@ -5,6 +5,7 @@ import html
 import logging
 import math
 import re
+import shutil
 import sys
 from copy import copy
 from functools import partial
@@ -466,7 +467,7 @@ class DASH:
        track.data["dash"]["timescale"] = int(segment_timescale)
        track.data["dash"]["segment_durations"] = segment_durations

-        if not track.drm and isinstance(track, (Video, Audio)):
+        if init_data and isinstance(track, (Video, Audio)):
            if isinstance(cdm, PlayReadyCdm):
                try:
                    track.drm = [PlayReady.from_init_data(init_data)]
@@ -527,8 +528,16 @@ class DASH:
            max_workers=max_workers,
        )

+        skip_merge = False
        if downloader.__name__ == "n_m3u8dl_re":
-            downloader_args.update({"filename": track.id, "track": track})
+            skip_merge = True
+            downloader_args.update(
+                {
+                    "filename": track.id,
+                    "track": track,
+                    "content_keys": drm.content_keys if drm else None,
+                }
+            )

        debug_logger = get_debug_logger()
        if debug_logger:
@@ -543,6 +552,7 @@ class DASH:
                    "downloader": downloader.__name__,
                    "has_drm": bool(track.drm),
                    "drm_types": [drm.__class__.__name__ for drm in (track.drm or [])],
+                    "skip_merge": skip_merge,
                    "save_path": str(save_path),
                    "has_init_data": bool(init_data),
                },
@@ -563,42 +573,56 @@ class DASH:
            control_file.unlink()

        segments_to_merge = [x for x in sorted(save_dir.iterdir()) if x.is_file()]
-        with open(save_path, "wb") as f:
-            if init_data:
-                f.write(init_data)
-            if len(segments_to_merge) > 1:
-                progress(downloaded="Merging", completed=0, total=len(segments_to_merge))
-            for segment_file in segments_to_merge:
-                segment_data = segment_file.read_bytes()
-                # TODO: fix encoding after decryption?
-                if (
-                    not drm
-                    and isinstance(track, Subtitle)
-                    and track.codec not in (Subtitle.Codec.fVTT, Subtitle.Codec.fTTML)
-                ):
-                    segment_data = try_ensure_utf8(segment_data)
-                    segment_data = (
-                        segment_data.decode("utf8")
-                        .replace("&lrm;", html.unescape("&lrm;"))
-                        .replace("&rlm;", html.unescape("&rlm;"))
-                        .encode("utf8")
-                    )
-                f.write(segment_data)
-                f.flush()
-                segment_file.unlink()
-                progress(advance=1)
+
+        if skip_merge:
+            # N_m3u8DL-RE handles merging and decryption internally
+            shutil.move(segments_to_merge[0], save_path)
+            if drm:
+                track.drm = None
+                events.emit(events.Types.TRACK_DECRYPTED, track=track, drm=drm, segment=None)
+        else:
+            with open(save_path, "wb") as f:
+                if init_data:
+                    f.write(init_data)
+                if len(segments_to_merge) > 1:
+                    progress(downloaded="Merging", completed=0, total=len(segments_to_merge))
+                for segment_file in segments_to_merge:
+                    segment_data = segment_file.read_bytes()
+                    # TODO: fix encoding after decryption?
+                    if (
+                        not drm
+                        and isinstance(track, Subtitle)
+                        and track.codec not in (Subtitle.Codec.fVTT, Subtitle.Codec.fTTML)
+                    ):
+                        segment_data = try_ensure_utf8(segment_data)
+                        segment_data = (
+                            segment_data.decode("utf8")
+                            .replace("&lrm;", html.unescape("&lrm;"))
+                            .replace("&rlm;", html.unescape("&rlm;"))
+                            .encode("utf8")
+                        )
+                    f.write(segment_data)
+                    f.flush()
+                    segment_file.unlink()
+                    progress(advance=1)

        track.path = save_path
        events.emit(events.Types.TRACK_DOWNLOADED, track=track)

-        if drm:
+        if not skip_merge and drm:
            progress(downloaded="Decrypting", completed=0, total=100)
            drm.decrypt(save_path)
            track.drm = None
            events.emit(events.Types.TRACK_DECRYPTED, track=track, drm=drm, segment=None)
            progress(downloaded="Decrypting", advance=100)

-        save_dir.rmdir()
+        # Clean up empty segment directory
+        if save_dir.exists() and save_dir.name.endswith("_segments"):
+            try:
+                save_dir.rmdir()
+            except OSError:
+                # Directory might not be empty, try removing recursively
+                shutil.rmtree(save_dir, ignore_errors=True)

        progress(downloaded="Downloaded")

@@ -766,6 +790,11 @@ class DASH:
    @staticmethod
    def get_drm(protections: list[Element]) -> list[DRM_T]:
        drm: list[DRM_T] = []
+        PLACEHOLDER_KIDS = {
+            UUID("00000000-0000-0000-0000-000000000000"),  # All zeros (key rotation default)
+            UUID("00010203-0405-0607-0809-0a0b0c0d0e0f"),  # Sequential 0x00-0x0f
+            UUID("00010203-0405-0607-0809-101112131415"),  # Shaka Packager test pattern
+        }

        for protection in protections:
            urn = (protection.get("schemeIdUri") or "").lower()
@@ -775,17 +804,27 @@ class DASH:
                if not pssh_text:
                    continue
                pssh = PSSH(pssh_text)
+                kid_attr = protection.get("kid") or protection.get("{urn:mpeg:cenc:2013}kid")
+                kid = UUID(bytes=base64.b64decode(kid_attr)) if kid_attr else None

-                kid = protection.get("kid")
-                if kid:
-                    kid = UUID(bytes=base64.b64decode(kid))
+                if not kid:
+                    default_kid_attr = protection.get("default_KID") or protection.get(
+                        "{urn:mpeg:cenc:2013}default_KID"
+                    )
+                    kid = UUID(default_kid_attr) if default_kid_attr else None

-                default_kid = protection.get("default_KID")
-                if default_kid:
-                    kid = UUID(default_kid)
+                if not kid:
+                    kid = next(
+                        (
+                            UUID(p.get("default_KID") or p.get("{urn:mpeg:cenc:2013}default_KID"))
+                            for p in protections
+                            if p.get("default_KID") or p.get("{urn:mpeg:cenc:2013}default_KID")
+                        ),
+                        None,
+                    )

-                if not pssh.key_ids and not kid:
-                    kid = next((UUID(p.get("default_KID")) for p in protections if p.get("default_KID")), None)
+                if kid and (not pssh.key_ids or all(k.int == 0 or k in PLACEHOLDER_KIDS for k in pssh.key_ids)):
+                    pssh.set_key_ids([kid])

                drm.append(Widevine(pssh=pssh, kid=kid))

--- a/unshackle/core/tracks/subtitle.py
+++ b/unshackle/core/tracks/subtitle.py
@@ -91,6 +91,12 @@ class Subtitle(Track):
                return Subtitle.Codec.TimedTextMarkupLang
            raise ValueError(f"The Content Profile '{profile}' is not a supported Subtitle Codec")

+    # WebVTT sanitization patterns (compiled once for performance)
+    _CUE_ID_PATTERN = re.compile(r"^[A-Za-z]+\d+$")
+    _TIMING_START_PATTERN = re.compile(r"^\d+:\d+[:\.]")
+    _TIMING_LINE_PATTERN = re.compile(r"^((?:\d+:)?\d+:\d+[.,]\d+)\s*-->\s*((?:\d+:)?\d+:\d+[.,]\d+)(.*)$")
+    _LINE_POS_PATTERN = re.compile(r"line:(\d+(?:\.\d+)?%?)")
+
    def __init__(
        self,
        *args: Any,
@@ -239,6 +245,11 @@ class Subtitle(Track):

            # Sanitize WebVTT timestamps before parsing
            text = Subtitle.sanitize_webvtt_timestamps(text)
+            # Remove cue identifiers that confuse parsers like pysubs2
+            text = Subtitle.sanitize_webvtt_cue_identifiers(text)
+            # Merge overlapping cues with line positioning into single multi-line cues
+            text = Subtitle.merge_overlapping_webvtt_cues(text)
+
            preserve_formatting = config.subtitle.get("preserve_formatting", True)

            if preserve_formatting:
@@ -277,6 +288,240 @@ class Subtitle(Track):
        # Replace negative timestamps with 00:00:00.000
        return re.sub(r"(-\d+:\d+:\d+\.\d+)", "00:00:00.000", text)

+    @staticmethod
+    def has_webvtt_cue_identifiers(text: str) -> bool:
+        """
+        Check if WebVTT content has cue identifiers that need removal.
+
+        Parameters:
+            text: The WebVTT content as string
+
+        Returns:
+            True if cue identifiers are detected, False otherwise
+        """
+        lines = text.split("\n")
+
+        for i, line in enumerate(lines):
+            line = line.strip()
+            if Subtitle._CUE_ID_PATTERN.match(line):
+                # Look ahead to see if next non-empty line is a timing line
+                j = i + 1
+                while j < len(lines) and not lines[j].strip():
+                    j += 1
+                if j < len(lines) and ("-->" in lines[j] or Subtitle._TIMING_START_PATTERN.match(lines[j].strip())):
+                    return True
+        return False
+
+    @staticmethod
+    def sanitize_webvtt_cue_identifiers(text: str) -> str:
+        """
+        Remove WebVTT cue identifiers that can confuse subtitle parsers.
+
+        Some services use cue identifiers like "Q0", "Q1", etc.
+        that appear on their own line before the timing line. These can be
+        incorrectly parsed as part of the previous cue's text content by
+        some parsers (like pysubs2).
+
+        Parameters:
+            text: The WebVTT content as string
+
+        Returns:
+            Sanitized WebVTT content with cue identifiers removed
+        """
+        if not Subtitle.has_webvtt_cue_identifiers(text):
+            return text
+
+        lines = text.split("\n")
+        sanitized_lines = []
+
+        i = 0
+        while i < len(lines):
+            line = lines[i].strip()
+
+            # Check if this line is a cue identifier followed by a timing line
+            if Subtitle._CUE_ID_PATTERN.match(line):
+                # Look ahead to see if next non-empty line is a timing line
+                j = i + 1
+                while j < len(lines) and not lines[j].strip():
+                    j += 1
+                if j < len(lines) and ("-->" in lines[j] or Subtitle._TIMING_START_PATTERN.match(lines[j].strip())):
+                    # This is a cue identifier, skip it
+                    i += 1
+                    continue
+
+            sanitized_lines.append(lines[i])
+            i += 1
+
+        return "\n".join(sanitized_lines)
+
+    @staticmethod
+    def _parse_vtt_time(t: str) -> int:
+        """Parse WebVTT timestamp to milliseconds. Returns 0 for malformed input."""
+        try:
+            t = t.replace(",", ".")
+            parts = t.split(":")
+            if len(parts) == 2:
+                m, s = parts
+                h = "0"
+            elif len(parts) >= 3:
+                h, m, s = parts[:3]
+            else:
+                return 0
+            sec_parts = s.split(".")
+            secs = int(sec_parts[0])
+            # Handle variable millisecond digits (e.g., .5 = 500ms, .50 = 500ms, .500 = 500ms)
+            ms = int(sec_parts[1].ljust(3, "0")[:3]) if len(sec_parts) > 1 else 0
+            return int(h) * 3600000 + int(m) * 60000 + secs * 1000 + ms
+        except (ValueError, IndexError):
+            return 0
+
+    @staticmethod
+    def has_overlapping_webvtt_cues(text: str) -> bool:
+        """
+        Check if WebVTT content has overlapping cues that need merging.
+
+        Detects cues with start times within 50ms of each other and the same end time,
+        which indicates multi-line subtitles split into separate cues.
+
+        Parameters:
+            text: The WebVTT content as string
+
+        Returns:
+            True if overlapping cues are detected, False otherwise
+        """
+        timings = []
+        for line in text.split("\n"):
+            match = Subtitle._TIMING_LINE_PATTERN.match(line)
+            if match:
+                start_str, end_str = match.group(1), match.group(2)
+                timings.append((Subtitle._parse_vtt_time(start_str), Subtitle._parse_vtt_time(end_str)))
+
+        # Check for overlapping cues (within 50ms start, same end)
+        for i in range(len(timings) - 1):
+            curr_start, curr_end = timings[i]
+            next_start, next_end = timings[i + 1]
+            if abs(curr_start - next_start) <= 50 and curr_end == next_end:
+                return True
+
+        return False
+
+    @staticmethod
+    def merge_overlapping_webvtt_cues(text: str) -> str:
+        """
+        Merge WebVTT cues that have overlapping/near-identical times but different line positions.
+
+        Some services use separate cues for each line of a multi-line subtitle, with
+        slightly different start times (1ms apart) and different line: positions.
+        This merges them into single cues with proper line ordering based on the
+        line: position (lower percentage = higher on screen = first line).
+
+        Parameters:
+            text: The WebVTT content as string
+
+        Returns:
+            WebVTT content with overlapping cues merged
+        """
+        if not Subtitle.has_overlapping_webvtt_cues(text):
+            return text
+
+        lines = text.split("\n")
+        cues = []
+        header_lines = []
+        in_header = True
+        i = 0
+
+        while i < len(lines):
+            line = lines[i]
+
+            if in_header:
+                if "-->" in line:
+                    in_header = False
+                else:
+                    header_lines.append(line)
+                    i += 1
+                    continue
+
+            match = Subtitle._TIMING_LINE_PATTERN.match(line)
+            if match:
+                start_str, end_str, settings = match.groups()
+                line_pos = 100.0  # Default to bottom
+                line_match = Subtitle._LINE_POS_PATTERN.search(settings)
+                if line_match:
+                    pos_str = line_match.group(1).rstrip("%")
+                    line_pos = float(pos_str)
+
+                content_lines = []
+                i += 1
+                while i < len(lines) and lines[i].strip() and "-->" not in lines[i]:
+                    content_lines.append(lines[i])
+                    i += 1
+
+                cues.append(
+                    {
+                        "start_ms": Subtitle._parse_vtt_time(start_str),
+                        "end_ms": Subtitle._parse_vtt_time(end_str),
+                        "start_str": start_str,
+                        "end_str": end_str,
+                        "line_pos": line_pos,
+                        "content": "\n".join(content_lines),
+                        "settings": settings,
+                    }
+                )
+            else:
+                i += 1
+
+        # Merge overlapping cues (within 50ms of each other with same end time)
+        merged_cues = []
+        i = 0
+        while i < len(cues):
+            current = cues[i]
+            group = [current]
+
+            j = i + 1
+            while j < len(cues):
+                other = cues[j]
+                if abs(current["start_ms"] - other["start_ms"]) <= 50 and current["end_ms"] == other["end_ms"]:
+                    group.append(other)
+                    j += 1
+                else:
+                    break
+
+            if len(group) > 1:
+                # Sort by line position (lower % = higher on screen = first)
+                group.sort(key=lambda x: x["line_pos"])
+                # Use the earliest start time from the group
+                earliest = min(group, key=lambda x: x["start_ms"])
+                merged_cues.append(
+                    {
+                        "start_str": earliest["start_str"],
+                        "end_str": group[0]["end_str"],
+                        "content": "\n".join(c["content"] for c in group),
+                        "settings": "",
+                    }
+                )
+            else:
+                merged_cues.append(
+                    {
+                        "start_str": current["start_str"],
+                        "end_str": current["end_str"],
+                        "content": current["content"],
+                        "settings": current["settings"],
+                    }
+                )
+
+            i = j if len(group) > 1 else i + 1
+
+        result_lines = header_lines[:]
+        if result_lines and result_lines[-1].strip():
+            result_lines.append("")
+
+        for cue in merged_cues:
+            result_lines.append(f"{cue['start_str']} --> {cue['end_str']}{cue['settings']}")
+            result_lines.append(cue["content"])
+            result_lines.append("")
+
+        return "\n".join(result_lines)
+
    @staticmethod
    def sanitize_webvtt(text: str) -> str:
        """
@@ -631,7 +876,7 @@ class Subtitle(Track):
                text = try_ensure_utf8(data).decode("utf8")
                text = text.replace("tt:", "")
                # negative size values aren't allowed in TTML/DFXP spec, replace with 0
-                text = re.sub(r'"(-\d+(\.\d+)?(px|em|%|c|pt))"', '"0"', text)
+                text = re.sub(r"-(\d+(?:\.\d+)?)(px|em|%|c|pt)", r"0\2", text)
                caption_set = pycaption.DFXPReader().read(text)
            elif codec == Subtitle.Codec.fVTT:
                caption_lists: dict[str, pycaption.CaptionList] = defaultdict(pycaption.CaptionList)
--- a/unshackle/core/utilities.py
+++ b/unshackle/core/utilities.py
@@ -120,9 +120,14 @@ def sanitize_filename(filename: str, spacer: str = ".") -> str:

    The spacer is safer to be a '.' for older DDL and p2p sharing spaces.
    This includes web-served content via direct links and such.
+
+    Set `unicode_filenames: true` in config to preserve native language
+    characters (Korean, Japanese, Chinese, etc.) instead of transliterating
+    them to ASCII equivalents.
    """
-    # replace all non-ASCII characters with ASCII equivalents
-    filename = unidecode(filename)
+    # optionally replace non-ASCII characters with ASCII equivalents
+    if not config.unicode_filenames:
+        filename = unidecode(filename)

    # remove or replace further characters as needed
    filename = "".join(c for c in filename if unicodedata.category(c) != "Mn")  # hidden characters
--- a/uv.lock
+++ b/uv.lock
@@ -1565,7 +1565,7 @@ wheels = [

 [[package]]
 name = "unshackle"
-version = "2.2.0"
+version = "2.3.0"
 source = { editable = "." }
 dependencies = [
    { name = "aiohttp-swagger3" },
Author	SHA1	Message	Date
Andy	abd8fc2eb9	chore(release): bump version to 2.3.0	2026-01-18 19:03:14 +00:00
Andy	e99cfddaec	fix(subs): handle WebVTT cue identifiers and overlapping multi-line cues Some services use WebVTT files with: - Cue identifiers (Q0, Q1, etc.) before timing lines that pysubs2/pycaption incorrectly parses as subtitle text - Multi-line subtitles split into separate cues with 1ms offset times and different line: positions (e.g., line:77% for top, line:84% for bottom) Added detection and sanitization functions: - has_webvtt_cue_identifiers(): detects cue identifiers before timing - sanitize_webvtt_cue_identifiers(): removes problematic cue identifiers - has_overlapping_webvtt_cues(): detects overlapping cues needing merge - merge_overlapping_webvtt_cues(): merges cues sorted by line position	2026-01-18 04:44:08 +00:00
Andy	4e11f69a58	fix(drm): filter Widevine PSSH by system ID instead of sorting The previous sorting approach crashed with KeyError when unsupported DRM systems were present in the init segment. Now uses direct filtering	2026-01-17 13:36:57 +00:00
Andy	aec3333888	fix(subs): handle negative TTML values in multi-value attributes The previous regex only matched negative size values when they were the entire quoted attribute (e.g., "-5%"). This failed for multi-value attributes like tts:extent="-5% 7.5%" causing pycaption parse errors. The new pattern matches negative values anywhere in the text and preserves the unit during replacement. Closes #47	2026-01-16 14:16:47 +00:00
Andy	68ad76cbb0	feat(config): add unicode_filenames option to preserve native characters Add config option to disable ASCII transliteration in filenames, allowing preservation of Korean, Japanese, Chinese, and other native language characters instead of converting them via unidecode. Closes #49	2026-01-16 13:43:50 +00:00
Andy	18b0534020	fix(subs): strip whitespace from ASS font names Use removeprefix instead of removesuffix and add strip() to handle ASS subtitle files that have spaces after commas in Style definitions. Fixes #57	2026-01-16 13:42:11 +00:00
Andy	d0cefa9d58	fix(drm): include shaka-packager binary in error messages	2026-01-16 13:26:15 +00:00
Andy	a01f335cfc	fix(dash): handle N_m3u8DL-RE merge and decryption - Add skip_merge flag for N_m3u8DL-RE to prevent duplicate init data - Pass content_keys to N_m3u8DL-RE for internal decryption handling - Use shutil.move() instead of manual merge when skip_merge is True - Skip manual decryption when N_m3u8DL-RE handles it internally Fixes audio corruption ("Box 'OG 2' size is too large") when using N_m3u8DL-RE with DASH manifests that have SegmentBase init data. The init segment was being written twice: once by N_m3u8DL-RE during its internal merge, and again by dash.py during post-processing.	2026-01-16 13:25:34 +00:00
Andy	b01fc3c8d1	fix(dash): handle placeholder KIDs and improve DRM init from segments - Add CENC namespace support for kid/default_KID attributes - Detect and replace placeholder/test KIDs in Widevine PSSH: - All zeros (key rotation default) - Sequential 0x00-0x0f pattern - Shaka Packager test pattern - Change DRM init condition from `not track.drm` to `init_data` to ensure DRM is always re-initialized from init segments Fixes issue where Widevine PSSH contains placeholder KIDs while the real KID is only in ContentProtection default_KID attributes.	2026-01-15 12:50:22 +00:00
Andy	44acfbdc89	fix(drm): correct PSSH system ID comparison in PlayReady Remove erroneous `.bytes` accessor from PSSH.SYSTEM_ID comparisons in from_track() and from_init_data() methods. The pyplayready PSSH.SYSTEM_ID is already the correct type for comparison with parsed PSSH box system_ID values.	2026-01-15 12:48:18 +00:00