[FEATURE] Ability to migrate download archive files (#751)
With the recent push to 'beautify subscriptions' (see https://github.com/jmbannon/ytdl-sub/releases/tag/2023.10.02), we need the ability to change subscription names from their legacy form:
```
rick_a:
  preset:
    - "tv_show"
  overrides:
    tv_show_name: "Rick A"
    url: "https://www.youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw"
```
into:
```
tv_show:
  "Rick A": "https://www.youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw"
```

This, however, has implications for ytdl-sub's legacy download archive naming. By default, we write archives to `.ytdl-sub-{subscription_name}-download-archive.json`. If we change the subscription name, the archive will not be found, causing a complete redownload. This new feature gives us the ability to migrate download archives to a new naming scheme.
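
To make the naming problem concrete, here is a minimal sketch. It assumes the template is filled the way a plain `str.format` call would fill it; the helper `archive_name` is hypothetical and is not part of ytdl-sub.

```python
# Default archive template, as quoted in the paragraph above.
DEFAULT_TEMPLATE = ".ytdl-sub-{subscription_name}-download-archive.json"


def archive_name(subscription_name: str) -> str:
    """Hypothetical helper: fill the template with plain str.format (an assumption)."""
    return DEFAULT_TEMPLATE.format(subscription_name=subscription_name)


legacy = archive_name("rick_a")      # name used by the legacy subscription
beautified = archive_name("Rick A")  # name used after beautifying

print(legacy)      # .ytdl-sub-rick_a-download-archive.json
print(beautified)  # .ytdl-sub-Rick A-download-archive.json

# The two names differ, so the old archive file would not be found.
assert legacy != beautified
```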

# Migrating to Beautified Subscriptions
## Step 0
BACK UP ALL CONFIG + SUBSCRIPTION FILES!!!!! If something goes wrong, restore your backup and try again and/or ask for help.

## Step 1
Since we know we'll be changing our `subscription_name` to the value of `tv_show_name`, we can use that value in the newly migrated download archive name by setting the following within the `tv_show` preset (or whatever your 'base' preset is).
```
presets:
  tv_show:
    output_options:
      migrated_download_archive_name: ".{tv_show_name_sanitized}-download-archive.json" 
```

## Step 2
Perform a download as usual, via `ytdl-sub sub ...`. This loads the old archive, saves it under the new name, and removes the old archive file. You should see `MIGRATION DETECTED` in the logs. Ensure the run completes successfully.

Perform another download and ensure you see `MIGRATION SUCCESSFUL` in the logs.
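
The two log messages follow from the load and save order in this commit. Below is a rough, standalone sketch of that order, not the actual implementation: `load_archive` and `save_archive` are hypothetical names, a plain JSON dict stands in for `DownloadMappings`, and `print` stands in for the logger. Unlike the real code, the non-migrated branch here saves unconditionally rather than only when the transaction log is non-empty.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_archive(output_dir: Path, legacy_name: str, migrated_name: Optional[str]) -> dict:
    """Prefer the migrated file, fall back to the legacy one, else start empty."""
    if migrated_name is not None:
        migrated_path = output_dir / migrated_name
        if migrated_path.is_file():
            print("MIGRATION SUCCESSFUL")
            return json.loads(migrated_path.read_text())
        print("MIGRATION DETECTED")
    legacy_path = output_dir / legacy_name
    if legacy_path.is_file():
        return json.loads(legacy_path.read_text())
    return {}


def save_archive(
    output_dir: Path, legacy_name: str, migrated_name: Optional[str], mappings: dict
) -> None:
    """Always write to the migrated name when set, then drop the legacy file."""
    if migrated_name:
        (output_dir / migrated_name).write_text(json.dumps(mappings))
        if legacy_name != migrated_name:
            (output_dir / legacy_name).unlink(missing_ok=True)
    else:
        (output_dir / legacy_name).write_text(json.dumps(mappings))


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    legacy = ".ytdl-sub-rick_a-download-archive.json"
    migrated = ".Rick A-download-archive.json"
    (out / legacy).write_text(json.dumps({"abc123": "entry"}))

    first = load_archive(out, legacy, migrated)   # first run: MIGRATION DETECTED
    save_archive(out, legacy, migrated, first)
    legacy_removed = not (out / legacy).exists()
    second = load_archive(out, legacy, migrated)  # second run: MIGRATION SUCCESSFUL
```

After the first run the legacy file is gone and only the migrated file remains, which is why the second run reports success.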

## Step 3
Now we can set:
```
presets:
  tv_show:
    output_options:
      # rename migrated_download_archive_name to just download_archive_name
      download_archive_name: ".{tv_show_name_sanitized}-download-archive.json" 

    overrides:
      tv_show_name: "{subscription_name}"
      url: "{subscription_value}"
```
Our download archives now default to the new format, and `tv_show_name` and `url` now default to the subscription's name and value.

## Step 4
We can now beautify our subscription.yaml file to:
```
tv_show:
  "Rick A": "https://www.youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw"
```
jmbannon authored Oct 3, 2023
1 parent ae3739c commit 309f259
Showing 128 changed files with 1,257 additions and 330 deletions.
16 changes: 16 additions & 0 deletions src/ytdl_sub/config/preset_options.py
@@ -246,6 +246,7 @@ class OutputOptions(StrictDictValidator):
thumbnail_name: "{title_sanitized}.{thumbnail_ext}"
info_json_name: "{title_sanitized}.{info_json_ext}"
download_archive_name: ".ytdl-sub-{subscription_name}-download-archive.json"
migrated_download_archive_name: ".ytdl-sub-{subscription_name_sanitized}-download-archive.json"
maintain_download_archive: True
keep_files_before: now
keep_files_after: 19000101
@@ -256,6 +257,7 @@ class OutputOptions(StrictDictValidator):
"thumbnail_name",
"info_json_name",
"download_archive_name",
"migrated_download_archive_name",
"maintain_download_archive",
"keep_files_before",
"keep_files_after",
@@ -298,6 +300,10 @@ def __init__(self, name, value):
validator=OverridesStringFormatterValidator,
default=DEFAULT_DOWNLOAD_ARCHIVE_NAME,
)
self._migrated_download_archive_name = self._validate_key_if_present(
key="migrated_download_archive_name",
validator=OverridesStringFormatterValidator,
)

self._maintain_download_archive = self._validate_key_if_present(
key="maintain_download_archive", validator=BoolValidator, default=False
@@ -358,6 +364,16 @@ def download_archive_name(self) -> Optional[OverridesStringFormatterValidator]:
"""
return self._download_archive_name

@property
def migrated_download_archive_name(self) -> Optional[OverridesStringFormatterValidator]:
"""
Optional. Intended to be used if you are migrating a subscription with either a new
subscription name or output directory. It will try to load the archive file using this name
first, and fallback to ``download_archive_name``. It will always save to this file
and remove the original ``download_archive_name``.
"""
return self._migrated_download_archive_name

@property
def maintain_download_archive(self) -> bool:
"""
10 changes: 10 additions & 0 deletions src/ytdl_sub/subscriptions/base_subscription.py
@@ -1,5 +1,6 @@
from abc import ABC
from pathlib import Path
from typing import Optional

from ytdl_sub.config.config_validator import ConfigOptions
from ytdl_sub.config.preset import Preset
@@ -8,8 +9,11 @@
from ytdl_sub.config.preset_options import Overrides
from ytdl_sub.config.preset_options import YTDLOptions
from ytdl_sub.downloaders.url.validators import MultiUrlValidator
from ytdl_sub.utils.logger import Logger
from ytdl_sub.ytdl_additions.enhanced_download_archive import EnhancedDownloadArchive

logger = Logger.get("subscription")


class BaseSubscription(ABC):
"""
@@ -43,10 +47,16 @@ def __init__(
self._config_options = config_options
self._preset_options = preset_options

migrated_file_name: Optional[str] = None
if migrated_file_name_option := self.output_options.migrated_download_archive_name:
migrated_file_name = self.overrides.apply_formatter(migrated_file_name_option)

# TODO: Do not include this as part of the subscription
self._enhanced_download_archive = EnhancedDownloadArchive(
file_name=self.overrides.apply_formatter(self.output_options.download_archive_name),
working_directory=self.working_directory,
output_directory=self.output_directory,
migrated_file_name=migrated_file_name,
)

@property
55 changes: 47 additions & 8 deletions src/ytdl_sub/ytdl_additions/enhanced_download_archive.py
@@ -18,6 +18,9 @@
from ytdl_sub.utils.file_handler import FileHandler
from ytdl_sub.utils.file_handler import FileHandlerTransactionLog
from ytdl_sub.utils.file_handler import FileMetadata
from ytdl_sub.utils.logger import Logger

logger = Logger.get("archive")


@dataclass
@@ -358,11 +361,25 @@ class EnhancedDownloadArchive:
"""

@classmethod
def _maybe_load_download_mappings(cls, mapping_file_path: str) -> DownloadMappings:
def _maybe_load_download_mappings(
cls, mapping_file_path: str, migrated_mapping_file_path: Optional[str]
) -> DownloadMappings:
"""
Tries to load download mappings if a file exists. Otherwise returns empty mappings.
"""
# If a mapping file exists in the output directory, load it up.
if migrated_mapping_file_path is not None:
if os.path.isfile(migrated_mapping_file_path):
logger.warning(
"MIGRATION SUCCESSFUL, loading migrated archive file. Can now set "
"`output_options.migrated_download_archive` to "
"`output_options.download_archive`"
)
return DownloadMappings.from_file(migrated_mapping_file_path)

logger.warning(
"MIGRATION DETECTED, will write archive file to %s", migrated_mapping_file_path
)

if os.path.isfile(mapping_file_path):
return DownloadMappings.from_file(json_file_path=mapping_file_path)
return DownloadMappings()
@@ -373,14 +390,14 @@ def __init__(
working_directory: str,
output_directory: str,
dry_run: bool = False,
migrated_file_name: Optional[str] = None,
):
self._file_name = file_name
self._file_handler = FileHandler(
working_directory=working_directory, output_directory=output_directory, dry_run=dry_run
)
self._download_mapping = self._maybe_load_download_mappings(
mapping_file_path=self.output_file_path
)
self._download_mapping = DownloadMappings() # gets reinitialized
self._migrated_file_name = migrated_file_name

self.num_entries_added: int = 0
self.num_entries_modified: int = 0
@@ -415,7 +432,8 @@ def reinitialize(self, dry_run: bool) -> "EnhancedDownloadArchive":
dry_run=dry_run,
)
self._download_mapping = self._maybe_load_download_mappings(
mapping_file_path=self.output_file_path
mapping_file_path=self._output_file_path,
migrated_mapping_file_path=self._migrated_file_path,
)
return self

@@ -456,14 +474,25 @@ def file_name(self) -> str:
return self._file_name

@property
def output_file_path(self) -> str:
def _output_file_path(self) -> str:
"""
Returns
-------
The download mapping's file path in the output directory.
"""
return str(Path(self.output_directory) / self.file_name)

@property
def _migrated_file_path(self) -> Optional[str]:
"""
Returns
-------
The migrated download mapping's file path in the output directory.
"""
if self._migrated_file_name:
return str(Path(self.output_directory) / self._migrated_file_name)
return None

@property
def working_file_path(self) -> str:
"""
@@ -543,7 +572,17 @@ def save_download_mappings(self) -> "EnhancedDownloadArchive":
-------
self
"""
if not self.get_file_handler_transaction_log().is_empty:
# If a migrated file name is present, always save to that file
if self._migrated_file_name:
self._download_mapping.to_file(output_json_file=self.working_file_path)
self.save_file_to_output_directory(
file_name=self.file_name, output_file_name=self._migrated_file_name
)
# and delete the old one if the name differs
if self._file_name != self._migrated_file_name:
self.delete_file_from_output_directory(file_name=self.file_name)
# Otherwise, only save if there are changes to the transaction log
elif not self.get_file_handler_transaction_log().is_empty:
self._download_mapping.to_file(output_json_file=self.working_file_path)
self.save_file_to_output_directory(file_name=self.file_name)
return self
66 changes: 60 additions & 6 deletions tests/e2e/youtube/test_playlist.py
@@ -1,9 +1,14 @@
from pathlib import Path
from typing import Dict

import pytest
from conftest import assert_logs
from e2e.conftest import mock_run_from_cli
from expected_download import assert_expected_downloads
from expected_transaction_log import assert_transaction_log_matches
from mergedeep import mergedeep

from ytdl_sub.config.config_file import ConfigFile
from ytdl_sub.downloaders.ytdlp import YTDLP
from ytdl_sub.subscriptions.subscription import Subscription

@@ -51,6 +56,50 @@ class TestPlaylist:
files exist and have the expected md5 file hashes.
"""

@classmethod
def _ensure_subscription_migrates(
cls,
config: ConfigFile,
subscription_name: str,
subscription_dict: Dict,
output_directory: Path,
):
# Ensure download archive migrates
mergedeep.merge(
subscription_dict,
{
"output_options": {
"migrated_download_archive_name": ".ytdl-sub-{tv_show_name_sanitized}-download-archive.json"
}
},
)
migrated_subscription = Subscription.from_dict(
config=config,
preset_name=subscription_name,
preset_dict=subscription_dict,
)
transaction_log = migrated_subscription.download()

assert_transaction_log_matches(
output_directory=output_directory,
transaction_log=transaction_log,
transaction_log_summary_file_name="youtube/test_playlist_archive_migrated.txt",
)
assert_expected_downloads(
output_directory=output_directory,
dry_run=False,
expected_download_summary_file_name="youtube/test_playlist_archive_migrated.json",
)

# Ensure no changes after migration
transaction_log = migrated_subscription.download()
assert transaction_log.is_empty
assert_expected_downloads(
output_directory=output_directory,
dry_run=False,
expected_download_summary_file_name="youtube/test_playlist_archive_migrated.json",
)

@pytest.mark.parametrize("dry_run", [True, False])
def test_playlist_download(
self,
@@ -84,16 +133,22 @@ def test_playlist_download(
expected_message="ExistingVideoReached, stopping additional downloads",
log_level="debug",
):
_ = playlist_subscription.download()
transaction_log = playlist_subscription.download()

# TODO: output_directory_nfo is always rewritten, fix!
# assert transaction_log.is_empty
assert transaction_log.is_empty
assert_expected_downloads(
output_directory=output_directory,
dry_run=dry_run,
expected_download_summary_file_name="youtube/test_playlist.json",
)

self._ensure_subscription_migrates(
config=music_video_config,
subscription_name="music_video_playlist_test",
subscription_dict=playlist_preset_dict,
output_directory=output_directory,
)

@pytest.mark.parametrize("dry_run", [True, False])
def test_playlist_download_from_cli_sub(
self,
@@ -132,10 +187,9 @@ def test_playlist_download_from_cli_sub(
expected_message="ExistingVideoReached, stopping additional downloads",
log_level="debug",
):
_ = mock_run_from_cli(args=args)[0][1]
transaction_log = mock_run_from_cli(args=args)[0][1]

# TODO: output_directory_nfo is always rewritten, fix!
# assert transaction_log.is_empty
assert transaction_log.is_empty
assert_expected_downloads(
output_directory=output_directory,
dry_run=dry_run,
2 changes: 0 additions & 2 deletions tests/expected_transaction_log.py
@@ -48,8 +48,6 @@ def assert_transaction_log_matches(
# Split, ensure there are the same number of new lines
summary_lines: List[str] = summary.split("\n")
expected_summary_lines: List[str] = expected_summary.split("\n")
print(summary_lines)
print(expected_summary_lines)
assert len(summary_lines) == len(
expected_summary_lines
), f"Summary number of lines differ: {len(summary_lines) != len(expected_summary_lines)}"
@@ -0,0 +1,36 @@
{
"Best Prebuilt TV Show by Date/.ytdl-sub-Best Prebuilt TV Show by Date-download-archive.json": "2bf8edeaf8b5658c4a42a9faa9bb2f60",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000001 - Mock Entry 20-3-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000001 - Mock Entry 20-3.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000001 - Mock Entry 20-3.mp4": "5f221fdf07f200a297427b5df953d96f",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000001 - Mock Entry 20-3.nfo": "dbe61f2c8ae41041773f713ba5376726",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000002 - Mock Entry 20-2-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000002 - Mock Entry 20-2.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000002 - Mock Entry 20-2.mp4": "240eb2e4df1abb10290f957d75f2522c",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000002 - Mock Entry 20-2.nfo": "0f071078c9fa2569bbcb998664d40681",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000003 - Mock Entry 20-1-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000003 - Mock Entry 20-1.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000003 - Mock Entry 20-1.mp4": "0c58e78e7727c893226b9fcbe39b1791",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000003 - Mock Entry 20-1.nfo": "837a61dca11bbe1874ea07cb8ef8a7c9",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000005 - Mock Entry 20-7-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000005 - Mock Entry 20-7.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000005 - Mock Entry 20-7.mp4": "e8b77ffd826f8f7233b875e311adaf34",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000005 - Mock Entry 20-7.nfo": "c1f5925e1eab8bd21e077df560879d94",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000006 - Mock Entry 20-6-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000006 - Mock Entry 20-6.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000006 - Mock Entry 20-6.mp4": "4343e692b53f0abce80559b59ef2fa0c",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000006 - Mock Entry 20-6.nfo": "9fdbaa70187252b0d303d128123c4e83",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000007 - Mock Entry 20-5-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000007 - Mock Entry 20-5.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000007 - Mock Entry 20-5.mp4": "dea5d69ae35e47aa78a51f464088a6ad",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000007 - Mock Entry 20-5.nfo": "5e5815bbc8b471df94d8536b117409f4",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000008 - Mock Entry 20-4-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000008 - Mock Entry 20-4.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000008 - Mock Entry 20-4.mp4": "b2f03dcefe44b8afc65b2500c873aec2",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000008 - Mock Entry 20-4.nfo": "a86c62cd14e6a42dd79639a6155165fe",
"Best Prebuilt TV Show by Date/Season 2021/s2021.e000004 - Mock Entry 21-1-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2021/s2021.e000004 - Mock Entry 21-1.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2021/s2021.e000004 - Mock Entry 21-1.mp4": "8b75d7f6f6f84cccf1867a91d15044a6",
"Best Prebuilt TV Show by Date/Season 2021/s2021.e000004 - Mock Entry 21-1.nfo": "8ee3845c514a411425b7e9198666b61c",
"Best Prebuilt TV Show by Date/tvshow.nfo": "2439ebc18e46a67064956cb940c992e9"
}
@@ -0,0 +1,20 @@
{
"Best Prebuilt TV Show by Date/.ytdl-sub-Best Prebuilt TV Show by Date-download-archive.json": "456d8882fc5e35d74f19b386d1d9a059",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000001 - Mock Entry 20-3-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000001 - Mock Entry 20-3.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000001 - Mock Entry 20-3.mp4": "5f221fdf07f200a297427b5df953d96f",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000001 - Mock Entry 20-3.nfo": "dbe61f2c8ae41041773f713ba5376726",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000002 - Mock Entry 20-2-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000002 - Mock Entry 20-2.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000002 - Mock Entry 20-2.mp4": "240eb2e4df1abb10290f957d75f2522c",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000002 - Mock Entry 20-2.nfo": "0f071078c9fa2569bbcb998664d40681",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000003 - Mock Entry 20-1-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000003 - Mock Entry 20-1.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000003 - Mock Entry 20-1.mp4": "0c58e78e7727c893226b9fcbe39b1791",
"Best Prebuilt TV Show by Date/Season 2020/s2020.e000003 - Mock Entry 20-1.nfo": "837a61dca11bbe1874ea07cb8ef8a7c9",
"Best Prebuilt TV Show by Date/Season 2021/s2021.e000004 - Mock Entry 21-1-thumb.jpg": "e80c508c4818454300133fe1dc1a9cd7",
"Best Prebuilt TV Show by Date/Season 2021/s2021.e000004 - Mock Entry 21-1.info.json": "INFO_JSON",
"Best Prebuilt TV Show by Date/Season 2021/s2021.e000004 - Mock Entry 21-1.mp4": "8b75d7f6f6f84cccf1867a91d15044a6",
"Best Prebuilt TV Show by Date/Season 2021/s2021.e000004 - Mock Entry 21-1.nfo": "8ee3845c514a411425b7e9198666b61c",
"Best Prebuilt TV Show by Date/tvshow.nfo": "2439ebc18e46a67064956cb940c992e9"
}