Skip already downloaded files #137

jhkennedy · 2021-08-02T17:48:17Z

When you've partially downloaded a batch/project (for example, you downloaded the finished jobs before all jobs had finished, or especially in the subscription context), it'd be nice to skip downloading files that already exist.

Implementation options

The simplest would be to add a skip_existing: bool = False option to the download_files() function signatures that skips the download if the file already exists
We could get fancier and compare the sizes, downloading again if they don't match, since the size is reported in the API
We could also check the file checksum, but since it's not calculated in the API we'd have to calculate both on the fly

Overall, I lean towards the simplest here.
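
To make the trade-offs between the three options concrete, here is a minimal sketch of the skip decision as a standalone helper. The names already_downloaded, sha256_of, expected_size, and expected_sha256 are hypothetical and not part of the SDK. The checksum branch assumes a reference SHA-256 is available from somewhere, which, as noted above, the API does not currently provide.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large products are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_downloaded(
    filepath: Path,
    expected_size: Optional[int] = None,
    expected_sha256: Optional[str] = None,
) -> bool:
    """Return True if the local file looks complete and the download can be skipped."""
    # Option 1: existence only.
    if not filepath.is_file():
        return False
    # Option 2: compare the size on disk with the size reported for the remote file.
    if expected_size is not None and filepath.stat().st_size != expected_size:
        return False
    # Option 3: compare checksums (slowest, since the local file is read in full).
    # Assumes a trusted reference checksum exists; this is hypothetical here.
    if expected_sha256 is not None and sha256_of(filepath) != expected_sha256:
        return False
    return True
```

Called with only a path, this is the simplest option. Passing expected_size adds the size comparison, which would also catch a file left truncated by an interrupted download.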


tshreve · 2024-09-10T17:39:53Z

Hi, I ran into this same situation recently, and think it would be nice to have an option to skip existing files, as proposed above. My temporary solution was to add the following to util.py:

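        # Skip the download when the target file is already on disk.
        # filepath is assumed to be a pathlib.Path (it is used as filepath.name below).
        if filepath.is_file():
            print(f"{filepath} already exists. Not downloading.")
        else:
            with session.get(url, stream=stream) as s:
                s.raise_for_status()
                tqdm = get_tqdm_progress_bar()
                # Wrap the file's write() so the progress bar advances with each chunk.
                with tqdm.wrapattr(open(filepath, "wb"), 'write', miniters=1, desc=filepath.name,
                                   total=int(s.headers.get('content-length', 0))) as f:
                    for chunk in s.iter_content(chunk_size=chunk_size):
                        if chunk:  # ignore empty keep-alive chunks
                            f.write(chunk)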
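
For comparison, the same workaround could be made opt-in through the skip_existing flag proposed above. This is only a rough sketch: download_with_skip is a hypothetical name rather than the SDK's actual function, the progress bar is left out for brevity, and session is assumed to behave like a requests.Session.

```python
from pathlib import Path


def download_with_skip(url: str, filepath: Path, session, skip_existing: bool = False,
                       chunk_size: int = 10 * 1024 * 1024) -> Path:
    """Download url to filepath, optionally skipping files that already exist.

    session is assumed to behave like requests.Session (hypothetical interface).
    """
    filepath = Path(filepath)
    # Opt-in: the default (False) keeps the current always-download behavior.
    if skip_existing and filepath.is_file():
        print(f"{filepath} already exists. Not downloading.")
        return filepath

    with session.get(url, stream=True) as response:
        response.raise_for_status()
        with open(filepath, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # ignore empty keep-alive chunks
                    f.write(chunk)
    return filepath
```

Defaulting skip_existing to False means existing callers see no change in behavior unless they ask for it.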

jhkennedy added the "enhancement" (New feature or request) and "good first issue" (Good for newcomers) labels · Aug 2, 2021
