When you've partially downloaded a batch/project (e.g. you downloaded finished jobs before they had all finished, or especially in the subscription context), it'd be nice to skip downloading files that already exist.
Implementation options
The simplest would be to add a skip_existing: bool = False option to the download_files() function signatures that skips the download if the file already exists.
We could get fancier and compare file sizes, re-downloading only when they don't match, since the size is reported in the API.
We could also check a file checksum, but since the API doesn't calculate one, we'd have to compute both sides on the fly.
Overall, I lean towards the simplest option here (a rough sketch follows below).
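A minimal sketch of the simplest option, assuming a per-file download helper; the real download_files() signature, the session handling, and the way the API-reported size is passed in are assumptions here, not the library's actual API:

from pathlib import Path
from typing import Optional

import requests

def download_file(
    session: requests.Session,
    url: str,
    filepath: Path,
    skip_existing: bool = False,
    expected_size: Optional[int] = None,  # hypothetical: file size reported by the API
) -> None:
    # Simplest option: skip the download when the file is already on disk.
    # If we also know the expected size, only skip when the sizes match.
    if skip_existing and filepath.is_file():
        if expected_size is None or filepath.stat().st_size == expected_size:
            return
    with session.get(url, stream=True) as s:
        s.raise_for_status()
        with open(filepath, "wb") as f:
            for chunk in s.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    f.write(chunk)

download_files() would then simply forward skip_existing (and, if available, each file's reported size) to such a helper.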
Hi, I ran into this same situation recently, and I think it would be nice to have an option to skip existing files, as proposed above. My temporary solution was to add the following to util.py:
# Skip the download if the target file already exists on disk.
if filepath.is_file():
    print(filepath, "already exists. Not downloading.")
else:
    with session.get(url, stream=stream) as s:
        s.raise_for_status()
        tqdm = get_tqdm_progress_bar()
        with tqdm.wrapattr(
            open(filepath, "wb"), "write", miniters=1, desc=filepath.name,
            total=int(s.headers.get("content-length", 0)),
        ) as f:
            for chunk in s.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)
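If you also want the size comparison from the proposal, a rough extension of that check, assuming the expected size is available from the API's file metadata (expected_size is a hypothetical variable, not something currently in util.py), could look like this:

# Hypothetical: also re-download when the on-disk size differs from the
# size reported by the API (expected_size is an assumed variable).
file_is_complete = (
    filepath.is_file()
    and (expected_size is None or filepath.stat().st_size == expected_size)
)
if file_is_complete:
    print(filepath, "already exists. Not downloading.")
else:
    ...  # fall through to the streaming download above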