Stop checking for duplicates in imports #159

jessemortenson · 2025-01-27T21:00:27Z

Currently, a data import always fails if the scraper yields one or more duplicate items. Ideally scrapers do not emit duplicates, but given the vagaries of our source sites, rooting them out can be time-consuming. In many cases the duplicates are identical, so importing the same data more than once does no harm. In those cases, eliminating the duplicates is low-priority work, but because the import halts, it becomes high-priority work.

This change introduces a runtime flag, --allow_duplicates, which makes the import warn on duplicate items instead of failing.
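
For reviewers who want a concrete picture of the behavior, here is a minimal sketch of how a flag like this could switch duplicate handling from fail to warn. Only the flag name --allow_duplicates comes from this PR. Everything else (check_duplicates, DuplicateItemError, build_parser, the logger name) is hypothetical and is not taken from the actual diff.

```python
import argparse
import logging
from typing import Hashable, Iterable, List

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("import_sketch")


class DuplicateItemError(Exception):
    """Hypothetical error raised when a duplicate is found and not allowed."""


def check_duplicates(
    items: Iterable[Hashable], allow_duplicates: bool = False
) -> List[Hashable]:
    """Scan items and return the duplicates that were seen.

    With allow_duplicates=False (the default), the first duplicate raises.
    With allow_duplicates=True, each duplicate is logged as a warning and
    the scan continues.
    """
    seen = set()
    duplicates = []
    for item in items:
        if item in seen:
            if not allow_duplicates:
                raise DuplicateItemError(f"duplicate item: {item!r}")
            logger.warning("duplicate item: %r", item)
            duplicates.append(item)
        else:
            seen.add(item)
    return duplicates


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser exposing the flag as an opt-in switch."""
    parser = argparse.ArgumentParser(description="import sketch")
    parser.add_argument(
        "--allow_duplicates",
        action="store_true",
        help="warn on duplicate items instead of failing the import",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    found = check_duplicates(["a", "b", "a"], args.allow_duplicates)
    print(f"duplicates seen: {found}")
```

Making the flag opt-in keeps the strict behavior as the default, so a scraper that emits duplicates still fails loudly unless the operator explicitly chooses to tolerate them.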

Stop checking for duplicates in imports

9d18729

jessemortenson requested a review from alexobaseki January 27, 2025 21:00

jessemortenson added 4 commits January 27, 2025 15:11

Fix linting

823d776

Refactor allow duplicates to an optional runtime flag

7fd7276

Update comment

1d7d74a

Add 6.20.14 release info

e7168a0

jessemortenson merged commit 5907f4c into main Jan 27, 2025
1 check passed
