feat: add CLI command to run OCR on previous images
1 parent b2e55be, commit 6743d77
Showing 5 changed files with 73 additions and 6 deletions.
@@ -0,0 +1,15 @@
+# Maintenance
+
+## How to launch OCR on previously uploaded images
+
+OCR (through Google Cloud Vision) is launched on every new proof image. However, if you want to launch OCR on previously uploaded images, you can do so by running the following command:
+
+```bash
+make cli args='run_ocr'
+```
+
+To override existing OCR results, add the `--override` flag:
+
+```bash
+make cli args='run_ocr --override'
+```
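
The effect of `--override` described above can be illustrated with a minimal sketch. It does not reproduce the project's OCR helper: `save_ocr_stub` is a hypothetical stand-in, and the `.json` file next to the image is an assumed naming scheme used only for illustration. The sketch shows the skip-unless-override behaviour one would expect: an existing result is kept by default and rewritten only when `override=True`.

```python
import tempfile
from pathlib import Path


def save_ocr_stub(image_path: Path, override: bool = False) -> bool:
    """Hypothetical stand-in for the OCR step.

    Returns True if a result file was written, False if it was skipped.
    """
    # Assumed naming scheme (not taken from the commit): same name, .json suffix.
    ocr_path = image_path.with_suffix(".json")
    if ocr_path.exists() and not override:
        return False  # keep the existing result
    ocr_path.write_text('{"text": "stub"}')
    return True


with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "0001.jpg"
    image.write_bytes(b"")  # empty placeholder image
    first_run = save_ocr_stub(image)                  # no result yet: written
    second_run = save_ocr_stub(image)                 # result exists: skipped
    forced_run = save_ocr_stub(image, override=True)  # rewritten on request

print(first_run, second_run, forced_run)
```

Returning a boolean from the stand-in mirrors how the command in this commit counts saved results with `processed += int(result)`.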
@@ -0,0 +1,31 @@
+import argparse
+import glob
+
+import tqdm
+from django.conf import settings
+from django.core.management.base import BaseCommand
+
+from open_prices.proofs.utils import fetch_and_save_ocr_data
+
+
+class Command(BaseCommand):
+    help = "Run OCR on images with missing OCR files."
+
+    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
+        parser.add_argument(
+            "--override", action="store_true", help="Override existing OCR data."
+        )
+
+    def handle(self, *args, **options) -> None:  # type: ignore
+        self.stdout.write("Starting OCR processing...")
+        override = options["override"]
+        processed = 0
+
+        for image_path_str in tqdm.tqdm(
+            glob.iglob("**/*", root_dir=settings.IMAGES_DIR), desc="images"
+        ):
+            image_path = settings.IMAGES_DIR / image_path_str
+            result = fetch_and_save_ocr_data(image_path, override=override)
+            processed += int(result)
+
+        self.stdout.write("%d OCR saved" % processed)