This script retrieves images and PDF documents from an Object Storage bucket, sends them to Vision for recognition, and then saves the recognition results back to the Object Storage bucket.
Written in Python, it can be easily analyzed, modified, and optimized for your specific use case.
- The user uploads images or documents in supported formats to the input directory (prefix) in the Object Storage bucket.
- The script retrieves the input folders from the bucket, generates a list of files for recognition, skipping unsupported formats and files that have already been recognized (by checking the result folder for the files in question).
- Then, it downloads files from the list one by one via direct links and sends them to Vision for recognition.
- Vision receives the file, processes it, and returns the recognition result, which is saved to the result folder in both JSON and TXT formats.
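The file selection step described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the set of supported extensions, the function names `result_keys` and `files_to_recognize`, and the convention of naming results `<original name>.json` and `<original name>.txt` are all assumptions made for the example.

```python
import posixpath

# Assumption: the real list of supported formats may differ.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def result_keys(key, prefix_in, prefix_out):
    """Map an input object key to hypothetical JSON and TXT result keys."""
    # Drop the input prefix, keep the rest of the path under the output prefix.
    relative = key[len(prefix_in):].lstrip("/")
    base = posixpath.join(prefix_out, relative)
    return base + ".json", base + ".txt"


def files_to_recognize(input_keys, existing_result_keys, prefix_in, prefix_out):
    """Return input keys that have a supported format and no saved result yet."""
    existing = set(existing_result_keys)
    pending = []
    for key in input_keys:
        if key.endswith("/"):
            continue  # skip "directory" placeholder objects
        extension = posixpath.splitext(key)[1].lower()
        if extension not in SUPPORTED_EXTENSIONS:
            continue  # skip unsupported formats
        json_key, _ = result_keys(key, prefix_in, prefix_out)
        if json_key in existing:
            continue  # skip files that were already recognized
        pending.append(key)
    return pending
```

Here `input_keys` and `existing_result_keys` stand for the object keys listed under the input and result prefixes. Keeping the selection logic free of network calls makes it easy to check against a plain list of keys before running it on a real bucket.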
You can run the script locally. To do this, specify the following environment variables:
Variable | Description |
---|---|
S3_BUCKET | Bucket name in Object Storage |
S3_PREFIX | Prefix (or directory) for incoming files, e.g., input |
S3_PREFIX_OUT | Prefix (or directory) for processing results, e.g., result |
S3_KEY | Static access key ID |
S3_SECRET | Static access key secret |
API_SECRET | API key secret |
FOLDER_ID | Folder ID |
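As an illustration, the variables from the table could be read and validated at startup along these lines. The `Settings` class and the `load_settings` function are hypothetical names used only for this sketch; the actual script may organize its configuration differently.

```python
import os
from dataclasses import dataclass

# Variable names taken from the table above.
REQUIRED_VARIABLES = (
    "S3_BUCKET", "S3_PREFIX", "S3_PREFIX_OUT",
    "S3_KEY", "S3_SECRET", "API_SECRET", "FOLDER_ID",
)


@dataclass(frozen=True)
class Settings:
    bucket: str
    prefix_in: str
    prefix_out: str
    s3_key: str
    s3_secret: str
    api_secret: str
    folder_id: str


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default), failing early."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        bucket=env["S3_BUCKET"],
        prefix_in=env["S3_PREFIX"],
        prefix_out=env["S3_PREFIX_OUT"],
        s3_key=env["S3_KEY"],
        s3_secret=env["S3_SECRET"],
        api_secret=env["API_SECRET"],
        folder_id=env["FOLDER_ID"],
    )
```

Calling `load_settings()` with no arguments reads from the process environment. Reporting all missing variables at once saves repeated runs when several are unset.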
Use different prefixes (subfolders) for the files to be processed and for the processing results. If both share the same prefix, the script may behave unpredictably.
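A simple guard against overlapping prefixes might look like the sketch below. The `prefixes_overlap` helper is hypothetical and not part of the script; it treats an empty prefix as covering the whole bucket, which is an assumption of this example.

```python
def prefixes_overlap(prefix_in, prefix_out):
    """Return True if the prefixes are equal or one is nested inside the other."""
    first = prefix_in.strip("/")
    second = prefix_out.strip("/")
    if not first or not second:
        # Assumption: an empty prefix means the whole bucket, so it overlaps everything.
        return True
    # Compare with a trailing slash so that "input" and "input2" count as distinct.
    first += "/"
    second += "/"
    return first.startswith(second) or second.startswith(first)
```

For example, `prefixes_overlap("input", "result")` is false, while `prefixes_overlap("input", "input/result")` is true because the results would land inside the folder that the script scans for new files.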
To generate both an S3 key and an API key, create a service account and assign it the storage.editor and ai.vision.user roles.
Alternatively, you can use a ready-to-use Terraform module that creates all the resources required to start processing images and documents.