Document AI Processor with Node.js

This project demonstrates how to use Google Cloud's Document AI API to process a document (e.g., an image) and extract text and paragraph information. The script is written in Node.js and uses the @google-cloud/documentai library.

Features

Process an image file (e.g., JPEG) using Google Cloud's Document AI API.
Extract and display text and paragraphs from the document.
Provide a starting point for integrating Document AI into your Node.js projects.

Prerequisites

1. Google Cloud Setup

Enable the Document AI API for your project in the Google Cloud Console.
Create a processor in the Document AI section of the Cloud Console. Note the PROJECT_ID, LOCATION, and PROCESSOR_ID values.
Set up a service account with the required permissions for Document AI.
Download the service account key JSON file and set the path in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
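Before calling the API, it can help to confirm that the credentials variable points at a usable key file. The following is a minimal sketch, not part of the original script: `checkCredentials` is a hypothetical helper, and it assumes the key file is a standard service account JSON key with `type` and `client_email` fields.

```javascript
const fs = require('fs');

// Hypothetical helper: verifies that GOOGLE_APPLICATION_CREDENTIALS points at
// a readable service account key file and returns the account's email.
function checkCredentials(env = process.env) {
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    throw new Error('GOOGLE_APPLICATION_CREDENTIALS is not set');
  }
  if (!fs.existsSync(keyPath)) {
    throw new Error(`Key file not found: ${keyPath}`);
  }
  const key = JSON.parse(fs.readFileSync(keyPath, 'utf8'));
  if (key.type !== 'service_account') {
    throw new Error(`Not a service account key file: ${keyPath}`);
  }
  return key.client_email;
}
```

Calling `checkCredentials()` at startup turns a missing or misconfigured key into an immediate, readable error instead of a failure deep inside the API call.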

2. Node.js Setup

Install Node.js (v14 or later is recommended).

Clone this repository:

git clone <repository-url>
cd <repository-name>

Install dependencies:
```
npm install
```

Environment Variables

Create a .env file in the root directory with the following variables:

PROJECT_ID=<Your Google Cloud Project ID>
LOCATION=<Processor Location, e.g., 'us' or 'eu'>
PROCESSOR_ID=<Your Processor ID>
GOOGLE_APPLICATION_CREDENTIALS=<Path to your service account key JSON file>
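As a rough illustration of how these variables might be consumed, the sketch below validates them and builds the processor resource name in the format shown in the output example. `loadConfig` and `REQUIRED_VARS` are hypothetical names, not taken from index.js. In the real script, `require('dotenv').config()` would run first so that the .env values appear in `process.env`.

```javascript
// Hypothetical helper, not part of the original script.
// In index.js, call require('dotenv').config() before using process.env.
const REQUIRED_VARS = ['PROJECT_ID', 'LOCATION', 'PROCESSOR_ID'];

function loadConfig(env = process.env) {
  // Collect every missing variable so the error lists all of them at once.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const { PROJECT_ID, LOCATION, PROCESSOR_ID } = env;
  return {
    projectId: PROJECT_ID,
    location: LOCATION,
    processorId: PROCESSOR_ID,
    // Full resource name expected by the Document AI API.
    processorName: `projects/${PROJECT_ID}/locations/${LOCATION}/processors/${PROCESSOR_ID}`,
  };
}
```

Passing the environment as a parameter keeps the function easy to test with a plain object instead of the real `process.env`.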

Usage

Prepare the document you want to process (e.g., an image file) and place it in the data/ folder. Update the filePath in the script if necessary.
Run the script:
```
node index.js
```
The extracted text and paragraph information will be logged to the console.
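The steps the script performs (read the file, encode it to base64, build the request, call the processor) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the contents of index.js: `buildRequest`, `processFile`, and the `MIME_TYPES` table are hypothetical names, while `DocumentProcessorServiceClient` and `processDocument` come from the @google-cloud/documentai library.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Assumed mapping from file extension to MIME type; extend as needed.
const MIME_TYPES = {
  '.jpeg': 'image/jpeg',
  '.jpg': 'image/jpeg',
  '.png': 'image/png',
  '.pdf': 'application/pdf',
};

// Builds the processDocument request body from raw file bytes.
function buildRequest(processorName, fileBuffer, filePath) {
  const mimeType = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (!mimeType) {
    throw new Error(`Unsupported file type: ${filePath}`);
  }
  return {
    name: processorName,
    rawDocument: {
      content: fileBuffer.toString('base64'), // the API expects base64 content
      mimeType,
    },
  };
}

// Reads the file, sends it to the processor, and returns the parsed document.
async function processFile(processorName, filePath) {
  // Required lazily so buildRequest can be used without the client library.
  const { DocumentProcessorServiceClient } = require('@google-cloud/documentai').v1;
  const client = new DocumentProcessorServiceClient();
  const fileBuffer = await fs.readFile(filePath);
  const request = buildRequest(processorName, fileBuffer, filePath);
  const [result] = await client.processDocument(request);
  return result.document;
}
```

A call such as `processFile(processorName, 'data/two.jpeg')` resolves to a document whose `text` field holds the full extracted text. For a processor in a location other than `us`, the client may need a regional `apiEndpoint` option; check the library documentation for the exact value.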

Code Structure

index.js: Main script for processing documents using the Document AI API.
data/two.jpeg: Sample image file for testing (you can replace this with your own image).
.env: Environment variables file (excluded from version control).

Output Example

Here’s an example of the output:

Starting the quickstart function...
Processor name: projects/<PROJECT_ID>/locations/<LOCATION>/processors/<PROCESSOR_ID>
File read successfully: data/two.jpeg
Image file encoded to base64.
Request object created.
Document processed successfully.
Text extracted from the document.
The document contains the following paragraphs:
Paragraph text:
<Extracted text from the document>
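The paragraph lines in this output come from slicing the document's full text by each paragraph's text anchor. A minimal sketch of that lookup is shown below, assuming the response shape used by the Document AI API (`document.text`, `pages[].paragraphs[].layout.textAnchor.textSegments`). `anchorText` and `paragraphTexts` are hypothetical helper names.

```javascript
// Returns the part of fullText covered by a layout's text anchor.
// startIndex may be omitted when it is 0, and indices may arrive as strings.
function anchorText(textAnchor, fullText) {
  const segments = (textAnchor && textAnchor.textSegments) || [];
  return segments
    .map((seg) => fullText.substring(Number(seg.startIndex || 0), Number(seg.endIndex)))
    .join('');
}

// Collects the text of every paragraph on every page of a processed document.
function paragraphTexts(document) {
  const texts = [];
  for (const page of document.pages || []) {
    for (const paragraph of page.paragraphs || []) {
      const anchor = paragraph.layout && paragraph.layout.textAnchor;
      texts.push(anchorText(anchor, document.text));
    }
  }
  return texts;
}
```

Each returned string could then be logged after a `Paragraph text:` label to produce output like the example above.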

Dependencies

dotenv
@google-cloud/documentai
fs (built-in Node.js module) for file handling.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Feel free to open issues or create pull requests if you have suggestions or improvements.

Author

Asutosh Sidhya

Check out my GitHub profile.

Disclaimer

Ensure you handle sensitive data, such as API keys and documents, securely and avoid committing them to your repository.

Instructions

Replace placeholders like <Your Google Cloud Project ID> and <repository-url> with your specific details.
Include a .gitignore file to exclude sensitive files such as .env and the service account key JSON file referenced by GOOGLE_APPLICATION_CREDENTIALS.
