Document Data Flow from Scrape to Sale #478

sfaria27 · 2023-12-28T22:41:29Z

As a team member, I need to create a comprehensive document that outlines the flow of data from the initial email scrape of a receipt to the final sale. This document will serve as a reference for stakeholders, detailing each step in the process and the systems involved.

Acceptance Criteria

Identify and document each stage in the data flow process from email scrape to sale.
Describe the role and functionality of each component or system involved.
Include details on data transformations, processing, and any intermediary steps.
Specify data formats and communication protocols between systems.
Incorporate relevant privacy and security considerations at each stage.

ricardobrg · 2023-12-29T15:36:47Z

1. Email Upload:

User Action: Upload the email body and its attachments through the TIKI platform.
Behind the Scenes: The email is sent to our system over HTTPS, so it is encrypted in transit and protected from interception by third parties.

2. Publisher Identity Verification and Data Integrity Check:

What Happens:
- We verify the identity of the publisher who submitted the email.
- We check that the email hasn't been tampered with during its journey to our system.
Why:
- To confirm that the email comes from the publisher and arrived unaltered.
Outcome:
- If the verification fails, the request is rejected.
- If everything checks out, we move to the next step.
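The thread does not say which mechanism performs this check. As a minimal sketch, assuming the publisher signs the request body with a shared secret (HMAC-SHA256), a single comparison covers both identity and tampering. The names `sign_payload` and `verify_submission` are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the payload (assumed scheme)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_submission(secret: bytes, payload: bytes, signature: str) -> bool:
    """Accept the request only if the signature matches the payload."""
    expected = sign_payload(secret, payload)
    # compare_digest takes constant time, which avoids timing side channels.
    return hmac.compare_digest(expected, signature)
```

A request whose body or secret differs from what was signed yields `False`, which corresponds to the "request is rejected" outcome above.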

3. User's License Check:

What Happens:
- We confirm whether the user has a valid license to submit their receipt data.
Why:
- To ensure compliance with legal agreements and data usage policies.
Outcome:
- If the user has a valid license, we proceed; otherwise, the request is rejected.

4. Create a Request Record:

What Happens:
- We generate a unique identifier for this specific email submission.
Why:
- To keep track of this specific transaction for future reference.
Outcome:
- An ID for this request is created with the status "in progress". No other data is saved from the request.
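A minimal sketch of such a record, assuming a random UUID as the identifier. The class and field names are hypothetical; only the two status values come from this thread.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class RequestStatus(Enum):
    IN_PROGRESS = "in progress"
    STORED = "stored"


@dataclass
class RequestRecord:
    # Only the ID and status are kept; no email content is stored here.
    request_id: str
    status: RequestStatus


def create_request_record() -> RequestRecord:
    """Create a record with a fresh random ID and the initial status."""
    return RequestRecord(request_id=str(uuid.uuid4()), status=RequestStatus.IN_PROGRESS)
```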

5. Text Extraction from Email:

What Happens:
- We use Amazon Textract to extract the text from the attachments as plain text.
Why:
- To extract meaningful information from the receipt email.
Outcome:
- The email attachments are discarded after submission.
- The text content is obtained from the attachments.
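As a rough illustration, the extraction could look like the following, assuming the synchronous `DetectDocumentText` operation and one attachment per call. The helper `extract_lines` is hypothetical; in practice the client would come from `boto3.client("textract")`.

```python
from typing import Any


def extract_lines(textract_client: Any, attachment: bytes) -> list[str]:
    """Return the lines of text Textract detects in one attachment."""
    response = textract_client.detect_document_text(Document={"Bytes": attachment})
    # The response holds PAGE, LINE and WORD blocks; LINE blocks carry whole lines.
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]
```

Passing the client in as a parameter keeps the helper testable with a stub and leaves credentials and region configuration to the caller.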

6. Convert Text to Structured Receipt Data:

What Happens:
- We structure the text extracted from the attachments, together with the email body, into a standardized format suitable for our data repository (TIKI Ocean).
Why:
- To organize the information in a way that our system can efficiently process and store.
Outcome:
- The text is transformed into structured Receipt Data.
- After transformation, the text and the email body are discarded.
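The thread does not define the Receipt Data schema, so the following is only a sketch of the idea under a made-up one: lines ending in an amount become items, and a line labelled "Total" becomes the total. `ReceiptData`, `to_receipt` and the line pattern are all assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional

# Hypothetical pattern: "<label> <amount>", with an optional "$" before the amount.
_PRICE_LINE = re.compile(r"^(?P<label>.+?)\s+\$?(?P<amount>\d+\.\d{2})$")


@dataclass
class ReceiptData:
    items: list[tuple[str, Decimal]] = field(default_factory=list)
    total: Optional[Decimal] = None


def to_receipt(lines: list[str]) -> ReceiptData:
    """Build structured receipt data from plain-text lines."""
    receipt = ReceiptData()
    for line in lines:
        match = _PRICE_LINE.match(line.strip())
        if match is None:
            continue  # not a "label amount" line, so it carries no receipt data
        label = match["label"].rstrip(":")
        amount = Decimal(match["amount"])
        if label.lower() == "total":
            receipt.total = amount
        else:
            receipt.items.append((label, amount))
    return receipt
```

Once `to_receipt` returns, the caller can drop its references to the input lines, which mirrors the discard step described above.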

7. Publish Receipt Data to TIKI Ocean:

What Happens:
- The transformed Receipt Data is sent to TIKI Ocean, our central data repository.
- The request record created earlier is updated to the status "stored".
Why:
- To store the receipt data securely and make it accessible for data buyers.
Outcome:
- The receipt data is successfully stored in TIKI Ocean, and the transaction is recorded for future reference.
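A small sketch of the ordering this step implies, assuming the status changes only after the write succeeds. The `store` callable stands in for the TIKI Ocean write, whose actual interface is not described in this thread; `Request` and `publish_receipt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    request_id: str
    status: str = "in progress"


def publish_receipt(
    request: Request,
    receipt: dict,
    store: Callable[[str, dict], None],
) -> Request:
    """Write the receipt, then mark the request as stored."""
    # If store() raises, the status stays "in progress" and the error propagates.
    store(request.request_id, receipt)
    request.status = "stored"
    return request
```

For example, `publish_receipt(Request("abc"), {"total": "15.75"}, saved.__setitem__)` with `saved = {}` records the receipt under the request ID and returns the request with status "stored".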

sfaria27 self-assigned this Dec 28, 2023

sfaria27 mentioned this issue Dec 28, 2023

Email Scrape Feature Documentation tiki-deprecated/apps#65

Closed

7 tasks

sfaria27 closed this as completed Dec 30, 2023
