This repository has been archived by the owner on Oct 4, 2024. It is now read-only.

Document Data Flow from Scrape to Sale #478

Closed
Tracked by #65
sfaria27 opened this issue Dec 28, 2023 · 1 comment

@sfaria27

As a team member, I need to create a comprehensive document that outlines the flow of data from the initial email scrape of a receipt to the final sale. This document will serve as a reference for stakeholders, detailing each step in the process and the systems involved.

Acceptance Criteria

  • Identify and document each stage in the data flow process from email scrape to sale.
  • Describe the role and functionality of each component or system involved.
  • Include details on data transformations, processing, and any intermediary steps.
  • Specify data formats and communication protocols between systems.
  • Incorporate relevant privacy and security considerations at each stage.
@ricardobrg
Contributor

1. Email Upload:

  • User Action: Upload the email body and its attachments through the TIKI platform.
  • Behind the Scenes: The email is transmitted to our system over HTTPS, so it is encrypted in transit and cannot be read by third parties who intercept it.

2. Publisher Identity Verification and Data Integrity Check:

  • What Happens:
    • We verify the identity of the publisher who submitted the email.
    • We check that the email hasn't been tampered with during its journey to our system.
  • Why:
    • To confirm that the email comes from the publisher.
  • Outcome:
    • If the verification fails, the request is rejected.
    • If everything checks out, we move to the next step.
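
The issue does not say how the publisher's identity and the email's integrity are verified. As one possible illustration only, the sketch below assumes the publisher signs the request body with a shared secret using HMAC-SHA256. The function names `sign_payload` and `verify_submission` and the signing scheme itself are hypothetical, not TIKI's confirmed mechanism.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the payload (assumed scheme)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_submission(secret: bytes, payload: bytes, signature: str) -> bool:
    """Return True only if the signature matches the payload.

    A match implies both that the sender holds the publisher's secret
    (identity) and that the payload was not altered in transit (integrity).
    """
    expected = sign_payload(secret, payload)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

Under this assumption, a request whose signature does not verify would be rejected before any further processing, matching the outcome described above.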

3. User's License Check:

  • What Happens:
    • We confirm whether the user has a valid license to submit their receipt data.
  • Why:
    • To ensure compliance with legal agreements and data usage policies.
  • Outcome:
    • If the user has a valid license, we proceed; otherwise, the request is rejected.
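
What makes a license "valid" is not defined in this issue. The sketch below assumes, purely for illustration, that a license belongs to a user, may expire, and may be revoked. The `License` fields and `has_valid_license` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class License:
    """Hypothetical license record; the real schema is not given in the issue."""
    user_id: str
    expires_at: Optional[datetime]  # None means the license does not expire
    revoked: bool = False


def has_valid_license(licenses: Iterable[License], user_id: str, now: datetime) -> bool:
    """Return True if the user holds at least one unexpired, unrevoked license."""
    return any(
        lic.user_id == user_id
        and not lic.revoked
        and (lic.expires_at is None or lic.expires_at > now)
        for lic in licenses
    )
```

A caller would reject the request when this returns `False` and continue to the next step otherwise.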

4. Create a Request Record:

  • What Happens:
    • We generate a unique identifier for this specific email submission.
  • Why:
    • To keep track of this specific transaction for future reference.
  • Outcome:
    • An ID for this request is created with the status "in progress". No other data is saved from the request.
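
A minimal sketch of such a record, assuming a random UUID as the identifier (the issue does not state the ID format). `RequestRecord`, `RequestStatus`, and `create_request_record` are hypothetical names. The record deliberately holds nothing but the ID and the status, reflecting the statement that no other data is saved from the request.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class RequestStatus(Enum):
    IN_PROGRESS = "in progress"
    STORED = "stored"  # set later, once the receipt data is in TIKI Ocean


@dataclass
class RequestRecord:
    """Holds only the ID and status; no email content is kept."""
    request_id: str
    status: RequestStatus


def create_request_record() -> RequestRecord:
    """Create a record with a fresh random ID and the initial status."""
    return RequestRecord(request_id=str(uuid.uuid4()), status=RequestStatus.IN_PROGRESS)
```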

5. Text Extraction from Email:

  • What Happens:
    • We use Amazon Textract to extract the text from the attachments as plain text.
  • Why:
    • To extract meaningful information from the receipt email.
  • Outcome:
    • The email attachments are discarded after submission.
    • The text content is obtained from the attachments.
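
The issue names Amazon Textract but not which operation is called. Assuming the synchronous text-detection response shape (a `Blocks` list whose `LINE` blocks carry a `Text` field), the plain text could be assembled roughly as follows. `lines_from_textract` is a hypothetical helper, and the actual call to Textract is left out.

```python
from typing import Any, Mapping


def lines_from_textract(response: Mapping[str, Any]) -> str:
    """Join the text of all LINE blocks in a Textract response, one per line.

    WORD blocks are skipped because their text is already contained in the
    LINE blocks; PAGE blocks carry no text of their own.
    """
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return "\n".join(lines)
```

Only the returned string would be passed on to the next step; the attachment bytes themselves are not needed after extraction.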

6. Convert Text to Structured Receipt Data:

  • What Happens:
    • We structure the text extracted from the attachments, together with the email body, into a standardized format suitable for our data repository (TIKI Ocean).
  • Why:
    • To organize the information in a way that our system can efficiently process and store.
  • Outcome:
    • The text is transformed into structured Receipt Data.
    • After transformation, the text and the email body are discarded.
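
The standardized Receipt Data format is not specified in this issue. The sketch below invents a deliberately small schema (line items plus a total) and a naive line-based parser, only to illustrate the text-to-structure step. `ReceiptData`, `LineItem`, `parse_receipt`, and `to_json` are hypothetical, and real receipts would need far more robust parsing.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Matches lines such as "Coffee  3.50" or "Total: $5.75".
_PRICE_LINE = re.compile(r"^(?P<label>.+?)\s+\$?(?P<amount>\d+\.\d{2})$")


@dataclass
class LineItem:
    description: str
    price: str  # kept as a decimal string to avoid float rounding


@dataclass
class ReceiptData:
    items: List[LineItem] = field(default_factory=list)
    total: Optional[str] = None


def parse_receipt(text: str) -> ReceiptData:
    """Build a ReceiptData from plain text, ignoring lines without a price."""
    receipt = ReceiptData()
    for raw in text.splitlines():
        match = _PRICE_LINE.match(raw.strip())
        if match is None:
            continue
        label = match["label"].rstrip(":").strip()
        if label.lower() == "total":
            receipt.total = match["amount"]
        else:
            receipt.items.append(LineItem(label, match["amount"]))
    return receipt


def to_json(receipt: ReceiptData) -> str:
    """Serialize the structured receipt; the source text is not included."""
    return json.dumps(asdict(receipt), sort_keys=True)
```

Because `to_json` serializes only the structured fields, the raw text and email body can be dropped once parsing is done, in line with the outcome above.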

7. Publish Receipt Data to TIKI Ocean:

  • What Happens:
    • The transformed Receipt Data is sent to TIKI Ocean, our central data repository.
    • The request record created earlier is updated to the status "stored".
  • Why:
    • To store the receipt data securely and make it accessible for data buyers.
  • Outcome:
    • The receipt data is successfully stored in TIKI Ocean, and the transaction is recorded for future reference.
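
The ordering of the two actions in this step can be sketched as below, using an in-memory stand-in for TIKI Ocean, since its real interface is not described here. `InMemoryOcean`, `publish_receipt`, and the plain dictionary of statuses are all hypothetical. The status changes to "stored" only after publishing succeeds, so a failed publish leaves the request "in progress".

```python
from typing import Any, Dict, List


class InMemoryOcean:
    """Stand-in for the data repository, for illustration only."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def publish(self, receipt: Dict[str, Any]) -> None:
        self.records.append(dict(receipt))


def publish_receipt(
    ocean: InMemoryOcean,
    statuses: Dict[str, str],
    request_id: str,
    receipt: Dict[str, Any],
) -> None:
    """Publish the receipt, then mark the request record as stored."""
    if statuses.get(request_id) != "in progress":
        raise ValueError(f"request {request_id!r} is not in progress")
    ocean.publish(receipt)            # may raise; status is then left unchanged
    statuses[request_id] = "stored"   # reached only after a successful publish
```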
