feature: azure document intelligence as a segmentation model #274

Closed
akhileshsharma99 opened this issue Dec 25, 2024 · 0 comments · Fixed by #302
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

akhileshsharma99 commented Dec 25, 2024

Add the ability to use Azure Document Intelligence layout analysis as the segmentation model. This will be configurable through the following field in the upload form:

pipeline: "Azure"

The output will be unified, so SegmentType will now look like this:

type SegmentType =
  | "Title"
  | "SectionHeader"
  | "Text"
  | "List item"
  | "Table"
  | "Picture"
  | "Caption"
  | "Formula"
  | "Footnote"
  | "PageHeader"
  | "PageFooter"
  | "Page"

The following fields of the Azure layout analysis result will be mapped to the segment types above:

  • paragraphs.role.null -> Text
  • paragraphs.role.title -> Title
  • paragraphs.role.sectionHeading -> SectionHeader
  • paragraphs.role.pageNumber -> PageFooter
  • results.figures[idx] -> Picture
  • results.tables[idx] -> Table
  • results.figures[idx].caption.elements -> Caption
  • results.tables[idx].caption.elements -> Caption
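
The paragraph-role part of this mapping could be sketched roughly as below. This is only an illustration, not the actual implementation: `ROLE_TO_SEGMENT` and `segmentTypeForRole` are hypothetical names, and returning `undefined` for roles the list does not mention is an assumption, since the issue does not say how other roles are handled.

```typescript
// Unified segment types, copied from the issue text above.
type SegmentType =
  | "Title"
  | "SectionHeader"
  | "Text"
  | "List item"
  | "Table"
  | "Picture"
  | "Caption"
  | "Formula"
  | "Footnote"
  | "PageHeader"
  | "PageFooter"
  | "Page";

// Hypothetical lookup table holding only the paragraph roles listed in the issue.
const ROLE_TO_SEGMENT = new Map<string, SegmentType>([
  ["title", "Title"],
  ["sectionHeading", "SectionHeader"],
  ["pageNumber", "PageFooter"],
]);

// Hypothetical helper: a paragraph without a role becomes "Text".
// Roles not listed in the issue yield undefined (an assumption),
// so the caller has to decide what to do with them.
function segmentTypeForRole(
  role: string | null | undefined
): SegmentType | undefined {
  if (role == null) {
    return "Text";
  }
  return ROLE_TO_SEGMENT.get(role);
}
```

Figures, tables, and their captions come from separate collections in the result rather than from paragraph roles, so they would be assigned `Picture`, `Table`, and `Caption` directly when those collections are walked.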

NOTE
Formulas are not supported: a formula can appear inside a paragraph, and that breaks our current implementation.
Any paragraphs that fall inside a Picture or Table region will be removed to match our unified output.
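
One way the paragraph removal could work is sketched below. It rests on several assumptions: the `Box` and `Paragraph` types and both function names are hypothetical, regions are simplified to axis-aligned boxes (Azure reports bounding regions as polygons, which would need converting first), and "inside" is taken to mean that the paragraph's center point lies within the region on the same page.

```typescript
// Hypothetical axis-aligned box on a given page.
interface Box {
  page: number;
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Hypothetical, simplified paragraph shape.
interface Paragraph {
  content: string;
  box: Box;
}

// Assumption: a paragraph counts as "inside" a region when both are on the
// same page and the paragraph's center point lies within the region.
function isInside(inner: Box, outer: Box): boolean {
  if (inner.page !== outer.page) {
    return false;
  }
  const cx = (inner.left + inner.right) / 2;
  const cy = (inner.top + inner.bottom) / 2;
  return (
    cx >= outer.left &&
    cx <= outer.right &&
    cy >= outer.top &&
    cy <= outer.bottom
  );
}

// Keeps only the paragraphs that do not fall inside any Picture or Table region.
function dropParagraphsInRegions(
  paragraphs: Paragraph[],
  regions: Box[]
): Paragraph[] {
  return paragraphs.filter((p) => !regions.some((r) => isInside(p.box, r)));
}
```

A center-point test tolerates slight overhang at the region edges. Requiring full containment would be the stricter alternative.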

akhileshsharma99 added the documentation and enhancement labels Dec 25, 2024
akhileshsharma99 self-assigned this Dec 25, 2024