This repository has been archived by the owner on May 8, 2024. It is now read-only.
Replies: 2 comments
- @ninpnin You could probably answer this best.
Answer selected by Lauler
- I was browsing through the repo trying to get a sense of how it works. As I understand it, you train a text-based classifier on curated data in input/curation. Here are some questions I had:
I was hoping to explore using both image and text to segment the protocols. However, it is difficult to use the supplied images without either bounding-box information for the annotations, or a link back to the source document's textbox URIs so that bounding boxes can be retrieved from the original OCR output.
Is the bounding-box info included somewhere in riksdagen-corpus? If it isn't, my request would be to consider including it when preprocessing the protocols. It would be very useful to include it at least for the manually curated train/eval data, since this information already exists for the scanned and OCR'd protocols. Including bounding boxes would let people make use of the image modality.
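To illustrate what I mean by retrieving bounding boxes from the original OCR: a minimal sketch of pulling per-block coordinates out of ALTO XML, the layout format commonly produced by OCR pipelines for scanned documents. The sample document and the block IDs below are hypothetical, not taken from riksdagen-corpus itself.

```python
# Sketch: map each ALTO TextBlock ID to its pixel bounding box.
# ALTO encodes position via the HPOS, VPOS, WIDTH, HEIGHT attributes.
import xml.etree.ElementTree as ET

SAMPLE_ALTO = """<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="block_1" HPOS="120" VPOS="200" WIDTH="800" HEIGHT="150"/>
        <TextBlock ID="block_2" HPOS="120" VPOS="400" WIDTH="800" HEIGHT="600"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""

NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

def textblock_bboxes(alto_xml: str) -> dict:
    """Return {TextBlock ID: (x, y, width, height)} in page pixels."""
    root = ET.fromstring(alto_xml)
    return {
        block.get("ID"): tuple(
            int(block.get(attr)) for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
        )
        for block in root.iterfind(".//alto:TextBlock", NS)
    }

print(textblock_bboxes(SAMPLE_ALTO))
```

If the preprocessed protocols kept a link from each annotated segment back to its TextBlock ID, a lookup like this would be enough to recover the image regions.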