🔎 DataCatalogue Object Detection Dataset

As part of our pipeline, we are experimenting with Document Layout Segmentation (DLS), also known as Document Layout Analysis (DLA). We used the Roboflow web application to manually annotate our dataset and to train our object detection model, which is based on YOLOv8. The annotations follow the SegmOnto controlled vocabulary, extended with the additional classes defined by the COLaF project for its LADaS dataset.
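For readers unfamiliar with YOLO-style annotations, here is a minimal sketch of how one label line (class id, then normalized center x, center y, width, and height) can be turned into a pixel bounding box. The class list, its order, and the `Box` and `parse_yolo_line` names are assumptions made for illustration; they are not taken from our Roboflow export, whose actual class mapping may differ.

```python
from dataclasses import dataclass

# Hypothetical class list: a few SegmOnto zone names in an assumed order.
# The real mapping is defined by the dataset export, not by this sketch.
CLASS_NAMES = ["MainZone", "MarginTextZone", "NumberingZone", "GraphicZone"]


@dataclass
class Box:
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one YOLO label line into a pixel-coordinate bounding box."""
    class_id, cx, cy, w, h = line.split()
    # YOLO stores the box center and size as fractions of the image size.
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    return Box(
        label=CLASS_NAMES[int(class_id)],
        x_min=(cx - w / 2) * img_w,
        y_min=(cy - h / 2) * img_h,
        x_max=(cx + w / 2) * img_w,
        y_max=(cy + h / 2) * img_h,
    )


# Example: a zone centered on a 1000 x 2000 pixel page image.
box = parse_yolo_line("0 0.5 0.5 0.5 0.25", img_w=1000, img_h=2000)
print(box)
```

Working in normalized coordinates keeps the annotations independent of image resolution, so the same label file stays valid if the page images are resized.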

Useful links:

  • Our dataset and model, available on Roboflow
  • More on YOLOv8
  • Simon Gabay, Ariane Pinche, Kelly Christensen, Jean-Baptiste Camps, & Nicola Carboni. (2023). SegmOnto: A Controlled Vocabulary to Describe the Layout of Pages (Version 0.9). Genève, Lyon, Paris. https://segmonto.github.io/.

📝 Bibliography

  • Thibault Clérice, Juliette Janès, Hugo Scheithauer, Sarah Bénière, Laurent Romary, & Benoît Sagot. (2024, August 6-9). Layout Analysis Dataset with SegmOnto. DH 2024 - Annual Conference of the Alliance of Digital Humanities Organizations, Washington, D.C., United States. https://inria.hal.science/hal-04513725.