A tool that uses classical computer vision to segment digital images composed of multiple panels. It is especially effective on comic panels, collages and memes.
Segmenting complex image collage formats helps improve the performance of downstream tasks such as OCR, object detection and other computer vision tools in an image processing pipeline.
While many image segmentation tools these days take a deep learning approach, PST chooses to rely on classical computer vision. This is possible because the problem space of comics and digitally created image formats is relatively small. By exploiting digital formatting trends such as pixel-perfect panel delineations, accurate segmentation can be done with some clever heuristics.
Key benefits of PST over its deep learning counterparts are faster processing and a much more lightweight footprint. PST is very effective for images commonly found on the internet, such as screenshots, collages, memes and comics. It struggles with graphics that are not computer generated (such as hand-drawn images), images in the wild, and unconventional internet image formats with no clear delineation between panels; these cases probably require a deep learning approach due to the broad problem space.
- Canny Edge Detection is used to detect all edges in the image.
- A Probabilistic Hough Transformation is used to detect all straight lines, with length and angle filters applied to eliminate noisy lines.
- The resulting lines are then extrapolated and merged on the pixel level with other lines. Many lines congregate along the key delimiting lines between panels, which are termed segmentation lines.
- Segmentation lines are then evaluated against other segmentation lines on the pixel level to determine the likelihood of a panel edge existing there. Most of the irrelevant noise from naturally occurring straight lines in an image is eliminated in this step.
- Images are sliced, and positional information of each panel is retained in bounding box notation and optionally returned in JSON format.
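
The line filtering, merging and slicing steps above can be sketched in plain Python. This is a simplified illustration, not PST's actual implementation: the function names, thresholds, the grid-shaped slicing and the `[x_min, y_min, x_max, y_max]` box layout are all assumptions made for the example. The input segments would typically come from a Probabilistic Hough Transform run on a Canny edge map (for instance OpenCV's `cv2.HoughLinesP`, assuming OpenCV is the library in use).

```python
import json
import math


def filter_segments(segments, min_length=50.0, angle_tol_deg=2.0):
    """Keep long segments that are nearly horizontal ('h') or vertical ('v')."""
    kept = []
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        if length < min_length:
            continue  # too short to be a panel border
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        if min(angle, 180.0 - angle) <= angle_tol_deg:
            kept.append(("h", round((y1 + y2) / 2), length))
        elif abs(angle - 90.0) <= angle_tol_deg:
            kept.append(("v", round((x1 + x2) / 2), length))
    return kept


def vote_cuts(lines, orientation, extent, min_coverage=0.8, merge_gap=3):
    """Merge nearby lines and keep positions that span most of the image.

    `extent` is the image size along the line direction (height for
    vertical lines, width for horizontal ones). Hypothetical heuristic.
    """
    candidates = sorted((pos, length) for kind, pos, length in lines
                        if kind == orientation)
    clusters = []
    for pos, length in candidates:
        if clusters and pos - clusters[-1][-1][0] <= merge_gap:
            clusters[-1].append((pos, length))
        else:
            clusters.append([(pos, length)])
    cuts = []
    for cluster in clusters:
        total = sum(length for _, length in cluster)
        if total / extent >= min_coverage:
            # Length-weighted mean position of the merged lines.
            cuts.append(round(sum(p * l for p, l in cluster) / total))
    return cuts


def panel_boxes(width, height, x_cuts, y_cuts):
    """Slice the image into a grid of [x_min, y_min, x_max, y_max] boxes."""
    xs = [0] + [x for x in sorted(x_cuts) if 0 < x < width] + [width]
    ys = [0] + [y for y in sorted(y_cuts) if 0 < y < height] + [height]
    return [[x0, y0, x1, y1]
            for y0, y1 in zip(ys, ys[1:])
            for x0, x1 in zip(xs, xs[1:])]


# A 200x100 image with one vertical border near x=100, plus noise.
segments = [
    (100, 0, 100, 60),   # upper part of the border
    (101, 55, 101, 100), # lower part, one pixel off
    (10, 10, 20, 10),    # short noisy line, filtered by length
    (0, 0, 100, 100),    # diagonal, filtered by angle
]
lines = filter_segments(segments, min_length=30)
x_cuts = vote_cuts(lines, "v", extent=100)
y_cuts = vote_cuts(lines, "h", extent=200)
boxes = panel_boxes(200, 100, x_cuts, y_cuts)
print(json.dumps({"panels": boxes}))
```

Summing segment lengths is a crude stand-in for pixel-level evaluation, since overlapping segments are counted twice, but it shows why isolated straight lines inside a panel rarely survive: they do not accumulate enough coverage across the image.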
$ pip install -r requirements.txt
$ python3 PST.py
usage: PST.py
[-h]
[--filepath FILEPATH]
[--outputfilepath OUTPUTFILEPATH]
[--nosaveimage]
[--bboxjson]
[--pdf]
optional arguments:
-h, --help
show this help message and exit
--filepath your/filepath/here
directory from which input images are read, default is 'Data/'
--outputfilepath your/filepath/here
directory where output images are saved, default is 'Output/'
--nosaveimage
choose not to save images
--bboxjson
choose to save the image segmentation bounding boxes in a JSON file with the same filename
--pdf
generates a PDF useful for troubleshooting
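
For reference, the options listed above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the usage text, not the actual source of PST.py; the help strings are paraphrased, treating the three switches as `store_true` flags is an assumption, and `Comics/` is a hypothetical input directory.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented PST.py options (assumed layout)."""
    parser = argparse.ArgumentParser(prog="PST.py")
    parser.add_argument("--filepath", default="Data/",
                        help="directory from which input images are read")
    parser.add_argument("--outputfilepath", default="Output/",
                        help="directory where output images are saved")
    parser.add_argument("--nosaveimage", action="store_true",
                        help="choose not to save images")
    parser.add_argument("--bboxjson", action="store_true",
                        help="save segmentation bounding boxes as JSON")
    parser.add_argument("--pdf", action="store_true",
                        help="generate a PDF useful for troubleshooting")
    return parser


# Equivalent to: python3 PST.py --filepath Comics/ --bboxjson
args = build_parser().parse_args(["--filepath", "Comics/", "--bboxjson"])
print(args.filepath, args.outputfilepath, args.bboxjson, args.nosaveimage)
```

Options that are not passed fall back to the documented defaults, so the example reads from `Comics/` and still writes to `Output/`.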
├── Main
│ ├── PST.py
│ ├── Data
│ │ ├── image0.jpg
│ │ └── image1.jpg
.
.
.
│ ├── Output
│ │ ├── image0
│ │ │ ├── 0.jpg
│ │ │ └── 1.jpg
│ │ ├── image1
│ │ │ ├── 0.jpg
│ │ │ └── ...
...
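
A downstream step such as OCR can pick up the sliced panels by walking this layout. The helper below is a minimal sketch that assumes the structure shown above (one folder per input image, containing panels named by index with a `.jpg` extension); `list_panels` is a hypothetical name and not part of PST. The demo builds a throwaway directory so the example runs on its own.

```python
import tempfile
from pathlib import Path


def list_panels(output_dir):
    """Map each image folder name to its panel file names in index order."""
    result = {}
    folders = sorted(p for p in Path(output_dir).iterdir() if p.is_dir())
    for folder in folders:
        # Only consider files named like 0.jpg, 1.jpg, ... (assumed naming).
        panels = [p for p in folder.glob("*.jpg") if p.stem.isdigit()]
        # Sort numerically so that 10.jpg comes after 2.jpg.
        panels.sort(key=lambda p: int(p.stem))
        result[folder.name] = [p.name for p in panels]
    return result


# Recreate the documented layout in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    for image, count in (("image0", 2), ("image1", 11)):
        folder = Path(tmp) / image
        folder.mkdir()
        for index in range(count):
            (folder / f"{index}.jpg").touch()
    panels = list_panels(tmp)

print(panels["image0"])
```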