This script simplifies elements of HTML files exported from InDesign.
With our present configuration, InDesign outputs images and captions like so:
<div id="important-id">
<img />
<p class="caption"></p>
Here, #important-id carries the rules needed to display the image.
While this works fine, our workflow benefits from a more streamlined structure that lets us embed tools (e.g. zoomify) more efficiently. The desired output is:
<img id="important-id" />
<p class="caption"></p>
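The transformation above can be sketched in a few lines of BeautifulSoup. This is only an illustration, not the script's actual code: the function name `simplify` and the sample markup are hypothetical, and the sketch assumes the wrapper div closes after the caption (unwrapping gives the same flat result if it closes right after the image).

```python
from bs4 import BeautifulSoup


def simplify(html):
    """Move each wrapper div's id onto its image, then drop the wrapper.

    Hypothetical helper for illustration; the real script may differ.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so modifying the tree inside the loop is safe.
    for div in soup.find_all("div", id=True):
        img = div.find("img")
        if img is None:
            continue  # not an image wrapper: leave it untouched
        img["id"] = div["id"]  # the id now lives on the image itself
        div.unwrap()  # remove the div but keep its children in place
    return str(soup)


# Assumed sample input, modelled on the InDesign export shown above.
sample = '<div id="important-id"><img/><p class="caption"></p></div>'
print(simplify(sample))
```

Divs that carry an id but contain no image are skipped, so other parts of the exported file stay as they are.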
Install the system packages. On Debian:
$ apt-get install python3.5-venv python3-pip
You might want to install the third-party Python libraries and run the script in a virtual environment. Create and activate the environment first:
$ cd your-work-folder
$ pyvenv-3.5 .venv
$ source .venv/bin/activate
Then install the dependencies:
(.venv) $ pip3 install -r requirements.txt
Finally, run the script, passing the input and output files:
(.venv) $ python3 input-file.xhtml output-file.xhtml
For usage help, run:
(.venv) $ python3 -h
The current version of the script outputs utf-8 files. If a different encoding is required (e.g. utf-16), change this in the last part of the script, where beautifulsoup encodes the soup prior to writing the output file.
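As a rough sketch of what that change could look like (the script's own code is not reproduced here, so the helper `write_output`, its `encoding` parameter, and the sample markup are assumptions made for illustration), BeautifulSoup's `encode()` returns bytes in the requested encoding, which can then be written in binary mode:

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def write_output(soup, path, encoding="utf-8"):
    """Encode the soup and write the resulting bytes to path.

    Hypothetical helper; pass encoding="utf-16" for UTF-16 output.
    """
    Path(path).write_bytes(soup.encode(encoding))


# Assumed sample document with a non-ASCII character in the caption.
soup = BeautifulSoup('<p class="caption">Café</p>', "html.parser")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output-file.xhtml"
    write_output(soup, target, encoding="utf-16")
    # Read the file back to confirm the encoding round-trips.
    round_trip = target.read_bytes().decode("utf-16")
```

Writing bytes rather than text keeps the encoding decision in one place, so switching between utf-8 and utf-16 is a single argument change.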