Getting OCR-ed books from https://gallica.bnf.fr
The whole digital collection for various centuries is at https://gallica.bnf.fr/html/und/litteratures/les-classiques-de-la-litterature-acces-par-periode?mode=desktop. Our focus is on the following collections:
- https://gallica.bnf.fr/html/und/litteratures/les-classiques-de-la-litterature-du-xviiie-siecle?mode=desktop
- https://gallica.bnf.fr/html/und/litteratures/les-classiques-de-la-litterature-du-xixe-siecle?mode=desktop
- https://gallica.bnf.fr/html/und/litteratures/les-classiques-de-la-litterature-du-xxe-siecle?mode=desktop
Our task is to download all OCR-ed books from that collection for three centuries. The problem is that the APIs (https://api.bnf.fr/api-gallica-de-recherche#/recherche and https://api.bnf.fr/fr/api-document-de-gallica#/texte%20brut/get__ark__texteBrut) do not provide search over collections.