Geoparsing Tutorial Notebook

Jupyter notebook for geoparsing historical encyclopedia texts in French using the PERDIDO Geoparser.

This notebook is provided by L. Moncla (INSA Lyon) and K. McDonough (The Alan Turing Institute) as part of the GEODE project.

Overview

In this tutorial, we demonstrate how to use a custom version of the Perdido geoparser Python library developed in the GEODE project. We use texts from Diderot and d’Alembert’s Encyclopédie as a case study for querying a corpus and wrangling geoparsed data. We also compare Perdido’s NER annotations (i.e. its output) with the results of two other well-known Python NER libraries, spaCy and Stanza.
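One simple way to compare the NER output of two tools is exact span matching: treat one tool's entities as the reference and count how many of the other tool's (start, end, label) spans match it exactly. The sketch below illustrates that idea only and is not necessarily how the notebook performs its comparison. The function name compare_spans and the example spans are hypothetical; in practice the spans would first be extracted from each tool's output (for spaCy, for instance, from doc.ents via ent.start_char, ent.end_char and ent.label_).

```python
from typing import Iterable, Tuple

# A span is (start_char, end_char, label); offsets index into the same text
Span = Tuple[int, int, str]


def compare_spans(reference: Iterable[Span], predicted: Iterable[Span],
                  match_labels: bool = True) -> dict:
    """Exact-match precision, recall and F1 of `predicted` against `reference`."""
    if match_labels:
        ref = set(reference)
        pred = set(predicted)
    else:
        # Compare character offsets only, since each tool uses its own label set
        ref = {(start, end) for start, end, _ in reference}
        pred = {(start, end) for start, end, _ in predicted}
    true_pos = len(ref & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": true_pos, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical spans for the sentence "LYON, ville de France, sur le Rhône."
perdido_spans = [(0, 4, "place"), (15, 21, "place"), (30, 35, "place")]
spacy_spans = [(0, 4, "LOC"), (15, 21, "LOC")]
print(compare_spans(perdido_spans, spacy_spans, match_labels=False))
```

With match_labels=True the two example outputs share no spans at all, because the tools name their entity types differently; mapping labels to a common scheme, or ignoring them as above, is a choice the comparison has to make explicitly.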

Specifically, you will learn how to:

load data from TEI-XML files into a Python dataframe
use a Python dataframe for simple data analysis
try out the PERDIDO API for preprocessing French texts (part-of-speech tagging)
try out the PERDIDO API for geoparsing (geotagging and geocoding) Encyclopédie articles
display custom geotagging results (PERDIDO TEI-XML) with the displaCy Named Entity Visualizer
display geocoding results on a map
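The first step in this list, loading TEI-XML into a dataframe, can be sketched with the standard library's ElementTree. This is a minimal sketch under assumptions: the TEI snippet, the div type="article" convention, the function name tei_to_rows and the chosen fields are hypothetical and are not taken from the GEODE data files, whose actual structure may differ. Only the TEI namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; the element layout below is a hypothetical example
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="article" n="1">
      <head>LYON</head>
      <p>Ville de France, sur le Rhône et la Saône.</p>
    </div>
    <div type="article" n="2">
      <head>ROUEN</head>
      <p>Ville de France, capitale de la Normandie.</p>
    </div>
  </body></text>
</TEI>"""


def tei_to_rows(xml_text):
    """Return one dict per article div: id, headword and plain text."""
    root = ET.fromstring(xml_text)
    rows = []
    for div in root.iterfind(".//tei:div[@type='article']", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        # Join the text of every paragraph, including text inside child tags
        paragraphs = ["".join(p.itertext()).strip()
                      for p in div.iterfind("tei:p", TEI_NS)]
        rows.append({
            "id": div.get("n"),
            "head": head.text.strip() if head is not None and head.text else None,
            "text": " ".join(paragraphs),
        })
    return rows


rows = tei_to_rows(SAMPLE_TEI)
print(rows[0]["head"], "|", rows[0]["text"])
```

For files on disk, ET.parse(path).getroot() gives the same root element. With pandas installed (the sketch itself does not need it), pandas.DataFrame(rows) turns the list of dicts into a dataframe with id, head and text columns, ready for simple analysis.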

Open the notebook in the cloud

You can open this notebook in an executable, remote environment with … or …

Set up a Python environment

Clone this GitHub repository

git clone https://github.com/GEODE-project/perdido-geoparsing-notebook.git

Configure the environment with all dependencies

Create a new environment called tutorial-geoparsing-py39 with Python 3.9

conda create -n tutorial-geoparsing-py39 python=3.9

Activate the environment

conda activate tutorial-geoparsing-py39

Install the fiona package with conda (this avoids an installation issue with pip)

conda install fiona==1.8.21

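Install the remaining dependencies with pip, from the root of the cloned repository (cd perdido-geoparsing-notebook), where requirements.txt is located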

pip install -r requirements.txt
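Before launching Jupyter, it can help to check that the environment matches the setup above. The following Python sketch is a hypothetical helper, not part of the repository: the function names requirement_names and installed_versions are illustrative, and the parser only understands simple requirement lines (a package name optionally followed by a version specifier, extras or a marker), skipping comments, pip options such as -r, and URL-based entries. Installed versions are looked up with importlib.metadata from the standard library.

```python
import os
import re
import sys
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from simple requirements.txt lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blank lines, pip options (-r, -e, --index-url) and URL entries
        if not line or line.startswith("-") or "://" in line:
            continue
        # The name ends at the first specifier, extra, marker or space
        name = re.split(r"[\s<>=!~;\[@]", line, maxsplit=1)[0]
        if name:
            names.append(name)
    return names


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The conda environment created above is expected to run Python 3.9
print("Python", sys.version.split()[0], "(expected 3.9.x)")

# Run from the root of the cloned repository, where requirements.txt lives
if os.path.exists("requirements.txt"):
    with open("requirements.txt", encoding="utf-8") as handle:
        for package, version in installed_versions(requirement_names(handle)).items():
            print(f"{package:25} {version or 'NOT INSTALLED'}")
```

Keeping the parsing separate from the version lookup makes it easy to check a single package too, for example installed_versions(["fiona"]) to confirm that the conda-installed fiona is visible from the environment.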

Launch the Jupyter server, then open Tutorial-geoparsing.ipynb

jupyter notebook

Acknowledgement

Data courtesy of the ARTFL Encyclopédie Project, University of Chicago.

The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).
