An ML API to compute similarity scores between sentences based on k-shingled substrings.
The API is programmed with the fastapi Python package and uses the packages datasketch and kshingle to compute similarity scores.
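The idea behind such a score can be illustrated in plain Python. This is a minimal, stdlib-only sketch. It is not this repository's code and does not use the kshingle or datasketch APIs. It assumes three things: character-level shingles with a fixed k, the Jaccard index of the two shingle sets as the similarity, and a MinHash estimate of that index as the kind of approximation datasketch provides. The helper names and the defaults k=3 and num_perm=128 are hypothetical, and seeded BLAKE2b hashes stand in for the random permutations of a textbook MinHash.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of character k-shingles (all substrings of length k)."""
    if len(text) < k:
        return {text}  # a text shorter than k is its own single shingle
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard index |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash_signature(items: Set[str], num_perm: int = 128) -> List[int]:
    """MinHash signature: the minimum hash value per seeded hash function.

    Seeded BLAKE2b hashes stand in for random permutations; items must be non-empty.
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def minhash_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Share of matching signature positions, an estimate of the Jaccard index."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = shingles("Die Kuh macht muh.")
b = shingles("Die Muh macht kuh.")
print(round(jaccard(a, b), 3))  # 0.455 (10 shared out of 22 distinct 3-shingles)
print(minhash_similarity(minhash_signature(a), minhash_signature(b)))  # an estimate near 0.455
```

The exact Jaccard index requires the full shingle sets. A MinHash signature has a fixed length (num_perm integers), so sentences can be compared cheaply once their signatures are computed. The trade-off is an estimation error that shrinks as num_perm grows.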
The deployment is configured for Docker Compose.
Call Docker Compose
export API_PORT=8082
docker-compose -f docker-compose.yml up --build
# or as oneliner:
API_PORT=8082 docker-compose -f docker-compose.yml up --build
(Start the Docker daemon first, e.g., open /Applications/Docker.app on macOS.)
Check
curl http://localhost:8082
Note: Only main.py is used in the Dockerfile.
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
(If your Git repository is stored in a folder whose path contains whitespace, don't use the subfolder .venv. Create the virtual environment at an absolute path without whitespace instead.)
source .venv/bin/activate
# uvicorn app.main:app --reload
gunicorn app.main:app --reload --bind=0.0.0.0:8082 \
--worker-class=uvicorn.workers.UvicornH11Worker \
--workers=1 --timeout=600
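The same gunicorn options can also be kept in a Python configuration file, which gunicorn loads with its -c option. The following is a minimal sketch that assumes a file named gunicorn.conf.py in the repository root; the repository does not necessarily ship one, so the file name is hypothetical. Each variable mirrors one flag of the command above.

```python
# gunicorn.conf.py (hypothetical file; start the server with
#   gunicorn -c gunicorn.conf.py app.main:app)
# Each setting corresponds to one command-line flag of the call above.

bind = "0.0.0.0:8082"                              # --bind
worker_class = "uvicorn.workers.UvicornH11Worker"  # --worker-class
workers = 1                                        # --workers
timeout = 600                                      # --timeout (seconds)
reload = True                                      # --reload (meant for development only)
```

Keeping the settings in a file lets you change the port or timeout without editing the start command. Command-line flags still override values from the file.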
curl -X POST "http://localhost:8082/similarities/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '["Die Kuh macht muh.", "Die Muh macht kuh."]'
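The curl request above can also be sent from Python using only the standard library. This sketch assumes the server runs on localhost:8082 as in the examples. It does not assume anything about the shape of the JSON response and simply returns whatever the endpoint sends back. The function names build_request and post_similarities are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8082/similarities/"  # port as in the examples above


def build_request(sentences, url=API_URL):
    """Build the same POST request as the curl call: a JSON list of sentences."""
    body = json.dumps(sentences).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def post_similarities(sentences, url=API_URL, timeout=30.0):
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(build_request(sentences, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, call post_similarities(["Die Kuh macht muh.", "Die Muh macht kuh."]) and print the result.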
- Check syntax:
flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
- Run Unit Tests:
PYTHONPATH=. pytest
- Show the docs: http://localhost:8082/docs
- Show Redoc: http://localhost:8082/redoc
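The --exclude argument of the flake8 call is built by a shell pipeline. grep drops comment lines from .gitignore, xargs joins the remaining whitespace-separated entries with single spaces, and sed turns those spaces into commas. The following Python sketch reproduces that transformation so you can inspect the resulting string. The function name flake8_excludes is hypothetical, and the sketch ignores xargs's special handling of quotes and backslashes.

```python
from pathlib import Path


def flake8_excludes(gitignore_text: str) -> str:
    """Mimic: grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g'"""
    tokens = []
    for line in gitignore_text.splitlines():
        if line.startswith("#"):
            continue  # grep -v '^#' drops comment lines
        tokens.extend(line.split())  # xargs splits on whitespace and skips blank lines
    return ",".join(tokens)  # sed replaces the separating spaces with commas
```

For example, print(flake8_excludes(Path(".gitignore").read_text())) shows the exclude list that flake8 receives.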
find . -type f -name "*.pyc" -delete
find . -type d -name "__pycache__" -prune -exec rm -r {} +
rm -r .pytest_cache
rm -r .venv
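The clean-up commands above can also be run from Python, which avoids shell quoting problems with paths that contain whitespace. This is a minimal sketch using pathlib and shutil. It deletes __pycache__ directories, stray *.pyc files, and .pytest_cache below a root folder. Unlike the shell commands, it removes .venv only behind an opt-in flag. The function name clean and the remove_venv parameter are hypothetical.

```python
import shutil
from pathlib import Path


def clean(root: Path = Path("."), remove_venv: bool = False) -> None:
    """Delete Python caches below root; delete .venv only if remove_venv is True."""
    # Remove __pycache__ directories first; this also removes the .pyc files inside them.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
    # Remove any remaining .pyc files outside __pycache__ directories.
    for pyc in list(root.rglob("*.pyc")):
        if pyc.is_file():
            pyc.unlink()
    pytest_cache = root / ".pytest_cache"
    if pytest_cache.is_dir():
        shutil.rmtree(pytest_cache)
    venv = root / ".venv"
    if remove_venv and venv.is_dir():
        shutil.rmtree(venv)
```

Call clean() from the repository root to clear the caches, or clean(remove_venv=True) to also delete the virtual environment as the last shell command does.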
@software{ulf_hamster_2022_7096465,
author = {Ulf Hamster and
Luise Köhler},
title = {simiscore-kshingle},
month = sep,
year = 2022,
publisher = {Zenodo},
version = {0.1.0},
doi = {10.5281/zenodo.7096465},
url = {https://doi.org/10.5281/zenodo.7096465}
}
- Sebastián Ramírez. (2018). FastAPI. GitHub. https://github.com/tiangolo/fastapi
- Eric Zhu, Vadim Markovtsev, aastafiev, Wojciech Łukasiewicz, ae-foster, Sinusoidal36, Ekevoo, Kevin Mann, Keyur Joshi, Peter Kubov, Qin TianHuan, Spandan Thakur, Stefano Ortolani, Titusz, Vojtech Letal, Zac Bentley, fpug, & oisincar. (2021). ekzhu/datasketch: v1.5.4 (v1.5.4). Zenodo. https://doi.org/10.5281/zenodo.5758425
- Ulf Hamster. (2022). kshingle: Shingling text data (0.10.0). Zenodo. https://doi.org/10.5281/zenodo.7096407
Please open an issue for support.
Please contribute using GitHub Flow: create a branch, add commits, and open a pull request.
The "Evidence" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 433249742 (GU 798/27-1; GE 1119/11-1).
- Until 31 Aug 2023 (v0.1.0), the code repository was maintained within the DFG project 433249742.
- Since 01 Sep 2023 (v0.2.0), the code repository has been maintained by Ulf Hamster.