This deep learning project predicts where an image was taken. It uses a Convolutional Neural Network to analyze visual features and distinguish between locations based on unique architectural, environmental, and infrastructural elements.
The project currently focuses on five major capitals (you can set custom cities and locations):
- Budapest
- Ottawa
- Tokyo
- Cairo
- Canberra
The model performs two key tasks:
- City Classification: Assigns an image to one of the pre-defined city categories using a softmax-activated output layer.
- Coordinate Regression: Estimates latitude and longitude values via a linear-activated output layer.
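As a sketch, the two output heads can be expressed with the Keras functional API. The small convolutional backbone below is a lightweight stand-in for the project's EfficientNetV2S base, and the head names and losses are illustrative assumptions, not the project's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CITIES = 5  # Budapest, Ottawa, Tokyo, Cairo, Canberra

# Stand-in backbone: the real project uses EfficientNetV2S; a tiny
# convolutional stack keeps this sketch lightweight and runnable.
inputs = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(16, 3, activation="relu")(inputs)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(64, activation="relu")(x)

# Head 1: softmax over the city categories (classification).
city_out = layers.Dense(NUM_CITIES, activation="softmax", name="city")(x)
# Head 2: linear output for (latitude, longitude) regression.
coord_out = layers.Dense(2, activation="linear", name="coords")(x)

model = Model(inputs, [city_out, coord_out])
model.compile(
    optimizer="adam",
    loss={"city": "categorical_crossentropy", "coords": "mse"},
)
```

A single forward pass then yields both a probability distribution over the five cities and a coordinate pair.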
This project is divided into three main components:
- Backend: A FastAPI server that connects the model with the web interface.
- Model: The EfficientNetV2S-based neural network handling both classification and regression.
- Frontend: A web application built with Next.js, TypeScript and TailwindCSS to easily interact with the model.
Data Handling:
- Uses the Mapillary API to collect street-level images and their metadata.
- Speeds up data gathering through concurrent API requests.
- Preprocesses images (resizing to 224x224 pixels) for efficient training.
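The concurrent collection step can be sketched with Python's standard `concurrent.futures`; `fetch_image_metadata` below is a hypothetical stand-in for the real Mapillary API call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a Mapillary API request; the real collector
# would issue one HTTP request per image ID instead.
def fetch_image_metadata(image_id: str) -> dict:
    return {"id": image_id, "resolution": (224, 224)}

def collect_concurrently(image_ids, max_workers=8):
    # Network-bound API calls benefit from threads despite the GIL:
    # many requests are in flight at once instead of one at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_image_metadata, image_ids))

results = collect_concurrently([f"img_{i}" for i in range(20)])
```

`pool.map` preserves the input order, so results line up with the requested IDs.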
Deep Learning Model:
- Uses the pre-trained EfficientNetV2S network with custom upper layers.
- Employs fine-tuning where the EfficientNetV2S base is frozen, and only the custom layers are trained with the Adam optimizer.
- Uses Grad-CAM to produce heatmaps on request that reveal the image regions influencing the model's decisions.
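A minimal Grad-CAM implementation in the canonical TensorFlow style might look like the following; the tiny model and the layer name `last_conv` are stand-ins for EfficientNetV2S and its final convolutional layer:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Tiny stand-in classifier; in the project the backbone is EfficientNetV2S
# and the layer name below would be its last convolutional layer.
inputs = layers.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, activation="relu", name="last_conv")(inputs)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(5, activation="softmax")(x)
model = Model(inputs, outputs)

def grad_cam(model, image, conv_layer_name="last_conv"):
    # Model that maps the input to (conv feature maps, predictions).
    grad_model = Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(image[None, ...])
        top = int(tf.argmax(preds[0]))      # highest-probability class
        score = preds[:, top]
    # Gradient of the top-class score w.r.t. each feature map,
    # averaged spatially to get per-channel importance weights.
    grads = tape.gradient(score, conv_maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))
    # Weighted sum of the feature maps, then ReLU and normalization.
    cam = tf.nn.relu(tf.einsum("bhwc,bc->bhw", conv_maps, weights))[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

heatmap = grad_cam(model, np.random.rand(32, 32, 3).astype("float32"))
```

The resulting heatmap has the spatial resolution of the chosen convolutional layer and is typically upsampled and overlaid on the input image.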
Training and Evaluation:
- Achieves approximately 83% accuracy in city classification.
- Saves `best_location_model.keras` as the model with the best validation coordinate accuracy during training.
- Saves `best_overall_model.keras` as the best overall model based on validation loss during training.
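The two checkpoints can be produced with Keras `ModelCheckpoint` callbacks along these lines; the monitored metric names are assumptions and must match what `model.compile()` actually reports:

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Metric names are assumptions: "val_coords_mae" stands in for whatever
# validation metric the coordinate head reports, "val_loss" is the
# combined validation loss across both heads.
location_ckpt = ModelCheckpoint(
    "best_location_model.keras",
    monitor="val_coords_mae",
    mode="min",
    save_best_only=True,
)
overall_ckpt = ModelCheckpoint(
    "best_overall_model.keras",
    monitor="val_loss",
    mode="min",
    save_best_only=True,
)
# These would be passed to model.fit(..., callbacks=[location_ckpt, overall_ckpt])
```

With `save_best_only=True`, each file is overwritten only when its monitored metric improves, so the saved models always reflect the best epoch seen so far.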
Frontend UI:
- Developed with Next.js, TypeScript and TailwindCSS.
- Provides an easily usable interface for users to interact with the model.
- Accessible online at [Location Guesser](https://locationguesser.vercel.app).
- Python 3.10 or higher: Install from python.org.
- pip: Python package manager (comes with Python installations).
- CUDA Toolkit 12.8 (optional, for GPU acceleration)
- cuDNN 9.7.1 (optional, for GPU acceleration)
Install the required Python packages from `requirements.txt`, found in the root folder:

```shell
pip install -r requirements.txt
```
- Clone the repository:

```shell
git clone https://github.com/markbakos/geo-guesser.git
cd geo-guesser
```
- Set up environment variables
- In the root folder (geo-guesser), add to your `.env` file:

```
MAPILLARY_KEY=[Your API key]
```
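At runtime the key has to reach the process environment; a minimal stdlib sketch (a stand-in for `python-dotenv`'s `load_dotenv`) could look like:

```python
import os

def load_env(path=".env"):
    # Minimal .env parser: reads KEY=VALUE lines and exports them into
    # the process environment without overriding existing variables.
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

load_env()
api_key = os.environ.get("MAPILLARY_KEY")  # None if no .env file is present
```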
- Prepare the dataset
- Set your desired locations to gather data from, or keep the original 5.
- Collect images by running `mapillary_collection.py`
- Using the trained model
- From the console:

```shell
python -m predict path/to/saved/image --generate_heatmap
```
- With the UI:
- Use the deployed website: https://locationguesser.vercel.app
- Start the FastAPI server:
```shell
uvicorn server:app
```
Feel free to fork this repository, make changes, and submit a pull request.
For any inquiries, feel free to reach out:
Email: markbakosss@gmail.com
GitHub: markbakos