This repository provides a Docker image definition for a Spark installation, spark-base-img
(prebuilt image available on Docker Hub: leahtgu/spark-base-img), and a Docker Compose file for a pseudo-distributed Spark cluster consisting of two containers (spark-master, spark-worker).
There are also some examples in the mounted-data/ folder, which is automatically mounted into the primary container.
The image comes with a Python installation that provides a convenient JupyterLab environment with the pyspark and mrjob packages.
This setup is intended for educational purposes, particularly