rwth-pads/docker-spark

docker-spark

This repository provides a Docker image definition for a Spark installation, spark-base-img (a prebuilt image is available on Docker Hub as leahtgu/spark-base-img), and a Docker Compose file for a pseudo-distributed Spark cluster consisting of two containers (spark-master and spark-worker). The mounted-data/ folder contains some examples and is automatically mounted into the primary container. The image also ships with a Python installation that provides JupyterLab along with the pyspark and mrjob packages.

This setup is intended for educational purposes, particularly for:

  • running Python-defined MapReduce jobs via mrjob on Spark, and
  • experimenting with Spark Structured Streaming and the streaming k-means algorithm via pyspark.
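
To make the first use case concrete, here is a minimal word-count sketch showing the mapper/reducer shape that a Python-defined MapReduce job takes. It is not taken from the mounted-data/ examples. The names `mapper`, `reducer`, `run_local`, and `WORD_RE` are illustrative, and `run_local` is a hypothetical in-process stand-in for the shuffle phase so the logic can be tried without a cluster.

```python
import re
from collections import defaultdict

# Illustrative tokenizer: runs of word characters and apostrophes.
WORD_RE = re.compile(r"[\w']+")


def mapper(_, line):
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in WORD_RE.findall(line.lower()):
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts that were emitted for one word."""
    yield word, sum(counts)


def run_local(lines):
    """Hypothetical local driver: map each line, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            groups[key].append(value)

    result = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            result[out_key] = out_value
    return result


if __name__ == "__main__":
    print(run_local(["Spark runs jobs", "mrjob runs on Spark"]))
```

With mrjob, the same two generators would typically become the `mapper(self, _, line)` and `reducer(self, word, counts)` methods of a class derived from `mrjob.job.MRJob`, and the script would be launched with mrjob's Spark runner (`-r spark`) instead of the local driver. How that fits the cluster defined in this repository is not covered by this sketch.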
