Get into Scala/Spark/Zeppelin

I wanted to get to know some technologies beyond the Python stack that are typically used for data exploration, machine learning, and similar tasks.

In this Zeppelin notebook I do some light exploratory data analysis (EDA) and then train typical binary classifiers on a public version of the well-known Titanic dataset.

The notebook is self-contained: it downloads the data itself.

Usage

Requirements:

Podman or Docker

Getting Started

I work on a Fedora 31 workstation, so I used Podman rather than Docker: it is preinstalled and does not need sudo access.

The Zeppelin notebook server can be started with:

podman run -p 8080:8080 --rm --name zeppelin apache/zeppelin:0.9.0

or, to load the notebook directly into the container:

podman run -p 8080:8080 --rm -v $PWD/notebook:/notebook -e ZEPPELIN_NOTEBOOK_DIR=/notebook --name zeppelin apache/zeppelin:0.9.0

However, mounting a volume into the container sometimes causes permission errors. In that case, use the first command and import the notebook file through the GUI.
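If the permission errors come from SELinux labels, which is a plausible cause on Fedora but only an assumption here, adding the `:Z` option to the volume mount may help. The sketch below uses a hypothetical helper, `zeppelin_run_cmd`, that only prints the command so it can be checked before running. I have not verified that this resolves the error in every setup.

```shell
#!/usr/bin/env bash
# Sketch only: build the Zeppelin run command with an SELinux relabel on the mount.
# Assumption: the permission errors are caused by SELinux labels on the host directory.

# zeppelin_run_cmd RUNTIME NOTEBOOK_DIR  (hypothetical helper, not part of this repo)
# Prints the command instead of executing it, so it can be inspected first.
zeppelin_run_cmd() {
  local runtime="${1:-podman}"              # podman or docker
  local notebook_dir="${2:-$PWD/notebook}"  # host directory holding the notebook
  # ":Z" asks the runtime to relabel the directory for private use by this container.
  printf '%s run -p 8080:8080 --rm -v %s:/notebook:Z -e ZEPPELIN_NOTEBOOK_DIR=/notebook --name zeppelin apache/zeppelin:0.9.0\n' \
    "$runtime" "$notebook_dir"
}

# Print the command for the notebook folder in the current directory.
zeppelin_run_cmd podman "$PWD/notebook"
```

Note that `:Z` changes the SELinux label of the host directory, so it should only be used on a dedicated folder such as `notebook`, never on a home or system directory. If the cause is instead a user ID mismatch in rootless Podman, the `--userns=keep-id` option may be worth trying.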
