Following are the consolidated steps that helped me in successfully installing spark with jupyter:
- Create virtual environment named
jupyter
using conda (I always maintain separate virtual env's for every different setup):
~/miniconda3/bin/conda create --name jupyter python=3.4
- Activate that virtual environment and start installing our stuff there
source activate jupyter
- Installation part - install
scala-develop
kernel:
conda install -c conda-forge nb_conda
git clone https://github.com/alexarchambault/jupyter-scala.git
cd jupyter-scala
sbt publishLocal
./jupyter-scala --id scala-develop --name "Scala (develop)" --force
- Installation is over. You should see an output that looks something like this after running the last command where it mentions setting up of
scala-develop kernel
:
output: Run jupyter console with this kernel with
jupyter console --kernel scala-develop
Use this kernel from Jupyter notebook, running
jupyter notebook
and selecting the "Scala (develop)" kernel.
- Let's verify installation. For this, list kernels and it should show
scala-develop
kernel:
jupyter kernelspec list
scala-develop /Users/surthi/Library/Jupyter/kernels/scala-develop
python3 /Users/surthi/miniconda3/envs/jupyter/share/jupyter/kernels/python3
python2 /usr/local/share/jupyter/kernels/python2
- That's it!! Open Jupyter and create spark session. I've used spark 2.1.0 below.
jupiter notebook
https://github.com/jupyter-scala/jupyter-scala#spark
import $exclude.`org.slf4j:slf4j-log4j12`, $ivy.`org.slf4j:slf4j-nop:1.7.21` // for cleaner logs
import $profile.`hadoop-2.6`
import $ivy.`org.apache.spark::spark-sql:2.1.0` // adjust spark version - spark >= 2.0
import $ivy.`org.apache.hadoop:hadoop-aws:2.6.4`
import $ivy.`org.jupyter-scala::spark:0.4.2` // for JupyterSparkSession (SparkSession aware of the jupyter-scala kernel)
import org.apache.spark._
import org.apache.spark.sql._
import jupyter.spark.session._
val sparkSession = JupyterSparkSession.builder() // important - call this rather than SparkSession.builder()
.jupyter() // this method must be called straightaway after builder()
// .yarn("/etc/hadoop/conf") // optional, for Spark on YARN - argument is the Hadoop conf directory
// .emr("2.6.4") // on AWS ElasticMapReduce, this adds aws-related dependencies to spark jars list
// .master("local") // change to "yarn-client" on YARN
// .config("spark.executor.instances", "10")
// .config("spark.executor.memory", "3g")
// .config("spark.hadoop.fs.s3a.access.key", awsCredentials._1)
// .config("spark.hadoop.fs.s3a.secret.key", awsCredentials._2)
.appName("notebook")
.getOrCreate()