spark
Apache Spark is available as a module on the Discovery cluster. You can currently run Spark on Discovery in either local mode or standalone mode. In local mode, Spark jobs run on a single machine and are executed in parallel using multi-threading, which restricts parallelism to at most the number of cores on that machine.
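Since local-mode parallelism is capped by the cores you actually have, it can help to derive `N` from your reservation rather than hard-coding it. The following is a minimal sketch, not an official Discovery recipe. It assumes the scheduler is Slurm and that it exports `SLURM_CPUS_PER_TASK` (which Slurm sets only when `--cpus-per-task` is requested); `local_master` is a hypothetical helper name.

```python
import os


def local_master(env=None):
    """Return a Spark master string such as 'local[10]'."""
    env = os.environ if env is None else env
    # Assumption: Slurm exports SLURM_CPUS_PER_TASK for this job.
    cores = env.get("SLURM_CPUS_PER_TASK")
    if cores is None:
        # Fallback: all cores visible on the machine, which may be
        # more than your job was actually allocated.
        cores = os.cpu_count() or 1
    return f"local[{int(cores)}]"


print(local_master({"SLURM_CPUS_PER_TASK": "10"}))  # local[10]
```

The returned string can be passed as the value of `--master` when starting `pyspark` or `spark-submit`.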
By launching a standalone cluster, you can run your jobs over multiple machines simultaneously. However, configuring a standalone cluster requires a bit more effort than running Spark in local mode: local mode runs practically "out-of-the-box" on Discovery, while launching a standalone cluster requires installing a few extra scripts. Both methods are covered below; if you only wish to run Spark in local mode, you can skip the standalone cluster instructions.
| | Local Mode | Cluster Mode |
|---|---|---|
| Parameters | N : number of CPU cores, e.g., 10 | MASTER_IP : IP address of cluster master, e.g., 10.100.8.52 |
| Interpreter | pyspark --master local[N] | pyspark --master spark://MASTER_IP:7077 |
| Run myscript.py | spark-submit --master local[N] myscript.py | spark-submit --master spark://MASTER_IP:7077 myscript.py |
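The two columns of the table differ only in the value passed to `--master`. As an illustration, the sketch below assembles the `spark-submit` command for either mode using only the Python standard library. The helper name `spark_submit_command` is hypothetical, and the default port 7077 is taken from the table above.

```python
import shlex


def spark_submit_command(script, cores=None, master_ip=None, port=7077):
    """Build a spark-submit argument list for local or standalone mode."""
    if master_ip is not None:
        # Cluster mode: point at the standalone master.
        master = f"spark://{master_ip}:{port}"
    elif cores is not None:
        # Local mode: N worker threads on this machine.
        master = f"local[{cores}]"
    else:
        raise ValueError("pass either cores (local mode) or master_ip (cluster mode)")
    return ["spark-submit", "--master", master, script]


# Print shell-ready command lines for both modes.
print(shlex.join(spark_submit_command("myscript.py", cores=10)))
print(shlex.join(spark_submit_command("myscript.py", master_ip="10.100.8.52")))
```

Note that `shlex.join` puts quotes around `local[10]`, so the shell does not try to interpret the square brackets as a glob pattern. The argument list itself can also be handed directly to `subprocess.run` without any quoting.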
Both local and standalone modes in Spark offer a graphical user interface (GUI) from which you can monitor execution. To access this GUI, you first need to find the IP address of the machine on which the driver of your Spark application is running. Find this machine (e.g., c3096) via squeue, then type:
traceroute c3096
The following will be printed:
traceroute to c3096 (10.99.252.65), 30 hops max, 60 byte packets
This means that the IP address of machine c3096 is 10.99.252.65.
Pasting the URL (here, http://10.99.252.65:4040) into a web browser will show you the webpage containing the GUI. You need to be within Northeastern's network to access the GUI. If you are off-campus, you first need to set up a VPN.
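If you look up the GUI address often, the step from the traceroute output to the URL can be scripted. The sketch below parses the header line shown above and appends port 4040, as in the example URL. The function name `gui_url_from_traceroute` is hypothetical, and the regular expression assumes the header has the `traceroute to HOST (IP), ...` shape printed above.

```python
import re


def gui_url_from_traceroute(header_line, port=4040):
    """Extract the IPv4 address from a traceroute header and build the GUI URL."""
    match = re.search(
        r"traceroute to \S+ \((\d{1,3}(?:\.\d{1,3}){3})\)", header_line
    )
    if match is None:
        raise ValueError(f"no IPv4 address found in: {header_line!r}")
    return f"http://{match.group(1)}:{port}"


line = "traceroute to c3096 (10.99.252.65), 30 hops max, 60 byte packets"
print(gui_url_from_traceroute(line))  # http://10.99.252.65:4040
```

On a machine where the node name resolves, `socket.gethostbyname("c3096")` from the standard library should give the same address without running traceroute at all.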
- Learning Spark: online book available at the NEU library.
- Spark documentation and programming guide.
- Application submission guide: guide for submitting applications to a cluster.
- Cluster mode overview: main components of a cluster.
- Discovery Cluster: overview of the Discovery cluster.
- Interactive mode: instructions on how to reserve a node on Discovery in interactive mode.