richardsimoes/spark-workshop-scala

Simple Spark Maven Project

A simple Spark project that uses Maven as its build tool. The app counts the number of lines in a text file.

To build a jar:

mvn package

To build a jar against CDH, activate the cdh Maven profile:

mvn package -Pcdh

To run the app on a machine with Spark installed, replacing <input file> with the path to a text file:

spark-submit --class com.cloudera.datascience.MyApp --master local \
  target/simplesparkproject-0.0.1-SNAPSHOT.jar <input file>

To run a REPL that can reference the objects and classes defined in this project:

spark-shell --jars target/simplesparkproject-0.0.1-SNAPSHOT.jar --master local

The --master local argument runs the application in a single local process with one worker thread. If the cluster runs the Spark standalone cluster manager, replace it with --master spark://<master host>:<master port>. If the cluster runs YARN, replace it with --master yarn.
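
The three cases above differ only in the value passed to --master, so the choice can be isolated in a small helper. This is a minimal sketch and not part of the project: the function name master_url, the mode names, the host master.example.com, and input.txt are made up for illustration, and 7077 is assumed as the port because it is the standalone master's default. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: map a deployment mode to a --master value.
master_url() {
  mode="$1"
  host="${2:-}"
  port="${3:-7077}"   # assumed default port of a standalone master
  case "$mode" in
    local)      echo "local" ;;
    standalone) echo "spark://${host}:${port}" ;;
    yarn)       echo "yarn" ;;
    *)          echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Print (do not run) the submit command for a standalone cluster.
# master.example.com and input.txt are placeholders.
echo spark-submit --class com.cloudera.datascience.MyApp \
  --master "$(master_url standalone master.example.com)" \
  target/simplesparkproject-0.0.1-SNAPSHOT.jar input.txt
```

Dropping the leading echo would run the command for real, which requires spark-submit on the PATH and a reachable master.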

To pass configuration options on the command line, use the --conf option, e.g. --conf spark.serializer=org.apache.spark.serializer.KryoSerializer.
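
The --conf flag can be repeated, one key=value pair per flag, and like every spark-submit option it has to appear before the jar, because anything after the jar is passed to the application as an argument. The sketch below shows one way to lay this out. The function name submit_cmd, the input.txt path, and the second setting (spark.ui.port=4050) are illustrative choices, not taken from the project, and the script only prints the command.

```shell
#!/bin/sh
# Sketch only: print the command instead of running it,
# so no Spark installation is needed to try this.
JAR=target/simplesparkproject-0.0.1-SNAPSHOT.jar
INPUT=input.txt   # hypothetical input path

# Each setting gets its own --conf flag; all options go before the jar.
# The first argument is forwarded to the app as its input file.
submit_cmd() {
  echo spark-submit \
    --class com.cloudera.datascience.MyApp \
    --master local \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --conf spark.ui.port=4050 \
    "$JAR" "$1"
}

submit_cmd "$INPUT"
```

Values set this way are defaults for the run: a property that the application sets explicitly on its own SparkConf takes precedence over the same property given with --conf.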

About

Some examples of the Spark API
