ansible-spark

Ansible playbook to spin up a basic Spark cluster using Apache Bigtop on OpenStack

NOTE: Not yet fully implemented or tested

Dependencies

Tested with OpenStack Ubuntu 17.04 image, Ansible 2.3.1.0, Python 2.7
TODO: Client configuration

Playbook

TODO: Write instructions for configuring and running the project

Example MapReduce Job

curl http://www.gutenberg.org/ebooks/20417 --output outline_of_science_20417.txt
curl http://www.gutenberg.org/ebooks/5000 --output davinci_5000.txt
curl http://www.gutenberg.org/ebooks/4300 --output ulysses_4300.txt

sudo su hdfs -c "hdfs dfs -mkdir /data"
sudo su hdfs -c "hdfs dfs -mkdir /output"
sudo su hdfs -c "hdfs dfs -mkdir /tmp"
sudo su hdfs -c "hdfs dfs -chmod go+rwx /data"
sudo su hdfs -c "hdfs dfs -chmod go+rwx /output"
sudo su hdfs -c "hdfs dfs -chmod go+rwx /tmp"

sudo su hdfs -c "hdfs dfs -copyFromLocal *.txt /data/"

sudo su yarn -c 'hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /data /output/run1'
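
To sanity-check the job's results, the same count can be approximated locally with standard tools. The sketch below assumes the job splits on whitespace and emits "word<TAB>count" lines sorted by byte order. The function name local_wordcount is hypothetical and not part of this repository.

```shell
# Hypothetical helper: a local approximation of the wordcount job.
# Splits input on whitespace, counts identical tokens, and prints
# "word<TAB>count" lines sorted in byte order (LC_ALL=C).
local_wordcount() {
  cat "$@" \
    | tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}
```

For example, `local_wordcount *.txt | head` can be compared against the job output, which should be readable with `hdfs dfs -cat /output/run1/part-r-*` once the run finishes (the part file naming is assumed from the usual reducer output convention).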

paulwelch/ansible-spark

Folders and files

Latest commit

History

Repository files navigation

ansible-spark

Dependencies

Playbook

Example Map Reduce Job

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages