Records of data investigation, experimentation, and other thoughts are in my Lab Notebook. All relevant Java source code is in src/main/java.
The project can be built with Gradle. Specifically, running gradle fatJar creates a jar file that can be used with Spark. See my Lab Notebook for the relevant spark-submit commands.