Apache Spark Using Python3 for data analysis

  • Batch processing with Apache Spark and Python3 for data exploration
  • The dataset was downloaded from https://www.kaggle.com/
  • Focuses on the PySpark SQL libraries:
    • from pyspark.sql.types import BooleanType
    • from pyspark.sql.functions import udf
    • from pyspark.sql import functions as F
    • from pyspark.sql import SparkSession
    • from pyspark.sql import Window