
Apache-Spark-3-for-Data-Engineering-Analytics

Introduction to Spark:

  1. PySpark is a library for running Python applications using Apache Spark's capabilities; in other words, PySpark is the Python API for Spark.
  2. Spark is not a programming language:
     a. You can write Spark applications in Java, Scala, R, and Python.
     b. PySpark lets you write Python-based data processing applications that execute in parallel on a distributed cluster (see the sketch after this list).
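As a minimal sketch of what such an application looks like (the app name and data here are illustrative, not from this repo), a PySpark program creates a SparkSession and then transforms data in parallel:

```python
from pyspark.sql import SparkSession

# Create (or reuse) the entry point for a PySpark application.
spark = SparkSession.builder.appName("WordLengths").getOrCreate()

# Distribute a small Python list across the cluster and transform it in parallel.
words = spark.sparkContext.parallelize(["spark", "is", "distributed"])
lengths = words.map(len).collect()
print(lengths)  # [5, 2, 11]

spark.stop()
```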

Apache Spark is an analytics engine for powerful, large-scale distributed data processing and machine learning applications.
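To illustrate the machine learning side, here is a hedged sketch using Spark's built-in MLlib DataFrame API (the tiny training dataset is invented purely for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("MLlibSketch").getOrCreate()

# Tiny invented dataset: each row is (label, feature vector).
train = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)),
     (1.0, Vectors.dense(2.0, 1.0)),
     (0.0, Vectors.dense(0.1, 1.2)),
     (1.0, Vectors.dense(2.2, 0.9))],
    ["label", "features"])

# Fit a logistic regression model; training runs distributed across the cluster.
model = LogisticRegression(maxIter=10).fit(train)
print(model.coefficients)

spark.stop()
```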

Basic setup for PySpark on Ubuntu for distributed machine learning. Prerequisites:

  1. An Ubuntu system
  2. Access to a terminal or command line
  3. A user with sudo or root permissions

Required Packages:

  1. Apache Spark
  2. Java (JDK)
  3. PySpark
  4. FindSpark
  5. Spark SQL (bundled with PySpark)
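Once these packages are installed, a quick verification sketch uses FindSpark to locate the Spark install and runs a small Spark SQL query end to end (the SPARK_HOME path below is an assumed example; adjust it to your install location):

```python
import findspark

# Point findspark at the Spark install; the path is an assumed example.
findspark.init("/opt/spark")

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SetupCheck").getOrCreate()

# Run a trivial Spark SQL query to confirm the whole stack works.
spark.createDataFrame([(1, "ok")], ["id", "status"]).createOrReplaceTempView("check")
spark.sql("SELECT id, status FROM check").show()

spark.stop()
```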