Spark for Machine Learning & AI

Apache Spark is one of the most widely used and supported open-source tools for machine learning and big data. In this course, discover how to work with this powerful platform for machine learning. Instructor Dan Sullivan discusses MLlib, the Spark machine learning library, which provides tools for data scientists and analysts who would rather find solutions to business problems than code, test, and maintain their own machine learning libraries. He shows how to use DataFrames to organize structured data, and he covers data preparation and the most commonly used types of machine learning algorithms: clustering, classification, regression, and recommendations. By the end of the course, you will have experience loading data into Spark, preprocessing data as needed to apply MLlib algorithms, and applying those algorithms to a variety of machine learning problems.
Topics include:
- Machine learning workflows
- Organizing data in DataFrames
- Preprocessing and data preparation steps for machine learning
- Clustering data
- Classification algorithms
- Regression methods available in Spark MLlib
- Common approaches to designing recommendation systems