spark-dataframes

Here are 42 public repositories matching this topic...

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.

  • Updated Nov 16, 2022
  • Scala

This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying the movies with the highest number of shared viewers, the system recommends 10 similar movies for a given target movie, in line with users' preferences.

  • Updated Jun 29, 2024
  • Jupyter Notebook

Use this project to join data from multiple CSV files. It currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.

  • Updated Jul 1, 2022
  • Java
