Updated Aug 15, 2024 · Python
dataingestionframework
Here are 6 public repositories matching this topic...
This project demonstrates a scalable data processing pipeline for analyzing log data from a hypothetical e-commerce platform. Built on Hadoop and PySpark, the pipeline processes large volumes of log files to produce insights into user behavior, system performance, and sales metrics.
Updated Aug 17, 2024 · Python
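The description does not show how the pipeline parses or aggregates its logs. As a hedged, plain-Python illustration of the parse-then-aggregate step such a pipeline performs, the sketch below assumes Apache Common Log Format input and computes hits per path and a server-error rate. In PySpark the same step could be written with `spark.read.text`, `pyspark.sql.functions.regexp_extract`, and `groupBy(...).count()`; the function names `parse` and `summarize` and the sample lines are hypothetical and not taken from the project.

```python
import re
from collections import Counter

# Apache Common Log Format, assumed here as the input format.
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse(line):
    """Return a dict of fields for one log line, or None if it is malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    return rec


def summarize(lines):
    """Count requests per path and compute the share of 5xx responses."""
    records = [r for r in map(parse, lines) if r is not None]
    hits_per_path = Counter(r["path"] for r in records)
    server_errors = sum(1 for r in records if r["status"] >= 500)
    error_rate = server_errors / len(records) if records else 0.0
    return hits_per_path, error_rate


# Hypothetical sample input; malformed lines are skipped rather than failing the job.
logs = [
    '10.0.0.1 - - [15/Aug/2024:10:00:01 +0000] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.2 - - [15/Aug/2024:10:00:02 +0000] "POST /cart HTTP/1.1" 200 128',
    '10.0.0.1 - - [15/Aug/2024:10:00:03 +0000] "GET /product/42 HTTP/1.1" 503 0',
    'garbage line',
]
hits, rate = summarize(logs)
print(hits.most_common(1), round(rate, 3))
```

Dropping malformed records instead of raising keeps one bad line from failing a large batch; a production job would usually also count the dropped lines.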
The Spark Memory Configuration Calculator helps data engineers and Spark developers quickly work out suitable executor memory and core settings for their Spark clusters. It helps avoid common misconfigurations and use cluster resources efficiently, which can improve performance and lower costs.
Updated Aug 15, 2024 · Python
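The calculator's exact formulas are not given here. A minimal sketch of one widely cited sizing heuristic follows, assuming YARN: reserve 1 core and 1 GiB per node for the OS and daemons, use 5 cores per executor, keep one executor slot for the ApplicationMaster, and fit each executor's heap plus Spark's default memory overhead (the larger of 384 MiB and 10% of executor memory) into its share of the node. The function `plan_executors` and its constants are hypothetical, not the tool's API.

```python
from dataclasses import dataclass

# Assumed reservations and rules of thumb; adjust for your cluster.
RESERVED_CORES_PER_NODE = 1      # left for the OS and Hadoop/YARN daemons
RESERVED_MEM_MB_PER_NODE = 1024  # likewise, 1 GiB per node
CORES_PER_EXECUTOR = 5           # widely cited rule of thumb
MIN_OVERHEAD_MB = 384            # Spark's minimum executor memory overhead
OVERHEAD_FACTOR = 0.10           # default overhead as a fraction of executor memory


@dataclass
class ExecutorPlan:
    num_executors: int
    executor_cores: int
    executor_memory_mb: int
    overhead_mb: int


def plan_executors(nodes: int, cores_per_node: int, mem_gb_per_node: int) -> ExecutorPlan:
    usable_cores = cores_per_node - RESERVED_CORES_PER_NODE
    usable_mem_mb = mem_gb_per_node * 1024 - RESERVED_MEM_MB_PER_NODE
    executors_per_node = usable_cores // CORES_PER_EXECUTOR
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    # Leave one executor slot for the YARN ApplicationMaster.
    num_executors = nodes * executors_per_node - 1
    container_mb = usable_mem_mb // executors_per_node
    # Largest heap such that heap + max(384, 0.10 * heap) fits in the container.
    heap_mb = min(int(container_mb / (1 + OVERHEAD_FACTOR)), container_mb - MIN_OVERHEAD_MB)
    overhead_mb = max(MIN_OVERHEAD_MB, int(heap_mb * OVERHEAD_FACTOR))
    return ExecutorPlan(num_executors, CORES_PER_EXECUTOR, heap_mb, overhead_mb)


# Example: 10 nodes, each with 16 cores and 64 GiB of RAM.
plan = plan_executors(nodes=10, cores_per_node=16, mem_gb_per_node=64)
print(f"--num-executors {plan.num_executors} --executor-cores {plan.executor_cores} "
      f"--executor-memory {plan.executor_memory_mb}m")
# prints "--num-executors 29 --executor-cores 5 --executor-memory 19549m"
```

Working in MiB with integer division keeps the heap plus overhead from ever exceeding the per-executor share of node memory.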
Automated Wafer Sensor Fault Detection with CI/CD Pipeline: this project implements a machine-learning system for detecting wafer sensor faults.
Updated Aug 10, 2023 · Jupyter Notebook
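The description does not name the model the project uses. As a hedged stand-in only, the plain-Python sketch below shows one simple way to classify wafers from sensor readings: standardize each sensor, drop sensors with constant readings, and assign the label of the nearest class centroid (a nearest-centroid classifier). The readings, the "good"/"faulty" labels, and the `fit`/`predict` names are hypothetical.

```python
from statistics import mean, pstdev


def fit(rows, labels):
    """Learn per-sensor scaling and one centroid per label."""
    cols = list(zip(*rows))
    # Drop constant sensors: they carry no signal and would divide by zero.
    keep = [i for i, c in enumerate(cols) if pstdev(c) > 0]
    mu = {i: mean(cols[i]) for i in keep}
    sd = {i: pstdev(cols[i]) for i in keep}

    def scale(row):
        return [(row[i] - mu[i]) / sd[i] for i in keep]

    scaled = [scale(r) for r in rows]
    centroids = {}
    for label in set(labels):
        members = [s for s, lab in zip(scaled, labels) if lab == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return scale, centroids


def predict(model, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    scale, centroids = model
    x = scale(row)
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


# Hypothetical readings: three sensors per wafer; the third never varies.
rows = [
    [2.10, 100.0, 5.0],
    [2.05, 101.0, 5.0],
    [2.12, 99.5, 5.0],
    [3.40, 120.0, 5.0],
    [3.55, 118.0, 5.0],
]
labels = ["good", "good", "good", "faulty", "faulty"]
model = fit(rows, labels)
print(predict(model, [2.08, 100.5, 5.0]), predict(model, [3.50, 119.0, 5.0]))
```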
Updated Aug 15, 2024 · Python
This repository showcases my work developing and integrating Python solutions, from API development and data management to cloud-service integration. Each project targets a specific use case and covers both fundamental concepts and practical applications common in real-world software development.
Updated Aug 17, 2024 · Python