Kwanza Tukule Case Study Analytics Pipeline

This repository contains the analytics pipeline for the Kwanza Tukule case study project. The pipeline ingests, cleans, transforms, analyzes, and visualizes data from Google Sheets using Google Colab and Looker Studio. It is automated with GitHub Actions and runs every 5 hours.


Table of Contents

  1. Overview
  2. Architecture
  3. Notebook & Dashboard
  4. GitHub Actions Workflow
  5. Installation
  6. Usage
  7. Contributing

Overview

The Kwanza Tukule Analytics Pipeline automates the process of:

  • Data Ingestion: Pulling data from Google Sheets using the Google Sheets API.
  • Data Cleaning, Transformation, and Analysis: Processing the data in Google Colab.
  • Data Visualization: Visualizing the transformed data in Looker Studio.

The pipeline is scheduled to run every 5 hours using GitHub Actions.
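The cleaning step can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: the function name `rows_to_records` and the sample column names are hypothetical. It assumes the ingested data arrives as a list of rows with the header first, which is the shape the Google Sheets API uses for a `values` response (trailing empty cells in a row are omitted).

```python
from typing import Any


def rows_to_records(values: list[list[Any]]) -> list[dict[str, Any]]:
    """Turn a header row plus data rows into a list of dicts.

    Short rows are padded with None, because trailing empty cells
    are omitted from the response; fully empty rows are dropped.
    """
    if not values:
        return []
    header = [str(h).strip() for h in values[0]]
    records = []
    for row in values[1:]:
        # Pad short rows to the header width; truncate overly long ones.
        padded = (list(row) + [None] * len(header))[: len(header)]
        # Strip whitespace and treat empty strings as missing values.
        cleaned = [
            (c.strip() or None) if isinstance(c, str) else c for c in padded
        ]
        if all(c is None for c in cleaned):
            continue  # skip blank rows
        records.append(dict(zip(header, cleaned)))
    return records


# Hypothetical sample data; the real sheet's columns are not shown here.
sample = [
    ["date", "product ", "quantity"],
    ["2024-01-01", " Rice", "3"],
    ["2024-01-02", "Beans"],  # trailing empty cell omitted
    [],                        # blank row
]
records = rows_to_records(sample)
```

The resulting list of dicts can be passed to `pandas.DataFrame(records)` inside the notebook if pandas is used for the transformation step.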


Architecture

Below is the architecture of the analytics pipeline:

[Figure: architecture diagram of the analytics pipeline]

Key Components:

  1. Data Source: Google Sheets.
  2. Ingestion: Data is pulled using the Google Sheets API.
  3. Cleaning, Transformation, and Analysis: Performed in Google Colab.
  4. Visualization: Data is visualized in Looker Studio.
  5. Automation: GitHub Actions triggers the pipeline every 5 hours.

Notebook

The core logic of the analysis is implemented in the following Jupyter notebook:

📒 Kwanza Tukule Case Study Notebook

This notebook contains the code for data ingestion, cleaning, transformation, analysis, and preparation for visualization.

Looker Dashboard

https://lookerstudio.google.com/s/ou_fip2m4aY


GitHub Actions Workflow

The pipeline is automated using GitHub Actions. The workflow is defined in the following YAML file:

📄 GitHub Actions Workflow

The workflow runs every 5 hours and executes the notebook.
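A detail worth checking when reading the schedule: the workflow file is not reproduced here, so the sketch below assumes the cron expression is something like `0 */5 * * *`. In cron, a step of `*/5` in the hour field matches hours divisible by 5 rather than "5 hours after the last run", so the final interval of each day is shorter. GitHub Actions evaluates schedules in UTC. The helper names are hypothetical.

```python
def hours_matching_step(step: int) -> list[int]:
    """Hours of the day (0-23) matched by a cron hour field of `*/step`."""
    return [h for h in range(24) if h % step == 0]


def gaps_between_runs(hours: list[int]) -> list[int]:
    """Hours between consecutive runs, including the wrap past midnight."""
    return [
        (hours[(i + 1) % len(hours)] - h) % 24 or 24
        for i, h in enumerate(hours)
    ]


run_hours = hours_matching_step(5)
run_gaps = gaps_between_runs(run_hours)
print(run_hours)  # [0, 5, 10, 15, 20]
print(run_gaps)   # [5, 5, 5, 5, 4]
```

Under that assumed expression, the pipeline would run five times a day, with a 4-hour gap between the 20:00 and 00:00 UTC runs.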


Installation (running locally)

To set up this project locally, follow the steps below. Running the notebook in Google Colab or through GitHub Actions is the recommended option.

  1. Clone the repository:
    git clone https://github.com/24jmwangi/KwanzaTukule.git
  2. Navigate to the project directory:
    cd KwanzaTukule
  3. Install dependencies (if any):
    pip install -r requirements.txt
  4. Open the notebook in Google Colab or Jupyter:
    jupyter notebook KWANZA_TUKULE_CASE_STUDY.ipynb

Usage

To use the pipeline:

  1. Ensure your Google Sheets API credentials are set up.
  2. Update the notebook with your Google Sheet ID and range.
  3. Run the notebook to ingest, clean, transform, and analyze the data.
  4. Visualize the data in Looker Studio.

For automation, the GitHub Actions workflow will handle the execution every 5 hours.
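Steps 1 and 2 above might look roughly like the following. This is a sketch under stated assumptions, not the notebook's actual code: it assumes authentication with a service-account key file and the `google-api-python-client` and `google-auth` packages, neither of which is confirmed by this README. The names `extract_sheet_id`, `fetch_values`, `SHEET_URL`, and `CELL_RANGE` are hypothetical, and the sheet ID is a placeholder.

```python
import re


def extract_sheet_id(url_or_id: str) -> str:
    """Return the spreadsheet ID from a Google Sheets URL, or the input
    unchanged if it does not look like a Sheets URL."""
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url_or_id)
    return match.group(1) if match else url_or_id


def fetch_values(sheet_id: str, cell_range: str, key_path: str) -> list[list]:
    """Read a range using a service-account key (an assumed auth method)."""
    # Imported lazily so extract_sheet_id works without these packages.
    from google.oauth2.service_account import Credentials
    from googleapiclient.discovery import build

    scopes = ["https://www.googleapis.com/auth/spreadsheets.readonly"]
    creds = Credentials.from_service_account_file(key_path, scopes=scopes)
    service = build("sheets", "v4", credentials=creds)
    result = (
        service.spreadsheets()
        .values()
        .get(spreadsheetId=sheet_id, range=cell_range)
        .execute()
    )
    # The "values" key is absent when the requested range is empty.
    return result.get("values", [])


# Hypothetical placeholders, not the project's real sheet.
SHEET_URL = "https://docs.google.com/spreadsheets/d/EXAMPLE_SHEET_ID_123/edit#gid=0"
SHEET_ID = extract_sheet_id(SHEET_URL)
CELL_RANGE = "Sheet1!A1:D100"
```

With real credentials in place, `fetch_values(SHEET_ID, CELL_RANGE, "service_account.json")` would return the rows as a list of lists, header row first. The service account's email address must be given read access to the sheet for the call to succeed.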


Contributing

Contributions are welcome! If you'd like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature/your-feature-name
  3. Commit your changes:
    git commit -m "Add your commit message here"
  4. Push to the branch:
    git push origin feature/your-feature-name
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

