phatnguyen080401/lambda-architecture

Lambda-architecture

This project builds a data pipeline on the Lambda architecture, which handles massive quantities of data by combining batch and stream processing. The pipeline is also used to analyze tweets from Twitter.
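To make the batch/stream split concrete, here is a minimal sketch of the three Lambda layers in plain Python. It is an illustration only, not this repository's implementation: the function names, the `hashtags` field, and the sample tweets are all assumptions made for the example.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute hashtag counts from the full dataset."""
    return Counter(tag for tweet in master_dataset for tag in tweet["hashtags"])


def update_speed_view(speed_view, tweet):
    """Speed layer: incrementally count a tweet not yet covered by a batch run."""
    speed_view.update(tweet["hashtags"])
    return speed_view


def query(batch_view, speed_view, tag):
    """Serving layer: merge the batch view and the speed view at query time."""
    return batch_view[tag] + speed_view[tag]


# Hypothetical sample data: two tweets already in the master dataset,
# one recent tweet that has only been seen by the speed layer.
master = [{"hashtags": ["spark", "cassandra"]}, {"hashtags": ["spark"]}]
batch_view = build_batch_view(master)
speed_view = update_speed_view(Counter(), {"hashtags": ["spark", "python"]})

print(query(batch_view, speed_view, "spark"))  # 3
```

The batch view is accurate but stale, while the speed view covers only recent data; summing them at query time is one simple way to merge the two.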

Prerequisites

  • Python 3.*
  • Apache Spark 3.2.*
  • A Twitter API account
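The version requirements above can be checked with a small script before setup. This is a sketch under assumptions: the helper names are hypothetical, and the Spark version string is expected to be supplied by the caller (for example from `pyspark.__version__`, if PySpark is installed).

```python
import sys


def python_ok(version_info=sys.version_info):
    """Return True for any Python 3.* interpreter."""
    return version_info[0] == 3


def spark_ok(version):
    """Return True when a version string such as '3.2.1' matches Spark 3.2.*."""
    return version.split(".")[:2] == ["3", "2"]


if __name__ == "__main__":
    print("Python 3.*:", python_ok())
    # Hypothetical version string; replace with the installed Spark version.
    print("Spark 3.2.*:", spark_ok("3.2.1"))
```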

Setup

  1. config.ini file
    • Rename config.template.ini to config.ini
    • Adjust the basic values in config.ini
  2. logs folder
    • Grant full permissions: sudo chmod a+rwx src/logs
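After editing config.ini, a quick check that no template placeholders remain can save a failed run. The sketch below uses Python's standard `configparser`. The section names, keys, and the `replace-me` placeholder are assumptions for illustration; the real layout is whatever config.template.ini defines.

```python
import configparser

# Hypothetical config contents; the actual sections and keys may differ.
SAMPLE = """
[TWITTER]
bearer_token = replace-me

[CASSANDRA]
host = localhost
port = 9042
keyspace = twitter
"""


def load_config(text):
    """Parse INI-formatted text into a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def missing_values(parser, placeholder="replace-me"):
    """List 'SECTION.key' entries that are empty or still hold the placeholder."""
    return [
        f"{section}.{key}"
        for section in parser.sections()
        for key, value in parser[section].items()
        if not value or value == placeholder
    ]


config = load_config(SAMPLE)
print(missing_values(config))  # ['TWITTER.bearer_token']
```

To check the real file, `parser.read("config.ini")` can be used in place of `read_string`.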

Usage

  1. Clone the repository
  git clone 
  2. Start the Docker containers
  make start-docker
  3. Set up the virtual environment for the project
  make setup-env
  4. Run the project
  make start-all
  5. Analyze
  Open the notebook to analyze the data

Common Errors

  1. If the twitter keyspace is not found, run the cassandra-init-schema container again
