This project builds a data pipeline based on the Lambda architecture, handling large volumes of data by combining batch and stream processing. On top of the pipeline, we also analyze Twitter's tweets.
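As a rough illustration of the two Lambda layers, the sketch below pairs a Spark batch job with a Spark Structured Streaming job. It is a minimal sketch only: the Kafka source, the `tweets` topic, the storage paths, and the `lang` column are illustrative assumptions, not taken from this repository.

```python
from pyspark.sql import SparkSession

# Illustrative Lambda-architecture sketch; assumes a Kafka broker on localhost:9092,
# a hypothetical "tweets" topic, and the spark-sql-kafka package on the classpath.
spark = SparkSession.builder.appName("lambda-sketch").getOrCreate()

# Batch layer: periodically recompute views from the full, persisted dataset.
batch_df = spark.read.parquet("data/tweets_master")  # hypothetical path
batch_df.groupBy("lang").count() \
    .write.mode("overwrite").parquet("data/batch_views/lang_counts")

# Speed layer: maintain a low-latency view from the live stream.
stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "tweets")
    .load()
)
query = (
    stream_df.selectExpr("CAST(value AS STRING) AS tweet")
    .writeStream.format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```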
The project requires:

- Python 3.*
- Apache Spark 3.2.*
- A Twitter API account
- A `config.ini` file
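To confirm the Python and Spark requirements are met, a quick check such as the one below may help; it only assumes that `pyspark` is installed in the active environment.

```python
import sys

import pyspark

# Quick check against the versions listed above.
assert sys.version_info >= (3,), "Python 3.* is required"
assert pyspark.__version__.startswith("3.2"), "Apache Spark 3.2.* is required"
print("Python", sys.version.split()[0], "| PySpark", pyspark.__version__)
```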
To set up and run the project:

- Change `config.template.ini` to `config.ini`
- Adjust some basic values in `config.ini` (an illustrative loading snippet is shown below)
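How `config.ini` is consumed depends on the project code, but a typical pattern is to load it with `configparser`. The section and key names below (`twitter`, `api_key`, `api_secret`) are placeholders for illustration, not the repository's actual schema.

```python
import configparser

# Illustrative only: section and key names are placeholders, not the real schema.
config = configparser.ConfigParser()
config.read("config.ini")

api_key = config["twitter"]["api_key"]        # hypothetical section/key
api_secret = config["twitter"]["api_secret"]  # hypothetical section/key
print("Loaded Twitter credentials; key ends with", api_key[-4:])
```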
- Grant full permission to the `logs` folder: `sudo chmod a+rwx src/logs`
- Clone the repository: `git clone <repository-url>`
- Run the Docker containers: `make start-docker`
- Set up the virtual environment for the project: `make setup-env`
- Run the project: `make start-all`
- Analyze: go to the notebook for the analysis (a minimal read-back sketch is shown below)
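As a rough sketch of reading the stored tweets back for analysis, the snippet below uses the Spark Cassandra connector. The `tweets` table name, the connection host, and the connector package are assumptions; only the `twitter` keyspace name comes from this README.

```python
from pyspark.sql import SparkSession

# Illustrative only: assumes the spark-cassandra-connector package is available
# and Cassandra is reachable on 127.0.0.1; the "tweets" table name is a guess.
spark = (
    SparkSession.builder.appName("tweet-analysis")
    .config("spark.cassandra.connection.host", "127.0.0.1")
    .getOrCreate()
)

tweets = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="twitter", table="tweets")
    .load()
)
tweets.printSchema()
print("Stored tweets:", tweets.count())
```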
- If the `twitter` keyspace is not found, run the `cassandra-init-schema` container again
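To check whether the `twitter` keyspace exists before re-running the container, a quick look with the Python `cassandra-driver` can look like this; the contact point `127.0.0.1` and the default port are assumptions about the local Docker setup.

```python
from cassandra.cluster import Cluster

# Illustrative check only: assumes Cassandra is exposed on 127.0.0.1:9042.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

rows = session.execute("SELECT keyspace_name FROM system_schema.keyspaces")
keyspaces = {row.keyspace_name for row in rows}

if "twitter" in keyspaces:
    print("twitter keyspace found")
else:
    print("twitter keyspace missing -- re-run the cassandra-init-schema container")

cluster.shutdown()
```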