-
Directory: 02_pinot-kafka
-
Objective: Implement real-time data ingestion into Apache Pinot using Apache Kafka as the data source, simulating a stream of movie ratings.
-
Setup:
-
Make sure Docker Compose is running with the necessary services including Kafka and Apache Pinot.
-
Navigate to the 02_pinot-kafka directory, where the necessary files and scripts are located.
-
-
Tasks:
-
Description: Prepare the environment by building and launching the required Docker containers, which include Apache Kafka (KRaft mode) and the Apache Pinot components.
-
Action:
docker compose build --no-cache
docker compose up -d
-
Description: Create a Kafka topic named movie_ratings for streaming movie rating data.
-
Action:
docker exec -it kafka kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create \
  --topic movie_ratings
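Once the topic exists, it can help to push a few test events by hand. The sketch below is not part of the repository: it generates simulated rating events as JSON lines using only the Python standard library. The field names `movieId` and `rating` are taken from the join query in this guide; the `ratingTime` field, the ID range, and the 0 to 10 rating scale are assumptions and would need to match the actual ratings.schema.json.

```python
import json
import random
import time
from typing import Iterator, Optional


def generate_ratings(count: int, seed: Optional[int] = None) -> Iterator[str]:
    """Yield `count` simulated rating events, each as one JSON string."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    for _ in range(count):
        event = {
            "movieId": rng.randint(1, 1000),              # assumed ID range
            "rating": round(rng.uniform(0.0, 10.0), 1),   # assumed 0-10 scale
            "ratingTime": int(time.time() * 1000),        # hypothetical time column (epoch millis)
        }
        yield json.dumps(event)


if __name__ == "__main__":
    # Print one JSON document per line so the output can be piped to a producer.
    for line in generate_ratings(5, seed=42):
        print(line, flush=True)
```

Saved under a hypothetical name such as gen_ratings.py, the output could be piped into the console producer with `python gen_ratings.py | docker exec -i kafka kafka-console-producer.sh --bootstrap-server localhost:9092 --topic movie_ratings`. The Compose stack may already ship its own producer, in which case this is only useful for manual testing.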
-
Description: Set up a REALTIME table in Apache Pinot to ingest data from the Kafka movie_ratings topic.
-
Action:
docker exec -it pinot-controller ./bin/pinot-admin.sh \
  AddTable \
  -tableConfigFile /tmp/pinot/table/ratings.table.json \
  -schemaFile /tmp/pinot/table/ratings.schema.json \
  -exec
-
Verification:
-
Check that real-time data is streaming into Pinot by querying the movie_ratings table in the Pinot console: http://localhost:9000/#/query?query=select+*+from+movie_ratings+limit+10&tracing=false&useMSE=false
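The same check can be scripted instead of run in the console. The following is a minimal sketch, assuming the controller on localhost:9000 accepts SQL via a POST to its /sql endpoint with a JSON body of the form {"sql": "..."}, and that the response carries a resultTable with dataSchema.columnNames and rows. The helper names are illustrative only.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: same host and port as the console URL, with the SQL endpoint at /sql.
CONTROLLER_SQL_URL = "http://localhost:9000/sql"


def build_query_request(sql: str, url: str = CONTROLLER_SQL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the SQL as JSON."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a query response into a list of {column: value} dicts."""
    table = response.get("resultTable") or {}
    columns = table.get("dataSchema", {}).get("columnNames", [])
    return [dict(zip(columns, row)) for row in table.get("rows", [])]


def run_query(sql: str) -> List[Dict[str, Any]]:
    """Send the query and return its rows; requires the stack to be running."""
    with urllib.request.urlopen(build_query_request(sql), timeout=10) as resp:
        return rows_as_dicts(json.load(resp))


# Example, with the containers up:
#   for row in run_query("select * from movie_ratings limit 10"):
#       print(row)
```

Running the query twice a few seconds apart and seeing different rows, or a growing `select count(*)`, is a simple sign that ingestion is live.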
-
-
Description: Explore advanced features of Apache Pinot such as running multi-stage joins between real-time and batch data.
-
Action:
-
Enable 'Use Multi-Stage Engine' in the Pinot console, then run a join between the movies table and the real-time movie_ratings table:
select r.rating as latest_rating, m.rating as initial_rating, m.title, m.genres, m.releaseYear
from movies m
left join movie_ratings r on m.movieId = r.movieId
where r.rating > 0.9
order by r.rating desc
limit 10
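One detail of this query is worth noting: the filter on r.rating is applied after the join, so movies with no matching rating (where r.rating is NULL) are dropped, and the LEFT JOIN effectively behaves like an inner join. The sketch below reproduces that behavior in plain Python on a few made-up rows; the titles and values are illustrative, not data from the tutorial.

```python
from typing import Any, Dict, List

# Made-up sample rows, for illustration only.
movies = [
    {"movieId": 1, "title": "Movie A"},
    {"movieId": 2, "title": "Movie B"},  # has no streamed rating
]
ratings = [
    {"movieId": 1, "rating": 9.5},
    {"movieId": 1, "rating": 0.4},
]


def left_join(left: List[Dict[str, Any]], right: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Left join on movieId: unmatched movies get latest_rating = None (NULL)."""
    joined = []
    for m in left:
        matches = [r for r in right if r["movieId"] == m["movieId"]]
        if not matches:
            joined.append({"title": m["title"], "latest_rating": None})
        for r in matches:
            joined.append({"title": m["title"], "latest_rating": r["rating"]})
    return joined


def where_rating_above(rows: List[Dict[str, Any]], threshold: float) -> List[Dict[str, Any]]:
    """Mimic WHERE r.rating > threshold: a NULL comparison is never true."""
    return [
        row for row in rows
        if row["latest_rating"] is not None and row["latest_rating"] > threshold
    ]


joined = left_join(movies, ratings)
filtered = where_rating_above(joined, 0.9)
print(len(joined))  # 3: the left join alone keeps Movie B with a NULL rating
print(filtered)     # [{'title': 'Movie A', 'latest_rating': 9.5}]
```

If unrated movies should stay in the result, standard SQL would move the rating condition into the ON clause instead of WHERE; whether that fits this exercise depends on what the query is meant to show.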
-