Part 2: Real-time Ingestion with Kafka

Directory: 02_pinot-kafka
Objective: Implement real-time data ingestion into Apache Pinot using Apache Kafka as the data source, simulating a stream of movie ratings.
Setup:
- Make sure Docker Compose is running with the necessary services including Kafka and Apache Pinot.
- Navigate to the 02_pinot-kafka directory where the necessary files and scripts are located.
Tasks:

Step 1: Build and Launch with Docker

Description: Prepare the environment by building and launching the required Docker containers, which include Apache Kafka (Kraft mode) and Apache Pinot components.

Action:

docker compose build --no-cache
docker compose up -d

Step 2: Optional Create a Kafka Topic

Description: Create a Kafka topic named movie_ratings for streaming movie rating data.

Action:

docker exec -it kafka kafka-topics.sh \
    --bootstrap-server localhost:9092 \
    --create \
    --topic movie_ratings

Step 3: Configure Pinot Tables

Description: Set up a REALTIME table in Apache Pinot to ingest data from the Kafka movie_ratings topic.

Action:

docker exec -it pinot-controller ./bin/pinot-admin.sh \
    AddTable \
    -tableConfigFile /tmp/pinot/table/ratings.table.json \
    -schemaFile /tmp/pinot/table/ratings.schema.json \
    -exec

Verification:
- Check the real-time data streaming into Pinot by querying the movie_ratings table in the Pinot console:
  http://localhost:9000/#/query?query=select+*+from+movie_ratings+limit+10&tracing=false&useMSE=false

Step 4: Apache Pinot Advanced Usage (Optional)

Description: Explore advanced features of Apache Pinot such as running multi-stage joins between real-time and batch data.

Action:

Enable 'Use Multi-Stage Engine' in the Pinot console to perform complex queries.

select
    r.rating as latest_rating,
    m.rating as initial_rating,
    m.title,
    m.genres,
    m.releaseYear
from movies m
         left join movie_ratings r on m.movieId = r.movieId
where r.rating > 0.9
order by r.rating desc
    limit 10

Clean Up

To stop and remove all services related to this part of the workshop, run:
```
make destroy
```

Troubleshooting

If encountering issues such as 'No space left on device' during the Docker build process, free up space using:
```
docker system prune -f
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.adoc

README.adoc

Part 2: Real-time Ingestion with Kafka

Step 1: Build and Launch with Docker

Step 2: Optional Create a Kafka Topic

Step 3: Configure Pinot Tables

Step 4: Apache Pinot Advanced Usage (Optional)

Clean Up

Troubleshooting

Files

README.adoc

Latest commit

History

README.adoc

File metadata and controls

Part 2: Real-time Ingestion with Kafka

Step 1: Build and Launch with Docker

Step 2: Optional Create a Kafka Topic

Step 3: Configure Pinot Tables

Step 4: Apache Pinot Advanced Usage (Optional)

Clean Up

Troubleshooting