Part 2: Real-time Ingestion with Kafka

  • Directory: 02_pinot-kafka

  • Objective: Implement real-time data ingestion into Apache Pinot using Apache Kafka as the data source, simulating a stream of movie ratings.

  • Setup:

    • Make sure Docker is running. Step 1 starts the necessary services, including Kafka and Apache Pinot, with Docker Compose.

    • Navigate to the 02_pinot-kafka directory where the necessary files and scripts are located.

  • Tasks:

Step 1: Build and Launch with Docker

  • Description: Prepare the environment by building and launching the required Docker containers, which include Apache Kafka (KRaft mode) and the Apache Pinot components.

  • Action:

    docker compose build --no-cache
    docker compose up -d
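
The services can take a little while to become ready after `docker compose up -d` returns. Below is a minimal sketch of a readiness check. It assumes the Pinot controller is published on localhost:9000 (the port used by the console URL in Step 3) and serves Pinot's `/health` endpoint. The function name `wait_for_pinot`, the attempt count, and the 2-second pause are illustrative choices, not part of the workshop files.

```shell
# Poll the controller health endpoint until it answers HTTP 200.
# Arguments (both optional, illustrative defaults): URL, number of attempts.
wait_for_pinot() {
  url="${1:-http://localhost:9000/health}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -o /dev/null discards the body; -w prints only the HTTP status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "Pinot controller is healthy"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "Pinot controller did not become healthy in time" >&2
  return 1
}
```

Call `wait_for_pinot` before moving on to the next steps. `docker compose ps` lists the state of every service if the check times out.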

Step 2: Create a Kafka Topic (Optional)

  • Description: Create a Kafka topic named movie_ratings for streaming movie rating data.

  • Action:

    docker exec -it kafka kafka-topics.sh \
        --bootstrap-server localhost:9092 \
        --create \
        --topic movie_ratings
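
Running `--create` against a topic that already exists returns an error, so a re-runnable variant can be handy. The sketch below assumes the same `kafka` container name and bootstrap address as the command above. It adds the `--if-not-exists` flag and then describes the topic to confirm it is there. The function name `ensure_topic` is illustrative, and `-it` is dropped because no interactive terminal is needed.

```shell
TOPIC=movie_ratings

# Create the topic only if it is missing, then print its partition layout.
ensure_topic() {
  docker exec kafka kafka-topics.sh \
    --bootstrap-server localhost:9092 \
    --create --if-not-exists \
    --topic "$TOPIC"
  docker exec kafka kafka-topics.sh \
    --bootstrap-server localhost:9092 \
    --describe \
    --topic "$TOPIC"
}
```

Calling `ensure_topic` more than once is safe: the second call skips creation and only prints the description.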

Step 3: Configure Pinot Tables

  • Description: Set up a REALTIME table in Apache Pinot to ingest data from the Kafka movie_ratings topic.

  • Action:

    docker exec -it pinot-controller ./bin/pinot-admin.sh \
        AddTable \
        -tableConfigFile /tmp/pinot/table/ratings.table.json \
        -schemaFile /tmp/pinot/table/ratings.schema.json \
        -exec

  • Verification:

    • Confirm that data is streaming into Pinot by querying the movie_ratings table in the Pinot console:

      http://localhost:9000/#/query?query=select+*+from+movie_ratings+limit+10&tracing=false&useMSE=false
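
The same check can be scripted from a terminal. This is a minimal sketch, assuming the controller accepts SQL over HTTP at `POST /sql` on localhost:9000 and that `curl` is installed on the host. The helper names `pinot_payload` and `pinot_query` are illustrative, and the payload builder only handles queries without double quotes.

```shell
# Build the JSON request body for a SQL query (query must not contain double quotes).
pinot_payload() {
  printf '{"sql":"%s"}' "$1"
}

# POST the query to the controller (assumed to be reachable on localhost:9000).
pinot_query() {
  curl -s -X POST \
    -H 'Content-Type: application/json' \
    -d "$(pinot_payload "$1")" \
    http://localhost:9000/sql
}
```

For example, running `pinot_query 'select count(*) from movie_ratings'` a few seconds apart should show the count growing while the stream is being ingested.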

Step 4: Apache Pinot Advanced Usage (Optional)

  • Description: Explore advanced features of Apache Pinot, such as using the multi-stage query engine to join real-time and batch data.

  • Action:

    • Enable 'Use Multi-Stage Engine' in the Pinot console, then run a join query such as:

      select
          r.rating as latest_rating,
          m.rating as initial_rating,
          m.title,
          m.genres,
          m.releaseYear
      from movies m
      left join movie_ratings r on m.movieId = r.movieId
      where r.rating > 0.9
      order by r.rating desc
      limit 10
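
Outside the console, the multi-stage engine is selected per request. The sketch below assumes the controller's `POST /sql` endpoint on localhost:9000 and Pinot's `useMultistageEngine=true` query option. The helper names `mse_payload` and `run_mse_query` are illustrative, and the SQL has to fit on one line without double quotes.

```shell
# Wrap a single-line SQL string in a request body that opts into the multi-stage engine.
mse_payload() {
  printf '{"sql":"%s","queryOptions":"useMultistageEngine=true"}' "$1"
}

# POST the query to the controller (assumed to be reachable on localhost:9000).
run_mse_query() {
  curl -s -X POST \
    -H 'Content-Type: application/json' \
    -d "$(mse_payload "$1")" \
    http://localhost:9000/sql
}
```

For example: `run_mse_query 'select m.title, r.rating from movies m join movie_ratings r on m.movieId = r.movieId limit 10'`. Note that the `where r.rating > 0.9` filter in the console query above discards movies with no matching rating, so that left join returns the same rows as an inner join.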

Clean Up

  • To stop and remove all services related to this part of the workshop, run:

    make destroy

Troubleshooting

  • If the Docker build fails with an error such as 'No space left on device', free up disk space with:

    docker system prune -f
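
Before pruning, it can help to see what is using the space. Here is a small sketch built on `docker system df`. The function name `docker_cleanup` is illustrative.

```shell
# Show Docker disk usage, prune unused data, then show usage again for comparison.
docker_cleanup() {
  docker system df
  # Removes stopped containers, unused networks, dangling images, and unused build cache.
  docker system prune -f
  docker system df
}
```

Comparing the two `docker system df` reports shows how much space the prune reclaimed.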