Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable IRIS to use FAQs #187

Merged
merged 24 commits into from
Jan 30, 2025
Merged

Enable IRIS to use FAQs #187

merged 24 commits into from
Jan 30, 2025

Conversation

cremertim
Copy link
Contributor

@cremertim cremertim commented Jan 6, 2025

New Features

  • Added new Pipeline for FAQ ingestion
  • Added faq retrival pipeline to allow answer based on FAQs
  • Created new data transfer objects (DTOs) for managing faqs and faq status from Artemis

Summary by CodeRabbit

  • New Features

    • Added FAQ ingestion and retrieval functionality to the application.
    • Introduced a new pipeline for handling FAQ-related operations.
    • Implemented FAQ search and content retrieval in course chat.
    • Added new webhook endpoints for FAQ ingestion and deletion.
    • Enhanced the get_pipeline function to support FAQ ingestion feature.
  • Improvements

    • Enhanced vector database schema to support FAQ storage.
    • Added new data transfer objects for FAQ management.
    • Integrated FAQ retrieval into existing chat pipeline.
    • Improved logging and error handling for FAQ processes.
  • Technical Updates

    • Implemented status callback mechanism for FAQ pipeline execution.
    • Introduced new classes and methods for managing FAQs and their ingestion.
    • Added new prompts for FAQ retrieval and citation formatting.
    • Enhanced the CitationPipeline to support formatting of FAQ data.
    • Added new enumeration for information types in citation handling.

Copy link
Contributor

coderabbitai bot commented Jan 6, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request introduces comprehensive enhancements related to FAQ (Frequently Asked Questions) functionality across various components of the application. Key changes include the addition of data transfer objects, pipeline configurations, retrieval mechanisms, and webhook endpoints. These modifications facilitate the ingestion, retrieval, and management of FAQs within courses, improving the system's capability to provide contextual information to students through an AI-driven interface.

Changes

File Change Summary
app/common/PipelineEnum.py Added two new enum members: IRIS_FAQ_INGESTION and IRIS_FAQ_RETRIEVAL_PIPELINE
app/domain/data/faq_dto.py Introduced FaqDTO with attributes for FAQ data representation
app/domain/ingestion/deletionPipelineExecutionDto.py Added FaqDeletionExecutionDto for managing FAQ deletion pipeline execution
app/domain/ingestion/ingestion_pipeline_execution_dto.py Created FaqIngestionPipelineExecutionDto for FAQ ingestion pipeline execution
app/pipeline/chat/course_chat_pipeline.py Enhanced CourseChatPipeline with FAQ retrieval and integration mechanisms
app/pipeline/faq_ingestion_pipeline.py Implemented FaqIngestionPipeline for managing FAQ data ingestion
app/pipeline/prompts/faq_retrieval_prompts.py Added prompts for FAQ retrieval and query generation
app/pipeline/shared/citation_pipeline.py Modified CitationPipeline to support FAQ information formatting
app/retrieval/faq_retrieval.py Created FaqRetrieval class for comprehensive FAQ retrieval strategies
app/vector_database/database.py Updated VectorDatabase to initialize FAQ schema
app/vector_database/faq_schema.py Defined schema for FAQ data in vector database
app/web/routers/pipelines.py Added support for FAQ ingestion pipeline feature
app/web/routers/webhooks.py Introduced webhook endpoints for FAQ ingestion and deletion
app/web/status/faq_ingestion_status_callback.py Created FaqIngestionStatus callback for tracking FAQ ingestion pipeline status

Possibly related PRs

Suggested Reviewers

  • Hialus
  • bassner
  • MichaelOwenDyer

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a36dfcc and 1ca23bd.

📒 Files selected for processing (2)
  • app/pipeline/chat/course_chat_pipeline.py (8 hunks)
  • app/pipeline/faq_ingestion_pipeline.py (1 hunks)
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (18)
app/pipeline/faq_ingestion_pipeline.py (3)

87-88: Consolidate nested context managers.

Consider using a single with statement for both batch_update_lock and self.collection.batch.rate_limit() to reduce nesting and improve readability.

-with batch_update_lock:
-    with self.collection.batch.rate_limit(requests_per_minute=600) as batch:
+with batch_update_lock, self.collection.batch.rate_limit(requests_per_minute=600) as batch:
     ...
🧰 Tools
🪛 Ruff (0.8.2)

87-88: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


140-146: Return value from chunk_data to avoid confusion.

The method doesn’t return anything. If returning None is intentional, explicitly return None to improve clarity.

 def chunk_data(self, path: str) -> List[Dict[str, str]]:
     ...
     return
+    # or, for explicitness:
+    # return None
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


1-146: Apply Black formatting.

The linter warns the file needs to be reformatted. Please run Black to ensure consistent code style.

🧰 Tools
🪛 Ruff (0.8.2)

6-6: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)


87-88: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py (3)

2-2: Evaluate the usage of logger from asyncio.log.

You already import the standard logging module. If it’s intentional to use asyncio.log.logger, clarify to avoid confusion for future maintainers. Otherwise, consider switching to the conventional logging approach.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


191-191: Check string matching for 'YES'.

Strict equality to "YES" could cause edge-case failures if the response includes extra whitespace or a different capitalization. Consider normalizing the casing or trimming whitespace before comparison.

-return response == "YES"
+return response.strip().upper() == "YES"
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


1-395: Apply Black formatting.

The linter warns the file needs to be reformatted. Please run Black to ensure consistent code style.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/data/faq_dto.py (1)

1-13: Apply Black formatting.

The linter warns the file needs to be reformatted. Please run Black to ensure consistent code style.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/ingestion/deletionPipelineExecutionDto.py (1)

18-23: Apply Black formatting.

The file also needs reformatting according to the pipeline.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/ingestion/ingestion_pipeline_execution_dto.py (1)

18-23: Apply Black formatting.

The file also needs reformatting according to the pipeline.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/web/status/faq_ingestion_status_callback.py (1)

17-45: Add robust error handling.

Consider adding error handling around URL construction and the super().__init__ call to anticipate potential issues in production—for example, invalid or inaccessible endpoints.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/vector_database/faq_schema.py (2)

1-1: Remove unused logging import.
This import is never referenced in the file. Eliminating unused imports helps reduce code clutter.

- import logging
 from enum import Enum
🧰 Tools
🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)


31-31: Remove unused variable assignment.
The local variable properties is assigned but never used. Eliminating unused variables helps keep the code clean.

- properties = collection.config.get(simple=True).properties
🧰 Tools
🪛 Ruff (0.8.2)

31-31: Local variable properties is assigned to but never used

Remove assignment to unused variable properties

(F841)

app/pipeline/shared/citation_pipeline.py (2)

1-1: Remove unused logging import.
This import is never referenced, so deleting it helps maintain code cleanliness.

- import logging
 import os
 from asyncio.log import logger
🧰 Tools
🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


80-80: Rename unused loop variable.
The loop index variable i is never used. Renaming it to _ or _i clarifies that it’s intentionally unused.

 for i, faq in enumerate(faqs):
     faq = "Question: {}, Answer: {}".format(
         ...
+ # rename i -> _
 for _, faq in enumerate(faqs):
     faq = ...
🧰 Tools
🪛 Ruff (0.8.2)

80-80: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/pipeline/chat/course_chat_pipeline.py (4)

13-13: Remove unused import.

Static analysis indicates that SystemMessagePromptTemplate is imported but not used. Please remove it to keep the code clean.

- from langchain_core.prompts import (
-     ChatPromptTemplate, SystemMessagePromptTemplate,
- )
+ from langchain_core.prompts import (
+     ChatPromptTemplate,
+ )
🧰 Tools
🪛 Ruff (0.8.2)

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


85-85: Consider initializing retrieved_faqs to an empty list.

Currently, retrieved_faqs is None by default. If the downstream logic doesn’t handle None checks or if you need to append to it before any retrieval, you may encounter NoneType errors. Initializing it to an empty list can help.

- retrieved_faqs: List[dict] = None
+ retrieved_faqs: List[dict] = []
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


300-329: Explore error handling for no matched FAQs.

The function returns an empty string if no FAQs are found. Consider adding some feedback to clarify that no matching FAQs were retrieved, otherwise there's a risk of returning an empty answer with no context.

 if not self.retrieved_faqs:
+    return "No matching FAQs found."
 result = ""
 for faq in self.retrieved_faqs:
     ...
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


510-516: Fix docstring reference.

The docstring still references “lectures” instead of “FAQs.” Update it to reflect the correct resource type being checked.

 :return: True if there are indexed lectures for the course, False otherwise
+:return: True if there are indexed FAQs for the course, False otherwise
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddd77de and 3b12cd6.

⛔ Files ignored due to path filters (1)
  • docker/pyris-dev.yml is excluded by !**/*.yml
📒 Files selected for processing (14)
  • app/common/PipelineEnum.py (1 hunks)
  • app/domain/data/faq_dto.py (1 hunks)
  • app/domain/ingestion/deletionPipelineExecutionDto.py (2 hunks)
  • app/domain/ingestion/ingestion_pipeline_execution_dto.py (2 hunks)
  • app/pipeline/chat/course_chat_pipeline.py (10 hunks)
  • app/pipeline/faq_ingestion_pipeline.py (1 hunks)
  • app/pipeline/prompts/faq_retrieval_prompts.py (1 hunks)
  • app/pipeline/shared/citation_pipeline.py (4 hunks)
  • app/retrieval/faq_retrieval.py (1 hunks)
  • app/vector_database/database.py (2 hunks)
  • app/vector_database/faq_schema.py (1 hunks)
  • app/web/routers/pipelines.py (1 hunks)
  • app/web/routers/webhooks.py (5 hunks)
  • app/web/status/faq_ingestion_status_callback.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Run linters
app/domain/ingestion/deletionPipelineExecutionDto.py

[warning] File needs to be reformatted using Black

app/domain/data/faq_dto.py

[warning] File needs to be reformatted using Black

app/pipeline/chat/course_chat_pipeline.py

[warning] File needs to be reformatted using Black

app/pipeline/shared/citation_pipeline.py

[warning] File needs to be reformatted using Black

app/pipeline/faq_ingestion_pipeline.py

[warning] File needs to be reformatted using Black

app/web/status/faq_ingestion_status_callback.py

[warning] File needs to be reformatted using Black

app/domain/ingestion/ingestion_pipeline_execution_dto.py

[warning] File needs to be reformatted using Black

app/web/routers/webhooks.py

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py

[warning] File needs to be reformatted using Black

🪛 Ruff (0.8.2)
app/pipeline/chat/course_chat_pipeline.py

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

app/pipeline/shared/citation_pipeline.py

1-1: logging imported but unused

Remove unused import: logging

(F401)


80-80: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

app/vector_database/faq_schema.py

1-1: logging imported but unused

Remove unused import: logging

(F401)


31-31: Local variable properties is assigned to but never used

Remove assignment to unused variable properties

(F841)

app/pipeline/faq_ingestion_pipeline.py

6-6: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)


87-88: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (13)
app/pipeline/faq_ingestion_pipeline.py (1)

56-79: Handle potential dto attribute issues in __call__.

self.dto is assumed to have non-None values for faq inside the method. If dto or dto.faq can be None, consider adding safeguards or checks before accessing dto.faq.

Would you like a script to search for other places where dto might be used without null checks across the codebase to confirm it's always set?

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py (2)

148-149: Confirm question assessment coverage.

The condition if not self.assess_question(chat_history, student_query): return [] might skip retrieving FAQs entirely. Ensure you have tests verifying that valid queries pass this check.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


223-225: Record token usage more uniformly.

You're appending self.llm.tokens to self.tokens. Verify that any prior usage of tokens in the pipeline isn’t overwritten or missed. If concurrency is involved, consider thread-safe approaches.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/data/faq_dto.py (1)

4-10: DTO field defaults and alias usage look good.

The structure appears straightforward, with well-defined fields and aliases. Everything looks consistent and clear.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/ingestion/deletionPipelineExecutionDto.py (1)

18-23: Ensure alignment with the ingestion/deletion flow.

The new FaqDeletionExecutionDto references FaqDTO with alias pyrisFaqWebhookDTO. Verify that references in the deletion pipeline only rely on that alias or confirm that all references are consistent.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/ingestion/ingestion_pipeline_execution_dto.py (1)

18-23: DTO usage consistency.

Similar to the deletion DTO, ensure calls expecting pyrisFaqWebhookDTO come from the correct field. This prevents mismatch errors when hooking up your ingestion pipeline.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/common/PipelineEnum.py (1)

17-18: Enums added successfully.

The new enum members align well with existing naming conventions. Consider adding brief docstrings or comments indicating their use cases, but otherwise, the additions look good.

app/pipeline/prompts/faq_retrieval_prompts.py (1)

1-19: Prompts seem clearly defined.

These prompt strings are well-structured and descriptive. To further improve maintainability, consider documenting any guidelines or constraints (e.g., word limit enforcement) so that future maintainers understand the context and usage.

app/vector_database/database.py (1)

32-32: Schema initialization looks good.

The new faqs schema initialization follows the existing pattern. Verify that repeated queries for FAQ data do not require reinitializing the schema for each call. Otherwise, this approach appears consistent with the rest of the file.

app/web/routers/webhooks.py (2)

74-100: Overall concurrency approach looks consistent.
Your usage of Semaphore and finally blocks ensures that resources are always released. This is in line with best practices.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


153-179: Webhook naming and structure are consistent with the codebase.
The new endpoints follow the existing patterns for ingestion and deletion.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/web/routers/pipelines.py (1)

298-305: Implementation of "FAQ_INGESTION" variant is correct.
The newly added switch case for "FAQ_INGESTION" properly returns a default variant, mirroring other feature switch patterns.

app/pipeline/chat/course_chat_pipeline.py (1)

Line range hint 1-9999: Reformat code with Black.

The file needs to be formatted according to Black. Please run the formatter to comply with the pipeline requirements.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/pipeline/faq_ingestion_pipeline.py Outdated Show resolved Hide resolved
app/web/status/faq_ingestion_status_callback.py Outdated Show resolved Hide resolved
app/web/routers/webhooks.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (1)
app/pipeline/shared/citation_pipeline.py (1)

Line range hint 144-144: Fix incorrect logger.error usage

At line 144, the logger.error method is not used correctly. Pass the exception using exc_info or include it in the message.

Replace:

logger.error("citation pipeline failed", e)

With:

logger.error(f"citation pipeline failed: {e}")

Or, to include traceback information:

logger.error("citation pipeline failed", exc_info=True)
🧰 Tools
🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

🧹 Nitpick comments (14)
app/retrieval/faq_retrieval.py (4)

181-187: Remove unnecessary exception handling

In the assess_question method, exceptions are caught and immediately re-raised without additional handling or logging. This try-except block can be removed to simplify the code.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


215-222: Remove unnecessary exception handling

In rewrite_student_query, the try-except block catches exceptions and immediately re-raises them without adding context or handling. Consider removing the try-except block unless specific handling is needed.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


253-260: Remove unnecessary exception handling

In rewrite_elaborated_query, exceptions are caught and immediately re-raised. Unless you're adding specific handling or logging, the try-except block can be omitted.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


66-82: Consider initializing self.tokens in the superclass

If multiple pipelines use self.tokens, consider moving its initialization to the superclass Pipeline for consistency and to avoid duplication.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/pipeline/prompts/faq_citation_prompt.txt (3)

1-1: Add missing commas for clarity

In line 1, add commas after introductory phrases to improve readability.

Change:

In the paragraphs below you are provided with an answer to a question. Underneath the answer you will find the faqs that the answer was based on.

To:

In the paragraphs below, you are provided with an answer to a question. Underneath the answer, you will find the FAQs that the answer was based on.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~1-~1: Possible missing comma found.
Context: In the paragraphs below you are provided with an answer to a qu...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~1-~1: Possible missing comma found.
Context: ...an answer to a question. Underneath the answer you will find the faqs that the answer ...

(AI_HYDRA_LEO_MISSING_COMMA)


7-8: Clarify instructions regarding citations

Lines 7 and 8 contain contradictory instructions about including citations. Consider rephrasing for clarity.


32-32: Reduce excessive exclamation marks

Using multiple exclamation marks may seem unprofessional. Consider reducing them for a formal tone.

Change:

If the answer actually does not contain any information from the paragraphs, please do not include any citations and return '!NONE!'.

To:

If the answer does not contain any information from the paragraphs, please do not include any citations and return 'NONE'.
🧰 Tools
🪛 LanguageTool

[style] ~32-~32: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 2043 characters long)
Context: ...o not include any citations and return '!NONE!'.

(EN_EXCESSIVE_EXCLAMATION)

app/pipeline/shared/citation_pipeline.py (2)

1-1: Remove unused import logging

The logging module is imported but not used in the file. Remove it to clean up the code.

🧰 Tools
🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


84-91: Rename unused loop variable

In the create_formatted_faq_string method, the loop variable i is not used. Rename it to _ to indicate it's intentionally unused.

Change:

for i, faq in enumerate(faqs):

To:

for _, faq in enumerate(faqs):
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/pipeline/chat/course_chat_pipeline.py (5)

13-13: Remove unused import

The SystemMessagePromptTemplate is imported but never used in the code.

-    ChatPromptTemplate, SystemMessagePromptTemplate,
+    ChatPromptTemplate,
🧰 Tools
🪛 Ruff (0.8.2)

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


300-329: Consider using string join for better readability

The FAQ result string concatenation could be improved using join() for better readability and performance.

-            result = ""
-            for faq in self.retrieved_faqs:
-                res = "[FAQ ID: {}, FAQ Question: {}, FAQ Answer: {}]".format(
-                    faq.get(FaqSchema.FAQ_ID.value),
-                    faq.get(FaqSchema.QUESTION_TITLE.value),
-                    faq.get(FaqSchema.QUESTION_ANSWER.value),
-                )
-                result += res
-            return result
+            return "\n".join(
+                "[FAQ ID: {}, FAQ Question: {}, FAQ Answer: {}]".format(
+                    faq.get(FaqSchema.FAQ_ID.value),
+                    faq.get(FaqSchema.QUESTION_TITLE.value),
+                    faq.get(FaqSchema.QUESTION_ANSWER.value),
+                )
+                for faq in self.retrieved_faqs
+            )
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


429-433: Remove extra blank lines

There are unnecessary blank lines after the FAQ tool addition.

            if self.should_allow_faq_tool(dto.course.id):
                tool_list.append(faq_content_retrieval)
-

-

-
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


454-459: Consolidate progress callbacks

The "Augmenting response..." callback is duplicated for paragraphs and FAQs. Consider consolidating these callbacks.

-                self.callback.in_progress("Augmenting response ...")
                 out = self.citation_pipeline(self.retrieved_paragraphs, out, InformationType.PARAGRAPHS)
             self.tokens.extend(self.citation_pipeline.tokens)

             if self.retrieved_faqs:
-                self.callback.in_progress("Augmenting response ...")
                 out = self.citation_pipeline(self.retrieved_faqs, out, InformationType.FAQS, base_url=dto.settings.artemis_base_url)
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


511-528: Fix docstring return description

The docstring incorrectly mentions "lectures" instead of "FAQs" in the return description.

        """
        Checks if there are indexed faqs for the given course

        :param course_id: The course ID
-        :return: True if there are indexed lectures for the course, False otherwise
+        :return: True if there are indexed FAQs for the course, False otherwise
        """
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b12cd6 and 1017477.

📒 Files selected for processing (4)
  • app/pipeline/chat/course_chat_pipeline.py (10 hunks)
  • app/pipeline/prompts/faq_citation_prompt.txt (1 hunks)
  • app/pipeline/shared/citation_pipeline.py (5 hunks)
  • app/retrieval/faq_retrieval.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
app/pipeline/prompts/faq_citation_prompt.txt

[uncategorized] ~1-~1: Possible missing comma found.
Context: In the paragraphs below you are provided with an answer to a qu...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~1-~1: Possible missing comma found.
Context: ...an answer to a question. Underneath the answer you will find the faqs that the answer ...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~32-~32: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 2043 characters long)
Context: ...o not include any citations and return '!NONE!'.

(EN_EXCESSIVE_EXCLAMATION)

🪛 Ruff (0.8.2)
app/pipeline/chat/course_chat_pipeline.py

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

app/pipeline/shared/citation_pipeline.py

1-1: logging imported but unused

Remove unused import: logging

(F401)


83-83: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

🪛 GitHub Actions: Run linters
app/pipeline/chat/course_chat_pipeline.py

[warning] File needs to be reformatted using Black

app/pipeline/shared/citation_pipeline.py

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py

[warning] File needs to be reformatted using Black

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (1)
app/pipeline/chat/course_chat_pipeline.py (1)

Line range hint 1-537: Implementation looks good! 👍

The FAQ integration is well-implemented, following existing patterns and conventions. The code successfully enables IRIS to use FAQs as intended in the PR objectives.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved
app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved
app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved
app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved
app/pipeline/shared/citation_pipeline.py Show resolved Hide resolved
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 13, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
app/pipeline/chat/course_chat_pipeline.py (5)

13-13: Remove unused import

The import SystemMessagePromptTemplate is not used in the code.

-from langchain_core.prompts import (
-    ChatPromptTemplate, SystemMessagePromptTemplate,
-)
+from langchain_core.prompts import ChatPromptTemplate
🧰 Tools
🪛 Ruff (0.8.2)

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


307-336: Add error handling for FAQ retrieval

While the implementation is good, it should handle potential errors from the FAQ retriever, similar to other tools in the pipeline.

 def faq_content_retrieval() -> str:
     self.callback.in_progress("Retrieving faq content ...")
-    self.retrieved_faqs = self.faq_retriever(
-        chat_history=history,
-        student_query=query.contents[0].text_content,
-        result_limit=5,
-        course_name=dto.course.name,
-        course_id=dto.course.id,
-        base_url=dto.settings.artemis_base_url,
-    )
+    try:
+        self.retrieved_faqs = self.faq_retriever(
+            chat_history=history,
+            student_query=query.contents[0].text_content,
+            result_limit=5,
+            course_name=dto.course.name,
+            course_id=dto.course.id,
+            base_url=dto.settings.artemis_base_url,
+        )
+    except Exception as e:
+        logger.error("Error retrieving FAQs", exc_info=e)
+        return "Error retrieving FAQ content"
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


436-440: Fix formatting: Remove extra blank lines

There are unnecessary blank lines that should be removed to maintain consistent formatting.

 if self.should_allow_faq_tool(dto.course.id):
     tool_list.append(faq_content_retrieval)
-

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


518-535: Fix docstring return description

The docstring's return description mentions lectures instead of FAQs.

     """
     Checks if there are indexed faqs for the given course

     :param course_id: The course ID
-    :return: True if there are indexed lectures for the course, False otherwise
+    :return: True if there are indexed FAQs for the course, False otherwise
     """
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


543-544: Remove extra blank lines at the end of file

There are unnecessary blank lines at the end of the file.

def datetime_to_string(dt: Optional[datetime]) -> str:
    if dt is None:
        return "No date provided"
    else:
        return dt.strftime("%Y-%m-%d %H:%M:%S")
-

-
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1017477 and cfb1240.

📒 Files selected for processing (1)
  • app/pipeline/chat/course_chat_pipeline.py (10 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
app/pipeline/chat/course_chat_pipeline.py

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters
app/pipeline/chat/course_chat_pipeline.py

[warning] File needs to be reformatted using Black

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (2)
app/pipeline/chat/course_chat_pipeline.py (2)

86-86: LGTM! Good addition of the FAQ retrieval attribute

The retrieved_faqs attribute is well-placed alongside the existing retrieved_paragraphs attribute, maintaining consistency in the class structure.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


464-466: LGTM! Good integration with the citation pipeline

The FAQ augmentation is well-integrated with the existing citation pipeline, maintaining consistency with the paragraph augmentation pattern.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (14)
app/web/routers/webhooks.py (4)

74-99: Improve function documentation.

The docstring should be more descriptive and include parameter and return type information.

-    """
-    Run the exercise chat pipeline in a separate thread
-    """
+    """
+    Run the FAQ update pipeline in a separate thread.
+    
+    Args:
+        dto (FaqIngestionPipelineExecutionDto): The data transfer object containing FAQ information
+        
+    Note:
+        This function is executed with semaphore control to limit concurrent operations
+    """
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


102-126: Address language consistency and error message specificity.

Two improvements needed:

  1. Translate the German comment to English
  2. Make the error message more specific to FAQ deletion
-            # Hier würd dann die Methode zum entfernen aus der Datenbank kommen
+            # Method to remove FAQ from the database
             pipeline = FaqIngestionPipeline(client=client, dto=None, callback=callback)
             pipeline.delete_faq(dto.faq.faq_id, dto.faq.course_id)

         except Exception as e:
-            logger.error(f"Error Ingestion pipeline: {e}")
+            logger.error(f"Error in FAQ deletion pipeline: {e}")
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


153-165: Remove unnecessary return statement.

FastAPI automatically returns None for void functions, making the explicit return statement redundant.

    thread = Thread(target=run_faq_update_pipeline_worker, args=(dto,))
    thread.start()
-    return
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


166-179: Remove unnecessary return statement.

Similar to the ingestion webhook, the explicit return statement is redundant.

    logging.info("Starting faq deletion")
    thread = Thread(target=run_faq_delete_pipeline_worker, args=(dto,))
    thread.start()
-    return
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/pipeline/chat/course_chat_pipeline.py (4)

13-13: Remove unused import.

The SystemMessagePromptTemplate import is not used in the code.

-    ChatPromptTemplate, SystemMessagePromptTemplate,
+    ChatPromptTemplate,
🧰 Tools
🪛 Ruff (0.8.2)

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


307-336: Improve FAQ result formatting for better readability.

Consider using a more structured format with line breaks for better readability.

-                res = "[FAQ ID: {}, FAQ Question: {}, FAQ Answer: {}]".format(
-                    faq.get(FaqSchema.FAQ_ID.value),
-                    faq.get(FaqSchema.QUESTION_TITLE.value),
-                    faq.get(FaqSchema.QUESTION_ANSWER.value),
-                )
+                res = """
+FAQ Entry:
+  ID: {}
+  Question: {}
+  Answer: {}
+
+""".format(
+                    faq.get(FaqSchema.FAQ_ID.value),
+                    faq.get(FaqSchema.QUESTION_TITLE.value),
+                    faq.get(FaqSchema.QUESTION_ANSWER.value),
+                )
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


436-440: Remove extra newlines.

Multiple consecutive empty lines affect code readability.

            if self.should_allow_faq_tool(dto.course.id):
                tool_list.append(faq_content_retrieval)
-

-
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


518-524: Fix incorrect docstring return description.

The return description mentions "lectures" instead of "FAQs".

        """
        Checks if there are indexed faqs for the given course

        :param course_id: The course ID
-        :return: True if there are indexed lectures for the course, False otherwise
+        :return: True if there are indexed FAQs for the course, False otherwise
        """
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py (6)

70-74: Consider making LLM requirements configurable

The hardcoded values for gpt_version_equivalent and context_length should be moved to configuration to make the code more maintainable and flexible.

-                gpt_version_equivalent=4.25,
-                context_length=16385,
-                privacy_compliance=True,
+                gpt_version_equivalent=config.GPT_VERSION,
+                context_length=config.CONTEXT_LENGTH,
+                privacy_compliance=config.PRIVACY_COMPLIANCE,
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


150-155: Make hybrid_factor configurable

The hardcoded value of 0.9 for hybrid_factor should be configurable to allow fine-tuning of the retrieval process.

         response = self.search_in_db(
             query=rewritten_query,
-            hybrid_factor=0.9,
+            hybrid_factor=self.config.HYBRID_FACTOR,
             result_limit=result_limit,
             course_id=course_id,
         )
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


184-190: Extract common error handling pattern

The same error handling pattern is repeated across multiple methods. Consider extracting it into a decorator.

def handle_llm_errors(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            logger.error(f"Error in {func.__name__}: {str(e)}")
            raise e
    return wrapper

@handle_llm_errors
def assess_question(self, ...):
    # Implementation

Also applies to: 217-225, 255-263

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


276-286: Add logging for filter conditions

Add logging for the filter condition to help with debugging and monitoring.

         if course_id:
+            logger.debug(f"Applying course_id filter: {course_id}")
             # Create a filter for course_id
             filter_weaviate = Filter.by_property(FaqSchema.COURSE_ID.value).equal(
                 course_id
             )
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


316-358: Optimize thread pool usage

Consider using a single thread pool for both sets of tasks to reduce overhead.

-        with concurrent.futures.ThreadPoolExecutor() as executor:
+        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
             # Schedule the rewrite tasks to run in parallel
             rewritten_query_future = executor.submit(
                 self.rewrite_student_query,
                 chat_history,
                 student_query,
                 course_language,
                 course_name,
             )
             # ... other tasks ...
-
-            # Get the results once both tasks are complete
-            rewritten_query = rewritten_query_future.result()
-            hypothetical_answer_query = hypothetical_answer_query_future.result()
-
-            # Execute the database search tasks
-        with concurrent.futures.ThreadPoolExecutor() as executor:
             response_future = executor.submit(
                 self.search_in_db,
                 query=rewritten_query_future.result(),
                 hybrid_factor=0.9,
                 result_limit=result_limit,
                 course_id=course_id,
             )
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


360-385: Make default language configurable

The hardcoded default language "english" should be moved to configuration.

     def fetch_course_language(self, course_id):
         """
         Fetch the language of the course based on the course ID.
         If no specific language is set, it defaults to English.
         """
-        course_language = "english"
+        course_language = self.config.DEFAULT_COURSE_LANGUAGE
🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfb1240 and ce2904a.

📒 Files selected for processing (3)
  • app/pipeline/chat/course_chat_pipeline.py (10 hunks)
  • app/retrieval/faq_retrieval.py (1 hunks)
  • app/web/routers/webhooks.py (5 hunks)
🧰 Additional context used
🪛 GitHub Actions: Run linters
app/web/routers/webhooks.py

[warning] File needs to be reformatted using Black

app/pipeline/chat/course_chat_pipeline.py

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py

[warning] File needs to be reformatted using Black

🪛 Ruff (0.8.2)
app/pipeline/chat/course_chat_pipeline.py

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (3)
app/web/routers/webhooks.py (1)

11-19: LGTM! Clean import organization.

The new imports and DTOs are well-organized and properly support the FAQ functionality.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py (2)

42-58: Well-implemented merge function!

The merge_retrieved_chunks function efficiently handles deduplication using dictionary lookups, and the implementation is clear and well-documented.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black


113-113: Use logger.info instead of logging.info

For consistency with the module-level logger initialization, use logger.info instead of logging.info.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 13, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (9)
app/pipeline/shared/citation_pipeline.py (1)

81-96: Improve FAQ string formatting implementation.

Several improvements can be made to the FAQ string formatting:

  1. The loop control variable i is unused
  2. Consider using f-strings for better readability
  3. Consider using string joining instead of concatenation for better performance
     def create_formatted_faq_string(self, faqs, base_url):
         formatted_string = ""
-        for i, faq in enumerate(faqs):
-            faq = "FAQ ID {}, CourseId {} , FAQ Question title {} and FAQ Question Answer {} and FAQ link {}".format(
-                faq.get(FaqSchema.FAQ_ID.value),
-                faq.get(FaqSchema.COURSE_ID.value),
-                faq.get(FaqSchema.QUESTION_TITLE.value),
-                faq.get(FaqSchema.QUESTION_ANSWER.value),
-                f"{base_url}/courses/{faq.get(FaqSchema.COURSE_ID.value)}/faq/?faqId={faq.get(FaqSchema.FAQ_ID.value)}",
-            )
-            formatted_string += faq
+        formatted_faqs = []
+        for _, faq in enumerate(faqs):
+            faq_link = f"{base_url}/courses/{faq.get(FaqSchema.COURSE_ID.value)}/faq/?faqId={faq.get(FaqSchema.FAQ_ID.value)}"
+            formatted_faq = (
+                f"FAQ ID {faq.get(FaqSchema.FAQ_ID.value)}, "
+                f"CourseId {faq.get(FaqSchema.COURSE_ID.value)}, "
+                f"FAQ Question title {faq.get(FaqSchema.QUESTION_TITLE.value)} and "
+                f"FAQ Question Answer {faq.get(FaqSchema.QUESTION_ANSWER.value)} and "
+                f"FAQ link {faq_link}"
+            )
+            formatted_faqs.append(formatted_faq)
+        formatted_string = "\n".join(formatted_faqs)
🧰 Tools
🪛 Ruff (0.8.2)

86-86: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

app/pipeline/faq_ingestion_pipeline.py (2)

88-89: Combine nested context managers for better readability.

The nested with statements can be combined using a single with statement.

Apply this diff:

-        with batch_update_lock:
-            with self.collection.batch.rate_limit(requests_per_minute=600) as batch:
+        with batch_update_lock, self.collection.batch.rate_limit(requests_per_minute=600) as batch:
🧰 Tools
🪛 Ruff (0.8.2)

88-89: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


137-141: Make the return value explicit in chunk_data method.

The method implicitly returns None. For better clarity, make the return value explicit.

Apply this diff:

    def chunk_data(self, path: str) -> List[Dict[str, str]]:
        """
        Faqs are so small, they do not need to be chunked into smaller parts
        """
-        return
+        return []  # Return empty list as FAQs don't need chunking
app/web/routers/webhooks.py (3)

115-116: Translate German comment to English.

For consistency, translate the German comment "Hier würd dann die Methode zum entfernen aus der Datenbank kommen" to English.

Apply this diff:

-            # Hier würd dann die Methode zum entfernen aus der Datenbank kommen
+            # Here comes the method to remove from the database

76-124: Reduce code duplication in worker functions.

The error handling pattern is duplicated across worker functions. Consider extracting it into a decorator or utility function.

Create a new utility function:

def handle_pipeline_errors(func):
    def wrapper(*args, **kwargs):
        with semaphore:
            try:
                return func(*args, **kwargs)
            except Exception as e:
                logger.error(f"Error in pipeline: {e}")
                logger.error(traceback.format_exc())
                capture_exception(e)
            finally:
                semaphore.release()
    return wrapper

Then apply it to the worker functions:

+@handle_pipeline_errors
 def run_faq_update_pipeline_worker(dto: FaqIngestionPipelineExecutionDto):
     """
     Run the exercise chat pipeline in a separate thread
     """
-    with semaphore:
-        try:
-            callback = FaqIngestionStatus(...)
-            pipeline()
-        except Exception as e:
-            logger.error(f"Error Faq Ingestion pipeline: {e}")
-            logger.error(traceback.format_exc())
-            capture_exception(e)
-        finally:
-            semaphore.release()
+    callback = FaqIngestionStatus(...)
+    pipeline()

164-165: Remove unnecessary return statements.

The empty return statements in webhook endpoints are unnecessary as FastAPI automatically returns None.

Apply this diff:

    def faq_ingestion_webhook(dto: FaqIngestionPipelineExecutionDto):
        thread = Thread(target=run_faq_update_pipeline_worker, args=(dto,))
        thread.start()
-       return

    def faq_deletion_webhook(dto: FaqDeletionExecutionDto):
        logging.info("Starting faq deletion")
        thread = Thread(target=run_faq_delete_pipeline_worker, args=(dto,))
        thread.start()
-       return

Also applies to: 178-179

app/retrieval/faq_retrieval.py (3)

45-60: Consider preserving chunk IDs in the returned list.

The function discards the IDs when returning the merged chunks. Consider preserving the IDs for traceability and debugging purposes.

Apply this diff to preserve the IDs:

-    return [properties for uuid, properties in merged_chunks.items()]
+    return [{"id": uuid, "properties": properties} for uuid, properties in merged_chunks.items()]

185-190: Simplify error handling.

The try-except block that just re-raises the exception is unnecessary and can be removed.

Apply this diff:

-        try:
-            response = (prompt | self.pipeline).invoke({})
-            logger.info(f"Response from assessment pipeline: {response}")
-            return response == "YES"
-        except Exception as e:
-            raise e
+        response = (prompt | self.pipeline).invoke({})
+        logger.info(f"Response from assessment pipeline: {response}")
+        return response == "YES"

364-364: Capitalize the default language.

For consistency with language codes, capitalize the default language.

Apply this diff:

-        course_language = "english"
+        course_language = "English"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce2904a and 27abf69.

📒 Files selected for processing (10)
  • app/domain/data/faq_dto.py (1 hunks)
  • app/domain/ingestion/deletionPipelineExecutionDto.py (2 hunks)
  • app/domain/ingestion/ingestion_pipeline_execution_dto.py (2 hunks)
  • app/pipeline/chat/course_chat_pipeline.py (10 hunks)
  • app/pipeline/chat/exercise_chat_agent_pipeline.py (1 hunks)
  • app/pipeline/faq_ingestion_pipeline.py (1 hunks)
  • app/pipeline/shared/citation_pipeline.py (5 hunks)
  • app/retrieval/faq_retrieval.py (1 hunks)
  • app/web/routers/webhooks.py (5 hunks)
  • app/web/status/faq_ingestion_status_callback.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • app/pipeline/chat/exercise_chat_agent_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • app/domain/ingestion/deletionPipelineExecutionDto.py
  • app/domain/data/faq_dto.py
  • app/web/status/faq_ingestion_status_callback.py
  • app/domain/ingestion/ingestion_pipeline_execution_dto.py
🧰 Additional context used
🪛 Ruff (0.8.2)
app/pipeline/chat/course_chat_pipeline.py

14-14: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

app/pipeline/faq_ingestion_pipeline.py

6-6: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)


88-89: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

app/pipeline/shared/citation_pipeline.py

1-1: logging imported but unused

Remove unused import: logging

(F401)


86-86: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

🪛 GitHub Actions: Run linters
app/pipeline/chat/course_chat_pipeline.py

[error] 12-12: Unused import: 'langchain_core.prompts.SystemMessagePromptTemplate' is imported but never used


[warning] 313-316: Multiple lines exceed maximum length of 120 characters


[warning] 318-318: Line too long (140 > 120 characters)

app/pipeline/faq_ingestion_pipeline.py

[error] 6-6: Unused import: 'openai.OpenAI' is imported but never used

app/pipeline/shared/citation_pipeline.py

[error] 1-1: Unused import: 'logging' is imported but never used


[warning] 107-107: Line too long (127 > 120 characters)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (9)
app/pipeline/shared/citation_pipeline.py (2)

20-22: LGTM! Clean enum implementation.

The InformationType enum is well-defined and follows Python best practices by inheriting from both str and Enum.


1-3: ⚠️ Potential issue

Fix duplicate logger imports.

The code has redundant imports for logging functionality:

  • Line 1: import logging
  • Line 3: from asyncio.log import logger
-import logging
-from asyncio.log import logger
+import logging

+logger = logging.getLogger(__name__)

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

🪛 GitHub Actions: Run linters

[error] 1-1: Unused import: 'logging' is imported but never used

app/pipeline/faq_ingestion_pipeline.py (1)

1-6: 🛠️ Refactor suggestion

Consolidate logging imports and remove unused import.

The code has redundant logging imports and an unused OpenAI import:

  1. Remove the unused openai.OpenAI import
  2. Consolidate logging by using a single logger instance

Apply this diff:

import logging
-from asyncio.log import logger
from typing import Optional, List, Dict
from langchain_core.output_parsers import StrOutputParser
-from openai import OpenAI
from weaviate import WeaviateClient

+logger = logging.getLogger(__name__)

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

6-6: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)

🪛 GitHub Actions: Run linters

[error] 6-6: Unused import: 'openai.OpenAI' is imported but never used

app/retrieval/faq_retrieval.py (5)

70-87: LGTM!

The initialization is well-structured with appropriate requirements and configurations for the LLM model, embedding model, and reranker pipeline.


132-162: LGTM!

The function implements a robust basic retrieval logic with appropriate question assessment and query rewriting.


218-225: Simplify error handling.

The try-except block that just re-raises the exception is unnecessary and can be removed.

Apply this diff:

-        try:
-            response = (prompt | self.pipeline).invoke({})
-            token_usage = self.llm.tokens
-            token_usage.pipeline = PipelineEnum.IRIS_FAQ_RETRIEVAL_PIPELINE
-            self.tokens.append(self.llm.tokens)
-            return response
-        except Exception as e:
-            raise e
+        response = (prompt | self.pipeline).invoke({})
+        token_usage = self.llm.tokens
+        token_usage.pipeline = PipelineEnum.IRIS_FAQ_RETRIEVAL_PIPELINE
+        self.tokens.append(self.llm.tokens)
+        return response

256-263: Simplify error handling.

The try-except block that just re-raises the exception is unnecessary and can be removed.

Apply this diff:

-        try:
-            response = (prompt | self.pipeline).invoke({})
-            token_usage = self.llm.tokens
-            token_usage.pipeline = PipelineEnum.IRIS_FAQ_RETRIEVAL_PIPELINE
-            self.tokens.append(self.llm.tokens)
-            return response
-        except Exception as e:
-            raise e
+        response = (prompt | self.pipeline).invoke({})
+        token_usage = self.llm.tokens
+        token_usage.pipeline = PipelineEnum.IRIS_FAQ_RETRIEVAL_PIPELINE
+        self.tokens.append(self.llm.tokens)
+        return response

265-302: LGTM!

The function implements a robust database search with appropriate filtering and hybrid search capabilities.

app/pipeline/chat/course_chat_pipeline.py (1)

526-541: LGTM!

The function implements a robust check for FAQ availability with appropriate handling of course_id.

app/pipeline/shared/citation_pipeline.py Show resolved Hide resolved
app/pipeline/shared/citation_pipeline.py Show resolved Hide resolved
app/pipeline/faq_ingestion_pipeline.py Show resolved Hide resolved
app/pipeline/faq_ingestion_pipeline.py Show resolved Hide resolved
app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved
app/pipeline/chat/course_chat_pipeline.py Outdated Show resolved Hide resolved
app/pipeline/chat/course_chat_pipeline.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
app/pipeline/chat/course_chat_pipeline.py (1)

526-541: Fix docstring return description.

The docstring incorrectly states "indexed lectures" instead of "indexed FAQs" in the return description.

Apply this diff to fix the docstring:

     def should_allow_faq_tool(self, course_id: int) -> bool:
         """
         Checks if there are indexed faqs for the given course

         :param course_id: The course ID
-        :return: True if there are indexed lectures for the course, False otherwise
+        :return: True if there are indexed FAQs for the course, False otherwise
         """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27abf69 and 82a55c3.

📒 Files selected for processing (1)
  • app/pipeline/chat/course_chat_pipeline.py (9 hunks)
🧰 Additional context used
🪛 GitHub Actions: Run linters
app/pipeline/chat/course_chat_pipeline.py

[warning] 312-312: Line too long (122 > 120 characters)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (4)
app/pipeline/chat/course_chat_pipeline.py (4)

23-23: LGTM! Clean integration of FAQ-related imports and attributes.

The new imports and class attribute are well-organized and follow the existing patterns in the codebase.

Also applies to: 45-49, 86-86


121-121: LGTM! Clean initialization of FAQ retriever.

The FAQ retriever is properly initialized following the same pattern as the lecture retriever.


439-441: LGTM! Clean integration of FAQ support in the pipeline.

The FAQ tool is properly integrated into the pipeline, following the same patterns as the lecture tool:

  1. Conditional tool addition based on availability
  2. Response augmentation using the citation pipeline

Also applies to: 467-474


309-339: 🛠️ Refactor suggestion

Fix line length in docstring and consider enhancing error handling.

  1. The docstring contains lines exceeding 120 characters.
  2. The function could benefit from error handling for edge cases.

Apply this diff to fix the line length issues and add error handling:

     def faq_content_retrieval() -> str:
         """
         Use this tool to retrieve information from indexed FAQs.
-        It is suitable when no other tool fits, you think it is a common question or the question is frequently asked,
-        or the question could be effectively answered by an FAQ. Also use this if the question is explicitly
-        organizational and course-related. An organizational question about the course might be
-        "What is the course structure?" or "How do I enroll?" or exam related content like "When is the exam".
-        The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
-        Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
-        Respond to the query concisely and solely using the answer from the relevant FAQs.
+        It is suitable when no other tool fits, you think it is a common question or the question is 
+        frequently asked, or the question could be effectively answered by an FAQ. Also use this if the 
+        question is explicitly organizational and course-related.
+        
+        An organizational question about the course might be "What is the course structure?" or 
+        "How do I enroll?" or exam related content like "When is the exam".
+        
+        The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
+        Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
+        
+        Respond to the query concisely and solely using the answer from the relevant FAQs.
         This tool should only be used once per query.
         """
         self.callback.in_progress("Retrieving faq content ...")
-        self.retrieved_faqs = self.faq_retriever(
-            chat_history=history,
-            student_query=query.contents[0].text_content,
-            result_limit=10,
-            course_name=dto.course.name,
-            course_id=dto.course.id,
-            base_url=dto.settings.artemis_base_url,
-        )
+        try:
+            self.retrieved_faqs = self.faq_retriever(
+                chat_history=history,
+                student_query=query.contents[0].text_content,
+                result_limit=10,
+                course_name=dto.course.name,
+                course_id=dto.course.id,
+                base_url=dto.settings.artemis_base_url,
+            )
+        except Exception as e:
+            logger.error("Failed to retrieve FAQs", exc_info=e)
+            return "Failed to retrieve FAQs. Please try again later."

         result = ""
         for faq in self.retrieved_faqs:
             res = "[FAQ ID: {}, FAQ Question: {}, FAQ Answer: {}]".format(
                 faq.get(FaqSchema.FAQ_ID.value),
                 faq.get(FaqSchema.QUESTION_TITLE.value),
                 faq.get(FaqSchema.QUESTION_ANSWER.value),
             )
             result += res
         return result

Likely invalid or redundant comment.

🧰 Tools
🪛 GitHub Actions: Run linters

[warning] 312-312: Line too long (122 > 120 characters)

app/pipeline/faq_ingestion_pipeline.py Outdated Show resolved Hide resolved
app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (9)
app/pipeline/faq_ingestion_pipeline.py (4)

25-26: Simplify lock variable assignment.

The reassignment of batch_update_lock is redundant since it's already imported from lecture_ingestion_pipeline.

- # we use the same lock as the lecture ingestion pipeline
- batch_update_lock = batch_update_lock

72-79: Improve error handling and fix grammatical error.

Two suggestions for improvement:

  1. The error message "Failed to faq into the database" is grammatically incorrect
  2. Consider catching specific exceptions instead of generic Exception
-        except Exception as e:
+        except (ValueError, IOError) as e:
             logger.error(f"Error updating faq: {e}")
             self.callback.error(
-                f"Failed to faq into the database: {e}",
+                f"Failed to ingest FAQ into the database: {e}",
                 exception=e,
                 tokens=self.tokens,
             )

88-89: Combine nested with statements.

Combine the nested with statements for better readability.

-        with batch_update_lock:
-            with self.collection.batch.rate_limit(requests_per_minute=600) as batch:
+        with batch_update_lock, self.collection.batch.rate_limit(requests_per_minute=600) as batch:
🧰 Tools
🪛 Ruff (0.8.2)

88-89: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


137-141: Fix return value to match type hint.

The method signature indicates it should return List[Dict[str, str]], but it returns None. Consider returning an empty list to match the type hint.

     def chunk_data(self, path: str) -> List[Dict[str, str]]:
         """
         Faqs are so small, they do not need to be chunked into smaller parts
         """
-        return
+        return []
app/retrieval/basic_retrieval.py (5)

44-63: Correct the docstring to reflect actual parameters.
The docstring mentions a user_question parameter, but this function signature only accepts prompt and chat_history. Update the docstring for clarity and consistency.

-    Adds the chat history and user question to the prompt
-        :param chat_history: The chat history
-        :param user_question: The user question
+    Adds the last four messages from the chat history to the prompt
+        :param chat_history: The chat history

98-128: Handle possible variations of "YES" in assessment logic.
Relying on an exact string match may miss valid affirmative answers (like "Yes " or "yes"). A safer check could be case-insensitive and trimmed.

- return response == "YES"
+ return response.strip().lower() == "yes"

166-197: Gracefully handle embedding or query failures.
If self.llm_embedding.embed(query) or the query.hybrid(...) call fails, the method will raise an exception. Consider adding localized error handling or retries without overcomplicating the flow.


198-267: Reuse thread pools for performance.
Creating two separate ThreadPoolExecutors in quick succession may add unnecessary overhead. Consider reusing a single executor in the same scope (or a shared executor at class level) to reduce resource churn.
[performance]


275-298: Normalize or validate retrieved languages.
Users may store inconsistent language values. Consider normalizing or at least validating fetched results to ensure recognized language codes (e.g., "english", "EN", or locale tags).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf52342 and 6163e6f.

📒 Files selected for processing (3)
  • app/pipeline/faq_ingestion_pipeline.py (1 hunks)
  • app/retrieval/basic_retrieval.py (1 hunks)
  • app/retrieval/faq_retrieval.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/retrieval/faq_retrieval.py
🧰 Additional context used
🪛 Ruff (0.8.2)
app/pipeline/faq_ingestion_pipeline.py

1-1: threading imported but unused

Remove unused import: threading

(F401)


88-89: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Mend Security Check
🔇 Additional comments (7)
app/pipeline/faq_ingestion_pipeline.py (4)

1-1: Remove unused import.

The threading module is imported but never used since the lock is imported from lecture_ingestion_pipeline.

- import threading
🧰 Tools
🪛 Ruff (0.8.2)

1-1: threading imported but unused

Remove unused import: threading

(F401)


44-48: Consider using a standard GPT version number.

The GPT version requirement of 4.25 seems unusually specific. Consider using a standard version number (e.g., 4.0) unless there's a specific feature requirement from version 4.25.

Could you clarify if there are specific features from GPT-4.25 that are required for this implementation?


98-104: Improve error handling in batch update.


106-120: Fix inconsistent return values in delete_old_faqs method.

app/retrieval/basic_retrieval.py (3)

26-41: Validate data structure of retrieved chunks.
Currently, the code assumes that each chunk contains an "id" and "properties". Consider adding safeguards or checks to handle potentially malformed chunks, especially if other pipeline stages might generate incomplete data.


130-164: Confirm concurrency safety for token management.
This method appends token usage to self.tokens in a try-except block. Consider concurrency patterns that might arise from parallel calls. Centralize and handle token usage consistently across all pipeline methods.


268-274: Interface method looks appropriate.
The abstract get_schema_properties method is well defined for subclasses to implement. No concerns here.

app/retrieval/basic_retrieval.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
app/retrieval/faq_retrieval_utils.py (1)

25-40: Translate docstring to English for consistency.

The docstring is in German while the rest of the codebase is in English. This could make it harder for non-German speaking developers to understand the function's purpose.

Apply this diff to translate the docstring:

-    """
-    Formatiert die abgerufenen FAQs in einen String.
-
-    :param retrieved_faqs: Liste der abgerufenen FAQs
-    :return: Formatierter String mit den FAQ-Daten
-    """
+    """
+    Formats the retrieved FAQs into a string.
+
+    :param retrieved_faqs: List of retrieved FAQs
+    :return: Formatted string containing the FAQ data
+    """
app/pipeline/chat/exercise_chat_agent_pipeline.py (1)

42-43: Remove unused import.

The FaqSchema import is not used in the code.

Apply this diff:

-from ...vector_database.faq_schema import FaqSchema

Also applies to: 46-46

app/pipeline/chat/course_chat_pipeline.py (1)

45-46: Remove unused import.

The FaqSchema import is not used in the code.

Apply this diff:

-from ...vector_database.faq_schema import FaqSchema

Also applies to: 49-49

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6163e6f and 430b777.

📒 Files selected for processing (3)
  • app/pipeline/chat/course_chat_pipeline.py (8 hunks)
  • app/pipeline/chat/exercise_chat_agent_pipeline.py (7 hunks)
  • app/retrieval/faq_retrieval_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
app/pipeline/chat/course_chat_pipeline.py

49-49: ...vector_database.faq_schema.FaqSchema imported but unused

Remove unused import: ...vector_database.faq_schema.FaqSchema

(F401)

app/pipeline/chat/exercise_chat_agent_pipeline.py

46-46: ...vector_database.faq_schema.FaqSchema imported but unused

Remove unused import: ...vector_database.faq_schema.FaqSchema

(F401)

🔇 Additional comments (11)
app/retrieval/faq_retrieval_utils.py (1)

6-22: LGTM!

The function correctly checks for the existence of indexed FAQs for a given course. The implementation is efficient as it limits the query to 1 result since we only need to check existence.

app/pipeline/chat/exercise_chat_agent_pipeline.py (5)

109-109: LGTM!

The attribute is correctly typed and initialized.


143-144: LGTM!

Good job on improving code clarity by using more descriptive variable names and maintaining consistency with other retrievers.


401-424: Fix line length issues in docstring.

The docstring contains lines that exceed the maximum length of 120 characters.

Apply this diff to fix the line length issues:

-            It is suitable when no other tool fits, it is a common question or the question is frequently asked,
-            or the question could be effectively answered by an FAQ. Also use this if the question is explicitly
-            organizational and course-related. An organizational question about the course might be
-            "What is the course structure?" or "How do I enroll?" or exam related content like "When is the exam".
-            The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
-            Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
-            Respond to the query concisely and solely using the answer from the relevant FAQs.
-            This tool should only be used once per query.
+            It is suitable when no other tool fits, it is a common question or the question is frequently asked, or
+            the question could be effectively answered by an FAQ. Also use this if the question is explicitly organizational
+            and course-related.
+            An organizational question about the course might be "What is the course structure?" or "How do I enroll?" or
+            exam related content like "When is the exam".
+            The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs. Each FAQ follows
+            this format: FAQ ID, FAQ Question, FAQ Answer.
+            Respond to the query concisely and solely using the answer from the relevant FAQs. This tool should only be used
+            once per query.

544-547: LGTM!

The FAQ tool is correctly added to the tool list with appropriate conditional logic.


570-572: LGTM!

The code has been reformatted for better readability without any functional changes.

app/pipeline/chat/course_chat_pipeline.py (5)

87-87: LGTM!

The attribute is correctly typed and initialized.


121-122: LGTM!

Good job on improving code clarity by using more descriptive variable names and maintaining consistency with other retrievers.


310-334: Fix line length issues in docstring.

The docstring contains lines that exceed the maximum length of 120 characters.

Apply this diff to fix the line length issues:

-            It is suitable when no other tool fits, it is a common question or the question is frequently asked,
-            or the question could be effectively answered by an FAQ. Also use this if the question is explicitly
-            organizational and course-related. An organizational question about the course might be
-            "What is the course structure?" or "How do I enroll?" or exam related content like "When is the exam".
-            The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
-            Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
-            Respond to the query concisely and solely using the answer from the relevant FAQs.
-            This tool should only be used once per query.
+            It is suitable when no other tool fits, it is a common question or the question is frequently asked, or
+            the question could be effectively answered by an FAQ. Also use this if the question is explicitly organizational
+            and course-related.
+            An organizational question about the course might be "What is the course structure?" or "How do I enroll?" or
+            exam related content like "When is the exam".
+            The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs. Each FAQ follows
+            this format: FAQ ID, FAQ Question, FAQ Answer.
+            Respond to the query concisely and solely using the answer from the relevant FAQs. This tool should only be used
+            once per query.

433-435: LGTM!

The FAQ tool is correctly added to the tool list with appropriate conditional logic.


461-468: LGTM!

The FAQ citation is correctly implemented with appropriate information type and base URL.

bassner
bassner previously approved these changes Jan 30, 2025
# Conflicts:
#	app/common/PipelineEnum.py
#	app/web/routers/pipelines.py
#	docker/pyris-dev.yml
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 430b777 and 1242ef0.

📒 Files selected for processing (3)
  • app/common/PipelineEnum.py (1 hunks)
  • app/pipeline/chat/exercise_chat_agent_pipeline.py (6 hunks)
  • app/web/routers/pipelines.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/common/PipelineEnum.py
  • app/web/routers/pipelines.py
🧰 Additional context used
🪛 Ruff (0.8.2)
app/pipeline/chat/exercise_chat_agent_pipeline.py

46-46: ...vector_database.faq_schema.FaqSchema imported but unused

Remove unused import: ...vector_database.faq_schema.FaqSchema

(F401)

🪛 GitHub Actions: Run linters
app/pipeline/chat/exercise_chat_agent_pipeline.py

[warning] 46-46: Unused import: '...vector_database.faq_schema.FaqSchema' is imported but never used

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Build and Push Docker Image
  • GitHub Check: Analyze (python)
🔇 Additional comments (4)
app/pipeline/chat/exercise_chat_agent_pipeline.py (4)

109-109: LGTM! Well-structured attribute addition.

The retrieved_faqs attribute is properly typed as List[dict] and initialized to None.


143-144: LGTM! Clear and consistent retriever initialization.

Good refactoring to rename retriever to lecture_retriever for better clarity, and consistent initialization of faq_retriever.


544-547: LGTM! Clean and consistent tool integration.

The FAQ tool is properly integrated following the same pattern as the lecture tool, with appropriate conditional checking using should_allow_faq_tool.


401-424: Verify the FAQ result limit.

The FAQ retrieval is limited to 10 results while lecture retrieval is limited to 5. Please verify if this difference is intentional and if 10 results might be too verbose for the chat response.

Otherwise, the implementation is well-structured and follows the established patterns:

  • Clear docstring explaining when to use the tool
  • Proper progress callback
  • Consistent error handling
✅ Verification successful

The FAQ result limit of 10 is consistent with the codebase patterns.

The higher limit for FAQs (10) compared to lecture content (5) appears to be an intentional design choice, consistently implemented across different pipelines. This makes sense as FAQs are typically more concise than lecture content, and having more FAQ results allows for better coverage of related questions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the typical length of FAQ responses
# Look for other result limits in the codebase for consistency

# Check for other result_limit usage
rg "result_limit\s*=" -A 2 -B 2

# Check FAQ formatting to understand the impact of 10 results
ast-grep --pattern 'def format_faqs($$$)'

Length of output: 5398

app/pipeline/chat/exercise_chat_agent_pipeline.py Outdated Show resolved Hide resolved
coderabbitai[bot]
coderabbitai bot previously requested changes Jan 30, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1242ef0 and cf2feac.

📒 Files selected for processing (1)
  • app/pipeline/chat/exercise_chat_agent_pipeline.py (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (5)
app/pipeline/chat/exercise_chat_agent_pipeline.py (5)

42-43: LGTM! The new imports enhance FAQ functionality.

The imports are correctly placed and follow the project's import style.

The FaqSchema import is still unused and should be removed as mentioned in the previous review.


108-108: LGTM! The retrieved_faqs attribute is properly typed.

The attribute is correctly initialized as List[dict] = None to store FAQ retrieval results.


142-143: LGTM! Clear and consistent retriever initialization.

Good renaming of retriever to lecture_retriever for better clarity, and consistent initialization of faq_retriever.


544-545: LGTM! Consistent conditional tool addition.

The FAQ tool is correctly added to the tool list using the same pattern as the lecture tool.


400-424: LGTM! Well-documented and properly implemented FAQ retrieval.

The implementation follows the established pattern and includes clear usage guidelines in the docstring.

Verify if retrieving 10 FAQs (line 416) is the optimal limit for this use case. Let's check other retrieval implementations:

✅ Verification successful

The result limit of 10 FAQs is appropriate and consistent with the codebase

The implementation aligns with other FAQ retrievers in the codebase, which also use a limit of 10. This higher limit (compared to 3-5 used for other content types) is suitable for FAQs as they are typically shorter and more focused, allowing for better coverage of common questions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for result_limit parameter in retrieval implementations
rg -A 1 "result_limit\s*=" app/retrieval/

Length of output: 1149


Script:

#!/bin/bash
# Search for instances where result_limit is specified in pipeline files
rg -B 2 -A 2 "result_limit\s*=\s*\d+" app/pipeline/

Length of output: 2743

app/pipeline/chat/exercise_chat_agent_pipeline.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
app/pipeline/chat/exercise_chat_agent_pipeline.py (1)

400-423: Add error handling and empty results validation.

While the implementation follows the pattern of lecture_content_retrieval, consider these improvements:

  1. Add validation for empty results
  2. Include error handling for retriever failures
  3. Consider a more concise docstring
 def faq_content_retrieval() -> str:
     """
-    Use this tool to retrieve information from indexed FAQs.
-    It is suitable when no other tool fits, it is a common question or the question is frequently asked,
-    or the question could be effectively answered by an FAQ. Also use this if the question is explicitly
-    organizational and course-related. An organizational question about the course might be
-    "What is the course structure?" or "How do I enroll?" or exam related content like "When is the exam".
-    The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
-    Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
-    Respond to the query concisely and solely using the answer from the relevant FAQs.
-    This tool should only be used once per query.
+    Retrieve and provide answers from indexed FAQs for common, organizational,
+    or course-related questions. Uses RAG retrieval based on chat history.
+    Format: FAQ ID, Question, Answer. Use once per query.
     """
     self.callback.in_progress("Retrieving faq content ...")
-    self.retrieved_faqs = self.faq_retriever(
-        chat_history=chat_history,
-        student_query=query.contents[0].text_content,
-        result_limit=10,
-        course_name=dto.course.name,
-        course_id=dto.course.id,
-        base_url=dto.settings.artemis_base_url,
-    )
+    try:
+        self.retrieved_faqs = self.faq_retriever(
+            chat_history=chat_history,
+            student_query=query.contents[0].text_content,
+            result_limit=10,
+            course_name=dto.course.name,
+            course_id=dto.course.id,
+            base_url=dto.settings.artemis_base_url,
+        )
+        if not self.retrieved_faqs:
+            return "No relevant FAQs found for your question."
 
-    result = format_faqs(self.retrieved_faqs)
-    return result
+        return format_faqs(self.retrieved_faqs)
+    except Exception as e:
+        logger.error("Error retrieving FAQs", exc_info=e)
+        return "Sorry, I encountered an error while retrieving FAQs."
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf2feac and a36dfcc.

📒 Files selected for processing (2)
  • app/pipeline/chat/exercise_chat_agent_pipeline.py (6 hunks)
  • app/web/routers/pipelines.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/web/routers/pipelines.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Push Docker Image
🔇 Additional comments (4)
app/pipeline/chat/exercise_chat_agent_pipeline.py (4)

42-43: LGTM! Well-structured imports and class attribute.

The new imports and class attribute for FAQ functionality are properly organized and typed.

Also applies to: 108-108


142-143: Good refactoring of retriever variables!

The renaming of retriever to lecture_retriever improves clarity, and the new faq_retriever follows the same initialization pattern.


543-545: LGTM! Consistent tool list modification.

The FAQ tool is properly added to the tool list with appropriate conditional checking, following the same pattern as the lecture tool.


544-545: Verify FAQ retrieval functionality in production.

Let's verify that the FAQ retrieval is working as expected:

✅ Verification successful

FAQ tool implementation is properly verified and safe

The FAQ retrieval functionality is correctly implemented with proper safeguards:

  • Verifies FAQ existence before enabling the tool
  • Uses proper database querying and schema validation
  • Implements consistent error handling
  • Maintains identical behavior across course and exercise chat contexts
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify FAQ retrieval functionality
# 1. Check if any courses have FAQs indexed
# 2. Verify the structure of indexed FAQs

# Check for indexed FAQs across all courses
echo "Checking for indexed FAQs..."
ast-grep --pattern 'class FaqRetrieval {
  $$$
  __call__($$$) {
    $$$
  }
}'

# Check FAQ data structure
echo "Verifying FAQ data structure..."
rg -A 5 "FAQ" --type python

Length of output: 280


Script:

#!/bin/bash
# Add Python file type and search for FAQ-related code
echo "Searching for FAQ-related functionality..."
rg -i "faq" --type-add 'py:*.py' --type py -A 5

echo "Looking for should_allow_faq_tool implementation..."
ast-grep --pattern 'def should_allow_faq_tool($$$):
  $$$'

echo "Searching for faq_content_retrieval definition..."
rg "faq_content_retrieval" --type-add 'py:*.py' --type py -A 3

Length of output: 59052

@bassner bassner dismissed coderabbitai[bot]’s stale review January 30, 2025 13:44

go away with your nitpicks

@bassner bassner merged commit eb7faf8 into main Jan 30, 2025
7 checks passed
@bassner bassner deleted the feature/faq/basic-faq-pipe branch January 30, 2025 13:46
@bassner bassner added this to the 1.1.0 milestone Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants