Enable IRIS to use FAQs #187

cremertim · 2025-01-06T16:20:14Z

New Features

Added new Pipeline for FAQ ingestion
Added faq retrival pipeline to allow answer based on FAQs
Created new data transfer objects (DTOs) for managing faqs and faq status from Artemis

Summary by CodeRabbit

New Features
- Added FAQ ingestion and retrieval functionality to the application.
- Introduced a new pipeline for handling FAQ-related operations.
- Implemented FAQ search and content retrieval in course chat.
- Added new webhook endpoints for FAQ ingestion and deletion.
- Enhanced the get_pipeline function to support FAQ ingestion feature.
Improvements
- Enhanced vector database schema to support FAQ storage.
- Added new data transfer objects for FAQ management.
- Integrated FAQ retrieval into existing chat pipeline.
- Improved logging and error handling for FAQ processes.
Technical Updates
- Implemented status callback mechanism for FAQ pipeline execution.
- Introduced new classes and methods for managing FAQs and their ingestion.
- Added new prompts for FAQ retrieval and citation formatting.
- Enhanced the CitationPipeline to support formatting of FAQ data.
- Added new enumeration for information types in citation handling.

coderabbitai · 2025-01-06T16:24:52Z

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request introduces comprehensive enhancements related to FAQ (Frequently Asked Questions) functionality across various components of the application. Key changes include the addition of data transfer objects, pipeline configurations, retrieval mechanisms, and webhook endpoints. These modifications facilitate the ingestion, retrieval, and management of FAQs within courses, improving the system's capability to provide contextual information to students through an AI-driven interface.

Changes

File	Change Summary
`app/common/PipelineEnum.py`	Added two new enum members: `IRIS_FAQ_INGESTION` and `IRIS_FAQ_RETRIEVAL_PIPELINE`
`app/domain/data/faq_dto.py`	Introduced `FaqDTO` with attributes for FAQ data representation
`app/domain/ingestion/deletionPipelineExecutionDto.py`	Added `FaqDeletionExecutionDto` for managing FAQ deletion pipeline execution
`app/domain/ingestion/ingestion_pipeline_execution_dto.py`	Created `FaqIngestionPipelineExecutionDto` for FAQ ingestion pipeline execution
`app/pipeline/chat/course_chat_pipeline.py`	Enhanced `CourseChatPipeline` with FAQ retrieval and integration mechanisms
`app/pipeline/faq_ingestion_pipeline.py`	Implemented `FaqIngestionPipeline` for managing FAQ data ingestion
`app/pipeline/prompts/faq_retrieval_prompts.py`	Added prompts for FAQ retrieval and query generation
`app/pipeline/shared/citation_pipeline.py`	Modified `CitationPipeline` to support FAQ information formatting
`app/retrieval/faq_retrieval.py`	Created `FaqRetrieval` class for comprehensive FAQ retrieval strategies
`app/vector_database/database.py`	Updated `VectorDatabase` to initialize FAQ schema
`app/vector_database/faq_schema.py`	Defined schema for FAQ data in vector database
`app/web/routers/pipelines.py`	Added support for FAQ ingestion pipeline feature
`app/web/routers/webhooks.py`	Introduced webhook endpoints for FAQ ingestion and deletion
`app/web/status/faq_ingestion_status_callback.py`	Created `FaqIngestionStatus` callback for tracking FAQ ingestion pipeline status

Possibly related PRs

Enable IRIS to use FAQs #187: The main PR adds two new members to the PipelineEnum class, which directly relates to the changes in PR Enable IRIS to use FAQs #187 that also adds IRIS_FAQ_INGESTION and IRIS_FAQ_RETRIEVAL_PIPELINE to the same enumeration.
FAQ rewriting pipeline #191: The main PR's addition of IRIS_FAQ_INGESTION and IRIS_FAQ_RETRIEVAL_PIPELINE is relevant to PR FAQ rewriting pipeline #191, which introduces the IRIS_REWRITING_PIPELINE to the same PipelineEnum class, indicating a broader enhancement of the pipeline functionalities.
Track token usage of iris requests #165: The main PR's changes to PipelineEnum are also related to PR Track token usage of iris requests #165, which introduces the PipelineEnum class itself, defining various pipeline stages, including the ones added in the main PR.

Suggested Reviewers

Hialus
bassner
MichaelOwenDyer

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a36dfcc and 1ca23bd.

📒 Files selected for processing (2)

app/pipeline/chat/course_chat_pipeline.py (8 hunks)
app/pipeline/faq_ingestion_pipeline.py (1 hunks)

✨ Finishing Touches

📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (18)

app/pipeline/faq_ingestion_pipeline.py (3)
87-88: Consolidate nested context managers.

Consider using a single with statement for both batch_update_lock and self.collection.batch.rate_limit() to reduce nesting and improve readability.
-with batch_update_lock:
-    with self.collection.batch.rate_limit(requests_per_minute=600) as batch:
+with batch_update_lock, self.collection.batch.rate_limit(requests_per_minute=600) as batch:
     ...
🧰 Tools

🪛 Ruff (0.8.2)

87-88: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

140-146: Return value from chunk_data to avoid confusion.

The method doesn’t return anything. If returning None is intentional, explicitly return None to improve clarity.
 def chunk_data(self, path: str) -> List[Dict[str, str]]:
     ...
     return
+    # or, for explicitness:
+    # return None
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

1-146: Apply Black formatting.

The linter warns the file needs to be reformatted. Please run Black to ensure consistent code style.

🧰 Tools

🪛 Ruff (0.8.2)

6-6: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)

87-88: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black
app/retrieval/faq_retrieval.py (3)
2-2: Evaluate the usage of logger from asyncio.log.

You already import the standard logging module. If it’s intentional to use asyncio.log.logger, clarify to avoid confusion for future maintainers. Otherwise, consider switching to the conventional logging approach.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

191-191: Check string matching for 'YES'.

Strict equality to "YES" could cause edge-case failures if the response includes extra whitespace or a different capitalization. Consider normalizing the casing or trimming whitespace before comparison.
-return response == "YES"
+return response.strip().upper() == "YES"
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

1-395: Apply Black formatting.

The linter warns the file needs to be reformatted. Please run Black to ensure consistent code style.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black
app/domain/data/faq_dto.py (1)

1-13: Apply Black formatting.

The linter warns the file needs to be reformatted. Please run Black to ensure consistent code style.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/ingestion/deletionPipelineExecutionDto.py (1)

18-23: Apply Black formatting.

The file also needs reformatting according to the pipeline.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/ingestion/ingestion_pipeline_execution_dto.py (1)

18-23: Apply Black formatting.

The file also needs reformatting according to the pipeline.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/web/status/faq_ingestion_status_callback.py (1)

17-45: Add robust error handling.

Consider adding error handling around URL construction and the super().__init__ call to anticipate potential issues in production—for example, invalid or inaccessible endpoints.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black
app/vector_database/faq_schema.py (2)
1-1: Remove unused logging import.
This import is never referenced in the file. Eliminating unused imports helps reduce code clutter.
- import logging
 from enum import Enum
🧰 Tools

🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

31-31: Remove unused variable assignment.
The local variable properties is assigned but never used. Eliminating unused variables helps keep the code clean.
- properties = collection.config.get(simple=True).properties
🧰 Tools

🪛 Ruff (0.8.2)

31-31: Local variable properties is assigned to but never used

Remove assignment to unused variable properties

(F841)
app/pipeline/shared/citation_pipeline.py (2)
1-1: Remove unused logging import.
This import is never referenced, so deleting it helps maintain code cleanliness.
- import logging
 import os
 from asyncio.log import logger
🧰 Tools

🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

80-80: Rename unused loop variable.
The loop index variable i is never used. Renaming it to _ or _i clarifies that it’s intentionally unused.
 for i, faq in enumerate(faqs):
     faq = "Question: {}, Answer: {}".format(
         ...
+ # rename i -> _
 for _, faq in enumerate(faqs):
     faq = ...
🧰 Tools

🪛 Ruff (0.8.2)

80-80: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black
app/pipeline/chat/course_chat_pipeline.py (4)
13-13: Remove unused import.

Static analysis indicates that SystemMessagePromptTemplate is imported but not used. Please remove it to keep the code clean.
- from langchain_core.prompts import (
-     ChatPromptTemplate, SystemMessagePromptTemplate,
- )
+ from langchain_core.prompts import (
+     ChatPromptTemplate,
+ )
🧰 Tools

🪛 Ruff (0.8.2)

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

85-85: Consider initializing retrieved_faqs to an empty list.

Currently, retrieved_faqs is None by default. If the downstream logic doesn’t handle None checks or if you need to append to it before any retrieval, you may encounter NoneType errors. Initializing it to an empty list can help.
- retrieved_faqs: List[dict] = None
+ retrieved_faqs: List[dict] = []
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

300-329: Explore error handling for no matched FAQs.

The function returns an empty string if no FAQs are found. Consider adding some feedback to clarify that no matching FAQs were retrieved, otherwise there's a risk of returning an empty answer with no context.
 if not self.retrieved_faqs:
+    return "No matching FAQs found."
 result = ""
 for faq in self.retrieved_faqs:
     ...
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

510-516: Fix docstring reference.

The docstring still references “lectures” instead of “FAQs.” Update it to reflect the correct resource type being checked.
 :return: True if there are indexed lectures for the course, False otherwise
+:return: True if there are indexed FAQs for the course, False otherwise
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddd77de and 3b12cd6.

⛔ Files ignored due to path filters (1)

docker/pyris-dev.yml is excluded by !**/*.yml

📒 Files selected for processing (14)

app/common/PipelineEnum.py (1 hunks)
app/domain/data/faq_dto.py (1 hunks)
app/domain/ingestion/deletionPipelineExecutionDto.py (2 hunks)
app/domain/ingestion/ingestion_pipeline_execution_dto.py (2 hunks)
app/pipeline/chat/course_chat_pipeline.py (10 hunks)
app/pipeline/faq_ingestion_pipeline.py (1 hunks)
app/pipeline/prompts/faq_retrieval_prompts.py (1 hunks)
app/pipeline/shared/citation_pipeline.py (4 hunks)
app/retrieval/faq_retrieval.py (1 hunks)
app/vector_database/database.py (2 hunks)
app/vector_database/faq_schema.py (1 hunks)
app/web/routers/pipelines.py (1 hunks)
app/web/routers/webhooks.py (5 hunks)
app/web/status/faq_ingestion_status_callback.py (1 hunks)

🧰 Additional context used

🪛 GitHub Actions: Run linters

app/domain/ingestion/deletionPipelineExecutionDto.py

[warning] File needs to be reformatted using Black

app/domain/data/faq_dto.py

[warning] File needs to be reformatted using Black

app/pipeline/chat/course_chat_pipeline.py

[warning] File needs to be reformatted using Black

app/pipeline/shared/citation_pipeline.py

[warning] File needs to be reformatted using Black

app/pipeline/faq_ingestion_pipeline.py

[warning] File needs to be reformatted using Black

app/web/status/faq_ingestion_status_callback.py

[warning] File needs to be reformatted using Black

app/domain/ingestion/ingestion_pipeline_execution_dto.py

[warning] File needs to be reformatted using Black

app/web/routers/webhooks.py

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py

[warning] File needs to be reformatted using Black

🪛 Ruff (0.8.2)

app/pipeline/chat/course_chat_pipeline.py

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

app/pipeline/shared/citation_pipeline.py

1-1: logging imported but unused

Remove unused import: logging

(F401)

80-80: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

app/vector_database/faq_schema.py

1-1: logging imported but unused

Remove unused import: logging

(F401)

31-31: Local variable properties is assigned to but never used

Remove assignment to unused variable properties

(F841)

app/pipeline/faq_ingestion_pipeline.py

6-6: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)

87-88: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Push Docker Image

🔇 Additional comments (13)

app/pipeline/faq_ingestion_pipeline.py (1)

56-79: Handle potential dto attribute issues in __call__.

self.dto is assumed to have non-None values for faq inside the method. If dto or dto.faq can be None, consider adding safeguards or checks before accessing dto.faq.

Would you like a script to search for other places where dto might be used without null checks across the codebase to confirm it's always set?

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py (2)

148-149: Confirm question assessment coverage.

The condition if not self.assess_question(chat_history, student_query): return [] might skip retrieving FAQs entirely. Ensure you have tests verifying that valid queries pass this check.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

223-225: Record token usage more uniformly.

You're appending self.llm.tokens to self.tokens. Verify that any prior usage of tokens in the pipeline isn’t overwritten or missed. If concurrency is involved, consider thread-safe approaches.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/data/faq_dto.py (1)

4-10: DTO field defaults and alias usage look good.

The structure appears straightforward, with well-defined fields and aliases. Everything looks consistent and clear.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/ingestion/deletionPipelineExecutionDto.py (1)

18-23: Ensure alignment with the ingestion/deletion flow.

The new FaqDeletionExecutionDto references FaqDTO with alias pyrisFaqWebhookDTO. Verify that references in the deletion pipeline only rely on that alias or confirm that all references are consistent.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/domain/ingestion/ingestion_pipeline_execution_dto.py (1)

18-23: DTO usage consistency.

Similar to the deletion DTO, ensure calls expecting pyrisFaqWebhookDTO come from the correct field. This prevents mismatch errors when hooking up your ingestion pipeline.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/common/PipelineEnum.py (1)

17-18: Enums added successfully.

The new enum members align well with existing naming conventions. Consider adding brief docstrings or comments indicating their use cases, but otherwise, the additions look good.

app/pipeline/prompts/faq_retrieval_prompts.py (1)

1-19: Prompts seem clearly defined.

These prompt strings are well-structured and descriptive. To further improve maintainability, consider documenting any guidelines or constraints (e.g., word limit enforcement) so that future maintainers understand the context and usage.

app/vector_database/database.py (1)

32-32: Schema initialization looks good.

The new faqs schema initialization follows the existing pattern. Verify that repeated queries for FAQ data do not require reinitializing the schema for each call. Otherwise, this approach appears consistent with the rest of the file.

app/web/routers/webhooks.py (2)

74-100: Overall concurrency approach looks consistent.
Your usage of Semaphore and finally blocks ensures that resources are always released. This is in line with best practices.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

153-179: Webhook naming and structure are consistent with the codebase.
The new endpoints follow the existing patterns for ingestion and deletion.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/web/routers/pipelines.py (1)

298-305: Implementation of "FAQ_INGESTION" variant is correct.
The newly added switch case for "FAQ_INGESTION" properly returns a default variant, mirroring other feature switch patterns.

app/pipeline/chat/course_chat_pipeline.py (1)

Line range hint 1-9999: Reformat code with Black.

The file needs to be formatted according to Black. Please run the formatter to comply with the pipeline requirements.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/pipeline/faq_ingestion_pipeline.py

app/web/status/faq_ingestion_status_callback.py

app/web/routers/webhooks.py

coderabbitai

Actionable comments posted: 5

🔭 Outside diff range comments (1)

app/pipeline/shared/citation_pipeline.py (1)
Line range hint 144-144: Fix incorrect logger.error usage

At line 144, the logger.error method is not used correctly. Pass the exception using exc_info or include it in the message.

Replace:
logger.error("citation pipeline failed", e)
With:
logger.error(f"citation pipeline failed: {e}")
Or, to include traceback information:
logger.error("citation pipeline failed", exc_info=True)
🧰 Tools

🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

🧹 Nitpick comments (14)

app/retrieval/faq_retrieval.py (4)

181-187: Remove unnecessary exception handling

In the assess_question method, exceptions are caught and immediately re-raised without additional handling or logging. This try-except block can be removed to simplify the code.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

215-222: Remove unnecessary exception handling

In rewrite_student_query, the try-except block catches exceptions and immediately re-raises them without adding context or handling. Consider removing the try-except block unless specific handling is needed.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

253-260: Remove unnecessary exception handling

In rewrite_elaborated_query, exceptions are caught and immediately re-raised. Unless you're adding specific handling or logging, the try-except block can be omitted.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

66-82: Consider initializing self.tokens in the superclass

If multiple pipelines use self.tokens, consider moving its initialization to the superclass Pipeline for consistency and to avoid duplication.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black
app/pipeline/prompts/faq_citation_prompt.txt (3)
1-1: Add missing commas for clarity

In line 1, add commas after introductory phrases to improve readability.

Change:
In the paragraphs below you are provided with an answer to a question. Underneath the answer you will find the faqs that the answer was based on.
To:
In the paragraphs below, you are provided with an answer to a question. Underneath the answer, you will find the FAQs that the answer was based on.
🧰 Tools

🪛 LanguageTool

[uncategorized] ~1-~1: Possible missing comma found.
Context: In the paragraphs below you are provided with an answer to a qu...

(AI_HYDRA_LEO_MISSING_COMMA)

[uncategorized] ~1-~1: Possible missing comma found.
Context: ...an answer to a question. Underneath the answer you will find the faqs that the answer ...

(AI_HYDRA_LEO_MISSING_COMMA)

7-8: Clarify instructions regarding citations

Lines 7 and 8 contain contradictory instructions about including citations. Consider rephrasing for clarity.

32-32: Reduce excessive exclamation marks

Using multiple exclamation marks may seem unprofessional. Consider reducing them for a formal tone.

Change:
If the answer actually does not contain any information from the paragraphs, please do not include any citations and return '!NONE!'.
To:
If the answer does not contain any information from the paragraphs, please do not include any citations and return 'NONE'.
🧰 Tools

🪛 LanguageTool

[style] ~32-~32: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 2043 characters long)
Context: ...o not include any citations and return '!NONE!'.

(EN_EXCESSIVE_EXCLAMATION)
app/pipeline/shared/citation_pipeline.py (2)
1-1: Remove unused import logging

The logging module is imported but not used in the file. Remove it to clean up the code.

🧰 Tools

🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

84-91: Rename unused loop variable

In the create_formatted_faq_string method, the loop variable i is not used. Rename it to _ to indicate it's intentionally unused.

Change:
for i, faq in enumerate(faqs):
To:
for _, faq in enumerate(faqs):
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black
app/pipeline/chat/course_chat_pipeline.py (5)
13-13: Remove unused import

The SystemMessagePromptTemplate is imported but never used in the code.
-    ChatPromptTemplate, SystemMessagePromptTemplate,
+    ChatPromptTemplate,
🧰 Tools

🪛 Ruff (0.8.2)

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

300-329: Consider using string join for better readability

The FAQ result string concatenation could be improved using join() for better readability and performance.
-            result = ""
-            for faq in self.retrieved_faqs:
-                res = "[FAQ ID: {}, FAQ Question: {}, FAQ Answer: {}]".format(
-                    faq.get(FaqSchema.FAQ_ID.value),
-                    faq.get(FaqSchema.QUESTION_TITLE.value),
-                    faq.get(FaqSchema.QUESTION_ANSWER.value),
-                )
-                result += res
-            return result
+            return "\n".join(
+                "[FAQ ID: {}, FAQ Question: {}, FAQ Answer: {}]".format(
+                    faq.get(FaqSchema.FAQ_ID.value),
+                    faq.get(FaqSchema.QUESTION_TITLE.value),
+                    faq.get(FaqSchema.QUESTION_ANSWER.value),
+                )
+                for faq in self.retrieved_faqs
+            )
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

429-433: Remove extra blank lines

There are unnecessary blank lines after the FAQ tool addition.
            if self.should_allow_faq_tool(dto.course.id):
                tool_list.append(faq_content_retrieval)
-

-

-
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

454-459: Consolidate progress callbacks

The "Augmenting response..." callback is duplicated for paragraphs and FAQs. Consider consolidating these callbacks.
-                self.callback.in_progress("Augmenting response ...")
                 out = self.citation_pipeline(self.retrieved_paragraphs, out, InformationType.PARAGRAPHS)
             self.tokens.extend(self.citation_pipeline.tokens)

             if self.retrieved_faqs:
-                self.callback.in_progress("Augmenting response ...")
                 out = self.citation_pipeline(self.retrieved_faqs, out, InformationType.FAQS, base_url=dto.settings.artemis_base_url)
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

511-528: Fix docstring return description

The docstring incorrectly mentions "lectures" instead of "FAQs" in the return description.
        """
        Checks if there are indexed faqs for the given course

        :param course_id: The course ID
-        :return: True if there are indexed lectures for the course, False otherwise
+        :return: True if there are indexed FAQs for the course, False otherwise
        """
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b12cd6 and 1017477.

📒 Files selected for processing (4)

app/pipeline/chat/course_chat_pipeline.py (10 hunks)
app/pipeline/prompts/faq_citation_prompt.txt (1 hunks)
app/pipeline/shared/citation_pipeline.py (5 hunks)
app/retrieval/faq_retrieval.py (1 hunks)

🧰 Additional context used

🪛 LanguageTool

app/pipeline/prompts/faq_citation_prompt.txt

[uncategorized] ~1-~1: Possible missing comma found.
Context: In the paragraphs below you are provided with an answer to a qu...

(AI_HYDRA_LEO_MISSING_COMMA)

[uncategorized] ~1-~1: Possible missing comma found.
Context: ...an answer to a question. Underneath the answer you will find the faqs that the answer ...

(AI_HYDRA_LEO_MISSING_COMMA)

[style] ~32-~32: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 2043 characters long)
Context: ...o not include any citations and return '!NONE!'.

(EN_EXCESSIVE_EXCLAMATION)

🪛 Ruff (0.8.2)

app/pipeline/chat/course_chat_pipeline.py

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

app/pipeline/shared/citation_pipeline.py

1-1: logging imported but unused

Remove unused import: logging

(F401)

83-83: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

🪛 GitHub Actions: Run linters

app/pipeline/chat/course_chat_pipeline.py

[warning] File needs to be reformatted using Black

app/pipeline/shared/citation_pipeline.py

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py

[warning] File needs to be reformatted using Black

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Push Docker Image

🔇 Additional comments (1)

app/pipeline/chat/course_chat_pipeline.py (1)

Line range hint 1-537: Implementation looks good! 👍

The FAQ integration is well-implemented, following existing patterns and conventions. The code successfully enables IRIS to use FAQs as intended in the PR objectives.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py

app/pipeline/shared/citation_pipeline.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

app/pipeline/chat/course_chat_pipeline.py (5)
13-13: Remove unused import

The import SystemMessagePromptTemplate is not used in the code.
-from langchain_core.prompts import (
-    ChatPromptTemplate, SystemMessagePromptTemplate,
-)
+from langchain_core.prompts import ChatPromptTemplate
🧰 Tools

🪛 Ruff (0.8.2)

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

307-336: Add error handling for FAQ retrieval

While the implementation is good, it should handle potential errors from the FAQ retriever, similar to other tools in the pipeline.
 def faq_content_retrieval() -> str:
     self.callback.in_progress("Retrieving faq content ...")
-    self.retrieved_faqs = self.faq_retriever(
-        chat_history=history,
-        student_query=query.contents[0].text_content,
-        result_limit=5,
-        course_name=dto.course.name,
-        course_id=dto.course.id,
-        base_url=dto.settings.artemis_base_url,
-    )
+    try:
+        self.retrieved_faqs = self.faq_retriever(
+            chat_history=history,
+            student_query=query.contents[0].text_content,
+            result_limit=5,
+            course_name=dto.course.name,
+            course_id=dto.course.id,
+            base_url=dto.settings.artemis_base_url,
+        )
+    except Exception as e:
+        logger.error("Error retrieving FAQs", exc_info=e)
+        return "Error retrieving FAQ content"
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

436-440: Fix formatting: Remove extra blank lines

There are unnecessary blank lines that should be removed to maintain consistent formatting.
 if self.should_allow_faq_tool(dto.course.id):
     tool_list.append(faq_content_retrieval)
-
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

518-535: Fix docstring return description

The docstring's return description mentions lectures instead of FAQs.
     """
     Checks if there are indexed faqs for the given course

     :param course_id: The course ID
-    :return: True if there are indexed lectures for the course, False otherwise
+    :return: True if there are indexed FAQs for the course, False otherwise
     """
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

543-544: Remove extra blank lines at the end of file

There are unnecessary blank lines at the end of the file.
def datetime_to_string(dt: Optional[datetime]) -> str:
    if dt is None:
        return "No date provided"
    else:
        return dt.strftime("%Y-%m-%d %H:%M:%S")
-

-
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1017477 and cfb1240.

📒 Files selected for processing (1)

app/pipeline/chat/course_chat_pipeline.py (10 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

app/pipeline/chat/course_chat_pipeline.py

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

app/pipeline/chat/course_chat_pipeline.py

[warning] File needs to be reformatted using Black

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Push Docker Image

🔇 Additional comments (2)

app/pipeline/chat/course_chat_pipeline.py (2)

86-86: LGTM! Good addition of the FAQ retrieval attribute

The retrieved_faqs attribute is well-placed alongside the existing retrieved_paragraphs attribute, maintaining consistency in the class structure.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

464-466: LGTM! Good integration with the citation pipeline

The FAQ augmentation is well-integrated with the existing citation pipeline, maintaining consistency with the paragraph augmentation pattern.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

… feature/faq/basic-faq-pipe

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (14)

app/web/routers/webhooks.py (4)
74-99: Improve function documentation.

The docstring should be more descriptive and include parameter and return type information.
-    """
-    Run the exercise chat pipeline in a separate thread
-    """
+    """
+    Run the FAQ update pipeline in a separate thread.
+    
+    Args:
+        dto (FaqIngestionPipelineExecutionDto): The data transfer object containing FAQ information
+        
+    Note:
+        This function is executed with semaphore control to limit concurrent operations
+    """
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

102-126: Address language consistency and error message specificity.

Two improvements needed:

Translate the German comment to English

Make the error message more specific to FAQ deletion
-            # Hier würd dann die Methode zum entfernen aus der Datenbank kommen
+            # Method to remove FAQ from the database
             pipeline = FaqIngestionPipeline(client=client, dto=None, callback=callback)
             pipeline.delete_faq(dto.faq.faq_id, dto.faq.course_id)

         except Exception as e:
-            logger.error(f"Error Ingestion pipeline: {e}")
+            logger.error(f"Error in FAQ deletion pipeline: {e}")
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

153-165: Remove unnecessary return statement.

FastAPI automatically returns None for void functions, making the explicit return statement redundant.
    thread = Thread(target=run_faq_update_pipeline_worker, args=(dto,))
    thread.start()
-    return
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

166-179: Remove unnecessary return statement.

Similar to the ingestion webhook, the explicit return statement is redundant.
    logging.info("Starting faq deletion")
    thread = Thread(target=run_faq_delete_pipeline_worker, args=(dto,))
    thread.start()
-    return
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black
app/pipeline/chat/course_chat_pipeline.py (4)
13-13: Remove unused import.

The SystemMessagePromptTemplate import is not used in the code.
-    ChatPromptTemplate, SystemMessagePromptTemplate,
+    ChatPromptTemplate,
🧰 Tools

🪛 Ruff (0.8.2)

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

307-336: Improve FAQ result formatting for better readability.

Consider using a more structured format with line breaks for better readability.
-                res = "[FAQ ID: {}, FAQ Question: {}, FAQ Answer: {}]".format(
-                    faq.get(FaqSchema.FAQ_ID.value),
-                    faq.get(FaqSchema.QUESTION_TITLE.value),
-                    faq.get(FaqSchema.QUESTION_ANSWER.value),
-                )
+                res = """
+FAQ Entry:
+  ID: {}
+  Question: {}
+  Answer: {}
+
+""".format(
+                    faq.get(FaqSchema.FAQ_ID.value),
+                    faq.get(FaqSchema.QUESTION_TITLE.value),
+                    faq.get(FaqSchema.QUESTION_ANSWER.value),
+                )
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

436-440: Remove extra newlines.

Multiple consecutive empty lines affect code readability.
            if self.should_allow_faq_tool(dto.course.id):
                tool_list.append(faq_content_retrieval)
-

-
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

518-524: Fix incorrect docstring return description.

The return description mentions "lectures" instead of "FAQs".
        """
        Checks if there are indexed faqs for the given course

        :param course_id: The course ID
-        :return: True if there are indexed lectures for the course, False otherwise
+        :return: True if there are indexed FAQs for the course, False otherwise
        """
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black
app/retrieval/faq_retrieval.py (6)
70-74: Consider making LLM requirements configurable

The hardcoded values for gpt_version_equivalent and context_length should be moved to configuration to make the code more maintainable and flexible.
-                gpt_version_equivalent=4.25,
-                context_length=16385,
-                privacy_compliance=True,
+                gpt_version_equivalent=config.GPT_VERSION,
+                context_length=config.CONTEXT_LENGTH,
+                privacy_compliance=config.PRIVACY_COMPLIANCE,
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

150-155: Make hybrid_factor configurable

The hardcoded value of 0.9 for hybrid_factor should be configurable to allow fine-tuning of the retrieval process.
         response = self.search_in_db(
             query=rewritten_query,
-            hybrid_factor=0.9,
+            hybrid_factor=self.config.HYBRID_FACTOR,
             result_limit=result_limit,
             course_id=course_id,
         )
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

184-190: Extract common error handling pattern

The same error handling pattern is repeated across multiple methods. Consider extracting it into a decorator.
def handle_llm_errors(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            logger.error(f"Error in {func.__name__}: {str(e)}")
            raise e
    return wrapper

@handle_llm_errors
def assess_question(self, ...):
    # Implementation
Also applies to: 217-225, 255-263

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

276-286: Add logging for filter conditions

Add logging for the filter condition to help with debugging and monitoring.
         if course_id:
+            logger.debug(f"Applying course_id filter: {course_id}")
             # Create a filter for course_id
             filter_weaviate = Filter.by_property(FaqSchema.COURSE_ID.value).equal(
                 course_id
             )
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

316-358: Optimize thread pool usage

Consider using a single thread pool for both sets of tasks to reduce overhead.
-        with concurrent.futures.ThreadPoolExecutor() as executor:
+        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
             # Schedule the rewrite tasks to run in parallel
             rewritten_query_future = executor.submit(
                 self.rewrite_student_query,
                 chat_history,
                 student_query,
                 course_language,
                 course_name,
             )
             # ... other tasks ...
-
-            # Get the results once both tasks are complete
-            rewritten_query = rewritten_query_future.result()
-            hypothetical_answer_query = hypothetical_answer_query_future.result()
-
-            # Execute the database search tasks
-        with concurrent.futures.ThreadPoolExecutor() as executor:
             response_future = executor.submit(
                 self.search_in_db,
                 query=rewritten_query_future.result(),
                 hybrid_factor=0.9,
                 result_limit=result_limit,
                 course_id=course_id,
             )
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

360-385: Make default language configurable

The hardcoded default language "english" should be moved to configuration.
     def fetch_course_language(self, course_id):
         """
         Fetch the language of the course based on the course ID.
         If no specific language is set, it defaults to English.
         """
-        course_language = "english"
+        course_language = self.config.DEFAULT_COURSE_LANGUAGE
🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfb1240 and ce2904a.

📒 Files selected for processing (3)

app/pipeline/chat/course_chat_pipeline.py (10 hunks)
app/retrieval/faq_retrieval.py (1 hunks)
app/web/routers/webhooks.py (5 hunks)

🧰 Additional context used

🪛 GitHub Actions: Run linters

app/web/routers/webhooks.py

[warning] File needs to be reformatted using Black

app/pipeline/chat/course_chat_pipeline.py

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py

[warning] File needs to be reformatted using Black

🪛 Ruff (0.8.2)

app/pipeline/chat/course_chat_pipeline.py

13-13: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Push Docker Image

🔇 Additional comments (3)

app/web/routers/webhooks.py (1)

11-19: LGTM! Clean import organization.

The new imports and DTOs are well-organized and properly support the FAQ functionality.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py (2)

42-58: Well-implemented merge function!

The merge_retrieved_chunks function efficiently handles deduplication using dictionary lookups, and the implementation is clear and well-documented.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

113-113: Use logger.info instead of logging.info

For consistency with the module-level logger initialization, use logger.info instead of logging.info.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] File needs to be reformatted using Black

app/retrieval/faq_retrieval.py

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (9)

app/pipeline/shared/citation_pipeline.py (1)

81-96: Improve FAQ string formatting implementation.

Several improvements can be made to the FAQ string formatting:

The loop control variable i is unused
Consider using f-strings for better readability
Consider using string joining instead of concatenation for better performance

     def create_formatted_faq_string(self, faqs, base_url):
         formatted_string = ""
-        for i, faq in enumerate(faqs):
-            faq = "FAQ ID {}, CourseId {} , FAQ Question title {} and FAQ Question Answer {} and FAQ link {}".format(
-                faq.get(FaqSchema.FAQ_ID.value),
-                faq.get(FaqSchema.COURSE_ID.value),
-                faq.get(FaqSchema.QUESTION_TITLE.value),
-                faq.get(FaqSchema.QUESTION_ANSWER.value),
-                f"{base_url}/courses/{faq.get(FaqSchema.COURSE_ID.value)}/faq/?faqId={faq.get(FaqSchema.FAQ_ID.value)}",
-            )
-            formatted_string += faq
+        formatted_faqs = []
+        for _, faq in enumerate(faqs):
+            faq_link = f"{base_url}/courses/{faq.get(FaqSchema.COURSE_ID.value)}/faq/?faqId={faq.get(FaqSchema.FAQ_ID.value)}"
+            formatted_faq = (
+                f"FAQ ID {faq.get(FaqSchema.FAQ_ID.value)}, "
+                f"CourseId {faq.get(FaqSchema.COURSE_ID.value)}, "
+                f"FAQ Question title {faq.get(FaqSchema.QUESTION_TITLE.value)} and "
+                f"FAQ Question Answer {faq.get(FaqSchema.QUESTION_ANSWER.value)} and "
+                f"FAQ link {faq_link}"
+            )
+            formatted_faqs.append(formatted_faq)
+        formatted_string = "\n".join(formatted_faqs)

🧰 Tools

🪛 Ruff (0.8.2)

86-86: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

app/pipeline/faq_ingestion_pipeline.py (2)

88-89: Combine nested context managers for better readability.

The nested with statements can be combined using a single with statement.

Apply this diff:
-        with batch_update_lock:
-            with self.collection.batch.rate_limit(requests_per_minute=600) as batch:
+        with batch_update_lock, self.collection.batch.rate_limit(requests_per_minute=600) as batch:
🧰 Tools

🪛 Ruff (0.8.2)

88-89: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

137-141: Make the return value explicit in chunk_data method.

The method implicitly returns None. For better clarity, make the return value explicit.

Apply this diff:
    def chunk_data(self, path: str) -> List[Dict[str, str]]:
        """
        Faqs are so small, they do not need to be chunked into smaller parts
        """
-        return
+        return []  # Return empty list as FAQs don't need chunking

app/web/routers/webhooks.py (3)

115-116: Translate German comment to English.

For consistency, translate the German comment "Hier würd dann die Methode zum entfernen aus der Datenbank kommen" to English.

Apply this diff:

-            # Hier würd dann die Methode zum entfernen aus der Datenbank kommen
+            # Here comes the method to remove from the database

76-124: Reduce code duplication in worker functions.

The error handling pattern is duplicated across worker functions. Consider extracting it into a decorator or utility function.

Create a new utility function:

def handle_pipeline_errors(func):
    def wrapper(*args, **kwargs):
        with semaphore:
            try:
                return func(*args, **kwargs)
            except Exception as e:
                logger.error(f"Error in pipeline: {e}")
                logger.error(traceback.format_exc())
                capture_exception(e)
            finally:
                semaphore.release()
    return wrapper

Then apply it to the worker functions:

+@handle_pipeline_errors
 def run_faq_update_pipeline_worker(dto: FaqIngestionPipelineExecutionDto):
     """
     Run the exercise chat pipeline in a separate thread
     """
-    with semaphore:
-        try:
-            callback = FaqIngestionStatus(...)
-            pipeline()
-        except Exception as e:
-            logger.error(f"Error Faq Ingestion pipeline: {e}")
-            logger.error(traceback.format_exc())
-            capture_exception(e)
-        finally:
-            semaphore.release()
+    callback = FaqIngestionStatus(...)
+    pipeline()

164-165: Remove unnecessary return statements.

The empty return statements in webhook endpoints are unnecessary as FastAPI automatically returns None.

Apply this diff:

    def faq_ingestion_webhook(dto: FaqIngestionPipelineExecutionDto):
        thread = Thread(target=run_faq_update_pipeline_worker, args=(dto,))
        thread.start()
-       return

    def faq_deletion_webhook(dto: FaqDeletionExecutionDto):
        logging.info("Starting faq deletion")
        thread = Thread(target=run_faq_delete_pipeline_worker, args=(dto,))
        thread.start()
-       return

Also applies to: 178-179

app/retrieval/faq_retrieval.py (3)

45-60: Consider preserving chunk IDs in the returned list.

The function discards the IDs when returning the merged chunks. Consider preserving the IDs for traceability and debugging purposes.

Apply this diff to preserve the IDs:
-    return [properties for uuid, properties in merged_chunks.items()]
+    return [{"id": uuid, "properties": properties} for uuid, properties in merged_chunks.items()]
185-190: Simplify error handling.

The try-except block that just re-raises the exception is unnecessary and can be removed.

Apply this diff:
-        try:
-            response = (prompt | self.pipeline).invoke({})
-            logger.info(f"Response from assessment pipeline: {response}")
-            return response == "YES"
-        except Exception as e:
-            raise e
+        response = (prompt | self.pipeline).invoke({})
+        logger.info(f"Response from assessment pipeline: {response}")
+        return response == "YES"
364-364: Capitalize the default language.

For consistency with language codes, capitalize the default language.

Apply this diff:
-        course_language = "english"
+        course_language = "English"

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce2904a and 27abf69.

📒 Files selected for processing (10)

app/domain/data/faq_dto.py (1 hunks)
app/domain/ingestion/deletionPipelineExecutionDto.py (2 hunks)
app/domain/ingestion/ingestion_pipeline_execution_dto.py (2 hunks)
app/pipeline/chat/course_chat_pipeline.py (10 hunks)
app/pipeline/chat/exercise_chat_agent_pipeline.py (1 hunks)
app/pipeline/faq_ingestion_pipeline.py (1 hunks)
app/pipeline/shared/citation_pipeline.py (5 hunks)
app/retrieval/faq_retrieval.py (1 hunks)
app/web/routers/webhooks.py (5 hunks)
app/web/status/faq_ingestion_status_callback.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

app/pipeline/chat/exercise_chat_agent_pipeline.py

🚧 Files skipped from review as they are similar to previous changes (4)

app/domain/ingestion/deletionPipelineExecutionDto.py
app/domain/data/faq_dto.py
app/web/status/faq_ingestion_status_callback.py
app/domain/ingestion/ingestion_pipeline_execution_dto.py

🧰 Additional context used

🪛 Ruff (0.8.2)

app/pipeline/chat/course_chat_pipeline.py

14-14: langchain_core.prompts.SystemMessagePromptTemplate imported but unused

Remove unused import: langchain_core.prompts.SystemMessagePromptTemplate

(F401)

app/pipeline/faq_ingestion_pipeline.py

6-6: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)

88-89: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

app/pipeline/shared/citation_pipeline.py

1-1: logging imported but unused

Remove unused import: logging

(F401)

86-86: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

🪛 GitHub Actions: Run linters

app/pipeline/chat/course_chat_pipeline.py

[error] 12-12: Unused import: 'langchain_core.prompts.SystemMessagePromptTemplate' is imported but never used

[warning] 313-316: Multiple lines exceed maximum length of 120 characters

[warning] 318-318: Line too long (140 > 120 characters)

app/pipeline/faq_ingestion_pipeline.py

[error] 6-6: Unused import: 'openai.OpenAI' is imported but never used

app/pipeline/shared/citation_pipeline.py

[error] 1-1: Unused import: 'logging' is imported but never used

[warning] 107-107: Line too long (127 > 120 characters)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Push Docker Image

🔇 Additional comments (9)

app/pipeline/shared/citation_pipeline.py (2)

20-22: LGTM! Clean enum implementation.

The InformationType enum is well-defined and follows Python best practices by inheriting from both str and Enum.

1-3: ⚠️ Potential issue

Fix duplicate logger imports.

The code has redundant imports for logging functionality:

Line 1: import logging

Line 3: from asyncio.log import logger
-import logging
-from asyncio.log import logger
+import logging

+logger = logging.getLogger(__name__)
Likely invalid or redundant comment.

🧰 Tools

🪛 Ruff (0.8.2)

1-1: logging imported but unused

Remove unused import: logging

(F401)

🪛 GitHub Actions: Run linters

[error] 1-1: Unused import: 'logging' is imported but never used
app/pipeline/faq_ingestion_pipeline.py (1)

1-6: 🛠️ Refactor suggestion

Consolidate logging imports and remove unused import.

The code has redundant logging imports and an unused OpenAI import:

Remove the unused openai.OpenAI import

Consolidate logging by using a single logger instance

Apply this diff:
import logging
-from asyncio.log import logger
from typing import Optional, List, Dict
from langchain_core.output_parsers import StrOutputParser
-from openai import OpenAI
from weaviate import WeaviateClient

+logger = logging.getLogger(__name__)
Likely invalid or redundant comment.

🧰 Tools

🪛 Ruff (0.8.2)

6-6: openai.OpenAI imported but unused

Remove unused import: openai.OpenAI

(F401)

🪛 GitHub Actions: Run linters

[error] 6-6: Unused import: 'openai.OpenAI' is imported but never used
app/retrieval/faq_retrieval.py (5)

70-87: LGTM!

The initialization is well-structured with appropriate requirements and configurations for the LLM model, embedding model, and reranker pipeline.

132-162: LGTM!

The function implements a robust basic retrieval logic with appropriate question assessment and query rewriting.

218-225: Simplify error handling.

The try-except block that just re-raises the exception is unnecessary and can be removed.

Apply this diff:
-        try:
-            response = (prompt | self.pipeline).invoke({})
-            token_usage = self.llm.tokens
-            token_usage.pipeline = PipelineEnum.IRIS_FAQ_RETRIEVAL_PIPELINE
-            self.tokens.append(self.llm.tokens)
-            return response
-        except Exception as e:
-            raise e
+        response = (prompt | self.pipeline).invoke({})
+        token_usage = self.llm.tokens
+        token_usage.pipeline = PipelineEnum.IRIS_FAQ_RETRIEVAL_PIPELINE
+        self.tokens.append(self.llm.tokens)
+        return response
256-263: Simplify error handling.

The try-except block that just re-raises the exception is unnecessary and can be removed.

Apply this diff:
-        try:
-            response = (prompt | self.pipeline).invoke({})
-            token_usage = self.llm.tokens
-            token_usage.pipeline = PipelineEnum.IRIS_FAQ_RETRIEVAL_PIPELINE
-            self.tokens.append(self.llm.tokens)
-            return response
-        except Exception as e:
-            raise e
+        response = (prompt | self.pipeline).invoke({})
+        token_usage = self.llm.tokens
+        token_usage.pipeline = PipelineEnum.IRIS_FAQ_RETRIEVAL_PIPELINE
+        self.tokens.append(self.llm.tokens)
+        return response
265-302: LGTM!

The function implements a robust database search with appropriate filtering and hybrid search capabilities.
app/pipeline/chat/course_chat_pipeline.py (1)

526-541: LGTM!

The function implements a robust check for FAQ availability with appropriate handling of course_id.

app/pipeline/shared/citation_pipeline.py

app/pipeline/faq_ingestion_pipeline.py

app/retrieval/faq_retrieval.py

app/pipeline/chat/course_chat_pipeline.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

app/pipeline/chat/course_chat_pipeline.py (1)
526-541: Fix docstring return description.

The docstring incorrectly states "indexed lectures" instead of "indexed FAQs" in the return description.

Apply this diff to fix the docstring:
     def should_allow_faq_tool(self, course_id: int) -> bool:
         """
         Checks if there are indexed faqs for the given course

         :param course_id: The course ID
-        :return: True if there are indexed lectures for the course, False otherwise
+        :return: True if there are indexed FAQs for the course, False otherwise
         """

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27abf69 and 82a55c3.

📒 Files selected for processing (1)

app/pipeline/chat/course_chat_pipeline.py (9 hunks)

🧰 Additional context used

🪛 GitHub Actions: Run linters

app/pipeline/chat/course_chat_pipeline.py

[warning] 312-312: Line too long (122 > 120 characters)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Push Docker Image

🔇 Additional comments (4)

app/pipeline/chat/course_chat_pipeline.py (4)

23-23: LGTM! Clean integration of FAQ-related imports and attributes.

The new imports and class attribute are well-organized and follow the existing patterns in the codebase.

Also applies to: 45-49, 86-86

121-121: LGTM! Clean initialization of FAQ retriever.

The FAQ retriever is properly initialized following the same pattern as the lecture retriever.

439-441: LGTM! Clean integration of FAQ support in the pipeline.

The FAQ tool is properly integrated into the pipeline, following the same patterns as the lecture tool:

Conditional tool addition based on availability
Response augmentation using the citation pipeline

Also applies to: 467-474

309-339: 🛠️ Refactor suggestion

Fix line length in docstring and consider enhancing error handling.

The docstring contains lines exceeding 120 characters.
The function could benefit from error handling for edge cases.

Apply this diff to fix the line length issues and add error handling:

     def faq_content_retrieval() -> str:
         """
         Use this tool to retrieve information from indexed FAQs.
-        It is suitable when no other tool fits, you think it is a common question or the question is frequently asked,
-        or the question could be effectively answered by an FAQ. Also use this if the question is explicitly
-        organizational and course-related. An organizational question about the course might be
-        "What is the course structure?" or "How do I enroll?" or exam related content like "When is the exam".
-        The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
-        Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
-        Respond to the query concisely and solely using the answer from the relevant FAQs.
+        It is suitable when no other tool fits, you think it is a common question or the question is 
+        frequently asked, or the question could be effectively answered by an FAQ. Also use this if the 
+        question is explicitly organizational and course-related.
+        
+        An organizational question about the course might be "What is the course structure?" or 
+        "How do I enroll?" or exam related content like "When is the exam".
+        
+        The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
+        Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
+        
+        Respond to the query concisely and solely using the answer from the relevant FAQs.
         This tool should only be used once per query.
         """
         self.callback.in_progress("Retrieving faq content ...")
-        self.retrieved_faqs = self.faq_retriever(
-            chat_history=history,
-            student_query=query.contents[0].text_content,
-            result_limit=10,
-            course_name=dto.course.name,
-            course_id=dto.course.id,
-            base_url=dto.settings.artemis_base_url,
-        )
+        try:
+            self.retrieved_faqs = self.faq_retriever(
+                chat_history=history,
+                student_query=query.contents[0].text_content,
+                result_limit=10,
+                course_name=dto.course.name,
+                course_id=dto.course.id,
+                base_url=dto.settings.artemis_base_url,
+            )
+        except Exception as e:
+            logger.error("Failed to retrieve FAQs", exc_info=e)
+            return "Failed to retrieve FAQs. Please try again later."

         result = ""
         for faq in self.retrieved_faqs:
             res = "[FAQ ID: {}, FAQ Question: {}, FAQ Answer: {}]".format(
                 faq.get(FaqSchema.FAQ_ID.value),
                 faq.get(FaqSchema.QUESTION_TITLE.value),
                 faq.get(FaqSchema.QUESTION_ANSWER.value),
             )
             result += res
         return result

Likely invalid or redundant comment.

🧰 Tools

🪛 GitHub Actions: Run linters

[warning] 312-312: Line too long (122 > 120 characters)

app/pipeline/faq_ingestion_pipeline.py

app/retrieval/faq_retrieval.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (9)

app/pipeline/faq_ingestion_pipeline.py (4)
25-26: Simplify lock variable assignment.

The reassignment of batch_update_lock is redundant since it's already imported from lecture_ingestion_pipeline.
- # we use the same lock as the lecture ingestion pipeline
- batch_update_lock = batch_update_lock
72-79: Improve error handling and fix grammatical error.

Two suggestions for improvement:

The error message "Failed to faq into the database" is grammatically incorrect

Consider catching specific exceptions instead of generic Exception
-        except Exception as e:
+        except (ValueError, IOError) as e:
             logger.error(f"Error updating faq: {e}")
             self.callback.error(
-                f"Failed to faq into the database: {e}",
+                f"Failed to ingest FAQ into the database: {e}",
                 exception=e,
                 tokens=self.tokens,
             )
88-89: Combine nested with statements.

Combine the nested with statements for better readability.
-        with batch_update_lock:
-            with self.collection.batch.rate_limit(requests_per_minute=600) as batch:
+        with batch_update_lock, self.collection.batch.rate_limit(requests_per_minute=600) as batch:
🧰 Tools

🪛 Ruff (0.8.2)

88-89: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

137-141: Fix return value to match type hint.

The method signature indicates it should return List[Dict[str, str]], but it returns None. Consider returning an empty list to match the type hint.
     def chunk_data(self, path: str) -> List[Dict[str, str]]:
         """
         Faqs are so small, they do not need to be chunked into smaller parts
         """
-        return
+        return []
app/retrieval/basic_retrieval.py (5)
44-63: Correct the docstring to reflect actual parameters.
The docstring mentions a user_question parameter, but this function signature only accepts prompt and chat_history. Update the docstring for clarity and consistency.
-    Adds the chat history and user question to the prompt
-        :param chat_history: The chat history
-        :param user_question: The user question
+    Adds the last four messages from the chat history to the prompt
+        :param chat_history: The chat history
98-128: Handle possible variations of "YES" in assessment logic.
Relying on an exact string match may miss valid affirmative answers (like "Yes " or "yes"). A safer check could be case-insensitive and trimmed.
- return response == "YES"
+ return response.strip().lower() == "yes"
166-197: Gracefully handle embedding or query failures.
If self.llm_embedding.embed(query) or the query.hybrid(...) call fails, the method will raise an exception. Consider adding localized error handling or retries without overcomplicating the flow.

198-267: Reuse thread pools for performance.
Creating two separate ThreadPoolExecutors in quick succession may add unnecessary overhead. Consider reusing a single executor in the same scope (or a shared executor at class level) to reduce resource churn.
[performance]

275-298: Normalize or validate retrieved languages.
Users may store inconsistent language values. Consider normalizing or at least validating fetched results to ensure recognized language codes (e.g., "english", "EN", or locale tags).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf52342 and 6163e6f.

📒 Files selected for processing (3)

app/pipeline/faq_ingestion_pipeline.py (1 hunks)
app/retrieval/basic_retrieval.py (1 hunks)
app/retrieval/faq_retrieval.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

app/retrieval/faq_retrieval.py

🧰 Additional context used

🪛 Ruff (0.8.2)

app/pipeline/faq_ingestion_pipeline.py

1-1: threading imported but unused

Remove unused import: threading

(F401)

88-89: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Mend Security Check

🔇 Additional comments (7)

app/pipeline/faq_ingestion_pipeline.py (4)

1-1: Remove unused import.

The threading module is imported but never used since the lock is imported from lecture_ingestion_pipeline.
- import threading
🧰 Tools

🪛 Ruff (0.8.2)

1-1: threading imported but unused

Remove unused import: threading

(F401)

44-48: Consider using a standard GPT version number.

The GPT version requirement of 4.25 seems unusually specific. Consider using a standard version number (e.g., 4.0) unless there's a specific feature requirement from version 4.25.

Could you clarify if there are specific features from GPT-4.25 that are required for this implementation?

98-104: Improve error handling in batch update.

106-120: Fix inconsistent return values in delete_old_faqs method.
app/retrieval/basic_retrieval.py (3)

26-41: Validate data structure of retrieved chunks.
Currently, the code assumes that each chunk contains an "id" and "properties". Consider adding safeguards or checks to handle potentially malformed chunks, especially if other pipeline stages might generate incomplete data.

130-164: Confirm concurrency safety for token management.
This method appends token usage to self.tokens in a try-except block. Consider concurrency patterns that might arise from parallel calls. Centralize and handle token usage consistently across all pipeline methods.

268-274: Interface method looks appropriate.
The abstract get_schema_properties method is well defined for subclasses to implement. No concerns here.

app/retrieval/basic_retrieval.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

app/retrieval/faq_retrieval_utils.py (1)
25-40: Translate docstring to English for consistency.

The docstring is in German while the rest of the codebase is in English. This could make it harder for non-German speaking developers to understand the function's purpose.

Apply this diff to translate the docstring:
-    """
-    Formatiert die abgerufenen FAQs in einen String.
-
-    :param retrieved_faqs: Liste der abgerufenen FAQs
-    :return: Formatierter String mit den FAQ-Daten
-    """
+    """
+    Formats the retrieved FAQs into a string.
+
+    :param retrieved_faqs: List of retrieved FAQs
+    :return: Formatted string containing the FAQ data
+    """
app/pipeline/chat/exercise_chat_agent_pipeline.py (1)
42-43: Remove unused import.

The FaqSchema import is not used in the code.

Apply this diff:
-from ...vector_database.faq_schema import FaqSchema
Also applies to: 46-46
app/pipeline/chat/course_chat_pipeline.py (1)
45-46: Remove unused import.

The FaqSchema import is not used in the code.

Apply this diff:
-from ...vector_database.faq_schema import FaqSchema
Also applies to: 49-49

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6163e6f and 430b777.

📒 Files selected for processing (3)

app/pipeline/chat/course_chat_pipeline.py (8 hunks)
app/pipeline/chat/exercise_chat_agent_pipeline.py (7 hunks)
app/retrieval/faq_retrieval_utils.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

app/pipeline/chat/course_chat_pipeline.py

49-49: ...vector_database.faq_schema.FaqSchema imported but unused

Remove unused import: ...vector_database.faq_schema.FaqSchema

(F401)

app/pipeline/chat/exercise_chat_agent_pipeline.py

46-46: ...vector_database.faq_schema.FaqSchema imported but unused

Remove unused import: ...vector_database.faq_schema.FaqSchema

(F401)

🔇 Additional comments (11)

app/retrieval/faq_retrieval_utils.py (1)

6-22: LGTM!

The function correctly checks for the existence of indexed FAQs for a given course. The implementation is efficient as it limits the query to 1 result since we only need to check existence.

app/pipeline/chat/exercise_chat_agent_pipeline.py (5)

109-109: LGTM!

The attribute is correctly typed and initialized.

143-144: LGTM!

Good job on improving code clarity by using more descriptive variable names and maintaining consistency with other retrievers.

401-424: Fix line length issues in docstring.

The docstring contains lines that exceed the maximum length of 120 characters.

Apply this diff to fix the line length issues:

-            It is suitable when no other tool fits, it is a common question or the question is frequently asked,
-            or the question could be effectively answered by an FAQ. Also use this if the question is explicitly
-            organizational and course-related. An organizational question about the course might be
-            "What is the course structure?" or "How do I enroll?" or exam related content like "When is the exam".
-            The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
-            Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
-            Respond to the query concisely and solely using the answer from the relevant FAQs.
-            This tool should only be used once per query.
+            It is suitable when no other tool fits, it is a common question or the question is frequently asked, or
+            the question could be effectively answered by an FAQ. Also use this if the question is explicitly organizational
+            and course-related.
+            An organizational question about the course might be "What is the course structure?" or "How do I enroll?" or
+            exam related content like "When is the exam".
+            The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs. Each FAQ follows
+            this format: FAQ ID, FAQ Question, FAQ Answer.
+            Respond to the query concisely and solely using the answer from the relevant FAQs. This tool should only be used
+            once per query.

544-547: LGTM!

The FAQ tool is correctly added to the tool list with appropriate conditional logic.

570-572: LGTM!

The code has been reformatted for better readability without any functional changes.

app/pipeline/chat/course_chat_pipeline.py (5)

87-87: LGTM!

The attribute is correctly typed and initialized.

121-122: LGTM!

Good job on improving code clarity by using more descriptive variable names and maintaining consistency with other retrievers.

310-334: Fix line length issues in docstring.

The docstring contains lines that exceed the maximum length of 120 characters.

Apply this diff to fix the line length issues:

-            It is suitable when no other tool fits, it is a common question or the question is frequently asked,
-            or the question could be effectively answered by an FAQ. Also use this if the question is explicitly
-            organizational and course-related. An organizational question about the course might be
-            "What is the course structure?" or "How do I enroll?" or exam related content like "When is the exam".
-            The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
-            Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
-            Respond to the query concisely and solely using the answer from the relevant FAQs.
-            This tool should only be used once per query.
+            It is suitable when no other tool fits, it is a common question or the question is frequently asked, or
+            the question could be effectively answered by an FAQ. Also use this if the question is explicitly organizational
+            and course-related.
+            An organizational question about the course might be "What is the course structure?" or "How do I enroll?" or
+            exam related content like "When is the exam".
+            The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs. Each FAQ follows
+            this format: FAQ ID, FAQ Question, FAQ Answer.
+            Respond to the query concisely and solely using the answer from the relevant FAQs. This tool should only be used
+            once per query.

433-435: LGTM!

The FAQ tool is correctly added to the tool list with appropriate conditional logic.

461-468: LGTM!

The FAQ citation is correctly implemented with appropriate information type and base URL.

# Conflicts: # app/common/PipelineEnum.py # app/web/routers/pipelines.py # docker/pyris-dev.yml

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 430b777 and 1242ef0.

📒 Files selected for processing (3)

app/common/PipelineEnum.py (1 hunks)
app/pipeline/chat/exercise_chat_agent_pipeline.py (6 hunks)
app/web/routers/pipelines.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

app/common/PipelineEnum.py
app/web/routers/pipelines.py

🧰 Additional context used

🪛 Ruff (0.8.2)

app/pipeline/chat/exercise_chat_agent_pipeline.py

46-46: ...vector_database.faq_schema.FaqSchema imported but unused

Remove unused import: ...vector_database.faq_schema.FaqSchema

(F401)

🪛 GitHub Actions: Run linters

app/pipeline/chat/exercise_chat_agent_pipeline.py

[warning] 46-46: Unused import: '...vector_database.faq_schema.FaqSchema' is imported but never used

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: Build and Push Docker Image
GitHub Check: Analyze (python)

🔇 Additional comments (4)

app/pipeline/chat/exercise_chat_agent_pipeline.py (4)

109-109: LGTM! Well-structured attribute addition.

The retrieved_faqs attribute is properly typed as List[dict] and initialized to None.

143-144: LGTM! Clear and consistent retriever initialization.

Good refactoring to rename retriever to lecture_retriever for better clarity, and consistent initialization of faq_retriever.

544-547: LGTM! Clean and consistent tool integration.

The FAQ tool is properly integrated following the same pattern as the lecture tool, with appropriate conditional checking using should_allow_faq_tool.

401-424: Verify the FAQ result limit.

The FAQ retrieval is limited to 10 results while lecture retrieval is limited to 5. Please verify if this difference is intentional and if 10 results might be too verbose for the chat response.

Otherwise, the implementation is well-structured and follows the established patterns:

Clear docstring explaining when to use the tool

Proper progress callback

Consistent error handling

✅ Verification successful

The FAQ result limit of 10 is consistent with the codebase patterns.

The higher limit for FAQs (10) compared to lecture content (5) appears to be an intentional design choice, consistently implemented across different pipelines. This makes sense as FAQs are typically more concise than lecture content, and having more FAQ results allows for better coverage of related questions.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check the typical length of FAQ responses
# Look for other result limits in the codebase for consistency

# Check for other result_limit usage
rg "result_limit\s*=" -A 2 -B 2

# Check FAQ formatting to understand the impact of 10 results
ast-grep --pattern 'def format_faqs($$$)'
Length of output: 5398

app/pipeline/chat/exercise_chat_agent_pipeline.py

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1242ef0 and cf2feac.

📒 Files selected for processing (1)

app/pipeline/chat/exercise_chat_agent_pipeline.py (6 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Push Docker Image

🔇 Additional comments (5)

app/pipeline/chat/exercise_chat_agent_pipeline.py (5)

42-43: LGTM! The new imports enhance FAQ functionality.

The imports are correctly placed and follow the project's import style.

The FaqSchema import is still unused and should be removed as mentioned in the previous review.

108-108: LGTM! The retrieved_faqs attribute is properly typed.

The attribute is correctly initialized as List[dict] = None to store FAQ retrieval results.

142-143: LGTM! Clear and consistent retriever initialization.

Good renaming of retriever to lecture_retriever for better clarity, and consistent initialization of faq_retriever.

544-545: LGTM! Consistent conditional tool addition.

The FAQ tool is correctly added to the tool list using the same pattern as the lecture tool.

400-424: LGTM! Well-documented and properly implemented FAQ retrieval.

The implementation follows the established pattern and includes clear usage guidelines in the docstring.

Verify if retrieving 10 FAQs (line 416) is the optimal limit for this use case. Let's check other retrieval implementations:

✅ Verification successful

The result limit of 10 FAQs is appropriate and consistent with the codebase

The implementation aligns with other FAQ retrievers in the codebase, which also use a limit of 10. This higher limit (compared to 3-5 used for other content types) is suitable for FAQs as they are typically shorter and more focused, allowing for better coverage of common questions.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for result_limit parameter in retrieval implementations
rg -A 1 "result_limit\s*=" app/retrieval/
Length of output: 1149

Script:
#!/bin/bash
# Search for instances where result_limit is specified in pipeline files
rg -B 2 -A 2 "result_limit\s*=\s*\d+" app/pipeline/
Length of output: 2743

app/pipeline/chat/exercise_chat_agent_pipeline.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

app/pipeline/chat/exercise_chat_agent_pipeline.py (1)

400-423: Add error handling and empty results validation.

While the implementation follows the pattern of lecture_content_retrieval, consider these improvements:

Add validation for empty results
Include error handling for retriever failures
Consider a more concise docstring

 def faq_content_retrieval() -> str:
     """
-    Use this tool to retrieve information from indexed FAQs.
-    It is suitable when no other tool fits, it is a common question or the question is frequently asked,
-    or the question could be effectively answered by an FAQ. Also use this if the question is explicitly
-    organizational and course-related. An organizational question about the course might be
-    "What is the course structure?" or "How do I enroll?" or exam related content like "When is the exam".
-    The tool performs a RAG retrieval based on the chat history to find the most relevant FAQs.
-    Each FAQ follows this format: FAQ ID, FAQ Question, FAQ Answer.
-    Respond to the query concisely and solely using the answer from the relevant FAQs.
-    This tool should only be used once per query.
+    Retrieve and provide answers from indexed FAQs for common, organizational,
+    or course-related questions. Uses RAG retrieval based on chat history.
+    Format: FAQ ID, Question, Answer. Use once per query.
     """
     self.callback.in_progress("Retrieving faq content ...")
-    self.retrieved_faqs = self.faq_retriever(
-        chat_history=chat_history,
-        student_query=query.contents[0].text_content,
-        result_limit=10,
-        course_name=dto.course.name,
-        course_id=dto.course.id,
-        base_url=dto.settings.artemis_base_url,
-    )
+    try:
+        self.retrieved_faqs = self.faq_retriever(
+            chat_history=chat_history,
+            student_query=query.contents[0].text_content,
+            result_limit=10,
+            course_name=dto.course.name,
+            course_id=dto.course.id,
+            base_url=dto.settings.artemis_base_url,
+        )
+        if not self.retrieved_faqs:
+            return "No relevant FAQs found for your question."
 
-    result = format_faqs(self.retrieved_faqs)
-    return result
+        return format_faqs(self.retrieved_faqs)
+    except Exception as e:
+        logger.error("Error retrieving FAQs", exc_info=e)
+        return "Sorry, I encountered an error while retrieving FAQs."

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf2feac and a36dfcc.

📒 Files selected for processing (2)

app/pipeline/chat/exercise_chat_agent_pipeline.py (6 hunks)
app/web/routers/pipelines.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

app/web/routers/pipelines.py

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Push Docker Image

🔇 Additional comments (4)

app/pipeline/chat/exercise_chat_agent_pipeline.py (4)

42-43: LGTM! Well-structured imports and class attribute.

The new imports and class attribute for FAQ functionality are properly organized and typed.

Also applies to: 108-108

142-143: Good refactoring of retriever variables!

The renaming of retriever to lecture_retriever improves clarity, and the new faq_retriever follows the same initialization pattern.

543-545: LGTM! Consistent tool list modification.

The FAQ tool is properly added to the tool list with appropriate conditional checking, following the same pattern as the lecture tool.

544-545: Verify FAQ retrieval functionality in production.

Let's verify that the FAQ retrieval is working as expected:

✅ Verification successful

FAQ tool implementation is properly verified and safe

The FAQ retrieval functionality is correctly implemented with proper safeguards:

Verifies FAQ existence before enabling the tool

Uses proper database querying and schema validation

Implements consistent error handling

Maintains identical behavior across course and exercise chat contexts
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify FAQ retrieval functionality
# 1. Check if any courses have FAQs indexed
# 2. Verify the structure of indexed FAQs

# Check for indexed FAQs across all courses
echo "Checking for indexed FAQs..."
ast-grep --pattern 'class FaqRetrieval {
  $$$
  __call__($$$) {
    $$$
  }
}'

# Check FAQ data structure
echo "Verifying FAQ data structure..."
rg -A 5 "FAQ" --type python
Length of output: 280

Script:
#!/bin/bash
# Add Python file type and search for FAQ-related code
echo "Searching for FAQ-related functionality..."
rg -i "faq" --type-add 'py:*.py' --type py -A 5

echo "Looking for should_allow_faq_tool implementation..."
ast-grep --pattern 'def should_allow_faq_tool($$$):
  $$$'

echo "Searching for faq_content_retrieval definition..."
rg "faq_content_retrieval" --type-add 'py:*.py' --type py -A 3
Length of output: 59052

… feature/faq/basic-faq-pipe

go away with your nitpicks

cremertim added 5 commits December 19, 2024 15:13

Initial setup of faq ingestion and deletion

2fe490b

Initial setup of faq retrieval

19c0d81

Further faq retrieval.

933c045

Working FAQ retrival

54f5167

Removed logging

3b12cd6

cremertim requested a review from bassner January 6, 2025 16:20

github-actions bot added component:Pipeline component:FastAPI component:Domain component:Docker labels Jan 6, 2025

cremertim mentioned this pull request Jan 6, 2025

Iris: Enable IRIS to answer with FAQs ls1intum/Artemis#10093

Merged

21 tasks

coderabbitai bot requested changes Jan 6, 2025

View reviewed changes

app/pipeline/faq_ingestion_pipeline.py Outdated Show resolved Hide resolved

app/web/status/faq_ingestion_status_callback.py Outdated Show resolved Hide resolved

app/web/routers/webhooks.py Outdated Show resolved Hide resolved

Removed logging, added Links for FAQ answer, updated prompts

1017477

coderabbitai bot requested changes Jan 8, 2025

View reviewed changes

Added language

e8f611c

coderabbitai bot previously approved these changes Jan 13, 2025

View reviewed changes

Merge branch 'main' into feature/faq/basic-faq-pipe

cfb1240

coderabbitai bot reviewed Jan 13, 2025

View reviewed changes

cremertim added 2 commits January 13, 2025 13:41

Increased faq limit

edfbfa2

Merge remote-tracking branch 'origin/feature/faq/basic-faq-pipe' into…

ce2904a

… feature/faq/basic-faq-pipe

cremertim dismissed coderabbitai[bot]’s stale review via ce2904a January 13, 2025 12:42

coderabbitai bot requested changes Jan 13, 2025

View reviewed changes

app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved

coderabbitai bot previously approved these changes Jan 13, 2025

View reviewed changes

wasnertobias mentioned this pull request Jan 17, 2025

FAQ rewriting pipeline #191

Merged

Reformat

27abf69

cremertim dismissed coderabbitai[bot]’s stale review via 27abf69 January 18, 2025 20:33

coderabbitai bot requested changes Jan 18, 2025

View reviewed changes

Fix coderabit

82a55c3

coderabbitai bot reviewed Jan 24, 2025

View reviewed changes

bassner requested changes Jan 28, 2025

View reviewed changes

app/pipeline/faq_ingestion_pipeline.py Outdated Show resolved Hide resolved

app/retrieval/faq_retrieval.py Outdated Show resolved Hide resolved

Refactored FAQ retrival pipeline to reduce code duplication

6163e6f

coderabbitai bot requested changes Jan 29, 2025

View reviewed changes

app/retrieval/basic_retrieval.py Show resolved Hide resolved

cremertim added 2 commits January 30, 2025 12:46

Refactored FAQ retrival pipeline to reduce code duplication

2fcdf4e

Refactored FAQ retrival pipeline to reduce code duplication

430b777

coderabbitai bot reviewed Jan 30, 2025

View reviewed changes

bassner previously approved these changes Jan 30, 2025

View reviewed changes

Merge branch 'main' into feature/faq/basic-faq-pipe

1242ef0

# Conflicts: # app/common/PipelineEnum.py # app/web/routers/pipelines.py # docker/pyris-dev.yml

cremertim dismissed bassner’s stale review via 1242ef0 January 30, 2025 12:45

coderabbitai bot requested changes Jan 30, 2025

View reviewed changes

app/pipeline/chat/exercise_chat_agent_pipeline.py Outdated Show resolved Hide resolved

Remove unused import

cf2feac

coderabbitai bot previously requested changes Jan 30, 2025

View reviewed changes

app/pipeline/chat/exercise_chat_agent_pipeline.py Outdated Show resolved Hide resolved

cremertim and others added 2 commits January 30, 2025 14:07

fix typo

dd49dc0

Merge branch 'main' into feature/faq/basic-faq-pipe

a36dfcc

coderabbitai bot reviewed Jan 30, 2025

View reviewed changes

cremertim added 2 commits January 30, 2025 14:33

linter

0921e1a

Merge remote-tracking branch 'origin/feature/faq/basic-faq-pipe' into…

1ca23bd

… feature/faq/basic-faq-pipe

bassner approved these changes Jan 30, 2025

View reviewed changes

coderabbitai bot approved these changes Jan 30, 2025

View reviewed changes

bassner merged commit eb7faf8 into main Jan 30, 2025
7 checks passed

bassner deleted the feature/faq/basic-faq-pipe branch January 30, 2025 13:46

cremertim added the deploy:pyris-test label Jan 31, 2025

github-actions bot added lock:pyris-test and removed deploy:pyris-test labels Jan 31, 2025

cremertim had a problem deploying to Iris Test January 31, 2025 11:03 — with GitHub Actions Failure

github-actions bot added the deployment-error label Jan 31, 2025

cremertim removed lock:pyris-test deployment-error labels Jan 31, 2025

bassner added this to the 1.1.0 milestone Feb 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable IRIS to use FAQs #187

Enable IRIS to use FAQs #187

cremertim commented Jan 6, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 6, 2025 •

edited

Loading

Review failed

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

Enable IRIS to use FAQs #187

Enable IRIS to use FAQs #187

Conversation

cremertim commented Jan 6, 2025 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Jan 6, 2025 • edited Loading

Review failed

Walkthrough

Changes

Possibly related PRs

Suggested Reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

cremertim commented Jan 6, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 6, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)