# Divide-and-Conquer Example

In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.
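As a minimal illustration of the paradigm itself (not part of the framework), merge sort divides a list, conquers the halves recursively, and combines the results:

```python
# Classic divide-and-conquer: split, solve the halves recursively, combine.
def merge_sort(items):
    if len(items) <= 1:              # simple enough: solve directly
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # divide and conquer each half
    right = merge_sort(items[mid:])
    merged = []                      # combine the two sorted sub-solutions
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```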
This example demonstrates how to use the framework for divide-and-conquer tasks. The example code can be found in the `examples/general_dnc` directory.

```bash
cd examples/general_dnc
```

## Overview

This example implements a general divide-and-conquer workflow that consists of the following components:

1. **DnC Input Interface**
   - Handles user input containing questions and/or images
   - Constructs the data structure used to run the workflow

2. **Init Set Variable Task**
   - Initializes the global workflow variables needed throughout the workflow

3. **Conqueror Task**
   - Executes and manages complex task trees; for each task it can answer directly via the agent, conquer the current task, issue a tool call for the current task, or break the current task into several subtasks
   - Takes a hierarchical task tree and processes each task node, maintaining context and state between task executions

4. **Conqueror Update Set Variable Task**
   - Updates the global workflow variables changed by the conqueror task, for a better reading experience in the Conductor UI

5. **Divider Task**
   - Breaks a complex task down into multiple smaller subtasks
   - Generates milestones and matches one to each subtask

6. **Divider Update Set Variable Task**
   - Updates the global workflow variables changed by the divider task, for a better reading experience in the Conductor UI

7. **Rescue Task**
   - Rescues a failed tool-call task, attempting to fix the issue by retrying with corrected parameters

8. **Conclude Task**
   - The fixed end of the workflow; concludes the original root task based on all related information

9. **Switch Task**
   - After the conqueror task, switches to the next worker based on its decision:
     - The default case is the next conqueror task
     - If the task is too complex, switch to the divider task
     - If the task failed, switch to the rescue task

10. **Task Exit Monitor Task**
    - Monitors whether the exit condition of the DnC loop task is met
    - Based on the conqueror and divider task(s), the task tree is dynamically generated and continuously updated throughout the workflow

11. **Post Set Variable Task**
    - Updates the global workflow variables changed by the task exit monitor task, for a better reading experience in the Conductor UI

12. **DnC Loop Task**
    - The core of the DnC workflow: takes a hierarchical task tree and processes each task node, maintaining context and state between task executions
    - Contains the three main tasks (conqueror, divider, and rescue) plus the supporting tasks listed above
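The conqueror/switch routing over the task tree can be sketched as follows. This is a simplified illustration only; `TaskNode`, `conquer`, and the decision strings are hypothetical stand-ins, not the framework's API:

```python
# Simplified sketch of the DnC loop: conquer each node, and on a
# "too complex" decision switch to the divider to expand subtasks.
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    goal: str
    children: list = field(default_factory=list)

def conquer(node):
    # Stand-in for the LLM decision; here, a goal listing several
    # ';'-separated steps is treated as too complex to answer directly.
    return "too_complex" if ";" in node.goal else "answered"

def run_tree(root):
    pending, done = [root], []
    while pending:                      # exit monitor: stop when nothing is pending
        node = pending.pop(0)
        decision = conquer(node)
        if decision == "too_complex":   # switch -> divider: break into subtasks
            node.children = [TaskNode(goal=part.strip())
                             for part in node.goal.split(";")]
            pending.extend(node.children)
        elif decision == "failed":      # switch -> rescue: retry with fixed params
            pending.append(node)
        else:                           # default case: conquered, move on
            done.append(node.goal)
    return done
```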
The whole workflow is illustrated in the following diagram:

![DnC Workflow](../images/general_dnc_workflow_diagram.png)

## Prerequisites

- Python 3.10+
- Required packages installed (see requirements.txt)
- Access to OpenAI API or compatible endpoint (see configs/llms/*.yml)
- [Optional] Access to Bing API for WebSearch tool (see configs/tools/*.yml)
- Redis server running locally or remotely
- Conductor server running locally or remotely

## Configuration

The container.yaml file manages dependencies and settings for the different components of the system, including Conductor connections, Redis connections, and other service configurations. To set up your configuration:

1. Generate the container.yaml file:
   ```bash
   python compile_container.py
   ```
   This creates a container.yaml file with default settings under `examples/general_dnc`.

2. Configure your LLM and tool settings in `configs/llms/*.yml` and `configs/tools/*.yml`:
   - Set your OpenAI API key or compatible endpoint through environment variables or by directly modifying the yml files:
     ```bash
     export custom_openai_key="your_openai_api_key"
     export custom_openai_endpoint="your_openai_endpoint"
     ```
   - [Optional] Set your Bing API key through an environment variable or by directly modifying the yml file:
     ```bash
     export bing_api_key="your_bing_api_key"
     ```
     **Note: Setting the Bing API key is not mandatory; without it, the WebSearch tool falls back to DuckDuckGo search. Setting it is recommended for better search quality.**
   - Configure other model settings, such as temperature, as needed through environment variables or by directly modifying the yml files

3. Update settings in the generated `container.yaml`:
   - Modify the Redis connection settings:
     - Set the host, port, and credentials for your Redis instance
     - Configure both the `redis_stream_client` and `redis_stm_client` sections
   - Update the Conductor server URL under the conductor_config section
   - Adjust any other component settings as needed
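The two configuration sources above interact in the usual way: an exported environment variable takes precedence over the value in the yml file. A minimal sketch of that resolution order (illustrative only, not the framework's actual config loader):

```python
# Resolve a setting: prefer the environment variable, fall back to the
# value read from the yml file.
import os

def resolve(name, yml_value):
    return os.environ.get(name, yml_value)

os.environ["custom_openai_key"] = "sk-from-env"
print(resolve("custom_openai_key", "sk-from-yml"))  # sk-from-env
print(resolve("some_unset_setting", "yml-default"))
```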
## Running the Example

Run the general DnC example:

For terminal/CLI usage:
```bash
python run_cli.py
```

For app/GUI usage:
```bash
python run_app.py
```

## Troubleshooting

If you encounter issues:
- Verify that Redis is running and accessible
- Check that your OpenAI API key is valid
- Check that your Bing API key is valid if search results are not as expected
- Ensure all dependencies are installed correctly
- Review the logs for any error messages
- **Open an issue on GitHub if you can't find a solution; we will do our best to help you out!**

## Building the Example

Coming soon! This section will provide detailed instructions for building and packaging the general_dnc example step by step.
# Video Understanding Example

This example demonstrates how to use the framework for hour-long video understanding tasks. The example code can be found in the `examples/video_understanding` directory.

```bash
cd examples/video_understanding
```

## Overview

This example implements a video understanding workflow based on the DnC workflow, which consists of the following components:

1. **Video Preprocess Task**
   - Preprocesses the video, transcribing its audio via a speech-to-text capability
   - Detects scene boundaries, splits the video into chunks, and extracts frames at specified intervals
   - Each scene chunk is summarized in detail by an MLLM, cached, and written to the vector database for Q&A retrieval
   - The video metadata and the video file's MD5 hash are stored for filtering

2. **Video QA Task**
   - Takes the user's question about the video
   - Retrieves related information from the vector database using the question
   - Extracts the approximate start and end times of the video segment related to the question
   - Generates a video object from serialized data in short-term memory (STM)
   - Builds the initial task tree from the question and passes it to the DnC task

3. **Divide and Conquer Task**
   - Executes the task tree built from the question
   - For details, see the [DnC Example](./DnC.md#overview)

The system uses Redis for state management, Milvus for long-term memory storage, and Conductor for workflow orchestration.
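The retrieval step of the Video QA task can be sketched as follows: embed the question, score it against the cached scene summaries, and return the best-matching segment's time range. This is an illustration only; `embed` is a toy stand-in for the real text encoder, and the chunk records stand in for the MilvusLTM collection:

```python
# Sketch of vector-similarity retrieval over summarized scene chunks.
import math

def embed(text):
    # Toy bag-of-letters embedding; the real system uses an OpenAI encoder.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_segment(question, chunks):
    # chunks: [{"summary": str, "start": seconds, "end": seconds}, ...]
    q = embed(question)
    best = max(chunks, key=lambda c: cosine(q, embed(c["summary"])))
    return best["start"], best["end"]
```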
The whole workflow is illustrated in the following diagram:

![Video Understanding Workflow](../images/video_understanding_workflow_diagram.png)

## Prerequisites

- Python 3.10+
- Required packages installed (see requirements.txt)
- Access to OpenAI API or compatible endpoint (see configs/llms/*.yml)
- [Optional] Access to Bing API for WebSearch tool (see configs/tools/*.yml)
- Redis server running locally or remotely
- Conductor server running locally or remotely

## Configuration

The container.yaml file manages dependencies and settings for the different components of the system, including Conductor connections, Redis connections, and other service configurations. To set up your configuration:

1. Generate the container.yaml file:
   ```bash
   python compile_container.py
   ```
   This creates a container.yaml file with default settings under `examples/video_understanding`.

2. Configure your LLM and tool settings in `configs/llms/*.yml` and `configs/tools/*.yml`:
   - Set your OpenAI API key or compatible endpoint through environment variables or by directly modifying the yml files:
     ```bash
     export custom_openai_key="your_openai_api_key"
     export custom_openai_endpoint="your_openai_endpoint"
     ```
   - [Optional] Set your Bing API key through an environment variable or by directly modifying the yml file:
     ```bash
     export bing_api_key="your_bing_api_key"
     ```
     **Note: Setting the Bing API key is not mandatory; without it, the WebSearch tool falls back to DuckDuckGo search. Setting it is recommended for better search quality.**
   - The default text encoder configuration uses OpenAI `text-embedding-3-large` with **3072** dimensions; make sure the `dim` value of `MilvusLTM` in `container.yaml` matches
   - Configure other model settings, such as temperature, as needed through environment variables or by directly modifying the yml files

3. Update settings in the generated `container.yaml`:
   - Modify the Redis connection settings:
     - Set the host, port, and credentials for your Redis instance
     - Configure both the `redis_stream_client` and `redis_stm_client` sections
   - Update the Conductor server URL under the conductor_config section
   - Configure `MilvusLTM` in the `components` section:
     - Set the `storage_name` and `dim` for `MilvusLTM`
     - Set `dim` to **3072** if you use the default OpenAI encoder; if you use another text encoder model or endpoint, set the corresponding dimension instead
     - Adjust other settings as needed
   - Configure the hyper-parameters for the video preprocess task in `examples/video_understanding/configs/workers/video_preprocessor.yml`:
     - `use_cache`: Whether to use the cache for the video preprocess task
     - `scene_detect_threshold`: The threshold for scene detection, used to determine whether a scene change occurs in the video; a lower value means more scenes will be detected; the default is **27**
     - `frame_extraction_interval`: The interval between frames extracted from the video; the default is **5**
     - `kernel_size`: The kernel size for scene detection; it should be an **odd** number, and by default it is calculated automatically from the video resolution. For hour-long videos it is recommended to leave it blank; for short videos, a smaller value such as **3** or **5** makes detection more sensitive to scene changes
     - `stt.endpoint`: The endpoint for the speech-to-text service; defaults to the OpenAI ASR service
     - `stt.api_key`: The API key for the speech-to-text service; defaults to the OpenAI API key
   - Adjust any other component settings as needed
## Running the Example

Run the video understanding example (currently only CLI usage is supported):

```bash
python run_cli.py
```

The first time, you will be asked for the video file path; it takes a while to preprocess the video and store the information in the vector database. Once the video is preprocessed, you can ask questions about it and the system will answer them. Note that the agent may give wrong or vague answers, especially for questions about the names of characters in the video.

## Troubleshooting

If you encounter issues:
- Verify that Redis is running and accessible
- Try a smaller `scene_detect_threshold` and `frame_extraction_interval` if too many scenes are detected
- Check that your OpenAI API key is valid
- Check that your Bing API key is valid if search results are not as expected
- Check that the `dim` value of `MilvusLTM` in `container.yaml` is set correctly; currently a mismatched dimension will not raise an error but will silently lose part of the information (more checks will be added in the future)
- Ensure all dependencies are installed correctly
- Review the logs for any error messages
- **Open an issue on GitHub if you can't find a solution; we will do our best to help you out!**

## Building the Example

Coming soon! This section will provide detailed instructions for building and packaging the video_understanding example step by step.