This repository has been archived by the owner on Feb 15, 2025. It is now read-only.

Commit
Merge branch 'main' into 697-reduce-e2e-test-runtime
YrrepNoj committed Jul 29, 2024
2 parents 3a5d7d7 + 89ff0a6 commit 28de723
Showing 26 changed files with 484 additions and 32 deletions.
33 changes: 21 additions & 12 deletions .github/workflows/e2e-playright.yaml
@@ -63,7 +63,7 @@ jobs:
python-version-file: 'pyproject.toml'

- name: Install Python Deps
-        run: python -m pip install "."
+        run: python -m pip install ".[dev]"

- name: Setup Node
uses: actions/setup-node@60edb5dd545a775178f52524783378180af0d1f8 # v4.0.2
@@ -106,16 +106,6 @@ jobs:
python -m pip install requests
python -m pytest ./tests/e2e/test_supabase.py -v
-      ##########
-      # UI
-      ##########
-      - name: Deploy LFAI-UI
-        run: |
-          make build-ui LOCAL_VERSION=e2e-test
-          docker image prune -af
-          uds zarf package deploy packages/ui/zarf-package-leapfrogai-ui-amd64-e2e-test.tar.zst --confirm
-          rm packages/ui/zarf-package-leapfrogai-ui-amd64-e2e-test.tar.zst
##########
# API
##########
@@ -131,12 +121,31 @@ jobs:
python -m pip install requests
python -m pytest ./tests/e2e/test_api.py -v
-      # Run the playwright UI tests using the deployed Supabase endpoint
+      ##########
+      # UI
+      ##########
+      - name: Deploy LFAI-UI
+        run: |
+          make build-ui LOCAL_VERSION=e2e-test
+          docker image prune -af
+          uds zarf package deploy packages/ui/zarf-package-leapfrogai-ui-amd64-e2e-test.tar.zst --confirm
+          rm packages/ui/zarf-package-leapfrogai-ui-amd64-e2e-test.tar.zst
+      # Run the playwright UI tests using the deployed Supabase endpoint and upload report as an artifact
- name: UI/API/Supabase E2E Playwright Tests
run: |
cp src/leapfrogai_ui/.env.example src/leapfrogai_ui/.env
TEST_ENV=CI PUBLIC_DISABLE_KEYCLOAK=true PUBLIC_SUPABASE_ANON_KEY=$ANON_KEY npm --prefix src/leapfrogai_ui run test:integration:ci
+      # Upload the Playwright report as an artifact
+      - name: Archive Playwright Report
+        uses: actions/upload-artifact@v4
+        if: ${{ !cancelled() }}
+        with:
+          name: playwright-report
+          path: src/leapfrogai_ui/e2e-report/
+          retention-days: 30

# The UI can be removed after the Playwright tests are finished
- name: Cleanup UI
run: |
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yaml
@@ -56,7 +56,7 @@ jobs:
run: docker run -p 50051:50051 -d --name=repeater ghcr.io/defenseunicorns/leapfrogai/repeater:dev

- name: Install Python Deps
-        run: pip install "." "src/leapfrogai_api" "src/leapfrogai_sdk"
+        run: pip install ".[dev]" "src/leapfrogai_api" "src/leapfrogai_sdk"

- name: Run Pytest
run: python -m pytest tests/pytest -v
2 changes: 1 addition & 1 deletion adr/0003-database.md
@@ -14,7 +14,7 @@

## Status

- PROPOSED
+ ACCEPTED

## Context

100 changes: 100 additions & 0 deletions adr/0006-queueing-high-traffic.md
@@ -0,0 +1,100 @@
# Queueing and High Traffic

## Table of Contents

- [Queueing and High Traffic](#queueing-and-high-traffic)
- [Table of Contents](#table-of-contents)
- [Status](#status)
- [Context](#context)
- [Decision](#decision)
- [Rationale](#rationale)
- [Alternatives](#alternatives)
- [Related ADRs](#related-adrs)
- [References](#references)

## Status

PROPOSED

## Context

LeapfrogAI needs to handle a large volume of inference, file-upload, and embeddings requests. To manage this level of activity without significant performance degradation, we need systems that prevent the API from being overwhelmed or blocked by a high volume of requests or by a single long-running task.

Adding a queue-management component can make request handling more efficient under high volumes of long-running tasks. However, it may introduce significant complexity to the system, so we must weigh the options carefully.

Benefits of a queue system for request processing:
- Allows the API to respond quickly even when the system is very busy.
- Prevents requests from being dropped or timing out.
- Allows failed requests to be resumed.
- Enables throttling of the message-processing rate.

## Decision

We have decided to implement a multi-tiered approach to address the queueing and high-traffic challenges:

1. Address underlying bottlenecks in the system:
   - Optimize endpoint implementations, the processing of long-running tasks, and the indexing of files.
   - Reduce duplication of indexing efforts.
   - Scale resources horizontally and vertically as needed.

2. Implement a lightweight queueing solution using Supabase Realtime and FastAPI background tasks (see the sketch after this list):
   - Utilize Supabase Realtime for task status updates (in progress, complete, etc.) and basic queueing.
   - In the event of issues with Supabase Realtime, fall back to RedPanda.
   - Leverage FastAPI's background tasks to handle long-running operations asynchronously.

3. Prepare for future scaling by designing the system to easily integrate with a more robust queueing solution:
   - Design interfaces that work with both our current lightweight solution and future, more robust options.
   - Do not attempt to push Supabase Realtime beyond its designed limits; instead, plan to use RedPanda or RabbitMQ if those needs surface.
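
As a concrete sketch of the second tier (a minimal illustration, not the implementation; the `task_status` table, endpoint path, and credentials are hypothetical), a FastAPI endpoint can enqueue a long-running indexing job and publish its status through a Supabase table that Realtime makes visible to clients:

```python
from fastapi import BackgroundTasks, FastAPI
from supabase import Client, create_client

app = FastAPI()
# Placeholder credentials; real values come from configuration.
supabase: Client = create_client("https://<project>.supabase.co", "<service-role-key>")

def index_file(task_id: int, file_id: str) -> None:
    # Runs after the HTTP response is sent; the status row doubles as a basic
    # queue entry that clients can watch over Supabase Realtime.
    supabase.table("task_status").update({"status": "in_progress"}).eq("id", task_id).execute()
    # ... long-running embedding / indexing work goes here ...
    supabase.table("task_status").update({"status": "complete"}).eq("id", task_id).execute()

@app.post("/files/{file_id}/index")
async def start_indexing(file_id: str, background_tasks: BackgroundTasks) -> dict:
    row = supabase.table("task_status").insert({"file_id": file_id, "status": "queued"}).execute()
    task_id = row.data[0]["id"]
    background_tasks.add_task(index_file, task_id, file_id)
    # The API responds immediately even under load; the work continues in the background.
    return {"task_id": task_id, "status": "queued"}
```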

## Rationale

1. Addressing underlying bottlenecks:
   - This approach ensures we're not masking performance issues with a queueing system.
   - Optimizations can significantly improve system performance without adding complexity.

2. Lightweight solution (Supabase Realtime and FastAPI background tasks):
   - Leverages existing infrastructure (Supabase), reducing additional operational overhead.
   - FastAPI background tasks provide a simple way to handle asynchronous operations without introducing new dependencies.
   - This solution meets our current needs without over-engineering.

3. Preparation for future scaling:
   - Allows for an easy transition to more robust solutions as the system grows.
   - Prevents lock-in to a solution that may not meet future needs.

We chose this approach over the alternatives for a few reasons:
- The tiered approach lets us start with a simple solution while preparing for future growth.
- Some alternatives are viable but would require significant additional setup and maintenance work to bring into the current environment.
  - The additional setup includes, but is not limited to: new Zarf packages, updates to UDS bundles, spikes to integrate with the current app, resolving any permissions/hardening issues, and more containers to add to Iron Bank/Chainguard.
- When load testing the system, the primary bottleneck appears to be vector DB file indexing.
  - The issues related to this process should be resolvable through optimizations, a light amount of queueing, and background tasks.
  - Issues not related to indexing were primarily scalability issues, which can be resolved via resource limits, throttling, and improved horizontal and vertical scaling within the cluster.
- Authentication will be an issue for every solution except Supabase Realtime.

## Alternatives

Queueing solutions considered:
* RabbitMQ: Meets current and future needs.
  * Well-maintained JS and Python libraries.
  * Requires additional, potentially significant integration work to bring into the k8s cluster.
* Supabase Realtime: Lightweight and already integrated, but may not meet all future queueing needs.
  * Well-maintained JS and Python libraries.
  * Can listen directly to database transactions (see the subscription sketch after this list).
  * Already integrated with Supabase auth.
* Kafka: Powerful but too heavy for our current requirements.
  * Well-maintained JS and Python libraries.
  * Requires additional, potentially significant integration work to bring into the k8s cluster.
* Celery: A good option for Python-based systems, but introduces additional dependencies.
  * The Python library is well maintained; the JS library is not.
* RedPanda: Accessible internally and provides a scalable solution.
  * Well-maintained JS and Python libraries, as it supports the same tooling as Kafka.
  * Zarf/UDS bundle already available.
* Custom Python solution: Flexible, but requires significant unnecessary development effort given the tools already available.
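
For the "listen directly to database transactions" point above, a subscription sketch (assuming supabase-py v2's async client and Postgres-changes API; the channel and table names are hypothetical, and the exact signatures should be verified against the current library):

```python
import asyncio

from supabase import acreate_client

async def watch_task_status() -> None:
    client = await acreate_client("https://<project>.supabase.co", "<anon-key>")
    # Fire a callback whenever a row in the task_status table is updated.
    channel = client.channel("task-status-changes")
    channel.on_postgres_changes(
        "UPDATE",
        schema="public",
        table="task_status",
        callback=lambda payload: print("status change:", payload),
    )
    await channel.subscribe()
    await asyncio.sleep(60)  # keep the subscription alive for the demo

asyncio.run(watch_task_status())
```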

## Related ADRs
* [0003-database](0003-database.md)

## References
1. Supabase Realtime Documentation: https://supabase.com/docs/guides/realtime
2. FastAPI Background Tasks: https://fastapi.tiangolo.com/tutorial/background-tasks/
3. Celery Documentation: https://docs.celeryq.dev/en/stable/
4. Kafka Documentation: https://kafka.apache.org/
5. RabbitMQ Documentation: https://www.rabbitmq.com/docs
6. RedPanda Documentation: https://docs.redpanda.com/docs/
2 changes: 1 addition & 1 deletion packages/k3d-gpu/plugin/device-plugin-daemonset.yaml
@@ -45,7 +45,7 @@ spec:
- name: NVIDIA_VISIBLE_DEVICES
value: all
- name: NVIDIA_DRIVER_CAPABILITIES
-   value: all
+   value: compute,utility
- name: MPS_ROOT
value: /run/nvidia/mps
securityContext:
5 changes: 5 additions & 0 deletions packages/text-embeddings/chart/templates/deployment.yaml
@@ -25,6 +25,11 @@ spec:
labels:
{{- include "chart.selectorLabels" . | nindent 8 }}
spec:
+ {{- if gt (index .Values.resources.limits "nvidia.com/gpu") 0.0 }}
+ runtimeClassName: nvidia
+ {{- else if .Values.gpu.runtimeClassName }}
+ runtimeClassName: {{ .Values.gpu.runtimeClassName }}
+ {{- end }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
5 changes: 4 additions & 1 deletion packages/text-embeddings/embedding-values.yaml
@@ -1,6 +1,9 @@
image:
tag: "###ZARF_CONST_IMAGE_VERSION###"

+ gpu:
+   runtimeClassName: "###ZARF_VAR_GPU_CLASS_NAME###"

resources:
limits:
-     nvidia.com/gpu: "###ZARF_VAR_GPU_LIMIT###"
+     nvidia.com/gpu: ###ZARF_VAR_GPU_LIMIT###
4 changes: 4 additions & 0 deletions packages/text-embeddings/zarf.yaml
@@ -16,6 +16,10 @@ variables:
description: The GPU limit for the model inferencing.
default: "0"
pattern: "^[0-9]+$"
+ - name: GPU_CLASS_NAME
+   description: The GPU class name for the model inferencing. Leave blank for CPU-only.
+   default: ""
+   pattern: "^(nvidia)?$"

components:
- name: text-embeddings-model
1 change: 1 addition & 0 deletions packages/vllm/chart/templates/deployment.yaml
@@ -25,6 +25,7 @@ spec:
labels:
{{- include "chart.selectorLabels" . | nindent 8 }}
spec:
+ runtimeClassName: {{ .Values.gpu.runtimeClassName }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
3 changes: 3 additions & 0 deletions packages/vllm/vllm-values.yaml
@@ -1,2 +1,5 @@
image:
tag: "###ZARF_CONST_IMAGE_VERSION###"

+ gpu:
+   runtimeClassName: nvidia
2 changes: 1 addition & 1 deletion packages/whisper/Dockerfile
@@ -26,7 +26,7 @@ RUN pip uninstall -y ctranslate2 transformers[torch]
RUN pip install packages/whisper/build/lfai_whisper*.whl --no-index --find-links=packages/whisper/build/

# Use hardened ffmpeg image to get compiled binaries
- FROM cgr.dev/chainguard/ffmpeg:latest as ffmpeg
+ FROM cgr.dev/chainguard/ffmpeg:latest AS ffmpeg

# hardened and slim python image
FROM ghcr.io/defenseunicorns/leapfrogai/python:3.11
5 changes: 5 additions & 0 deletions packages/whisper/chart/templates/deployment.yaml
@@ -25,6 +25,11 @@ spec:
labels:
{{- include "chart.selectorLabels" . | nindent 8 }}
spec:
+ {{- if gt (index .Values.resources.limits "nvidia.com/gpu") 0.0 }}
+ runtimeClassName: nvidia
+ {{- else if .Values.gpu.runtimeClassName }}
+ runtimeClassName: {{ .Values.gpu.runtimeClassName }}
+ {{- end }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
5 changes: 4 additions & 1 deletion packages/whisper/whisper-values.yaml
@@ -1,6 +1,9 @@
image:
tag: "###ZARF_CONST_IMAGE_VERSION###"

+ gpu:
+   runtimeClassName: "###ZARF_VAR_GPU_CLASS_NAME###"

resources:
limits:
-     nvidia.com/gpu: "###ZARF_VAR_GPU_LIMIT###"
+     nvidia.com/gpu: ###ZARF_VAR_GPU_LIMIT###
4 changes: 4 additions & 0 deletions packages/whisper/zarf.yaml
@@ -16,6 +16,10 @@ variables:
description: The GPU limit for the model inferencing.
default: "0"
pattern: "^[0-9]+$"
+ - name: GPU_CLASS_NAME
+   description: The GPU class name for the model inferencing. Leave blank for CPU-only.
+   default: ""
+   pattern: "^(nvidia)?$"

components:
- name: whisper-model
9 changes: 4 additions & 5 deletions pyproject.toml
@@ -13,16 +13,15 @@ license = {file = "LICENSE"}
dependencies = [ # Dev dependencies needed for all of lfai
"openai",
"pip-tools == 7.3.0",
"pytest",
"pytest-asyncio",
"httpx",
"ruff",
"python-dotenv",
"pytest-asyncio",
"requests"
"python-dotenv"
]
requires-python = "~=3.11"

+ [project.optional-dependencies]
+ dev = ["locust", "pytest-asyncio", "requests", "requests-toolbelt", "pytest"]

[tool.pip-tools]
generate-hashes = true

1 change: 1 addition & 0 deletions src/leapfrogai_api/Makefile
@@ -10,6 +10,7 @@ install-api:
python -m pip install ../../src/leapfrogai_sdk
@cd ${MAKEFILE_DIR} && \
python -m pip install -e .
+ python -m pip install "../../.[dev]"

dev-run-api:
@cd ${MAKEFILE_DIR} && \
7 changes: 5 additions & 2 deletions src/leapfrogai_ui/playwright.config.ts
@@ -72,8 +72,11 @@ const devConfig: PlaywrightTestConfig = {
// when e2e testing, use the deployed instance
const CI_Config: PlaywrightTestConfig = {
use: {
-   baseURL: 'https://ai.uds.dev'
- }
+   baseURL: 'https://ai.uds.dev',
+   screenshot: 'only-on-failure',
+   video: 'retain-on-failure'
+ },
+ reporter: [['html', { outputFolder: 'e2e-report' }]]
};

// get the environment type from command line. If none, set it to dev
Binary file added tests/data/russian.mp3
2 changes: 1 addition & 1 deletion tests/e2e/README.md
@@ -30,7 +30,7 @@ make build-llama-cpp-python
uds zarf package deploy zarf-package-llama-cpp-python-*.tar.zst

# Install the python dependencies
- python -m pip install "."
+ python -m pip install ".[dev]"

# Run the tests!
# NOTE: Each model backend has its own e2e test files
52 changes: 52 additions & 0 deletions tests/load/README.md
@@ -0,0 +1,52 @@
# LeapfrogAI Load Tests

## Overview

These tests check the API's ability to handle varying levels of load. They simulate a configurable number of users hitting the endpoints at a configurable rate.

## Requirements

### Environment Setup

Before running the tests, ensure that your API URL and bearer token are properly configured in your environment variables. Follow these steps:

1. Set the API URL:
```bash
export API_URL="https://leapfrogai-api.uds.dev"
```

2. Set the API token:
```bash
export BEARER_TOKEN="<your-supabase-jwt-here>"
```

**Note:** The bearer token should be your Supabase user JWT. For information on generating a JWT, please refer to the [Supabase README.md](../../packages/supabase/README.md). While an API key generated from the LeapfrogAI API endpoint can be used, it will cause the token generation load tests to fail.

3. (Optional) Set the model backend; this defaults to `vllm` if unset:
```bash
export DEFAULT_MODEL="llama-cpp-python"
```

## Running the Tests

To start the Locust web interface and run the tests:

1. Install dependencies from the project root:
```bash
pip install ".[dev]"
```

2. Navigate to the directory containing `loadtest.py` (a minimal sketch of such a file appears after this list).

3. Execute the following command:
```bash
locust -f loadtest.py --web-port 8089
```

4. Open your web browser and go to `http://0.0.0.0:8089`.

5. Use the Locust web interface to configure and run your tests:
- Set the number of users to simulate
- Set the spawn rate (users per second)
- Choose the host to test against (should match your `API_URL`)
- Start the test and monitor results in real time
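
For orientation, a minimal `loadtest.py` along these lines might look like the sketch below; the endpoint path and request payload are illustrative assumptions, not the actual file contents:

```python
import os

from locust import HttpUser, between, task

class LeapfrogAIUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests.
    wait_time = between(1, 3)

    def on_start(self) -> None:
        # BEARER_TOKEN is the Supabase JWT exported during environment setup.
        self.client.headers["Authorization"] = f"Bearer {os.environ['BEARER_TOKEN']}"

    @task
    def chat_completion(self) -> None:
        self.client.post(
            "/openai/v1/chat/completions",
            json={
                "model": os.environ.get("DEFAULT_MODEL", "vllm"),
                "messages": [{"role": "user", "content": "Hello!"}],
            },
        )
```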