[Serve] Deflake test_metrics #47750

Draft: wants to merge 36 commits into master.

Changes from 11 commits.

Commits (36)
a0d6bd0  right size tests (GeneDer, Sep 19, 2024)
20bb4b5  trigger another build (GeneDer, Sep 20, 2024)
e688c20  factor out test_metrics on it's own and use large sized test (GeneDer, Sep 20, 2024)
ecbc2f6  fix (GeneDer, Sep 20, 2024)
320d1ba  fix tag (GeneDer, Sep 20, 2024)
1870bdc  Merge branch 'master' into deflak-test-metrics (GeneDer, Sep 20, 2024)
70fbb9d  fix kwargs (GeneDer, Sep 20, 2024)
9165510  try again (GeneDer, Sep 20, 2024)
d15d0d0  test again (GeneDer, Sep 20, 2024)
2ffefce  test again (GeneDer, Sep 20, 2024)
8e22021  test again (GeneDer, Sep 20, 2024)
efceb03  revert change and add logics to clean up metrics between tests (GeneDer, Sep 24, 2024)
eb15873  lint (GeneDer, Sep 24, 2024)
9502dd4  check health for prometheus before cleanup (GeneDer, Sep 25, 2024)
71d1336  refactor clean up metrics as a fixture (GeneDer, Sep 25, 2024)
dab214c  test again (GeneDer, Sep 25, 2024)
22f9145  test again (GeneDer, Sep 25, 2024)
46ce3a0  test again (GeneDer, Sep 25, 2024)
3e63207  test again (GeneDer, Sep 25, 2024)
0a5cbf0  clean up serve and ray before and after the tests (GeneDer, Sep 28, 2024)
04ee77c  try again (GeneDer, Sep 28, 2024)
869594b  try again (GeneDer, Sep 28, 2024)
0838d2a  Merge branch 'master' into deflak-test-metrics (GeneDer, Sep 30, 2024)
25cc12a  Merge branch 'master' into deflak-test-metrics (GeneDer, Sep 30, 2024)
ff1a839  Merge branch 'master' into deflak-test-metrics (GeneDer, Oct 1, 2024)
1482c03  try again (GeneDer, Oct 1, 2024)
e9a88c6  try again (GeneDer, Oct 2, 2024)
747d479  only decrement num_scheduling_tasks_in_backoff if it's greater than 0 (GeneDer, Oct 2, 2024)
79fce0e  try again (GeneDer, Oct 2, 2024)
d924b7e  try again (GeneDer, Oct 3, 2024)
650452c  try again (GeneDer, Oct 3, 2024)
e0aa69c  wait for proxies to be healthy before starting any tests (GeneDer, Oct 3, 2024)
64d88a7  try again (GeneDer, Oct 3, 2024)
853662d  Merge branch 'master' into deflak-test-metrics (GeneDer, Oct 4, 2024)
709a0a9  try again (GeneDer, Oct 4, 2024)
00a45bf  try again (GeneDer, Oct 4, 2024)
16 changes: 16 additions & 0 deletions .buildkite/windows.rayci.yml
@@ -95,6 +95,22 @@ steps:
--workers "$${BUILDKITE_PARALLEL_JOB_COUNT}" --worker-id "$${BUILDKITE_PARALLEL_JOB}" --parallelism-per-worker 3
depends_on: windowsbuild

- label: ":ray-serve: serve: :windows: metrics tests"
tags: serve
job_env: WINDOWS
instance_type: windows
commands:
- bash ci/ray_ci/windows/install_tools.sh
- bazel run //ci/ray_ci:test_in_docker -- //python/ray/serve/... serve
--build-name windowsbuild
--operating-system windows
--only-tags serve_test_metrics
--except-tags no_windows
--test-env=CI="1"
--test-env=RAY_CI_POST_WHEEL_TESTS="1"
--test-env=USERPROFILE
depends_on: windowsbuild

- label: ":ray-serve: serve: :windows: enormous tests"
tags: serve
job_env: WINDOWS
21 changes: 19 additions & 2 deletions python/ray/serve/tests/BUILD
@@ -166,12 +166,30 @@ py_test_module_list(
],
)

# Large test for test_metrics.py to run on it's own.
py_test_module_list(
size = "large",
files = [
"test_metrics.py",
],
tags = [
"exclusive",
"team:serve",
"use_all_core_windows",
"serve_test_metrics",
],
deps = [
":common",
":conftest",
"//python/ray/serve:serve_lib",
],
)

# Enormous tests.
py_test_module_list(
size = "enormous",
files = [
"test_deploy_app.py",
"test_metrics.py",
"test_enable_task_events.py",
"test_kv_store.py",
"test_long_poll.py",
@@ -443,4 +461,3 @@ py_test_module_list(
"//python/ray/serve:serve_lib",
],
)

7 changes: 4 additions & 3 deletions python/ray/serve/tests/test_metrics.py
@@ -5,6 +5,7 @@

import grpc
import pytest
import pytest_asyncio
import requests
from fastapi import FastAPI

@@ -32,7 +33,7 @@
TEST_METRICS_EXPORT_PORT = 9999


@pytest.fixture
@pytest_asyncio.fixture
GeneDer marked this conversation as resolved.
def serve_start_shutdown():
"""Fixture provides a fresh Ray cluster to prevent metrics state sharing."""
Contributor:

Hm shouldn't this be sufficient on its own? The prometheus endpoint is the raylet, so if ray is shut down between runs there should be no state sharing.

What am I missing?

Contributor Author:

My expectation is that something isn't being cleaned up between those tests, and in fact adding those calls seems to have helped. Thinking it through again, maybe just adding a sleep between tests would help in the same way, and maybe the issue is that Serve and/or Ray wasn't completely shut down before the next test starts? 🤔 Let me do some more experiments.

Contributor:

we should not add any sleeps -- if we need to wait for anything to clean up, then explicitly wait for the cleanup to happen

sleeps are what make things flaky in the first place
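
As an illustration of "explicitly wait for the cleanup" rather than sleeping, a minimal sketch is shown below. This is not code from this PR: it assumes Ray's `wait_for_condition` test helper and the `TEST_METRICS_EXPORT_PORT` constant (9999) defined in `test_metrics.py`, and simply polls the test metrics port until nothing answers anymore.

```python
# Minimal sketch (not part of this PR): poll for completed cleanup instead of
# sleeping a fixed amount of time between tests.
import requests
from ray._private.test_utils import wait_for_condition  # Ray's test retry helper

TEST_METRICS_EXPORT_PORT = 9999  # matches the constant in test_metrics.py


def metrics_endpoint_down() -> bool:
    """Return True once nothing is serving metrics on the test port."""
    try:
        requests.get(f"http://127.0.0.1:{TEST_METRICS_EXPORT_PORT}", timeout=1)
        return False
    except requests.exceptions.ConnectionError:
        return True


# Poll once per second for up to 30s; raises RuntimeError if the endpoint
# never goes down, which surfaces the incomplete cleanup instead of hiding it.
wait_for_condition(metrics_endpoint_down, timeout=30, retry_interval_ms=1000)
```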

ray.init(
@@ -333,7 +334,7 @@ def verify_metrics(_expected_metrics, do_assert=False):
verify_metrics,
retry_interval_ms=1000,
timeout=10,
expected_metrics=expected_metrics,
_expected_metrics=expected_metrics,
)
except RuntimeError:
verify_metrics(expected_metrics, True)
@@ -434,7 +435,7 @@ async def __call__(self, *args):
verify_metrics,
retry_interval_ms=1000,
timeout=10,
expected_metrics=expected_metrics,
_expected_metrics=expected_metrics,
)
except RuntimeError:
verify_metrics(expected_metrics, True)
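
For context on the `fix kwargs` hunks above: the retry helper being called here (presumably Ray's `wait_for_condition`, given the `retry_interval_ms` and `timeout` arguments) forwards any extra keyword arguments to the predicate it polls, so the keyword at the call site must match the predicate's parameter name. Since the predicate is `verify_metrics(_expected_metrics, do_assert=False)`, passing `expected_metrics=` makes every polled call fail and the helper eventually time out with the `RuntimeError` handled below it. A minimal sketch of the pattern, with a trimmed, hypothetical stand-in predicate:

```python
# Minimal sketch (not this PR's code): the retry helper forwards extra kwargs
# verbatim to the predicate, so the keyword must be `_expected_metrics`.
from ray._private.test_utils import wait_for_condition


def verify_metrics(_expected_metrics, do_assert=False):
    # Stand-in for the real predicate, which scrapes Prometheus and compares
    # the scraped values against `_expected_metrics`.
    return bool(_expected_metrics)


# With `expected_metrics=` instead, every polled call would fail and
# wait_for_condition would raise RuntimeError after the timeout.
wait_for_condition(
    verify_metrics,
    retry_interval_ms=1000,
    timeout=10,
    _expected_metrics={"serve_num_http_requests": "1"},  # hypothetical values
)
```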