@zpoint zpoint commented Aug 19, 2025

Resolve #6730

Add staggered timing between AWS tests, Kubernetes tests, and remote server tests.

  • Kubernetes tests run on a local Kind cluster.
  • Remote server tests require launching a local Docker image and connecting to it.

Both are resource-heavy on the Buildkite agent VM, increasing failure risk if run alongside other tests.

Before this PR:

Tests ran fully sequentially (AWS → Kubernetes → remote server), totaling 2.5 hours:

|-----aws tests-----|(1 hour)  
                        |-----kubernetes tests-----|(2 hours total, 1 hour runtime)  
                                                                  |-----remote server tests-----|(2.5 hours total, 0.5 hour runtime)  

After this PR:

Start times are staggered so the suites overlap only partially, which shortens the total time while limiting resource contention:

|-------------aws tests-------------|(1 hour)  
    (25m delay)|--------kubernetes tests--------|(1.4 hours total, 1 hour runtime)  
          (35m delay)|--remote server tests----|(1.1 hours total, 0.5 hour runtime)  

Key changes:

  1. Kubernetes tests start after 25m (0.4-hour delay), when most AWS tests have finished.
  2. Remote server tests start after 35m, further reducing VM load.
  3. Total runtime drops to ~1.4 hours (vs. 2.5 hours sequential) with lower failure rates.
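
The totals above can be checked with a few lines of arithmetic. The following is only a sketch, not the actual CI configuration: the `Suite` class and the helper functions are hypothetical names, and the runtimes are the approximate figures quoted in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    """Hypothetical model of one test suite in the pipeline."""
    name: str
    start_min: int    # minutes after the pipeline starts
    runtime_min: int  # approximate runtime in minutes


def finish_times(suites):
    """Map each suite name to the minute at which it finishes."""
    return {s.name: s.start_min + s.runtime_min for s in suites}


def total_minutes(suites):
    """Wall-clock duration: the latest finish time across all suites."""
    return max(finish_times(suites).values())


# Before: each suite waits for the previous one to finish.
sequential = [
    Suite("aws", 0, 60),
    Suite("kubernetes", 60, 60),
    Suite("remote_server", 120, 30),
]

# After: fixed start delays of 25 and 35 minutes.
staggered = [
    Suite("aws", 0, 60),
    Suite("kubernetes", 25, 60),
    Suite("remote_server", 35, 30),
]

print(total_minutes(sequential) / 60)            # 2.5
print(round(total_minutes(staggered) / 60, 1))   # 1.4
```

With these assumed runtimes, the Kubernetes suite finishes last in the staggered layout (85 minutes), so it determines the ~1.4-hour total; the remote server suite finishes at 65 minutes (~1.1 hours).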

Tested on my personal repo

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@zpoint zpoint requested a review from Michaelvll August 19, 2025 09:00

zpoint commented Aug 19, 2025

/smoke-test -k test_autodown --kubernetes

@zpoint zpoint requested review from DanielZhangQD and aylei August 19, 2025 09:50

@aylei aylei left a comment

Awesome, LGTM!

@zpoint zpoint merged commit 1c5edbd into skypilot-org:master Aug 20, 2025
17 checks passed
massaindustries pushed a commit to Seeweb/skypilot that referenced this pull request Aug 26, 2025
…runtime (skypilot-org#6732)

* add delay for smoke test trigger

* change needs

* timeout change

* shorter delay

Issue this pull request may close: Improve stability of nightly build