@kevinmingtarja
Collaborator

@kevinmingtarja kevinmingtarja commented Aug 15, 2025

Partially fixes https://buildkite.com/skypilot-1/smoke-tests/builds/2430/steps/canvas?jid=0198a7f9-1fbf-49c2-874f-6614e854648d, together with #6697.

During status refresh (which also runs during sky down), we may call backend.is_definitely_autostopping(). By that point the cluster might already be terminating or terminated, which causes the call to fail. When we moved autostop to gRPC in #6574, we did not wrap the gRPC call in a try/except block, so instead of returning False, the gRPC timeout exception bubbles up.

This PR fixes it by returning False from this function when it encounters an exception. This matches the behaviour of the legacy SSH execution path, which returns False if the exit code is nonzero.
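
For readers skimming the thread, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `query_autostop`, `RpcTimeoutError`, and the two helper functions are hypothetical stand-ins for the real gRPC call and its timeout error, and the standalone function only mirrors the name of the backend method.

```python
from typing import Callable


class RpcTimeoutError(Exception):
    """Hypothetical stand-in for the timeout error raised by the gRPC call."""


def is_definitely_autostopping(query_autostop: Callable[[], bool]) -> bool:
    """Return True only if the cluster positively reports that it is autostopping.

    `query_autostop` is a hypothetical callable standing in for the gRPC request.
    """
    try:
        return bool(query_autostop())
    except Exception:
        # Assumption for this sketch: any failure (e.g. the cluster is already
        # terminating or terminated) means we cannot confirm autostopping, so
        # we answer False, like the legacy SSH path on a nonzero exit code.
        return False


def query_terminated_cluster() -> bool:
    # Simulates the call timing out because the cluster is gone.
    raise RpcTimeoutError("deadline exceeded")


def query_autostopping_cluster() -> bool:
    # Simulates a healthy cluster that reports it is autostopping.
    return True
```

With this shape, a status refresh that races with termination gets False instead of an exception, so callers such as sky down can proceed.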

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@kevinmingtarja kevinmingtarja changed the title from "handle case where the cluster is already terminated during is_definitely_autostopping()" to "Handle case where cluster is already terminated in is_definitely_autostopping()" Aug 15, 2025
@kevinmingtarja
Collaborator Author

/smoke-test -k test_autodown
/smoke-test -k test_autodown --kubernetes
/smoke-test -k test_autostop
/smoke-test -k test_autostop --gcp

@kevinmingtarja
Collaborator Author

/quicktest-core

@kevinmingtarja kevinmingtarja enabled auto-merge (squash) August 15, 2025 08:29
@kevinmingtarja
Collaborator Author

/smoke-test -k test_autodown --gcp

@kevinmingtarja
Collaborator Author

/smoke-test -k test_autodown --kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test -k test_autodown
/smoke-test -k test_autodown --kubernetes
/smoke-test -k test_autostop
/smoke-test -k test_autostop --gcp

@kevinmingtarja kevinmingtarja changed the title from "Handle case where cluster is already terminated in is_definitely_autostopping()" to "Handle case where cluster is already terminated during status refresh" Aug 15, 2025
@kevinmingtarja kevinmingtarja merged commit 64896fc into master Aug 15, 2025
19 checks passed
@kevinmingtarja kevinmingtarja deleted the fix-kubernetes-autostop branch August 15, 2025 21:56
massaindustries pushed a commit to Seeweb/skypilot that referenced this pull request Aug 26, 2025
…skypilot-org#6693)

* handle case where the cluster is already terminated during is_definitely_autostopping()

* handle in one more place