@Maknee
Collaborator

@Maknee Maknee commented Jul 14, 2025


When `sky status` runs against a remote API server, it sometimes checks the network connection by pinging endpoints. It tests two endpoints, each with 3 retries and a 3-second timeout. This means that https://1.1.1.1 can be tested for up to 9 seconds before Google's endpoint, https://8.8.8.8, is tried at all. This PR swaps the order (Google's endpoint first) and reduces the timeout from 3s to 1s. This was tested without any jobs running.
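For reference, the worst-case arithmetic behind those numbers can be sketched as follows. This is only an illustration: it assumes every failed attempt consumes the full timeout and that there is no extra sleep between retries, and the function names are hypothetical, not SkyPilot code.

```python
def worst_case_delay_before_next_endpoint(retries: int, timeout_s: float) -> float:
    """Seconds spent on one unreachable endpoint before the next one is tried.

    Assumes endpoints are tried one after another and every attempt
    waits for the full timeout (hypothetical model, no backoff).
    """
    return retries * timeout_s


def worst_case_total(num_endpoints: int, retries: int, timeout_s: float) -> float:
    """Seconds spent when every endpoint is unreachable."""
    return num_endpoints * retries * timeout_s


# Before this PR: 3 retries with a 3s timeout.
delay_before = worst_case_delay_before_next_endpoint(retries=3, timeout_s=3)
total_before = worst_case_total(num_endpoints=2, retries=3, timeout_s=3)

# After this PR: 3 retries with a 1s timeout.
delay_after = worst_case_delay_before_next_endpoint(retries=3, timeout_s=1)
total_after = worst_case_total(num_endpoints=2, retries=3, timeout_s=1)

print(delay_before, total_before)  # 9 18
print(delay_after, total_after)    # 3 6
```

Under these assumptions, the wait before the second endpoint is tried drops from 9s to 3s, and the total wait when both endpoints are down drops from 18s to 6s.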

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@Maknee Maknee requested a review from Michaelvll July 14, 2025 22:59
@SeungjinYang
Collaborator

If we just need one of these to be live, could we also interleave the retries, so that the order is

  • test 1.1.1.1
  • test 8.8.8.8
  • test 1.1.1.1 (retry)
  • test 8.8.8.8 (retry)
    ...

instead of

  • test 1.1.1.1
  • test 1.1.1.1 (retry)
    ...
  • test 8.8.8.8
  • test 8.8.8.8 (retry)
    ...

?
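A minimal sketch of the interleaved order, for illustration only. The function name `any_endpoint_reachable` and the injected `probe` callable are hypothetical and not taken from the SkyPilot codebase; the real check may be structured differently.

```python
from typing import Callable, List, Sequence


def any_endpoint_reachable(
    endpoints: Sequence[str],
    retries: int,
    timeout_s: float,
    probe: Callable[[str, float], bool],
) -> bool:
    """Return True as soon as any endpoint responds.

    The retry loop is the outer loop, so each round tries every
    endpoint once before any endpoint is retried.
    """
    for _ in range(retries):
        for endpoint in endpoints:
            if probe(endpoint, timeout_s):
                return True
    return False


# Fake probe for demonstration: records calls, only 8.8.8.8 "responds".
calls: List[str] = []


def fake_probe(endpoint: str, timeout_s: float) -> bool:
    calls.append(endpoint)
    return endpoint == 'https://8.8.8.8'


reachable = any_endpoint_reachable(
    ['https://1.1.1.1', 'https://8.8.8.8'],
    retries=3,
    timeout_s=1.0,
    probe=fake_probe,
)
print(reachable, calls)
```

With this ordering, an unreachable first endpoint costs at most one timeout before the second endpoint is tried, rather than one timeout per retry.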

@Maknee
Collaborator Author

Maknee commented Jul 14, 2025

Will switch to that!

Comment on lines -2613 to +2618
# `sky serve up`. If we have controller's head_ip available and it is ssh-reachable,
# `sky serve up`. If we have controller's head_ip available and it is ssh-reachable,
Collaborator

I am so confused about what is happening here. Not blocking approval because it isn't important by any means; I'm just confused.

Collaborator Author

I have no idea what happened here

@SeungjinYang
Collaborator

If 8.8.8.8 is generally more reliable than 1.1.1.1, we can still keep it as the first entry tried.

@Maknee Maknee requested a review from SeungjinYang July 15, 2025 18:51
@SeungjinYang SeungjinYang merged commit c4f0cac into skypilot-org:master Jul 19, 2025
15 checks passed