
@aylei (Collaborator) commented Sep 4, 2025

Some users have reported that SSH connections to clusters sometimes suffer delays or disconnect. This PR adds more metrics and logs to help distinguish between the different disconnection reasons:

  • ClientClosed: the client closed the connection, which is the benign case. Note that if a server PING does not receive a PONG in time (20s timeout by default), this is also treated as a client disconnect. The PING/PONG exchange could in principle be delayed by the event loop issue, but I have never reproduced a case where a long event-loop block caused a PING/PONG timeout.
  • SSHToPodDisconnected: the SSH connection to the pod is disconnected, usually due to a pod issue, e.g., the pod is OOMKilled.
  • KubectlPortForwardExit: the kubectl port-forward process underlying the SSH connection exits. I can reproduce this only rarely, when starting many long-lived SSH connections. Investigating why the port-forward process exits is left as a follow-up, which can be prioritized if we can reproduce it with non-trivial probability in any environment.
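The three disconnection reasons above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `DisconnectReason` enum, the `classify_disconnect` and `record_disconnect` helpers, their input signals, and the use of `collections.Counter` as a stand-in for a real metric are assumptions for illustration, not the code merged in this PR. Only the 20s PONG timeout comes from the description; the order in which the checks run (port-forward exit before pod disconnect) is also an assumption.

```python
import enum
from collections import Counter
from typing import Optional


class DisconnectReason(enum.Enum):
    """Hypothetical reason labels mirroring the three cases in this PR."""
    CLIENT_CLOSED = "ClientClosed"
    SSH_TO_POD_DISCONNECTED = "SSHToPodDisconnected"
    KUBECTL_PORT_FORWARD_EXIT = "KubectlPortForwardExit"


# Default PONG timeout stated in the PR description.
PONG_TIMEOUT_SECONDS = 20.0

# Hypothetical stand-in for a metrics counter labeled by disconnection reason.
disconnect_counter: Counter = Counter()


def classify_disconnect(client_closed: bool,
                        seconds_since_last_pong: float,
                        port_forward_returncode: Optional[int],
                        pod_ssh_closed: bool) -> DisconnectReason:
    """Hypothetical classifier mapping observed signals to one reason."""
    # A client that closed, or stopped answering PINGs within the timeout,
    # is counted as a client disconnect.
    if client_closed or seconds_since_last_pong > PONG_TIMEOUT_SECONDS:
        return DisconnectReason.CLIENT_CLOSED
    # Assumed priority: a dead kubectl port-forward process also drops the
    # SSH stream, so check it before the pod-side disconnect.
    if port_forward_returncode is not None:
        return DisconnectReason.KUBECTL_PORT_FORWARD_EXIT
    if pod_ssh_closed:
        return DisconnectReason.SSH_TO_POD_DISCONNECTED
    raise ValueError("no disconnection signal observed")


def record_disconnect(reason: DisconnectReason) -> None:
    """Increment the per-reason counter (a real server would export this)."""
    disconnect_counter[reason.value] += 1


# Example: the server's PING went unanswered for 25s.
reason = classify_disconnect(client_closed=False, seconds_since_last_pong=25.0,
                             port_forward_returncode=None, pod_ssh_closed=False)
record_disconnect(reason)
print(reason.value, disconnect_counter[reason.value])  # ClientClosed 1
```

Labeling each disconnect with exactly one reason keeps the per-reason counts additive, so the share of each cause can be read directly from the metric.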

All 3 cases were tested manually by killing the port-forward process, the pod, or the ssh process.

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei (Collaborator, Author) commented Sep 4, 2025

/smoke-test

@SeungjinYang (Collaborator) left a comment


LGTM. I'd also love to get the number of concurrent SSH connections being proxied by the API server.
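Tracking the number of concurrently proxied SSH connections amounts to a gauge that goes up when a proxy session starts and down when it ends. A minimal sketch, assuming a hypothetical `ConcurrencyGauge` class and `track_ssh_connection` context manager (neither is taken from the SkyPilot codebase):

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class ConcurrencyGauge:
    """Hypothetical thread-safe gauge of currently proxied SSH connections."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def inc(self) -> None:
        with self._lock:
            self._value += 1

    def dec(self) -> None:
        with self._lock:
            self._value -= 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


# Hypothetical process-wide gauge for the API server.
ssh_connections_in_flight = ConcurrencyGauge()


@contextmanager
def track_ssh_connection(
        gauge: ConcurrencyGauge = ssh_connections_in_flight) -> Iterator[None]:
    """Count a connection as active for the lifetime of one proxy session."""
    gauge.inc()
    try:
        yield
    finally:
        # Decrement even if the session ends with an error, so the gauge
        # does not drift upward after failed connections.
        gauge.dec()


with track_ssh_connection():
    print(ssh_connections_in_flight.value)  # 1
print(ssh_connections_in_flight.value)  # 0
```

Decrementing in `finally` keeps the count accurate regardless of which disconnection reason ends the session. If the server exports metrics with the `prometheus_client` library, its `Gauge.track_inprogress()` context manager implements the same pattern.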

Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei (Collaborator, Author) commented Sep 8, 2025

/smoke-test

@aylei merged commit 44e85cd into master on Sep 8, 2025 (18 checks passed).
@aylei deleted the debug-ssh branch on September 8, 2025 15:13.
@aylei mentioned this pull request on Sep 10, 2025.

3 participants