Skip to content

Conversation

@aylei
Copy link
Collaborator

@aylei aylei commented Oct 20, 2025

close #7638

Tested (run the relevant ones):

  • Unit test
  • Manual test as the repro step in [Core][API Server] kubernetes client leaked remnant semaphore files #7638, the semaphore no longer leaks.
  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

aylei added 3 commits October 20, 2025 17:42
Signed-off-by: Aylei <rayingecho@gmail.com>
Signed-off-by: Aylei <rayingecho@gmail.com>
Signed-off-by: Aylei <rayingecho@gmail.com>
@aylei
Copy link
Collaborator Author

aylei commented Oct 20, 2025

/quicktest-core
/smoke-test --kubernetes

@aylei aylei requested review from cblmemo and cg505 October 20, 2025 15:47
Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for identifying and fixing this @aylei ! LGTM.

real_client = self._client
elif isinstance(self._client, kubernetes.watch.Watch):
# For watcher, stop the watch before cleanup.
kubernetes.watch.Watch().stop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this on the self._client? It's surprising to me that this works when we call it on a newly instantiated Watch object. Why is that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after looking at the kubernetes client code, I'm pretty sure this line does nothing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may need to do something to make sure that the Watch.stream iterator cleans up its connection, but I guess that doing that here doesn't make much sense. The way .stop() works, it seems like it will not actually take effect until we call next() on the iterator and it actually terminates (StopIteration)

Comment on lines 214 to 215
if isinstance(obj, ClientWrapper):
return obj
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this happen? I feel like we should either ignore this case or throw an exception.

"""Delegate to the underlying client"""
return getattr(self._client, name)

def __del__(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to try and find a way to avoid relying on __del__ for this - I think it's better than the current state, but if we can manage the lifecycle more explicitly that seems good. E.g. some alternative to the lru_cache that will call close or __exit__

@cg505
Copy link
Collaborator

cg505 commented Oct 20, 2025

I pushed some changes to the PR. @cblmemo could you take another look?

@cg505 cg505 requested a review from cblmemo October 20, 2025 20:52
@cg505
Copy link
Collaborator

cg505 commented Oct 20, 2025

/quicktest-core
/smoke-test --kubernetes

@aylei aylei merged commit ab4d2cd into master Oct 21, 2025
22 checks passed
@aylei aylei deleted the fixed_semaphore_leak branch October 21, 2025 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Core][API Server] kubernetes client leaked remnant semaphore files

4 participants