[SDK] reset transient failure count if function made progress on latest retry #6808
sky/server/rest.py (outdated)
This change isn't necessary for this PR, but I think it's a nice improvement: we retry immediately on the first transient error, which makes the code more responsive.
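A minimal sketch of the backoff schedule this comment describes, assuming capped exponential backoff after the first, immediate retry. `backoff_delays`, `initial_backoff`, and `max_backoff` are hypothetical names for illustration, not the actual code in `sky/server/rest.py`.

```python
from typing import List


def backoff_delays(max_retries: int,
                   initial_backoff: float = 1.0,
                   max_backoff: float = 10.0) -> List[float]:
    """Return the delay (seconds) slept before each retry attempt.

    Hypothetical helper: the first transient error is retried immediately,
    and later retries use exponential backoff capped at `max_backoff`.
    """
    delays: List[float] = []
    for attempt in range(max_retries):
        if attempt == 0:
            # Retry right away on the first transient error.
            delays.append(0.0)
        else:
            # Double the wait on each subsequent retry, up to the cap.
            delays.append(min(initial_backoff * 2 ** (attempt - 1), max_backoff))
    return delays


print(backoff_delays(5))  # [0.0, 1.0, 2.0, 4.0, 8.0]
```

Only the first wait is skipped; a persistent error still backs off, so the immediate retry costs at most one extra request.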
/quicktest-core
/quicktest-core -k job
The `test_managed_jobs` backward-compatibility test is failing; unsure why.
aylei left a comment
LGTM, thanks @SeungjinYang!
The `test_managed_jobs` backward-compatibility test is also failing on master and seems unrelated to this PR. Merging.
On transient retries, if we determine that the retry succeeded, we reset the retry count. This lets the function decorator handle any number of transient errors, as long as the function recovers from each one.
To determine whether a retry has succeeded, we use the `line_processed` field as a proxy for progress. The idea is that if the function processed more lines on the retry, it must have overcome whatever transient error was thrown before.

This PR has no effect on functions that don't interact with the `line_processed` field; they keep the current behavior, where requests are retried `max_retries` times.

Tested (run the relevant ones):

- `bash format.sh`
- `/smoke-test` (CI) or `pytest tests/test_smoke.py` (local)
- `/smoke-test -k test_name` (CI) or `pytest tests/test_smoke.py::test_name` (local)
- `/quicktest-core` (CI) or `pytest tests/smoke_tests/test_backward_compat.py` (local)