-
Notifications
You must be signed in to change notification settings - Fork 921
Avoid inserting duplicate events #6620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sky/execution.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's debatable whether this should be deleted, I'm open to keeping it.
Though if we're keeping it we should move it to after job_id definition and include the job id in the event row
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with removing for now. The user will know if their job starts running anyway.
sky/global_user_state.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to add the event_type in the call from get_cluster_from_name() and get_clusters_from_history() after rebase on master.
|
Do you want terminate entries for terminated clusters in same PR? If so 02dd13e#diff-db4566a60e392e4076a1a2bc4eaa56e57c4c2a060b23ff98e8b7b71e3e3d35f1L750 |
c6d0e87 to
eb95a20
Compare
|
/quicktest-core |
lloyd-brown
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
For
All nodes up + ray cluster healthy.message andtransitioning to INITmessage, do not add another row into the db if the last event for the cluster already has the same message.The PR also proposes deleting
Starting job executionmostly because this line has little to do with the state of the cluster.Tested (run the relevant ones):
bash format.sh/smoke-test(CI) orpytest tests/test_smoke.py(local)/smoke-test -k test_name(CI) orpytest tests/test_smoke.py::test_name(local)/quicktest-core(CI) orpytest tests/smoke_tests/test_backward_compat.py(local)