[Core][UX] One-click user-space Ray cluster #7935
/smoke-test
/smoke-test -k nemo
sky_templates/ray/start_cluster.sh
# Environment Variables:
#   RAY_HEAD_PORT=6379                  - Ray head node port
#   RAY_DASHBOARD_PORT=8265             - Ray dashboard port
#   RAY_DASHBOARD_HOST=127.0.0.1        - Dashboard host (set to 0.0.0.0 to expose externally)
#   RAY_DASHBOARD_AGENT_LISTEN_PORT=    - (Optional) Dashboard agent listen port
#   RAY_NODE_IP_ADDRESS=                - (Optional) Node IP address
#   RAY_CMD_PREFIX=                     - (Optional) Command prefix (e.g., "uv run")
This is the minimal set of parameters needed to cover all of our examples that use Ray.
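To make the header comment concrete, here is a minimal sketch of how a script could resolve these variables and assemble a `ray start` command for the head node. The variable names and defaults come from the comment above; the mapping onto `ray start` flags and the `head_args` and `cmd` names are assumptions for illustration, not the actual contents of start_cluster.sh. The sketch only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: names and defaults are copied from the header comment;
# how the real script consumes them is an assumption.

# Apply the documented defaults when a variable is unset or empty.
RAY_HEAD_PORT="${RAY_HEAD_PORT:-6379}"
RAY_DASHBOARD_PORT="${RAY_DASHBOARD_PORT:-8265}"
RAY_DASHBOARD_HOST="${RAY_DASHBOARD_HOST:-127.0.0.1}"

# Required head-node arguments (hypothetical helper variable).
head_args="--head --port=${RAY_HEAD_PORT}"
head_args="${head_args} --dashboard-port=${RAY_DASHBOARD_PORT}"
head_args="${head_args} --dashboard-host=${RAY_DASHBOARD_HOST}"

# Optional arguments are appended only when the variable is set.
if [ -n "${RAY_DASHBOARD_AGENT_LISTEN_PORT:-}" ]; then
  head_args="${head_args} --dashboard-agent-listen-port=${RAY_DASHBOARD_AGENT_LISTEN_PORT}"
fi
if [ -n "${RAY_NODE_IP_ADDRESS:-}" ]; then
  head_args="${head_args} --node-ip-address=${RAY_NODE_IP_ADDRESS}"
fi

# Prepend the optional command prefix (e.g., "uv run") when present.
cmd="${RAY_CMD_PREFIX:+${RAY_CMD_PREFIX} }ray start ${head_args}"

# Print the command rather than executing it.
echo "$cmd"
```

Keeping every knob as an environment variable with a default means a caller only has to export the values that differ from the defaults.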
Michaelvll
left a comment
Thanks @kevinmingtarja! It looks mostly good to me. Please see the comments below.
Also, we should update the Ray example page to include the new template: https://docs.skypilot.co/en/latest/examples/training/ray.html
/build-docs
✅ ReadTheDocs build triggered for branch The documentation will be available at: https://docs.skypilot.co/en/simple-ray/
Docs preview:
Added references to the start and stop template scripts, and listed the environment variables used to pass arguments to them.
Note: I renamed
/smoke-test -k nemo
Michaelvll
left a comment
Thanks @kevinmingtarja! This is awesome! It looks quite good to me.
/smoke-test -k nemo
/smoke-test -k ray_basic
This PR introduces the first of many SkyPilot templates, a way to simplify the user experience of writing SkyPilot YAMLs. It adds ~/sky_templates/ray/start_cluster.sh, a script that spins up your own Ray cluster for running your own Ray programs, multi-node vLLM inference, or RL with SkyRL, verl, etc.
To use it, call it from your run step.
Docs preview:
https://docs.skypilot.co/en/simple-ray/examples/training/ray.html
https://docs.skypilot.co/en/simple-ray/running-jobs/distributed-jobs.html#executing-a-distributed-ray-program
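As a rough illustration of calling the template from a run step, the commands inside that step might look like the sketch below. The script path and the RAY_CMD_PREFIX variable come from this PR; the `TEMPLATE` variable and the existence check are hypothetical additions so the snippet fails gracefully on a machine where the template is not installed. The linked docs are the authoritative reference for the actual usage.

```shell
#!/bin/sh
# Hypothetical contents of a `run` step; not copied from the PR.

# Optional: run Ray through a command prefix, as in the documented example.
export RAY_CMD_PREFIX="uv run"

# Path of the template script added by this PR.
TEMPLATE="$HOME/sky_templates/ray/start_cluster.sh"

# Start the Ray cluster if the template is available on this machine.
if [ -x "$TEMPLATE" ]; then
  "$TEMPLATE"
else
  echo "Ray template not found at $TEMPLATE" >&2
fi

# The user's own Ray program would follow here.
```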
Tested (run the relevant ones):
bash format.sh
test_ray_basic
/smoke-test (CI) or pytest tests/test_smoke.py (local)
/smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
/quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)