[Volume] Support mounting ephemeral volumes for clusters on k8s #7971
Conversation
```
/smoke-test --kubernetes -k test_volumes_on_kubernetes
/smoke-test --kubernetes -k test_volumes_on_kubernetes --remote-server
```

Test cases:

```
/quicktest-core --base-branch master
```
```python
if ephemeral_volumes:
    for ephemeral_volume in ephemeral_volumes:
        # Update the volumes and volume mounts in the pod spec
        if 'volumes' not in pod_spec['spec']:
            pod_spec['spec']['volumes'] = []
        pod_spec['spec']['volumes'].append({
            'name': ephemeral_volume.name,
            'persistentVolumeClaim': {
                'claimName': ephemeral_volume.volume_name_on_cloud,
            },
        })
        if 'volumeMounts' not in pod_spec['spec']['containers'][0]:
            pod_spec['spec']['containers'][0]['volumeMounts'] = []
        pod_spec['spec']['containers'][0]['volumeMounts'].append({
            'name': ephemeral_volume.name,
            'mountPath': ephemeral_volume.path,
        })
```
Can we use the same pod-spec mutation logic as for normal volumes?
Considering the case of other clouds, e.g. RunPod, the volume ID is required for mounting and is only available after the volume is created. If we moved the k8s handling to the pre-provisioning path, we could no longer resolve the ephemeral volume config in one place for all clouds (currently it is all in `provisioner/volume.py`). So I prefer to keep the current implementation, WDYT?
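The ordering constraint can be sketched as follows. All function and field names here are hypothetical illustrations of the argument, not the actual SkyPilot API: mount-config resolution needs the provider-assigned volume ID, so it cannot run before the create call returns.

```python
from typing import Dict


def create_volume(cloud: str, name: str) -> Dict[str, str]:
    # Stand-in for the cloud API call; the ID is assigned by the provider
    # and does not exist until this call completes.
    return {'name': name, 'id': f'{cloud}-vol-12345'}


def resolve_mount_config(volume: Dict[str, str]) -> Dict[str, str]:
    # Resolution reads volume['id'], so it must run after create_volume.
    return {'volume_id': volume['id'], 'mount_path': '/mnt/data'}


vol = create_volume('runpod', 'scratch')
cfg = resolve_mount_config(vol)
```

This is why resolving all clouds' ephemeral volume configs in one post-creation place avoids duplicating per-cloud special cases earlier in the flow.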
sky/provision/provisioner.py (Outdated)

```python
provision_volume.delete_ephemeral_volumes(cluster_name.name_on_cloud,
                                          provider_config)
```
After an ephemeral volume gets created, can the underlying PVC be used by a normal volume via `use_existing`?
Good catch! Created issue #8029 to track this.
```python
def _resolve_pvc_volume_config(cloud: clouds.Cloud,
                               config: provision_common.ProvisionConfig,
                               volume_config: Dict[str, Any]) -> Dict[str, Any]:
    provider_config = config.provider_config
```
I'm confused about why we need these new handlings for ephemeral volumes. It looks like a normal volume that shares its lifecycle with the referencing cluster, so why not reuse the volume handling logic of normal volumes?
The difference is that for normal volumes, the volume config and volume info are already available during cluster launch, while that is not the case for ephemeral volumes: the infra info is only available after optimization, and some volume info, like the volume ID on RunPod, is only available after volume creation.
```python
'volume_name': {
    'type': 'string',
},
'is_ephemeral': {
```
nit: I'm thinking of naming it `lifecycle` or something similar, which can be an enum that is easily extended, e.g. will we need a volume that has the same lifecycle as a Pool in the future?
This concept should also apply to Pool if we support it later; it would mean whether the lifecycle is bound to the Pool or not.
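One way to sketch the suggested lifecycle enum. The member names and values here are illustrative, not the merged design:

```python
import enum


class VolumeLifecycle(enum.Enum):
    """Which resource the volume's lifecycle is bound to (illustrative)."""
    DETACHED = 'detached'  # managed independently of any cluster
    CLUSTER = 'cluster'    # deleted together with the owning cluster
    POOL = 'pool'          # possible future value: bound to a Pool


# Enum lookup by value, e.g. when parsing a YAML field.
lifecycle = VolumeLifecycle('cluster')
print(lifecycle is VolumeLifecycle.CLUSTER)  # True
```

An enum like this replaces a boolean `is_ephemeral` flag with a field that can grow new values (e.g. `POOL`) without a schema-breaking change.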
sky/provision/aws/instance.py (Outdated)

```python
cluster_name: str,
cluster_name_on_cloud: str,
config: common.ProvisionConfig,
ephemeral_volumes: Optional[List['volume_utils.VolumeInfo']] = None,
```
Shall we put `ephemeral_volumes` into `ProvisionConfig`?
Updated
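Moving the parameter into the provision config could look like the sketch below. Both dataclasses are simplified stand-ins: the real `ProvisionConfig` and `volume_utils.VolumeInfo` carry more fields than shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class VolumeInfo:
    # Simplified stand-in for volume_utils.VolumeInfo.
    name: str
    path: str


@dataclass
class ProvisionConfig:
    # Simplified stand-in; the real class also carries node and
    # provider settings.
    provider_config: Dict[str, Any]
    ephemeral_volumes: Optional[List[VolumeInfo]] = None


# Callers no longer pass ephemeral_volumes as a separate argument;
# it travels with the rest of the provisioning configuration.
config = ProvisionConfig(provider_config={},
                         ephemeral_volumes=[VolumeInfo('data', '/mnt/data')])
```

Bundling the field into the config object keeps per-cloud `run_instances`-style signatures stable when new provisioning inputs are added.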
```
/quicktest-core --base-branch master
/smoke-test --kubernetes -k test_volumes_on_kubernetes --remote-server
```
aylei left a comment
LGTM, thanks @DanielZhangQD !
Tested (run the relevant ones):
- `bash format.sh`
- `/smoke-test` (CI) or `pytest tests/test_smoke.py` (local)
- `/smoke-test -k test_name` (CI) or `pytest tests/test_smoke.py::test_name` (local)
- `/quicktest-core` (CI) or `pytest tests/smoke_tests/test_backward_compat.py` (local)