Skip to content

Conversation

@Maknee
Copy link
Collaborator

@Maknee Maknee commented Jul 3, 2025

Builds on top of speedup-api branch

Here are the results

hyperfine --warmup 3 --runs 10
 'sky status'
Benchmark 1: sky status
  Time (mean ± σ):      5.315 s ±  0.059 s    [User: 0.488 s, System: 0.113 s]
  Range (min … max):    5.247 s …  5.408 s    10 runs

After

hyperfine --warmup 3 --runs 10 'sky status'
Benchmark 1: sky status
  Time (mean ± σ):      2.930 s ±  0.031 s    [User: 0.424 s, System: 0.088 s]
  Range (min … max):    2.866 s …  2.963 s    10 runs

Here's the profile of each:

Before:
image

After:
image

If you take a look closely with the number of check.. calls under status, there's one for every request (blocking). These changes merge it into one call and the main perf improvement is keeping get_api_server_status cached,

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@Maknee Maknee requested a review from Michaelvll July 3, 2025 22:18
@Maknee Maknee changed the title Speedup api updates [CLI] Speed up sky status by caching Jul 3, 2025
Comment on lines 1822 to 1838
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
ssh_future = executor.submit(update_ssh_configs, cluster_records)

# Process cluster records while SSH configs are being updated
hints = []
normal_clusters = []
controllers = []
for cluster_record in cluster_records:
cluster_name = cluster_record['name']
controller = controller_utils.Controllers.from_name(cluster_name)
if controller is not None:
controllers.append(cluster_record)
else:
normal_clusters.append(cluster_record)

# Wait for SSH config updates to complete
ssh_future.result()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to double check if this can provide any speedup.

Copy link
Collaborator Author

@Maknee Maknee Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Master
hyperfine --warmup 3 --runs 10 'sky status'
Benchmark 1: sky status
  Time (mean ± σ):      2.382 s ±  0.057 s    [User: 0.445 s, System: 0.087 s]
  Range (min … max):    2.269 s …  2.446 s    10 runs



With health check caching
hyperfine --warmup 3 --runs 10 'sky status'
Benchmark 1: sky status
  Time (mean ± σ):      2.043 s ±  0.062 s    [User: 0.447 s, System: 0.086 s]
  Range (min … max):    1.954 s …  2.186 s    10 runs




With health check caching + grouping request
hyperfine --warmup 3 --runs 10 'sky status'
Benchmark 1: sky status
  Time (mean ± σ):      1.711 s ±  0.092 s    [User: 0.425 s, System: 0.083 s]
  Range (min … max):    1.641 s …  1.951 s    10 runs

It does speedup

@Michaelvll Michaelvll requested a review from cg505 July 10, 2025 01:08
return ApiServerInfo(status=ApiServerStatus.UNHEALTHY)


@cachetools.cached(cache=cachetools.TTLCache(maxsize=10,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a lock= kwarg to the cachetools.cached call for thread safety?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locked

f'View logs at: {constants.API_SERVER_LOGS}')
try:
# Clear the cache to ensure fresh checks during startup
get_api_server_status.cache_clear() # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing 😔 but I see there is no way around this

@Michaelvll Michaelvll requested a review from cg505 July 24, 2025 23:41
@Michaelvll
Copy link
Collaborator

should we get this merged?

@Maknee Maknee merged commit e5132a7 into skypilot-org:master Jul 24, 2025
15 checks passed
@Maknee Maknee deleted the speedup-api-updates branch July 24, 2025 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants