Make sure catalogs get updated periodically on the executor processes #6272
When a new catalog file is fetched, the catalogs can end up differing across executor processes, despite there being a single source of truth on disk at `~/.sky/catalogs/*`.

TL;DR: this is because `read_catalog` returns a `LazyDataFrame`, which gets assigned to a global variable for each unique (cloud, catalog) pair, and `LazyDataFrame` caches the DataFrame in memory for the duration of the process's lifetime: it only calls `update_func` if `self._df` is `None`, which is only the case on the first call.

This PR changes the behaviour of
`LazyDataFrame`: `update_if_stale_func` now returns a bool indicating whether an update was done, and it is called on every read (`_load_df`), so that a long-running executor process notices when it is time to refresh the catalog and saves the new one to disk.

To account for the performance penalty of calling the update-if-stale function on every read, we add two optimizations:

- In `_update_catalog` of `read_catalog`, add a fast path by calling `_need_update` before trying to acquire the file lock, to prevent lock contention. We still need the check after acquiring the lock, to ensure only one process gets to write to disk, so that part is left as is.
- Add `@annotations.lru_cache(scope='request')` to `_load_df` in `LazyDataFrame`. A single request could read a catalog multiple times, so this helps avoid unnecessary calls to `update_if_stale_func`.

I added a unit test in
`test_catalog.py`.

Tested (run the relevant ones):

- [ ] `bash format.sh`
- [ ] `/smoke-test` (CI) or `pytest tests/test_smoke.py` (local)
- [ ] `/smoke-test -k test_name` (CI) or `pytest tests/test_smoke.py::test_name` (local)
- [ ] `/quicktest-core` (CI) or `pytest tests/smoke_tests/test_backward_compat.py` (local)
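For reviewers, here is a minimal sketch of the new read path. The class and method names mirror the PR description, but the bodies are illustrative: the DataFrame is replaced by an opaque value, and `load_func`/`update_if_stale_func` are hypothetical stand-ins for the real loader and staleness check.

```python
from typing import Any, Callable, Optional


class LazyDataFrame:
    """Sketch: lazily loads a catalog and re-checks staleness on every read."""

    def __init__(self, load_func: Callable[[], Any],
                 update_if_stale_func: Callable[[], bool]):
        self._df: Optional[Any] = None
        self._load_func = load_func
        self._update_if_stale_func = update_if_stale_func

    def _load_df(self) -> Any:
        # Old behavior: load only when self._df is None, so a long-running
        # process never noticed a refreshed catalog on disk.
        # New behavior: call update_if_stale_func on every read; if it
        # reports that it rewrote the catalog file, drop the cached copy
        # and reload from disk.
        if self._update_if_stale_func() or self._df is None:
            self._df = self._load_func()
        return self._df
```

In the real code, `_load_df` is additionally wrapped with `@annotations.lru_cache(scope='request')`, so repeated reads within a single request skip the staleness check entirely.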
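The fast-path optimization in `_update_catalog` is a double-checked pattern. A self-contained sketch, with a `threading.Lock` standing in for the real per-catalog file lock and hypothetical `need_update`/`do_update` callables:

```python
import threading

# Stand-in for the per-catalog file lock used by the real code.
_catalog_lock = threading.Lock()


def update_catalog_if_stale(need_update, do_update) -> bool:
    """Return True iff this caller refreshed the catalog.

    Fast path: check need_update() before acquiring the lock, so readers
    whose catalog is already fresh never contend for it. Slow path:
    re-check after acquiring the lock, so that when many processes race,
    only the first one actually rewrites the file on disk.
    """
    if not need_update():  # fast path, lock-free
        return False
    with _catalog_lock:  # slow path
        if not need_update():  # another process updated it meanwhile
            return False
        do_update()
        return True
```

The second check under the lock is what makes the fast path safe to add: skipping it would let two processes both decide to update before either acquired the lock, and then write the file twice.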