
c8d/system: Fix race between df and prune #51979

Merged
thaJeztah merged 1 commit into moby:master from vvoland:c8d-prune-race on Feb 2, 2026


@vvoland

@vvoland vvoland commented Feb 2, 2026

Contributor

When running `docker system df` concurrently with `docker system prune` or image removal, `DiskUsage` would fail with a "snapshot does not exist" error.

This happened because `layerDiskUsage` walks all snapshots and gets their usage, but a concurrent prune could delete a snapshot between the `Walk` and `Usage` calls.

Handle `NotFound` errors gracefully by skipping deleted snapshots instead of returning an error.


- Human-readable description for the release notes

Fix `docker system df` failing when run concurrently with `docker system prune`.


When running `docker system df` concurrently with `docker system prune`
or image removal, `DiskUsage` would fail with a "snapshot does not exist"
error.

This happened because layerDiskUsage walks all snapshots and gets their
usage, but a concurrent prune could delete a snapshot between Walk and
Usage calls.

Handle NotFound errors gracefully by skipping deleted snapshots instead
of returning an error.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
@vvoland vvoland added this to the 29.2.1 milestone Feb 2, 2026
@vvoland vvoland self-assigned this Feb 2, 2026
@github-actions github-actions Bot added area/testing, area/daemon, and containerd-integration labels Feb 2, 2026
@vvoland vvoland added impact/changelog and kind/bugfix labels and removed area/testing label Feb 2, 2026
@pjonsson

pjonsson commented Feb 2, 2026


I don't know how the cache is handled in Docker, but if there's a walk, can the GC specified by:

```json
{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "40GB"
    }
  }
}
```

trigger the same problem for df in a different code path, or is that covered by this PR as well?

@vvoland

vvoland commented Feb 2, 2026

Contributor Author

The builder GC is handled by BuildKit (https://github.com/moby/buildkit), which works differently: it doesn't walk all available snapshots, and it also seems to handle NotFound errors gracefully: https://github.com/moby/buildkit/blob/649062d5e7be1785c31e79729a5725c699cf1370/cache/refs.go#L361

So I think it should be fine; if there are any issues, they need to be handled on the BuildKit side.

cc @crazy-max @tonistiigi

@thaJeztah thaJeztah left a comment

Member


LGTM

@thaJeztah

Member

The failure on Oracle 8 is unrelated; it's a known flaky test:

```
=== Failed
=== FAIL: amd64.docker.docker.integration.networking TestAccessPublishedPortFromHost/userland-proxy=false/IPv6=true (2.24s)
    port_mapping_linux_test.go:413: assertion failed: error is not nil: Get "http://[fdfb:5cbb:29bf::2]:1237": dial tcp [fdfb:5cbb:29bf::2]:1237: connect: connection refused
    --- FAIL: TestAccessPublishedPortFromHost/userland-proxy=false/IPv6=true (2.24s)

=== FAIL: amd64.docker.docker.integration.networking TestAccessPublishedPortFromHost (8.19s)
```

@thaJeztah thaJeztah merged commit d392ea1 into moby:master Feb 2, 2026
261 of 266 checks passed

@lingotesoropuro-jpg lingotesoropuro-jpg left a comment



Thanks


sahilsGit added a commit to sahilsGit/moby that referenced this pull request May 22, 2026
When `docker system df` runs concurrently with image prune, a blob
can be removed between presentChildrenHandler's store.Info check and
the c8dimages.Children call that reads it, failing the API with
"NotFound: content digest sha256:...: not found".

PR moby#51979 fixed the same race in the snapshotter path; this handles
it in the content-store walk. Treat NotFound from c8dimages.Children
like the store.Info NotFound branch above it: the content is gone, so
it has no children. Other errors still propagate.

Fixes moby#52538

Signed-off-by: Sahil Singh <sahiilsiingh37@gmail.com>

Labels

area/daemon, containerd-integration, impact/changelog, kind/bugfix


Development

Successfully merging this pull request may close these issues.

Running docker system df fails with "Error response from daemon: failed to calculate image disk usage"

4 participants