fix: add waitpid safety net after signalfd setup to prevent SIGCHLD race #40229

Merged

benhillis merged 1 commit into master from copilot/fix-init-sigchld-race on Apr 20, 2026

@benhillis (Member) commented Apr 17, 2026

In src/linux/init/init.cpp, there is a race window between signal(SIGCHLD, SIG_DFL) and the signalfd setup in which a child's exit signal can be generated and discarded: the default action for SIGCHLD is to ignore it, so while SIGCHLD is still unblocked the kernel drops the signal, although the exited child still becomes a zombie. If the distro init process exits during this window, the signalfd never becomes readable and the parent hangs forever.
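The window can be reproduced in isolation. The sketch below is a hypothetical, Linux-only demo rather than code from init.cpp (`DemonstrateLostSigchld` is an illustrative name): the child exits while SIGCHLD is unblocked under SIG_DFL, so the signal is dropped, and a signalfd created afterwards never reports it, yet the child is still a zombie that waitpid can collect. It uses waitid with WNOWAIT only to wait for the exit deterministically without reaping the child.

```cpp
#include <cassert>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo (Linux): reproduces the setup window in isolation.
// Returns true when the child's SIGCHLD was lost (signalfd stays silent)
// even though the child is still a zombie that waitpid can collect.
bool DemonstrateLostSigchld()
{
    sigset_t Mask;
    sigemptyset(&Mask);
    sigaddset(&Mask, SIGCHLD);

    // The racy state: default disposition, SIGCHLD not blocked yet.
    signal(SIGCHLD, SIG_DFL);
    sigprocmask(SIG_UNBLOCK, &Mask, nullptr);

    pid_t Child = fork();
    assert(Child >= 0);
    if (Child == 0)
    {
        _exit(0); // stands in for the distro init exiting early
    }

    // Wait until the child is a zombie without reaping it (WNOWAIT). Its
    // SIGCHLD was generated while unblocked under SIG_DFL, whose default
    // action is "ignore", so the kernel dropped it instead of queuing it.
    siginfo_t Info{};
    int Rc = waitid(P_PID, Child, &Info, WEXITED | WNOWAIT);
    assert(Rc == 0);

    // The signalfd setup only happens now, after the exit.
    sigprocmask(SIG_BLOCK, &Mask, nullptr);
    int Fd = signalfd(-1, &Mask, SFD_CLOEXEC);
    assert(Fd >= 0);

    pollfd Pfd{Fd, POLLIN, 0};
    int Ready = poll(&Pfd, 1, 100); // 0: nothing is pending, nothing will come

    // The zombie is still there, which is what a WNOHANG check can rely on.
    int Status = 0;
    pid_t Reaped = waitpid(Child, &Status, WNOHANG);

    close(Fd);
    sigprocmask(SIG_UNBLOCK, &Mask, nullptr);
    return Ready == 0 && Reaped == Child;
}
```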

The fix adds a non-blocking waitpid safety net immediately after the signalfd is established. This catches any child that exited during the setup window: if it was auto-reaped under the prior SIG_IGN disposition, waitpid fails with ECHILD; if it became a zombie under SIG_DFL, waitpid reaps it and returns its PID.
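A minimal sketch of such a one-shot check, assuming it runs right after the signalfd is created. `CheckForEarlyExit`, `EarlyExit`, and the `Demo*` helpers are hypothetical names for illustration; the real init.cpp code may be structured differently. The demos exercise the three outcomes on Linux: a zombie left under SIG_DFL, a child auto-reaped under SIG_IGN, and a child that is still running.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Possible outcomes of the one-shot check made right after signalfd setup.
enum class EarlyExit
{
    StillRunning,      // no exit yet; the signalfd will report it later
    ReapedZombie,      // exited under SIG_DFL; reaped now, status filled in
    AlreadyAutoReaped, // exited under SIG_IGN; the kernel already reaped it
    Error
};

// Hypothetical helper, not the actual init.cpp code.
EarlyExit CheckForEarlyExit(pid_t Pid, int* Status)
{
    pid_t Result = waitpid(Pid, Status, WNOHANG);
    if (Result == Pid)
    {
        return EarlyExit::ReapedZombie;
    }
    if (Result == 0)
    {
        return EarlyExit::StillRunning;
    }
    return (errno == ECHILD) ? EarlyExit::AlreadyAutoReaped : EarlyExit::Error;
}

// Demo: child exits under SIG_DFL and becomes a zombie.
EarlyExit DemoZombie(int* Status)
{
    signal(SIGCHLD, SIG_DFL);
    pid_t Child = fork();
    assert(Child >= 0);
    if (Child == 0)
    {
        _exit(7);
    }
    siginfo_t Info{};
    waitid(P_PID, Child, &Info, WEXITED | WNOWAIT); // wait for exit, don't reap
    return CheckForEarlyExit(Child, Status);
}

// Demo: child exits under SIG_IGN and is auto-reaped by the kernel.
EarlyExit DemoAutoReaped()
{
    signal(SIGCHLD, SIG_IGN);
    pid_t Child = fork();
    assert(Child >= 0);
    if (Child == 0)
    {
        _exit(0);
    }
    waitpid(Child, nullptr, 0); // blocks until the child is gone, then fails with ECHILD
    signal(SIGCHLD, SIG_DFL);
    int Status = 0;
    return CheckForEarlyExit(Child, &Status);
}

// Demo: child is still alive when the check runs.
EarlyExit DemoStillRunning()
{
    signal(SIGCHLD, SIG_DFL);
    pid_t Child = fork();
    assert(Child >= 0);
    if (Child == 0)
    {
        pause();
        _exit(0);
    }
    int Status = 0;
    EarlyExit Outcome = CheckForEarlyExit(Child, &Status);
    kill(Child, SIGKILL);
    waitpid(Child, nullptr, 0);
    return Outcome;
}
```

Treating ECHILD as "already exited" rather than as an error is the design choice that matters here: under SIG_IGN there is no zombie and no status to collect, so the caller can only learn that the child is gone.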

Also fixes a bug in the signalfd poll loop, which checked Result (the return value of poll) instead of Pid (the return value of waitpid) when looking for the distro init's exit.
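A sketch of what a corrected loop could look like, assuming SIGCHLD is already blocked and `SignalFd` was created for it. `WaitForWatchedChild` is a hypothetical name and this is not the actual init.cpp loop. The key point is that poll's return value (Result) only counts ready descriptors, so the distro init's exit has to be detected by comparing waitpid's return value (Pid) with the watched PID.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical sketch, not the actual init.cpp loop. Assumes SIGCHLD is
// blocked and SignalFd was created for it. Returns WatchedPid's wait status,
// or -1 on error (including ECHILD if the child was reaped elsewhere).
int WaitForWatchedChild(int SignalFd, pid_t WatchedPid)
{
    for (;;)
    {
        pollfd Pfd{SignalFd, POLLIN, 0};
        int Result = poll(&Pfd, 1, -1);
        if (Result < 0)
        {
            if (errno == EINTR)
            {
                continue;
            }
            return -1;
        }

        // Consume the notification. SIGCHLD does not queue, so one record
        // may stand for several exits; waitpid is the source of truth.
        signalfd_siginfo Info;
        if (read(SignalFd, &Info, sizeof(Info)) != static_cast<ssize_t>(sizeof(Info)))
        {
            return -1;
        }

        // Result is only the number of ready descriptors (1 here), so the
        // exit test must use waitpid's return value, Pid.
        int Status = 0;
        pid_t Pid = waitpid(WatchedPid, &Status, WNOHANG);
        if (Pid == WatchedPid)
        {
            return Status;
        }
        if (Pid < 0)
        {
            return -1;
        }
        // Pid == 0: some other child exited; keep waiting.
    }
}
```

Returning on a negative Pid (for example ECHILD) instead of polling again keeps the sketch from hanging if the watched child has already been reaped; a WNOHANG check right after signalfd setup covers the case where the exit happened before the loop started.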

benhillis requested a review from a team as a code owner April 17, 2026 15:01
Copilot AI review requested due to automatic review settings April 17, 2026 15:01
Copilot AI (Contributor) left a comment

Pull request overview

This PR adjusts SIGCHLD setup ordering in the Linux init event loop to reduce the chance of missing a child-exit notification when using signalfd, preventing hangs when the monitored init process exits.

Changes:

  • Reorders SIGCHLD setup to block the signal before resetting its disposition.
  • Updates comments explaining the intended race avoidance around SIGCHLD delivery.

Comment on src/linux/init/init.cpp, lines 2412 to 2416 (outdated):
```cpp
// Block SIGCHLD before resetting the handler to avoid a race where
// the child exits between signal(SIG_DFL) and sigprocmask(SIG_BLOCK).
sigset_t SignalMask;
sigemptyset(&SignalMask);
sigaddset(&SignalMask, SIGCHLD);
```
Copilot AI, Apr 17, 2026:
With init’s startup behavior of ignoring signals (UtilSetSignalHandlers(..., true) sets SIGCHLD to SIG_IGN), blocking SIGCHLD before switching to SIG_DFL introduces a small window where the watched child can exit while SIGCHLD is still ignored. In that case the kernel can auto-reap the child without generating a pending SIGCHLD for signalfd, and this loop will never wake to notice the exit. Consider restoring the SIGCHLD disposition to SIG_DFL before blocking, and/or adding an immediate non-blocking waitpid(distroInitPid, ..., WNOHANG) check after signalfd setup to handle the “already exited” case regardless of timing.
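The reviewer's point about SIG_IGN can be checked directly. The following is a hypothetical Linux-only demo (`ExitWhileIgnoredAndBlocked` and `IgnoredExitResult` are illustrative names): SIGCHLD is ignored and blocked before the child exits, and afterwards no SIGCHLD is pending while waitpid reports ECHILD, because the kernel already auto-reaped the child. Pending signals are sampled before the disposition is reset, since POSIX also says that setting SIG_DFL for a pending signal whose default action is to ignore it discards that signal, even while it is blocked.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

struct IgnoredExitResult
{
    bool SigchldPending; // was a SIGCHLD left pending for signalfd to pick up?
    int WaitErrno;       // errno from waitpid on the exited child (0 if it succeeded)
};

// Hypothetical demo (Linux): the child exits while SIGCHLD is both SIG_IGN
// and blocked, i.e. the "block first, reset later" ordering.
IgnoredExitResult ExitWhileIgnoredAndBlocked()
{
    sigset_t Mask;
    sigemptyset(&Mask);
    sigaddset(&Mask, SIGCHLD);

    signal(SIGCHLD, SIG_IGN);               // startup disposition
    sigprocmask(SIG_BLOCK, &Mask, nullptr); // blocked, but still ignored

    pid_t Child = fork();
    assert(Child >= 0);
    if (Child == 0)
    {
        _exit(0);
    }

    // Under SIG_IGN a blocking waitpid waits until the child is gone and
    // then fails with ECHILD: the kernel auto-reaped it, no zombie remains.
    pid_t Rc = waitpid(Child, nullptr, 0);
    int Err = (Rc == -1) ? errno : 0;

    // Sample pending signals BEFORE touching the disposition again.
    sigset_t Pending;
    sigemptyset(&Pending);
    sigpending(&Pending);
    bool IsPending = sigismember(&Pending, SIGCHLD) == 1;

    // Restore a neutral state.
    signal(SIGCHLD, SIG_DFL);
    sigprocmask(SIG_UNBLOCK, &Mask, nullptr);
    return {IsPending, Err};
}
```

Blocking alone cannot help in this case: with SIGCHLD set to SIG_IGN, the kernel neither leaves a zombie nor sends a signal, so only an explicit check of whether the child still exists can detect the exit.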

benhillis (Member, Author):

Will consider this

When setting up signalfd to watch distroInitPid, there is a window
where the child can exit before the signal infrastructure is ready:
either it is auto-reaped under SIG_IGN before we reach this code, or
it exits between signal(SIG_DFL) and sigprocmask(SIG_BLOCK), in which
case the SIGCHLD is discarded.

Add a non-blocking waitpid check after signalfd setup to catch both
cases, preventing an unrecoverable hang if the distro init exits
during startup.

Also fixes the existing loop to check waitpid return (Pid) instead
of poll return (Result) when handling SIGCHLD.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
benhillis force-pushed the copilot/fix-init-sigchld-race branch from dbbc4dd to dc5b8cf Apr 20, 2026
benhillis changed the title from "fix: block SIGCHLD before resetting handler to prevent race" to "fix: add waitpid safety net after signalfd setup to prevent SIGCHLD race" Apr 20, 2026
benhillis merged commit 8e5b4a9 into master Apr 20, 2026
9 checks passed
benhillis deleted the copilot/fix-init-sigchld-race branch April 23, 2026 17:31
3 participants