fix: add waitpid safety net after signalfd setup to prevent SIGCHLD race #40229
Pull request overview
This PR adjusts SIGCHLD setup ordering in the Linux init event loop to reduce the chance of missing a child-exit notification when using signalfd, preventing hangs when the monitored init process exits.
Changes:
- Reorders SIGCHLD setup to block the signal before resetting its disposition.
- Updates comments explaining the intended race avoidance around SIGCHLD delivery.
// Block SIGCHLD before resetting the handler to avoid a race where
// the child exits between signal(SIG_DFL) and sigprocmask(SIG_BLOCK).
sigset_t SignalMask;
sigemptyset(&SignalMask);
sigaddset(&SignalMask, SIGCHLD);
Init ignores signals at startup (UtilSetSignalHandlers(..., true) sets SIGCHLD to SIG_IGN), so blocking SIGCHLD before switching to SIG_DFL introduces a small window where the watched child can exit while SIGCHLD is still ignored. In that case the kernel can auto-reap the child without generating a pending SIGCHLD for signalfd, and this loop will never wake to notice the exit. Consider restoring the SIGCHLD disposition to SIG_DFL before blocking, adding an immediate non-blocking waitpid(distroInitPid, ..., WNOHANG) check after signalfd setup, or both, to handle the “already exited” case regardless of timing.
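The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the code in init.cpp: `ChildState`, `ChildWatch`, and `WatchChild` are hypothetical names, and the sketch assumes the caller only cares about one child PID. It resets the disposition, blocks SIGCHLD, creates the signalfd, and then probes once with `WNOHANG` so that an exit which happened before setup completed is still noticed.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical types for illustration; none of these names come from init.cpp.
enum class ChildState { Running, Exited, AlreadyReaped, Error };

struct ChildWatch {
    int SignalFd;      // signalfd descriptor, or -1 if setup failed
    ChildState State;  // what the post-setup probe found
    int Status;        // wait status, meaningful only when State == Exited
};

ChildWatch WatchChild(pid_t Pid)
{
    assert(Pid > 0);
    ChildWatch Watch{-1, ChildState::Error, 0};

    // Leave SIG_IGN first, so that later exits leave a zombie behind
    // instead of being auto-reaped by the kernel.
    if (signal(SIGCHLD, SIG_DFL) == SIG_ERR) {
        return Watch;
    }

    // Block SIGCHLD so it stays pending and becomes readable via signalfd.
    sigset_t SignalMask;
    sigemptyset(&SignalMask);
    sigaddset(&SignalMask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &SignalMask, nullptr) < 0) {
        return Watch;
    }

    Watch.SignalFd = signalfd(-1, &SignalMask, SFD_CLOEXEC);
    if (Watch.SignalFd < 0) {
        return Watch;
    }

    // Safety net: a single non-blocking probe covers exits that happened
    // before the signal plumbing above was in place.
    const pid_t Result = waitpid(Pid, &Watch.Status, WNOHANG);
    if (Result == Pid) {
        Watch.State = ChildState::Exited;         // zombie existed; now reaped
    } else if (Result == 0) {
        Watch.State = ChildState::Running;        // still alive; rely on signalfd
    } else if (errno == ECHILD) {
        Watch.State = ChildState::AlreadyReaped;  // e.g. auto-reaped under SIG_IGN
    }
    return Watch;
}
```

A caller would enter its poll loop only when `State` is `Running`. If the probe reaps the child, a SIGCHLD may still be queued on the descriptor, so a later read from the signalfd should tolerate a notification with nothing left to wait for.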
When setting up signalfd to watch distroInitPid, there is a window where the child can exit before the signal infrastructure is ready: either auto-reaped under SIG_IGN before we reach this code, or exiting between signal(SIG_DFL) and sigprocmask(SIG_BLOCK) where the SIGCHLD is discarded.

Add a non-blocking waitpid check after signalfd setup to catch both cases, preventing an unrecoverable hang if the distro init exits during startup.

Also fixes the existing loop to check the waitpid return value (Pid) instead of the poll return value (Result) when handling SIGCHLD.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
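To illustrate the loop fix mentioned in the commit message, here is a hedged sketch of a signalfd poll loop that decides on the value returned by `waitpid` rather than the value returned by `poll`. The function name `WaitForChildExit` and its signature are hypothetical, and reaping with `waitpid(-1, ...)` is an assumption about how an init-style process drains its children; the real loop in init.cpp may be structured differently.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: blocks until ChildPid has been reaped and stores its
// wait status in *Status. SignalFd must be a signalfd for a blocked SIGCHLD.
// Returns false if the exit can no longer be observed.
bool WaitForChildExit(int SignalFd, pid_t ChildPid, int* Status)
{
    assert(SignalFd >= 0 && ChildPid > 0 && Status != nullptr);

    for (;;) {
        pollfd PollFd{SignalFd, POLLIN, 0};
        const int Result = poll(&PollFd, 1, -1);
        if (Result < 0) {
            if (errno == EINTR) {
                continue;
            }
            return false;
        }
        if (PollFd.revents & (POLLERR | POLLNVAL)) {
            return false;
        }
        if (!(PollFd.revents & POLLIN)) {
            continue;
        }

        signalfd_siginfo Info{};
        if (read(SignalFd, &Info, sizeof(Info)) != sizeof(Info)) {
            return false;
        }
        if (Info.ssi_signo != SIGCHLD) {
            continue;
        }

        // One SIGCHLD can stand for several exits, so drain all of them.
        for (;;) {
            const pid_t Pid = waitpid(-1, Status, WNOHANG);
            // Result only says how many descriptors are ready; Pid says
            // which child exited, so Pid is the value to compare.
            if (Pid == ChildPid) {
                return true;
            }
            if (Pid < 0) {
                return false;  // ECHILD: no children left to wait for
            }
            if (Pid == 0) {
                break;         // remaining children are still running
            }
        }
    }
}
```

Because `poll` returns a count of ready descriptors (1 here), comparing it with a PID would almost never match, which is why the comparison uses `Pid`.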
dbbc4dd to dc5b8cf
In src/linux/init/init.cpp, there is a race window between signal(SIGCHLD, SIG_DFL) and the signalfd setup in which a child-exit signal can be delivered and discarded (the default action for SIGCHLD is to ignore the signal, although exited children still become zombies). If the distro init process exits during this window, the signalfd never fires and the parent hangs forever.

The fix adds a non-blocking waitpid safety net immediately after the signalfd is established. This catches any child that exited during the setup window, whether it was auto-reaped under the prior SIG_IGN disposition (ECHILD) or became a zombie under SIG_DFL.

Also fixes a bug in the signalfd poll loop where Result was checked instead of Pid for the distro init exit.