multi-runner: re-try scale-up operation for runner-type if it fails due to insufficient IPv4 addresses in subnet #4105

@cisco-sbg-mgiassa-ai

Good day,

I have a multi-tenanted CI/CD system that uses multi-runner to handle runner management, standby/warm-up pools, and so on (and it works very well, by the way 😄). I'm also running fairly up-to-date code (v5.15.2 of this project), along with up-to-date GHA actions, runner agent, and tools.

I have a set of runners that use a shared, multi-team subnet in AWS. Occasionally a tenant over-commits runners and exhausts the IPv4 address space supplied by the subnet. For most operations, GHA queues up and serializes jobs nicely. For example, if some runner-type has an upper limit of 30 instances and 60 jobs are queued up, all of the jobs eventually run to successful completion. One semi-related "corner case" where this doesn't happen, however, is when a runner fails to start because the subnet used to launch it has no free addresses left (i.e. "insufficient IP space available").

Is there some mechanism or feature flag that exists (or that could reasonably be implemented) to re-try a failed scale-up operation, with a back-off timer to prevent API spam/overload? It would be preferable to have the job eventually get queued up, even if that means waiting for progressively longer intervals, rather than having it stuck in the "wait for a runner" state for 18 hours (as a specific example). In this contrived case, it'd be desirable for the job to eventually be queued up (say, during low-usage periods overnight when there's ample capacity) instead of requiring user interaction (or a CI bot to auto-cancel "stuck" jobs).
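
To make the request concrete, here is a rough sketch of the retry-with-back-off behaviour described above. It is illustrative only and not taken from this project's code: `SubnetExhaustedError`, `backoff_delay`, `scale_up_with_retry`, the `launch` callable, and the default timings (30 s base, 1 h cap, 8 attempts) are all hypothetical names and values, and the exception is just a stand-in for whatever error the launch call reports when the subnet has no free IPv4 addresses.

```python
import random
import time
from typing import Callable, Optional


class SubnetExhaustedError(Exception):
    """Hypothetical stand-in for an 'insufficient free IPv4 addresses' launch failure."""


def backoff_delay(attempt: int, base: float = 30.0, cap: float = 3600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential back-off with full jitter.

    Returns a delay in seconds drawn uniformly from
    [0, min(cap, base * 2**attempt)], where attempt starts at 0.
    """
    source = rng if rng is not None else random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))


def scale_up_with_retry(launch: Callable[[], str],
                        max_attempts: int = 8,
                        base: float = 30.0,
                        cap: float = 3600.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Call launch() until it succeeds or max_attempts is reached.

    Only SubnetExhaustedError is retried; any other exception propagates
    immediately. The last SubnetExhaustedError is re-raised once the
    attempts are used up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return launch()
        except SubnetExhaustedError:
            if attempt == max_attempts - 1:
                raise
            # Wait progressively longer (on average) before the next try.
            sleep(backoff_delay(attempt, base, cap))
    raise AssertionError("unreachable")
```

The jitter is there so that many queued jobs hitting the same exhausted subnet don't all re-try at the same instant, and the cap keeps the wait bounded so a job can still be picked up once capacity frees up (e.g. overnight).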

Besides this quirk, this is an awesome project/tool that has been extremely helpful/useful. Cheers!

Labels: enhancement (New feature or request), question (Further information is requested)