Simplify gateway programming#50321
Conversation
cd8fab4 to
2276936
Compare
| return fmt.Errorf("failed to update bridge endpoint %.7s to store: %v", endpoint.id, err) | ||
| undo := func() []portBinding { | ||
| pbReq := make([]types.PortBinding, 0, len(ep.portMapping)) | ||
| for _, pb := range ep.portMapping { |
There was a problem hiding this comment.
The outer scope doesn't capture ep, nor the original ep.portMapping. This is currently okay because ep.portMapping isn't overwritten until addPortMappings succeeds, but I think it's not reasonable to make trimPortBindings depend on the exact implementation of its caller.
Could you capture ep.portMapping at the start of trimPortBindings instead?
There was a problem hiding this comment.
Eek, yes - it should have been restoring toDrop, not the whole set of original mappings ... so I've made that change.
| func epShortId(ep *Endpoint) string { | ||
| return stringid.TruncateID(epId(ep)) | ||
| } | ||
|
|
There was a problem hiding this comment.
Can you make this an *Endpoint method?
(And it'd be great to suppress epId() but I'm not sure it's 100% safe until we fix data persistence issues.)
There was a problem hiding this comment.
I did it this way because it (and epId()) are there to cope with nil Endpoints, by just returning no id. Calling a method on a nil pointer can be a bit nerve wracking for the casual reader! So, could change it and handle the nil receiver ... but I think it's probably better not to?
| } | ||
| }() | ||
| } | ||
| if !n.internal { |
There was a problem hiding this comment.
So now it's driver's responsibility to not map ports for an internal network? But I don't see any n.internal check in the bridge driver (neither existing or added by this PR) 🤔 What's preventing an internal bridge network from having port bindings set?
There was a problem hiding this comment.
An endpoint in an internal network can't be selected as a gateway, so the bridge driver won't set up port bindings. (And it probably should be up to the driver to decide that.)
| // Ids of the endpoints currently acting as the container's default gateway for | ||
| // IPv4 and IPv6 are passed as gw4Id/gw6Id. (Those endpoints may be managed by | ||
| // different network drivers. If there is no gateway, the id will be the | ||
| // empty string.) | ||
| // | ||
| // This method is called after Driver.Join, and when eid is or was equal to | ||
| // gw4Id or gw6Id, and there's a change. |
There was a problem hiding this comment.
It's called after Driver.Join, and during Driver.Leave.
Maybe it'd be helpful to indicate that a sandbox with no gateway is a transient state that appears when the current gateway endpoint leaves the sandbox — and empty gwId can only occur in that case. WDYT?
There was a problem hiding this comment.
It's called after Driver.Join, and during Driver.Leave.
Before Driver.Leave (rather than during) ... but yes, updated.
Maybe it'd be helpful to indicate that a sandbox with no gateway is a transient state that appears when the current gateway endpoint leaves the sandbox — and empty gwId can only occur in that case. WDYT?
It's not a transient state for a container that's only connected to an internal network (or when there's no remaining network) ... and, just to note, this PR doesn't change that - before, there wouldn't have been a gateway between calls to RevokeExternalConnnectivity for the leaving endpoint and ProgramExternalConnectivity for the new one. So, I've added this - hopefully it captures the main point? ...
// When an endpoint acting as a gateway is deleted, this function is called
// with the leaving endpoint's id, and empty gateway ids (even if another
// endpoint is present and will shortly be selected as the gateway).
| log.G(ctx).WithError(err).Warn("driver failed revoking external connectivity on endpoint") | ||
| } | ||
| if ecd, ok := d.(driverapi.ExtConner); ok { | ||
| if err := ecd.ProgramExternalConnectivity(context.WithoutCancel(ctx), n.ID(), ep.ID(), nil, "", ""); err != nil { |
There was a problem hiding this comment.
I think it'd make more sense to always pass the sandbox labels.
There was a problem hiding this comment.
As discussed - the nil param disappears anyway in the next commit in this PR, because labels are only passed to Join now, not ProgramExternalConnectivity (the labels can't change after Join).
d7412ca to
7be5d13
Compare
corhere
left a comment
There was a problem hiding this comment.
I'm a little confused about the exact semantics of ProgramExternalConnectivity. Isn't the default gateway a property of the sandbox -- the iface of the next-hop for outgoing packets in the netns which do not match a more specific route? Does that mean that the (network id, endpoint id) tuple being passed into ProgramExternalConnectivity is semantically an indirect way of referencing the sandbox the endpoint is joined to, and that libnetwork is expected to call ProgramExternalConnectivity on all endpoints attached to a sandbox when the default gateway changes?
It would make more intuitive sense to me (if my assumptions above are correct) to have an API shaped more like
interface {
HandleGatewayChange(ctx context.Context, sboxKey, gw4Id, gw6Id string) error
}which libnetwork calls once per driver when the sandbox's gateway changes. Since the drivers all know which sandbox each endpoint is joined to they would have no problem iterating through all of their endpoints that are joined to some sandbox. And it makes the drivers more loosely coupled to the implementation details of libnetwork by being descriptive rather than prescriptive (this event happened btw, do whatever is necessary vs. do this specific thing because the framework said so.) It looks rather suspiciously like ProgramExternalConnectivity is only called for the gateway endpoints, simply because that's what the bridge driver required.
| // When an endpoint acting as a gateway is deleted, this function is called | ||
| // with that endpoint's id, and empty gateway ids (even if another endpoint | ||
| // is present and will shortly be selected as the gateway). |
There was a problem hiding this comment.
The fact that the gateway IDs may be for endpoints belonging to other drivers makes this statement ambiguous. What happens if the endpoint for gw4Id, belonging to another driver, is deleted? I would intuitively expect libnetwork to call ProgramExternalConnectivity(ctx, nid, eid, "", "") but the interface contract reads like libnetwork would be expected to pass gw4Id as the eid parameter!
How helpful are these semantics in the case where the endpoint being deleted is the gateway: eid == gw4Id || eid == gw6Id? DeleteEndpoint(nid, eid) will be invoked in the driver regardless. What are drivers expected to do if DeleteEndpoint is called on an endpoint that is the default gateway for a network of the driver's type?
Are atomic swaps of the gateways permitted? E.g.
ProgramExternalConnectivity(ctx, nid, eid, "gateway1", "")
ProgramExternalConnectivity(ctx, nid, eid, "gateway2", "")There was a problem hiding this comment.
The fact that the gateway IDs may be for endpoints belonging to other drivers makes this statement ambiguous. What happens if the endpoint for
gw4Id, belonging to another driver, is deleted? I would intuitively expect libnetwork to callProgramExternalConnectivity(ctx, nid, eid, "", "")but the interface contract reads like libnetwork would be expected to passgw4Idas theeidparameter!
Added "in eid" to "with that endpoint's id".
How helpful are these semantics in the case where the endpoint being deleted is the gateway:
eid == gw4Id || eid == gw6Id?DeleteEndpoint(nid, eid)will be invoked in the driver regardless. What are drivers expected to do ifDeleteEndpointis called on an endpoint that is the default gateway for a network of the driver's type?
RevokeExternalConnectivity would have been called for the leaving endpoint before, now it's ProgramExternalConnectivity (with no gateway ids), there's no other change.
A follow-up could take it further and get rid of that call by making sure the driver cleans up port mappings during DeleteEndpoint.
Are atomic swaps of the gateways permitted? E.g.
ProgramExternalConnectivity(ctx, nid, eid, "gateway1", "") ProgramExternalConnectivity(ctx, nid, eid, "gateway2", "")
Not currently, without a call to revoke the gateway-ness of endpoint "gateway1" - just as a call RevokeExternalConnectivity would have been needed before this change. (Its godoc comment says "This method is called after Driver.Join, before Driver.Leave, and when eid is or was equal to gw4Id or gw6Id, and there's a change.")
Again, a follow-up could take this change further, continuing with cleanup of driverapi.
| return err | ||
| // trimPortBindings compares pbmReq with the current port bindings, and removes | ||
| // port bindings that are no longer required. If anything goes wrong, an error will | ||
| // be logged, but there's no error return - try to continue, because failure to |
There was a problem hiding this comment.
But there is an error return?
There was a problem hiding this comment.
The comment was wrong (old), I've removed it.
(releasePortBindings should probably roll-back its changes on error, but that's not changed in this PR.)
There was a problem hiding this comment.
It doesn't look like your edit to the comment has made it into your PR branch yet.
There was a problem hiding this comment.
Doh! Thank you - it's in now.
Oh, I misread - you'd still pass in the endpoint ids, but you'd just replace endpoint/network ids with sandbox id. (I don't think it would make any difference, except maybe the driver having to track the relationship between sandbox/network/endpoint slightly differently) ... (The driver could be told sandbox id and network instead of endpoint id, that wouldn't give it any more or less information. Or, rather than give Endpoints unique ids, they could be identified by sandbox id plus network id - I've not looked, but I think that would be quite a wide ranging change with little benefit.)
You're suggesting taking this change slightly further - it already does a lot of that decoupling ... libnetwork no longer needs to care what the driver does when the gateway-ness of one of its Endpoints changes, it just tells the driver when there's a change in its affected Endpoints. Having the driver work out which of its endpoints are affected (by pushing down the gateway-ness tracking into each of the drivers) probably would be an improvement. But, there's no pressing need ... with this change, the layering is already better, and this PR along with #50226 achieve what they set out to achieve - which is enabling the change to make routed-mode endpoints open, even when the endpoint is not selected as a gateway (these PRs are split out of #50140, which was stuck in the review queue for a month or so). So, let's leave that for a follow-up. |
7be5d13 to
f73b550
Compare
corhere
left a comment
There was a problem hiding this comment.
Thank you for confirming my understanding! I wholeheartedly agree with leaving further improvements to follow-ups.
| return err | ||
| // trimPortBindings compares pbmReq with the current port bindings, and removes | ||
| // port bindings that are no longer required. If anything goes wrong, an error will | ||
| // be logged, but there's no error return - try to continue, because failure to |
There was a problem hiding this comment.
It doesn't look like your edit to the comment has made it into your PR branch yet.
Change the semantics of ProgramExternalConnectivity, libnet can now call it whenever an endpoint is selected or deselected as a container's gateway endpoint. It's the driver's responsibility to remember what bindings it's set up, and to work out what needs to change. So, calling ProgramExternalConnectivity to tell the driver an endpoint is no longer a gateway has the same effect as RevokeExternalConnectivity - bindings need to be removed. That means libnet no longer needs to work out whether to Program/Revoke, it can just call ProgramExternalConnectivity. RevokeExternalConnectivity has been removed. Signed-off-by: Rob Murray <rob.murray@docker.com>
The same sandbox options are passed to Join. Signed-off-by: Rob Murray <rob.murray@docker.com>
Legacy links were set up by ProgramExternalConnectivity, but removed by Leave (rather than RevokeExternalConnectivity). The options needed by legacy links are all available in Join, and Join will only be called once per Endpoint. So, create legacy links there. Signed-off-by: Rob Murray <rob.murray@docker.com>
f73b550 to
30b9480
Compare
|
|
- What I did
For a long time, libnetwork's sbLeave/sbJoin have had logic to work out whether a container's default gateway changed because of an endpoint joining/leaving ... when the gateway changed, libnet notified its network drivers via calls to ProgramExternalConnectivity/RevokeExternalConnectivity, so the driver could add or remove host port mappings - revoking from old gateway endpoints, then programming new ones.
In that original code, only one endpoint could be the container's gateway - if there was a dual-stack endpoint, it'd be selected, otherwise it'd be IPv4-only. When a bridge driver's IPv4-only endpoint is acting as a container's gateway, it can proxy from host IPv6 addresses to container IPv4 addresses using the userland-proxy.
Now we have IPv6-only endpoints as well. So, when an IPv4-only endpoint with dual-stack port mappings is running, and an IPv6-only endpoint is added - the IPv4 endpoint has to stop its 6-to-4 proxying in order to let the IPv6 endpoint set up its own mappings. Tracking that in sbJoin/sbLeave, to work out when to program/revoke connectivity got complicated, that's 1832774.
(Extending that logic to understand that routed-mode networks should always have programmed-external-connectivity would further break the libnet/driver layering and make the libnet code even more convoluted. So, as well as simplifying the logic by putting it in the right place, this change enables that modification to routed-mode.)
- How I did it
Remove the complexity from sbJoin/sbLeave ...
For follow-up ...
Since 4f09af6, the bridge driver allocates the same host port for IPv4 and IPv6 for dual-stack bindings, when the endpoint is dual stack. It can do that because there's a call to ProgramExternalConnectivity that adds both bindings to a single endpoint (so, it sees the request and the same time, and can combine them into a single request to the port allocator). But, since IPv6-only endpoints were added, it hasn't been able to do that for containers that have two single-stack endpoints ... the bridge driver learns about them in unrelated calls to ProgramExternalConnectivity. Now, because ProgramExternalConnectivity supplies the endpoint ids of both gateways - when asked to set up bindings for a single-stack endpoint, the bridge driver can check whether the "other" endpoint already has a port allocated for dual-stack bindings. So, it can prefer that port. (If the second port is unavailable, maybe the user would want a new binding for both ports, but equally that might break things. They probably wouldn't want the network connect to fail. So, rather than make it too complicated, probably just make it a preference that'll work a lot of the time - it's possible to specify a host port if that's not good enough.)
- How to verify it
Existing tests, particularly the ones in integration/networking/port_mapping_linux_test.go.
- Human readable description for the release notes