Skip to content

[BUG] Fix UNSUBs after disconnect can cause auth violations#1336

Merged
scottf merged 5 commits intomainfrom
fix-1320
Jun 25, 2025
Merged

[BUG] Fix UNSUBs after disconnect can cause auth violations#1336
scottf merged 5 commits intomainfrom
fix-1320

Conversation

@scottf
Copy link
Contributor

@scottf scottf commented Jun 24, 2025

Fixes #1320

An Authorization Violation could occur if an UNSUB was sent between a disconnect and reconnect, because the UNSUB got queued and sent before the CONNECT protocol finished. This PR prevents that from happening and also addresses not needing to send unsubs after a disconnect.

  1. Puts UNSUBS in the normal (not internal/reconnect) queue. This by itself is enough to solve the bug, since the normal queue is never sent while not connected.

  2. On a disconnect, PINGS and PONGS were already filtered out of the normal outgoing queue because they are invalid anyway.

    • This PR filters out UNSUB messages, since they are unnecessary traffic; subs are closed on the server anyway because of the disconnect.
    • This PR filters out SUB messages. By the time the request is made to send the actual SUB message to the server, the sub has been registered internally. This means that if there is a disconnect, the client is going to resub, so not filtering would actually create a double sub. This is a rare edge case.

If you want to verify, run the test it passes. Then go into NatsConnection and change line 1062 to

queueInternalOutgoing(new ProtocolMessage(bab));

and then run the test, it will fail.


Thanks @ajax-surovskyi-y

@scottf scottf changed the title [BUG] Fix #1320 [BUG] Fix UNSUBs after disconnect can cause auth violations Jun 24, 2025
@scottf scottf requested a review from MauriceVanVeen June 24, 2025 22:21
Copy link
Member

@MauriceVanVeen MauriceVanVeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

try {
long time = NatsSystemClock.nanoTime();
writer.queueInternalMessage(new ProtocolMessage(OP_PING_BYTES));
writer.queueInternalMessage(new ProtocolMessage(PONG_PROTO));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the premade PONG_PROTO for faster construction

this.outgoing.filter((msg) ->
msg.isProtocol() &&
(msg.getProtocolBab().equals(OP_PING_BYTES) || msg.getProtocolBab().equals(OP_PONG_BYTES)));
this.outgoing.filter(NatsMessage::isProtocolFilterOnStop);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leverage the new flag for faster filtering instead of doing byte compares on proto bytes

sendBuffer[sendPosition++] = LF;

if (!msg.isProtocol()) { // because a protocol message does not have headers
if (!msg.isProtocol()) { // because a protocol message does not have headers or data
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clarification was important.

// Protocol message is a special version of a NatsPublishableMessage extends NatsMessage
// ----------------------------------------------------------------------------------------------------
class ProtocolMessage extends NatsPublishableMessage {
final boolean filterOnStop;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must construct with this information now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I wanted to raise is the potential memory impact of adding additional state to each NatsMessage.
In high-throughput systems (we’re handling tens of thousands of messages per second), even a small increase in object size can become noticeable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I may have overestimated the impact — this change applies to protocol messages rather than the actual PUB data, so the volume is relatively low.
That likely means the memory overhead is negligible.

super(false);
protocolBab = new ByteArrayBuilder(protocol);
sizeInBytes = controlLineLength = protocolBab.length() + 2; // CRLF, protocol doesn't have data
this.filterOnStop = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through every use this constructor. 99% were test construction. The only production usage were these, the pre-made messages which speed up new ProtocolMessage creation by sharing the existing protocol buffer which never changes for these fixed size protocols. These are used to construct new ProtocolMessage, which is required since they have some state related to their queueing - you can only share / reuse the data, not the message itself.

private static final ProtocolMessage PING_PROTO = new ProtocolMessage(OP_PING_BYTES);
private static final ProtocolMessage PONG_PROTO = new ProtocolMessage(OP_PONG_BYTES);

scottf added 2 commits June 25, 2025 06:55
…eady registered internally and will be re-subscribed automatically after a recconnect. It's only a rare edge case anyway
@scottf scottf merged commit f5b1042 into main Jun 25, 2025
5 checks passed
@scottf scottf deleted the fix-1320 branch June 25, 2025 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NATS Client "Authorization Violation" after server restart

3 participants