Refactored sending lock of subscriber #589
Motivation / Background
Refactored the subscriber's sending lock into the `SendToSubscriber` loop so that it is held as close as possible to the sending part of the logic. This results in the output buffer actually being utilized and fixes the blocking behavior reported in issues #275 and #382.
Detail
In a project we came to a point where we wanted to rely on the "flushing" behavior noted in the docs here, which we expected to work since Go channels can still be read from even after calling close, as per the Go documentation.

But to our surprise, in a test where we set up a PubSub, published exactly as many messages as the buffer could hold, and then called close on the PubSub, we only received a fraction of the messages in our subscriber handler. Note that we made sure to only start reading from the channel after a few seconds, which should be plenty of time to give the publisher a chance to fill up the buffer. We noticed that the buffer did not contain the expected published messages, but the messages were present in the `persistedMessages` map when making use of the persist config. So timing could not be the issue.
I tracked down the culprit of this (for us) unexpected behavior to the `s.sending.Lock()` being in a place where it introduces a deadlock.

When calling Publish() with multiple messages, each message results in a goroutine which loops over all the subscribers of the targeted topic. For each subscriber a new goroutine is created, which finally sends the actual message through `subscriber.sendMessageToSubscriber(message, logFields)`. However, ALL of these goroutines share the same "sending lock". Thus, if multiple messages are sent concurrently, one goroutine will keep this lock hostage ("deadlock") until it returns from the `sendMessageToSubscriber` func. This is due to the unlock being done on defer, AND because the last select case only lets the func return once the message is acked or nacked, or the Pub/Sub is closing. This could take a while to actually trigger, depending on when a dev calls one of these three options.
This part of the logic has, as far as I understand, no further reason to hold the sending lock. By moving the lock/unlock to the "sending" part of the logic we avoid this deadlock. This circles back to our issue: the output buffer is now actually utilized and flushing can be done, while also fixing #275 and #382.
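The gist of the change (again schematic, not the exact diff): the lock is taken and released around the send itself, so waiting for an ack no longer blocks other senders:

```go
// Schematic view of the change, not the exact diff.
func (s *subscriber) sendMessageToSubscriber(msg *message.Message, logFields watermill.LogFields) {
SendToSubscriber:
	for {
		msgToSend := msg.Copy()

		// Hold the lock only around the actual send to the output channel.
		s.sending.Lock()
		select {
		case s.outputChannel <- msgToSend:
			s.sending.Unlock()
		case <-s.closing:
			s.sending.Unlock()
			return
		}

		// Waiting for the ack/nack no longer holds the sending lock,
		// so concurrent sends can fill the buffered output channel.
		select {
		case <-msgToSend.Acked():
			return
		case <-msgToSend.Nacked():
			continue SendToSubscriber
		case <-s.closing:
			return
		}
	}
}
```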
Alternative approaches considered (if applicable)
N/A
Checklist
The resources of our team are limited. There are a couple of things that you can do to help us merge your PR faster:
- Run `make up`.
- Run `make test_short` for a quick check.
- Run `make test`.