-
Notifications
You must be signed in to change notification settings - Fork 2.1k
[FIX] Dispose StatelessWorkerGrainContext
when it contains no more workers
#9636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a memory leak issue where StatelessWorkerGrainContext
instances were not being properly disposed due to a timer keeping a reference to the context. The fix ensures disposal by explicitly enqueuing a DisposeAsync
command when the last worker is destroyed.
Key Changes
- Adds explicit disposal command when worker count reaches zero
- Prevents memory leaks by ensuring proper cleanup of StatelessWorkerGrainContext instances
var completion = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); | ||
EnqueueWorkItem(WorkItemType.DisposeAsync, new DisposeAsyncWorkItemState(completion)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The completion
variable is created but never used or awaited. Consider either using this TaskCompletionSource to wait for disposal completion or removing it if not needed.
var completion = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); | |
EnqueueWorkItem(WorkItemType.DisposeAsync, new DisposeAsyncWorkItemState(completion)); | |
EnqueueWorkItem(WorkItemType.DisposeAsync, new DisposeAsyncWorkItemState(null)); |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither is correct, for one the DisposeAsyncWorkItemState
expects a non-null TCS, and awaiting the tcs.Task would deadlock the loop. Eitherway I hooked a continuation to at least log any potential errors that might occur in the dispose internal method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me! I am, however, not familiar enough with how activation/deactivation works to understand why DisposeAsync
is not called when the SW grain is deactivated?
{ | ||
await _inspectionTimer.DisposeAsync().AsTask() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the change here needed? what is the benefit of AsTask().ConfigureAwait(...)
against former DisposeAsync()
-> only to suppress exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only to suppress exceptions
Yes, this is an "just in case" thing so the rest of the method continues
@@ -172,9 +172,20 @@ private async Task RunMessageLoop() | |||
|
|||
if (_workers.Count == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be a scenario where _workers.Count
reached 0, but in the queue there is an item to process WorkItemType.Message
which will then create a new worker? In that case a new worker will be created right after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very unlikely since the destruction of the workers comes after the fact of a grain disposing (which means it wont receive more messages), but possible! If that happens, the dispose command will be processed after that message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative would be to stop the disposal if a worker exists within the SW context, we should not do that, we should respect the call for disposal from outside!
And a dumb question: there is a Can u please elaborate on when a context should be disposed in a normal scenario? |
That is correct, there is one SW Context per grain GrainAddress (techically per GrainId-SiloAddress, but does not matter here). |
One thing we could do is to not dispose the SW context if the strategy does NOT specify idle worker removal so it would be aligned fully with previous behavior in all fronts |
…d be performed or not
did this! |
When the inspection timer is created, the SW context itself is passed as the
state
argument. When all workers are destroyed the reference tothis
from the timer keeps the SW context itself alive from a GC perspective, so dispose wont get called.This PR enqueues a
DisposeAsync
command when there are no more workers, as we can consider the SW to be destroyed as well.fix #9634
Microsoft Reviewers: Open in CodeFlow