[FIX] Dispose `StatelessWorkerGrainContext` when it contains no more workers #9636

ledjon-behluli · 2025-08-04T16:35:19Z

When the inspection timer is created, the SW context itself is passed as the state argument. Once all workers are destroyed, the timer's reference to the context keeps the SW context alive from a GC perspective, so dispose won't get called.

This PR enqueues a DisposeAsync command when there are no more workers, as we can consider the SW to be destroyed as well.
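To illustrate the retention described above, here is a minimal sketch. It assumes the inspection timer behaves like a `System.Threading.Timer`; the `object` used as state is a hypothetical stand-in for the SW context, and `CreateTimerWithState` is an illustrative helper, not Orleans code.

```csharp
using System;
using System.Threading;

// Create the timer in a helper so that no local variable in this scope references the state.
var weakContext = CreateTimerWithState(out Timer timer);

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

// While the timer is active, the state object stays reachable through the timer.
bool aliveWhileTimerActive = weakContext.IsAlive;
Console.WriteLine($"alive while timer is active: {aliveWhileTimerActive}");

// Disposing the timer drops its reference to the state, which is what the explicit disposal achieves.
timer.Dispose();
GC.KeepAlive(timer);

static WeakReference CreateTimerWithState(out Timer timer)
{
    // Hypothetical stand-in for the SW context passed as the timer's state argument.
    var context = new object();
    timer = new Timer(static _ => { }, context, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));
    return new WeakReference(context);
}
```

The sketch only checks reachability while the timer is active; when the state is actually collected after `Dispose` depends on the runtime and is not asserted here.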

fix #9634

Microsoft Reviewers: Open in CodeFlow

ReubenBond · 2025-08-04T17:06:32Z

@vrecluse @gfix please take a look

Copilot

Pull Request Overview

This PR fixes a memory leak issue where StatelessWorkerGrainContext instances were not being properly disposed due to a timer keeping a reference to the context. The fix ensures disposal by explicitly enqueuing a DisposeAsync command when the last worker is destroyed.

Key Changes

- Adds explicit disposal command when worker count reaches zero
- Prevents memory leaks by ensuring proper cleanup of StatelessWorkerGrainContext instances

Copilot · 2025-08-04T17:09:37Z

src/Orleans.Runtime/Catalog/StatelessWorkerGrainContext.cs

+                                    var completion = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
+                                    EnqueueWorkItem(WorkItemType.DisposeAsync, new DisposeAsyncWorkItemState(completion));


The completion variable is created but never used or awaited. Consider either using this TaskCompletionSource to wait for disposal completion or removing it if not needed.

Suggested change

-var completion = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
-EnqueueWorkItem(WorkItemType.DisposeAsync, new DisposeAsyncWorkItemState(completion));
+EnqueueWorkItem(WorkItemType.DisposeAsync, new DisposeAsyncWorkItemState(null));

Neither is correct. For one, `DisposeAsyncWorkItemState` expects a non-null TCS, and awaiting `tcs.Task` would deadlock the loop. Either way, I hooked up a continuation to at least log any errors that might occur in the internal dispose method.
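The continuation approach mentioned in this reply could look roughly like the sketch below. This is not the code from the PR: `loggedMessage` stands in for a real logger call, and the exception is raised by hand to simulate a failing dispose. The point is that the loop never awaits the TCS it is itself responsible for completing; it only attaches a continuation that runs if the task faults.

```csharp
using System;
using System.Threading.Tasks;

var completion = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);

// Stand-in for a logger: records the error instead of writing to a log.
string loggedMessage = "";

// Observe failures without awaiting completion.Task inside the loop.
Task logging = completion.Task.ContinueWith(
    t => { loggedMessage = t.Exception?.GetBaseException().Message ?? "unknown error"; },
    TaskContinuationOptions.OnlyOnFaulted);

// Simulate the internal dispose method failing.
completion.SetException(new InvalidOperationException("dispose failed"));

await logging;
Console.WriteLine($"logged: {loggedMessage}");
```

If the dispose completes successfully, an `OnlyOnFaulted` continuation is canceled rather than run, so awaiting `logging` as done here is only appropriate in this failure-path demonstration.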

gfix

This looks fine to me! I am, however, not familiar enough with how activation/deactivation works to understand why DisposeAsync is not called when the SW grain is deactivated?

DeagleGross · 2025-08-05T10:50:38Z

src/Orleans.Runtime/Catalog/StatelessWorkerGrainContext.cs

        {
+            await _inspectionTimer.DisposeAsync().AsTask()


Why is this change needed? What is the benefit of `AsTask().ConfigureAwait(...)` over the former `DisposeAsync()`: only to suppress exceptions?

> only to suppress exceptions

Yes, this is a "just in case" measure so that the rest of the method continues.
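For readers unfamiliar with the pattern, here is a minimal sketch of how awaiting with suppressed exceptions lets the rest of a method run. The arguments to `ConfigureAwait(...)` are elided in the comment above, so the use of `ConfigureAwaitOptions.SuppressThrowing` (available from .NET 8) is an assumption, and `FailingDisposeAsync` is a hypothetical stand-in for a timer whose disposal fails.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a DisposeAsync call that fails.
ValueTask FailingDisposeAsync() =>
    ValueTask.FromException(new InvalidOperationException("timer dispose failed"));

bool reachedRestOfMethod = false;

// SuppressThrowing applies to the non-generic Task, hence the AsTask() conversion.
await FailingDisposeAsync().AsTask().ConfigureAwait(ConfigureAwaitOptions.SuppressThrowing);

// Execution continues here even though the awaited task faulted.
reachedRestOfMethod = true;
Console.WriteLine($"rest of the method ran: {reachedRestOfMethod}");
```

The trade-off is that the failure is swallowed at the await site, so any diagnostics have to come from elsewhere.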

DeagleGross · 2025-08-05T10:54:50Z

src/Orleans.Runtime/Catalog/StatelessWorkerGrainContext.cs

@@ -172,9 +172,20 @@ private async Task RunMessageLoop()

                                if (_workers.Count == 0)


Can there be a scenario where `_workers.Count` reaches 0 but the queue still holds a `WorkItemType.Message` item? Processing that item would create a new worker right after the count hit zero.

Very unlikely, since the workers are destroyed only after the grain has started disposing (which means it won't receive more messages), but possible! If that happens, the dispose command will be processed after that message.

The alternative would be to stop the disposal if a worker exists within the SW context. We should not do that; we should respect the call for disposal from outside!
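The ordering argument above can be sketched with a plain FIFO queue. This is a simplified model, not the Orleans message loop: the string work-item names and the `workers` counter are illustrative assumptions. A message that is already queued when the worker count hits zero is processed before the dispose command, because the dispose command is appended behind it.

```csharp
using System;
using System.Collections.Generic;

var queue = new Queue<string>();
var processed = new List<string>();
int workers = 1;

// A message is already waiting when the last worker goes away.
queue.Enqueue("WorkerDeactivated");
queue.Enqueue("Message");

while (queue.TryDequeue(out var item))
{
    processed.Add(item);
    switch (item)
    {
        case "WorkerDeactivated":
            workers--;
            // No workers left: enqueue disposal behind whatever is already queued.
            if (workers == 0) queue.Enqueue("DisposeAsync");
            break;
        case "Message":
            workers++; // the pending message creates a new worker
            break;
        case "DisposeAsync":
            workers = 0; // disposal is honored even though a worker exists again
            break;
    }
}

Console.WriteLine(string.Join(" -> ", processed));
// prints "WorkerDeactivated -> Message -> DisposeAsync"
```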

DeagleGross · 2025-08-05T11:00:35Z

And a dumb question: there is a `StatelessWorkerGrainContext`, and it is a single GrainContext per GrainAddress, right? I am a bit lost with the class hierarchy and why we should even call dispose on the GrainContext (initially I thought it was shared by multiple grains and should not be disposed).

Can you please elaborate on when a context should be disposed in a normal scenario?

ledjon-behluli · 2025-08-05T18:04:40Z

> And a dumb question: there is a `StatelessWorkerGrainContext`, and it is a single GrainContext per GrainAddress, right? I am a bit lost with the class hierarchy and why we should even call dispose on the GrainContext (initially I thought it was shared by multiple grains and should not be disposed).
>
> Can you please elaborate on when a context should be disposed in a normal scenario?

That is correct, there is one SW context per GrainAddress (technically per GrainId-SiloAddress, but that does not matter here).
The normal scenario would be the host shutting down, but with this change (which is a good side effect) we clean up the SW context if no workers are active. You can think of it as a sort of "garbage collection", which is good as we'll release the timer too!

ledjon-behluli · 2025-08-05T18:11:06Z

One thing we could do is to not dispose the SW context if the strategy does NOT specify idle worker removal, so it would be fully aligned with the previous behavior on all fronts.

ledjon-behluli · 2025-08-05T20:11:59Z

> One thing we could do is to not dispose the SW context if the strategy does NOT specify idle worker removal, so it would be fully aligned with the previous behavior on all fronts.

Did this!

ensure disposal is called when no workers are alive

2be6566

ledjon-behluli mentioned this pull request Aug 4, 2025

(Orleans 9.1.2) Stateless woker grain potential thread pool tasks leak? #9634

Open

ReubenBond requested review from DeagleGross, adityamandaleeka, benjaminpetit and Copilot August 4, 2025 17:09

Copilot AI reviewed Aug 4, 2025

hook continuation in order to log any erros that might occur during disposal

a078007

gfix approved these changes Aug 5, 2025

DeagleGross reviewed Aug 5, 2025

fully align SW context behavior based on if idle worker removal should be performed or not

8ae48d5

DeagleGross approved these changes Aug 7, 2025
