ethclient: fix flaky pending tx test #32380
Merged
Fixes: #32252
This PR fixes an issue where testAtFunctions fails because of a timeout.
Cause
The issue is caused by an infinite loop in the test here:
go-ethereum/ethclient/ethclient_test.go
Lines 513 to 523 in c3ef6c7
Although the transaction is sent to the backend, the backend sometimes keeps returning a pending transaction count of 0, so the loop never reaches its break condition. I was able to reproduce this failure locally, albeit rarely: about 1 in every 1000 test runs.
In the successful cases, pendingNonceAt returns 2, which is consistent with inserting a block containing two transactions via InsertChain when initializing the test backend. However, in the failing cases, pendingNonceAt returns 0.
Root Cause
The issue is caused by a race condition, during test backend initialization, between the call to InsertChain and the startup of the txpool's background process that updates each subpool's pendingNonces.
The failure occurs because the pendingNonces inside the txPool's subpools do not have the correct nonce information. When the blocks are inserted via InsertChain, txpool.loop() has not yet started running, so the background processing that updates the state does not occur.
In the regular cases, txpool.loop receives the updated head when the blocks are inserted, triggering subpool.Reset(oldHead, newHead). Then, inside LegacyPool.runReorg, pendingNonces is properly updated.
In the failure pattern, however, the blocks are always inserted before txpool.loop() has started, so pendingNonces is never updated.
Reproduction
The issue can be reliably reproduced by adding time.Sleep(1 * time.Second) at the beginning of txpool.loop. This delay causes the test to consistently fail, confirming the presence of a race condition.
Solution
I found that we already have a handy function, txpool.Sync(), that waits for the txpool to start its background loop. We can use it to fix the flakiness.
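For readers who want to see the race in isolation, here is a minimal toy model of it. It is not go-ethereum code: headFeed, toyPool, runScenario, and the gate channel are hypothetical names invented for this sketch, and the toy Sync only mirrors the idea of waiting for the background loop, not the real txpool.Sync() implementation. The gate stands in for the scheduling delay before the loop goroutine starts.

```go
package main

import (
	"errors"
	"sync"
)

// headFeed is a toy event feed: an event reaches only the subscribers
// that are already registered when it is sent.
type headFeed struct {
	mu   sync.Mutex
	subs []chan uint64
}

func (f *headFeed) subscribe() chan uint64 {
	ch := make(chan uint64, 16)
	f.mu.Lock()
	f.subs = append(f.subs, ch)
	f.mu.Unlock()
	return ch
}

func (f *headFeed) send(nonce uint64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for _, ch := range f.subs {
		ch <- nonce
	}
}

// toyPool tracks a pending nonce that only its background loop updates.
type toyPool struct {
	mu     sync.Mutex
	nonce  uint64
	syncCh chan chan error
	term   chan struct{}
}

func newToyPool(feed *headFeed, gate <-chan struct{}) *toyPool {
	p := &toyPool{syncCh: make(chan chan error), term: make(chan struct{})}
	go p.loop(feed, gate)
	return p
}

func (p *toyPool) setNonce(n uint64) {
	p.mu.Lock()
	p.nonce = n
	p.mu.Unlock()
}

func (p *toyPool) PendingNonce() uint64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.nonce
}

func (p *toyPool) loop(feed *headFeed, gate <-chan struct{}) {
	<-gate                    // hypothetical stand-in for a late loop start
	heads := feed.subscribe() // events sent before this point are lost
	for {
		select {
		case n := <-heads:
			p.setNonce(n)
		case reply := <-p.syncCh:
			// Drain queued head events so Sync implies they were processed.
			for drained := false; !drained; {
				select {
				case n := <-heads:
					p.setNonce(n)
				default:
					drained = true
				}
			}
			reply <- nil
		case <-p.term:
			return
		}
	}
}

// Sync blocks until the loop is running and has handled queued events.
func (p *toyPool) Sync() error {
	reply := make(chan error)
	select {
	case p.syncCh <- reply:
		return <-reply
	case <-p.term:
		return errors.New("pool already terminated")
	}
}

// runScenario "inserts a block" that should move the nonce to 2 and
// returns the nonce the pool reports afterwards.
func runScenario(syncFirst bool) uint64 {
	feed := &headFeed{}
	gate := make(chan struct{})
	pool := newToyPool(feed, gate)
	defer close(pool.term)
	if syncFirst {
		close(gate)
		_ = pool.Sync() // wait for the loop to subscribe
		feed.send(2)
	} else {
		feed.send(2) // no subscriber yet: the event is dropped
		close(gate)
	}
	_ = pool.Sync()
	return pool.PendingNonce()
}
```

In this model, runScenario(false) returns 0 because the event is sent before the loop subscribes, and runScenario(true) returns 2 because the sync handshake guarantees that the loop is listening before the insert happens.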