Skip to content

ethclient: fix flaky pending tx test #32380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 11, 2025

Conversation

kashitaka
Copy link
Contributor

@kashitaka kashitaka commented Aug 8, 2025

Fixes: #32252

This PR fixes an issue where testAtFunctions fails because of timeout.

Cause

The issue caused by infinite loop in the test here:

for {
// Check pending transaction count
pending, err := ec.PendingTransactionCount(context.Background())
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if pending == 1 {
break
}
time.Sleep(100 * time.Millisecond)
}

Although the transaction is sent to the backend, it sometimes keeps returning a pending transaction count of 0, never reaching the break condition. I was able to reproduce this failure locally, albeit rarely—about 1 in every 1000 test runs.

In the successful cases, pendingNonceAt returns 2, which is consistent with inserting a block containing two transactions via InsertChain when initializing the test backend. However, in the failing cases, pendingNonceAt returns 0.

Root Cause

The issue is caused by a race condition between invoking InsertChain and the startup of the txpool's background process that updates subpool's pendingNonces when initializing test backend.

The failure occurs because the pendingNonces inside the txPool's subpools does not have the correct nonce information. This is due to the fact that when the blocks are inserted via InsertChain, the txpool.loop() has not yet started running, so the background processing that updates the state does not occur.

In the regular cases, txpool.loop receives the updated head when the blocks are inserted, triggering subpool.Reset(oldHead, newHead). Then, inside LegacyPool.runReorg, pendingNonces is properly updated.
In the failure pattern, however, blocks are ALWAYS inserted before txpool.loop() has started, so pendingNonces is never updated.

Reproduction

The issue can be reliably reproduced by adding time.Sleep(1 * time.Second) at the beginning of txpool.loop. This delay causes the test to consistently fail, confirming the presence of a race condition.

Solution

I found that we already have a handy function txpool.Sync() to wait for txpool starting its background loop. We can use it and fix the flakyness.

@kashitaka kashitaka requested a review from fjl as a code owner August 8, 2025 17:00
Copy link
Contributor Author

@kashitaka kashitaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjl493456442 I identified the cause of the issue, so I close #32329 and submit this PR. Please take a look when you have time.

sendTransaction(ec)
if err := sendTransaction(ec); err != nil {
t.Fatalf("unexpected error: %v", err)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error was unhandled, so I added error handling to make debugging easier.

@rjl493456442 rjl493456442 added this to the 1.16.3 milestone Aug 11, 2025
@rjl493456442 rjl493456442 merged commit 8ba1c79 into ethereum:master Aug 11, 2025
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix flaky test TestEthClient
2 participants