Skip to content

[BUG] PostgreSQL query tools return 500 (Npgsql 'Exception while reading from stream') on long-running queries #2959

Description

@ss4aman

Describe the bug

Long-running queries issued through the Azure MCP PostgreSQL tools (postgres database query, postgres table schema get) fail with HTTP 500 Internal Server Error. The underlying Npgsql exception is:

Exception while reading from stream

Two factors combine to cause this:

  1. Default command timeout is too low. The command timeout falls back to Npgsql's built-in default of 30 seconds, so queries such as an exact COUNT(*) over a large table, or any analytical query taking longer than 30s, time out and surface as a 500.
  2. No keepalive on the connection. While a single command runs, no application data flows over the TCP socket. An intermediary in Azure (Load Balancer / NAT gateway, ~4 minute idle timeout) silently drops the idle connection. Npgsql then surfaces this as Exception while reading from stream → HTTP 500, even when the server itself is still processing the query.

This was hit by a customer running queries in the ~1–5 minute range and reproduced consistently.

Expected behavior

  • Reasonable long-running read queries (up to a few minutes) complete successfully without a 500.
  • The command timeout is configurable per call and server-wide.
  • The connection is not dropped by an intermediary idle timeout while a query is actively running.

Actual behavior

  • Queries longer than ~30s (or that stay idle on the socket past the load balancer idle timeout) fail with 500 Internal Server Error / Exception while reading from stream.
  • There is no way to raise the timeout for an individual query.

Affected tools:

  • postgres database query
  • postgres table schema get

Reproduction Steps

  1. Configure the Azure MCP PostgreSQL tools against an Azure Database for PostgreSQL Flexible Server.
  2. Run a query that takes longer than 30 seconds, e.g.:
    SELECT pg_sleep(280), 1;
    or an exact count over a large table:
    SELECT COUNT(*) FROM <large_table>;
  3. Observe the request fail with 500 Internal Server Error and Exception while reading from stream.

Proposed fix (validated)

The following changes resolved the issue for the customer:

  1. Add an explicit, configurable command timeout with this precedence:
    • explicit per-call command-timeout option,
    • AZURE_MCP_POSTGRES_COMMAND_TIMEOUT environment variable (server-wide default),
    • default of 300 seconds (5 minutes),
    • 0 = wait indefinitely.
  2. Enable keepalives on the connection string so the socket stays active during long-running queries:
    • KeepAlive = 30 (Npgsql application-level keepalive every 30 idle seconds),
    • TcpKeepAlive = true.

With both changes, the 5-minute pg_sleep and large COUNT(*) queries complete successfully instead of returning a 500.

Environment

  • Tool: Azure MCP Server — PostgreSQL tools (Azure.Mcp.Tools.Postgres)
  • Driver: Npgsql
  • Backend: Azure Database for PostgreSQL Flexible Server

Metadata

Metadata

Assignees

Type

No fields configured for Bug.

Projects

Status
Untriaged

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions