[BUG] PostgreSQL query tools return 500 (Npgsql 'Exception while reading from stream') on long-running queries

### Describe the bug

Long-running queries issued through the Azure MCP PostgreSQL tools (`postgres database query`, `postgres table schema get`) fail with HTTP `500 Internal Server Error`. The underlying Npgsql exception is:

```
Exception while reading from stream
```

Two factors combine to cause this:

1. **Default command timeout is too low.** The command timeout falls back to Npgsql's built-in default of 30 seconds, so queries such as an exact `COUNT(*)` over a large table, or any analytical query taking longer than 30s, time out and surface as a 500.
2. **No keepalive on the connection.** While a single command runs, no application data flows over the TCP socket. An intermediary in Azure (Load Balancer / NAT gateway, ~4 minute idle timeout) silently drops the idle connection. Npgsql then surfaces this as `Exception while reading from stream` → HTTP 500, even when the server itself is still processing the query.

This was hit by a customer running queries in the ~1–5 minute range and reproduced consistently.

### Expected behavior

- Reasonable long-running read queries (up to a few minutes) complete successfully without a 500.
- The command timeout is configurable per call and server-wide.
- The connection is not dropped by an intermediary idle timeout while a query is actively running.

### Actual behavior

- Queries longer than ~30s (or that stay idle on the socket past the load balancer idle timeout) fail with `500 Internal Server Error` / `Exception while reading from stream`.
- There is no way to raise the timeout for an individual query.

Affected tools:
- `postgres database query`
- `postgres table schema get`

### Reproduction Steps

1. Configure the Azure MCP PostgreSQL tools against an Azure Database for PostgreSQL Flexible Server.
2. Run a query that takes longer than 30 seconds, e.g.:
   ```sql
   SELECT pg_sleep(280), 1;
   ```
   or an exact count over a large table:
   ```sql
   SELECT COUNT(*) FROM <large_table>;
   ```
3. Observe the request fail with `500 Internal Server Error` and `Exception while reading from stream`.

### Proposed fix (validated)

The following changes resolved the issue for the customer:

1. **Add an explicit, configurable command timeout** with this precedence:
   - explicit per-call `command-timeout` option,
   - `AZURE_MCP_POSTGRES_COMMAND_TIMEOUT` environment variable (server-wide default),
   - default of **300 seconds (5 minutes)**,
   - `0` = wait indefinitely.
2. **Enable keepalives on the connection string** so the socket stays active during long-running queries:
   - `KeepAlive = 30` (Npgsql application-level keepalive every 30 idle seconds),
   - `TcpKeepAlive = true`.

With both changes, the 5-minute `pg_sleep` and large `COUNT(*)` queries complete successfully instead of returning a 500.

### Environment

- Tool: Azure MCP Server — PostgreSQL tools (`Azure.Mcp.Tools.Postgres`)
- Driver: Npgsql
- Backend: Azure Database for PostgreSQL Flexible Server


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[BUG] PostgreSQL query tools return 500 (Npgsql 'Exception while reading from stream') on long-running queries #2959

Describe the bug

Expected behavior

Actual behavior

Reproduction Steps

Proposed fix (validated)

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[BUG] PostgreSQL query tools return 500 (Npgsql 'Exception while reading from stream') on long-running queries #2959

Description

Describe the bug

Expected behavior

Actual behavior

Reproduction Steps

Proposed fix (validated)

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions