ci for #8078#8128
Conversation
Add opt-in AWS IAM authentication for ElastiCache Redis connections and Redis Cluster mode support. When IAM is enabled, services authenticate to Redis using short-lived SigV4 pre-signed tokens instead of static passwords, with automatic token refresh before expiry. New environment variables: - REDIS_AWS_IAM_AUTH_ENABLED: enable IAM authentication for Redis - REDIS_AWS_IAM_CACHE_NAME: ElastiCache cache instance name for the signer - REDIS_AWS_REGION: optional override for the Redis region - REDIS_CLUSTER_MODE_ENABLED: enable Redis Cluster mode - REDIS_USERNAME: optional Redis username for ACL-based authentication
- Fix refreshIamAuth to set password BEFORE AUTH call (prevents auth failures) - Add timer initialization for pubsub Redis client - Enhance test coverage with unhappy paths and organized test structure - Improve JSDoc comments for AWS IAM interfaces and functions
# Conflicts: # packages/services/server/src/environment.ts # packages/services/server/src/index.ts # packages/services/service-common/package.json # packages/services/service-common/src/index.ts # packages/services/usage/src/environment.ts # packages/services/workflows/src/index.ts
- Add IAM authentication support for AWS-managed Redis - Refactor redis-config-validation to redis-config with enhanced schema - Update all services to use centralized Redis config - Add ClickHouse and feature flags support to workflows - Implement tracing configuration across services
There was a problem hiding this comment.
Code Review
This pull request centralizes Redis client creation and configuration into @hive/service-common and adds opt-in AWS IAM authentication and Cluster mode support. Feedback on the implementation addresses critical correctness and performance issues: a resource leak caused by ignoring the token refresh cleanup function, performance overhead from recreating the AWS credential provider on every token generation, a potential process hang during shutdown due to uncleared retry timers, slow sequential re-authentication of cluster nodes, and lost error stack traces in structured logs.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if (redisIamConfig) { | ||
| startIamTokenRefresh(redis, redisIamConfig, options.logger); | ||
| } |
There was a problem hiding this comment.
The cleanup function returned by startIamTokenRefresh is currently ignored. This leads to a resource leak where the token refresh timer continues to run and attempt AUTH commands even after the Redis client has been permanently disconnected (e.g., during graceful shutdown or in tests). Listening to the end event on the Redis client and calling the cleanup function prevents this leak.
if (redisIamConfig) {
const stopRefresh = startIamTokenRefresh(redis, redisIamConfig, options.logger);
redis.on('end', () => {
options.logger.debug('Redis connection ended, stopping IAM token refresh timer');
stopRefresh();
});
}References
- Focus on Correctness and Performance. (link)
| export async function generatePresignedToken(config: PresignedTokenConfig): Promise<string> { | ||
| const credentialProvider = fromNodeProviderChain(); |
There was a problem hiding this comment.
Instantiating fromNodeProviderChain() on every call to generatePresignedToken bypasses the AWS SDK's internal credential caching. This results in redundant calls to the AWS metadata service or STS on every token generation. Moving the provider instantiation to the module level allows it to be reused and benefit from automatic credential caching.
| export async function generatePresignedToken(config: PresignedTokenConfig): Promise<string> { | |
| const credentialProvider = fromNodeProviderChain(); | |
| const credentialProvider = fromNodeProviderChain(); | |
| export async function generatePresignedToken(config: PresignedTokenConfig): Promise<string> { |
References
- Focus on Correctness and Performance. (link)
| if (attempt < maxRetries) { | ||
| await new Promise(r => setTimeout(r, attempt * retryBackoffMs)); | ||
| } |
There was a problem hiding this comment.
The retry setTimeout inside the Promise is not cleared or unreferenced during graceful shutdown. If a retry is in progress when the application is shutting down, the active timer will prevent the Node.js event loop from exiting, causing the process to hang for up to 15 seconds. Assigning the timeout to currentTimer and unreferencing it ensures it doesn't block process exit and can be cleared cleanly.
if (attempt < maxRetries) {
await new Promise<void>(r => {
if (isShuttingDown) {
r();
return;
}
currentTimer = setTimeout(r, attempt * retryBackoffMs);
if (currentTimer && typeof currentTimer.unref === 'function') {
currentTimer.unref();
}
});
}References
- Focus on Correctness and Performance. (link)
| for (const node of nodes) { | ||
| node.options.password = token; | ||
| try { | ||
| await node.call('AUTH', username, token); | ||
| } catch (err) { | ||
| logger.warn('Failed to re-AUTH cluster node (service=elasticache, error=%s)', err); | ||
| } | ||
| } |
There was a problem hiding this comment.
Re-authenticating cluster nodes sequentially using a for...of loop with await can be slow and block subsequent nodes if some nodes are slow or timing out. Using Promise.all allows all cluster nodes to be re-authenticated in parallel, improving performance and resilience.
await Promise.all(
nodes.map(async node => {
node.options.password = token;
try {
await node.call('AUTH', username, token);
} catch (err) {
logger.warn({ err }, 'Failed to re-AUTH cluster node (service=elasticache)');
}
})
);References
- Focus on Correctness and Performance. (link)
| redis.on('error', err => { | ||
| logger.error('Redis connection error (error=%s)', err); | ||
| }); |
There was a problem hiding this comment.
Logging errors using string interpolation (error=%s) serializes the error object and loses the stack trace and metadata in structured logging (pino). Passing the error object as the first argument (e.g., { err }) ensures the stack trace is properly captured and searchable in log aggregators.
| redis.on('error', err => { | |
| logger.error('Redis connection error (error=%s)', err); | |
| }); | |
| redis.on('error', err => { | |
| logger.error({ err }, 'Redis connection error'); | |
| }); |
References
- Focus on Correctness and Performance. (link)
🚀 Snapshot Release (
|
| Package | Version | Info |
|---|---|---|
@graphql-hive/apollo |
0.48.1-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e |
npm ↗︎ unpkg ↗︎ |
@graphql-hive/cli |
0.60.1-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e |
npm ↗︎ unpkg ↗︎ |
@graphql-hive/core |
0.21.1-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e |
npm ↗︎ unpkg ↗︎ |
@graphql-hive/envelop |
0.40.6-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e |
npm ↗︎ unpkg ↗︎ |
@graphql-hive/yoga |
0.48.1-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e |
npm ↗︎ unpkg ↗︎ |
hive |
11.3.0-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e |
npm ↗︎ unpkg ↗︎ |
|
🐋 This PR was built and pushed to the following Docker images: Targets: Platforms: Image Tag: |
run CI for #8078