Skip to content

ci for #8078#8128

Draft
n1ru4l wants to merge 13 commits into
mainfrom
redis-iam-auth
Draft

ci for #8078#8128
n1ru4l wants to merge 13 commits into
mainfrom
redis-iam-auth

Conversation

@n1ru4l

@n1ru4l n1ru4l commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

run CI for #8078

mish-elle and others added 10 commits May 26, 2026 10:23
Add opt-in AWS IAM authentication for ElastiCache Redis connections and Redis Cluster mode support. When IAM is enabled, services authenticate to Redis using short-lived SigV4 pre-signed tokens instead of static passwords, with automatic token refresh before expiry.

New environment variables:

- REDIS_AWS_IAM_AUTH_ENABLED: enable IAM authentication for Redis

- REDIS_AWS_IAM_CACHE_NAME: ElastiCache cache instance name for the signer

- REDIS_AWS_REGION: optional override for the Redis region

- REDIS_CLUSTER_MODE_ENABLED: enable Redis Cluster mode

- REDIS_USERNAME: optional Redis username for ACL-based authentication
- Fix refreshIamAuth to set password BEFORE AUTH call (prevents auth failures)
- Add timer initialization for pubsub Redis client
- Enhance test coverage with unhappy paths and organized test structure
- Improve JSDoc comments for AWS IAM interfaces and functions
# Conflicts:
#	packages/services/server/src/environment.ts
#	packages/services/server/src/index.ts
#	packages/services/service-common/package.json
#	packages/services/service-common/src/index.ts
#	packages/services/usage/src/environment.ts
#	packages/services/workflows/src/index.ts
- Add IAM authentication support for AWS-managed Redis
- Refactor redis-config-validation to redis-config with enhanced schema
- Update all services to use centralized Redis config
- Add ClickHouse and feature flags support to workflows
- Implement tracing configuration across services

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request centralizes Redis client creation and configuration into @hive/service-common and adds opt-in AWS IAM authentication and Cluster mode support. Feedback on the implementation addresses critical correctness and performance issues: a resource leak caused by ignoring the token refresh cleanup function, performance overhead from recreating the AWS credential provider on every token generation, a potential process hang during shutdown due to uncleared retry timers, slow sequential re-authentication of cluster nodes, and lost error stack traces in structured logs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +96 to +98
if (redisIamConfig) {
startIamTokenRefresh(redis, redisIamConfig, options.logger);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The cleanup function returned by startIamTokenRefresh is currently ignored. This leads to a resource leak where the token refresh timer continues to run and attempt AUTH commands even after the Redis client has been permanently disconnected (e.g., during graceful shutdown or in tests). Listening to the end event on the Redis client and calling the cleanup function prevents this leak.

  if (redisIamConfig) {
    const stopRefresh = startIamTokenRefresh(redis, redisIamConfig, options.logger);
    redis.on('end', () => {
      options.logger.debug('Redis connection ended, stopping IAM token refresh timer');
      stopRefresh();
    });
  }
References
  1. Focus on Correctness and Performance. (link)

Comment on lines +41 to +42
export async function generatePresignedToken(config: PresignedTokenConfig): Promise<string> {
const credentialProvider = fromNodeProviderChain();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Instantiating fromNodeProviderChain() on every call to generatePresignedToken bypasses the AWS SDK's internal credential caching. This results in redundant calls to the AWS metadata service or STS on every token generation. Moving the provider instantiation to the module level allows it to be reused and benefit from automatic credential caching.

Suggested change
export async function generatePresignedToken(config: PresignedTokenConfig): Promise<string> {
const credentialProvider = fromNodeProviderChain();
const credentialProvider = fromNodeProviderChain();
export async function generatePresignedToken(config: PresignedTokenConfig): Promise<string> {
References
  1. Focus on Correctness and Performance. (link)

Comment on lines +135 to +137
if (attempt < maxRetries) {
await new Promise(r => setTimeout(r, attempt * retryBackoffMs));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The retry setTimeout inside the Promise is not cleared or unreferenced during graceful shutdown. If a retry is in progress when the application is shutting down, the active timer will prevent the Node.js event loop from exiting, causing the process to hang for up to 15 seconds. Assigning the timeout to currentTimer and unreferencing it ensures it doesn't block process exit and can be cleared cleanly.

            if (attempt < maxRetries) {
              await new Promise<void>(r => {
                if (isShuttingDown) {
                  r();
                  return;
                }
                currentTimer = setTimeout(r, attempt * retryBackoffMs);
                if (currentTimer && typeof currentTimer.unref === 'function') {
                  currentTimer.unref();
                }
              });
            }
References
  1. Focus on Correctness and Performance. (link)

Comment on lines +111 to +118
for (const node of nodes) {
node.options.password = token;
try {
await node.call('AUTH', username, token);
} catch (err) {
logger.warn('Failed to re-AUTH cluster node (service=elasticache, error=%s)', err);
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Re-authenticating cluster nodes sequentially using a for...of loop with await can be slow and block subsequent nodes if some nodes are slow or timing out. Using Promise.all allows all cluster nodes to be re-authenticated in parallel, improving performance and resilience.

    await Promise.all(
      nodes.map(async node => {
        node.options.password = token;
        try {
          await node.call('AUTH', username, token);
        } catch (err) {
          logger.warn({ err }, 'Failed to re-AUTH cluster node (service=elasticache)');
        }
      })
    );
References
  1. Focus on Correctness and Performance. (link)

Comment on lines +153 to +155
redis.on('error', err => {
logger.error('Redis connection error (error=%s)', err);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Logging errors using string interpolation (error=%s) serializes the error object and loses the stack trace and metadata in structured logging (pino). Passing the error object as the first argument (e.g., { err }) ensures the stack trace is properly captured and searchable in log aggregators.

Suggested change
redis.on('error', err => {
logger.error('Redis connection error (error=%s)', err);
});
redis.on('error', err => {
logger.error({ err }, 'Redis connection error');
});
References
  1. Focus on Correctness and Performance. (link)

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🚀 Snapshot Release (alpha)

The latest changes of this PR are available as alpha on npm (based on the declared changesets):

Package Version Info
@graphql-hive/apollo 0.48.1-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e npm ↗︎ unpkg ↗︎
@graphql-hive/cli 0.60.1-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e npm ↗︎ unpkg ↗︎
@graphql-hive/core 0.21.1-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e npm ↗︎ unpkg ↗︎
@graphql-hive/envelop 0.40.6-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e npm ↗︎ unpkg ↗︎
@graphql-hive/yoga 0.48.1-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e npm ↗︎ unpkg ↗︎
hive 11.3.0-alpha-20260610074719-7de4c687eb0af6f691f724bbdd2a1734105ab26e npm ↗︎ unpkg ↗︎

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🐋 This PR was built and pushed to the following Docker images:

Targets: build

Platforms: linux/amd64

Image Tag: 7de4c687eb0af6f691f724bbdd2a1734105ab26e

@n1ru4l n1ru4l changed the title ci for https://github.com/graphql-hive/console/pull/8078 ci for #8078 Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants