fix: handle operator crash when using workloadRef #989

Merged
merged 1 commit into from
Aug 15, 2025

Conversation

praddy26
Contributor

@praddy26 praddy26 commented Aug 6, 2025

Description
This PR fixes issue #751, where restarting an Argo Rollout configured with workloadRef causes the Reloader operator pod to crash with the panic "runtime error: index out of range [0] with length 0".

Root Cause
The issue occurs because an Argo Rollout that uses workloadRef references an existing Deployment/ReplicaSet instead of defining containers directly in its Spec.Template.Spec.Containers. When the GetRolloutContainers function is called for such a rollout, it returns an empty slice, since no containers are defined in the rollout template itself.

The panic happens in the getContainerUsingResource function in upgrade.go when the code attempts to access containers[0] without first checking if the slice contains any elements.

Solution
Added bounds checking before accessing the first element of the containers slice to prevent index out of range panics.

Related Issues
Closes #751

@praddy26 praddy26 force-pushed the fix-operator-pod-crash branch 4 times, most recently from 59028c3 to 6defef5 Compare August 11, 2025 01:35
@Felix-Stakater
Contributor

Hi and thank you for your contribution!

The proposed fix will work in theory, but it will only result in the NoContainerFound error being generated, since we are not actually fetching the underlying workload being referenced.

This is for sure better than the operator crashing, so I think it can be implemented as-is.

For the full fix we would probably need to fetch the referenced workload and find the container to restart from there. The issue then becomes: "How do we know if a Deployment is being referenced by an Argo Rollout?" When we receive an updated Deployment, we need to know whether to ignore it and trigger the reload via the Argo Rollout, or to trigger the reload based on the "raw" Deployment.

This requires some planning around how to implement it and how to keep track of whether a resource is referenced by an Argo Rollout. We are open to ideas and PRs for sure.
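
One possible way to answer the "is this Deployment referenced by a rollout?" question is to index rollouts by the workload they point at. The sketch below only illustrates that idea; the `Rollout`, `WorkloadRef`, and `refKey` types and both functions are hypothetical simplifications, not Reloader or Argo Rollouts APIs.

```go
package main

import "fmt"

// WorkloadRef is a simplified, hypothetical view of a rollout's workloadRef.
type WorkloadRef struct {
	Kind string
	Name string
}

// Rollout is a simplified, hypothetical view of an Argo Rollout.
type Rollout struct {
	Namespace   string
	Name        string
	WorkloadRef *WorkloadRef // nil when the pod template is defined inline
}

// refKey identifies a referenced workload within a namespace.
type refKey struct {
	Namespace, Kind, Name string
}

// buildRefIndex maps each referenced workload to the name of the rollout
// that references it. Rollouts without a workloadRef are skipped.
func buildRefIndex(rollouts []Rollout) map[refKey]string {
	index := make(map[refKey]string)
	for _, r := range rollouts {
		if r.WorkloadRef == nil {
			continue
		}
		index[refKey{r.Namespace, r.WorkloadRef.Kind, r.WorkloadRef.Name}] = r.Name
	}
	return index
}

// ownerRollout reports which rollout, if any, references the given workload.
func ownerRollout(index map[refKey]string, namespace, kind, name string) (string, bool) {
	owner, ok := index[refKey{namespace, kind, name}]
	return owner, ok
}

func main() {
	index := buildRefIndex([]Rollout{
		{Namespace: "prod", Name: "web-rollout", WorkloadRef: &WorkloadRef{Kind: "Deployment", Name: "web"}},
		{Namespace: "prod", Name: "inline-rollout"},
	})

	if owner, ok := ownerRollout(index, "prod", "Deployment", "web"); ok {
		fmt.Println("reload via rollout:", owner)
	}
	if _, ok := ownerRollout(index, "prod", "Deployment", "api"); !ok {
		fmt.Println("reload the raw Deployment: api")
	}
}
```

In a real operator the index would have to be kept current as rollouts are created, updated, and deleted, which is part of the planning mentioned above.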

@praddy26 praddy26 force-pushed the fix-operator-pod-crash branch from 6defef5 to 2e1a3df Compare August 11, 2025 12:02
@praddy26 praddy26 force-pushed the fix-operator-pod-crash branch from 2e1a3df to 1e6a6ec Compare August 14, 2025 02:26
@Felix-Stakater Felix-Stakater merged commit 9b2af6f into stakater:master Aug 15, 2025
5 checks passed
Development

Successfully merging this pull request may close these issues.

[BUG] Restarting a rollout with workloadRef crashes the operator pod
2 participants