Skip to content

Introduce flag to default L4 NetLB Services to RBS controller #843

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

08volt
Copy link
Contributor

@08volt 08volt commented Apr 17, 2025

Changes Introduced

  1. New Flag:

    • Added the --enable-rbs-default-l4-netlb boolean flag to the CCM binary (main.go). Default is false.
  2. Cloud Provider Logic:

    • The GCE cloud provider (gce.Cloud) now accepts and stores the state of this flag via SetEnableRBSDefaultForL4NetLB.
  3. Controller Selection Logic (utils.go):

    • Modified the usesCCMforNetLB function, which determines if the legacy CCM Service controller should manage a given L4 Service.
    • The function now considers the enableRBSDefaultForL4NetLB flag's state.
  4. Unit Tests:

    • Updated the following existing unit tests to include scenarios where the --enable-rbs-default-l4-netlb flag is enabled (isRBSDefault=true):
      • TestEnsureExternalLoadBalancerRBSAnnotation
      • TestEnsureExternalLoadBalancerRBSFinalizer
      • TestEnsureExternalLoadBalancerClass
    • These tests verify the correct controller selection logic (returning cloudprovider.ImplementedElsewhere or processing the service) based on the combination of annotations, finalizers, loadBalancerClass, and the new flag's state during ensure, update, and delete operations.

Behavior with the Flag

  • --enable-rbs-default-l4-netlb=false (Default):

    • No change in behavior.
    • The legacy CCM Service controller will reconcile L4 NetLB services without a spec.loadBalancerClass, unless they explicitly opt-in to RBS via annotations (cloud.google.com/l4-rbs-enabled: "true"), specific finalizers (gke.networking.k8s.io/l4-netlb-v2, gke.networking.k8s.io/l4-regional-netlb-v3), or have an existing forwarding rule pointing to a Backend Service.
  • --enable-rbs-default-l4-netlb=true:

    • The legacy CCM Service controller will NOT reconcile L4 NetLB services without a spec.loadBalancerClass by default.
    • These services are assumed to be handled by an RBS-based controller (e.g., ingress-gce controller).
    • The legacy CCM Service controller will only reconcile L4 Services explicitly requesting it via spec.loadBalancerClass: gce.networking.k8s.io/regional-external-legacy.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 17, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 08volt
Once this PR has been reviewed and has the lgtm label, please assign bowei for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 17, 2025
@k8s-ci-robot k8s-ci-robot requested review from cici37 and jiahuif April 17, 2025 11:24
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If the repository mantainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 17, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @08volt. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 17, 2025
@08volt
Copy link
Contributor Author

08volt commented Apr 17, 2025

@mmamczur

@08volt
Copy link
Contributor Author

08volt commented Apr 17, 2025

@FelipeYepez

@mmamczur
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 18, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 18, 2025
08volt added 3 commits April 22, 2025 09:38
This change introduces the `NetLBFinalizerV1` ("gke.networking.io/l4-netlb-v1")
to Kubernetes Service objects of type LoadBalancer managed by the legacy
cloud-controller-manager logic for external L4 Network Load Balancers (NetLB).

 **Controller Distinction:** It clearly marks the Service as being managed
    by this specific controller logic (`ensureExternalLoadBalancer`),
    distinguishing it from services managed by newer controllers using
    Regional Backend Services (RBS), which use `NetLBFinalizerV2` or
    `NetLBFinalizerV3`. The `usesL4RBS` function now explicitly checks
    for the absence of V1 finalizer as one condition to determine if a
    service *might* be managed elsewhere.

Implementation details:
- The `NetLBFinalizerV1` is added to the Service metadata within the
  `ensureExternalLoadBalancer` function when the load balancer is
  created or updated.
- The finalizer is removed from the Service metadata within the
  `ensureExternalLoadBalancerDeleted` function *after* all associated
  GCP resources have been successfully deleted or confirmed non-existent.

Tested the create, update, and delete lifecycle for external L4
LoadBalancer services to ensure the finalizer is added and removed
correctly and that resource cleanup proceeds as expected.
This change introduces the `--enable-rbs-default-l4-netlb` flag for the GCE Cloud Controller Manager (CCM).

This flag modifies the default controller responsible for handling GCE L4 External Network Load Balancer (NetLB) Services when no explicit `spec.loadBalancerClass` is set.

**Behavior Change:**

* **When `--enable-rbs-default-l4-netlb` is `true`:**
    * L4 LB Services *without* an explicit `spec.loadBalancerClass` will **not** be reconciled by the legacy CCM Service controller by default.
    * It is assumed these services will be handled by the newer Regional Backend Service (RBS) based L4 controller (e.g., the GKE Service controller).
    * The legacy CCM Service controller will only reconcile L4 Services that explicitly specify `spec.loadBalancerClass: gce.networking.k8s.io/regional-external-legacy`.
* **When `--enable-rbs-default-l4-netlb` is `false` (Default):**
    * The existing behavior is preserved.
    * The legacy CCM Service controller will reconcile L4 LB Services without a `spec.loadBalancerClass`, unless they explicitly opt-in to RBS via annotations (`cloud.google.com/l4-rbs-enabled: "true"`), specific finalizers (`gke.networking.k8s.io/l4-netlb-v2` / `gke.networking.k8s.io/l4-regional-netlb-v3`), or have an existing forwarding rule pointing to a Backend Service.

**Implementation Details:**

* Added the `enableRBSDefaultForL4NetLB` boolean flag, controllable via `--enable-rbs-default-l4-netlb`.
* Added `SetEnableRBSDefaultForL4NetLB` method to the GCE cloud provider (`gce.Cloud`) to store the flag's state.
* Modified the `usesCCMforNetLB` utility function to accept the `isRbsDefault` state (derived from the flag) and adjust its logic accordingly.
@mmamczur
Copy link
Contributor

could you squash the last 2 commits to get rid of the one named fix?

@08volt 08volt marked this pull request as draft April 23, 2025 09:42
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2025
@YifeiZhuang
Copy link
Contributor

/label tide/merge-method-squash

@mmamczur let's always use this label. This will do squash automatically by combining all the commits in the PR, which is mostly what we want. See #848.

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 23, 2025
@mmamczur
Copy link
Contributor

/label tide/merge-method-squash

@mmamczur let's always use this label. This will do squash automatically by combining all the commits in the PR, which is mostly what we want. See #848.

interesting, I didn't know you could do that. But I'd say it depends on the PR, sometimes it's good to have separate commits, like in deps update - go mod changes + vendor in one commit and the cluster/ thing in the other commit

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants