v0.10.0

Released by @github-actions on 02 Jul 14:30

This release of the NRI Reference Plugins brings a new NRI plugin, new features in the resource policy plugins, a number of bug fixes, end-to-end tests, and a few new use cases in the documentation.

What's New

Balloons Policy

  • Composite balloons enable allocating a diverse set of CPUs for containers with complex CPU requirements, for example "allocate an equal number of CPUs from both NUMA nodes on CPU socket 0". This enables efficient parallelism inside an AI inference engine container that runs inference on CPU, while still isolating inference engines from each other.

    balloonTypes:
    - name: balance-pkg0-nodes
      components:
      - balloonType: node0
      - balloonType: node1
    - name: node0
      preferCloseToDevices:
      - /sys/devices/system/node/node0
    - name: node1
      preferCloseToDevices:
      - /sys/devices/system/node/node1
  • Documentation includes recipes for preventing creation of certain containers on a worker node, and resetting CPU and memory pinning of all containers in a cluster.
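  • A container can be placed in a balloon of the composite type above with a pod annotation. The sketch below assumes the balloons policy's balloon.balloons.resource-policy.nri.io annotation key and the balance-pkg0-nodes type from the example; the pod name, container name, and image are illustrative:

    apiVersion: v1
    kind: Pod
    metadata:
      name: inference-pod
      annotations:
        # run container infer0 in a balloon of the composite type
        balloon.balloons.resource-policy.nri.io/container.infer0: balance-pkg0-nodes
    spec:
      containers:
      - name: infer0
        image: inference-engine
        resources:
          requests:
            cpu: 4
          limits:
            cpu: 4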

Topology Aware Policy

  • Pick CPU and Memory by Topology Hints

    Normally, topology hints are only used to pick the assigned pool for a workload. Once a pool is selected, the available resources within the pool are all considered equally good for satisfying the topology hints: when the policy allocates exclusive CPUs and picks pinned memory for the workload, only other potential criteria and attributes are considered for picking the individual resources.

    When multiple devices are allocated to a single container, this default assumption that all resources within the pool are topologically equal may not hold, for instance if a container is allocated misaligned devices, that is, devices with different memory or CPU locality. To overcome this, containers can now be annotated to prefer hint-based selection and pinning of CPU and memory resources using the pick-resources-by-hints.resource-policy.nri.io annotation. For example,

    apiVersion: v1
    kind: Pod
    metadata:
      name: data-pump
      annotations:
        k8s.v1.cni.cncf.io/networks: sriov-net1
        prefer-isolated-cpus.resource-policy.nri.io/container.ctr0: "true"
        pick-resources-by-hints.resource-policy.nri.io/container.ctr0: "true"
    spec:
      containers:
      - name: ctr0
        image: dpdk-pump
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 2
            memory: 100M
            vendor.com/sriov_netdevice_A: '1'
            vendor.com/sriov_netdevice_B: '1'
          limits:
            vendor.com/sriov_netdevice_A: '1'
            vendor.com/sriov_netdevice_B: '1'
            cpu: 2
            memory: 100M

    When annotated like that, the policy will try to pick one exclusive isolated CPU with locality to one device and another with locality to the other. It will also try to pick and pin memory aligned with these devices.

Common Policy Improvements

These are improvements to common infrastructure and as such are available for the balloons and topology-aware policy plugins, as well as for the wireframe template policy plugin.

  • Cache Allocation

    Plugins can be configured to exercise class-based control over the L2 and L3 cache allocated to containers' processes. In practice, containers are assigned to classes, and each class has a corresponding cache allocation configuration. This configuration is applied to all containers in the class and subsequently to all processes started in those containers. To enable cache control, use the control.rdt.enable option, which defaults to false.

    Plugins can be configured to assign containers by default to a cache class named after the Pod QoS class of the container: one of BestEffort, Burstable, or Guaranteed. The configuration setting controlling this behavior is control.rdt.usePodQoSAsDefaultClass and it defaults to false.

    Additionally, containers can be explicitly annotated to be assigned to a class, using the rdtclass.resource-policy.nri.io annotation key. For instance,

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
      annotations:
        rdtclass.resource-policy.nri.io/pod: poddefaultclass
        rdtclass.resource-policy.nri.io/container.special-container: specialclass
    ...

    This will assign the container named special-container within the pod to the specialclass RDT class and any other container within the pod to the poddefaultclass RDT class. Effectively these containers' processes will be assigned to the RDT CLOSes corresponding to those classes.

    Cache Class/Partitioning Configuration

    RDT configuration is supplied as part of the control.rdt configuration block. Here is a sample snippet, as a Helm chart value, which assigns 33%, 66% and 100% of cache lines to BestEffort, Burstable and Guaranteed Pod QoS class containers, respectively:

    config:
      control:
        rdt:
          enable: true
          usePodQoSAsDefaultClass: true
          options:
            l2:
              optional: true
            l3:
              optional: true
            mb:
              optional: true
          partitions:
            fullCache:
              l2Allocation:
                all:
                  unified: 100%
              l3Allocation:
                all:
                  unified: 100%
              classes:
                BestEffort:
                  l2Allocation:
                    all:
                      unified: 33%
                  l3Allocation:
                    all:
                      unified: 33%
                Burstable:
                  l2Allocation:
                    all:
                      unified: 66%
                  l3Allocation:
                    all:
                      unified: 66%
                Guaranteed:
                  l2Allocation:
                    all:
                      unified: 100%
                  l3Allocation:
                    all:
                      unified: 100%

    Cache Allocation Prerequisites

    Note that for cache allocation control to work, you must have:

    • a hardware platform which supports cache allocation
    • resctrlfs pseudofilesystem enabled in your kernel, and loaded if it is a module
    • the resctrlfs filesystem mounted (possibly with extra options for your platform)

New plugin: nri-memory-policy

  • The NRI memory policy plugin sets the Linux memory policy for new containers.
  • For instance, the plugin can advise the kernel to interleave the memory pages of a container across all NUMA nodes in the system, or across all NUMA nodes on the same socket where the container's allowed CPUs are located.
  • The plugin works stand-alone, and it also works together with NRI resource policy plugins and Kubernetes resource managers: it recognizes CPU and memory pinning set by resource management components. The memory policy plugin should be placed after the resource policy plugins in the NRI plugin chain.
  • The memory policy for a container is defined in pod annotations.
  • At the time of this release, the latest released versions of containerd and CRI-O do not support NRI Linux memory policy adjustments, or NRI container command line adjustments as a workaround. Using this plugin requires a container runtime built with an NRI version that includes command line adjustments (NRI version > 0.9.0).
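  • As a sketch of what annotation-based usage might look like: the exact annotation key and value schema are defined in the plugin's own documentation, so the key name and value shape below are assumptions for illustration only.

    apiVersion: v1
    kind: Pod
    metadata:
      name: mem-interleave-pod
      annotations:
        # hypothetical annotation shape: interleave the container's memory
        # pages across all NUMA nodes in the system
        memory-policy.nri.io/container.ctr0: |
          mode: MPOL_INTERLEAVE
          nodes: all
    spec:
      containers:
      - name: ctr0
        image: busybox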

What's Changed

  • resmgr,config: allow configuring cache allocation via goresctrl. by @klihub in #541
  • resmgr: expose RDT metrics. by @klihub in #543
  • Balloons with components by @askervin in #526
  • topology-aware: try picking resources by hints first by @klihub in #545
  • memory-policy: NRI plugin for setting memory policy by @askervin in #517
  • mempolicy: go interface for set_mempolicy and get_mempolicy syscalls by @askervin in #514
  • mpolset: get/set memory policy and exec a command by @askervin in #515
  • topology-aware: fix format of container-exported memsets. by @klihub in #532
  • resmgr: update container-exported resource data. by @klihub in #537
  • sysfs: add a helper for gathering whatever IDs related to CPUs by @askervin in #513
  • sysfs: fix CPU.GetCaches() to not return empty slice. by @klihub in #533
  • sysfs: export CPUFreq.{Min,Max}. by @klihub in #534
  • helm: add Chart for memory-policy deployment by @askervin in #519
  • go.{mod,sum}: use new goresctrl tag v0.9.0. by @klihub in #544
  • Drop tools.go in favor of native tool directive support in go 1.24 by @fmuyassarov in #535
  • golang: bump go version to 1.24[.3]. by @klihub in #528

Full Changelog: v0.9.4...v0.10.0