This release of NRI Reference Plugins introduces a new NRI plugin, new features in the resource policy plugins, a number of bug fixes, end-to-end tests, and new use-case recipes in the documentation.
## What's New
### Balloons Policy
- Composite balloons enable allocating a diverse set of CPUs for containers with complex CPU requirements, for example "allocate an equal number of CPUs from both NUMA nodes on CPU socket 0". This allows efficient parallelism inside an AI inference engine container that runs inference on CPUs, while still isolating inference engines from each other:
```yaml
balloonTypes:
  - name: balance-pkg0-nodes
    components:
      - balloonType: node0
      - balloonType: node1
  - name: node0
    preferCloseToDevices:
      - /sys/devices/system/node/node0
  - name: node1
    preferCloseToDevices:
      - /sys/devices/system/node/node1
```
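Containers can then be placed into a balloon of this type with the balloons policy's balloon annotation. The pod, container, and image names below are illustrative; the balloon type name matches the sample configuration above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-engine          # illustrative pod name
  annotations:
    # Assign container "inference" to a balloon of type "balance-pkg0-nodes".
    balloon.balloons.resource-policy.nri.io/container.inference: balance-pkg0-nodes
spec:
  containers:
  - name: inference
    image: inference-image        # illustrative image name
    resources:
      requests:
        cpu: 4
      limits:
        cpu: 4
```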
- The documentation includes recipes for preventing the creation of certain containers on a worker node, and for resetting the CPU and memory pinning of all containers in a cluster.
### Topology Aware Policy
- **Pick CPU and Memory by Topology Hints**

Normally, topology hints are used only to pick the pool assigned to a workload. Once a pool is selected, the available resources within the pool are considered equally good for satisfying the topology hints: when the policy allocates exclusive CPUs and picks pinned memory for the workload, only other potential criteria and attributes are considered for picking the individual resources.

When multiple devices are allocated to a single container, this default assumption that all resources within the pool are topologically equal may not hold; the container may be allocated misaligned devices, in other words, devices with different memory or CPU locality. To overcome this, containers can now be annotated to prefer hint-based selection and pinning of CPU and memory resources using the `pick-resources-by-hints.resource-policy.nri.io` annotation. For example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-pump
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1
    prefer-isolated-cpus.resource-policy.nri.io/container.ctr0: "true"
    pick-resources-by-hints.resource-policy.nri.io/container.ctr0: "true"
spec:
  containers:
  - name: ctr0
    image: dpdk-pump
    imagePullPolicy: Always
    resources:
      requests:
        cpu: 2
        memory: 100M
        vendor.com/sriov_netdevice_A: '1'
        vendor.com/sriov_netdevice_B: '1'
      limits:
        vendor.com/sriov_netdevice_A: '1'
        vendor.com/sriov_netdevice_B: '1'
        cpu: 2
        memory: 100M
```
When annotated like this, the policy tries to pick one exclusive isolated CPU with locality to one device and another with locality to the other device. It also tries to pick and pin memory aligned with these devices.
Common Policy Improvements
These are improvements to common infrastructure and as such are available for the balloons
and topology-aware
policy plugins, as well as for the wireframe template
policy plugin.
- **Cache Allocation**

Plugins can be configured to exercise class-based control over the L2 and L3 cache allocated to containers' processes. In practice, containers are assigned to classes, and each class has a corresponding cache allocation configuration. This configuration is applied to all containers in the class and subsequently to all processes started in those containers. To enable cache control, use the `control.rdt.enable` option, which defaults to `false`.

Plugins can be configured to assign containers by default to a cache class named after the Pod QoS class of the container: one of `BestEffort`, `Burstable`, and `Guaranteed`. The configuration setting controlling this behavior is `control.rdt.usePodQoSAsDefaultClass` and it defaults to `false`.

Additionally, containers can be explicitly annotated to be assigned to a class using the `rdtclass.resource-policy.nri.io` annotation key. For instance:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  annotations:
    rdtclass.resource-policy.nri.io/pod: poddefaultclass
    rdtclass.resource-policy.nri.io/container.special-container: specialclass
...
```
This assigns the container named `special-container` within the pod to the `specialclass` RDT class, and any other container within the pod to the `poddefaultclass` RDT class. Effectively, these containers' processes are assigned to the RDT CLOSes corresponding to those classes.

**Cache Class/Partitioning Configuration**
RDT configuration is supplied as part of the `control.rdt` configuration block. Here is a sample snippet, as a Helm chart value, which assigns 33%, 66%, and 100% of cache lines to `BestEffort`, `Burstable`, and `Guaranteed` Pod QoS class containers, respectively:

```yaml
config:
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true
      options:
        l2:
          optional: true
        l3:
          optional: true
        mb:
          optional: true
      partitions:
        fullCache:
          l2Allocation:
            all:
              unified: 100%
          l3Allocation:
            all:
              unified: 100%
          classes:
            BestEffort:
              l2Allocation:
                all:
                  unified: 33%
              l3Allocation:
                all:
                  unified: 33%
            Burstable:
              l2Allocation:
                all:
                  unified: 66%
              l3Allocation:
                all:
                  unified: 66%
            Guaranteed:
              l2Allocation:
                all:
                  unified: 100%
              l3Allocation:
                all:
                  unified: 100%
```
**Cache Allocation Prerequisites**

Note that for cache allocation control to work, you must have

- a hardware platform which supports cache allocation,
- the resctrl pseudo-filesystem enabled in your kernel, and loaded if it is built as a module,
- the resctrl filesystem mounted (possibly with extra options for your platform).
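As a rough sketch, these prerequisites can be checked and the filesystem mounted along the following lines on an x86 host; the exact CPU feature flags and mount options vary by vendor and platform:

```sh
# Check for cache allocation feature flags (Intel CAT / AMD equivalents).
grep -o -E 'cat_l3|cat_l2' /proc/cpuinfo | sort -u

# Verify the kernel was built with resctrl support.
grep RESCTRL "/boot/config-$(uname -r)"

# Mount the resctrl filesystem if it is not already mounted.
sudo mount -t resctrl resctrl /sys/fs/resctrl
```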
### New plugin: nri-memory-policy
- The NRI memory policy plugin sets the Linux memory policy for new containers.
- The plugin can, for instance, advise the kernel to interleave memory pages of a container across all NUMA nodes in the system, or across all NUMA nodes in the same socket where the container's allowed CPUs are located.
- The plugin works both stand-alone and together with NRI resource policy plugins and Kubernetes resource managers. It recognizes CPU and memory pinning set by resource management components. The memory policy plugin should be placed after the resource policy plugins in the NRI plugin chain.
- The memory policy for a container is defined in pod annotations.
- At the time of this release, the latest released containerd and CRI-O do not support NRI Linux memory policy adjustments, nor the NRI container command line adjustments needed as a workaround. Using this plugin requires a container runtime built with an NRI version that includes command line adjustments (NRI version > 0.9.0).
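As an illustrative sketch only, a pod annotation requesting interleaved memory might look like the following; the annotation key and class name here are assumptions, so consult the nri-memory-policy documentation for the authoritative syntax:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: interleaved-pod           # illustrative pod name
  annotations:
    # Assumed annotation key and value: request a memory policy class that
    # interleaves pages across all NUMA nodes for every container in the pod.
    class.memory-policy.nri.io/pod: interleave-all
spec:
  containers:
  - name: ctr0
    image: busybox
```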
## What's Changed
- resmgr,config: allow configuring cache allocation via goresctrl. by @klihub in #541
- resmgr: expose RDT metrics. by @klihub in #543
- Balloons with components by @askervin in #526
- topology-aware: try picking resources by hints first by @klihub in #545
- memory-policy: NRI plugin for setting memory policy by @askervin in #517
- mempolicy: go interface for set_mempolicy and get_mempolicy syscalls by @askervin in #514
- mpolset: get/set memory policy and exec a command by @askervin in #515
- topology-aware: fix format of container-exported memsets. by @klihub in #532
- resmgr: update container-exported resource data. by @klihub in #537
- sysfs: add a helper for gathering whatever IDs related to CPUs by @askervin in #513
- sysfs: fix CPU.GetCaches() to not return empty slice. by @klihub in #533
- sysfs: export CPUFreq.{Min,Max}. by @klihub in #534
- helm: add Chart for memory-policy deployment by @askervin in #519
- go.{mod,sum}: use new goresctrl tag v0.9.0. by @klihub in #544
- Drop tools.go in favor of native tool directive support in go 1.24 by @fmuyassarov in #535
- golang: bump go version to 1.24[.3]. by @klihub in #528
Full Changelog: v0.9.4...v0.10.0