Feature #63550
openNUMA enhancements
Description
Right now we support pinning an OSD process to the CPUs of a particular NUMA node, chosen either as the NUMA node responsible for the network adapter the OSD is listening on (osd_numa_prefer_iface=true), or as the NUMA node responsible for both the storage device and the network adapter the OSD is listening on (osd_numa_auto_affinity=true).
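For reference, a minimal sketch of how these existing options are toggled, assuming they are applied cluster-wide through the ceph CLI:

# either: follow the NUMA node of the network interface
ceph config set osd osd_numa_prefer_iface true
# or: derive affinity from both the storage device and the network interface
ceph config set osd osd_numa_auto_affinity true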
It is not unusual for systems to have asymmetries. CPUs themselves employ chiplet designs that, depending on BIOS configuration, can result in 4 NUMA nodes per socket. Even in conventional dual-socket systems that present 2 NUMA nodes, there might be an uneven distribution of NVMe devices across NUMA nodes, or all of the network adapters might sit on a single NUMA node.
In these situations, we might want to employ a third NUMA strategy (osd_numa_distribute=true): divide the OSDs evenly across the NUMA nodes. This would mean that an OSD's tasks can be scheduled to any CPU on its assigned NUMA node, and that its memory allocations would prefer affinity with that node. This should improve CPU cache locality and reduce memory access latency.
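A rough illustration of the intended assignment (a sketch only; detecting the node count via numactl, the round-robin mapping, and the placeholder OSD ids 0-5 are assumptions, not an implemented mechanism):

# spread OSDs round-robin across the available NUMA nodes
NODES=$(numactl --hardware | awk '/^available:/ {print $2}')
for OSD in 0 1 2 3 4 5; do
    echo "osd.$OSD -> NUMA node $(( OSD % NODES ))"
done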
A cgroup would be created for each OSD, and we would set:

cpuset.cpus=[list of cpus in numa]   # only schedule OSD tasks to CPUs on a single NUMA node
cpuset.mems=1                        # prefer to allocate memory with NUMA affinity (not hard)
cpuset.memory_migrate=1              # move memory pages to the NUMA node
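A minimal sketch of applying those settings by hand with the cgroup v1 cpuset controller (the cgroup name, CPU range, node number, and $OSD_PID are illustrative assumptions):

# assumes the cpuset controller is mounted at /sys/fs/cgroup/cpuset (cgroup v1)
mkdir -p /sys/fs/cgroup/cpuset/ceph-osd.3                             # one cgroup per OSD
echo 16-31 > /sys/fs/cgroup/cpuset/ceph-osd.3/cpuset.cpus             # CPUs of NUMA node 1
echo 1     > /sys/fs/cgroup/cpuset/ceph-osd.3/cpuset.mems             # allocate from node 1
echo 1     > /sys/fs/cgroup/cpuset/ceph-osd.3/cpuset.memory_migrate   # migrate existing pages
echo "$OSD_PID" > /sys/fs/cgroup/cpuset/ceph-osd.3/tasks              # move the OSD process in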