Bug #62512

open

OSD msgr-worker threads at 300% CPU due to throttle-osd_client_messages get_or_fail_fail (osd_client_message_cap=256)

Added by jianwei zhang 9 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: v18.1.0
Backport:
Regression: No
Severity: 2 - major
Reviewed: 08/22/2023
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

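Problem:

The OSD shows high CPU usage (about 300%), concentrated in the msgr-worker threads (see the attached top screenshots). The throttle-osd_client_messages counters show the throttle pinned at its cap (val = max = 256), with get_or_fail_fail far exceeding get_or_fail_success: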

# ceph daemon osd.0 perf dump throttle-osd_client_messages
{
    "throttle-osd_client_messages": {
        "val": 256,
        "max": 256,
        "get_started": 0,
        "get": 147211691,
        "get_sum": 147211691,
        "get_or_fail_fail": 539294322,
        "get_or_fail_success": 147211691,
        "take": 0,
        "take_sum": 0,
        "put": 147211435,
        "put_sum": 147211435,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
    }
}
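The counters above can be cross-checked with a short script. This is a sketch that hard-codes the values from this report; the `PERF_DUMP` string and the `summarize_throttle` helper are illustrative and not part of Ceph. It shows that get_sum minus put_sum leaves 256 units held (matching val and max), that get_or_fail_fail is roughly 3.66 times get_or_fail_success (about 78.6% of non-blocking attempts refused), that get equals get_or_fail_success, and that wait.avgcount is 0. Together these suggest, though do not prove, that admissions went through the non-blocking get_or_fail() path rather than a blocking get().

```python
import json

# Counter values copied from the perf dump in this report (unused fields omitted).
PERF_DUMP = """
{
    "throttle-osd_client_messages": {
        "val": 256,
        "max": 256,
        "get": 147211691,
        "get_sum": 147211691,
        "get_or_fail_fail": 539294322,
        "get_or_fail_success": 147211691,
        "put": 147211435,
        "put_sum": 147211435,
        "wait": {"avgcount": 0, "sum": 0.0, "avgtime": 0.0}
    }
}
"""


def summarize_throttle(dump: dict, name: str) -> dict:
    """Derive a few health indicators from one throttle's perf counters (illustrative)."""
    t = dump[name]
    attempts = t["get_or_fail_success"] + t["get_or_fail_fail"]
    return {
        # Units currently held: everything acquired minus everything released.
        "in_flight": t["get_sum"] - t["put_sum"],
        # max == 0 disables the throttle, so only a positive max can be saturated.
        "saturated": t["max"] > 0 and t["val"] >= t["max"],
        # Share of non-blocking acquisition attempts that were refused.
        "fail_ratio": t["get_or_fail_fail"] / attempts if attempts else 0.0,
        # Average number of refused attempts per successful acquisition.
        "fails_per_success": (t["get_or_fail_fail"] / t["get_or_fail_success"]
                              if t["get_or_fail_success"] else float("inf")),
        # True when every successful get was counted as a get_or_fail success.
        "all_gets_nonblocking": t["get"] == t["get_or_fail_success"],
        # Number of blocking waits recorded by the throttle.
        "blocking_waits": t["wait"]["avgcount"],
    }


if __name__ == "__main__":
    stats = summarize_throttle(json.loads(PERF_DUMP), "throttle-osd_client_messages")
    for key, value in stats.items():
        print(f"{key}: {value}")
```

Against a live OSD, the same helper could be fed the JSON printed by `ceph daemon osd.0 perf dump throttle-osd_client_messages` instead of the hard-coded string.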

osd_client_message_cap change history:

1. https://github.com/ceph/ceph/commit/9087e3b751a78211011b39377394ceb297078f76

 Revert "ceph_osd: remove client message cap limit" 

This reverts commit 45d5ac3.

Without a msg throttler, we can't change osd_client_message_cap cap.
The throttler is designed to work with 0 as a max, so change the
default to 0 to disable it by default instead.

This doesn't affect the default behavior, it only lets us use this
option again.

Fixes: https://tracker.ceph.com/issues/46143

Conflicts:
    src/ceph_osd.cc - new style of gconf() access

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Neha Ojha <nojha@redhat.com>
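The commit above relies on the throttle treating a max of 0 as "no limit". A minimal Python model of that behavior follows; the `CountingThrottle` class is hypothetical and deliberately simplified, not Ceph's C++ `Throttle` class, whose logic is more involved. In this sketch get_or_fail() never blocks: it either takes the units or increments a failure counter and returns False.

```python
class CountingThrottle:
    """Hypothetical, simplified counting throttle (not Ceph's implementation)."""

    def __init__(self, max_units: int) -> None:
        self.max = max_units            # 0 means the limit is disabled
        self.val = 0                    # units currently held
        self.get_or_fail_success = 0
        self.get_or_fail_fail = 0

    def _should_wait(self, c: int) -> bool:
        # With max == 0 nothing ever has to wait.
        if self.max == 0:
            return False
        return self.val + c > self.max

    def get_or_fail(self, c: int = 1) -> bool:
        """Take c units if they fit under max; otherwise refuse immediately."""
        if self._should_wait(c):
            self.get_or_fail_fail += 1
            return False
        self.val += c
        self.get_or_fail_success += 1
        return True

    def put(self, c: int = 1) -> None:
        """Return c units to the throttle."""
        if c > self.val:
            raise ValueError("put() releases more units than are held")
        self.val -= c


if __name__ == "__main__":
    unlimited = CountingThrottle(0)     # cap disabled
    capped = CountingThrottle(256)      # cap of 256, as in this report
    for _ in range(300):
        unlimited.get_or_fail()
        capped.get_or_fail()
    print(unlimited.get_or_fail_fail, capped.get_or_fail_success, capped.get_or_fail_fail)
    # prints: 0 256 44
```

With the cap disabled every attempt succeeds; with a cap of 256, the 44 attempts beyond the cap are refused until put() releases units.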

2. https://github.com/ceph/ceph/commit/ac8cf275a6d191d71c104f6822b62ba67a0a4fcd

 common/options: Set osd_client_message_cap to 256.

This seems like a reasonable default value based on testing results here:
https://docs.google.com/spreadsheets/d/1dwKcxFKpAOWzDPekgojrJhfiCtPgiIf8CGGMG1rboRU/edit?usp=sharing

Eventually we may want to rethink how the throttles and even how flow control
works, but this at least gives us some basic limits now ( a little higher than
the old value of 100 that we used for many years).

Signed-off-by: Mark Nelson <mnelson@redhat.com>
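Why can refused attempts outnumber admissions by a wide margin once the cap is reached? One possible explanation, sketched here as a hypothetical discrete-time model, is a worker that retries a refused message on every tick instead of blocking. The function `simulate_polling` and all of its parameters are illustrative assumptions; this report does not say how, or how often, the messenger retries.

```python
def simulate_polling(cap: int, arrivals_per_tick: int,
                     completions_per_tick: int, ticks: int) -> tuple[int, int]:
    """Hypothetical model: one worker, one throttle, one retry per tick.

    Returns (successes, failures) of non-blocking acquisition attempts.
    """
    in_flight = 0   # messages holding a throttle slot
    backlog = 0     # messages read but not yet admitted
    successes = failures = 0
    for _ in range(ticks):
        backlog += arrivals_per_tick
        # Finished ops release their slots.
        in_flight -= min(in_flight, completions_per_tick)
        # Admit queued messages until the throttle refuses one.
        while backlog:
            if in_flight < cap:
                in_flight += 1
                backlog -= 1
                successes += 1
            else:
                failures += 1
                break  # assumption: give up until the next tick, then retry
    return successes, failures


if __name__ == "__main__":
    # Arrivals slightly faster than completions keep a cap of 256 saturated.
    print(simulate_polling(cap=256, arrivals_per_tick=10,
                           completions_per_tick=8, ticks=10_000))
```

In this model, once arrivals outpace completions the throttle stays full, each tick admits only as many messages as have completed, and every tick adds another refused attempt. Refusals then grow with the retry frequency and the time spent saturated, not with the number of messages served.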


Files

top-osd-cpu.jpg (163 KB): top-osd-cpu. jianwei zhang, 08/22/2023 07:27 AM
top-H-osd-cpu.jpg (187 KB): top-H-msgr-worker-cpu. jianwei zhang, 08/22/2023 07:27 AM
osd-cpu变化曲线.jpg ("osd-cpu change curve", 118 KB): osd-cpu-history-change. jianwei zhang, 08/22/2023 07:44 AM
osd-cpu-history-change.jpg (118 KB): osd-cpu-history-change. jianwei zhang, 08/22/2023 07:47 AM