Bug #22977

closed

High CPU load caused by operations on onode_map

Added by Paul Emmerich about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm investigating performance on a cluster that shows an unusually high CPU load.
The setup consists of BlueStore OSDs serving mostly a 5/3 erasure-coded pool.

ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), centos 7, kernel 4.13.7

Each OSD is handling around 60 write ops and 40 read ops per second, all small operations from kernel RBD clients.
All OSDs are running on SSDs that look fine (low queue sizes/wait times, ~20% I/O load).
CPU load per OSD process was around 90-130% with 2.2 GHz CPU cores.
No background operations like scrubbing or recovery were ongoing.

I ran perf to see what the OSD is doing. I initially expected the CPU load to be mainly due to the EC setup, but it is not: it is caused by random operations on the onode_map in BlueStore.

More than 50% of the CPU time is spent inside std::_Hashtable chasing pointers in _M_find_node. This looks like a textbook hash collision problem in that unordered_map, if I am interpreting the traces correctly.
Restarting the OSD reduces the CPU usage by 50%, but it slowly increases again.
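
To illustrate what I mean by a collision problem, here is a minimal standalone sketch. It is not BlueStore code: Key, WeakHash, build_map and longest_chain are hypothetical names made up for this example, and the weak hash is deliberately contrived. It only shows how a poorly distributed hash makes std::unordered_map lookups walk long bucket chains, which is the kind of pointer chasing that would show up in _M_find_node. Long chains could also have other causes, so this is an illustration of the suspicion, not a diagnosis.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical key type, standing in for whatever the real map is keyed on.
struct Key {
    std::uint64_t id;
    bool operator==(const Key& other) const { return id == other.id; }
};

// Deliberately weak hash (assumption for illustration): only the low 4 bits
// are used, so there are at most 16 distinct hash values.
struct WeakHash {
    std::size_t operator()(const Key& k) const { return k.id & 0xF; }
};

// Baseline: delegate to the standard library hash for the integer id.
struct StdHash {
    std::size_t operator()(const Key& k) const {
        return std::hash<std::uint64_t>{}(k.id);
    }
};

// Build a map with n sequential keys using the given hash functor.
template <typename Hash>
std::unordered_map<Key, int, Hash> build_map(std::uint64_t n) {
    std::unordered_map<Key, int, Hash> m;
    for (std::uint64_t i = 0; i < n; ++i) {
        m.emplace(Key{i}, 0);
    }
    return m;
}

// Length of the longest bucket chain; a lookup that lands in this bucket
// may have to walk this many nodes.
template <typename Map>
std::size_t longest_chain(const Map& m) {
    std::size_t longest = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        longest = std::max(longest, m.bucket_size(b));
    }
    return longest;
}
```

Calling longest_chain(build_map<WeakHash>(1600)) has to return at least 100, because 1600 keys share only 16 hash values and equal hash values always land in the same bucket. The same call with StdHash should give a much shorter chain. Running a check like longest_chain against a live map would be one way to confirm or rule out the collision theory.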

I've uploaded the perf traces before and after restarting the OSD here:
https://static.croit.io/share/bluestore-onode-cpu/perf-before-restart.data
https://static.croit.io/share/bluestore-onode-cpu/perf-after-restart.data
(~4MB each, external links because I can only upload 1MB here.)

For the second trace, I restarted the OSD, waited for recovery to complete, and then waited another 5 minutes before recording.

I unfortunately wasn't able to reproduce this on my test cluster; I probably had the wrong workload here to test it. But I've seen it on two production clusters.


Files

perf-dump.json (28 KB) perf-dump.json Paul Emmerich, 02/10/2018 10:02 PM
osd-4-mempool.json (1.6 KB) osd-4-mempool.json Paul Emmerich, 02/12/2018 11:18 AM
perf-before-restart.txt (579 KB) perf-before-restart.txt Paul Emmerich, 02/27/2018 05:08 PM
perf-after-restart.txt (769 KB) perf-after-restart.txt Paul Emmerich, 02/27/2018 05:08 PM

Related issues 2 (0 open, 2 closed)

Related to bluestore - Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim (Resolved, 09/06/2017)
Copied to bluestore - Backport #24720: mimic: High CPU load caused by operations on onode_map (Resolved)
