Bug #21092


OSD sporadically starts reading at 100% of ssd bandwidth

Added by Aleksei Gutikov over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

luminous v12.1.4
bluestore

Periodically (roughly every 10 minutes) some OSD starts reading the SSD at the maximum available speed (450-480 MB/s).
This continues for 1-3 minutes.
Then, after some delay, another OSD starts doing the same.
Obviously this leads to stuck PGs on that OSD.
The load on the other OSDs does not change during the glitch.
The client I/O shown in 'ceph -s' shows no increase in client traffic.

Stracing the OSD at the time of the glitch shows over 100 pread64 calls per second with the same size and the same offset.

3393366 09:15:10.679105 pread64(24, "\267\5\0\0\4\0\0\0\0\361\36\221\203\2662/f\211\305\344\30\253\324\17\251\310\0328\206\316\t\243"..., 2940928, 96121688064) = 2940928
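The per-size/per-offset breakdown below can be reproduced by aggregating strace output like the line above. A minimal sketch (the regex and field layout are assumptions based on the default strace output format shown here, not part of the original report):

```python
import re
from collections import Counter

# Matches strace lines like:
# 3393366 09:15:10.679105 pread64(24, "\267\5..."..., 2940928, 96121688064) = 2940928
PREAD_RE = re.compile(
    r'pread64\(\d+, ".*"(?:\.\.\.)?, (\d+), (\d+)\)\s*=\s*(-?\d+)'
)

def aggregate_preads(lines):
    """Count pread64 calls and bytes read, keyed by (size, offset)."""
    calls = Counter()
    bytes_read = Counter()
    for line in lines:
        m = PREAD_RE.search(line)
        if m:
            size, offset, ret = (int(g) for g in m.groups())
            calls[(size, offset)] += 1
            if ret > 0:
                bytes_read[(size, offset)] += ret
    return calls, bytes_read

sample = [
    '3393366 09:15:10.679105 pread64(24, "\\267\\5"..., 2940928, 96121688064) = 2940928',
]
calls, bytes_read = aggregate_preads(sample)
print(calls[(2940928, 96121688064)])       # 1
print(bytes_read[(2940928, 96121688064)])  # 2940928
```

Sorting the resulting counters by call count quickly shows whether one (size, offset) pair dominates the I/O, as it does in the attached traces.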

pread64 is called with the following sizes:
- 4096
- 8192
- 385024
- 1048576
- 2940928
The size/offset pair (2940928, 96121688064) is observed only in combination, at the same time.
In total: 339 calls per second, 363651072 bytes/sec.
Reads at offset 96121688064 account for 335265792 bytes/sec.
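The figures above are internally consistent: the byte rate at offset 96121688064 divided by the repeated read size gives a whole number of calls per second. A quick sanity check (this arithmetic is mine, not from the original report):

```python
# Rates as reported above.
total_bytes_per_sec = 363651072   # all pread64 traffic
offset_bytes_per_sec = 335265792  # reads at offset 96121688064 only
read_size = 2940928               # repeated pread64 size at that offset

# The repeated read accounts for an exact whole number of calls per second...
assert offset_bytes_per_sec % read_size == 0
calls_at_offset = offset_bytes_per_sec // read_size
print(calls_at_offset)  # 114

# ...i.e. about a third of the 339 calls/sec, but ~92% of the bytes.
print(round(100 * offset_bytes_per_sec / total_bytes_per_sec))  # 92
```

So a single repeated (size, offset) read dominates the bandwidth even though the smaller reads dominate the call count.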

All threads performing these pread64 calls are named "tp_osd_tp".


Files

osd-36.starce.1.txt.gz (616 KB) - 4 seconds of osd strace during 100% ssd load - Aleksei Gutikov, 08/24/2017 10:07 AM
osd-27.debug_bluefs-20.log.gz (153 KB) - osd log with debug_bluefs=20 - Aleksei Gutikov, 08/24/2017 04:13 PM
59.log.gz (161 KB) - bluefs log with more repeated random reads - Aleksei Gutikov, 08/24/2017 05:22 PM