Project

General

Profile

Actions

Bug #23811

open

RADOS stat slow for some objects on same OSD

Added by Vaibhav Bhembre about 6 years ago. Updated almost 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have observed that queries have been slow for some RADOS objects while others on the same OSD respond much quickly. We found this issue when trying to perform radosgw-admin bucket stats and saw the response was always slow for a few buckets. Diving in we first felt that the OMAP extraction was the bottleneck but then saw even rados stat commands observe a similar hit. We did a comparison against 2 objects (which are bucket index shards for the same bucket) that were placed on same OSD and saw operations for one object being consistently slower than the other.

We can see the objects map to same primary OSD that will be used for rados stat.

# ceph osd map rgw.buckets.index .dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10
osdmap e48499 pool 'rgw.buckets.index' (79) object '.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10' -> pg 79.29d01df9 (79.1f9) -> up ([15,3,7], p15) acting ([15,3,7], p15)

# ceph osd map rgw.buckets.index .dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26
osdmap e48499 pool 'rgw.buckets.index' (79) object '.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26' -> pg 79.f7b2d653 (79.253) -> up ([15,4,23], p15) acting ([15,4,23], p15)

The time taken for first object when doing op like rados stat is signficantly higher.

# time rados -p rgw.buckets.index stat .dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10
rgw.buckets.index/.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10 mtime 2018-01-09 10:26:16.000000, size 0

real    0m1.274s
user    0m0.029s
sys    0m0.000s

# time rados -p rgw.buckets.index stat .dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26
rgw.buckets.index/.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26 mtime 2017-12-12 11:16:39.000000, size 0

real    0m0.029s
user    0m0.022s
sys    0m0.006s

We tried increasing debug_osd and debug_filestore to see if anything stood out, but could not identify the cause as such. The logs from one of such runs is attached.

Let me know if there's any other information I can provide to help.


Files

slow_rados_object_osd_log.txt (8.57 KB) slow_rados_object_osd_log.txt Vaibhav Bhembre, 04/20/2018 05:28 PM
Actions

Also available in: Atom PDF