RADOS stat slow for some objects on same OSD
We have observed that queries are slow for some RADOS objects while others on the same OSD respond much more quickly. We found this issue when running radosgw-admin bucket stats and noticing that the response was always slow for a few buckets. Digging in, we first suspected that OMAP extraction was the bottleneck, but then saw that even rados stat commands take a similar hit. We compared two objects (bucket index shards for the same bucket) placed on the same OSD and saw that operations on one object were consistently slower than on the other.
We can see that both objects map to the same primary OSD, which is the one that serves rados stat.
# ceph osd map rgw.buckets.index .dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10
osdmap e48499 pool 'rgw.buckets.index' (79) object '.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10' -> pg 79.29d01df9 (79.1f9) -> up ([15,3,7], p15) acting ([15,3,7], p15)
# ceph osd map rgw.buckets.index .dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26
osdmap e48499 pool 'rgw.buckets.index' (79) object '.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26' -> pg 79.f7b2d653 (79.253) -> up ([15,4,23], p15) acting ([15,4,23], p15)
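For completeness, the mapping claim can be checked programmatically. A minimal sketch (assuming the two `ceph osd map` output lines above, pasted verbatim as strings) that extracts the short PG id and the primary OSD from the acting set:

```python
import re

# The two `ceph osd map` output lines from above, pasted verbatim.
outputs = [
    "osdmap e48499 pool 'rgw.buckets.index' (79) object "
    "'.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10' "
    "-> pg 79.29d01df9 (79.1f9) -> up ([15,3,7], p15) acting ([15,3,7], p15)",
    "osdmap e48499 pool 'rgw.buckets.index' (79) object "
    "'.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26' "
    "-> pg 79.f7b2d653 (79.253) -> up ([15,4,23], p15) acting ([15,4,23], p15)",
]

def parse(line):
    # Short PG id appears in parentheses just before "-> up".
    pg = re.search(r'\((\d+\.[0-9a-f]+)\) -> up', line).group(1)
    # Primary OSD is the "pN" suffix of the acting set.
    primary = int(re.search(r'acting \(\[[\d,]+\], p(\d+)\)', line).group(1))
    return pg, primary

pgs, primaries = zip(*(parse(l) for l in outputs))
print(pgs, primaries)  # different PGs ('79.1f9', '79.253'), same primary (15, 15)
```

This confirms the two objects land in different PGs (79.1f9 and 79.253) but share primary OSD 15, so the comparison isolates PG-level behavior on a single OSD.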
The time taken for the first object when performing an operation like rados stat is significantly higher.
# time rados -p rgw.buckets.index stat .dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10
rgw.buckets.index/.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.10 mtime 2018-01-09 10:26:16.000000, size 0

real 0m1.274s
user 0m0.029s
sys 0m0.000s

# time rados -p rgw.buckets.index stat .dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26
rgw.buckets.index/.dir.76bcb3c5-9015-4817-8e0b-9ec86fa9a561.963088.1.26 mtime 2017-12-12 11:16:39.000000, size 0

real 0m0.029s
user 0m0.022s
sys 0m0.006s
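Note that almost all of the elapsed time for the slow object is off-CPU, i.e. spent waiting on the OSD rather than in the client. A quick back-of-the-envelope check from the timings above:

```python
# real/user/sys timings (seconds) from the two `time rados stat` runs above.
slow_real, slow_user, slow_sys = 1.274, 0.029, 0.000
fast_real, fast_user, fast_sys = 0.029, 0.022, 0.006

# Time not accounted for by client-side CPU is time spent waiting.
slow_wait = slow_real - (slow_user + slow_sys)
fast_wait = fast_real - (fast_user + fast_sys)
print(f"slow object waits {slow_wait:.3f}s, fast object {fast_wait:.3f}s")
```

About 1.245s of the 1.274s for the slow object is pure waiting, versus roughly a millisecond for the fast one, which points at the OSD side rather than the client.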
We tried increasing debug_osd and debug_filestore to see if anything stood out, but could not identify the cause. The logs from one such run are attached.
Let me know if there's any other information I can provide to help.
#1 Updated by Vaibhav Bhembre 3 months ago
We are still debugging this. On further inspection, it looks like all objects in that PG (79.1f9) show similar slowness. The PG is not being scrubbed, and the nodeep-scrub flag has been set on the cluster for quite some time. Is there a reason why a single PG would be slow on an OSD while other PGs on the same OSD are not?
#3 Updated by Chang Liu about 2 months ago
# cat ceph-osd.15.log | grep '0x7f28a6c84400'
2018-04-20 16:35:50.262536 7f28828b9700 15 osd.15 48489 enqueue_op 0x7f28a6c84400 prio 63 cost 0 latency 0.000005 epoch 48489 osd_op(client.3997533.0:1 79.1f9 79.29d01df9 (undecoded) ondisk+read+known_if_redirected e48489) v8
2018-04-20 16:35:50.262542 7f28828b9700 20 osd.15 op_wq(1) _enqueue 79.1f9 PGQueueable(0x7f28a6c84400 prio 63 cost 0 e48489)
2018-04-20 16:35:53.775399 7f2808e09700 20 osd.15 op_wq(1) _process 79.1f9 item PGQueueable(0x7f28a6c84400 prio 63 cost 0 e48489) queued
2018-04-20 16:35:53.775409 7f2808e09700 30 osd.15 op_wq(1) _process 79.1f9 to_process PGQueueable(0x7f28a6c84400 prio 63 cost 0 e48489) waiting_for_pg=0
It seems there is a long latency (3.5s) between OSD::enqueue_op and OSD::dequeue_op. Is the op_wq shard that contains PG 79.1f9 particularly busy?
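The 3.5s figure can be reproduced by diffing the enqueue_op timestamp against the first _process timestamp in the grep output above. A minimal sketch:

```python
from datetime import datetime

# Timestamps taken from the grep output above.
enqueued = "2018-04-20 16:35:50.262536"   # osd.15 enqueue_op
dequeued = "2018-04-20 16:35:53.775399"   # op_wq(1) _process picks it up

fmt = "%Y-%m-%d %H:%M:%S.%f"
latency = (datetime.strptime(dequeued, fmt)
           - datetime.strptime(enqueued, fmt)).total_seconds()
print(f"queue latency: {latency:.3f}s")  # ~3.513s
```

So the op spent about 3.51 seconds sitting in the work queue before being processed, which matches the client-side 1s+ stat times being dominated by queueing on the OSD.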