Actions
Bug #49967
closedOSD: dequeue_op hight latency
Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
my ceph cluster have slow request, but ceph have no too many client request.
cluster: id: d0d92a1f-fc22-46a3-9328-b27d6fa0c8b8 health: HEALTH_WARN 45 slow requests are blocked > 32 sec. Implicated osds 4 services: mon: 3 daemons, quorum node-1,node-2,node-3 mgr: node-3(active), standbys: node-2, node-1 osd: 12 osds: 12 up, 12 in flags nodeep-scrub rbd-mirror: 1 daemon active rgw: 3 daemons active data: pools: 13 pools, 832 pgs objects: 302.86k objects, 1.11TiB usage: 3.25TiB used, 14.1TiB / 17.3TiB avail pgs: 832 active+clean io: client: 1.54MiB/s rd, 20.7MiB/s wr, 472op/s rd, 110op/s wr
I updata debug_osd to 10, I see many op very slow when it dequeue_op:
2021-03-24 15:30:14.806482 7f2627634700 10 osd.4 2154 dequeue_op 0x55b6b38c6a00 prio 63 cost 458752 latency 29.947369 osd_op(client.531416.0:23002872 10.3c 10.e515bc3c (undecoded) ondisk+write+known_if_redirected e2154) v8 pg pg[10.3c( v 2154'145208 (2154'145195,2154'145208] local-lis/les=2151/2152 n=1129 ec=105/46 lis/c 2151/2151 les/c/f 2152/2154/0 2151/2151/2151) [4,10,2] r=0 lpr=2151 luod=2154'145196 lua=2154'145196 crt=2154'145208 lcod 2154'145195 mlcod 2154'145195 active+clean]
pg 10.3c is unnormal, I got all log about pg 10.3c, as follows:
Files
Updated by yite gu about 3 years ago
Why is the latency so high?
I'm not sure what might be the cause of this problem. I'm grateful for any help.
Updated by yite gu about 3 years ago
about this problem, further investigation:
2021-03-25 20:49:59.086661 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 0.260263 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160270 (3223'158690,3231'160270] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160260 lua=3231'160269 crt=3231'160270 lcod 3231'160259 mlcod 3231'160259 active+clean] 2021-03-25 20:50:04.620595 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 5.794196 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160271 (3223'158690,3231'160271] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160270 lua=3231'160270 crt=3231'160271 lcod 3231'160269 mlcod 3231'160269 active+clean] 2021-03-25 20:50:09.972641 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 11.146243 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160272 (3223'158690,3231'160272] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160271 lua=3231'160271 crt=3231'160272 lcod 3231'160270 mlcod 3231'160270 active+clean] 2021-03-25 20:50:13.642743 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 14.816344 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160279 (3223'158690,3231'160279] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160272 lua=3231'160272 crt=3231'160279 lcod 3231'160271 mlcod 3231'160271 active+clean] 2021-03-25 20:50:19.498725 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 20.672327 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160289 (3223'158690,3231'160289] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160279 lua=3231'160288 crt=3231'160289 lcod 3231'160278 mlcod 3231'160278 active+clean] 2021-03-25 20:50:23.099160 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 24.272762 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160290 (3223'158690,3231'160290] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160289 lua=3231'160289 crt=3231'160290 lcod 3231'160288 mlcod 3231'160288 active+clean] 2021-03-25 20:50:28.234479 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 29.408081 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160291 (3223'158790,3231'160291] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160290 lua=3231'160290 crt=3231'160291 lcod 3231'160289 mlcod 3231'160289 active+clean] 2021-03-25 20:50:31.540501 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 32.714102 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160292 (3223'158790,3231'160292] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160291 lua=3231'160291 crt=3231'160292 lcod 3231'160290 mlcod 3231'160290 active+clean] 2021-03-25 20:50:37.539869 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 38.713471 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160302 (3223'158790,3231'160302] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160292 lua=3231'160301 crt=3231'160302 lcod 3231'160291 mlcod 3231'160291 active+clean] 2021-03-25 20:50:42.315758 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 43.489360 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160303 (3223'158790,3231'160303] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160302 lua=3231'160302 crt=3231'160303 lcod 3231'160301 mlcod 3231'160301 active+clean] 2021-03-25 20:50:46.094504 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 47.268106 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160304 (3223'158790,3231'160304] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160303 lua=3231'160303 crt=3231'160304 lcod 3231'160302 mlcod 3231'160302 active+clean] 2021-03-25 20:50:49.728532 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 50.902133 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160304 (3223'158790,3231'160304] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 crt=3231'160304 lcod 3231'160303 mlcod 3231'160303 active+clean]
Updated by yite gu about 3 years ago
This op requeue many times,root cause is bluestore write disk slow for same a object.
2021-04-02 17:38:56.819673 7f05f6ac9700 5 bdev(0x561d5b125200 /var/lib/ceph/osd/ceph-4/block) flush in 2.992328 2021-04-02 17:40:14.052609 7f05f6ac9700 5 bdev(0x561d5b125200 /var/lib/ceph/osd/ceph-4/block) flush in 3.134817 2021-04-02 17:38:07.722972 7f05f6ac9700 5 bdev(0x561d5b125200 /var/lib/ceph/osd/ceph-4/block) flush in 3.300163
Updated by yite gu about 3 years ago
This is my hard disk fault, please close bug.
Actions