Project

General

Profile

Actions

Bug #49967

closed

OSD: dequeue_op hight latency

Added by yite gu about 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

my ceph cluster have slow request, but ceph have no too many client request.

  cluster:
    id:     d0d92a1f-fc22-46a3-9328-b27d6fa0c8b8
    health: HEALTH_WARN
            45 slow requests are blocked > 32 sec. Implicated osds 4

  services:
    mon:        3 daemons, quorum node-1,node-2,node-3
    mgr:        node-3(active), standbys: node-2, node-1
    osd:        12 osds: 12 up, 12 in
                flags nodeep-scrub
    rbd-mirror: 1 daemon active
    rgw:        3 daemons active

  data:
    pools:   13 pools, 832 pgs
    objects: 302.86k objects, 1.11TiB
    usage:   3.25TiB used, 14.1TiB / 17.3TiB avail
    pgs:     832 active+clean

  io:
    client:   1.54MiB/s rd, 20.7MiB/s wr, 472op/s rd, 110op/s wr

I updata debug_osd to 10, I see many op very slow when it dequeue_op:
2021-03-24 15:30:14.806482 7f2627634700 10 osd.4 2154 dequeue_op 0x55b6b38c6a00 prio 63 cost 458752 latency 29.947369 osd_op(client.531416.0:23002872 10.3c 10.e515bc3c (undecoded) ondisk+write+known_if_redirected e2154) v8 pg pg[10.3c( v 2154'145208 (2154'145195,2154'145208] local-lis/les=2151/2152 n=1129 ec=105/46 lis/c 2151/2151 les/c/f 2152/2154/0 2151/2151/2151) [4,10,2] r=0 lpr=2151 luod=2154'145196 lua=2154'145196 crt=2154'145208 lcod 2154'145195 mlcod 2154'145195 active+clean]

pg 10.3c is unnormal, I got all log about pg 10.3c, as follows:


Files

pg_10_3c.txt (315 KB) pg_10_3c.txt yite gu, 03/25/2021 03:25 AM
Actions #1

Updated by yite gu about 3 years ago

Why is the latency so high?
I'm not sure what might be the cause of this problem. I'm grateful for any help.

Actions #2

Updated by yite gu about 3 years ago

about this problem, further investigation:

2021-03-25 20:49:59.086661 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 0.260263 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160270 (3223'158690,3231'160270] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160260 lua=3231'160269 crt=3231'160270 lcod 3231'160259 mlcod 3231'160259 active+clean]
2021-03-25 20:50:04.620595 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 5.794196 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160271 (3223'158690,3231'160271] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160270 lua=3231'160270 crt=3231'160271 lcod 3231'160269 mlcod 3231'160269 active+clean]
2021-03-25 20:50:09.972641 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 11.146243 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160272 (3223'158690,3231'160272] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160271 lua=3231'160271 crt=3231'160272 lcod 3231'160270 mlcod 3231'160270 active+clean]
2021-03-25 20:50:13.642743 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 14.816344 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160279 (3223'158690,3231'160279] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160272 lua=3231'160272 crt=3231'160279 lcod 3231'160271 mlcod 3231'160271 active+clean]
2021-03-25 20:50:19.498725 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 20.672327 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160289 (3223'158690,3231'160289] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160279 lua=3231'160288 crt=3231'160289 lcod 3231'160278 mlcod 3231'160278 active+clean]
2021-03-25 20:50:23.099160 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 24.272762 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160290 (3223'158690,3231'160290] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160289 lua=3231'160289 crt=3231'160290 lcod 3231'160288 mlcod 3231'160288 active+clean]
2021-03-25 20:50:28.234479 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 29.408081 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160291 (3223'158790,3231'160291] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160290 lua=3231'160290 crt=3231'160291 lcod 3231'160289 mlcod 3231'160289 active+clean]
2021-03-25 20:50:31.540501 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 32.714102 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160292 (3223'158790,3231'160292] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160291 lua=3231'160291 crt=3231'160292 lcod 3231'160290 mlcod 3231'160290 active+clean]
2021-03-25 20:50:37.539869 7fb0d1eae700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 38.713471 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160302 (3223'158790,3231'160302] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160292 lua=3231'160301 crt=3231'160302 lcod 3231'160291 mlcod 3231'160291 active+clean]
2021-03-25 20:50:42.315758 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 43.489360 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160303 (3223'158790,3231'160303] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160302 lua=3231'160302 crt=3231'160303 lcod 3231'160301 mlcod 3231'160301 active+clean]
2021-03-25 20:50:46.094504 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 47.268106 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160304 (3223'158790,3231'160304] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 luod=3231'160303 lua=3231'160303 crt=3231'160304 lcod 3231'160302 mlcod 3231'160302 active+clean]
2021-03-25 20:50:49.728532 7fb0cdea6700 10 osd.4 3231 dequeue_op 0x55f9daf4bf80 prio 63 cost 393216 latency 50.902133 osd_op(client.11232505.0:31443 10.ff 10:ffb56c0b:::rbd_data.15d6a971c9f3e4.00000000000002a2:head [stat,write 3801088~393216] snapc 0=[] ondisk+write+known_if_redirected e3231) v8 pg pg[10.ff( v 3231'160304 (3223'158790,3231'160304] local-lis/les=3228/3229 n=1187 ec=105/46 lis/c 3228/3228 les/c/f 3229/3231/0 3228/3228/3228) [4,8,1] r=0 lpr=3228 crt=3231'160304 lcod 3231'160303 mlcod 3231'160303 active+clean]

Actions #3

Updated by yite gu about 3 years ago

This op requeue many times,root cause is bluestore write disk slow for same a object.

2021-04-02 17:38:56.819673 7f05f6ac9700  5 bdev(0x561d5b125200 /var/lib/ceph/osd/ceph-4/block) flush in 2.992328
2021-04-02 17:40:14.052609 7f05f6ac9700  5 bdev(0x561d5b125200 /var/lib/ceph/osd/ceph-4/block) flush in 3.134817
2021-04-02 17:38:07.722972 7f05f6ac9700  5 bdev(0x561d5b125200 /var/lib/ceph/osd/ceph-4/block) flush in 3.300163

Actions #4

Updated by yite gu about 3 years ago

This is my hard disk fault, please close bug.

Actions #5

Updated by Greg Farnum almost 3 years ago

  • Status changed from New to Closed
Actions

Also available in: Atom PDF