Bug #49967

closed

OSD: dequeue_op high latency

Added by yite gu about 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My Ceph cluster has slow requests, even though it is not serving many client requests.

  cluster:
    id:     d0d92a1f-fc22-46a3-9328-b27d6fa0c8b8
    health: HEALTH_WARN
            45 slow requests are blocked > 32 sec. Implicated osds 4

  services:
    mon:        3 daemons, quorum node-1,node-2,node-3
    mgr:        node-3(active), standbys: node-2, node-1
    osd:        12 osds: 12 up, 12 in
                flags nodeep-scrub
    rbd-mirror: 1 daemon active
    rgw:        3 daemons active

  data:
    pools:   13 pools, 832 pgs
    objects: 302.86k objects, 1.11TiB
    usage:   3.25TiB used, 14.1TiB / 17.3TiB avail
    pgs:     832 active+clean

  io:
    client:   1.54MiB/s rd, 20.7MiB/s wr, 472op/s rd, 110op/s wr

I set debug_osd to 10 and see many ops reporting very high latency in dequeue_op:
2021-03-24 15:30:14.806482 7f2627634700 10 osd.4 2154 dequeue_op 0x55b6b38c6a00 prio 63 cost 458752 latency 29.947369 osd_op(client.531416.0:23002872 10.3c 10.e515bc3c (undecoded) ondisk+write+known_if_redirected e2154) v8 pg pg[10.3c( v 2154'145208 (2154'145195,2154'145208] local-lis/les=2151/2152 n=1129 ec=105/46 lis/c 2151/2151 les/c/f 2152/2154/0 2151/2151/2151) [4,10,2] r=0 lpr=2151 luod=2154'145196 lua=2154'145196 crt=2154'145208 lcod 2154'145195 mlcod 2154'145195 active+clean]
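
To see how widespread this is, the dequeue_op lines can be pulled out of the OSD log and filtered by latency. The following is only a rough sketch: the regular expression is modelled on the single log line quoted above and is assumed (not verified) to match other dequeue_op lines at debug_osd 10. The function name `slow_dequeues` and the threshold value are made up for illustration, and `SAMPLE` is the quoted line cut off after the PG id.

```python
import re

# Pattern derived from the one dequeue_op line in this report. It is an
# assumption that other lines on this release follow the same layout.
DEQUEUE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ +\d+ "
    r"(?P<osd>osd\.\d+) \d+ dequeue_op \S+ "
    r"prio (?P<prio>\d+) cost (?P<cost>\d+) latency (?P<latency>\d+\.\d+) "
    r".*? pg pg\[(?P<pg>[0-9a-f]+\.[0-9a-f]+)\("
)

# The log line from this report, truncated after the PG id for brevity.
SAMPLE = (
    "2021-03-24 15:30:14.806482 7f2627634700 10 osd.4 2154 dequeue_op "
    "0x55b6b38c6a00 prio 63 cost 458752 latency 29.947369 "
    "osd_op(client.531416.0:23002872 10.3c 10.e515bc3c (undecoded) "
    "ondisk+write+known_if_redirected e2154) v8 pg pg[10.3c( v 2154'145208"
)


def slow_dequeues(lines, threshold=1.0):
    """Return (timestamp, osd, pg, latency) for dequeue_op lines whose
    reported latency is at least `threshold` seconds (hypothetical helper)."""
    rows = []
    for line in lines:
        m = DEQUEUE_RE.match(line)
        if m is None:
            continue  # not a dequeue_op line in the expected layout
        latency = float(m.group("latency"))
        if latency >= threshold:
            rows.append((m.group("ts"), m.group("osd"), m.group("pg"), latency))
    return rows


if __name__ == "__main__":
    for row in slow_dequeues([SAMPLE], threshold=1.0):
        print(row)
```

Passing an open log file instead of `[SAMPLE]` and grouping the rows by PG would show whether the delay is confined to pg 10.3c or affects other PGs on osd.4 as well.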

PG 10.3c looks abnormal. I collected all the log lines for PG 10.3c; see the attached file.


Files

pg_10_3c.txt (315 KB) yite gu, 03/25/2021 03:25 AM