Bug #24531 (closed): Mimic MONs have slow/long running ops

Added by Wido den Hollander almost 6 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus, mimic, luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When setting up a Mimic 13.2.0 cluster I saw a message like this:

2018-06-15 00:23:55.639369 mon.dc3-f3-cl1-mon1 mon.0 10.16.5.141:6789/0 27880 : cluster [WRN] Health check update: 438 slow ops, oldest one blocked for 32476 sec, daemons [mon.dc3-f3-cl1-mon1,mon.dc3-f4-cl1-mon1,mon.dc3-f5-cl1-mon1] have slow ops. (SLOW_OPS)
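
For reference, the detail behind this warning can also be queried on demand; a minimal check, assuming the standard ceph CLI is available on a monitor node:

ceph health detail    # lists the SLOW_OPS check and which daemons are currently reporting slow ops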

Querying this mon for its ops, I see ops that have been running for a very long time:

ceph daemon mon.dc3-f3-cl1-mon1 ops
        {
            "description": "osd_failure(failed timeout osd.266 10.16.5.171:6888/30602 for 1384sec e1380 v1380)",
            "initiated_at": "2018-06-15 13:36:46.593698",
            "age": 9702.180330,
            "duration": 9702.180347,
            "type_data": {
                "events": [
                    {
                        "time": "2018-06-15 13:36:46.593698",
                        "event": "initiated" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.593698",
                        "event": "header_read" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.593699",
                        "event": "throttled" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.593703",
                        "event": "all_read" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.599249",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.599250",
                        "event": "mon:_ms_dispatch" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.599250",
                        "event": "mon:dispatch_op" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.599250",
                        "event": "psvc:dispatch" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.599255",
                        "event": "osdmap:preprocess_query" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.599255",
                        "event": "osdmap:preprocess_failure" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.599257",
                        "event": "osdmap:prepare_update" 
                    },
                    {
                        "time": "2018-06-15 13:36:46.599257",
                        "event": "osdmap:prepare_failure" 
                    }
                ],
                "info": {
                    "seq": 972,
                    "src_is_mon": false,
                    "source": "osd.247 10.16.5.172:6868/29968",
                    "forwarded_to_leader": false
                }
            }
        },
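
To pick out just the long-running entries from such a dump, a small filter helps; a minimal sketch, assuming jq is installed and that the dump exposes the usual top-level "ops" array as in the excerpt above:

# print age and description of every tracked op older than 30 seconds
ceph daemon mon.dc3-f3-cl1-mon1 ops | \
    jq -r '.ops[] | select(.age > 30) | [.age, .description] | @tsv'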

The current status of this cluster:

  cluster:
    id:     f8348af9-0d79-4ffd-a2ef-07ca83659935
    health: HEALTH_WARN
            No Zabbix server configured

  services:
    mon: 3 daemons, quorum dc3-f3-cl1-mon1,dc3-f4-cl1-mon1,dc3-f5-cl1-mon1
    mgr: dc3-f3-cl1-mon1(active), standbys: dc3-f5-cl1-mon1, dc3-f4-cl1-mon1
    mds: cephfs-2/2/2 up  {0=dc3-f3-cl1-mds1=up:active,1=dc3-f4-cl1-mds1=up:active}, 1 up:standby
    osd: 288 osds: 288 up, 288 in

  data:
    pools:   3 pools, 8256 pgs
    objects: 499.2 k objects, 1.9 TiB
    usage:   14 TiB used, 1.0 PiB / 1.0 PiB avail
    pgs:     8256 active+clean

  io:
    client:   1.1 GiB/s wr, 0 op/s rd, 547 op/s wr
[root@dc3-f3-cl1-mon1 ceph]# ceph osd dump|head -n 10
epoch 1463
fsid f8348af9-0d79-4ffd-a2ef-07ca83659935
created 2018-06-15 12:23:37.382975
modified 2018-06-15 15:50:11.132981
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 510
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
[root@dc3-f3-cl1-mon1 ceph]#

Files

ops.json (15.3 KB) Wido den Hollander, 06/15/2018 02:18 PM
mon.b.slowops (20.9 KB) jun gong, 05/22/2019 06:26 AM
mon.c.slowops (13.1 KB) jun gong, 05/22/2019 06:26 AM
mon.a.slowops (6.53 KB) jun gong, 05/22/2019 06:26 AM

Related issues 5 (0 open, 5 closed)

Related to RADOS - Bug #43893: lingering osd_failure ops (due to failure_info holding references?) (Duplicate)
Has duplicate RADOS - Bug #37768: mon gets stuck op for failing OSDs (Duplicate, 12/27/2018)
Copied to RADOS - Backport #41862: nautilus: Mimic MONs have slow/long running ops (Resolved, Prashant D)
Copied to RADOS - Backport #41863: mimic: Mimic MONs have slow/long running ops (Resolved, Prashant D)
Copied to RADOS - Backport #41864: luminous: Mimic MONs have slow/long running ops (Resolved, Nathan Cutler)
