https://tracker.ceph.com/
2020-10-28T15:28:23Z
Ceph
RADOS - Bug #48028: ceph-mon always suffer lots of slow ops from v14.2.9
https://tracker.ceph.com/issues/48028?journal_id=178153
2020-10-28T15:28:23Z
Yao Ning
zay11022@163.com
<p>Yao Ning wrote:</p>
<blockquote>
<pre>
root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
  cluster:
    id:     299a04ba-dd3e-43a7-af17-628190cf742f
    health: HEALTH_WARN
            41 slow ops, oldest one blocked for 38274 sec, mon.worker-2 has slow ops

  services:
    mon: 3 daemons, quorum worker-2,worker-12,worker-22 (age 17m)
    mgr: worker-22(active, since 11d), standbys: worker-12, worker-2
    mds: cephfs:1 {0=worker-22=up:active} 2 up:standby
    osd: 300 osds: 300 up (since 56m), 300 in (since 11d)

  task status:
    scrub status:
        mds.worker-22: idle

  data:
    pools:   2 pools, 16384 pgs
    objects: 78.68M objects, 300 TiB
    usage:   645 TiB used, 1.5 PiB / 2.2 PiB avail
    pgs:     16384 active+clean

  io:
    client:   5.1 KiB/s rd, 315 MiB/s wr, 1 op/s rd, 188 op/s wr
</pre>
</blockquote>
<p>{<br /> "description": "osd_failure(failed timeout osd.79 [v2:172.168.0.12:6878/46988,v1:172.168.0.12:6879/46988] for 145sec e89966 v89966)",<br /> "initiated_at": "2020-10-28 06:00:45.449880",<br /> "age": 33713.186917503001,<br /> "duration": 30.000211802999999,<br /> "type_data": {<br /> "events": [
{<br /> "time": "2020-10-28 06:00:45.449880",<br /> "event": "initiated" <br /> },
{<br /> "time": "2020-10-28 06:00:45.449880",<br /> "event": "header_read" <br /> },
{<br /> "time": "0.000000",<br /> "event": "throttled" <br /> },
{<br /> "time": "0.000000",<br /> "event": "all_read" <br /> },
{<br /> "time": "0.000000",<br /> "event": "dispatched" <br /> },
{<br /> "time": "2020-10-28 06:00:45.449973",<br /> "event": "mon:_ms_dispatch" <br /> },
{<br /> "time": "2020-10-28 06:00:45.449974",<br /> "event": "mon:dispatch_op" <br /> },
{<br /> "time": "2020-10-28 06:00:45.449974",<br /> "event": "psvc:dispatch" <br /> },
{<br /> "time": "2020-10-28 06:00:45.449991",<br /> "event": "osdmap:preprocess_query" <br /> },
{<br /> "time": "2020-10-28 06:00:45.449993",<br /> "event": "osdmap:preprocess_failure" <br /> },
{<br /> "time": "2020-10-28 06:00:45.450004",<br /> "event": "osdmap:prepare_update" <br /> },
{<br /> "time": "2020-10-28 06:00:45.450005",<br /> "event": "osdmap:prepare_failure" <br /> },
{<br /> "time": "2020-10-28 06:00:45.450018",<br /> "event": "no_reply: send routed request" <br /> },
{<br /> "time": "2020-10-28 06:01:15.449819",<br /> "event": "no_reply: send routed request" <br /> },
{<br /> "time": "2020-10-28 06:01:15.450092",<br /> "event": "done" <br /> }<br /> ],<br /> "info": {<br /> "seq": 16028221,<br /> "src_is_mon": false,<br /> "source": "osd.107 v2:172.168.0.5:6881/12793",<br /> "forwarded_to_leader": false</p>
<p>We find that this happens frequently during osd_failure handling, and the slow ops are never cleared automatically; the warning only goes away after we restart the ceph-mon daemon.</p>
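<p>A minimal sketch of that restart workaround, assuming the same containerized deployment as in the output above:</p>
<pre>
# Workaround sketch: restart the stuck mon so the stale slow-op entries are dropped,
# then confirm the SLOW_OPS warning is gone. Container name assumed from the output above.
docker restart ceph-mon-worker-2
docker exec ceph-mon-worker-2 ceph -s
</pre>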
RADOS - Bug #48028: ceph-mon always suffer lots of slow ops from v14.2.9
https://tracker.ceph.com/issues/48028?journal_id=178176
2020-10-29T02:43:30Z
Yao Ning
zay11022@163.com
<p>Yao Ning wrote:</p>
<blockquote>
<pre>
root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
  cluster:
    id:     299a04ba-dd3e-43a7-af17-628190cf742f
    health: HEALTH_WARN
            41 slow ops, oldest one blocked for 38274 sec, mon.worker-2 has slow ops

  services:
    mon: 3 daemons, quorum worker-2,worker-12,worker-22 (age 17m)
    mgr: worker-22(active, since 11d), standbys: worker-12, worker-2
    mds: cephfs:1 {0=worker-22=up:active} 2 up:standby
    osd: 300 osds: 300 up (since 56m), 300 in (since 11d)

  task status:
    scrub status:
        mds.worker-22: idle

  data:
    pools:   2 pools, 16384 pgs
    objects: 78.68M objects, 300 TiB
    usage:   645 TiB used, 1.5 PiB / 2.2 PiB avail
    pgs:     16384 active+clean

  io:
    client:   5.1 KiB/s rd, 315 MiB/s wr, 1 op/s rd, 188 op/s wr
</pre>
</blockquote>
<p>In the leader mon log:</p>
<pre>
2020-10-29 03:07:14.980 7f2255491700  0 log_channel(cluster) log [WRN] : Health check update: 41 slow ops, oldest one blocked for 422147 sec, mon.overcloud-controller-0 has slow ops (SLOW_OPS)
2020-10-29 03:07:14.988 7f2255491700 -1 mon.overcloud-controller-0@0(leader) e8 get_health_metrics reporting 41 slow ops, oldest is osd_failure(failed timeout osd.24 [v2:10.0.213.181:6804/4392,v1:10.0.213.181:6806/4392] for 11sec e343763 v343763)
</pre>
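<p>To correlate the health warning with the blocked ops on the leader, something like the following should work (a sketch; the log path assumes the default /var/log/ceph location, which may differ in containerized setups):</p>
<pre>
# Sketch: show the SLOW_OPS details and the matching get_health_metrics lines on the leader.
ceph health detail
grep get_health_metrics /var/log/ceph/ceph-mon.overcloud-controller-0.log | tail -n 5
</pre>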
RADOS - Bug #48028: ceph-mon always suffer lots of slow ops from v14.2.9
https://tracker.ceph.com/issues/48028?journal_id=193985
2021-05-05T14:20:57Z
Sage Weil
sage@newdream.net
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>RADOS</i></li><li><strong>Category</strong> deleted (<del><i>Monitor</i></del>)</li></ul>
RADOS - Bug #48028: ceph-mon always suffer lots of slow ops from v14.2.9
https://tracker.ceph.com/issues/48028?journal_id=209111
2022-01-24T22:30:17Z
Neha Ojha
nojha@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Won't Fix - EOL</i></li></ul><p>Please feel free to reopen if you see the issue in a recent version of Ceph.</p>