Bug #23511
forwarded osd_failure leak in mon (closed)
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2018-03-29 13:31:14.846 7f888bf9f700 0 mon.b@1(peon) e1 DEBUG SLOW OPS
{
  "description": "osd_failure(failed immediate osd.1 172.21.15.179:6801/13388 for 21sec e21 v21)",
  "initiated_at": "2018-03-29 13:30:41.306549",
  "age": 33.542503,
  "duration": 33.542534,
  "type_data": {
    "events": [
      { "time": "2018-03-29 13:30:41.306549", "event": "initiated" },
      { "time": "2018-03-29 13:30:41.306549", "event": "header_read" },
      { "time": "2018-03-29 13:30:41.306549", "event": "throttled" },
      { "time": "2018-03-29 13:30:41.306554", "event": "all_read" },
      { "time": "2018-03-29 13:30:41.306748", "event": "dispatched" },
      { "time": "2018-03-29 13:30:41.306751", "event": "mon:_ms_dispatch" },
      { "time": "2018-03-29 13:30:41.306761", "event": "mon:dispatch_op" },
      { "time": "2018-03-29 13:30:41.306762", "event": "psvc:dispatch" },
      { "time": "2018-03-29 13:30:41.306790", "event": "osdmap:preprocess_query" },
      { "time": "2018-03-29 13:30:41.306802", "event": "osdmap:preprocess_failure" },
      { "time": "2018-03-29 13:30:41.306815", "event": "forward_request_leader" },
      { "time": "2018-03-29 13:30:41.306849", "event": "forwarded" }
    ],
    "info": {
      "seq": 1276,
      "src_is_mon": false,
      "source": "osd.2 172.21.15.179:6809/13391",
      "forwarded_to_leader": true
    }
  }
}
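A dump like the one above can be checked mechanically. The following is a minimal sketch, assuming the JSON that follows the "DEBUG SLOW OPS" prefix has already been extracted into a string. The helper name summarize_slow_op is hypothetical and not part of Ceph, and the event list is abbreviated to three of the twelve events from the dump.

```python
import json

# Abbreviated copy of the slow-op dump from the report.
# Only three of the original events are kept; all values are taken from the log.
SLOW_OP = """
{
  "description": "osd_failure(failed immediate osd.1 172.21.15.179:6801/13388 for 21sec e21 v21)",
  "initiated_at": "2018-03-29 13:30:41.306549",
  "age": 33.542503,
  "type_data": {
    "events": [
      {"time": "2018-03-29 13:30:41.306549", "event": "initiated"},
      {"time": "2018-03-29 13:30:41.306815", "event": "forward_request_leader"},
      {"time": "2018-03-29 13:30:41.306849", "event": "forwarded"}
    ],
    "info": {
      "seq": 1276,
      "src_is_mon": false,
      "source": "osd.2 172.21.15.179:6809/13391",
      "forwarded_to_leader": true
    }
  }
}
"""


def summarize_slow_op(text):
    """Hypothetical helper: pull out the fields relevant to this report."""
    op = json.loads(text)
    type_data = op["type_data"]
    return {
        # The last recorded event shows where the op stopped making progress.
        "last_event": type_data["events"][-1]["event"],
        "forwarded_to_leader": type_data["info"]["forwarded_to_leader"],
        "age": op["age"],
    }


summary = summarize_slow_op(SLOW_OP)
print(summary)
```

Here the last recorded event is "forwarded" and the op is still being tracked more than 33 seconds later, which is the symptom this report describes.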
The leader answered the forwarded request with a no-reply:
2018-03-29 13:30:41.301 7f55b7e89700 1 -- 172.21.15.179:6789/0 <== mon.1 172.21.15.179:6790/0 303 ==== forward(osd_failure(failed immediate osd.1 172.21.15.179:6801/13388 for 21sec e21 v21) v3 caps allow * tid 142 con_features 2305244844817580027) v3 ==== 251+0+0 (551290306 0 0) 0x55947aa94c00 con 0x55947a06c1c0
...
2018-03-29 13:30:41.305 7f55b7e89700 10 mon.a@0(leader) e1 no_reply to osd.2 172.21.15.179:6809/13391 via 172.21.15.179:6790/0 for request osd_failure(failed immediate osd.1 172.21.15.179:6801/13388 for 21sec e21 v21) v3
2018-03-29 13:30:41.305 7f55b7e89700 1 -- 172.21.15.179:6789/0 --> 172.21.15.179:6790/0 -- route(no-reply tid 142) v3 -- ?+0 0x55947a23df80 con 0x55947a06c1c0
and the peon received that no-reply:
2018-03-29 13:30:41.305 7f888979a700 1 -- 172.21.15.179:6790/0 <== mon.0 172.21.15.179:6789/0 374 ==== route(no-reply tid 142) v3 ==== 69+0+0 (540655743 0 0) 0x556a4d8deac0 con 0x556a4d70c640
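The claim that the forward and the no-reply belong to the same request rests on the shared tid. As a rough sketch, the tid can be pulled out of the three messenger lines with regular expressions. The patterns and helper names below are assumptions based only on the log lines in this report, not on a documented Ceph log format.

```python
import re

# Messenger log lines copied from the report.
LEADER_RECV = (
    "2018-03-29 13:30:41.301 7f55b7e89700 1 -- 172.21.15.179:6789/0 <== mon.1 "
    "172.21.15.179:6790/0 303 ==== forward(osd_failure(failed immediate osd.1 "
    "172.21.15.179:6801/13388 for 21sec e21 v21) v3 caps allow * tid 142 "
    "con_features 2305244844817580027) v3 ==== 251+0+0 (551290306 0 0) "
    "0x55947aa94c00 con 0x55947a06c1c0"
)
LEADER_SEND = (
    "2018-03-29 13:30:41.305 7f55b7e89700 1 -- 172.21.15.179:6789/0 --> "
    "172.21.15.179:6790/0 -- route(no-reply tid 142) v3 -- ?+0 "
    "0x55947a23df80 con 0x55947a06c1c0"
)
PEON_RECV = (
    "2018-03-29 13:30:41.305 7f888979a700 1 -- 172.21.15.179:6790/0 <== mon.0 "
    "172.21.15.179:6789/0 374 ==== route(no-reply tid 142) v3 ==== 69+0+0 "
    "(540655743 0 0) 0x556a4d8deac0 con 0x556a4d70c640"
)

# Assumed patterns, derived from the lines above.
FORWARD_TID = re.compile(r"forward\(.*\btid (\d+)")
ROUTE_TID = re.compile(r"route\(no-reply tid (\d+)\)")


def extract_tid(pattern, line):
    """Return the tid matched by pattern in line, or None if absent."""
    match = pattern.search(line)
    return int(match.group(1)) if match else None


tids = [
    extract_tid(FORWARD_TID, LEADER_RECV),  # forward received by the leader
    extract_tid(ROUTE_TID, LEADER_SEND),    # no-reply sent by the leader
    extract_tid(ROUTE_TID, PEON_RECV),      # no-reply received by the peon
]
same_request = len(set(tids)) == 1 and tids[0] is not None
print(tids, same_request)
```

All three lines carry tid 142, so the no-reply did reach the peon for this request even though the op remained in its slow-ops list.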
Updated by Kefu Chai about 6 years ago
- Subject changed from SLOW OPS in mon to forwarded osd_failure leak in mon
Updated by Greg Farnum about 6 years ago
Kefu, did your latest no_reply() PR resolve this?
Updated by Kefu Chai about 6 years ago
Greg, no. Both test runs below include the no_reply() fix.
See:
- http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-rados-wip-slow-mon-ops-kefu-distro-basic-smithi/
- http://pulpito.ceph.com/yuriw-2018-04-05_20:57:23-rados-wip-yuriw-master-4.5.18-distro-basic-smithi/
and it seems the failure is reproducible.
Updated by Greg Farnum over 4 years ago
- Status changed from New to Can't reproduce
I don't think we've seen this again, and we may have made even more no_reply fixes?