Bug #63028
ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client
Status: Closed
% Done: 0%
Description
Ran the integration test from https://github.com/ceph/ceph/pull/53535, which repeatedly blocklists the rbd_support module's RADOS client approximately every 10 seconds after the module recovers from the previous blocklisting. See http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/ . Observed 3 job failures:
http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401645/
http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401646/
http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401661/
where ceph-mgr hit a segmentation fault after the rbd_support module had recovered from a few rounds of client blocklisting.
Excerpt from /a/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401645/remote/smithi154/log/ceph-mgr.x.log.gz
2023-09-23T08:05:02.428+0000 7fd2a9e3d640 20 librbd: C_AioCompletion::finish: r=-108
2023-09-23T08:05:02.428+0000 7fd2a9e3d640 -1 librbd::io::AioCompletion: 0x5588f9b18160 fail: (108) Cannot send after transport endpoint shutdown
2023-09-23T08:05:02.428+0000 7fd2a9e3d640 20 librbd::io::AioCompletion: 0x5588f9b18160 complete_request: cb=1, pending=0
2023-09-23T08:05:02.428+0000 7fd2a9e3d640 20 librbd::io::AioCompletion: 0x5588f9b18160 finalize: r=-108
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd::mirror::GetInfoRequest: 0x5588f947c000 handle_get_mirror_image: r=-108
2023-09-23T08:05:02.428+0000 7fd2aa63e640 -1 librbd::mirror::GetInfoRequest: 0x5588f947c000 handle_get_mirror_image: failed to retrieve mirroring state: (108) Cannot send after transport endpoint shutdown
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd::mirror::GetInfoRequest: 0x5588f947c000 finish: r=-108
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd: C_AioCompletion::finish: r=-108
2023-09-23T08:05:02.428+0000 7fd2aa63e640 -1 librbd::io::AioCompletion: 0x5588f9abcc60 fail: (108) Cannot send after transport endpoint shutdown
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd::io::AioCompletion: 0x5588f9abcc60 complete_request: cb=1, pending=0
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd::io::AioCompletion: 0x5588f9abcc60 finalize: r=-108
2023-09-23T08:05:02.444+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 503 ==== osd_op_reply(736 rbd_header.10c0ee7dca38 [watch ping cookie 94046662626304] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.444+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 504 ==== osd_op_reply(737 rbd_header.10d2433c7098 [watch ping cookie 94046791258112] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.444+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 505 ==== osd_op_reply(741 rbd_header.10c0ee7dca38 [call] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.452+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 506 ==== watch-notify(disconnect (3) cookie 94046792808448 notify 0 ret 0) v3 ==== 42+0+0 (secure 0 0 0) 0x5588f84d7040 con 0x5588f9b53800
2023-09-23T08:05:02.488+0000 7fd30f5a3640  1 -- 172.21.15.154:0/4158557413 <== osd.0 v2:172.21.15.154:6802/1458739677 10 ==== osd_op_reply(746 rbd_header.10db9e5db833 [watch unwatch cookie 94046785367040] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f916a480 con 0x5588f91adc00
2023-09-23T08:05:02.492+0000 7fd30fda4640  1 -- 172.21.15.154:0/4158557413 <== osd.2 v2:172.21.15.154:6800/2255729285 27 ==== watch-notify(disconnect (3) cookie 94046794357760 notify 0 ret 0) v3 ==== 42+0+0 (secure 0 0 0) 0x5588f8cdad00 con 0x5588f9a7d400
2023-09-23T08:05:02.504+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 507 ==== osd_op_reply(734 rbd_header.1081cb785fa0 [watch ping cookie 94046784311296] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.504+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 508 ==== osd_op_reply(735 rbd_header.10a51056c8f5 [watch ping cookie 94046795061248] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.532+0000 7fd2a9e3d640 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd2a9e3d640 thread_name:io_context_pool
 ceph version 18.0.0-6334-gee696c23 (ee696c23a60e05ef9b01eb4ed6317dfcb4036a8e) reef (dev)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fd3121c2520]
 2: /lib/x86_64-linux-gnu/libc.so.6(+0x1aac9d) [0x7fd31232ac9d]
 3: PyBytes_FromString()
 4: /usr/lib/python3/dist-packages/rbd.cpython-310-x86_64-linux-gnu.so(+0x59868) [0x7fd303858868]
 5: /usr/lib/python3/dist-packages/rbd.cpython-310-x86_64-linux-gnu.so(+0x40a3b) [0x7fd30383fa3b]
 6: /usr/lib/python3/dist-packages/rbd.cpython-310-x86_64-linux-gnu.so(+0xb53b5) [0x7fd3038b43b5]
 7: /usr/lib/python3/dist-packages/rbd.cpython-310-x86_64-linux-gnu.so(+0x3b1a4) [0x7fd30383a1a4]
 8: /lib/librbd.so.1(+0x22f9cd) [0x7fd3033049cd]
 9: /lib/librbd.so.1(+0x234293) [0x7fd303309293]
 10: /lib/librbd.so.1(+0x23585d) [0x7fd30330a85d]
 11: /lib/librbd.so.1(+0x2af315) [0x7fd303384315]
 12: /lib/librados.so.2(+0x11005e) [0x7fd31210505e]
 13: /lib/librados.so.2(+0xc3acf) [0x7fd3120b8acf]
 14: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc2b3) [0x7fd31258b2b3]
 15: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7fd312214b43]
 16: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7fd3122a6a00]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
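For context, the repeated-blocklisting loop driven by the integration test can be sketched roughly as below. This is a hedged sketch, not the actual test script from PR 53535: the `active_clients`/`addrvec` field names assumed in the `ceph mgr dump` output, the use of python3 (instead of jq) for JSON parsing, and the 10-iteration count are all assumptions. It requires a running test cluster reachable via the `ceph` CLI.

```shell
# Sketch of the repeated-blocklist procedure described above (assumptions
# noted in the lead-in; do NOT run against a production cluster).

# Print the address of the rbd_support module's RADOS client, assuming
# `ceph mgr dump` reports it under "active_clients" with an "addrvec".
rbd_support_client_addr() {
    ceph mgr dump | python3 -c '
import json, sys
dump = json.load(sys.stdin)
for client in dump.get("active_clients", []):
    if client.get("name") == "rbd_support":
        print(client["addrvec"][0]["addr"])
        break
'
}

# Evict the module's client once; the module is expected to notice the
# blocklisting and recreate its RADOS connection.
blocklist_rbd_support_once() {
    addr="$(rbd_support_client_addr)"
    [ -n "$addr" ] || return 1
    ceph osd blocklist add "$addr"
}

# Repeat roughly every 10 seconds, as in the integration test.
main() {
    i=0
    while [ "$i" -lt 10 ]; do
        blocklist_rbd_support_once || break
        sleep 10
        i=$((i + 1))
    done
}
```

Invoke `main` against a throwaway test cluster; each iteration should show up in the mgr log as a fresh round of -108 (ESHUTDOWN) errors followed by module recovery, as in the excerpt above.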
Updated by Ramana Raja 7 months ago
- Related to Bug #62891: [test][rbd] test recovery of rbd_support module from repeated blocklisting of its client added
Updated by Ramana Raja 7 months ago
- Status changed from New to In Progress
- Assignee set to Ramana Raja
Updated by Ilya Dryomov 7 months ago
- Status changed from In Progress to Fix Under Review
- Backport set to pacific,quincy,reef
- Pull request ID set to 54002
Updated by Ramana Raja 7 months ago
- Assignee changed from Ramana Raja to Ilya Dryomov
Updated by Ilya Dryomov 7 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 7 months ago
- Copied to Backport #63226: pacific: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client added
Updated by Backport Bot 7 months ago
- Copied to Backport #63227: quincy: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client added
Updated by Backport Bot 7 months ago
- Copied to Backport #63228: reef: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client added
Updated by Backport Bot 3 months ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".