Bug #63028 (closed)

ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client

Added by Ramana Raja 7 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Ilya Dryomov
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
backport_processed
Backport:
pacific,quincy,reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
54002
Crash signature (v1):
Crash signature (v2):

Description

Ran the integration test from https://github.com/ceph/ceph/pull/53535, which repeatedly blocklists the rbd_support module's RADOS client approximately every 10 seconds after the module recovers from the previous blocklisting, at http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/ . Observed 3 job failures,
http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401645/
http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401646/
http://pulpito.front.sepia.ceph.com/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401661/
where ceph-mgr hit a segmentation fault after a few recoveries of the rbd_support module from repeated client blocklisting.
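
The test's core loop is roughly the following (a minimal Python sketch, not the actual workunit from PR 53535; it assumes `ceph mgr dump` exposes the rbd_support client's address under `active_clients` and uses `rbd mirror snapshot schedule ls` as the recovery probe; timings are illustrative):

#!/usr/bin/env python3
# Hypothetical sketch of the repeated-blocklisting loop exercised by the test.
# Not the actual workunit from PR 53535; the Ceph CLI commands are standard,
# but the recovery probe and timings here are assumptions.
import json
import subprocess
import time

def rbd_support_client_addr():
    # Look up the rbd_support module's RADOS client address in the mgr map.
    mgr_map = json.loads(subprocess.check_output(["ceph", "mgr", "dump"]))
    for client in mgr_map.get("active_clients", []):
        if client.get("name") == "rbd_support":
            addr = client["addrvec"][0]
            return "{}/{}".format(addr["addr"], addr["nonce"])
    return None

def wait_for_recovery(timeout=60):
    # The module has recovered once one of its commands succeeds again.
    deadline = time.time() + timeout
    while time.time() < deadline:
        rc = subprocess.call(
            ["rbd", "mirror", "snapshot", "schedule", "ls", "-R"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if rc == 0:
            return True
        time.sleep(1)
    return False

for i in range(10):
    addr = rbd_support_client_addr()
    assert addr, "rbd_support client not found in 'ceph mgr dump'"
    subprocess.check_call(["ceph", "osd", "blocklist", "add", addr])
    assert wait_for_recovery(), "rbd_support did not recover after blocklist #%d" % (i + 1)
    # Give the recovered module ~10 seconds before blocklisting it again.
    time.sleep(10)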

Excerpt from /a/rraja-2023-09-23_06:37:41-rbd:cli-wip-62891-distro-default-smithi/7401645/remote/smithi154/log/ceph-mgr.x.log.gz

2023-09-23T08:05:02.428+0000 7fd2a9e3d640 20 librbd: C_AioCompletion::finish: r=-108
2023-09-23T08:05:02.428+0000 7fd2a9e3d640 -1 librbd::io::AioCompletion: 0x5588f9b18160 fail: (108) Cannot send after transport endpoint shutdown
2023-09-23T08:05:02.428+0000 7fd2a9e3d640 20 librbd::io::AioCompletion: 0x5588f9b18160 complete_request: cb=1, pending=0
2023-09-23T08:05:02.428+0000 7fd2a9e3d640 20 librbd::io::AioCompletion: 0x5588f9b18160 finalize: r=-108
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd::mirror::GetInfoRequest: 0x5588f947c000 handle_get_mirror_image: r=-108
2023-09-23T08:05:02.428+0000 7fd2aa63e640 -1 librbd::mirror::GetInfoRequest: 0x5588f947c000 handle_get_mirror_image: failed to retrieve mirroring state: (108) Cannot send after transport endpoint shutdown
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd::mirror::GetInfoRequest: 0x5588f947c000 finish: r=-108
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd: C_AioCompletion::finish: r=-108
2023-09-23T08:05:02.428+0000 7fd2aa63e640 -1 librbd::io::AioCompletion: 0x5588f9abcc60 fail: (108) Cannot send after transport endpoint shutdown
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd::io::AioCompletion: 0x5588f9abcc60 complete_request: cb=1, pending=0
2023-09-23T08:05:02.428+0000 7fd2aa63e640 20 librbd::io::AioCompletion: 0x5588f9abcc60 finalize: r=-108
2023-09-23T08:05:02.444+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 503 ==== osd_op_reply(736 rbd_header.10c0ee7dca38 [watch ping cookie 94046662626304] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.444+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 504 ==== osd_op_reply(737 rbd_header.10d2433c7098 [watch ping cookie 94046791258112] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.444+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 505 ==== osd_op_reply(741 rbd_header.10c0ee7dca38 [call] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.452+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 506 ==== watch-notify(disconnect (3) cookie 94046792808448 notify 0 ret 0) v3 ==== 42+0+0 (secure 0 0 0) 0x5588f84d7040 con 0x5588f9b53800
2023-09-23T08:05:02.488+0000 7fd30f5a3640  1 -- 172.21.15.154:0/4158557413 <== osd.0 v2:172.21.15.154:6802/1458739677 10 ==== osd_op_reply(746 rbd_header.10db9e5db833 [watch unwatch cookie 94046785367040] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f916a480 con 0x5588f91adc00
2023-09-23T08:05:02.492+0000 7fd30fda4640  1 -- 172.21.15.154:0/4158557413 <== osd.2 v2:172.21.15.154:6800/2255729285 27 ==== watch-notify(disconnect (3) cookie 94046794357760 notify 0 ret 0) v3 ==== 42+0+0 (secure 0 0 0) 0x5588f8cdad00 con 0x5588f9a7d400
2023-09-23T08:05:02.504+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 507 ==== osd_op_reply(734 rbd_header.1081cb785fa0 [watch ping cookie 94046784311296] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.504+0000 7fd30eda2640  1 -- 172.21.15.154:0/4158557413 <== osd.1 v2:172.21.15.154:6807/1669680350 508 ==== osd_op_reply(735 rbd_header.10a51056c8f5 [watch ping cookie 94046795061248] v0'0 uv0 ondisk = -108 ((108) Cannot send after transport endpoint shutdown)) v8 ==== 167+0+0 (secure 0 0 0) 0x5588f35ee480 con 0x5588f9b53800
2023-09-23T08:05:02.532+0000 7fd2a9e3d640 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd2a9e3d640 thread_name:io_context_pool

 ceph version 18.0.0-6334-gee696c23 (ee696c23a60e05ef9b01eb4ed6317dfcb4036a8e) reef (dev)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fd3121c2520]
 2: /lib/x86_64-linux-gnu/libc.so.6(+0x1aac9d) [0x7fd31232ac9d]
 3: PyBytes_FromString()
 4: /usr/lib/python3/dist-packages/rbd.cpython-310-x86_64-linux-gnu.so(+0x59868) [0x7fd303858868]
 5: /usr/lib/python3/dist-packages/rbd.cpython-310-x86_64-linux-gnu.so(+0x40a3b) [0x7fd30383fa3b]
 6: /usr/lib/python3/dist-packages/rbd.cpython-310-x86_64-linux-gnu.so(+0xb53b5) [0x7fd3038b43b5]
 7: /usr/lib/python3/dist-packages/rbd.cpython-310-x86_64-linux-gnu.so(+0x3b1a4) [0x7fd30383a1a4]
 8: /lib/librbd.so.1(+0x22f9cd) [0x7fd3033049cd]
 9: /lib/librbd.so.1(+0x234293) [0x7fd303309293]
 10: /lib/librbd.so.1(+0x23585d) [0x7fd30330a85d]
 11: /lib/librbd.so.1(+0x2af315) [0x7fd303384315]
 12: /lib/librados.so.2(+0x11005e) [0x7fd31210505e]
 13: /lib/librados.so.2(+0xc3acf) [0x7fd3120b8acf]
 14: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc2b3) [0x7fd31258b2b3]
 15: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7fd312214b43]
 16: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7fd3122a6a00]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
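
To map the raw librbd.so/librados.so offsets above to symbols, addr2line (or `objdump -rdS` as the note suggests) can be run against binaries matching this exact build (ee696c23). A small helper along these lines, assuming the matching debug symbols are installed (the frame list below just copies a few frames from the backtrace above):

#!/usr/bin/env python3
# Sketch: resolve the "+0x..." frames from the backtrace above with addr2line.
# Assumes binutils is installed and the libraries match build ee696c23 exactly;
# otherwise the offsets resolve to the wrong symbols.
import re
import subprocess

frames = [
    "/lib/librbd.so.1(+0x22f9cd)",
    "/lib/librbd.so.1(+0x234293)",
    "/lib/librados.so.2(+0x11005e)",
]

for frame in frames:
    m = re.match(r"(.+)\(\+0x([0-9a-f]+)\)", frame)
    if m:
        lib, offset = m.group(1), "0x" + m.group(2)
        out = subprocess.check_output(
            ["addr2line", "-e", lib, "-f", "-C", offset], text=True)
        print(frame, "->", out.strip().replace("\n", " "))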


Related issues (4): 0 open, 4 closed

Related to rbd - Bug #62891: [test][rbd] test recovery of rbd_support module from repeated blocklisting of its client (Resolved, Ramana Raja)
Copied to rbd - Backport #63226: pacific: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client (Resolved, Ilya Dryomov)
Copied to rbd - Backport #63227: quincy: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client (Resolved, Ilya Dryomov)
Copied to rbd - Backport #63228: reef: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client (Resolved, Ilya Dryomov)
#1 - Updated by Ramana Raja 7 months ago

  • Related to Bug #62891: [test][rbd] test recovery of rbd_support module from repeated blocklisting of its client added
#2 - Updated by Ramana Raja 7 months ago

  • Status changed from New to In Progress
  • Assignee set to Ramana Raja
#3 - Updated by Ilya Dryomov 7 months ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to pacific,quincy,reef
  • Pull request ID set to 54002
#4 - Updated by Ramana Raja 7 months ago

  • Assignee changed from Ramana Raja to Ilya Dryomov
#5 - Updated by Ilya Dryomov 7 months ago

  • Status changed from Fix Under Review to Pending Backport
#6 - Updated by Backport Bot 7 months ago

  • Copied to Backport #63226: pacific: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client added
#7 - Updated by Backport Bot 7 months ago

  • Copied to Backport #63227: quincy: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client added
#8 - Updated by Backport Bot 7 months ago

  • Copied to Backport #63228: reef: ceph-mgr seg faults when testing for rbd_support module recovery on repeated blocklisting of its client added
#9 - Updated by Backport Bot 7 months ago

  • Tags set to backport_processed
#10 - Updated by Backport Bot 3 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

