Bug #44586

open

Deleting a pool with in-flight ops might crash client osdc

Added by Jason Dillaman about 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The rbd-mirror test cases conclude by deleting the pools, just to ensure the daemons survive. There appears to be a potential race in Objecter: if an OSD map update indicates that the pool does not exist, it might trigger a '_send_op_map_check'. A later OSD map update can then arrive while that map request is still in flight, and '_finish_op' throws an assertion failure because the in-flight map request is still registered for the op.
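To make the suspected ordering concrete, here is a minimal toy model of the race described above. It is only a sketch: `ToyObjecter`, `ToyOp`, `handle_map`, and `handle_map_check_reply` are hypothetical names invented for illustration and do not mirror the real Objecter code, and the thrown exception merely stands in for the assertion in '_finish_op'.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>

// Toy model only: these types and methods are hypothetical and are not
// the real Objecter interface.
struct ToyOp {
  int64_t pool = -1;
  bool map_check_in_flight = false;  // stands in for a pending map request
};

class ToyObjecter {
 public:
  void submit(uint64_t tid, int64_t pool) {
    ToyOp op;
    op.pool = pool;
    ops_[tid] = op;
  }

  // Apply a new map, given as the set of pools that still exist.
  void handle_map(const std::set<int64_t>& pools) {
    pools_ = pools;
    for (auto it = ops_.begin(); it != ops_.end();) {
      ToyOp& op = it->second;
      if (pools_.count(op.pool)) {  // pool still exists, nothing to do
        ++it;
        continue;
      }
      if (!op.map_check_in_flight) {
        // First map without the pool: ask for confirmation.
        op.map_check_in_flight = true;
        ++it;
        continue;
      }
      // A later map arrived before the confirmation did.
      finish_op(op);  // throws: the map check is still in flight
      it = ops_.erase(it);
    }
  }

  // The confirmation arrives: clear the flag, then finish the op.
  void handle_map_check_reply(uint64_t tid) {
    auto it = ops_.find(tid);
    if (it == ops_.end()) return;
    it->second.map_check_in_flight = false;
    if (!pools_.count(it->second.pool)) {
      finish_op(it->second);
      ops_.erase(it);
    }
  }

  std::size_t in_flight() const { return ops_.size(); }

 private:
  // Stand-in for the assertion: an op must not finish while its map
  // check is still outstanding.
  static void finish_op(const ToyOp& op) {
    if (op.map_check_in_flight)
      throw std::logic_error("op finished with a map check still in flight");
  }

  std::set<int64_t> pools_;
  std::map<uint64_t, ToyOp> ops_;
};
```

In this model, calling `handle_map({})` twice in a row for an op on a deleted pool throws, whereas calling `handle_map_check_reply` between the two map updates finishes the op cleanly. Whether the real client behaves this way is exactly what this report is asking.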

http://qa-proxy.ceph.com/teuthology/jdillaman-2020-03-12_14:35:00-rbd-wip-jd-testing-distro-basic-smithi/4850399/teuthology.log

(gdb) bt
#0  raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000055a51fd99553 in reraise_fatal (signum=6) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/global/signal_handler.cc:326
#2  handle_fatal_signal (signum=6) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/global/signal_handler.cc:326
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00007f5edf89ccf5 in __GI_abort () at abort.c:79
#6  0x00007f5ee1fcaa91 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, 
    func=0x7f5eeb0e1de0 <Objecter::_finish_op(Objecter::Op*, int)::__PRETTY_FUNCTION__> "void Objecter::_finish_op(Objecter::Op*, int)")
    at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/common/assert.cc:73
#7  0x00007f5ee1fcac5a in ceph::__ceph_assert_fail (ctx=...) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/common/assert.cc:78
#8  0x00007f5eeb0991a9 in Objecter::_finish_op (this=0x55a521e26600, op=0x55a52541cf00, r=0) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/log/Entry.h:35
#9  0x00007f5eeb09928e in Objecter::_check_op_pool_dne (this=0x55a521e26600, op=0x55a52541cf00, sl=0x7f5ed20661d0) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/osdc/Objecter.cc:1572
#10 0x00007f5eeb09a0b0 in Objecter::_scan_requests (this=0x55a521e26600, s=0x55a5211826e0, skipped_map=false, cluster_full=<optimized out>, pool_full_map=0x7f5ed2066330, need_resend=std::map with 0 elements, 
    need_resend_linger=empty std::__cxx11::list, need_resend_command=std::map with 0 elements, sul=...) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/osdc/Objecter.cc:1106
#11 0x00007f5eeb09e818 in Objecter::handle_osd_map (this=0x55a521e26600, m=0x55a5211a2fc0) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/osdc/Objecter.cc:1241
#12 0x00007f5eeb0a21d3 in Objecter::ms_dispatch (this=0x55a521e26600, m=0x55a5211a2fc0) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/osdc/Objecter.cc:1025
#13 0x00007f5eeb07576a in Dispatcher::ms_dispatch2 (this=0x55a521e26608, m=...) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/msg/Dispatcher.h:124
#14 0x00007f5ee21e2bda in Messenger::ms_deliver_dispatch (m=..., this=0x55a521209800) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/msg/DispatchQueue.cc:200
#15 DispatchQueue::entry (this=0x55a521209b28) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/msg/DispatchQueue.cc:199
#16 0x00007f5ee2284f51 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-2080.g7bd890c.el8.x86_64/src/msg/DispatchQueue.h:101
#17 0x00007f5ee14082de in start_thread (arg=<optimized out>) at pthread_create.c:486
#18 0x00007f5edf977133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
