Bug #20256

closed

"ceph osd df" is broken; asserts out on Luminous-enabled clusters

Added by Greg Farnum almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: Administration/Usability
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I got a private email report:

When I run ‘ceph osd df’, ceph-mon always crashes. The stack trace is as follows:

0> 2017-06-08 04:56:51.647510 7f91b9972700 -1 *** Caught signal (Aborted) **
in thread 7f91b9972700 thread_name:ms_dispatch

ceph version  12.0.2-2454-g853ae30 (853ae30b1560fe23274c01003c9aa8161638978b) luminous (dev)
 1: (()+0x7f8bf2) [0x5570b35edbf2]
2: (()+0x115c0) [0x7f91c2a2d5c0]
3: (gsignal()+0x9f) [0x7f91bfa3f91f]
4: (abort()+0x16a) [0x7f91bfa4151a]
5: (()+0x4475b9) [0x5570b323c5b9]
6: (OSDMonitor::print_utilization(std::ostream&, ceph::Formatter*, bool) const+0x1760) [0x5570b317e320]
7: (OSDMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0xaa8) [0x5570b31b4138]
8: (OSDMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x2c0) [0x5570b31bbc10]
9: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x7e8) [0x5570b31678d8]
10: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x205b) [0x5570b311ef7b]
11: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x956) [0x5570b3124bd6]
12: (Monitor::_ms_dispatch(Message*)+0x5d3) [0x5570b3125b43]
13: (Monitor::ms_dispatch(Message*)+0x23) [0x5570b314ee63]
14: (DispatchQueue::entry()+0xeca) [0x5570b3593cda]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x5570b33f360d]
16: (()+0x76ca) [0x7f91c2a236ca]
17: (clone()+0x5f) [0x7f91bfb11f7f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

With gdb:

Thread 10 (Thread 0x7f26189f9700 (LWP 134531)):
#0  0x00007f26200b2c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f26200b6028 in __GI_abort () at abort.c:89
#2  0x000055f5ed4afc59 in PGStatService::get_osd_stat (this=<optimized out>, osd=<optimized out>) at /root/ceph/src/mon/PGStatService.h:45
#3  0x000055f5ed3fb9f8 in get_osd_utilization (this=<optimized out>, kb_avail=<synthetic pointer>, kb_used=<synthetic pointer>, kb=<synthetic pointer>, id=0) at /root/ceph/src/mon/OSDMonitor.cc:662
#4  average_utilization (this=0x7f26189f4c30) at /root/ceph/src/mon/OSDMonitor.cc:652
#5  OSDUtilizationDumper (tree_=<optimized out>, pgs_=<optimized out>, osdmap_=0x7f26189f4ad0, crush=<optimized out>, this=0x7f26189f4c30) at /root/ceph/src/mon/OSDMonitor.cc:585
#6  OSDUtilizationPlainDumper (tree=<optimized out>, pgs=<optimized out>, osdmap=0x7f26189f4ad0, crush=<optimized out>, this=0x7f26189f4c30) at /root/ceph/src/mon/OSDMonitor.cc:715
#7  OSDMonitor::print_utilization (this=this@entry=0x7f261f58a800, out=..., f=f@entry=0x0, tree=<optimized out>) at /root/ceph/src/mon/OSDMonitor.cc:883
#8  0x000055f5ed42f3b9 in OSDMonitor::preprocess_command (this=this@entry=0x7f261f58a800, op=...) at /root/ceph/src/mon/OSDMonitor.cc:4147
#9  0x000055f5ed4360f6 in OSDMonitor::preprocess_query (this=0x7f261f58a800, op=...) at /root/ceph/src/mon/OSDMonitor.cc:1581
#10 0x000055f5ed3ed62e in PaxosService::dispatch (this=0x7f261f58a800, op=...) at /root/ceph/src/mon/PaxosService.cc:74
#11 0x000055f5ed3abf6a in Monitor::handle_command (this=this@entry=0x7f261f589400, op=...) at /root/ceph/src/mon/Monitor.cc:2940
#12 0x000055f5ed3afcaf in Monitor::dispatch_op (this=this@entry=0x7f261f589400, op=...) at /root/ceph/src/mon/Monitor.cc:3854
#13 0x000055f5ed3b0e52 in Monitor::_ms_dispatch (this=this@entry=0x7f261f589400, m=m@entry=0x7f2613271400) at /root/ceph/src/mon/Monitor.cc:3749
#14 0x000055f5ed3d51f3 in Monitor::ms_dispatch (this=0x7f261f589400, m=0x7f2613271400) at /root/ceph/src/mon/Monitor.h:851
#15 0x000055f5ed7a624b in ms_deliver_dispatch (m=0x7f2613271400, this=0x7f261e125500) at /root/ceph/src/msg/Messenger.h:617
#16 DispatchQueue::entry (this=0x7f261e125658) at /root/ceph/src/msg/DispatchQueue.cc:197
#17 0x000055f5ed62cc9d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /root/ceph/src/msg/DispatchQueue.h:102
#18 0x00007f262189e184 in start_thread (arg=0x7f26189f9700) at pthread_create.c:312
#19 0x00007f2620179bed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

git bisect points to commit 459ec61901a3e7d58e971b96a06eb99b43e19571 as the one that introduced the crash.

Actions #1

Updated by Greg Farnum almost 7 years ago

So what obviously happened is that I thought we had moved the osd df command into the monitor, but that never actually happened. I'm fixing it now by moving the parent PGStatService class into PGMap and migrating the relevant code into the monitor. The code move, along with the new multiple-inheritance issues it causes on the PGStatService classes in the monitor, is taking a little longer than expected.
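For readers unfamiliar with the failure pattern the gdb backtrace points to (an abort inside PGStatService::get_osd_stat at PGStatService.h:45, reached from average_utilization()), here is a minimal, hypothetical C++ sketch of it: a base-class accessor that is only a stub, called through a service object that never overrides it. All names below (StatService, LocalStatService, RemoteStatService, OsdStat) are invented for illustration and do not reproduce Ceph's actual classes. The stub throws instead of calling abort() so the failure can be observed.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical stand-in for a per-OSD stats record (not Ceph's osd_stat_t).
struct OsdStat {
  uint64_t kb = 0;       // total capacity in KiB
  uint64_t kb_used = 0;  // space in use in KiB
};

// Hypothetical base interface. Its default get_osd_stat() is only a stub;
// in the reported crash the equivalent stub aborted, whereas this sketch
// throws so the failure can be observed and tested.
class StatService {
 public:
  virtual ~StatService() = default;
  virtual const OsdStat* get_osd_stat(int /*osd*/) const {
    throw std::logic_error("get_osd_stat() not implemented by this service");
  }
};

// A service that holds OSD stats locally and overrides the stub.
class LocalStatService : public StatService {
 public:
  std::map<int, OsdStat> stats;
  const OsdStat* get_osd_stat(int osd) const override {
    auto it = stats.find(osd);
    return it == stats.end() ? nullptr : &it->second;
  }
};

// A service whose data lives elsewhere and which never overrides the stub.
class RemoteStatService : public StatService {};

// Percentage of raw capacity in use across OSDs 0..num_osds-1, loosely
// analogous to the average_utilization() frame in the backtrace.
double average_utilization(const StatService& svc, int num_osds) {
  uint64_t kb = 0, kb_used = 0;
  for (int id = 0; id < num_osds; ++id) {
    const OsdStat* s = svc.get_osd_stat(id);  // throws for RemoteStatService
    if (s == nullptr) continue;               // no stats reported for this OSD yet
    kb += s->kb;
    kb_used += s->kb_used;
  }
  if (kb == 0) return 0.0;
  return 100.0 * static_cast<double>(kb_used) / static_cast<double>(kb);
}
```

The caller compiles against the base interface either way, so nothing flags the missing override until the command actually runs against the service that lacks it.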

Actions #3

Updated by Greg Farnum almost 7 years ago

  • Status changed from In Progress to 7
Actions #4

Updated by Nathan Cutler almost 7 years ago

  • Status changed from 7 to Resolved