Bug #20948

closed

monitors crashing fairly regularly with OSDMonitor.cc: 2549: FAILED assert(0)

Added by Patrick McLean over 6 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are seeing monitors die fairly regularly across our network with a failed assert. The assert failure from the log is below, and I am attaching the full relevant section of the log. I can provide the output of objdump -rdS for the binary on request; it is 94 MiB, so the tracker won't let me upload it here.

2017-08-08 19:32:12.522780 7ff3b4dce700 -1 mon/OSDMonitor.cc: In function 'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7ff3b4dce700 time 2017-08-08 19:32:12.521362
mon/OSDMonitor.cc: 2549: FAILED assert(0)

 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x55997b11aef2]
 2: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x72d) [0x55997ada39ad]
 3: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool, std::shared_ptr<MonOpRequest>)+0x107) [0x55997ada4047]
 4: (OSDMonitor::check_sub(Subscription*)+0x226) [0x55997ada9e46]
 5: (Monitor::handle_subscribe(std::shared_ptr<MonOpRequest>)+0x7b2) [0x55997ad30e92]
 6: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0x723) [0x55997ad56653]
 7: (Monitor::_ms_dispatch(Message*)+0x581) [0x55997ad57391]
 8: (Monitor::ms_dispatch(Message*)+0x23) [0x55997ad75433]
 9: (DispatchQueue::entry()+0x6ea) [0x55997b20a8ea]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x55997b0ff95d]
 11: (()+0x74a4) [0x7ff3ca3084a4]
 12: (clone()+0x6d) [0x7ff3c882ffcd]
 13: [(nil)]
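
To gauge how often the assert fires across monitors, a log can be scanned for lines like the one above. The following is a minimal sketch, not part of Ceph: the helper name `count_failed_asserts` and the regular expression are assumptions based only on the line format shown in this report.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt in this report:
#   "mon/OSDMonitor.cc: 2549: FAILED assert(0)"
ASSERT_RE = re.compile(
    r"^\s*(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def count_failed_asserts(lines):
    """Count assert failures keyed by (source file, line number, expression)."""
    counts = Counter()
    for text in lines:
        match = ASSERT_RE.match(text)
        if match:
            key = (match["file"], int(match["line"]), match["expr"])
            counts[key] += 1
    return counts


# Sample input copied from the log excerpt in this report.
sample = [
    "2017-08-08 19:32:12.522780 7ff3b4dce700 -1 mon/OSDMonitor.cc: In function "
    "'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' "
    "thread 7ff3b4dce700 time 2017-08-08 19:32:12.521362",
    "mon/OSDMonitor.cc: 2549: FAILED assert(0)",
    " ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)",
]

for (source, line_no, expr), n in count_failed_asserts(sample).items():
    print(f"{source}:{line_no} assert({expr}) seen {n} time(s)")
```

For a compressed log such as the attachment, the same function could be fed the file object returned by `gzip.open(path, "rt")`, since it only needs an iterable of text lines.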

Files

failed-assert.log.gz (214 KB): sanitized version of the relevant log section. Patrick McLean, 08/08/2017 09:29 PM
Actions #1

Updated by Patrick McLean over 6 years ago

Actions #2

Updated by Patrick McLean over 6 years ago

Here is the output of objdump -rdS on the ceph-mon binary (xz-compressed):

https://send.firefox.com/download/a3ddd514bc/#QGrRsRG7viBj7-WYLkphCg

Actions #3

Updated by Greg Farnum over 6 years ago

  • Status changed from New to Need More Info

Can you update to 10.2.9 and report back if it still happens? I don't see any similar reports in my email, but there were some fixes to the monitor code between those versions.

Also, I believe Firefox Send links expire quickly. You can use ceph-post-file to upload data where it will be accessible to Ceph devs.

Actions #4

Updated by Patrick McLean over 6 years ago

We have not seen this since updating to 10.2.9.

Actions #5

Updated by Sage Weil almost 3 years ago

  • Status changed from Need More Info to Closed