Bug #40029

closed

ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+0x2fa) [0x7f516505614a]

Added by Iain Buclaw almost 5 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When adding a new OSD, all primary monitors crashed. Log excerpt from mon.host-247:

2019-05-27 09:57:24.536 7f515ca31700  0 log_channel(cluster) log [DBG] : osdmap e23907: 57 total, 55 up, 55 in
2019-05-27 09:57:24.536 7f515ca31700  1 mon.host-247@1(leader).auth v48405 _upgrade_format_to_mimic upgrading from format 2 to 3
2019-05-27 09:57:24.536 7f515ca31700  0 log_channel(cluster) log [DBG] : mgrmap e1234: host-257(active), standbys: host-259, host-252, host-247, host-262
2019-05-27 09:57:24.536 7f515ca31700  0 log_channel(cluster) log [WRN] : Health check failed: 1/5 mons down, quorum host-247,host-262,host-259,host-257 (MON_DOWN)
2019-05-27 09:57:24.544 7f515ca31700  0 mon.host-247@1(leader).osd e23907 create-or-move crush item name 'osd.55' initial_weight 0.4411 at location {host=host-382,root=default}
2019-05-27 09:57:24.548 7f515ca31700 -1 *** Caught signal (Aborted) **
 in thread 7f515ca31700 thread_name:fn_monstore

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: /usr/bin/ceph-mon() [0x849490]
 2: (()+0x11390) [0x7f5164556390]
 3: (gsignal()+0x38) [0x7f516319e428]
 4: (abort()+0x16a) [0x7f51631a002a]
 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem)+0x22e) [0x7f51640995ce]
 6: (()+0x1375f) [0x7f516408d75f]
 7: (operator delete[](void*)+0x1fd) [0x7f51640b066d]
 8: (CrushWrapper::update_choose_args(CephContext*)+0x2fa) [0x7f516505614a]
 9: (CrushWrapper::remove_root(int)+0x1cf) [0x7f516505af3f]
 10: (CrushWrapper::remove_root(int)+0x70) [0x7f516505ade0]
 11: (CrushWrapper::trim_roots_with_class()+0x1a7) [0x7f516505d047]
 12: (CrushWrapper::rebuild_roots_with_classes()+0xc2) [0x7f516506acf2]
 13: (CrushWrapper::insert_item(CephContext*, int, float, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)+0x450) [0x7f516506c290]
 14: (CrushWrapper::create_or_move_item(CephContext*, int, float, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)+0xf2) [0x7f516506d132]
 15: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&)+0x9ff5) [0x7ad945]
 16: (OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x252) [0x7c6ae2]
 17: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x170) [0x7c6e40]
 18: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x996) [0x757626]
 19: (PaxosService::C_RetryMessage::_finish(int)+0x61) [0x6bd281]
 20: (C_MonOp::finish(int)+0x43) [0x66dcb3]
 21: (Context::complete(int)+0x9) [0x66cec9]
 22: (void finish_contexts<Context>(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0xb3) [0x6744c3]
 23: (Paxos::finish_round()+0x98) [0x74d928]
 24: (Paxos::commit_finish()+0x565) [0x74f865]
 25: (C_Committed::finish(int)+0x31) [0x753711]
 26: (Context::complete(int)+0x9) [0x66cec9]
 27: (MonitorDBStore::C_DoTransaction::finish(int)+0x97) [0x750ef7]
 28: (Context::complete(int)+0x9) [0x66cec9]
 29: (Finisher::finisher_thread_entry()+0x12e) [0x7f5164c61b5e]
 30: (()+0x76ba) [0x7f516454c6ba]
 31: (clone()+0x6d) [0x7f516327041d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
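
To read the call chain at a glance, the frames can be reduced to bare function names. The following is a minimal sketch that assumes the frame format shown in the trace above; the regular expression and the helper name `frame_names` are illustrative and not part of Ceph. The sample reuses frames 6 to 12 verbatim.

```python
import re

# Matches frames such as " 8: (Symbol(args)+0x2fa) [0x7f51...]".
# Frames without a "(symbol+offset)" part, like frame 1, do not match.
FRAME = re.compile(r"^\s*(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$")


def frame_names(trace):
    """Return (frame number, function name) pairs for frames with a symbol."""
    names = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if not match:
            continue
        # Drop the argument list: keep everything before the first "(".
        name = match.group(2).split("(", 1)[0]
        if name:  # unnamed frames such as "(()+0x1375f)" are skipped
            names.append((int(match.group(1)), name))
    return names


SAMPLE = """\
 6: (()+0x1375f) [0x7f516408d75f]
 7: (operator delete[](void*)+0x1fd) [0x7f51640b066d]
 8: (CrushWrapper::update_choose_args(CephContext*)+0x2fa) [0x7f516505614a]
 9: (CrushWrapper::remove_root(int)+0x1cf) [0x7f516505af3f]
 10: (CrushWrapper::remove_root(int)+0x70) [0x7f516505ade0]
 11: (CrushWrapper::trim_roots_with_class()+0x1a7) [0x7f516505d047]
 12: (CrushWrapper::rebuild_roots_with_classes()+0xc2) [0x7f516506acf2]
"""

for number, name in frame_names(SAMPLE):
    print(number, name)
```

Read from the bottom up, the frames show `rebuild_roots_with_classes` calling `trim_roots_with_class`, then `remove_root` (twice), then `update_choose_args`, where `operator delete[]` leads into the tcmalloc abort.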

Looking at the CRUSH tree, none of the OSDs it is crashing on have been moved into a host.

ID  CLASS WEIGHT   REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME       
 -1       26.90276        -  27 TiB 7.2 TiB  19 TiB 27.11 1.00   - root default    
 -2        0.88199        - 883 GiB 323 GiB 560 GiB 36.60 1.35   -     host host-247 
  0   ssd  0.44099  1.00000 442 GiB 163 GiB 279 GiB 36.90 1.36  77         osd.0   
  1   ssd  0.44099  1.00000 442 GiB 160 GiB 281 GiB 36.30 1.34  77         osd.1   
 -3        0.88199        - 897 GiB 375 GiB 522 GiB 41.80 1.54   -     host host-248 
  2   ssd  0.44099  1.00000 442 GiB 170 GiB 271 GiB 38.61 1.42  79         osd.2   
  3   ssd  0.44099  1.00000 456 GiB 205 GiB 251 GiB 44.90 1.66  77         osd.3   
 -7        0.88199        - 883 GiB 344 GiB 539 GiB 38.96 1.44   -     host host-252 
 10   ssd  0.44099  1.00000 442 GiB 172 GiB 270 GiB 38.84 1.43  81         osd.10  
 11   ssd  0.44099  1.00000 442 GiB 173 GiB 269 GiB 39.08 1.44  77         osd.11  
 -8        0.88199        - 883 GiB 282 GiB 602 GiB 31.88 1.18   -     host host-253 
 12   ssd  0.44099  1.00000 442 GiB 163 GiB 278 GiB 36.96 1.36  80         osd.12  
 13   ssd  0.44099  1.00000 442 GiB 118 GiB 323 GiB 26.80 0.99  59         osd.13  
 -9        0.88199        - 883 GiB 331 GiB 552 GiB 37.50 1.38   -     host host-254 
 14   ssd  0.44099  1.00000 442 GiB 165 GiB 277 GiB 37.32 1.38  74         osd.14  
 15   ssd  0.44099  1.00000 442 GiB 166 GiB 275 GiB 37.68 1.39  78         osd.15  
-10        0.88199        - 883 GiB 342 GiB 541 GiB 38.72 1.43   -     host host-255 
 16   ssd  0.44099  1.00000 442 GiB 161 GiB 281 GiB 36.37 1.34  75         osd.16  
 17   ssd  0.44099  1.00000 442 GiB 181 GiB 260 GiB 41.07 1.52  87         osd.17  
-11        0.88199        - 883 GiB 343 GiB 540 GiB 38.90 1.44   -     host host-256 
 18   ssd  0.44099  1.00000 442 GiB 171 GiB 271 GiB 38.68 1.43  80         osd.18  
 19   ssd  0.44099  1.00000 442 GiB 173 GiB 269 GiB 39.11 1.44  84         osd.19  
-12        0.88199        - 883 GiB 334 GiB 549 GiB 37.83 1.40   -     host host-257 
 20   ssd  0.44099  1.00000 442 GiB 171 GiB 271 GiB 38.72 1.43  82         osd.20  
 21   ssd  0.44099  1.00000 442 GiB 163 GiB 278 GiB 36.95 1.36  78         osd.21  
-13        0.88199        - 883 GiB 326 GiB 557 GiB 36.89 1.36   -     host host-258 
 22   ssd  0.44099  1.00000 442 GiB 166 GiB 275 GiB 37.66 1.39  88         osd.22  
 23   ssd  0.44099  1.00000 442 GiB 159 GiB 282 GiB 36.11 1.33  78         osd.23  
-14        0.88199        - 883 GiB 350 GiB 533 GiB 39.65 1.46   -     host host-259 
 24   ssd  0.44099  1.00000 442 GiB 170 GiB 272 GiB 38.40 1.42  81         osd.24  
 25   ssd  0.44099  1.00000 442 GiB 181 GiB 261 GiB 40.90 1.51  88         osd.25  
-15        0.88199        - 883 GiB 335 GiB 549 GiB 37.88 1.40   -     host host-260 
 26   ssd  0.44099  1.00000 442 GiB 169 GiB 272 GiB 38.36 1.42  83         osd.26  
 27   ssd  0.44099  1.00000 442 GiB 165 GiB 276 GiB 37.41 1.38  81         osd.27  
-16        0.88199        - 883 GiB 350 GiB 533 GiB 39.66 1.46   -     host host-261 
 28   ssd  0.44099  1.00000 442 GiB 172 GiB 269 GiB 39.00 1.44  87         osd.28  
 29   ssd  0.44099  1.00000 442 GiB 178 GiB 263 GiB 40.33 1.49  84         osd.29  
-17        0.88199        - 883 GiB 341 GiB 542 GiB 38.60 1.42   -     host host-262 
 30   ssd  0.44099  1.00000 442 GiB 171 GiB 270 GiB 38.83 1.43  88         osd.30  
 31   ssd  0.44099  1.00000 442 GiB 169 GiB 272 GiB 38.37 1.42  86         osd.31  
-18        0.88199        - 883 GiB 324 GiB 559 GiB 36.74 1.36   -     host host-263 
 32   ssd  0.44099  1.00000 442 GiB 151 GiB 291 GiB 34.20 1.26  73         osd.32  
 33   ssd  0.44099  1.00000 442 GiB 173 GiB 268 GiB 39.29 1.45  81         osd.33  
-23        0.88199        - 883 GiB 344 GiB 539 GiB 38.92 1.44   -     host host-349 
  4   ssd  0.44099  1.00000 442 GiB 172 GiB 270 GiB 38.96 1.44  86         osd.4   
  5   ssd  0.44099  1.00000 442 GiB 172 GiB 270 GiB 38.88 1.43  90         osd.5   
-21        0.88199        - 883 GiB 318 GiB 565 GiB 35.99 1.33   -     host host-350 
  6   ssd  0.44099  1.00000 442 GiB 161 GiB 281 GiB 36.46 1.35  79         osd.6   
  7   ssd  0.44099  1.00000 442 GiB 157 GiB 285 GiB 35.52 1.31  75         osd.7   
-19        0.88199        - 883 GiB 331 GiB 552 GiB 37.49 1.38   -     host host-351 
  8   ssd  0.44099  1.00000 442 GiB 158 GiB 284 GiB 35.75 1.32  77         osd.8   
  9   ssd  0.44099  1.00000 442 GiB 173 GiB 268 GiB 39.23 1.45  86         osd.9   
-25        0.88199        - 883 GiB 342 GiB 541 GiB 38.69 1.43   -     host host-352 
 34   ssd  0.44099  1.00000 442 GiB 172 GiB 269 GiB 38.97 1.44  83         osd.34  
 35   ssd  0.44099  1.00000 442 GiB 170 GiB 272 GiB 38.41 1.42  85         osd.35  
-38        0.88199        - 883 GiB 327 GiB 556 GiB 37.00 1.36   -     host host-353 
 36   ssd  0.44099  1.00000 442 GiB 159 GiB 282 GiB 36.03 1.33  79         osd.36  
 37   ssd  0.44099  1.00000 442 GiB 168 GiB 274 GiB 37.97 1.40  84         osd.37  
-47        0.88217        - 903 GiB  78 GiB 825 GiB  8.63 0.32   -     host host-369 
 41   ssd  0.44109  1.00000 452 GiB  40 GiB 412 GiB  8.77 0.32  29         osd.41  
 43   ssd  0.44109  1.00000 452 GiB  38 GiB 413 GiB  8.49 0.31  27         osd.43  
-53        0.88217        - 903 GiB  56 GiB 848 GiB  6.19 0.23   -     host host-370 
 46   ssd  0.44109  1.00000 452 GiB  54 GiB 398 GiB 11.91 0.44  33         osd.46  
 61   ssd  0.44109  1.00000 452 GiB 2.1 GiB 450 GiB  0.46 0.02  13         osd.61  
-55        0.88217        - 903 GiB 101 GiB 803 GiB 11.14 0.41   -     host host-372 
 48   ssd  0.44109  1.00000 452 GiB  49 GiB 402 GiB 10.94 0.40  31         osd.48  
 51   ssd  0.44109  1.00000 452 GiB  51 GiB 401 GiB 11.33 0.42  36         osd.51  
-49        0.88217        - 903 GiB  87 GiB 816 GiB  9.67 0.36   -     host host-373 
 42   ssd  0.44109  1.00000 452 GiB  45 GiB 407 GiB 10.00 0.37  30         osd.42  
 47   ssd  0.44109  1.00000 452 GiB  42 GiB 410 GiB  9.33 0.34  21         osd.47  
-61        0.88217        - 903 GiB 142 GiB 762 GiB 15.70 0.58   -     host host-374 
 53   ssd  0.44109  1.00000 452 GiB  79 GiB 373 GiB 17.52 0.65  35         osd.53  
 64   ssd  0.44109  1.00000 452 GiB  63 GiB 389 GiB 13.87 0.51  34         osd.64  
-44        0.88217        - 903 GiB  88 GiB 815 GiB  9.78 0.36   -     host host-375 
 40   ssd  0.44109  1.00000 452 GiB  43 GiB 409 GiB  9.53 0.35  29         osd.40  
 49   ssd  0.44109  1.00000 452 GiB  45 GiB 406 GiB 10.02 0.37  27         osd.49  
-51        0.88217        - 903 GiB 110 GiB 794 GiB 12.14 0.45   -     host host-376 
 45   ssd  0.44109  1.00000 452 GiB  68 GiB 384 GiB 14.98 0.55  40         osd.45  
 56   ssd  0.44109  1.00000 452 GiB  42 GiB 410 GiB  9.30 0.34  25         osd.56  
-63        0.88217        - 903 GiB  80 GiB 823 GiB  8.86 0.33   -     host host-377 
 54   ssd  0.44109  1.00000 452 GiB  43 GiB 409 GiB  9.55 0.35  22         osd.54  
 58   ssd  0.44109  1.00000 452 GiB  37 GiB 415 GiB  8.17 0.30  26         osd.58  
-42        0.88217        - 903 GiB  86 GiB 818 GiB  9.49 0.35   -     host host-378 
 39   ssd  0.44109  1.00000 452 GiB  40 GiB 411 GiB  8.93 0.33  25         osd.39  
 44   ssd  0.44109  1.00000 452 GiB  45 GiB 406 GiB 10.05 0.37  32         osd.44  
-59        0.88217        - 903 GiB  77 GiB 827 GiB  8.48 0.31   -     host host-379 
 52   ssd  0.44109  1.00000 452 GiB  47 GiB 404 GiB 10.51 0.39  28         osd.52  
 68   ssd  0.44109  1.00000 452 GiB  29 GiB 423 GiB  6.45 0.24  20         osd.68  
-40        0.44109        - 452 GiB  47 GiB 405 GiB 10.34 0.38   -     host host-380 
 38   ssd  0.44109  1.00000 452 GiB  47 GiB 405 GiB 10.34 0.38  34         osd.38  
-57        0.88217        - 903 GiB  56 GiB 848 GiB  6.17 0.23   -     host host-381 
 50   ssd  0.44109  1.00000 452 GiB  33 GiB 418 GiB  7.35 0.27  21         osd.50  
 63   ssd  0.44109  1.00000 452 GiB  23 GiB 429 GiB  4.98 0.18  25         osd.63  
 55              0        0     0 B     0 B     0 B     0    0   0 osd.55          
 57              0        0     0 B     0 B     0 B     0    0   0 osd.57          
 59   ssd        0        0     0 B     0 B     0 B     0    0   0 osd.59          
 60   ssd        0        0     0 B     0 B     0 B     0    0   0 osd.60          
 62   ssd        0        0     0 B     0 B     0 B     0    0   0 osd.62          
 65   ssd        0        0     0 B     0 B     0 B     0    0   0 osd.65          
 66              0        0     0 B     0 B     0 B     0    0   0 osd.66          
 67   ssd        0        0     0 B     0 B     0 B     0    0   0 osd.67          
                      TOTAL  27 TiB 7.2 TiB  19 TiB 27.11                          
MIN/MAX VAR: 0.02/1.66  STDDEV: 13.96
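
OSDs in this state can also be picked out programmatically. The sketch below assumes the JSON form of the tree (for example from `ceph osd tree -f json`) contains a `nodes` list in which each bucket carries a `children` list of ids; that layout is an assumption to verify against your Ceph version. The helper name `unparented_osds` and the sample data are hypothetical, a hand-written excerpt rather than real cluster output.

```python
def unparented_osds(tree):
    """Return names of OSD nodes that no bucket lists as a child."""
    nodes = tree.get("nodes", [])
    # Collect every id that appears as a child of some bucket.
    child_ids = set()
    for node in nodes:
        child_ids.update(node.get("children", []))
    orphans = [
        node["name"]
        for node in nodes
        if node.get("type") == "osd" and node["id"] not in child_ids
    ]
    # Sort numerically by OSD id, e.g. osd.9 before osd.55.
    return sorted(orphans, key=lambda name: int(name.split(".")[1]))


# Hypothetical excerpt shaped like the tree above: two OSDs under a host,
# two OSDs that were never placed under any bucket.
sample = {
    "nodes": [
        {"id": -1, "name": "default", "type": "root", "children": [-2]},
        {"id": -2, "name": "host-247", "type": "host", "children": [0, 1]},
        {"id": 0, "name": "osd.0", "type": "osd"},
        {"id": 1, "name": "osd.1", "type": "osd"},
        {"id": 55, "name": "osd.55", "type": "osd"},
        {"id": 57, "name": "osd.57", "type": "osd"},
    ]
}

print(unparented_osds(sample))  # ['osd.55', 'osd.57']
```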

Fixed the crash by removing all of these OSDs from the CRUSH tree (55, 57, 59, 60, 62, 65, 66, 67), although I had to keep restarting the crashing monitors to get the first change in. Once all were removed, the cluster became stable again.
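
The report does not say which command was used for the removal. As a sketch only, assuming `ceph osd crush remove <name>` is the intended command, the removals and the retries needed while the monitors keep restarting could look like this. The helper names and the `attempts` parameter are hypothetical, and the default is a dry run that only prints the commands.

```python
import subprocess

# OSD ids listed in the report as sitting outside any host.
STUCK_OSDS = [55, 57, 59, 60, 62, 65, 66, 67]


def crush_remove_cmd(osd_id):
    """Build the (assumed) command that removes one OSD from the CRUSH map."""
    return ["ceph", "osd", "crush", "remove", f"osd.{osd_id}"]


def run_ceph(cmd):
    """Real runner: True if the command exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def dry_run(cmd):
    """Default runner: print the command instead of executing it."""
    print(" ".join(cmd))
    return True


def remove_all(osd_ids, run=dry_run, attempts=5):
    """Try each removal up to `attempts` times; return ids that never succeeded."""
    failed = []
    for osd_id in osd_ids:
        cmd = crush_remove_cmd(osd_id)
        # any() stops at the first successful attempt.
        if not any(run(cmd) for _ in range(attempts)):
            failed.append(osd_id)
    return failed


remove_all(STUCK_OSDS)  # dry run: prints one command per OSD
```

Passing `run=run_ceph` would execute the commands for real; restarting the monitors between attempts, as described above, is left to the operator.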


Files

ceph-mon.host-369.log.xz (603 KB), Iain Buclaw, 05/28/2019 09:43 AM
crushmap.txt (14.2 KB), Iain Buclaw, 05/28/2019 10:18 AM

Related issues 1 (0 open, 1 closed)

Related to RADOS - Bug #39978: Adding OSD to Luminous Cluster will crash the active mon (Duplicate, 05/20/2019)
