Backport #14668

Updated by Nathan Cutler about 8 years ago

One of our hammer clusters won't start now after running ceph mds getmap. 
 I did: 
 <pre> 
 # ceph status 
     cluster dd535a7e-4647-4bee-853d-f34112615f81 
      health HEALTH_WARN 
             mds cluster is degraded 
      monmap e30: 3 mons at {p01001532077488=128.142.36.227:6790/0,p01001532149022=128.142.39.77:6790/0,p01001532184554=128.142.39.144:6790/0} 
             election epoch 2170, quorum 0,1,2 p01001532077488,p01001532149022,p01001532184554 
      mdsmap e2493: 1/2/2 up {1=cephmdsd2=up:resolve}, 2 up:standby 
      osdmap e96770: 217 osds: 216 up, 216 in 
       pgmap v19240490: 18208 pgs, 37 pools, 5001 GB data, 11600 kobjects 
             15383 GB used, 435 TB / 450 TB avail 
                18206 active+clean 
                    2 active+clean+scrubbing 

 # ceph mds getmap 2493 
 </pre> 

 and now the ceph-mon processes all crash at startup like this: 


 <pre> 
 2016-02-05 09:42:30.855475 7f2c1a0e1700    0 log_channel(audit) log [DBG] : from='client.174542645 128.142.142.252:0/1003572' entity='client.admin' cmd=[{"prefix": "mds getmap", "epoch": 2493}]: dispatch 
 2016-02-05 09:42:30.858517 7f2c1a0e1700 -1 mon/MDSMonitor.cc: In function 'bool MDSMonitor::preprocess_command(MMonCommand*)' thread 7f2c1a0e1700 time 2016-02-05 09:42:30.855551 
 mon/MDSMonitor.cc: 757: FAILED assert(r == 0) 

  ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f3b65] 
  2: (MDSMonitor::preprocess_command(MMonCommand*)+0xfae) [0x67251e] 
  3: (MDSMonitor::preprocess_query(PaxosServiceMessage*)+0x28b) [0x675d2b] 
  4: (PaxosService::dispatch(PaxosServiceMessage*)+0x833) [0x607433] 
  5: (Monitor::handle_command(MMonCommand*)+0x11f1) [0x5cdf01] 
  6: (Monitor::dispatch(MonSession*, Message*, bool)+0xf9) [0x5d13c9] 
  7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x5d2076] 
  8: (Monitor::ms_dispatch(Message*)+0x23) [0x5f1443] 
  9: (DispatchQueue::entry()+0x62a) [0x947d5a] 
  10: (DispatchQueue::DispatchThread::entry()+0xd) [0x7dbbad] 
  11: (()+0x7dc5) [0x7f2c2542cdc5] 
  12: (clone()+0x6d) [0x7f2c23f0e21d] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
 </pre> 

 Eeekk! help?
