Bug #45318: "Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running tasks/mon_clock_no_skews.yaml - RADOS - Ceph

Bug #45318

Updated by Brad Hubbard about 4 years ago

/a/teuthology-2020-04-26_02:30:03-rados-octopus-distro-basic-smithi/4984906 

 The mon.d log shows the monitor came back up around 09:41:11. 
 <pre> 
 2020-04-26T09:41:09.547+0000 7f3e7a667540    4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work 
 2020-04-26T09:41:09.551+0000 7f3e7a667540    4 rocksdb: [db/db_impl.cc:563] Shutdown complete 
 2020-04-26T09:41:09.551+0000 7f3e7a667540    0 ceph-mon: created monfs at /var/lib/ceph/mon/ceph-d for mon.d 
 2020-04-26T09:41:11.139+0000 7f2561752540    0 ceph version 15.2.1-136-ga8c125c7d7 (a8c125c7d78f5cd973863993d258cd717ade4c99) octopus (stable), process ceph-mon, pid 12193 
 </pre> 

 Around that time, the teuthology log shows: 

 <pre> 
 2020-04-26T09:41:11.031 INFO:tasks.ceph.mon.d:Restarting daemon 
 2020-04-26T09:41:11.031 INFO:teuthology.orchestra.run.smithi101:> true 
 2020-04-26T09:41:11.036 INFO:teuthology.orchestra.run.smithi101:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i d 
 2020-04-26T09:41:11.082 INFO:tasks.ceph.mon.d:Started 
 </pre> 

 So mon.d was restarted around 09:41:11, but the warning was issued at 09:41:26. The monitor log shows that quorum wasn't achieved until 09:41:27. 

 <pre> 
 2020-04-26T09:41:27.343+0000 7f254c186700    7 mon.d@2(peon).log v7 update_from_paxos applying incremental log 7 2020-04-26T09:41:26.428693+0000 mon.b (mon.0) 27 : cluster [INF] mon.b is new leader, mons b,a,d,c,f,e in quorum (ranks 0,1,2,3,4,5) 
 </pre> 
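
 To make the gap concrete, here is a minimal sketch that computes the delay between the restart and the quorum messages, using only the timestamps quoted above. The helper name @parse_ts@ is hypothetical, and it assumes the teuthology timestamp (which carries no offset) is in UTC like the monitor log.

```python
from datetime import datetime, timezone


def parse_ts(text):
    """Parse an ISO-8601 timestamp from the logs above.

    Assumption: a timestamp without an offset (teuthology log) is UTC.
    """
    # Ceph writes "+0000"; older Python versions only accept "+00:00".
    if text.endswith("+0000"):
        text = text[:-5] + "+00:00"
    ts = datetime.fromisoformat(text)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts


# Timestamps copied from the log excerpts in this comment.
restart = parse_ts("2020-04-26T09:41:11.031")                # teuthology: "Restarting daemon"
new_leader = parse_ts("2020-04-26T09:41:26.428693+0000")     # cluster log: "mon.b is new leader"
applied_on_mon_d = parse_ts("2020-04-26T09:41:27.343+0000")  # mon.d applies that log entry

restart_to_election = (new_leader - restart).total_seconds()
restart_to_apply = (applied_on_mon_d - restart).total_seconds()

print(f"restart -> new leader elected: {restart_to_election:.3f}s")  # 15.398s
print(f"restart -> applied on mon.d:   {restart_to_apply:.3f}s")     # 16.312s
```

 By this reckoning, roughly 15 to 16 seconds passed between the restart and the full quorum being reported.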

 I don't think we can whitelist this message if we want to catch actual failures in the mons during the test.
