Bug #3545

Updated by Sage Weil over 11 years ago


 Since I have only two ceph-mons (ranks 0 and 1), specifying 3 as the id is invalid.

 ubuntu@ceph1:/var/log/ceph$ sudo ceph -s 
    health HEALTH_OK 
    monmap e1: 2 mons at {ceph1=192.168.101.2:6789/0,ceph2=192.168.102.2:6789/0}, election epoch 2, quorum 0,1 ceph1,ceph2  
    osdmap e27: 4 osds: 4 up, 4 in 
     pgmap v584: 192 pgs: 192 active+clean; 1027 MB data, 2214 MB used, 34601 MB / 36816 MB avail 
    mdsmap e9: 1/1/1 up {0=ceph2=up:active} 
 ubuntu@ceph1:/var/log/ceph$ sudo ceph mon tell 3 injectargs --debug-mon 20 
 2012-11-27 18:07:14.869621 7f6957385700    0 monclient: hunting for new mon 
 2012-11-27 18:07:15.091468 7f6957385700    0 monclient: hunting for new mon 
 2012-11-27 18:07:17.844026 7f6957385700    0 monclient: hunting for new mon 
 2012-11-27 18:07:17.993697 7f6957385700    0 monclient: hunting for new mon 
 2012-11-27 18:07:18.947649 7f6957385700    0 monclient: hunting for new mon 
 2012-11-27 18:07:19.090900 7f6957385700    0 monclient: hunting for new mon 
 2012-11-27 18:07:20.453571 7f6957385700    0 monclient: hunting for new mon 
 2012-11-27 18:07:20.669181 7f6957385700    0 monclient: hunting for new mon 
 ^C 
 ubuntu@ceph1:/var/log/ceph$ ps -ef | grep ceph-mon 
 ubuntu     10759    1139    0 18:10 pts/0      00:00:00 grep --color=auto ceph-mon 
 ubuntu@ceph1:/var/log$ tail kern.log 
 Nov 27 18:08:19 ceph1 kernel: [91747.074155] init: ceph-mon (ceph/ceph1) main process (10248) killed by ABRT signal 
 Nov 27 18:08:19 ceph1 kernel: [91747.074192] init: ceph-mon (ceph/ceph1) main process ended, respawning 
 Nov 27 18:08:20 ceph1 kernel: [91748.386154] init: ceph-mon (ceph/ceph1) main process (10338) killed by ABRT signal 
 Nov 27 18:08:20 ceph1 kernel: [91748.386342] init: ceph-mon (ceph/ceph1) main process ended, respawning 
 Nov 27 18:08:26 ceph1 kernel: [91754.143919] init: ceph-mon (ceph/ceph1) main process (10374) killed by ABRT signal 
 Nov 27 18:08:26 ceph1 kernel: [91754.144191] init: ceph-mon (ceph/ceph1) main process ended, respawning 
 Nov 27 18:08:27 ceph1 kernel: [91755.219081] init: ceph-mon (ceph/ceph1) main process (10454) killed by ABRT signal 
 Nov 27 18:08:27 ceph1 kernel: [91755.219119] init: ceph-mon (ceph/ceph1) main process ended, respawning 
 Nov 27 18:08:28 ceph1 kernel: [91756.078737] init: ceph-mon (ceph/ceph1) main process (10497) killed by ABRT signal 
 Nov 27 18:08:28 ceph1 kernel: [91756.078774] init: ceph-mon (ceph/ceph1) respawning too fast, stopped 

 I don't believe it is a bug that ceph-mon ends up not respawning. The ceph command has no way to know that the injectargs itself is what causes ceph-mon to crash, so it is proper for it to keep trying to communicate with ceph-mon. We should do better error checking in the ceph command and investigate the ABRT that occurred in ceph-mon. Better still, ceph-mon should return an error code to the ceph command so that the user can be informed.
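A minimal sketch of the two fixes suggested above, assuming a simplified monitor map that is just the list of monitor names in rank order. `MonMap`, `validate_mon_target` and `handle_tell` are hypothetical names for illustration, not Ceph's actual code or API. The idea is that the client rejects a target that is not in the monmap, and the monitor returns a negative errno with a message instead of aborting.

```python
import errno
from dataclasses import dataclass


@dataclass
class MonMap:
    # Hypothetical, simplified stand-in for the real monmap:
    # monitor names listed in rank order (rank 0 first).
    names: list[str]


def validate_mon_target(monmap: MonMap, target: str) -> int:
    """Client-side check (hypothetical): resolve a 'mon tell' target to a rank.

    Accepts a numeric rank or a monitor name; raises ValueError if no such
    monitor exists, so the command can fail fast instead of hunting forever.
    """
    if target.isdigit():
        rank = int(target)
        if rank < len(monmap.names):
            return rank
        raise ValueError(
            f"mon rank {rank} does not exist; valid ranks are 0..{len(monmap.names) - 1}"
        )
    if target in monmap.names:
        return monmap.names.index(target)
    raise ValueError(f"no monitor named {target!r}")


def handle_tell(monmap: MonMap, target: str) -> tuple[int, str]:
    """Server-side sketch (hypothetical): report bad input as (negative errno, message)
    rather than aborting the daemon."""
    try:
        rank = validate_mon_target(monmap, target)
    except ValueError as exc:
        return -errno.ENOENT, str(exc)
    return 0, f"forwarding to mon.{monmap.names[rank]}"


if __name__ == "__main__":
    monmap = MonMap(["ceph1", "ceph2"])  # the two mons from the transcript above
    print(handle_tell(monmap, "3"))      # invalid rank: an error code, not a crash
    print(handle_tell(monmap, "1"))
```

Returning a negative errno mirrors the usual convention of reporting failures as error codes, so the ceph command could print the message and exit non-zero. Which errno a real fix should use (`ENOENT` here, `EINVAL` would also be defensible) is an assumption of this sketch.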
