Project

General

Profile

Bug #5171

Updated by Joao Eduardo Luis almost 11 years ago

Rebooted cluster (3 nodes, 3 monitors). After it one of monitors sync, update database, then die (assertion) and then always want to use address of other mon (IMHO). 

 There are log tail (monitor mon.1, on address 10.227.227.101, but trying to use 10.227.227.103 from mon.3): 

 <pre> 
    

    -25> 2013-05-25 01:26:48.662849 7fc062a83700    1 mon.1@0(synchronizing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere 
    -24> 2013-05-25 01:26:48.662852 7fc062a83700    1 -- 10.227.227.101:6789/0 mark_down 0x7fc048004900 -- pipe dne 
    -23> 2013-05-25 01:26:48.662858 7fc062a83700    5 mon.1@0(synchronizing) e1 waitlisting message auth(proto 0 26 bytes epoch 1) v1 
    -22> 2013-05-25 01:26:48.662862 7fc062a83700    1 mon.1@0(synchronizing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere 
    -21> 2013-05-25 01:26:48.662865 7fc062a83700    1 -- 10.227.227.101:6789/0 mark_down 0x7fc0480056b0 -- pipe dne 
    -20> 2013-05-25 01:26:48.662871 7fc062a83700    5 mon.1@0(synchronizing) e1 waitlisting message auth(proto 0 26 bytes epoch 1) v1 
    -19> 2013-05-25 01:26:48.662874 7fc062a83700    1 mon.1@0(synchronizing) e1 discarding message auth(proto 0 30 bytes epoch 0) v1 and sending client elsewhere 
    -18> 2013-05-25 01:26:48.662877 7fc062a83700    1 -- 10.227.227.101:6789/0 mark_down 0x7fc048002f80 -- pipe dne 
    -17> 2013-05-25 01:26:48.662883 7fc062a83700    1 mon.1@0(synchronizing) e1 discarding message auth(proto 0 30 bytes epoch 1) v1 and sending client elsewhere 
    -16> 2013-05-25 01:26:48.662885 7fc062a83700    1 -- 10.227.227.101:6789/0 mark_down 0x7fc048001060 -- pipe dne 
    -15> 2013-05-25 01:26:48.662890 7fc062a83700    1 mon.1@0(synchronizing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere 
    -14> 2013-05-25 01:26:48.662892 7fc062a83700    1 -- 10.227.227.101:6789/0 mark_down 0x7fc048006e90 -- pipe dne 
    -13> 2013-05-25 01:26:48.662898 7fc062a83700    5 mon.1@0(synchronizing) e1 waitlisting message auth(proto 0 26 bytes epoch 1) v1 
    -12> 2013-05-25 01:26:48.662901 7fc062a83700    5 mon.1@0(synchronizing) e1 waitlisting message auth(proto 0 30 bytes epoch 0) v1 
    -11> 2013-05-25 01:26:48.662948 7fc062282700    5 mon.1@0(synchronizing) e1 waitlisting message auth(proto 0 30 bytes epoch 1) v1 
    -10> 2013-05-25 01:26:48.662989 7fc062282700    1 -- 10.227.227.101:6789/0 <== client.26092 10.227.227.104:0/2032174 1 ==== auth(proto 0 30 bytes epoch 1) v1 ==== 60+0+0 (681331102 0 0) 0x7fbeb8006270 con 0x7fc0480008c0 
     -9> 2013-05-25 01:26:48.663013 7fc062282700    5 mon.1@0(synchronizing) e1 waitlisting message auth(proto 0 30 bytes epoch 1) v1 
     -8> 2013-05-25 01:26:48.663022 7fc062282700    1 -- 10.227.227.101:6789/0 <== client.26095 10.227.227.104:0/3032174 1 ==== auth(proto 0 30 bytes epoch 1) v1 ==== 60+0+0 (2064528846 0 0) 0x7fbf3c00a940 con 0x7fc048000a20 
     -7> 2013-05-25 01:26:48.663035 7fc062282700    5 mon.1@0(synchronizing) e1 waitlisting message auth(proto 0 30 bytes epoch 1) v1 
     -6> 2013-05-25 01:26:48.915526 7fc060f7d700    1 -- 10.227.227.101:6789/0 >> :/0 pipe(0x7fc0480035d0 sd=27 :6789 s=0 pgs=0 cs=0 l=0).accept sd=27 10.227.227.101:35927/0 
     -5> 2013-05-25 01:26:48.915929 7fc062282700    1 -- 10.227.227.101:6789/0 <== osd.1 10.227.227.101:6805/29217 1 ==== auth(proto 0 26 bytes epoch 1) v1 ==== 56+0+0 (1484277464 0 0) 0x7fc00000ee40 con 0x7fc0480027b0 
     -4> 2013-05-25 01:26:48.915974 7fc062282700    5 mon.1@0(synchronizing) e1 waitlisting message auth(proto 0 26 bytes epoch 1) v1 
     -3> 2013-05-25 01:26:48.981958 7fc062282700    1 -- 10.227.227.101:6789/0 <== mon.1 10.227.227.103:6789/0 16632 ==== election(2b558bc4-8db2-44d3-8fac-9d37d9ed0549 propose 2895) v4 ==== 525+0+0 (373874603 0 0) 0x7fc03c000b40 con 0x1d99790 
     -2> 2013-05-25 01:26:49.039211 7fc026534700    2 -- 10.227.227.101:6789/0 >> 10.227.227.101:6800/29194 pipe(0x7fc048001490 sd=24 :6789 s=2 pgs=622 cs=1 l=1).reader couldn't read tag, Success 
     -1> 2013-05-25 01:26:49.039255 7fc026534700    2 -- 10.227.227.101:6789/0 >> 10.227.227.101:6800/29194 pipe(0x7fc048001490 sd=24 :6789 s=2 pgs=622 cs=1 l=1).fault 0: Success 
      0> 2013-05-25 01:26:49.225640 7fc062a83700 -1 mon/Monitor.cc: In function 'void Monitor::sync_timeout(entity_inst_t&)' thread 7fc062a83700 time 2013-05-25 01:26:49.224458 
 mon/Monitor.cc: 1099: FAILED assert(sync_role == SYNC_ROLE_REQUESTER) 

  ceph version 0.62-350-gbb407bf (bb407bfd101d9707c63655cb3c1b51346a0de8cd) 
  1: (Monitor::sync_timeout(entity_inst_t&)+0x9eb) [0x4d240b] 
  2: (Context::complete(int)+0xa) [0x4e255a] 
  3: (SafeTimer::timer_thread()+0x1c2) [0x6962a2] 
  4: (SafeTimerThread::entry()+0xd) [0x697e3d] 
  5: (()+0x8216) [0x7fc0670f7216] 
  6: (clone()+0x6d) [0x7fc0655d204d] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 

 --- logging levels --- 
    0/ 5 none 
    0/ 1 lockdep 
    0/ 1 context 
    1/ 1 crush 
    1/ 5 mds 
    1/ 5 mds_balancer 
    1/ 5 mds_locker 
    1/ 5 mds_log 
    1/ 5 mds_log_expire 
    1/ 5 mds_migrator 
    0/ 1 buffer 
    0/ 1 timer 
    0/ 1 filer 
    0/ 1 striper 
    0/ 1 objecter 
    0/ 5 rados 
    0/ 5 rbd 
    0/ 5 journaler 
    0/ 5 objectcacher 
    0/ 5 client 
    0/ 5 osd 
    0/ 5 optracker 
    0/ 5 objclass 
    1/ 3 filestore 
    1/ 3 journal 
    0/ 5 ms 
    1/ 5 mon 
    0/10 monc 
    0/ 5 paxos 
    0/ 5 tp 
    1/ 5 auth 
    1/ 5 crypto 
    1/ 1 finisher 
    1/ 5 heartbeatmap 
    1/ 5 perfcounter 
    1/ 5 rgw 
    1/ 5 hadoop 
    1/ 5 javaclient 
    1/ 5 asok 
    1/ 1 throttle 
   -2/-2 (syslog threshold) 
   -1/-1 (stderr threshold) 
   max_recent       10000 
   max_new           1000 
   log_file /var/log/ceph/ceph-mon.1.log 
 --- end dump of recent events --- 
 2013-05-25 01:26:49.364326 7fc062a83700 -1 *** Caught signal (Aborted) ** 
  in thread 7fc062a83700 

  ceph version 0.62-350-gbb407bf (bb407bfd101d9707c63655cb3c1b51346a0de8cd) 
  1: /usr/bin/ceph-mon() [0x5c4fbd] 
  2: (()+0x10630) [0x7fc0670ff630] 
  3: (gsignal()+0x39) [0x7fc0654ce8c9] 
  4: (abort()+0x160) [0x7fc0654cff80] 
  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fc065e324f5] 
  6: (()+0x63666) [0x7fc065e30666] 
  7: (()+0x63693) [0x7fc065e30693] 
  8: (()+0x638be) [0x7fc065e308be] 
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x425) [0x6a2485] 
  10: (Monitor::sync_timeout(entity_inst_t&)+0x9eb) [0x4d240b] 
  11: (Context::complete(int)+0xa) [0x4e255a] 
  12: (SafeTimer::timer_thread()+0x1c2) [0x6962a2] 
  13: (SafeTimerThread::entry()+0xd) [0x697e3d] 
  14: (()+0x8216) [0x7fc0670f7216] 
  15: (clone()+0x6d) [0x7fc0655d204d] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 

 --- begin dump of recent events --- 
      0> 2013-05-25 01:26:49.364326 7fc062a83700 -1 *** Caught signal (Aborted) ** 
  in thread 7fc062a83700 

  ceph version 0.62-350-gbb407bf (bb407bfd101d9707c63655cb3c1b51346a0de8cd) 
  1: /usr/bin/ceph-mon() [0x5c4fbd] 
  2: (()+0x10630) [0x7fc0670ff630] 
  3: (gsignal()+0x39) [0x7fc0654ce8c9] 
  4: (abort()+0x160) [0x7fc0654cff80] 
  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fc065e324f5] 
  6: (()+0x63666) [0x7fc065e30666] 
  7: (()+0x63693) [0x7fc065e30693] 
  8: (()+0x638be) [0x7fc065e308be] 
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x425) [0x6a2485] 
  10: (Monitor::sync_timeout(entity_inst_t&)+0x9eb) [0x4d240b] 
  11: (Context::complete(int)+0xa) [0x4e255a] 
  12: (SafeTimer::timer_thread()+0x1c2) [0x6962a2] 
  13: (SafeTimerThread::entry()+0xd) [0x697e3d] 
  14: (()+0x8216) [0x7fc0670f7216] 
  15: (clone()+0x6d) [0x7fc0655d204d] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 

 --- logging levels --- 
    0/ 5 none 
    0/ 1 lockdep 
    0/ 1 context 
    1/ 1 crush 
    1/ 5 mds 
    1/ 5 mds_balancer 
    1/ 5 mds_locker 
    1/ 5 mds_log 
    1/ 5 mds_log_expire 
    1/ 5 mds_migrator 
    0/ 1 buffer 
    0/ 1 timer 
    0/ 1 filer 
    0/ 1 striper 
    0/ 1 objecter 
    0/ 5 rados 
    0/ 5 rbd 
    0/ 5 journaler 
    0/ 5 objectcacher 
    0/ 5 client 
    0/ 5 osd 
    0/ 5 optracker 
    0/ 5 objclass 
    1/ 3 filestore 
    1/ 3 journal 
    0/ 5 ms 
    1/ 5 mon 
    0/10 monc 
    0/ 5 paxos 
    0/ 5 tp 
    1/ 5 auth 
    1/ 5 crypto 
    1/ 1 finisher 
    1/ 5 heartbeatmap 
    1/ 5 perfcounter 
    1/ 5 rgw 
    1/ 5 hadoop 
    1/ 5 javaclient 
    1/ 5 asok 
    1/ 1 throttle 
   -2/-2 (syslog threshold) 
   -1/-1 (stderr threshold) 
   max_recent       10000 
   max_new           1000 
   log_file /var/log/ceph/ceph-mon.1.log 
 --- end dump of recent events --- 
 2013-05-25 01:30:01.148597 7f44053d77c0    0 ceph version 0.62-350-gbb407bf (bb407bfd101d9707c63655cb3c1b51346a0de8cd), process ceph-mon, pid 14126 
 2013-05-25 01:30:01.688168 7f44053d77c0 -1 accepter.accepter.bind unable to bind to 10.227.227.103:6789: Cannot assign requested address 
 </pre>

Back