MDS daemon msgr-worker-2 thread crash
I found the following in the MDS log:
2018-08-17 19:07:03.523167 7f7418023700 0 -- 192.168.212.28:6801/3119423490 >> 192.168.213.61:0/1349706434 conn(0x560b3c1f6800 :6801 s=STATE_OPEN pgs=23126 cs=17 l=0).process bad tag 102
2018-08-17 19:07:03.524336 7f7418023700 0 -- 192.168.212.28:6801/3119423490 >> 192.168.213.61:0/1349706434 conn(0x560b399fb800 :6801 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 18 vs existing csq=17 existing_state=STATE_STANDBY
2018-08-17 19:07:03.558748 7f7418023700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f7418023700 thread_name:msgr-worker-2
 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x5bdfa4) [0x560b2befbfa4]
 2: (()+0x11390) [0x7f741bb7f390]
 3: (ceph::buffer::ptr::c_str()+0x23) [0x560b2befe333]
 4: (AsyncConnection::_process_connection()+0x141b) [0x560b2c2c81ab]
 5: (AsyncConnection::process()+0x1ae8) [0x560b2c2cdb98]
 6: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa08) [0x560b2bfe3128]
 7: (()+0x6a90b8) [0x560b2bfe70b8]
 8: (()+0xb8c80) [0x7f741b47bc80]
 9: (()+0x76ba) [0x7f741bb756ba]
 10: (clone()+0x6d) [0x7f741abe141d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#4 Updated by Michael Yang 3 months ago
John Spray wrote:
On its own, this probably isn't going to be enough to diagnose an issue -- the crash may be caused by something bad that another thread did.
Has this happened again since?
No, it only happened once, while the CephFS metadata pool was rebalancing after I added more OSDs.