Bug #34525

MDS Daemon msgr-worker-2 thread crash

Added by Michael Yang 4 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 08/30/2018
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:

Description

I found the following in the MDS log:

2018-08-17 19:07:03.523167 7f7418023700  0 -- 192.168.212.28:6801/3119423490 >> 192.168.213.61:0/1349706434 conn(0x560b3c1f6800 :6801 s=STATE_OPEN pgs=23126 cs=17 l=0).process bad tag 102
2018-08-17 19:07:03.524336 7f7418023700  0 -- 192.168.212.28:6801/3119423490 >> 192.168.213.61:0/1349706434 conn(0x560b399fb800 :6801 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 18 vs existing csq=17 existing_state=STATE_STANDBY
2018-08-17 19:07:03.558748 7f7418023700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f7418023700 thread_name:msgr-worker-2

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x5bdfa4) [0x560b2befbfa4]
 2: (()+0x11390) [0x7f741bb7f390]
 3: (ceph::buffer::ptr::c_str()+0x23) [0x560b2befe333]
 4: (AsyncConnection::_process_connection()+0x141b) [0x560b2c2c81ab]
 5: (AsyncConnection::process()+0x1ae8) [0x560b2c2cdb98]
 6: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa08) [0x560b2bfe3128]
 7: (()+0x6a90b8) [0x560b2bfe70b8]
 8: (()+0xb8c80) [0x7f741b47bc80]
 9: (()+0x76ba) [0x7f741bb756ba]
 10: (clone()+0x6d) [0x7f741abe141d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
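
For anyone triaging this trace, a minimal sketch of how the frames above could be split into fields before looking them up in a disassembly. The helper name `parse_frames` and the regular expression are illustrative assumptions, not part of any Ceph tooling; the sketch only assumes the `N: (symbol+0xOFFSET) [0xADDRESS]` layout seen in this log.

```python
import re

# Assumed frame layout, taken from the trace in this report:
#   " 3: (ceph::buffer::ptr::c_str()+0x23) [0x560b2befe333]"
FRAME_RE = re.compile(
    r"^\s*(\d+): \((.*)\+(0x[0-9a-f]+)\) \[(0x[0-9a-f]+)\]\s*$"
)


def parse_frames(text):
    """Return a list of dicts, one per backtrace frame found in text.

    parse_frames is a hypothetical helper written for this example.
    Frames printed as "(()+0x...)" carry no symbol name, so their
    "symbol" field is set to None.
    """
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # skip log lines that are not backtrace frames
        index, symbol, offset, address = match.groups()
        frames.append({
            "index": int(index),
            "symbol": None if symbol == "()" else symbol,
            "offset": int(offset, 16),
            "address": int(address, 16),
        })
    return frames


if __name__ == "__main__":
    sample = """\
 1: (()+0x5bdfa4) [0x560b2befbfa4]
 3: (ceph::buffer::ptr::c_str()+0x23) [0x560b2befe333]
 4: (AsyncConnection::_process_connection()+0x141b) [0x560b2c2c81ab]
"""
    for frame in parse_frames(sample):
        name = frame["symbol"] or "<unsymbolized>"
        print(frame["index"], name, hex(frame["offset"]))
```

Frames whose symbol comes back as `None` (1, 2, 7, 8 and 9 here) are the ones that still need the executable or the `objdump -rdS` output mentioned in the NOTE to be interpreted.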

ceph-mds.mds-jq7.log.gz (856 KB) Michael Yang, 09/13/2018 11:25 AM


Related issues

Related to RADOS - Bug #25027: mon: src/msg/async/AsyncConnection.cc: 1710: FAILED assert(can_write == WriteStatus::NOWRITE) In Progress 07/20/2018

History

#1 Updated by John Spray 3 months ago

  • Project changed from Ceph to fs
  • Category deleted (msgr)

On its own, this probably isn't going to be enough to diagnose an issue -- the crash may be caused by something bad that another thread did.

Has this happened again since?

#2 Updated by Patrick Donnelly 3 months ago

  • Related to Bug #25027: mon: src/msg/async/AsyncConnection.cc: 1710: FAILED assert(can_write == WriteStatus::NOWRITE) added

#3 Updated by Patrick Donnelly 3 months ago

  • Description updated (diff)
  • Target version set to v14.0.0
  • Tags deleted (Luminous 12.2.7)
  • Backport set to mimic,luminous
  • ceph-qa-suite deleted (fs)
  • Component(FS) MDS added
  • Labels (FS) crash added

May be related to #25027.

#4 Updated by Michael Yang 3 months ago

John Spray wrote:

On its own, this probably isn't going to be enough to diagnose an issue -- the crash may be caused by something bad that another thread did.

Has this happened again since?

No, it only happened once, while the CephFS metadata pool was rebalancing after I added more OSDs.

#5 Updated by Michael Yang 3 months ago

I have uploaded the log from the crashed MDS; see the attachment.
