Bug #18817

closed

MDS crash: thread_name:ms_dispatch

Added by Ahmed Akhuraidah about 7 years ago. Updated about 7 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: fs, kcephfs
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The MDS daemon crashed while I was running various load tests. It left the following dump:

--- begin dump of recent events ---
0> 2017-02-03 02:34:41.974639 7f7e8ec5e700 -1 *** Caught signal (Aborted) **
in thread 7f7e8ec5e700 thread_name:ms_dispatch

ceph version 10.2.4-211-g12b091b (12b091b4a40947aa43919e71a318ed0dcedc8734)
1: (()+0x5142a2) [0x557c51e092a2]
2: (()+0x10b00) [0x7f7e95df2b00]
3: (gsignal()+0x37) [0x7f7e93ccb8d7]
4: (abort()+0x13a) [0x7f7e93ccccaa]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x557c51f133d5]
6: (MutationImpl::~MutationImpl()+0x28e) [0x557c51bb9e1e]
7: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x39) [0x557c51b2ccf9]
8: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long, bool, unsigned long, utime_t)+0x9a7) [0x557c51ca2757]
9: (Locker::remove_client_cap(CInode*, client_t)+0xb1) [0x557c51ca38f1]
10: (Locker::_do_cap_release(client_t, inodeno_t, unsigned long, unsigned int, unsigned int)+0x90d) [0x557c51ca424d]
11: (Locker::handle_client_cap_release(MClientCapRelease*)+0x1cc) [0x557c51ca449c]
12: (MDSRank::handle_deferrable_message(Message*)+0xc1c) [0x557c51b33d3c]
13: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x557c51b3c991]
14: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x557c51b3dae5]
15: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x557c51b25703]
16: (DispatchQueue::entry()+0x78b) [0x557c5200d06b]
17: (DispatchQueue::DispatchThread::entry()+0xd) [0x557c51ee5dcd]
18: (()+0x8734) [0x7f7e95dea734]
19: (clone()+0x6d) [0x7f7e93d80d3d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

How to reproduce the issue:

Option 1: fio test. On the client machine, run fio with the job file below. Then execute "echo 3 > /proc/sys/vm/drop_caches" and run the test again.

[test]
blocksize=64k
filename=/mnt/mycephfs/payload3G
rw=randread
direct=1
buffered=0
ioengine=libaio
iodepth=8
runtime=300
filesize=3G
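Option 1 can be scripted roughly as follows. This is a minimal sketch under a few assumptions. fio must be installed on the client, and CephFS must be mounted at /mnt/mycephfs as in the mount output further down. The script must also run as root, because writing to /proc/sys/vm/drop_caches requires it. The job file path /tmp/mds-repro.fio and the function names write_fio_job and run_repro are hypothetical. The `sync` before dropping caches is an addition that is not part of the original steps.

```shell
#!/bin/bash
# Hypothetical helper: write the fio job from the report to the file given as $1.
write_fio_job() {
    cat > "$1" <<'EOF'
[test]
blocksize=64k
filename=/mnt/mycephfs/payload3G
rw=randread
direct=1
buffered=0
ioengine=libaio
iodepth=8
runtime=300
filesize=3G
EOF
}

# Hypothetical helper: run the job, drop the page cache, then run the job again.
# Assumes root privileges and a CephFS mount at /mnt/mycephfs.
run_repro() {
    local job=${1:-/tmp/mds-repro.fio}
    write_fio_job "$job" || return 1
    fio "$job" || return 1
    sync    # addition: flush dirty pages so drop_caches can free them
    echo 3 > /proc/sys/vm/drop_caches || return 1
    fio "$job"
}

# Only start the reproduction when invoked as: ./repro.sh run [jobfile]
if [ "${1:-}" = "run" ]; then
    run_repro "${2:-}"
fi
```

Keeping the job file generation in its own function means the job can be inspected without touching the cluster. Only `./repro.sh run` actually starts I/O.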

Option 2: an "inodes payload". Download the latest Joomla distribution, unzip it, and run:

time for i in `seq 100`; do cp -a joomla /mnt/mycephfs/joomla${i}; done
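The loop in option 2 can be wrapped in a function so that the copy count and the target directory are explicit. This is a minimal sketch. It assumes bash (for the `time` keyword on a shell function), an unzipped tree named joomla in the current directory, and CephFS mounted at /mnt/mycephfs. The function name copy_tree_n_times is hypothetical. Each iteration runs `cp -a` into a new joomla<N> directory, just like the one-liner.

```shell
#!/bin/bash
# Hypothetical helper: copy SRC into DEST_DIR COUNT times, naming the copies
# <basename of SRC>1 .. <basename of SRC>COUNT (same as the one-liner above).
copy_tree_n_times() {
    local src=$1 dest_dir=$2 count=$3
    local name
    name=$(basename "$src")
    local i=1
    while [ "$i" -le "$count" ]; do
        # cp -a preserves ownership, modes and timestamps, producing many inode updates
        cp -a "$src" "$dest_dir/$name$i" || return 1
        i=$((i + 1))
    done
}

# Only start the load when invoked as: ./inodes.sh run
if [ "${1:-}" = "run" ]; then
    time copy_tree_n_times joomla /mnt/mycephfs 100
fi
```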

My sandbox is simple. It consists of a server machine (cephnode below) and a client machine (payload below):

cephnode:~ # ceph -s
cluster c848af4a-98ea-498c-87d6-059ebf609287
health HEALTH_WARN
mds cephnode is laggy
monmap e1: 1 mons at {cephnode=192.168.10.20:6789/0}
election epoch 9, quorum 0 cephnode
fsmap e96: 1/1/1 up {0=cephnode=up:active(laggy or crashed)}
osdmap e96: 1 osds: 1 up, 1 in
flags sortbitwise,require_jewel_osds
pgmap v1832: 204 pgs, 3 pools, 3072 MB data, 787 objects
3117 MB used, 396 GB / 399 GB avail
204 active+clean

cephnode:~ # cat /etc/ceph/ceph.conf
[global]
fsid = c848af4a-98ea-498c-87d6-059ebf609287
mon_initial_members = cephnode
mon_host = 192.168.10.20
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_size = 1
mds_log = false

cephnode:~ # lsb_release -a
LSB Version: n/a
Distributor ID: SUSE
Description: SUSE Linux Enterprise Server 12 SP2
Release: 12.2
Codename: n/a

cephnode:~ # uname -a
Linux cephnode 4.4.38-93-default #1 SMP Wed Dec 14 12:59:43 UTC 2016 (2d3e9d4) x86_64 x86_64 x86_64 GNU/Linux

cephnode:~ # ceph -v
ceph version 10.2.4-211-g12b091b (12b091b4a40947aa43919e71a318ed0dcedc8734)

payload:~ # ceph -v
ceph version 10.2.4-211-g12b091b (12b091b4a40947aa43919e71a318ed0dcedc8734)

payload:~ # mount
...
192.168.10.20:6789:/ on /mnt/mycephfs type ceph (rw,relatime,name=admin,secret=<hidden>,acl)

payload:~ # lsb_release -a
LSB Version: n/a
Distributor ID: SUSE
Description: SUSE Linux Enterprise Server 12 SP2
Release: 12.2
Codename: n/a

payload:~ # uname -a
Linux payload 4.4.38-93-default #1 SMP Wed Dec 14 12:59:43 UTC 2016 (2d3e9d4) x86_64 x86_64 x86_64 GNU/Linux

#1

Updated by Greg Farnum about 7 years ago

  • Status changed from New to Duplicate