Bug #21070 (closed)

MDS: MDS is laggy or crashed When deleting a large number of files

Added by huanwen ren over 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: Zheng Yan
% Done: 0%
Backport: luminous
Regression: No
Severity: 3 - minor
ceph-qa-suite: fs

Description

We used mdtest to create on the order of 1,000,000 files in a ceph-fuse mounted directory, with the following command:

./mdtest -d /cephfs/test/ -z 1 -w 10k 1 -T -b 10 -L -n 1000000 -F -C

File creation completed normally, but when we deleted the files with rm -rf, the following error occurred:
services:
    mon: 2 daemons, quorum 192.9.9.221,192.9.9.223
    mgr: 192.9.9.221(active), standbys: 192.9.9.223
    mds: 1/1/1 up {0=192.9.9.223=up:active(laggy or crashed)}
    osd: 19 osds: 19 up, 19 in

The MDS log shows the following error:

2017-08-22 15:29:11.017122 7ffb752f8700  1 mds.0.28 recovery_done -- successful recovery!
2017-08-22 15:29:11.017247 7ffb752f8700  1 mds.0.28 active_start
2017-08-22 15:29:11.018963 7ffb752f8700  1 mds.0.28 cluster recovered.
2017-08-22 15:29:11.023689 7ffb6f2ec700 -1 *** Caught signal (Segmentation fault) **
 in thread 7ffb6f2ec700 thread_name:mds_rank_progr

 ceph version 12.1.0.3 (48ff484a1b75334f6a08c5b3dbdbb0abfa1cb2cf) luminous (dev)
 1: (()+0x58592f) [0x7ffb7e1d792f]
 2: (()+0xf130) [0x7ffb7bd25130]
 3: (Server::handle_client_readdir(boost::intrusive_ptr<MDRequestImpl>&)+0xbb9) [0x7ffb7df54989]
 4: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0x9b1) [0x7ffb7df85a51]
 5: (MDSInternalContextBase::complete(int)+0x1eb) [0x7ffb7e162dcb]
 6: (MDSRank::_advance_queues()+0x4a5) [0x7ffb7df0d665]
 7: (MDSRank::ProgressThread::entry()+0x4a) [0x7ffb7df0daca]
 8: (()+0x7df3) [0x7ffb7bd1ddf3]
 9: (clone()+0x6d) [0x7ffb7ae033ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Note: version 12.1.0.3 is derived from version 12.1.0.


Files

readdir.patch (838 Bytes) — Zheng Yan, 08/24/2017 10:48 AM

Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #21357: luminous: mds: segfault during `rm -rf` of large directory (Resolved, Zheng Yan)
#1

Updated by huanwen ren over 6 years ago

I raised the log level to investigate the problem; the log information is as follows:

2017-08-22 16:46:37.846888 7fbda3aae700 10 mds.0.server rdlock_path_pin_ref request(client.1483804:4138670 cr=0x7fbdbc093500) #10000000007
2017-08-22 16:46:37.846890 7fbda3aae700 10 mds.0.locker acquire_locks request(client.1483804:4138670 cr=0x7fbdbc093500) - done locking
2017-08-22 16:46:37.846893 7fbda3aae700 20 Session check_access path /test/#test-dir.0/mdtest_tree.0/mdtest_tree.5
2017-08-22 16:46:37.846894 7fbda3aae700 10 MDSAuthCap is_capable inode(path /test/#test-dir.0/mdtest_tree.0/mdtest_tree.5 owner 0:0 mode 040755) by caller 0:0 mask 1 new 0:0 cap: MDSAuthCaps[allow *]
2017-08-22 16:46:37.846896 7fbda3aae700 10 mds.0.server  frag 111111* offset '' offset_hash 16776760 flags 1
2017-08-22 16:46:37.846898 7fbda3aae700 10 mds.0.server handle_client_readdir on [dir 10000000007.111111* /test/#test-dir.0/mdtest_tree.0/mdtest_tree.5/ [2,head] auth v=85827 cv=0/0 state=1073741826|complete f(v0 m2017-08-22 15:28:22.988204) n(v0 rc2017-08-22 15:28:22.988204) hs=0+389,ss=0+0 dirty=389 | child=1 waiter=0 authpin=0 0x7fbdbef91a80]
2017-08-22 16:46:37.846905 7fbda3aae700 10 mds.0.server snapid head
2017-08-22 16:46:37.848174 7fbda3aae700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fbda3aae700 thread_name:mds_rank_progr

 ceph version 12.1.0.3 (48ff484a1b75334f6a08c5b3dbdbb0abfa1cb2cf) luminous (dev)
 1: (()+0x58592f) [0x7fbdb299992f]
 2: (()+0xf130) [0x7fbdb04e7130]
 3: (Server::handle_client_readdir(boost::intrusive_ptr<MDRequestImpl>&)+0xbb9) [0x7fbdb2716989]
 4: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0x9b1) [0x7fbdb2747a51]
 5: (MDSInternalContextBase::complete(int)+0x1eb) [0x7fbdb2924dcb]
 6: (MDSRank::_advance_queues()+0x4a5) [0x7fbdb26cf665]
 7: (MDSRank::ProgressThread::entry()+0x4a) [0x7fbdb26cfaca]
 8: (()+0x7df3) [0x7fbdb04dfdf3]
 9: (clone()+0x6d) [0x7fbdaf5c53ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2017-08-22 16:46:36.950581 7fbda9aba700  7 mds.0.cache    dentry [dentry #100/stray0/100000bf8e

#2

Updated by xie xingguo over 6 years ago

  • Assignee set to Patrick Donnelly
  • Priority changed from Normal to Urgent
#3

Updated by huanwen ren over 6 years ago

I added some debug print statements, and the crash can be reproduced:

1. Output on the successful path:

2017-08-24 14:48:05.996554 7f8a0c3ca700 20 Session check_access path /test/#test-dir.0/mdtest_tree.0/mdtest_tree.7
2017-08-24 14:48:05.996556 7f8a0c3ca700 10 MDSAuthCap is_capable inode(path /test/#test-dir.0/mdtest_tree.0/mdtest_tree.7 owner 0:0 mode 040755) by caller 0:0 mask 1 new 0:0 cap: MDSAuthCaps[allow *]
2017-08-24 14:48:05.996561 7f8a0c3ca700 10 mds.0.server  frag 111111* offset 'file.mdtest.0.730836' offset_hash 16514491 flags 1
2017-08-24 14:48:05.996583 7f8a0c3ca700 10 mds.0.server handle_client_readdir on [dir 0x10000000009.111111* /test/#test-dir.0/mdtest_tree.0/mdtest_tree.7/ [2,head] auth v=82019 cv=82019/82019 state=1073741826|complete f(v0 m2017-08-22 14:55:59.141856 1517=1517+0) n(v0 rc2017-08-22 14:55:59.141856 b15170 1517=1517+0) hs=1517+0,ss=0+0 | child=1 waiter=0 authpin=0 0x7f8a25bd4d80]
2017-08-24 14:48:05.996593 7f8a0c3ca700 10 mds.0.server snapid head
2017-08-24 14:48:05.996594 7f8a0c3ca700 10 mds.0.server test1
2017-08-24 14:48:05.996595 7f8a0c3ca700 10 mds.0.server test2
2017-08-24 14:48:05.996597 7f8a0c3ca700 10 mds.0.server test3
2017-08-24 14:48:05.996598 7f8a0c3ca700 10 mds.0.server test4 589758
2017-08-24 14:48:05.996599 7f8a0c3ca700 10 mds.0.server test5
2017-08-24 14:48:05.996600 7f8a0c3ca700 10 mds.0.server test6
2017-08-24 14:48:05.996601 7f8a0c3ca700 10 mds.0.server test6-1
2017-08-24 14:48:05.996602 7f8a0c3ca700 10 mds.0.server dn1-10x7f8a357dbdc0
2017-08-24 14:48:05.996603 7f8a0c3ca700 10 mds.0.server dn1-2 0x7f8a357dbdc0
2017-08-24 14:48:05.996604 7f8a0c3ca700 10 mds.0.server dn1-3[dentry #0x1/test/#test-dir.0/mdtest_tree.0/mdtest_tree.7/file.mdtest.0.775074 [2,head] auth (dversion lock) pv=0 v=82019 inode=0x7f8a3600d200 state=1073741824 0x7f8a357dbdc0]
2017-08-24 14:48:05.996608 7f8a0c3ca700 10 mds.0.server dn2->inode0x7f8a25ae3000
2017-08-24 14:48:05.996609 7f8a0c3ca700 10 mds.0.server dn3->linkage0x7f8a3600d200
2017-08-24 14:48:05.996610 7f8a0c3ca700 10 mds.0.server dn20x7f8a357dbdc0
2017-08-24 14:48:05.996611 7f8a0c3ca700 10 mds.0.server test6-2
2017-08-24 14:48:05.996612 7f8a0c3ca700 10 mds.0.server test6-3
2017-08-24 14:48:05.996613 7f8a0c3ca700 10 mds.0.server test6-4
2017-08-24 14:48:05.996614 7f8a0c3ca700 10 mds.0.server test6-5

2. Output on the failing path:

2017-08-24 14:55:40.930126 7f8a0abc7700 10 mds.0.server  frag 111* offset '' offset_hash 16777078 flags 1
2017-08-24 14:55:40.930130 7f8a0abc7700 10 mds.0.server handle_client_readdir on [dir 0x10000000009.111* /test/#test-dir.0/mdtest_tree.0/mdtest_tree.7/ [2,head] auth v=85713 cv=85407/85407 state=1610612738|complete f(v0 m2017-08-24 14:55:39.890505) n(v0 rc2017-08-24 14:55:39.890505) hs=0+153,ss=0+0 dirty=153 | child=1 frozen=0 dirty=1 waiter=0 authpin=0 0x7f8a25101740]
2017-08-24 14:55:40.930138 7f8a0abc7700 10 mds.0.server snapid head
2017-08-24 14:55:40.930152 7f8a0abc7700 10 mds.0.server test1
2017-08-24 14:55:40.930152 7f8a0abc7700 10 mds.0.server test2
2017-08-24 14:55:40.930156 7f8a0abc7700 10 mds.0.server test3
2017-08-24 14:55:40.930157 7f8a0abc7700 10 mds.0.server test4 589758
2017-08-24 14:55:40.930158 7f8a0abc7700 10 mds.0.server test5
2017-08-24 14:55:40.930163 7f8a0abc7700 10 mds.0.server test6
2017-08-24 14:55:40.930164 7f8a0abc7700 10 mds.0.server test6-1
2017-08-24 14:55:40.930165 7f8a0abc7700 10 mds.0.server dn1-10x600000000000099
2017-08-24 14:55:40.930166 7f8a0abc7700 10 mds.0.server dn1-2 0x600000000000099
2017-08-24 14:55:40.934081 7f8a0abc7700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f8a0abc7700 thread_name:fn_anonymous

 ceph version 12.1.2-589-gc8c837f (c8c837f665edc6e32efda5f15dedd8a48698214a) luminous (rc)
 1: (()+0x594dd1) [0x7f8a19ac1dd1]
 2: (()+0xf130) [0x7f8a17600130]
 3: (CDentry::make_path(filepath&, bool) const+0x7) [0x7f8a199ba4b7]
 4: (operator<<(std::ostream&, CDentry const&)+0x77) [0x7f8a199ba607]
 5: (Server::handle_client_readdir(boost::intrusive_ptr<MDRequestImpl>&)+0x110b) [0x7f8a1983af1b]
 6: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0x9c1) [0x7f8a1986dc31]
 7: (MDSInternalContextBase::complete(int)+0x1eb) [0x7f8a19a4be6b]
 8: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x7f8a1980982c]

The testN entries are the debug prints I added, as follows:

dout(10) << "test5 " << dendl;
  for (CDir::map_t::iterator it = start ? dir->begin() : dir->lower_bound(skip_key);
       !end && numfiles < max;
       end = (it == dir->end())) {
    dout(10) << "test6 " << dendl;
    CDentry *dn = it->second;
    ++it;
    dout(10) << "test6-1 " << dendl;
    if (!dn) {
       dout(10) << "test6-1111 " << dendl;
       continue;
    }
    dout(10) << "dn1-1" << dn << dendl;
    dout(10) << "dn1-2 " << dn << dendl;
    dout(10) << "dn1-3" << *dn << dendl;
    dout(10) << "dn2->inode" << dn->get_dir()->inode << dendl;
    dout(10) << "dn3->linkage" << dn->get_linkage()->get_inode() << dendl;
    dout(10) << "dn2" << dn << dendl;
    if (dn->state_test(CDentry::STATE_PURGING))
      continue;

#4

Updated by huanwen ren over 6 years ago

The address of dn is invalid: the dn1-1 print has no space between the label and the pointer, so "dn1-10x600000000000099" means dn == 0x600000000000099, which is not a plausible address. I have not found the reason yet.

#5

Updated by Zheng Yan over 6 years ago

Please try the attached patch.

#6

Updated by Zheng Yan over 6 years ago

  • Assignee changed from Patrick Donnelly to Zheng Yan
#7

Updated by huanwen ren over 6 years ago

OK, I will test it.

#8

Updated by huanwen ren over 6 years ago

In addition, setting "offset_hash" to 0 when offset_str is empty also avoids the problem.
The change is as follows:

  __u32 offset_hash = 0;
  if (!offset_str.empty())
    offset_hash = ceph_frag_value(diri->hash_dentry_name(offset_str));
  else
-   offset_hash = (__u32)req->head.args.readdir.offset_hash;
+   offset_hash = 0;
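To illustrate the two branches above: entries in a dirfrag are ordered by name hash, and a readdir can resume either from the last name returned (offset_str) or from a bare hash with no name attached (offset_hash). Below is a small self-contained toy in C++ (the hash function and map layout are stand-ins, not the real Ceph structures) showing the difference between the two resume modes:

// Toy model, not Ceph code: a "directory" ordered by (name-hash, name),
// roughly like entries within a dirfrag.
#include <cstdint>
#include <functional>
#include <iostream>
#include <map>
#include <string>

static uint32_t hash_name(const std::string &n) {
  // Stand-in for Ceph's dentry name hash.
  return static_cast<uint32_t>(std::hash<std::string>{}(n));
}

int main() {
  using Key = std::pair<uint32_t, std::string>;  // (hash, name)
  std::map<Key, bool> dir;
  for (const char *n : {"a", "bb", "ccc", "dddd"})
    dir[{hash_name(n), n}] = true;

  // Mode 1: resume after a known name (offset_str non-empty): skip exactly
  // past the ("bb", hash) entry that was already returned.
  auto it1 = dir.upper_bound(Key{hash_name("bb"), "bb"});

  // Mode 2: resume from a bare hash (offset_str empty, offset_hash set),
  // as after an lseek on the directory fd: no name is available, so the
  // best the server can do is start at that hash with an empty name.
  auto it2 = dir.lower_bound(Key{hash_name("bb"), std::string()});

  if (it1 != dir.end())
    std::cout << "resume by name: next is " << it1->first.second << "\n";
  if (it2 != dir.end())
    std::cout << "resume by hash: next is " << it2->first.second << "\n";
  return 0;
}

In mode 2 the server has nothing but the hash to go on; discarding it, as in the change above, avoids the crash but means such a listing restarts from the beginning of the fragment, which is why this is only a verification hack rather than a real fix.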
#9

Updated by Zheng Yan over 6 years ago

Does my patch alone not work? Your change would make seeky readdir on a directory inefficient.

#10

Updated by huanwen ren over 6 years ago

Your patch solves the problem; I tested it twice today. :-)

My modification was only meant to verify that the problem is caused by the dirfrag offset; it is not a good solution.
One question: under what circumstances can offset_str be empty while offset_hash still carries offset data? Does this happen when a dirfrag splits at the end of the object?

Thank you.

#11

Updated by Zheng Yan over 6 years ago

This happens with a seeky readdir on a large directory, for example:

int fd = open("/mnt/ceph", O_RDONLY | O_DIRECTORY);
lseek(fd, xxxxxx, SEEK_SET);
getdents(fd, ...);
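For completeness, here is a hedged, runnable sketch of that pattern on Linux (the mount path and the seek offset are placeholders, and getdents64 is invoked through syscall(2)):

// Minimal "seeky readdir" sketch: open a directory, seek to a raw offset,
// then read entries from there. On CephFS the offset encodes a position by
// name hash, so the MDS receives a hash with no dentry name attached.
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

struct linux_dirent64 {
  uint64_t       d_ino;
  int64_t        d_off;
  unsigned short d_reclen;
  unsigned char  d_type;
  char           d_name[];  // flexible array member (GNU extension in C++)
};

int main() {
  int fd = open("/mnt/ceph", O_RDONLY | O_DIRECTORY);  // placeholder path
  if (fd < 0) { perror("open"); return 1; }

  // Placeholder offset; in practice this would be a d_off value saved from
  // an earlier getdents64 call by a process resuming a long listing.
  lseek(fd, 0x1000000, SEEK_SET);

  char buf[8192];
  long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
  for (long pos = 0; pos < n;) {
    auto *d = reinterpret_cast<linux_dirent64 *>(buf + pos);
    printf("%s\n", d->d_name);
    pos += d->d_reclen;
  }
  close(fd);
  return 0;
}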

#12

Updated by Zheng Yan over 6 years ago

  • Status changed from New to Fix Under Review
  • Backport set to luminous
#13

Updated by Zheng Yan over 6 years ago

  • Backport changed from luminous to luminous,jewel
#14

Updated by Nathan Cutler over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
#15

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21357: luminous: mds: segfault during `rm -rf` of large directory added
#16

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21450: jewel: MDS: MDS is laggy or crashed When deleting a large number of files added
#17

Updated by Zheng Yan about 6 years ago

  • Backport changed from luminous,jewel to luminous
#18

Updated by Nathan Cutler about 6 years ago

  • Copied to deleted (Backport #21450: jewel: MDS: MDS is laggy or crashed When deleting a large number of files)
#19

Updated by Zheng Yan about 6 years ago

  • Status changed from Pending Backport to Resolved