Bug #17069

multimds: slave rmdir assertion failure

Added by Patrick Donnelly over 7 years ago. Updated about 5 years ago.

Status: Closed
Priority: Low
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Assertion: /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-1382-g253f285/src/mds/Server.cc: 5784: FAILED assert(straydn->first >= in->first)
ceph version v11.0.0-1382-g253f285 (253f28556c8dead17806deeb49917246bdbed8ea)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x91d3fb]
 2: (Server::handle_slave_rmdir_prep(std::shared_ptr<MDRequestImpl>&)+0x11d6) [0x6559d6]
 3: (Server::dispatch_slave_request(std::shared_ptr<MDRequestImpl>&)+0x71b) [0x666c9b]
 4: (Server::handle_slave_request(MMDSSlaveRequest*)+0x8fc) [0x67008c]
 5: (Server::dispatch(Message*)+0x69b) [0x670deb]
 6: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x5efdac]
 7: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x5f9be1]
 8: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x5fad35]
 9: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x5e8503]
 10: (DispatchQueue::entry()+0x78b) [0xab84db]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x97dded]
 12: (()+0x8184) [0x7f365a3f3184]
 13: (clone()+0x6d) [0x7f3658d6037d]

In the first run:

http://pulpito.ceph.com/pdonnell-2016-08-10_03:03:20-multimds-master---basic-mira/357018/
http://pulpito.ceph.com/pdonnell-2016-08-10_03:03:20-multimds-master---basic-mira/357232/

and in another run:

http://pulpito.ceph.com/pdonnell-2016-08-11_20:40:52-multimds-master-testing-basic-mira/358681/
http://pulpito.ceph.com/pdonnell-2016-08-11_20:40:52-multimds-master-testing-basic-mira/358719/
http://pulpito.ceph.com/pdonnell-2016-08-11_20:40:52-multimds-master-testing-basic-mira/358910/

Here is an excerpt from one of the MDS logs:

2016-08-12 20:20:23.165872 7f1918db9700 20 mds.1.cache.dir(1000000000d) lookup (head, 'copyofsnap1')
2016-08-12 20:20:23.165875 7f1918db9700 20 mds.1.cache.dir(1000000000d)   hit -> (copyofsnap1,head)
2016-08-12 20:20:23.165878 7f1918db9700 10 mds.1.cache path_traverse finish on snapid head
2016-08-12 20:20:23.165880 7f1918db9700 10 mds.1.server  dn [dentry #1/client.0/tmp/copyofsnap1 [d6,head] rep@2,-2.1 (dn lock) (dversion lock) v=15128 inode=0xd41af90 | request=0 lock=0 inodepin=1 dirty=0 authpin=0 tempexporting=0 clientlease=0 0xd133f40]
2016-08-12 20:20:23.165893 7f1918db9700 10 mds.1.server  straydn [dentry #102/stray5/2000000086e [2,head] rep@2,-2.1 NULL (dn lock) (dversion lock) v=0 inode=0 | request=1 0xa77bb60]
2016-08-12 20:20:23.165904 7f1918db9700 20 mds.1.server  rollback is 77 bytes
2016-08-12 20:20:23.165907 7f1918db9700 10 mds.1.server  no auth subtree in [inode 2000000086e [...da,head] /client.0/tmp/copyofsnap1/ rep@2.1 v33032 f(v1 m2016-08-12 20:20:21.897837) n(v7 rc2016-08-12 20:20:18.090778 b20298 437=398+39)/n(v1 rc2016-08-12 20:19:02.350605 1034=946+88) (ilink lock) (inest lock) (iversion lock) caps={4131=pAsXs/p@9} | dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 exportingcaps=0 dirtyparent=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0xd41af90], skipping journal
2016-08-12 20:20:23.165933 7f1918db9700 12 mds.1.cache.dir(1000000000d) unlink_inode [dentry #1/client.0/tmp/copyofsnap1 [d6,head] rep@2,-2.1 (dn lock) (dversion lock) v=15128 inode=0xd41af90 | request=1 lock=0 inodepin=1 dirty=0 authpin=0 tempexporting=0 clientlease=0 0xd133f40] [inode 2000000086e [...da,head] /client.0/tmp/copyofsnap1/ rep@2.1 v33032 f(v1 m2016-08-12 20:20:21.897837) n(v7 rc2016-08-12 20:20:18.090778 b20298 437=398+39)/n(v1 rc2016-08-12 20:19:02.350605 1034=946+88) (ilink lock) (inest lock) (iversion lock) caps={4131=pAsXs/p@9} | dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 exportingcaps=0 dirtyparent=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0xd41af90]
2016-08-12 20:20:23.165963 7f1918db9700 12 mds.1.cache.dir(619) link_primary_inode [dentry #102/stray5/2000000086e [2,head] rep@2,-2.1 NULL (dn lock) (dversion lock) v=0 inode=0 | request=1 0xa77bb60] [inode 2000000086e [...da,head] #2000000086e/ rep@-2.1 v33032 f(v1 m2016-08-12 20:20:21.897837) n(v7 rc2016-08-12 20:20:18.090778 b20298 437=398+39)/n(v1 rc2016-08-12 20:19:02.350605 1034=946+88) (ilink lock) (inest lock) (iversion lock) caps={4131=pAsXs/p@9} | dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 exportingcaps=0 dirtyparent=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0xd41af90]
2016-08-12 20:20:23.169559 7f1918db9700 -1 /srv/autobuild-ceph/gitbuilder.git/build/rpmbuild/BUILD/ceph-11.0.0/src/mds/Server.cc: In function 'void Server::handle_slave_rmdir_prep(MDRequestRef&)' thread 7f1918db9700 time 2016-08-12 20:20:23.166004
/srv/autobuild-ceph/gitbuilder.git/build/rpmbuild/BUILD/ceph-11.0.0/src/mds/Server.cc: 5784: FAILED assert(straydn->first >= in->first)

 ceph version v11.0.0-1464-gec24ff0 (ec24ff0ceeaa735423bb113a4e522bb543e1bbcc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x9339b5]
 2: (Server::handle_slave_rmdir_prep(std::shared_ptr<MDRequestImpl>&)+0x11e9) [0x6511b9]
 3: (Server::dispatch_slave_request(std::shared_ptr<MDRequestImpl>&)+0x70b) [0x66966b]
 4: (Server::handle_slave_request(MMDSSlaveRequest*)+0x924) [0x672ce4]
 5: (Server::dispatch(Message*)+0x6db) [0x673a8b]
 6: (MDSRank::handle_deferrable_message(Message*)+0x82c) [0x5ef3fc]
 7: (MDSRank::_dispatch(Message*, bool)+0x207) [0x5f96d7]
 8: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x5fa835]
 9: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x5e7583]
 10: (DispatchQueue::entry()+0x78a) [0xadbafa]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x99820d]
 12: (()+0x7dc5) [0x7f191eaccdc5]
 13: (clone()+0x6d) [0x7f191dbb821d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
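
For context on the failed check: the assert at Server.cc:5784 requires the stray dentry's first valid snapid to be at least the inode's. In the log excerpt above, the stray dentry is [2,head] (first = 0x2) while the inode being unlinked is [...da,head] (first = 0xda), so the comparison fails. Below is a minimal standalone sketch of that comparison only; snapid_t, SimpleDentry, SimpleInode and link_into_stray are illustrative stand-ins, not the actual Ceph classes or code path.

// Minimal sketch of the failing invariant, not the real MDS implementation.
#include <cassert>
#include <cstdint>
#include <iostream>

using snapid_t = uint64_t;

struct SimpleInode  { snapid_t first; };                   // first snapid the head inode covers
struct SimpleDentry { snapid_t first; SimpleInode *in; };  // first snapid the dentry covers

// The slave rmdir prep relinks the doomed directory inode under a stray dentry
// (unlink_inode then link_primary_inode in the log). The stray dentry must not
// start before the inode's head version, i.e. straydn->first >= in->first.
void link_into_stray(SimpleDentry *straydn, SimpleInode *in) {
  assert(straydn->first >= in->first);  // the check at Server.cc:5784
  straydn->in = in;                     // roughly what link_primary_inode then does
}

int main() {
  SimpleInode  in{0xda};             // inode [...da,head] from the log
  SimpleDentry straydn{0x2, nullptr};// stray dentry [2,head] from the log
  std::cout << std::hex << "straydn->first=0x" << straydn.first
            << " in->first=0x" << in.first << std::endl;
  link_into_stray(&straydn, &in);    // 0x2 < 0xda, so this aborts like the MDS did
  return 0;
}
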
Actions #1

Updated by Zheng Yan over 7 years ago

Strange. Have you ever used snapshots on the testing cluster?

Actions #2

Updated by Zheng Yan over 7 years ago

  • Status changed from New to 12

Please don't run snapshot tests with multimds; they are known to be broken.

Actions #3

Updated by Zheng Yan over 7 years ago

  • Priority changed from High to Low

Snapshot bug; lowering priority.

Actions #4

Updated by John Spray almost 7 years ago

  • Status changed from 12 to Closed

Closing because snapshots+multimds is currently known to be broken.

Actions #5

Updated by Patrick Donnelly about 5 years ago

  • Category deleted (90)
  • Labels (FS) multimds added