Bug #18755

closed

multimds: MDCache.cc: 4735: FAILED assert(in)

Added by Patrick Donnelly about 7 years ago. Updated about 5 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ase/11.1.0-7091-g326a113/rpm/el7/BUILD/ceph-11.1.0-7091-g326a113/src/mds/MDCache.cc: 4735: FAILED assert(in)
2017-01-31T17:25:06.566 INFO:tasks.ceph.mds.c.mira035.stderr:
2017-01-31T17:25:06.566 INFO:tasks.ceph.mds.c.mira035.stderr: ceph version 11.1.0-7091-g326a113 (326a11362f76098867add4557a03a45a219a640c)
2017-01-31T17:25:06.566 INFO:tasks.ceph.mds.c.mira035.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x5639ff555690]
2017-01-31T17:25:06.566 INFO:tasks.ceph.mds.c.mira035.stderr: 2: (MDCache::handle_cache_rejoin_strong(MMDSCacheRejoin*)+0x20e0) [0x5639ff359af0]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x203) [0x5639ff35af63]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 4: (MDCache::dispatch(Message*)+0xa5) [0x5639ff360385]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 5: (MDSRank::handle_deferrable_message(Message*)+0x5bc) [0x5639ff2565cc]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 6: (MDSRank::_dispatch(Message*, bool)+0x20c) [0x5639ff25fdcc]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 7: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x5639ff260f95]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x5639ff24e803]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 9: (DispatchQueue::entry()+0x7a2) [0x5639ff7639d2]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x5639ff5c575d]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 11: (()+0x7dc5) [0x7f62f41d6dc5]
2017-01-31T17:25:06.567 INFO:tasks.ceph.mds.c.mira035.stderr: 12: (clone()+0x6d) [0x7f62f34ce73d]

From: http://pulpito.ceph.com/pdonnell-2017-01-31_17:01:56-multimds:thrash-wip-multimds-tests-testing-basic-mira/769786/

Logs and coredump at: /ceph/tmp/pdonnell/multimds-thrash/769786

Actions #1

Updated by Patrick Donnelly about 7 years ago

There are actually 3 assertion failures (on 3 different MDS) in this run!

2017-01-31T17:24:17.908 INFO:tasks.ceph.mds.a.mira037.stderr:/mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-7091-g326a113/rpm/el7/BUILD/ceph-11.1.0-7091-g326a113/src/mds/MDCache.cc: 328: FAILED assert(o->get_num_ref() == 0)
2017-01-31T17:24:17.909 INFO:tasks.ceph.mds.a.mira037.stderr: ceph version 11.1.0-7091-g326a113 (326a11362f76098867add4557a03a45a219a640c)
2017-01-31T17:24:17.909 INFO:tasks.ceph.mds.a.mira037.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55b317365690]
2017-01-31T17:24:17.910 INFO:tasks.ceph.mds.a.mira037.stderr: 2: (MDCache::remove_inode(CInode*)+0x43d) [0x55b31711cd2d]
2017-01-31T17:24:17.910 INFO:tasks.ceph.mds.a.mira037.stderr: 3: (StrayManager::_purge_stray_logged(CDentry*, unsigned long, LogSegment*)+0x208) [0x55b3171b45f8]
2017-01-31T17:24:17.911 INFO:tasks.ceph.mds.a.mira037.stderr: 4: (MDSIOContextBase::complete(int)+0xa4) [0x55b3172b31f4]
2017-01-31T17:24:17.911 INFO:tasks.ceph.mds.a.mira037.stderr: 5: (MDSLogContextBase::complete(int)+0x3c) [0x55b3172b362c]
2017-01-31T17:24:17.911 INFO:tasks.ceph.mds.a.mira037.stderr: 6: (Finisher::finisher_thread_entry()+0x1ee) [0x55b3173622be]
2017-01-31T17:24:17.911 INFO:tasks.ceph.mds.a.mira037.stderr: 7: (()+0x7dc5) [0x7f306ebebdc5]
2017-01-31T17:24:17.912 INFO:tasks.ceph.mds.a.mira037.stderr: 8: (clone()+0x6d) [0x7f306dee373d]
2017-01-31T17:24:37.610 INFO:tasks.ceph.mds.e-s.mira035.stderr:/mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-7091-g326a113/rpm/el7/BUILD/ceph-11.1.0-7091-g326a113/src/mds/MDCache.cc: 4735: FAILED assert(in)
2017-01-31T17:24:37.614 INFO:tasks.ceph.mds.e-s.mira035.stderr: ceph version 11.1.0-7091-g326a113 (326a11362f76098867add4557a03a45a219a640c)
2017-01-31T17:24:37.614 INFO:tasks.ceph.mds.e-s.mira035.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55863fa32690]
2017-01-31T17:24:37.614 INFO:tasks.ceph.mds.e-s.mira035.stderr: 2: (MDCache::handle_cache_rejoin_strong(MMDSCacheRejoin*)+0x20e0) [0x55863f836af0]
2017-01-31T17:24:37.614 INFO:tasks.ceph.mds.e-s.mira035.stderr: 3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x203) [0x55863f837f63]
2017-01-31T17:24:37.614 INFO:tasks.ceph.mds.e-s.mira035.stderr: 4: (MDCache::dispatch(Message*)+0xa5) [0x55863f83d385]
2017-01-31T17:24:37.614 INFO:tasks.ceph.mds.e-s.mira035.stderr: 5: (MDSRank::handle_deferrable_message(Message*)+0x5bc) [0x55863f7335cc]
2017-01-31T17:24:37.615 INFO:tasks.ceph.mds.e-s.mira035.stderr: 6: (MDSRank::_dispatch(Message*, bool)+0x20c) [0x55863f73cdcc]
2017-01-31T17:24:37.615 INFO:tasks.ceph.mds.e-s.mira035.stderr: 7: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x55863f73df95]
2017-01-31T17:24:37.615 INFO:tasks.ceph.mds.e-s.mira035.stderr: 8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55863f72b803]
2017-01-31T17:24:37.615 INFO:tasks.ceph.mds.e-s.mira035.stderr: 9: (DispatchQueue::entry()+0x7a2) [0x55863fc409d2]
2017-01-31T17:24:37.615 INFO:tasks.ceph.mds.e-s.mira035.stderr: 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x55863faa275d]
2017-01-31T17:24:37.615 INFO:tasks.ceph.mds.e-s.mira035.stderr: 11: (()+0x7dc5) [0x7fed38e7edc5]
2017-01-31T17:24:37.615 INFO:tasks.ceph.mds.e-s.mira035.stderr: 12: (clone()+0x6d) [0x7fed3817673d]
2017-01-31T17:25:06.448 INFO:tasks.ceph.mds.c.mira035.stderr:/mnt/jenkins/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-7091-g326a113/rpm/el7/BUILD/ceph-11.1.0-7091-g326a113/src/mds/MDCache.cc: 4735: FAILED assert(in)
2017-01-31T17:25:06.453 INFO:tasks.ceph.mds.c.mira035.stderr: ceph version 11.1.0-7091-g326a113 (326a11362f76098867add4557a03a45a219a640c)
2017-01-31T17:25:06.453 INFO:tasks.ceph.mds.c.mira035.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x5639ff555690]
2017-01-31T17:25:06.453 INFO:tasks.ceph.mds.c.mira035.stderr: 2: (MDCache::handle_cache_rejoin_strong(MMDSCacheRejoin*)+0x20e0) [0x5639ff359af0]
2017-01-31T17:25:06.453 INFO:tasks.ceph.mds.c.mira035.stderr: 3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x203) [0x5639ff35af63]
2017-01-31T17:25:06.453 INFO:tasks.ceph.mds.c.mira035.stderr: 4: (MDCache::dispatch(Message*)+0xa5) [0x5639ff360385]
2017-01-31T17:25:06.453 INFO:tasks.ceph.mds.c.mira035.stderr: 5: (MDSRank::handle_deferrable_message(Message*)+0x5bc) [0x5639ff2565cc]
2017-01-31T17:25:06.453 INFO:tasks.ceph.mds.c.mira035.stderr: 6: (MDSRank::_dispatch(Message*, bool)+0x20c) [0x5639ff25fdcc]
2017-01-31T17:25:06.454 INFO:tasks.ceph.mds.c.mira035.stderr: 7: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x5639ff260f95]
2017-01-31T17:25:06.454 INFO:tasks.ceph.mds.c.mira035.stderr: 8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x5639ff24e803]
2017-01-31T17:25:06.454 INFO:tasks.ceph.mds.c.mira035.stderr: 9: (DispatchQueue::entry()+0x7a2) [0x5639ff7639d2]
2017-01-31T17:25:06.454 INFO:tasks.ceph.mds.c.mira035.stderr: 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x5639ff5c575d]
2017-01-31T17:25:06.454 INFO:tasks.ceph.mds.c.mira035.stderr: 11: (()+0x7dc5) [0x7f62f41d6dc5]
2017-01-31T17:25:06.454 INFO:tasks.ceph.mds.c.mira035.stderr: 12: (clone()+0x6d) [0x7f62f34ce73d]
Actions #2

Updated by Zheng Yan about 7 years ago

 -1189> 2017-01-31 17:24:17.897818 7f306a254700  7 mds.2.cache handle_discover added dentry [dentry #102/stray1/2000000002c [2,head] auth{0=1} (dversion lock) pv=1277 v=353 inode=0x55b320eb5d88 | request=0 lock=0 inodepin=0 purging=1 replicated=1 dirty=0 authpin=0 0x55b320f15450]

MDCache::handle_discover adds a replica to an inode that is being purged (note purging=1 and replicated=1 in the log line above). This can explain all of the assertions.
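To make the suspected race concrete, here is a minimal toy model of it. It is only a sketch: ToyInode, ToyCache, finish_purge and the refuse_purging switch are hypothetical names invented for illustration and are not the real MDCache or StrayManager code. It assumes that replicating an inode takes a reference on it, and it only models the assert(o->get_num_ref() == 0) case from MDCache::remove_inode.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy stand-in for an inode; NOT the real CInode.
struct ToyInode {
    std::uint64_t ino;
    bool purging;   // mirrors the "purging=1" flag in the log line
    int num_ref;    // references held on the inode (a replica counts as one here)
    explicit ToyInode(std::uint64_t i) : ino(i), purging(false), num_ref(0) {}
};

// Toy stand-in for the cache; NOT the real MDCache.
class ToyCache {
public:
    ToyInode* add_inode(std::uint64_t ino) {
        return &inodes_.emplace(ino, ToyInode(ino)).first->second;
    }

    ToyInode* get_inode(std::uint64_t ino) {
        auto it = inodes_.find(ino);
        return it == inodes_.end() ? nullptr : &it->second;
    }

    // A peer asks to replicate the inode. With refuse_purging == false the
    // request is granted even if the inode is already being purged.
    bool handle_discover(std::uint64_t ino, bool refuse_purging) {
        ToyInode* in = get_inode(ino);
        if (!in) return false;
        if (refuse_purging && in->purging) return false;
        ++in->num_ref;  // the new replica pins the inode
        return true;
    }

    // Purge completion. Returns false in the situation where the real code
    // would hit assert(o->get_num_ref() == 0).
    bool finish_purge(std::uint64_t ino) {
        ToyInode* in = get_inode(ino);
        if (!in || in->num_ref != 0) return false;
        inodes_.erase(ino);
        return true;
    }

private:
    std::map<std::uint64_t, ToyInode> inodes_;
};

// Runs the scenario: start purging, receive a discover, then finish the purge.
// Returns true if the purge completed cleanly.
bool purge_races_with_discover(bool refuse_purging) {
    ToyCache cache;
    ToyInode* in = cache.add_inode(0x2c);
    assert(in != nullptr);
    in->purging = true;
    cache.handle_discover(0x2c, refuse_purging);
    return cache.finish_purge(0x2c);
}
```

Calling purge_races_with_discover(false) reproduces the bad ordering (the purge finds a leftover reference), while purge_races_with_discover(true) completes cleanly. The guard shown here is only one way to break the race in the toy model and is not a description of how the issue was actually fixed upstream.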

Actions #3

Updated by Zheng Yan about 7 years ago

  • Status changed from New to 12
Actions #5

Updated by Zheng Yan about 7 years ago

  • Status changed from Fix Under Review to Resolved
Actions #6

Updated by Zheng Yan about 7 years ago

Fixed by commit a1499bc4 (mds: stop purging strays when mds is being shutdown)

Actions #7

Updated by Patrick Donnelly about 5 years ago

  • Category deleted (90)
  • Labels (FS) multimds added