Bug #20440

closed

mds: mds/journal.cc: 1559: FAILED assert(inotablev == mds->inotable->get_version())

Added by Patrick Donnelly almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Actions #1

Updated by Patrick Donnelly almost 7 years ago

  • Assignee set to Zheng Yan

Zheng, please take a look at this one.

Actions #2

Updated by Zheng Yan almost 7 years ago

  • Status changed from New to Fix Under Review

Actions #3

Updated by Patrick Donnelly almost 7 years ago

Zheng, adding that patch lets the test make progress but there still appears to be a problem:

2017-07-01T01:53:39.191 INFO:tasks.workunit:Running workunits matching fs/misc/trivial_sync.sh on client.0...
2017-07-01T01:53:39.195 INFO:tasks.workunit:Running workunit fs/misc/trivial_sync.sh...
2017-07-01T01:53:39.198 INFO:teuthology.orchestra.run.smithi094:Running (workunit test fs/misc/trivial_sync.sh): 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=wip-pdonnell-20170630 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 1h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/misc/trivial_sync.sh'
2017-07-01T01:53:39.209 INFO:tasks.ceph.mds.a.smithi006.stderr:2017-07-01 01:53:39.206026 7fe201ab4700 -1 log_channel(cluster) log [ERR] : bad backtrace on dir ino 10000000000
2017-07-01T01:53:39.212 INFO:tasks.ceph.mds.a.smithi006.stderr:2017-07-01 01:53:39.206110 7fe201ab4700 -1 log_channel(cluster) log [ERR] : loaded dup inode 10000000001 [2,head] v10 at /client.0/tmp, but inode 10000000001.head v11 already exists at ~mds0/stray1/10000000001
2017-07-01T01:53:40.473 INFO:teuthology.orchestra.run.smithi094:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2017-07-01T01:53:40.523 INFO:tasks.workunit:Stopping ['fs/misc/trivial_sync.sh'] on client.0...
2017-07-01T01:53:40.528 INFO:teuthology.orchestra.run.smithi094:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2017-07-01T01:53:40.646 DEBUG:teuthology.parallel:result is None
2017-07-01T01:53:40.759 INFO:teuthology.orchestra.run.smithi094:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0'
2017-07-01T01:53:40.792 INFO:tasks.ceph.mds.a.smithi006.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-2341-g7250d71/rpm/el7/BUILD/ceph-12.0.3-2341-g7250d71/src/mds/MDCache.cc: In function 'void MDCache::predirty_journal_parents(MutationRef, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)' thread 7fe207ac0700 time 2017-07-01 01:53:40.788417
2017-07-01T01:53:40.796 INFO:tasks.ceph.mds.a.smithi006.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-2341-g7250d71/rpm/el7/BUILD/ceph-12.0.3-2341-g7250d71/src/mds/MDCache.cc: 2281: FAILED assert(!"negative dirstat size" == g_conf->mds_verify_scatter)
2017-07-01T01:53:40.801 INFO:tasks.ceph.mds.a.smithi006.stderr: ceph version 12.0.3-2341-g7250d71 (7250d71d0b423ef87a7ac7b7c5def16842eb8208) luminous (dev)
2017-07-01T01:53:40.809 INFO:tasks.ceph.mds.a.smithi006.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fe20efd65a0]
2017-07-01T01:53:40.815 INFO:tasks.ceph.mds.a.smithi006.stderr: 2: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x1d69) [0x7fe20eddd1a9]
2017-07-01T01:53:40.828 INFO:tasks.ceph.mds.a.smithi006.stderr: 3: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x335) [0x7fe20ed248a5]
2017-07-01T01:53:40.833 INFO:tasks.ceph.mds.a.smithi006.stderr: 4: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0xec8) [0x7fe20ed26358]
2017-07-01T01:53:40.840 INFO:tasks.ceph.mds.a.smithi006.stderr: 5: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xbcb) [0x7fe20ed45d7b]
2017-07-01T01:53:40.845 INFO:tasks.ceph.mds.a.smithi006.stderr: 6: (Server::handle_client_request(MClientRequest*)+0x48d) [0x7fe20ed4639d]
2017-07-01T01:53:40.848 INFO:tasks.ceph.mds.a.smithi006.stderr: 7: (Server::dispatch(Message*)+0x38b) [0x7fe20ed4a8fb]
2017-07-01T01:53:40.852 INFO:tasks.ceph.mds.a.smithi006.stderr: 8: (MDSRank::handle_deferrable_message(Message*)+0x7fc) [0x7fe20ecc2a9c]
2017-07-01T01:53:40.855 INFO:tasks.ceph.mds.a.smithi006.stderr: 9: (MDSRank::_dispatch(Message*, bool)+0x1eb) [0x7fe20eccffab]
2017-07-01T01:53:40.857 INFO:tasks.ceph.mds.a.smithi006.stderr: 10: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7fe20ecd0ef5]
2017-07-01T01:53:40.864 INFO:tasks.ceph.mds.a.smithi006.stderr: 11: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7fe20ecba6f3]
2017-07-01T01:53:40.868 INFO:tasks.ceph.mds.a.smithi006.stderr: 12: (DispatchQueue::entry()+0x792) [0x7fe20f2361c2]
2017-07-01T01:53:40.871 INFO:tasks.ceph.mds.a.smithi006.stderr: 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fe20f05233d]
2017-07-01T01:53:40.873 INFO:tasks.ceph.mds.a.smithi006.stderr: 14: (()+0x7dc5) [0x7fe20caf8dc5]
2017-07-01T01:53:40.877 INFO:tasks.ceph.mds.a.smithi006.stderr: 15: (clone()+0x6d) [0x7fe20bbdd73d]
2017-07-01T01:53:40.880 INFO:tasks.ceph.mds.a.smithi006.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2017-07-01T01:53:40.883 INFO:tasks.ceph.mds.a.smithi006.stderr:2017-07-01 01:53:40.794530 7fe207ac0700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-2341-g7250d71/rpm/el7/BUILD/ceph-12.0.3-2341-g7250d71/src/mds/MDCache.cc: In function 'void MDCache::predirty_journal_parents(MutationRef, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)' thread 7fe207ac0700 time 2017-07-01 01:53:40.788417

I dug around but didn't find the cause. Any idea?

Actions #4

Updated by Patrick Donnelly almost 7 years ago

Log snippet from: /ceph/teuthology-archive/pdonnell-2017-07-01_01:07:39-fs-wip-pdonnell-20170630-distro-basic-smithi/1347412/teuthology.log

Actions #5

Updated by Zheng Yan almost 7 years ago

The recover_dentries command of the journal tool only injects links; it never deletes old links. This causes duplicate primary dentries.

https://github.com/ceph/ceph/pull/16202
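
To illustrate the failure mode described above, here is a minimal toy sketch in Python. It is not Ceph code and not necessarily what the pull request does. The metadata store is modelled as a plain dict of directory path to {dentry name: inode number}, and the function names (`inject_link`, `inject_link_exclusive`, `duplicate_primaries`) are hypothetical. The paths and inode number are taken from the log in comment #3 purely as example values.

```python
from collections import defaultdict

# Toy metadata store (an assumption for illustration, not Ceph's on-disk format):
# directory path -> {dentry name -> inode number}. Every entry is a primary link.


def inject_link(dirs, parent, name, ino):
    """Inject-only behaviour: add a primary dentry, never remove an old one."""
    dirs.setdefault(parent, {})[name] = ino


def inject_link_exclusive(dirs, parent, name, ino):
    """Hypothetical alternative: drop stale primary dentries for ino, then add."""
    for path, dentries in dirs.items():
        # Materialise the names first so the dict is not mutated while iterating.
        stale = [n for n, i in dentries.items() if i == ino]
        for old_name in stale:
            if (path, old_name) != (parent, name):
                del dentries[old_name]
    dirs.setdefault(parent, {})[name] = ino


def duplicate_primaries(dirs):
    """Return {ino: [(dir, name), ...]} for inodes with more than one primary link."""
    seen = defaultdict(list)
    for path, dentries in dirs.items():
        for name, ino in dentries.items():
            seen[ino].append((path, name))
    return {ino: sorted(locs) for ino, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    ino = 0x10000000001  # example value from the log above

    # The inode is first linked at /client.0/tmp, then recovered into a stray dir.
    store = {"/client.0": {"tmp": ino}}
    inject_link(store, "~mds0/stray1", "10000000001", ino)
    print("inject-only duplicates:", duplicate_primaries(store))

    store = {"/client.0": {"tmp": ino}}
    inject_link_exclusive(store, "~mds0/stray1", "10000000001", ino)
    print("exclusive duplicates:", duplicate_primaries(store))
```

With the inject-only variant the inode ends up with two primary links, one under /client.0 and one under ~mds0/stray1, which is the shape of the "loaded dup inode" error in the log. The exclusive variant leaves only the most recently injected link.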

Actions #6

Updated by Patrick Donnelly almost 7 years ago

  • Status changed from Fix Under Review to Resolved