Bug #16768
Closed
multimds: check_rstat assertion failure
Description
2016-07-19T11:43:54.392 INFO:tasks.ceph.mds.e.mira027.stderr:/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-709-g12c0683/src/mds/CDir.cc: In function 'bool CDir::check_rstats(bool)' thread 7fe0b191e700 time 2016-07-19 18:43:55.424696
2016-07-19T11:43:54.392 INFO:tasks.ceph.mds.e.mira027.stderr:/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-709-g12c0683/src/mds/CDir.cc: 289: FAILED assert(nest_info.rbytes == fnode.rstat.rbytes)
2016-07-19T11:43:54.392 INFO:tasks.ceph.mds.e.mira027.stderr: ceph version v11.0.0-709-g12c0683 (12c068365c43a140fe1fe23bf68318342710e84d)
2016-07-19T11:43:54.393 INFO:tasks.ceph.mds.e.mira027.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x1878453]
2016-07-19T11:43:54.393 INFO:tasks.ceph.mds.e.mira027.stderr: 2: (CDir::check_rstats(bool)+0x16e8) [0x164daf8]
2016-07-19T11:43:54.393 INFO:tasks.ceph.mds.e.mira027.stderr: 3: (MDCache::predirty_journal_parents(std::shared_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x2640) [0x14bed80]
2016-07-19T11:43:54.393 INFO:tasks.ceph.mds.e.mira027.stderr: 4: (Server::_rename_prepare(std::shared_ptr<MDRequestImpl>&, EMetaBlob*, ceph::buffer::list*, CDentry*, CDentry*, CDentry*)+0x1a86) [0x143199e]
2016-07-19T11:43:54.394 INFO:tasks.ceph.mds.e.mira027.stderr: 5: (Server::handle_slave_rename_prep(std::shared_ptr<MDRequestImpl>&)+0x1fc2) [0x1436548]
2016-07-19T11:43:54.394 INFO:tasks.ceph.mds.e.mira027.stderr: 6: (Server::dispatch_slave_request(std::shared_ptr<MDRequestImpl>&)+0xc33) [0x14034db]
2016-07-19T11:43:54.394 INFO:tasks.ceph.mds.e.mira027.stderr: 7: (Server::_slave_rename_sessions_flushed(std::shared_ptr<MDRequestImpl>&)+0x22f) [0x143cb19]
2016-07-19T11:43:54.395 INFO:tasks.ceph.mds.e.mira027.stderr: 8: (C_MDS_SlaveRenameSessionsFlushed::finish(int)+0x2a) [0x1451d3e]
2016-07-19T11:43:54.395 INFO:tasks.ceph.mds.e.mira027.stderr: 9: (Context::complete(int)+0x27) [0x137daf5]
2016-07-19T11:43:54.395 INFO:tasks.ceph.mds.e.mira027.stderr: 10: (MDSInternalContextBase::complete(int)+0x1c6) [0x1705596]
2016-07-19T11:43:54.395 INFO:tasks.ceph.mds.e.mira027.stderr: 11: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::delete_me()+0x41) [0x13d48a3]
2016-07-19T11:43:54.396 INFO:tasks.ceph.mds.e.mira027.stderr: 12: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::sub_finish(MDSInternalContextBase*, int)+0x2a8) [0x13e0d5a]
2016-07-19T11:43:54.396 INFO:tasks.ceph.mds.e.mira027.stderr: 13: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::finish(int)+0x29) [0x13e08ab]
2016-07-19T11:43:54.396 INFO:tasks.ceph.mds.e.mira027.stderr: 14: (Context::complete(int)+0x27) [0x137daf5]
2016-07-19T11:43:54.397 INFO:tasks.ceph.mds.e.mira027.stderr: 15: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::complete(int)+0x20) [0x13e096a]
2016-07-19T11:43:54.397 INFO:tasks.ceph.mds.e.mira027.stderr: 16: (MDSRank::_advance_queues()+0x4c3) [0x13a9047]
2016-07-19T11:43:54.398 INFO:tasks.ceph.mds.e.mira027.stderr: 17: (MDSRank::_dispatch(Message*, bool)+0x55d) [0x13a6463]
2016-07-19T11:43:54.398 INFO:tasks.ceph.mds.e.mira027.stderr: 18: (MDSRankDispatcher::ms_dispatch(Message*)+0x34) [0x13a5ef0]
2016-07-19T11:43:54.398 INFO:tasks.ceph.mds.e.mira027.stderr: 19: (MDSDaemon::ms_dispatch(Message*)+0x21d) [0x13782bf]
2016-07-19T11:43:54.401 INFO:tasks.ceph.mds.e.mira027.stderr: 20: (Messenger::ms_deliver_dispatch(Message*)+0x98) [0x1af71bc]
2016-07-19T11:43:54.402 INFO:tasks.ceph.mds.e.mira027.stderr: 21: (DispatchQueue::entry()+0x5dd) [0x1af62d9]
2016-07-19T11:43:54.402 INFO:tasks.ceph.mds.e.mira027.stderr: 22: (DispatchQueue::DispatchThread::entry()+0x1c) [0x1902046]
2016-07-19T11:43:54.403 INFO:tasks.ceph.mds.e.mira027.stderr: 23: (Thread::entry_wrapper()+0xc1) [0x19ef733]
2016-07-19T11:43:54.404 INFO:tasks.ceph.mds.e.mira027.stderr: 24: (Thread::_entry_func(void*)+0x18) [0x19ef668]
2016-07-19T11:43:54.404 INFO:tasks.ceph.mds.e.mira027.stderr: 25: (()+0x8182) [0x7fe0b6730182]
2016-07-19T11:43:54.406 INFO:tasks.ceph.mds.e.mira027.stderr: 26: (clone()+0x6d) [0x7fe0b583147d]
2016-07-19T11:43:54.407 INFO:tasks.ceph.mds.e.mira027.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
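For context on what the failing assertion checks: every CephFS directory caches recursive statistics ("rstats", e.g. the total bytes under the directory) in its fnode, and CDir::check_rstats() recomputes them from the in-memory children and asserts the cached value matches. The sketch below is a minimal illustrative model of that invariant, not the actual Ceph code; Dir, Dentry and cached_rstat are invented stand-ins, and only nest_info_t/rbytes mirror the real names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Recursive-stat accumulator, loosely modeled on Ceph's nest_info_t.
struct nest_info_t {
  int64_t rbytes = 0;  // total bytes under the directory
  int64_t rfiles = 0;  // total files under the directory
};

// Hypothetical stand-in for a dentry: the size it contributes and
// whether it counts as a file.
struct Dentry {
  int64_t bytes;
  bool is_file;
};

// Hypothetical stand-in for a CDir fragment.
struct Dir {
  nest_info_t cached_rstat;      // plays the role of fnode.rstat
  std::vector<Dentry> entries;

  // Recompute the recursive stats from the children.
  nest_info_t recompute() const {
    nest_info_t n;
    for (const auto& d : entries) {
      n.rbytes += d.bytes;
      if (d.is_file) n.rfiles += 1;
    }
    return n;
  }

  // Analogue of the failing check: the cached (incrementally updated)
  // value must equal the freshly recomputed one. In the MDS this fires
  // as FAILED assert(nest_info.rbytes == fnode.rstat.rbytes) when an
  // update path (here, a slave rename) leaves the two out of sync.
  bool check_rstats() const {
    nest_info_t n = recompute();
    return n.rbytes == cached_rstat.rbytes &&
           n.rfiles == cached_rstat.rfiles;
  }
};
```

In the real MDS the cached value is maintained incrementally by predirty_journal_parents() as operations are journaled, which is why a rename that mis-accounts a subtree only trips the assertion later, when check_rstats() recomputes from scratch.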
This test was marked dead due to timeout (I think?):
2016-07-19T11:43:55.737 INFO:tasks.workunit.client.0.mira034.stdout:5/839: dread d12/d16/d24/d32/de7/df5/ff7 [0,4194304] 0
2016-07-19T11:43:55.744 INFO:tasks.workunit.client.0.mira034.stdout:5/840: mkdir d12/d6d/d9b/ddd/d12a 0
2016-07-19T11:43:55.757 INFO:tasks.workunit.client.0.mira034.stdout:5/841: dwrite d12/f11b [0,4194304] 0
2016-07-19T11:43:55.767 INFO:tasks.workunit.client.0.mira034.stdout:5/842: unlink d12/d16/d24/d32/d5b/fb5 0
2016-07-19T11:43:55.777 INFO:tasks.workunit.client.0.mira034.stdout:5/843: dread d12/d16/d24/fa9 [0,4194304] 0
2016-07-19T14:42:53.022 INFO:tasks.workunit.client.0.mira034.stderr:/home/ubuntu/cephtest/workunit.client.0/suites/fsstress.sh: line 1: 13555 Terminated $command
Killed after 3 hours.
From: http://pulpito.ceph.com/pdonnell-2016-07-18_20:02:54-multimds-master---basic-mira/321809/
Updated by Patrick Donnelly over 7 years ago
Another of the same:
Dead: 2016-07-24T02:04:40.579 INFO:tasks.workunit.client.0.mira082.stderr:/home/ubuntu/cephtest/workunit.client.0/suites/fsstress.sh: line 1: 14917 Terminated $command
ceph version v11.0.0-820-ga0294e6 (a0294e64507a7916fdd9707ae22ba40b0d7b65d1)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x91be5b]
 2: (CDir::check_rstats(bool)+0x14a6) [0x7a8176]
 3: (MDCache::predirty_journal_parents(std::shared_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0xf73) [0x6fe913]
 4: (Server::_rename_prepare(std::shared_ptr<MDRequestImpl>&, EMetaBlob*, ceph::buffer::list*, CDentry*, CDentry*, CDentry*)+0x6b9) [0x659579]
 5: (Server::handle_slave_rename_prep(std::shared_ptr<MDRequestImpl>&)+0xffb) [0x6653fb]
 6: (Server::dispatch_slave_request(std::shared_ptr<MDRequestImpl>&)+0x70b) [0x666dab]
 7: (Server::handle_slave_request(MMDSSlaveRequest*)+0x8fc) [0x67019c]
 8: (Server::dispatch(Message*)+0x69b) [0x670efb]
 9: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x5f607c]
 10: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x5ffd91]
 11: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x600ee5]
 12: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x5e8043]
 13: (DispatchQueue::entry()+0x78b) [0xab7d0b]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x97b07d]
 15: (()+0x8182) [0x7f89e6bec182]
 16: (clone()+0x6d) [0x7f89e5ced47d]
1 jobs: ['327574']
From: http://pulpito.ceph.com/pdonnell-2016-07-21_13:20:27-multimds-master---basic-mira/327574/
Updated by Patrick Donnelly over 7 years ago
Segmentation fault in another test which may be related:
Dead: 2016-07-19T01:26:57.608 INFO:tasks.workunit.client.0.mira064.stderr:/home/ubuntu/cephtest/workunit.client.0/suites/fsstress.sh: line 1: 3178 Terminated $command
ceph version v11.0.0-709-g12c0683 (12c068365c43a140fe1fe23bf68318342710e84d)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x17b61e7]
 2: ceph-mds() [0x17b549f]
 3: (()+0x10340) [0x7f65805d6340]
 4: (MDSCacheObject::get(int)+0x13) [0x14454d5]
 5: (MutationImpl::pin(MDSCacheObject*)+0x42) [0x14a3c8c]
 6: (Server::handle_slave_rename_prep(std::shared_ptr<MDRequestImpl>&)+0x92c) [0x1434eb2]
 7: (Server::dispatch_slave_request(std::shared_ptr<MDRequestImpl>&)+0xc33) [0x14034db]
 8: (Server::handle_slave_request(MMDSSlaveRequest*)+0x1137) [0x140187f]
 9: (Server::dispatch(Message*)+0x976) [0x13f1b84]
 10: (MDSRank::handle_deferrable_message(Message*)+0xa71) [0x13a8057]
 11: (MDSRank::_dispatch(Message*, bool)+0x3bc) [0x13a62c2]
 12: (MDSRankDispatcher::ms_dispatch(Message*)+0x34) [0x13a5ef0]
 13: (MDSDaemon::ms_dispatch(Message*)+0x21d) [0x13782bf]
 14: (Messenger::ms_deliver_dispatch(Message*)+0x98) [0x1af71bc]
 15: (DispatchQueue::entry()+0x5dd) [0x1af62d9]
 16: (DispatchQueue::DispatchThread::entry()+0x1c) [0x1902046]
 17: (Thread::entry_wrapper()+0xc1) [0x19ef733]
 18: (Thread::_entry_func(void*)+0x18) [0x19ef668]
 19: (()+0x8182) [0x7f65805ce182]
 20: (clone()+0x6d) [0x7f657f6cf47d]
1 jobs: ['321707']
suites: ['clusters/3-mds.yaml', 'debug/mds_client.yaml', 'fs/btrfs.yaml', 'inline/no.yaml', 'mount/cfuse.yaml', 'multimds/basic/{ceph/base.yaml', 'overrides/whitelist_wrongly_marked_down.yaml', 'tasks/suites_fsstress.yaml}']
http://pulpito.ceph.com/pdonnell-2016-07-18_20:02:54-multimds-master---basic-mira/321707/
Updated by Patrick Donnelly over 7 years ago
Zheng, which setting is that and how do I enable it? Sorry...
Updated by John Spray over 7 years ago
- Related to Bug #16807: Crash in handle_slave_rename_prep added
Updated by John Spray over 7 years ago
I've opened a separate ticket for the segfault; it seems likely to be its own issue (http://tracker.ceph.com/issues/16807).
Updated by Zheng Yan over 7 years ago
Please add the line "debug mds = 10" to ceph.conf.
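For reference, the suggested setting goes in the [mds] section of ceph.conf (a sketch of the relevant fragment; the later comments note these runs were in fact already using "debug mds = 20", a higher verbosity):

```ini
[mds]
    # MDS subsystem log verbosity (0-20; higher is more verbose)
    debug mds = 10
```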
Updated by Patrick Donnelly over 7 years ago
Zheng, I think we already have "debug mds = 20", right? From the config for this run: http://pulpito.ceph.com/pdonnell-2016-07-18_20:02:54-multimds-master---basic-mira/321707/
Updated by Patrick Donnelly over 7 years ago
Here's another instance of the assertion failure on a more recent master branch:
Updated by Zheng Yan over 7 years ago
Patrick Donnelly wrote:
Zheng, I think we already have "debug mds = 20", right? From the config for this run: http://pulpito.ceph.com/pdonnell-2016-07-18_20:02:54-multimds-master---basic-mira/321707/
The problem is that there are no logs in http://qa-proxy.ceph.com/teuthology/pdonnell-2016-07-18_20:02:54-multimds-master---basic-mira/321707/. (I think none of the multimds runs have logs, which makes diagnosis impossible.)
Updated by Zheng Yan over 7 years ago
- Status changed from New to Need More Info
Updated by John Spray over 7 years ago
- Priority changed from Normal to High
- Target version set to v12.0.0
Updated by John Spray about 7 years ago
I noticed that failure while the test was still stuck trying to unmount the kernel client, so I went in and killed the ssh connection that was running umount so that the test would (hopefully) proceed to gather the logs for us.
Updated by John Spray about 7 years ago
- Related to Bug #8090: multimds: mds crash in check_rstats added
Updated by John Spray about 7 years ago
- Related to deleted (Bug #8090: multimds: mds crash in check_rstats)
Updated by John Spray about 7 years ago
- Has duplicate Bug #8090: multimds: mds crash in check_rstats added
Updated by John Spray about 7 years ago
Hmm, well, it didn't grab the logs for some reason, but I did get the crashing MDS's log before the test tore down. It's in /home/jspray/16768 on teuthology.
Updated by Zheng Yan about 7 years ago
- Status changed from Need More Info to Fix Under Review
Updated by Zheng Yan about 7 years ago
- Status changed from Fix Under Review to Resolved
Updated by Patrick Donnelly about 5 years ago
- Category deleted (90)
- Labels (FS) multimds added