Bug #17819

MDS crashed while performing snapshot creation and deletion in a loop

Added by Ramakrishnan Periyasamy over 7 years ago. Updated almost 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

--- begin dump of recent events ---
1> 2016-11-08 11:04:27.309191 7f4a57c38700 1 - 10.8.128.73:6812/8588 <== osd.1 10.8.128.73:6804/7537 3 ==== osd_op_reply(40 201.00000001 [write 30342~122 [fadvise_dontneed]] v22'11 uv11 ondisk = 0) v7 ==== 132+0+0 (3645249588 0 0) 0x7f4a6ed01440 con 0x7f4a6ed21180
0> 2016-11-08 11:04:27.318246 7f4a5d847700 -1 *** Caught signal (Aborted) **
in thread 7f4a5d847700 thread_name:ms_dispatch

ceph version 10.2.2-39.el7cp (d214d5063625c336001b01c33ffb349b56624266)
1: (()+0x50084a) [0x7f4a63b7084a]
2: (()+0xf100) [0x7f4a62a68100]
3: (gsignal()+0x37) [0x7f4a6146a5f7]
4: (abort()+0x148) [0x7f4a6146bce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f4a63c6ade7]
6: (SnapRealm::check_cache()+0x3c6) [0x7f4a63ab5846]
7: (MDCache::predirty_journal_parents(std::shared_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x15e8) [0x7f4a63984058]
8: (Server::handle_client_openc(std::shared_ptr<MDRequestImpl>&)+0x1016) [0x7f4a638f8e16]
9: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa88) [0x7f4a639063d8]
10: (MDCache::dispatch_request(std::shared_ptr<MDRequestImpl>&)+0x4c) [0x7f4a6398a7dc]
11: (MDSInternalContextBase::complete(int)+0x1eb) [0x7f4a63ad024b]
12: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x7f4a638aadac]
13: (SimpleLock::finish_waiters(unsigned long, int)+0xe3) [0x7f4a63a0dad3]
14: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >*)+0x1177) [0x7f4a639f2877]
15: (Locker::handle_file_lock(ScatterLock*, MLock*)+0x48e) [0x7f4a639fda7e]
16: (Locker::handle_lock(MLock*)+0x16e) [0x7f4a639ffc7e]
17: (MDSRank::handle_deferrable_message(Message*)+0xc34) [0x7f4a6388e274]
18: (MDSRank::_dispatch(Message*, bool)+0x207) [0x7f4a638974a7]
19: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f4a63898635]
20: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f4a6387f013]
21: (DispatchQueue::entry()+0x78a) [0x7f4a63d6917a]
22: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f4a63c4fb4d]
23: (()+0x7dc5) [0x7f4a62a60dc5]
24: (clone()+0x6d) [0x7f4a6152bced]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mds.a.log
--- end dump of recent events ---

Steps:
1. Configure ceph-fuse and mount the filesystem
2. Copy some data into the mount
3. Create a snapshot
4. rsync the snapshot data to another directory in the same mount
5. Delete the snapshot
6. Delete the source data
7. Repeat steps 2 to 6 in a loop (a sketch of the loop as a script follows)
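
For reference, here is a minimal sketch of that loop as a Python script. The mountpoint, source data, snapshot name, and iteration count are illustrative assumptions, not details from the original report; it relies on the standard CephFS behavior that mkdir/rmdir under the special .snap directory creates/removes a snapshot.

#!/usr/bin/env python3
# Hedged repro sketch; paths and counts below are assumptions, not from the report.
import os
import shutil
import subprocess

MOUNT = "/mnt/cephfs"                         # step 1: assumed ceph-fuse mountpoint
SRC = os.path.join(MOUNT, "src")              # source data directory
DST = os.path.join(MOUNT, "copy")             # rsync target in the same mount
SNAP = os.path.join(MOUNT, ".snap", "snap1")  # mkdir/rmdir here drives snapshots

for i in range(100):                          # step 7: repeat steps 2-6
    os.makedirs(SRC, exist_ok=True)
    subprocess.run(["cp", "-a", "/usr/share/doc/.", SRC], check=True)  # step 2: copy some data
    os.mkdir(SNAP)                                                     # step 3: create snapshot
    subprocess.run(["rsync", "-a", SNAP + "/", DST], check=True)       # step 4: rsync snap data
    os.rmdir(SNAP)                                                     # step 5: delete snapshot
    shutil.rmtree(SRC)                                                 # step 6: delete source data
    shutil.rmtree(DST, ignore_errors=True)                             # reset target for next pass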

History

#1 Updated by Greg Farnum over 7 years ago

  • Project changed from Ceph to CephFS
  • Category set to 89
  • Source changed from other to Q/A
  • Component(FS) MDS added

The log ought to display exactly what assert in check_cache failed -- can you post that too?

#2 Updated by Ramakrishnan Periyasamy over 7 years ago

2016-11-08T06:04:23.498 INFO:tasks.ceph.mds.a.host.stdout:starting mds.a at :/0
2016-11-08T06:04:23.498 INFO:tasks.ceph.mds.a.host.stderr:mds/SnapRealm.cc: In function 'void SnapRealm::check_cache()' thread 7f4a5d847700 time 2016-11-08 11:04:27.275928
2016-11-08T06:04:23.499 INFO:tasks.ceph.mds.a.host.stderr:mds/SnapRealm.cc: 254: FAILED assert(open)
2016-11-08T06:04:23.499 INFO:tasks.ceph.mds.a.host.stderr: ceph version 10.2.2-39.el7cp (d214d5063625c336001b01c33ffb349b56624266)
2016-11-08T06:04:23.499 INFO:tasks.ceph.mds.a.host.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f4a63c6ac05]
2016-11-08T06:04:23.499 INFO:tasks.ceph.mds.a.host.stderr: 2: (SnapRealm::check_cache()+0x3c6) [0x7f4a63ab5846]
2016-11-08T06:04:23.500 INFO:tasks.ceph.mds.a.host.stderr: 3: (MDCache::predirty_journal_parents(std::shared_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x15e8) [0x7f4a63984058]
2016-11-08T06:04:23.500 INFO:tasks.ceph.mds.a.host.stderr: 4: (Server::handle_client_openc(std::shared_ptr<MDRequestImpl>&)+0x1016) [0x7f4a638f8e16]
2016-11-08T06:04:23.500 INFO:tasks.ceph.mds.a.host.stderr: 5: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa88) [0x7f4a639063d8]
2016-11-08T06:04:23.501 INFO:tasks.ceph.mds.a.host.stderr: 6: (MDCache::dispatch_request(std::shared_ptr<MDRequestImpl>&)+0x4c) [0x7f4a6398a7dc]
2016-11-08T06:04:23.501 INFO:tasks.ceph.mds.a.host.stderr: 7: (MDSInternalContextBase::complete(int)+0x1eb) [0x7f4a63ad024b]
2016-11-08T06:04:23.501 INFO:tasks.ceph.mds.a.host.stderr: 8: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x7f4a638aadac]
2016-11-08T06:04:23.501 INFO:tasks.ceph.mds.a.host.stderr: 9: (SimpleLock::finish_waiters(unsigned long, int)+0xe3) [0x7f4a63a0dad3]
2016-11-08T06:04:23.502 INFO:tasks.ceph.mds.a.host.stderr: 10: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >*)+0x1177) [0x7f4a639f2877]
2016-11-08T06:04:23.502 INFO:tasks.ceph.mds.a.host.stderr: 11: (Locker::handle_file_lock(ScatterLock*, MLock*)+0x48e) [0x7f4a639fda7e]
2016-11-08T06:04:23.502 INFO:tasks.ceph.mds.a.host.stderr: 12: (Locker::handle_lock(MLock*)+0x16e) [0x7f4a639ffc7e]
2016-11-08T06:04:23.502 INFO:tasks.ceph.mds.a.host.stderr: 13: (MDSRank::handle_deferrable_message(Message*)+0xc34) [0x7f4a6388e274]
2016-11-08T06:04:23.502 INFO:tasks.ceph.mds.a.host.stderr: 14: (MDSRank::_dispatch(Message*, bool)+0x207) [0x7f4a638974a7]
2016-11-08T06:04:23.503 INFO:tasks.ceph.mds.a.host.stderr: 15: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f4a63898635]
2016-11-08T06:04:23.503 INFO:tasks.ceph.mds.a.host.stderr: 16: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f4a6387f013]
2016-11-08T06:04:23.503 INFO:tasks.ceph.mds.a.host.stderr: 17: (DispatchQueue::entry()+0x78a) [0x7f4a63d6917a]
2016-11-08T06:04:23.503 INFO:tasks.ceph.mds.a.host.stderr: 18: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f4a63c4fb4d]
2016-11-08T06:04:23.503 INFO:tasks.ceph.mds.a.host.stderr: 19: (()+0x7dc5) [0x7f4a62a60dc5]
2016-11-08T06:04:23.504 INFO:tasks.ceph.mds.a.host.stderr: 20: (clone()+0x6d) [0x7f4a6152bced]
2016-11-08T06:04:23.504 INFO:tasks.ceph.mds.a.host.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-11-08T06:04:23.504 INFO:tasks.ceph.mds.a.host.stderr:2016-11-08 11:04:27.279237 7f4a5d847700 -1 mds/SnapRealm.cc: In function 'void SnapRealm::check_cache()' thread 7f4a5d847700 time 2016-11-08 11:04:27.275928
2016-11-08T06:04:23.504 INFO:tasks.ceph.mds.a.host.stderr:mds/SnapRealm.cc: 254: FAILED assert(open)

#3 Updated by John Spray over 7 years ago

  • Assignee set to Greg Farnum

#4 Updated by Greg Farnum almost 7 years ago

  • Assignee deleted (Greg Farnum)

#5 Updated by Zheng Yan almost 7 years ago

Ran the test overnight; couldn't reproduce the crash.

#6 Updated by Zheng Yan almost 7 years ago

  • Status changed from New to Can't reproduce
