Bug #395
mds: interval_set assert(0) during journal replay
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Description
From ML:
Date: Wed, 8 Sep 2010 14:45:04 +1000
From: Nat N <phenisha@gmail.com>
To: ceph-devel@vger.kernel.org
Subject: MDS crashing

Hi,

I am testing the ceph file system. All has been going OK, but now it seems my cmds is crashing with the following error:

.... <snip> ...
10.09.08_13:48:40.146886 419dc940 -- 172.17.8.3:6802/8771 <== osd8 172.17.8.11:6800/8930 7 ==== osd_op_reply(28 200.00000ef9 [read 0~4194304] = 0) v1 ==== 98+0+4194304 (1203150032 0 2774819477) 0xa22080
10.09.08_13:48:40.147220 44e45940 mds0.cache creating system inode with ino:100
10.09.08_13:48:41.293977 4333f940 -- 172.17.8.3:6802/8771 --> mon2 172.17.8.4:6789/0 -- mdsbeacon(8900/thorium003 up:replay seq 34 v212) v1 -- ?+0 0x2145500
10.09.08_13:48:41.295762 419dc940 -- 172.17.8.3:6802/8771 <== mon2 172.17.8.4:6789/0 48 ==== mdsbeacon(8900/thorium003 up:replay seq 34 v212) v2 ==== 112+0+0 (2962285251 0 0) 0x2145500
./include/interval_set.h: In function 'void interval_set<T>::insert(T, T) [with T = inodeno_t]':
./include/interval_set.h:202: FAILED assert(0)
 1: (EMetaBlob::replay(MDS*, LogSegment*)+0x3f75) [0x691625]
 2: (EUpdate::replay(MDS*)+0x38) [0x694d28]
 3: (MDLog::_replay_thread()+0x68e) [0x68801e]
 4: (MDLog::ReplayThread::entry()+0xd) [0x4bb3cd]
 5: (Thread::_entry_func(void*)+0xa) [0x49c71a]
 6: /lib64/libpthread.so.0 [0x31d960673d]
 7: (clone()+0x6d) [0x31d8ed3d1d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I am using the unstable git branch as well as kernel .35, with one mds, 3 monitors, and around 10 osds. Unfortunately I do not have access to the core files, but please find the objdump of cmds here: http://www.geopersonalassistant.com/dump/cmds.dump.gz
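The failed assert comes from interval_set<T>::insert(): during replay the MDS inserts ino ranges into an interval_set, and inserting a range that overlaps one already present hits assert(0). A minimal sketch of the overlap check (a hypothetical, stripped-down type for illustration; not Ceph's actual interval_set, which returns nothing and asserts instead):

```cpp
#include <cassert>
#include <map>

// Hypothetical simplified interval set: start -> length. The real
// interval_set<T> lives in include/interval_set.h; this sketch only
// reproduces the overlap condition the backtrace points at, and
// returns false where the real code does assert(0).
struct SimpleIntervalSet {
  std::map<unsigned long, unsigned long> m;  // start -> length

  bool insert(unsigned long start, unsigned long len) {
    auto it = m.lower_bound(start);
    if (it != m.end() && it->first < start + len)
      return false;  // new range runs into the next existing range
    if (it != m.begin()) {
      auto prev = std::prev(it);
      if (prev->first + prev->second > start)
        return false;  // previous range extends past our start
    }
    m[start] = len;
    return true;
  }
};
```

Replaying state that re-inserts an ino range already in the set is exactly the second, overlapping insert here, which in the real code trips the assert in EMetaBlob::replay().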
Updated by Sage Weil over 13 years ago
The problem was a session close event, followed by an open. The close didn't clear the session state, I believe because the client had already reconnected. This should fix it:
diff --git a/src/mds/journal.cc b/src/mds/journal.cc
index ec2013d..64fc6a3 100644
--- a/src/mds/journal.cc
+++ b/src/mds/journal.cc
@@ -725,6 +725,8 @@ void ESession::replay(MDS *mds)
   Session *session = mds->sessionmap.get_session(client_inst.name);
   if (session->is_closed())
     mds->sessionmap.remove_session(session);
+  else
+    session->clear();  // the client has reconnected; keep the Session, but reset
 }
 }
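The failure mode the patch addresses can be modeled in a few lines: a replayed close event on a session that is no longer in the closed state (because the client reconnected) leaves stale per-session state behind, and the subsequent replayed open re-inserts the same preallocated inos. This is a loose, hypothetical model for illustration (the names Session and prealloc_inos mirror the MDS code only loosely; this is not Ceph's API):

```cpp
#include <set>

// Hypothetical model of the replayed session state in this bug.
struct Session {
  bool open = false;
  std::set<unsigned long> prealloc_inos;  // inos preallocated to the client
  void clear() { open = false; prealloc_inos.clear(); }
};

// Replaying ESession(close): if the session is genuinely closed it is
// removed from the sessionmap; if the client already reconnected, the
// pre-fix code left its state untouched, the fixed code resets it.
void replay_close(Session &s, bool fixed) {
  if (!s.open) {
    // closed: the real code removes the Session from the sessionmap
  } else if (fixed) {
    s.clear();  // the fix: keep the Session object, but reset its state
  }
}

// Replaying ESession(open) re-adds the client's preallocated inos; a
// duplicate insert here corresponds to the overlapping
// interval_set::insert that hit assert(0).
bool replay_open(Session &s, unsigned long ino) {
  s.open = true;
  return s.prealloc_inos.insert(ino).second;  // false == duplicate
}
```

With the fix, the close event resets the still-open session, so the replayed open starts from a clean set and the duplicate insert never happens.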
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.22)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.