Bug #5665 (closed)
mds takeover too early causes new mds to shutdown
Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
After replay we get:
2013-07-17 21:50:59.701234 7f39a5eb1700 1 mds.0.2 rejoin_done
2013-07-17 21:50:59.701236 7f39a5eb1700 10 mds.0.cache show_subtrees - no subtrees
2013-07-17 21:50:59.701239 7f39a5eb1700 7 mds.0.cache show_cache
2013-07-17 21:50:59.701241 7f39a5eb1700 7 mds.0.cache unlinked [inode 1 [...2,head] / auth v1 snaprealm=0x1b39900 f(v0 1=0+1) n(v0 1=0+1) (iversion lock) 0x1b48860]
2013-07-17 21:50:59.701248 7f39a5eb1700 7 mds.0.cache unlinked [inode 100 [...2,head] ~mds0/ auth v1 snaprealm=0x1b39480 f(v0 11=1+10) n(v0 11=1+10) (iversion lock) 0x1b48000]
2013-07-17 21:50:59.701254 7f39a5eb1700 1 mds.0.2 empty cache, no subtrees, leaving cluster
2013-07-17 21:50:59.701256 7f39a5eb1700 3 mds.0.2 request_state down:stopped
Full logs for the original and the takeover MDS are attached.
The job config was:
ubuntu@teuthology:/a/teuthology-2013-07-17_20:00:59-fs-cuttlefish-testing-basic/71119$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 77c8bf2f972a9d6ff446c49a41678bf931bbee44
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      client:
        debug client: 10
      mds:
        debug mds: 20
        debug ms: 1
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    - wrongly marked me down
    sha1: 39bffac6b6c898882d03de392f7f2218933d942b
  ceph-deploy:
    conf:
      client:
        debug monc: 20
        debug ms: 1
        debug objecter: 20
        debug rados: 20
        log file: /var/log/ceph/ceph-..log
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: 39bffac6b6c898882d03de392f7f2218933d942b
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: 39bffac6b6c898882d03de392f7f2218933d942b
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - mds.b-s-a
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- mds_thrash: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/pjd.sh
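For context, a minimal self-contained sketch of the decision visible in the log above: an MDS that finishes rejoin holding an empty cache and no subtrees requests down:stopped instead of going active. This is illustrative only, not the actual Ceph MDS source; the names MDSSketch, rejoin_done, and request_state are hypothetical stand-ins.

// Sketch of the "empty cache, no subtrees, leaving cluster" decision.
// All names here are illustrative, not Ceph internals.
#include <iostream>

enum class MDSState { Rejoin, Active, Stopped };

struct MDSSketch {
  int num_subtrees = 0;            // subtrees held after replay/rejoin (none here)
  MDSState state = MDSState::Rejoin;

  void request_state(MDSState s) {
    // The real MDS asks the monitor for a state change; we just record and print it.
    state = s;
    std::cout << "request_state "
              << (s == MDSState::Stopped ? "down:stopped" : "up:active") << "\n";
  }

  void rejoin_done() {
    if (num_subtrees == 0) {
      // Matches the last two log lines: the takeover MDS decides it has nothing to serve.
      std::cout << "empty cache, no subtrees, leaving cluster\n";
      request_state(MDSState::Stopped);
      return;
    }
    request_state(MDSState::Active);
  }
};

int main() {
  MDSSketch takeover_mds;   // the standby that grabbed the rank too early
  takeover_mds.rejoin_done();
}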
Updated by Greg Farnum over 10 years ago
Isn't this basically the MDS not getting to write all its startup state to disk?
Seems like maybe we should just prevent the tests from killing them prior to that instead of investing work to recover from it.
Updated by Zheng Yan over 10 years ago
- Status changed from New to Duplicate
I think this is a duplicate of #4894.