Actions
Bug #4853
closedceph-fuse hang on mount getattr
% Done:
0%
Source:
Q/A
Tags:
Backport:
cuttlefish
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
100% reproducible with this job file
machine_type: plana interactive-on-error: true overrides: ceph: conf: mon: debug mon: 20 debug ms: 20 debug paxos: 20 log-whitelist: - slow request branch: wip-up workunit: branch: wip-up roles: - - mon.a - mds.0 - osd.0 - osd.1 - - mon.b - osd.2 - osd.3 - - client.0 tasks: - chef: null - clock.check: null - install: branch: bobtail - ceph: conf: client: debug client: 20 debug ms: 1 mds: debug mds: 20 debug ms: 1 - install.upgrade: all: branch: next - ceph.restart: null - ceph-fuse: - client.0 - workunit: clients: client.0: - suites/dbench.sh
(this is from the upgrade suite).
wip-up adds some debug output htat i haven't had time to look at yet. afraid this is a regression from the replay cap release fix
Updated by Sage Weil almost 11 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
Updated by Greg Farnum almost 11 years ago
/a/teuthology-2013-04-28_21:32:40-fs-next-testing-basic/2662
That's an fsstress run that got hung, I copied the client log into that dir.
Updated by Sage Weil almost 11 years ago
- Status changed from In Progress to Fix Under Review
fix in wip-up
here is the client-side log that shows we send the getattr twice. we only process the first reply, and drop the second, which is how the caps state gets screwed up.
2013-04-29 10:24:46.879319 7f57b68c5780 2 client.4204 mounted: have osdmap 0 and mdsmap 0 2013-04-29 10:24:46.879356 7f57b68c5780 10 client.4204 did not get mds through better means, so chose random mds -1 2013-04-29 10:24:46.879361 7f57b68c5780 20 client.4204 mds is -1 2013-04-29 10:24:46.879362 7f57b68c5780 10 client.4204 target mds.-1 not active, waiting for new mdsmap 2013-04-29 10:24:46.880366 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 5 ==== osd_map(6..6 src has 1..6) v3 ==== 3104+0+0 (4129061429 0 0) 0x22776c0 con 0x2290420 2013-04-29 10:24:46.880504 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 6 ==== mdsmap(e 5) v1 ==== 583+0+0 (929839905 0 0) 0x2281600 con 0x2290420 2013-04-29 10:24:46.880527 7f57b0d09700 1 client.4204 handle_mds_map epoch 5 2013-04-29 10:24:46.880563 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 7 ==== osd_map(6..6 src has 1..6) v3 ==== 3104+0+0 (4129061429 0 0) 0x2277b40 con 0x2290420 2013-04-29 10:24:46.880578 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2192033517 0 0) 0x22911c0 con 0x2290420 2013-04-29 10:24:46.880646 7f57b68c5780 10 client.4204 did not get mds through better means, so chose random mds 0 2013-04-29 10:24:46.880656 7f57b68c5780 20 client.4204 mds is 0 2013-04-29 10:24:46.880666 7f57b68c5780 10 client.4204 _open_mds_session mds.0 2013-04-29 10:24:46.880732 7f57b68c5780 1 -- 10.214.131.31:0/21262 --> 10.214.131.32:6806/3185 -- client_session(request_open) v1 -- ?+0 0x2291700 con 0x2290dc0 2013-04-29 10:24:46.880748 7f57b68c5780 10 client.4204 waiting for session to mds.0 to open 2013-04-29 10:24:46.881508 7f57af405700 0 -- 10.214.131.31:0/21262 >> 10.214.131.32:6806/3185 pipe(0x2292280 sd=7 :53754 s=1 pgs=0 cs=0 l=0).connect claims to be 10.214.131.32:6806/3663 not 10.214.131.32:6806/3185 - wrong node! 2013-04-29 10:24:46.881585 7f57af405700 0 -- 10.214.131.31:0/21262 >> 10.214.131.32:6806/3185 pipe(0x2292280 sd=7 :53754 s=1 pgs=0 cs=0 l=0).fault 2013-04-29 10:24:46.882149 7f57af405700 0 -- 10.214.131.31:0/21262 >> 10.214.131.32:6806/3185 pipe(0x2292280 sd=7 :53755 s=1 pgs=0 cs=0 l=0).connect claims to be 10.214.131.32:6806/3663 not 10.214.131.32:6806/3185 - wrong node! 2013-04-29 10:24:46.994039 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 9 ==== mdsmap(e 6) v1 ==== 575+0+0 (907463370 0 0) 0x2281e00 con 0x2290420 2013-04-29 10:24:46.994070 7f57b0d09700 1 client.4204 handle_mds_map epoch 6 2013-04-29 10:24:46.994084 7f57b0d09700 1 -- 10.214.131.31:0/21262 mark_down 10.214.131.32:6806/3185 -- 0x2292280 2013-04-29 10:24:47.498407 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 10 ==== mdsmap(e 7) v1 ==== 583+0+0 (2807803663 0 0) 0x2281600 con 0x2290420 2013-04-29 10:24:47.498437 7f57b0d09700 1 client.4204 handle_mds_map epoch 7 2013-04-29 10:24:47.498451 7f57b0d09700 1 -- 10.214.131.31:0/21262 mark_down 10.214.131.32:6806/3185 -- pipe dne 2013-04-29 10:24:47.879467 7f57b1d0b700 10 client.4204 renew_caps() 2013-04-29 10:24:47.879482 7f57b1d0b700 15 client.4204 renew_caps requesting from mds.0 2013-04-29 10:25:01.302268 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 11 ==== mdsmap(e 8) v1 ==== 583+0+0 (2888500851 0 0) 0x2281a00 con 0x2290420 2013-04-29 10:25:01.302298 7f57b0d09700 1 client.4204 handle_mds_map epoch 8 2013-04-29 10:25:01.302355 7f57b0d09700 10 client.4204 send_reconnect to mds.0 2013-04-29 10:25:01.302375 7f57b0d09700 10 client.4204 connect_mds_targets for mds.0 2013-04-29 10:25:01.302380 7f57b0d09700 1 -- 10.214.131.31:0/21262 --> 10.214.131.32:6806/3663 -- client_reconnect(0 caps) v3 -- ?+0 0x2281600 con 0x2290dc0 2013-04-29 10:25:01.306807 7f57b0d09700 10 client.4204 ms_handle_connect on 10.214.131.32:6806/3663 2013-04-29 10:25:01.307535 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mds.0 10.214.131.32:6806/3663 1 ==== client_session(open) v1 ==== 28+0+0 (2719320884 0 0) 0x2291700 con 0x2290dc0 2013-04-29 10:25:01.307569 7f57b0d09700 10 client.4204 handle_client_session client_session(open) v1 from mds.0 2013-04-29 10:25:01.307578 7f57b0d09700 10 client.4204 renew_caps mds.0 2013-04-29 10:25:01.307580 7f57b0d09700 1 -- 10.214.131.31:0/21262 --> 10.214.131.32:6806/3663 -- client_session(request_renewcaps seq 1) v1 -- ?+0 0x22911c0 con 0x2290dc0 2013-04-29 10:25:01.307598 7f57b0d09700 10 client.4204 connect_mds_targets for mds.0 2013-04-29 10:25:01.307685 7f57b68c5780 10 client.4204 send_request rebuilding request 1 for mds.0 2013-04-29 10:25:01.307717 7f57b68c5780 20 client.4204 encode_cap_releases enter (req: 0x22a0000, mds: 0) 2013-04-29 10:25:01.307722 7f57b68c5780 20 client.4204 send_request set sent_stamp to 2013-04-29 10:25:01.307722 2013-04-29 10:25:01.307727 7f57b68c5780 10 client.4204 send_request client_request(unknown.0:1 getattr pAsLsXsFs #1) v1 to mds.0 2013-04-29 10:25:01.307740 7f57b68c5780 1 -- 10.214.131.31:0/21262 --> 10.214.131.32:6806/3663 -- client_request(client.4204:1 getattr pAsLsXsFs #1) v1 -- ?+0 0x22ab500 con 0x2290dc0 2013-04-29 10:25:01.307761 7f57b68c5780 20 client.4204 awaiting reply|forward|kick on 0x7fffed882b50 2013-04-29 10:25:01.438798 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mds.0 10.214.131.32:6806/3663 2 ==== client_session(open) v1 ==== 28+0+0 (2719320884 0 0) 0x22911c0 con 0x2290dc0 2013-04-29 10:25:01.438828 7f57b0d09700 10 client.4204 handle_client_session client_session(open) v1 from mds.0 2013-04-29 10:25:01.438835 7f57b0d09700 10 client.4204 renew_caps mds.0 2013-04-29 10:25:01.438837 7f57b0d09700 1 -- 10.214.131.31:0/21262 --> 10.214.131.32:6806/3663 -- client_session(request_renewcaps seq 2) v1 -- ?+0 0x2291700 con 0x2290dc0 2013-04-29 10:25:01.438853 7f57b0d09700 10 client.4204 connect_mds_targets for mds.0 2013-04-29 10:25:01.570996 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 12 ==== mdsmap(e 9) v1 ==== 583+0+0 (1513426008 0 0) 0x2281800 con 0x2290420 2013-04-29 10:25:01.571026 7f57b0d09700 1 client.4204 handle_mds_map epoch 9 2013-04-29 10:25:01.687263 7f57b0d09700 1 -- 10.214.131.31:0/21262 <== mon.0 10.214.131.32:6789/0 13 ==== mdsmap(e 10) v1 ==== 583+0+0 (2013845237 0 0) 0x2281600 con 0x2290420 2013-04-29 10:25:01.687293 7f57b0d09700 1 client.4204 handle_mds_map epoch 10 2013-04-29 10:25:01.687306 7f57b0d09700 10 client.4204 kick_requests for mds.0 2013-04-29 10:25:01.687309 7f57b0d09700 10 client.4204 send_request rebuilding request 1 for mds.0 2013-04-29 10:25:01.687311 7f57b0d09700 20 client.4204 encode_cap_releases enter (req: 0x22a0000, mds: 0) 2013-04-29 10:25:01.687313 7f57b0d09700 10 client.4204 send_request client_request(unknown.0:1 getattr pAsLsXsFs #1) v1 to mds.0 2013-04-29 10:25:01.687321 7f57b0d09700 1 -- 10.214.131.31:0/21262 --> 10.214.131.32:6806/3663 -- client_request(client.4204:1 getattr pAsLsXsFs #1) v1 -- ?+0 0x2292c80 con 0x2290dc0 2013-04-29 10:25:01.687344 7f57b0d09700 10 client.4204 kick_flushing_caps mds.0 2013-04-29 10:25:01.687346 7f57b0d09700 10 client.4204 connect_mds_targets for mds.0
Updated by Sage Weil almost 11 years ago
- Priority changed from Urgent to High
- Backport set to cuttlefish
this is not a new race, and is only triggered when a mds session open and request race with an mds restart. not a cuttlefish blocker, although we should backport if it doesn't make the release.
Updated by Sage Weil almost 11 years ago
- Status changed from Fix Under Review to Resolved
commit:ee553ac279664b7f1b527a0b1b56768134cf5157
Actions