Bug #4369
osd: msgr connection not cleanly shut down
Description
I saw a hang during umount due to the FILE_BUFFER pin on a bunch of disconnected inodes. At first this looked like the fh leak fixed in wip-traceless earlier, but that fix was already applied. It didn't come up during the last run, but we should hammer this job a few times to make sure it isn't still a problem.
```yaml
kernel:
  branch: testing
  kdb: true
nuke-on-error: true
machine_type: mira
overrides:
  ceph:
    conf:
      mds:
        mds inject traceless reply probability: 0.5
      client:
        debug client: 20
        debug ms: 20
        debug objectcacher: 20
        debug objecter: 20
      osd:
        osd debug op order: 1
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    branch: wip-traceless
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh
```
Updated by Sage Weil about 11 years ago
- Status changed from New to In Progress
This is actually caused by requests not coming back from the OSD, most likely due to #4079?
Reproducing with OSD and journal logs.
Updated by Sage Weil about 11 years ago
Now I see that the PGs aren't clean because of a msgr oddity: a notify reply isn't delivered because the msgr throws it out as an old message. The incoming message has a low seq #, while the local connection state has a very high seq number. Reproducing with debug ms = 20.
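A minimal model of the discard behavior described above, assuming the messenger drops any incoming message whose seq is not greater than the last seq it accepted on that connection. `ConnectionState`, `receive`, and `reset` are hypothetical names used only for illustration; this is not Ceph's actual messenger code. `reset` stands in for the clean shutdown that the issue title says was missing: if stale state survives, a fresh session that starts again at a low seq has its messages discarded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConnectionState:
    """Hypothetical, simplified per-peer receive state (not Ceph's real class)."""
    in_seq: int = 0  # highest sequence number accepted on this connection
    delivered: List[str] = field(default_factory=list)

    def receive(self, seq: int, payload: str) -> bool:
        """Deliver a message only if its seq is newer than anything seen so far."""
        if seq <= self.in_seq:
            # Looks like a replay of an old message, so it is silently dropped.
            return False
        self.in_seq = seq
        self.delivered.append(payload)
        return True

    def reset(self) -> None:
        """Model of a clean shutdown: forget the old session's receive seq."""
        self.in_seq = 0


# Stale state left over from a long-lived session with the same peer.
conn = ConnectionState(in_seq=10_000)
print(conn.receive(3, "notify reply"))  # False: 3 <= 10000, discarded
conn.reset()
print(conn.receive(3, "notify reply"))  # True: accepted after the reset
print(conn.delivered)                   # ['notify reply']
```

The point of the sketch is that the seq check itself is reasonable; the failure comes from keeping a high `in_seq` across what is really a new session.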
Updated by Sage Weil about 11 years ago
Suspicious of commit 0f42eddef5da6c1babe9ed51ceaa3212a42c2ec4 for #4271 ...
Updated by Sage Weil about 11 years ago
- Subject changed from ceph-fuse: hang on shutdown after ffsb to osd: msgr connection not cleanly shut down
- Category set to OSD
- Backport set to bobtail
wip-osd-map
Now passes my test!
Updated by Sage Weil about 11 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil about 11 years ago
- Status changed from Fix Under Review to Resolved