Feature #3754
krbd: use new request tracking code for notify ack
0%
Description
Two request types remain that still use the old request
tracking mechanism. One of them is sending acknowledgements
for incoming notify events. This needs to switch over to
the new mechanism so we can eliminate the old.
History
#1 Updated by Alex Elder about 11 years ago
I'm not sure yet whether the problem has to do with this
or whether it's in the existing "new request" code. But
I completed a full xfstests run and had moved on to a
blogbench run, but have hit a new crash, this time in
ceph_osdc_alloc_request(). I'll try to get at the root
cause in the morning.
#2 Updated by Alex Elder about 11 years ago
I've pretty much implemented this feature but having done
this I'm looking at a crash that happened with this code
in place. Not sure where to document my status so I'm just
putting info here (even if it doesn't directly apply).
It looks like this crash has occurred on an image request
that requires multiple (2) object requests. This may
well be the first such image request encountered in this
test.
The crash occurred because a bio_put() was attempted on
a bio that had no references. I dug into memory with kdb
and it looked to me like the object requests in this
image request had already been freed. One possible explanation
is that a field that determined whether processing for an
image request is done needs to be checked while holding a
lock. I'm doing a quick experiment to see if doing this
avoids the issue.
#3 Updated by Alex Elder about 11 years ago
Seems to be working better. It may end up being an
atomic rather than protecting with a spinlock, but
either way, I think I can move forward.
#4 Updated by Alex Elder about 11 years ago
Well that's unfortunate. I hit the same problem. I'll
need to take a closer look I guess.
#5 Updated by Alex Elder about 11 years ago
OK, that quick fix wasn't enough.
I had a spinlock protecting the check for something being
complete. But that wasn't enough, because two concurrent
threads could be on the path to checking that, and despite
the lock they could both come to the same conclusion that
they are responsible for destroying the image request.
Here's a failing scenario.
THREAD 0 THREAD 1 mark obj req 0 done mark obj req 1 done spin_lock() complete obj req 0 complete obj req 1 (we are done.) spin_unlock() spin_lock() no more to do spin_unlock() spin_lock() are we done? -> yes spin_unlock() spin_lock() are we done? -> yes spin_unlock() if (done) clean up image req if (done) clean up image req
Instead, the routine that completes object requests
will determine under lock whether it was the one that
completed the final object request in the image request.
It returns that value. The will then clean up the
image request only if that routine returns true.
So it looks more like:
THREAD 0 THREAD 1 mark obj req 0 done mark obj req 1 done spin_lock() complete obj req 0 complete obj req 1 that was the last one --> arrange to return TRUE spin_unlock() spin_lock() no more to do --> arrange to return FALSE spin_unlock() if (got TRUE) [we did] clean up image req if (got TRUE) [we did not] (don't) clean up image req
I'll test this one overnight, hopefully I got it this
time...
#6 Updated by Alex Elder about 11 years ago
Yeehah! All tests passed, including the previously-failing
blogbench.sh, fsstress, and two passes through xfstests.
#7 Updated by Sage Weil about 11 years ago
Yay!
#8 Updated by Alex Elder about 11 years ago
- Status changed from New to In Progress
I have completed implementing sending synchronous acknowledgement
in response to a watch request notification. It is in the midst
of a little more cleanup and final review work right now, in
preparation for posting it for review.
#9 Updated by Alex Elder about 11 years ago
- Status changed from In Progress to Fix Under Review
I've posted this code for review. I continue to do testing.
#10 Updated by Alex Elder about 11 years ago
- Project changed from Linux kernel client to rbd
#11 Updated by Alex Elder about 11 years ago
- Status changed from Fix Under Review to 7
#12 Updated by Alex Elder about 11 years ago
- Status changed from 7 to Closed
commit 1c8c3c5c571607a188203142020d80aa58e5e280
Author: Alex Elder <elder@inktank.com>
Date: Fri Nov 30 17:53:04 2012 -0600
rbd: use new code for notify ack