Project

General

Profile

Feature #3754

krbd: use new request tracking code for notify ack

Added by Alex Elder about 11 years ago. Updated about 11 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Two request types remain that still use the old request
tracking mechanism. One of them is sending acknowledgements
for incoming notify events. This needs to switch over to
the new mechanism so we can eliminate the old.

History

#1 Updated by Alex Elder about 11 years ago

I'm not sure yet whether the problem has to do with this
or whether it's in the existing "new request" code. But
I completed a full xfstests run and had moved on to a
blogbench run, but have hit a new crash, this time in
ceph_osdc_alloc_request(). I'll try to get at the root
cause in the morning.

#2 Updated by Alex Elder about 11 years ago

I've pretty much implemented this feature but having done
this I'm looking at a crash that happened with this code
in place. Not sure where to document my status so I'm just
putting info here (even if it doesn't directly apply).

It looks like this crash has occurred on an image request
that requires multiple (2) object requests. This may
well be the first such image request encountered in this
test.

The crash occurred because a bio_put() was attempted on
a bio that had no references. I dug into memory with kdb
and it looked to me like the object requests in this
image request had already been freed. One possible explanation
is that a field that determined whether processing for an
image request is done needs to be checked while holding a
lock. I'm doing a quick experiment to see if doing this
avoids the issue.

#3 Updated by Alex Elder about 11 years ago

Seems to be working better. It may end up being an
atomic rather than protecting with a spinlock, but
either way, I think I can move forward.

#4 Updated by Alex Elder about 11 years ago

Well that's unfortunate. I hit the same problem. I'll
need to take a closer look I guess.

#5 Updated by Alex Elder about 11 years ago

OK, that quick fix wasn't enough.

I had a spinlock protecting the check for something being
complete. But that wasn't enough, because two concurrent
threads could be on the path to checking that, and despite
the lock they could both come to the same conclusion that
they are responsible for destroying the image request.

Here's a failing scenario.

    THREAD 0              THREAD 1
mark obj req 0 done
                       mark obj req 1 done
spin_lock()
complete obj req 0
complete obj req 1
(we are done.)
spin_unlock()
                       spin_lock()
                       no more to do
                       spin_unlock()
spin_lock()
are we done? -> yes 
spin_unlock()
                       spin_lock()
                       are we done? -> yes 
                       spin_unlock()
if (done)
    clean up image req
                       if (done)
                           clean up image req

Instead, the routine that completes object requests
will determine under lock whether it was the one that
completed the final object request in the image request.
It returns that value. The will then clean up the
image request only if that routine returns true.

So it looks more like:

    THREAD 0              THREAD 1
mark obj req 0 done
                       mark obj req 1 done
spin_lock()
complete obj req 0
complete obj req 1
that was the last one
 --> arrange to return TRUE
spin_unlock()
                       spin_lock()
                       no more to do
                        --> arrange to return FALSE
                       spin_unlock()

if (got TRUE)  [we did]
    clean up image req
                       if (got TRUE)  [we did not]
                           (don't) clean up image req

I'll test this one overnight, hopefully I got it this
time...

#6 Updated by Alex Elder about 11 years ago

Yeehah! All tests passed, including the previously-failing
blogbench.sh, fsstress, and two passes through xfstests.

#7 Updated by Sage Weil about 11 years ago

Yay!

#8 Updated by Alex Elder about 11 years ago

  • Status changed from New to In Progress

I have completed implementing sending synchronous acknowledgement
in response to a watch request notification. It is in the midst
of a little more cleanup and final review work right now, in
preparation for posting it for review.

#9 Updated by Alex Elder about 11 years ago

  • Status changed from In Progress to Fix Under Review

I've posted this code for review. I continue to do testing.

#10 Updated by Alex Elder about 11 years ago

  • Project changed from Linux kernel client to rbd

#11 Updated by Alex Elder about 11 years ago

  • Status changed from Fix Under Review to 7

#12 Updated by Alex Elder about 11 years ago

  • Status changed from 7 to Closed

commit 1c8c3c5c571607a188203142020d80aa58e5e280
Author: Alex Elder <>
Date: Fri Nov 30 17:53:04 2012 -0600

rbd: use new code for notify ack

Also available in: Atom PDF