Bug #3950
closedkrbd: new assertion failure running concurrent rbd test
0%
Description
(I think this is a new issue, I haven't investigated it yet.)
I hit an assertion failure while running my new test script.
Maybe it's an old problem, we'll see... It was not expected
though.
[ 683.573981]
[ 683.573981] Assertion failure in rbd_osd_req_callback() at line 1280:
[ 683.573981]
[ 683.573981] rbd_assert(osd_req == obj_request->osd_req);
[ 683.573981]
Entering kdb (current=0xffff8801bdbede20, pid 4626) on processor 0 Oops: (null)
due to oops @ 0xffffffffa0272a4d
CPU 0 dPid: 4626, comm: kworker/0:2 Not tainted 3.6.0-ceph-00226-g9e08c73 #1 Dell Inc. PowerEdge R410/01V648
dRIP: 0010:[<ffffffffa0272a4d>] [<ffffffffa0272a4d>] rbd_osd_req_callback+0x17d/0x1e0 [rbd]
dRSP: 0018:ffff8801bd4e9b70 EFLAGS: 00010286
dRAX: 000000000000006a RBX: ffff88020cddc500 RCX: 0000000000009291
dRDX: ffff8801bdbede20 RSI: 0000000000000001 RDI: 0000000000000246
dRBP: ffff8801bd4e9b80 R08: 0000000000000002 R09: 0000000000000000
dR10: 0000000000000452 R11: 0000000000005000 R12: 0000000000001e51
dR13: 0000000000000004 R14: ffff88020d992800 R15: 0000000000000000
dFS: 0000000000000000(0000) GS:ffff880227200000(0000) knlGS:0000000000000000
dCS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
dCR2: 00000000018e1058 CR3: 00000002223f7000 CR4: 00000000000007f0
dDR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
dDR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
dProcess kworker/0:2 (pid: 4626, threadinfo ffff8801bd4e8000, task ffff8801bdbede20)
dStack:
ffff88020cddce00 0000000000001e51 ffff8801bd4e9bf0 ffffffffa04049ef
ffff88020b9dd1b0 ffff880200000025 ffff88020cdb2800 ffff88020cff5950
00000000bd4e9be0 ffff88020cff5a28 ffff88020b9dd030 ffff88020b9dd030
dCall Trace:
d [<ffffffffa04049ef>] dispatch+0x26f/0x500 [libceph]
more>
d [<ffffffffa03fb13b>] con_work+0xb7b/0x2fd0 [libceph]
d [<ffffffff8108da40>] ? cpuacct_charge+0xb0/0xf0
d [<ffffffff8108d9b8>] ? cpuacct_charge+0x28/0xf0
d [<ffffffff810848a8>] ? finish_task_switch+0x48/0x110
d [<ffffffff810717e0>] process_one_work+0x1a0/0x5f0
d [<ffffffff81071776>] ? process_one_work+0x136/0x5f0
d [<ffffffffa03fa5c0>] ? ceph_msg_new+0x2e0/0x2e0 [libceph]
d [<ffffffff810735fd>] worker_thread+0x18d/0x4f0
d [<ffffffff81073470>] ? manage_workers+0x320/0x320
d [<ffffffff810791fe>] kthread+0xae/0xc0
d [<ffffffff810b393d>] ? trace_hardirqs_on+0xd/0x10
d [<ffffffff8163f3c4>] kernel_thread_helper+0x4/0x10
d [<ffffffff81635880>] ? _raw_spin_unlock_irq+0x30/0x40
d [<ffffffff816360b0>] ? retint_restore_args+0x13/0x13
d [<ffffffff81079150>] ? flush_kthread_work+0x1a0/0x1a0
d [<ffffffff8163f3c0>] ? gs_change+0x13/0x13
dCode: ff c7 43 60 00 00 00 00 eb 99 48 c7 c1 48 8c 27 a0 ba 00 05 00 00 48 c7 c6 f0 9e 27 a0 48 c7 c7 78 8b 27 a0 31 c0 e8 be 7b
[0]kdb>
Updated by Alex Elder over 11 years ago
Weird. It looks to me like the object request that's
just completing is already done, meaning we got
a callback from an osd request more than once. That
would explain why osd_req->r_priv did not point to
an object request pointer that pointed back to the
osd request.
The (r_priv) object request pointer does appear to be
valid (or recently valid) though. But its osd_req
field points to something else--not an osd request.
Updated by Alex Elder over 11 years ago
The object being operated on is the rbd header image, in
this case named "image.5X5ZNB.rbd". The object request type
is "obj_req_nodata" so it's either a CEPH_OSD_OP_WATCH or a
CEPH_OSD_OP_NOTIFY_ACK. But without access to the osd request
I can't tell.
This test only uses format 1 images.
There's still a chance this is just a duplicate of 3937.
Updated by Alex Elder over 11 years ago
OK, I do have the osd request pointer now. It was available
in register R14. And with a little work I can determine that
the op in progress was (little endian):
op = 0x220f = CEPH_OSD_OP_WATCH flags = 0x00000000 cookie = 0x0000000000000014 ver = 0x0000000000000000 flag = 0x00
So the request was shutting down the watch request for the
header object, from rbd_dev_release().
Is there any way an osd request will supply more than one
response to a request (that has not been marked for linger)?
Updated by Ian Colle about 11 years ago
- Status changed from In Progress to Can't reproduce
Not seeing this in current code.