Bug #4524
libceph: bad ptr deref in rbtree for kick_requests
% Done: 0%
Source: Q/A
Regression: No
Severity: 3 - minor
Description
<12>[ 2385.395127] init: ttyS2 main process ended, respawning
<6>[ 2387.400244] libceph: osd1 weight 0x10000 (in)
<1>[ 2387.429685] BUG: unable to handle kernel paging request at 0000000001000010
<1>[ 2387.436679] IP: [<ffffffff81335403>] rb_next+0x23/0x50
<4>[ 2387.441839] PGD 0
<4>[ 2387.443871] Oops: 0000 [#1] SMP
[5]kdb>
[5]kdb> bt
Stack traceback for pid 67
0xffff88020d253f20 67 2 1 5 R 0xffff88020d2543a0 *kworker/5:1
 ffff88020d2f1ab8 0000000000000018 ffffffff8165ba7e ffff88020d2f1b18
 ffffffffa032a043 ffff88020d799960 c045d13e93bc3e4d ffff88020d799a38
 000000004992c649 ffff88020d2f1b18 ffff88019c9b98d5 ffff88020d799960
Call Trace:
 [<ffffffff8165ba7e>] ? mutex_unlock+0xe/0x10
 [<ffffffffa032a043>] ? kick_requests+0x1f3/0x3e0 [libceph]
 [<ffffffffa032aeff>] ? ceph_osdc_handle_map+0x24f/0x580 [libceph]
 [<ffffffffa0326a60>] ? dispatch+0x120/0x7e0 [libceph]
 [<ffffffff810b783d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffa0323684>] ? con_work+0x1f94/0x3010 [libceph]
 [<ffffffff8109651b>] ? idle_balance+0x1fb/0x330
 [<ffffffff8109651b>] ? idle_balance+0x1fb/0x330
 [<ffffffff810861c8>] ? finish_task_switch+0x48/0x110
 [<ffffffff81072ba9>] ? process_one_work+0x199/0x510
 [<ffffffff81072b3c>] ? process_one_work+0x12c/0x510
 [<ffffffffa03216f0>] ? ceph_msg_new+0x2e0/0x2e0 [libceph]
 [<ffffffff81074895>] ? worker_thread+0x165/0x3f0
 [<ffffffff81074730>] ? manage_workers+0x2a0/0x2a0
 [<ffffffff8107a4ba>] ? kthread+0xea/0xf0
The job was:
ubuntu@teuthology:/a/teuthology-2013-03-20_20:00:52-kernel-last-master-basic/424$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: d6c0dd6b0c196979fa7b34c1d99432fcb1b7e1df
nuke-on-error: true
overrides:
  ceph:
    conf:
      mds:
        debug mds: 1/20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 9a7a9d06c0623ccc116a1d3b71c765c20a17e98e
  s3tests:
    branch: last
  workunit:
    sha1: 9a7a9d06c0623ccc116a1d3b71c765c20a17e98e
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh
History
#1 Updated by Sage Weil about 11 years ago
- Assignee set to Sage Weil
#2 Updated by Alex Elder about 11 years ago
I am fairly sure the bad pointer dereference is this line
in rb_next():
    /*
     * If we have a right-hand child, go down and then left as far
     * as we can.
     */
    if (node->rb_right) {
        node = node->rb_right;
        while (node->rb_left)      <-----------
            node = node->rb_left;
        return (struct rb_node *)node;
    }
And the pointer value indicates that the node pointer
(which was either the right child pointer of the original
node, or was the left child of the leftmost valid
descendant of that node) had value 0x0000000001000000.
This says nothing about how the red-black tree got corrupted
that way of course...
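The step from the faulting address in the oops (0000000001000010) to a node pointer of 0x0000000001000000 can be checked with a little offset arithmetic. The following is a minimal user-space sketch, assuming a 64-bit (LP64) build and assuming the field order of the kernel's struct rb_node (parent/color word, then rb_right, then rb_left). The names struct rb_node_sketch, layout_matches_lp64() and rb_left_load_address() are hypothetical and exist only for this illustration; they are not part of libceph or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative user-space copy of the rb_node field layout.
 * Assumption: same field order as the kernel's struct rb_node.
 */
struct rb_node_sketch {
    unsigned long parent_color;        /* parent pointer and color bit */
    struct rb_node_sketch *rb_right;
    struct rb_node_sketch *rb_left;
};

/* Returns 1 if rb_left sits at offset 0x10, as expected on LP64. */
int layout_matches_lp64(void)
{
    return offsetof(struct rb_node_sketch, rb_left) == 0x10;
}

/*
 * Hypothetical helper: the address the CPU loads from when it
 * evaluates node->rb_left for a given node pointer value.
 */
uintptr_t rb_left_load_address(uintptr_t node)
{
    assert(layout_matches_lp64());
    return node + offsetof(struct rb_node_sketch, rb_left);
}
```

Under those assumptions, calling rb_left_load_address(0x1000000) gives 0x1000010, which matches the address in the "unable to handle kernel paging request" line. This only supports the reading that the loop test dereferenced a bogus node pointer; it does not explain how the tree came to hold that value.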
#3 Updated by Sage Weil almost 11 years ago
- Priority changed from Urgent to High
downgrading this until we see it again
#4 Updated by Sage Weil almost 11 years ago
- Status changed from 12 to Can't reproduce