Bug #4524

libceph: bad ptr deref in rbtree for kick_requests

Added by Sage Weil about 11 years ago. Updated almost 11 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: libceph
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

<12>[ 2385.395127] init: ttyS2 main process ended, respawning
<6>[ 2387.400244] libceph: osd1 weight 0x10000 (in)
<1>[ 2387.429685] BUG: unable to handle kernel paging request at 0000000001000010
<1>[ 2387.436679] IP: [<ffffffff81335403>] rb_next+0x23/0x50
<4>[ 2387.441839] PGD 0 
<4>[ 2387.443871] Oops: 0000 [#1] SMP 
[5]kdb>                         
[5]kdb> bt
Stack traceback for pid 67
0xffff88020d253f20       67        2  1    5   R  0xffff88020d2543a0 *kworker/5:1
 ffff88020d2f1ab8 0000000000000018 ffffffff8165ba7e ffff88020d2f1b18
 ffffffffa032a043 ffff88020d799960 c045d13e93bc3e4d ffff88020d799a38
 000000004992c649 ffff88020d2f1b18 ffff88019c9b98d5 ffff88020d799960
Call Trace:
 [<ffffffff8165ba7e>] ? mutex_unlock+0xe/0x10
 [<ffffffffa032a043>] ? kick_requests+0x1f3/0x3e0 [libceph]
 [<ffffffffa032aeff>] ? ceph_osdc_handle_map+0x24f/0x580 [libceph]
 [<ffffffffa0326a60>] ? dispatch+0x120/0x7e0 [libceph]
 [<ffffffff810b783d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffa0323684>] ? con_work+0x1f94/0x3010 [libceph]
 [<ffffffff8109651b>] ? idle_balance+0x1fb/0x330
 [<ffffffff8109651b>] ? idle_balance+0x1fb/0x330
 [<ffffffff810861c8>] ? finish_task_switch+0x48/0x110
 [<ffffffff81072ba9>] ? process_one_work+0x199/0x510
 [<ffffffff81072b3c>] ? process_one_work+0x12c/0x510
 [<ffffffffa03216f0>] ? ceph_msg_new+0x2e0/0x2e0 [libceph]
 [<ffffffff81074895>] ? worker_thread+0x165/0x3f0
 [<ffffffff81074730>] ? manage_workers+0x2a0/0x2a0
 [<ffffffff8107a4ba>] ? kthread+0xea/0xf0

The job was:
ubuntu@teuthology:/a/teuthology-2013-03-20_20:00:52-kernel-last-master-basic/424$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: d6c0dd6b0c196979fa7b34c1d99432fcb1b7e1df
nuke-on-error: true
overrides:
  ceph:
    conf:
      mds:
        debug mds: 1/20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 9a7a9d06c0623ccc116a1d3b71c765c20a17e98e
  s3tests:
    branch: last
  workunit:
    sha1: 9a7a9d06c0623ccc116a1d3b71c765c20a17e98e
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

History

#1 Updated by Sage Weil about 11 years ago

  • Assignee set to Sage Weil

#2 Updated by Alex Elder about 11 years ago

I am fairly sure the bad pointer dereference is this line
in rb_next():

	/*
	 * If we have a right-hand child, go down and then left as far
	 * as we can.
	 */
	if (node->rb_right) {
		node = node->rb_right;
		while (node->rb_left)	<-----------
			node = node->rb_left;
		return (struct rb_node *)node;
	}

The faulting address indicates that the node pointer
(which was either the right child pointer of the original
node, or the left child pointer of its leftmost valid
descendant) had the value 0x0000000001000000.

This says nothing about how the red-black tree got corrupted
that way, of course...
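
To make the address arithmetic above explicit, here is a minimal user-space sketch. It assumes an LP64 target (x86-64) and that struct rb_node is laid out as parent/color word, rb_right, rb_left, as in include/linux/rbtree.h. The names rb_node_sketch and node_from_fault are hypothetical and exist only for this illustration; they are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed mirror of the kernel's struct rb_node field order:
 * parent/color word first, then rb_right, then rb_left.
 */
struct rb_node_sketch {
	unsigned long parent_color;
	struct rb_node_sketch *rb_right;
	struct rb_node_sketch *rb_left;
};

/*
 * Given the faulting address from the oops and the offset of the
 * field that was being read, recover the bogus node pointer.
 */
static uintptr_t node_from_fault(uintptr_t fault_addr, size_t field_offset)
{
	assert(fault_addr >= field_offset);	/* guard against underflow */
	return fault_addr - field_offset;
}
```

Under these assumptions, offsetof(struct rb_node_sketch, rb_left) is 0x10, so node_from_fault(0x1000010, 0x10) gives 0x1000000, which matches a read of node->rb_left in the marked while loop. A read of node->rb_right (offset 0x8) would instead have faulted at an address ending in 0x08.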

#3 Updated by Sage Weil almost 11 years ago

  • Priority changed from Urgent to High

downgrading this until we see it again

#4 Updated by Sage Weil almost 11 years ago

  • Status changed from 12 to Can't reproduce
