Bug #5429
closed
libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd
Added by Sage Weil almost 11 years ago.
Updated over 9 years ago.
Description
<1>[19828.585548] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[19828.593437] IP: [<ffffffff813185cb>] rb_erase+0x1bb/0x370
<4>[19828.598865] PGD 0
<4>[19828.600899] Oops: 0002 [#1] SMP
[dumpcommon]kdb> -bt
Stack traceback for pid 29967
0xffff88020dd03f20 29967 2 1 4 R 0xffff88020dd043a8 *kworker/4:1
ffff88020b257b48 0000000000000018 0000000000000000 ffff88020b257b68
ffffffffa05487bc ffff8802204e4000 ffff880224ec7950 ffff88020b257b98
ffffffffa0548abf ffff8802204e4030 ffff880224ec7950 0000000000000000
Call Trace:
[<ffffffffa05487bc>] ? __remove_osd+0x3c/0xa0 [libceph]
[<ffffffffa0548abf>] ? __reset_osd+0x12f/0x170 [libceph]
[<ffffffffa054a6de>] ? osd_reset+0x7e/0x2b0 [libceph]
[<ffffffffa0541e21>] ? con_work+0x571/0x2d50 [libceph]
[<ffffffff81080bb3>] ? idle_balance+0x133/0x180
[<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
[<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
[<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
[<ffffffff8105f3da>] ? process_one_work+0x1da/0x540
[<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
[<ffffffff810605bc>] ? worker_thread+0x11c/0x370
[<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
[<ffffffff8106727a>] ? kthread+0xea/0xf0
[<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
[<ffffffff8163ff9c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
[dumpall]kdb> -bta
but preceded 7 seconds earlier by
<4>[19778.015116] libceph: osd0 10.214.132.16:6801 socket closed (con state CONNECTING)
<3>[19799.355399] INFO: rcu_sched self-detected stall on CPU { 6} (t=2100 jiffies g=245350 c=245349 q=2640)
<4>[19799.364789] CPU: 6 PID: 19284 Comm: kworker/6:2 Tainted: G W 3.10.0-rc6-ceph-00091-g2dd322b #1
<4>[19799.374303] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<3>[19799.375424] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=2102 jiffies, g=245350, c=245349, q=2640)
<6>[19799.375425] Task dump for CPU 6:
<6>[19799.375429] kworker/6:2 R running task 0 19284 2 0x00000000
<6>[19799.375442] Workqueue: ceph-msgr con_work [libceph]
<4>[19799.375445] ffff880125ff7de8 ffffffff8105f3da ffffffff8105f36f ffff8802272d3a00
<4>[19799.375447] 0000000000000000 00000006272d2f98 ffff880125ff7fd8 ffff8802272d2f80
<4>[19799.375450] ffffffffa05666d0 0000000000000000 0000000000000000 ffffffffa055940e
<4>[19799.375451] Call Trace:
<4>[19799.375457] [<ffffffff8105f3da>] ? process_one_work+0x1da/0x540
<4>[19799.375459] [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
<4>[19799.375462] [<ffffffff810605bc>] worker_thread+0x11c/0x370
<4>[19799.375464] [<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
<4>[19799.375468] [<ffffffff8106727a>] kthread+0xea/0xf0
<4>[19799.375471] [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.375476] [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
<4>[19799.375478] [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.480150] Workqueue: ceph-msgr con_work [libceph]
<4>[19799.485054] ffffffff81c4ca00 ffff8802272c3db8 ffffffff81630b82 ffff8802272c3e38
<4>[19799.492523] ffffffff810e285a 0000000000000006 ffff8802272cd4e0 ffff8802272c3de8
<4>[19799.499995] ffffffff810e644c 0000000000000086 0000000000000001 0000000000000006
<4>[19799.507469] Call Trace:
<4>[19799.509929] <IRQ> [<ffffffff81630b82>] dump_stack+0x19/0x1b
<4>[19799.515718] [<ffffffff810e285a>] rcu_check_callbacks+0x21a/0x710
<4>[19799.521831] [<ffffffff810e644c>] ? acct_account_cputime+0x1c/0x20
<4>[19799.528034] [<ffffffff81050f68>] update_process_times+0x48/0x80
<4>[19799.534062] [<ffffffff8109b616>] tick_sched_handle.isra.10+0x36/0x50
<4>[19799.540524] [<ffffffff8109b71c>] tick_sched_timer+0x4c/0x80
<4>[19799.546203] [<ffffffff8106a841>] __run_hrtimer+0x81/0x1e0
<4>[19799.551709] [<ffffffff8109b6d0>] ? tick_nohz_handler+0xa0/0xa0
<4>[19799.557647] [<ffffffff8106b147>] hrtimer_interrupt+0x107/0x260
<4>[19799.563588] [<ffffffff81641b69>] smp_apic_timer_interrupt+0x69/0x99
<4>[19799.569964] [<ffffffff81640caf>] apic_timer_interrupt+0x6f/0x80
<4>[19799.575987] <EOI> [<ffffffff8112edec>] ? shrink_inactive_list+0x18c/0x400
<4>[19799.582995] [<ffffffff81637590>] ? _raw_spin_unlock_irq+0x30/0x40
<4>[19799.589195] [<ffffffff81637595>] ? _raw_spin_unlock_irq+0x35/0x40
<4>[19799.595397] [<ffffffff8112edec>] shrink_inactive_list+0x18c/0x400
<4>[19799.601595] [<ffffffff8112f66d>] shrink_lruvec+0x2cd/0x4d0
<4>[19799.607187] [<ffffffff8119855b>] ? bdi_queue_work+0x8b/0xf0
<4>[19799.612869] [<ffffffff8112fc1c>] do_try_to_free_pages+0x11c/0x3a0
<4>[19799.619068] [<ffffffff81130066>] try_to_free_pages+0xd6/0x1b0
<4>[19799.624922] [<ffffffff811375b0>] ? next_zone+0x30/0x40
<4>[19799.630165] [<ffffffff81125406>] __alloc_pages_nodemask+0x596/0x8f0
<4>[19799.636541] [<ffffffff8115bb1a>] alloc_pages_current+0xba/0x170
<4>[19799.642569] [<ffffffff81516d3e>] sk_page_frag_refill+0x7e/0x130
<4>[19799.648593] [<ffffffff8156f5a5>] tcp_sendmsg+0x305/0xe50
<4>[19799.654010] [<ffffffff8159af99>] inet_sendmsg+0xb9/0xf0
<4>[19799.659339] [<ffffffff8159aee5>] ? inet_sendmsg+0x5/0xf0
<4>[19799.664760] [<ffffffff81510de2>] sock_sendmsg+0xc2/0xe0
<4>[19799.670090] [<ffffffff812ee35b>] ? chksum_update+0x1b/0x30
<4>[19799.675686] [<ffffffff812ea1e8>] ? crypto_shash_update+0x18/0x30
<4>[19799.681814] [<ffffffffa0000056>] ? crc32c+0x56/0x7c [libcrc32c]
<4>[19799.687842] [<ffffffff81510e40>] kernel_sendmsg+0x40/0x60
<4>[19799.693353] [<ffffffffa05424d8>] con_work+0xc28/0x2d50 [libceph]
<4>[19799.699468] [<ffffffff81080bb3>] ? idle_balance+0x133/0x180
<4>[19799.705145] [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
<4>[19799.711257] [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
<4>[19799.717371] [<ffffffff8105f3da>] process_one_work+0x1da/0x540
<4>[19799.723220] [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
<4>[19799.729247] [<ffffffff810605bc>] worker_thread+0x11c/0x370
<4>[19799.734840] [<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
<4>[19799.741388] [<ffffffff8106727a>] kthread+0xea/0xf0
<4>[19799.746283] [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.752658] [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
<4>[19799.758076] [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<3>[19820.788280] INFO: rcu_sched self-detected stall on CPU
<3>[19820.788286] INFO: rcu_sched self-detected stall on CP
<4>[19820.788287]
job was
ubuntu@teuthology:/a/teuthology-2013-06-22_01:00:51-kernel-next-testing-basic/42857$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        filestore flush min: 0
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
  install:
    ceph:
      sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
  s3tests:
    branch: next
  workunit:
    sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh
- Assignee set to Josh Durgin
hit this again, ubuntu@teuthology:/a/teuthology-2013-06-28_01:01:07-kernel-master-testing-basic/48683
plana72 still sitting in kdb.
- Priority changed from Urgent to High
hit this again, ubuntu@teuthology:/a/teuthology-2013-08-14_01:01:26-kcephfs-next-testing-basic-plana/106215
it was here:
static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
{
a78d: 48 89 e5 mov %rsp,%rbp
a790: 41 54 push %r12
a792: 49 89 fc mov %rdi,%r12
a795: 53 push %rbx
a796: 48 89 f3 mov %rsi,%rbx
dout("__remove_osd %p\n", osd);
a799: 75 61 jne a7fc <__remove_osd+0x7c>
BUG_ON(!list_empty(&osd->o_requests));
a79b: 48 8d 83 38 05 00 00 lea 0x538(%rbx),%rax
a7a2: 48 39 83 38 05 00 00 cmp %rax,0x538(%rbx)
a7a9: 75 6b jne a816 <__remove_osd+0x96>
rb_erase(&osd->o_node, &osdc->osds);
a7ab: 49 8d b4 24 60 01 00 lea 0x160(%r12),%rsi
a7b2: 00
a7b3: 48 8d 7b 18 lea 0x18(%rbx),%rdi
a7b7: e8 00 00 00 00 callq a7bc <__remove_osd+0x3c>
a7b8: R_X86_64_PC32 rb_erase+0xfffffffffffffffc
* in an undefined state.
*/
#ifndef CONFIG_DEBUG_LIST
static inline void __list_del_entry(struct list_head *entry)
{
__list_del(entry->prev, entry->next);
a7bc: 48 8b 8b 58 05 00 00 mov 0x558(%rbx),%rcx
^^^^^^^^^^^^^^^^^
a7c3: 48 8b 93 60 05 00 00 mov 0x560(%rbx),%rdx
list_del_init(&osd->o_osd_lru);
a7ca: 48 8d 83 58 05 00 00 lea 0x558(%rbx),%rax
ceph_con_close(&osd->o_con);
a7d1: 48 8d 7b 30 lea 0x30(%rbx),%rdi
* This is only for internal list manipulation where we know
* the prev/next entries already!
*/
static inline void __list_del(struct list_head * prev, struct list_head * next)
<6>[17485.734714] rbd1: unknown partition table
<4>[17485.735740] libceph: mon2 10.214.132.4:6790 socket closed (con state OPEN)
<6>[17485.735759] libceph: mon2 10.214.132.4:6790 session lost, hunting for new mon
<6>[17485.737794] libceph: mon2 10.214.132.4:6790 session established
<6>[17485.759013] rbd: rbd1: added with size 0x40000000
<4>[17485.858921] libceph: osd2 10.214.132.4:6808 socket closed (con state OPEN)
<6>[17485.966118] libceph: client4411 fsid 1af5918f-c950-454f-9769-f3b857fac855
<6>[17485.974997] libceph: mon1 10.214.132.38:6789 session established
<6>[17486.017942] rbd1: unknown partition table
<6>[17486.022331] rbd: rbd1: added with size 0x40000000
<4>[17486.199275] libceph: osd3 10.214.132.38:6809 socket closed (con state OPEN)
<6>[17486.233630] libceph: client4445 fsid 1af5918f-c950-454f-9769-f3b857fac855
<6>[17486.242439] libceph: mon2 10.214.132.4:6790 session established
<6>[17486.277232] rbd1: unknown partition table
<6>[17486.281544] rbd: rbd1: added with size 0x40000000
<4>[17486.331466] libceph: osd2 10.214.132.4:6808 socket closed (con state OPEN)
<4>[17486.381566] libceph: osd2 10.214.132.4:6808 socket closed (con state OPEN)
...
[2]kdb> bt
Stack traceback for pid 25803
0xffff8802238dbf20 25803 13496 1 2 R 0xffff8802238dc3a8 *rbd
ffff88020d87bd68 0000000000000018 ffff8801b5ff7950 ffff88020d87bd88
ffffffffa05fe7bc ffff8801b5ff7950 ffff8801b5ff7ab0 ffff88020d87bdb8
ffffffffa0602d24 ffff88012c8f16c0 ffff8801b5ff7000 ffff88012c8f16c0
Call Trace:
[<ffffffffa05fe7bc>] ? __remove_osd+0x3c/0xa0 [libceph]
[<ffffffffa0602d24>] ? ceph_osdc_stop+0xa4/0x110 [libceph]
[<ffffffffa05f4790>] ? ceph_destroy_client+0x30/0xa0 [libceph]
[<ffffffffa022fb41>] ? rbd_client_release+0x71/0xb0 [rbd]
[<ffffffffa0230798>] ? rbd_put_client+0x28/0x30 [rbd]
[<ffffffffa02307ba>] ? rbd_dev_destroy+0x1a/0x40 [rbd]
[<ffffffffa023083b>] ? rbd_dev_image_release+0x5b/0x70 [rbd]
[<ffffffffa0231095>] ? rbd_remove+0x155/0x180 [rbd]
[<ffffffff81407187>] ? bus_attr_store+0x27/0x30
[<ffffffff811f2d66>] ? sysfs_write_file+0xe6/0x170
[<ffffffff8117feae>] ? vfs_write+0xce/0x200
[<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
[<ffffffff811803b5>] ? SyS_write+0x55/0xa0
[<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b
ubuntu@teuthology:/a/teuthology-2013-09-02_01:01:32-krbd-master-testing-basic-plana/17253$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 263cbbcaf605e359a46e30889595d82629f82080
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject socket failures: 500
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 1c5e58a85ef7f26b2c617ecb6c08de5632bb0fe3
  ceph-deploy:
    branch:
      dev: master
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: 1c5e58a85ef7f26b2c617ecb6c08de5632bb0fe3
  s3tests:
    branch: master
  workunit:
    sha1: 1c5e58a85ef7f26b2c617ecb6c08de5632bb0fe3
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- workunit:
    clients:
      all:
      - rbd/map-unmap.sh
teuthology_branch: master
ubuntu@teuthology:/a/teuthology-2013-09-02_01:01:32-krbd-master-testing-basic-plana/17253$
- Status changed from New to Duplicate
I think/hope this is a duplicate of the async notify racing with shutdown.
- Status changed from Duplicate to 12
- Project changed from rbd to Linux kernel client
- Category set to rbd
- Assignee deleted (Josh Durgin)
- Assignee set to Ilya Dryomov
I bet there is another trace of this somewhere, no rcu stall, just plain NULL deref in rb_erase(). Will try to investigate.
Is there anything which needs to be gathered from the cluster currently displaying this issue which could help out?
If it's crashed again, a full dmesg and a tail (say, last 5-10 minutes before the crash) of osd/messenger logs would help.
And if it hasn't, the same (or at least a full dmesg) from the previous crash won't hurt, if you still have it around.
- Status changed from 12 to Resolved
What Josh got a report of was not the referenced trace, but the
following (pulled off the vmcore):
[197102.902802] ------------[ cut here ]------------
[197102.903670] kernel BUG at /builddir/build/BUILD/ceph-3.10-dc9ac62/net/ceph//osd_client.c:1003!
[197102.904553] invalid opcode: 0000 [#1] SMP
[197102.905393] Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat xfs bridge stp llc xt_nat xt_REDIRECT rbd(OF) libceph(OF) ip6table_filter ip6_tables sg openvswitch vxlan ip_tunnel gre libcrc32c ipt_REJECT xt_comment xt_conntrack xt_multiport iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables iTCO_wdt iTCO_vendor_support ipmi_devintf coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nfsd aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core lpc_ich mfd_core shpchp wmi ipmi_si ipmi_msghandler mperf acpi_power_meter auth_rpcgss nfs_acl lockd sunrpc binfmt_misc dm_multipath ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect
[197102.911118] sysimgblt i2c_algo_bit drm_kms_helper ttm drm i2c_core enic megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
[197102.913242] CPU: 4 PID: 18929 Comm: rbd Tainted: GF O-------------- 3.10.0-123.8.1.el7.x86_64 #1
[197102.914487] Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.2.0.042820141643 04/28/2014
[197102.915959] task: ffff882f75fe38e0 ti: ffff882f60e24000 task.ti: ffff882f60e24000
[197102.917434] RIP: 0010:[<ffffffffa0448dc9>] [<ffffffffa0448dc9>] __remove_osd+0x89/0x90 [libceph]
[197102.918961] RSP: 0018:ffff882f60e25da0 EFLAGS: 00010206
[197102.920408] RAX: ffff885ea5043ca0 RBX: ffff885ea5043800 RCX: 0000000180190011
[197102.921488] RDX: 0000000000000000 RSI: ffff885ea5043800 RDI: ffff880036837768
[197102.922561] RBP: ffff882f60e25db0 R08: ffff882de51caf80 R09: 0000000180190011
[197102.923636] R10: ffffffff814b65af R11: ffffea00b7947200 R12: ffff880036837768
[197102.924709] R13: ffff8800368377c0 R14: 0000000000000000 R15: 0000000000000000
[197102.925782] FS: 00007fae849447c0(0000) GS:ffff882fbfc80000(0000) knlGS:0000000000000000
[197102.926863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[197102.927936] CR2: 00007f3af4745f20 CR3: 0000002f61bdf000 CR4: 00000000001407e0
[197102.929038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[197102.930075] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[197102.931133] Stack:
[197102.932198] ffff880036837768 ffff8800368377e8 ffff882f60e25dd8 ffffffffa044d2a4
[197102.933306] ffff880036837000 ffff885ecd83ef00 0000000000000001 ffff882f60e25df0
[197102.934390] ffffffffa043e82c ffff885ecd83ef08 ffff882f60e25e10 ffffffffa048d3e6
[197102.935474] Call Trace:
[197102.936547] [<ffffffffa044d2a4>] ceph_osdc_stop+0x94/0x100 [libceph]
[197102.937633] [<ffffffffa043e82c>] ceph_destroy_client+0x2c/0xa0 [libceph]
[197102.938709] [<ffffffffa048d3e6>] rbd_client_release+0x46/0x80 [rbd]
[197102.939809] [<ffffffffa048e705>] rbd_dev_destroy+0x65/0x70 [rbd]
[197102.940875] [<ffffffffa048e9a7>] rbd_dev_image_release+0x57/0x60 [rbd]
[197102.941946] [<ffffffffa048fe43>] do_rbd_remove.isra.33+0x163/0x1f0 [rbd]
[197102.943050] [<ffffffffa048ff14>] rbd_remove+0x24/0x30 [rbd]
[197102.944110] [<ffffffff813b41a7>] bus_attr_store+0x27/0x30
[197102.945166] [<ffffffff81225286>] sysfs_write_file+0xc6/0x140
[197102.946232] [<ffffffff811af6dd>] vfs_write+0xbd/0x1e0
[197102.947346] [<ffffffff811b0128>] SyS_write+0x58/0xb0
[197102.948420] [<ffffffff815f2a59>] system_call_fastpath+0x16/0x1b
[197102.949473] Code: 2e 97 ff ff 48 89 df e8 06 ff ff ff 5b 41 5c 5d c3 48 89 f2 48 c7 c7 f8 7f 46 a0 48 c7 c6 be c6 45 a0 31 c0 e8 b9 fe e8 e0 eb 92 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 f6 05 8d f2 01 00 04 48
[197102.951679] RIP [<ffffffffa0448dc9>] __remove_osd+0x89/0x90 [libceph]
[197102.952791] RSP <ffff882f60e25da0>
This is a
BUG_ON(!list_empty(&osd->o_requests));
in __remove_osd() in our original rhel7 kmod (dc9ac62e1e1a, rhel7
branch @ github).
vmcore showed that o_requests had a single entry on it, which turned out
to be a lingering request that had been requeued due to a connection
reset and half resent. The request structures were completely messed up
because rbd unmap unregistered the requeued request with
__unregister_linger_request().
This (request cancellation) was fixed upstream a while ago and the
fixes are also in the updated kmod.
During some tests, I stumbled upon this bug in rb_erase, triggered via osd_reset() -> __reset_osd() -> __remove_osd(). However, I was not using rbd but cephfs, with kernel v3.14.28 plus the patches mentioned in #10449 and #10450. The bug was triggered by restarting all OSDs of our cluster simultaneously.
This ticket has it mixed up with another issue; we are tracking the rb_erase() crash in #8087.
I'll post your comment and reply there.