Bug #5429

closed

libceph: rcu stall, null deref in osd_reset->__reset_osd->__remove_osd

Added by Sage Weil almost 11 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: rbd
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

<1>[19828.585548] BUG: unable to handle kernel NULL pointer dereference at           (null)
<1>[19828.593437] IP: [<ffffffff813185cb>] rb_erase+0x1bb/0x370
<4>[19828.598865] PGD 0 
<4>[19828.600899] Oops: 0002 [#1] SMP 
[dumpcommon]kdb>   -bt

Stack traceback for pid 29967
0xffff88020dd03f20    29967        2  1    4   R  0xffff88020dd043a8 *kworker/4:1
 ffff88020b257b48 0000000000000018 0000000000000000 ffff88020b257b68
 ffffffffa05487bc ffff8802204e4000 ffff880224ec7950 ffff88020b257b98
 ffffffffa0548abf ffff8802204e4030 ffff880224ec7950 0000000000000000
Call Trace:
 [<ffffffffa05487bc>] ? __remove_osd+0x3c/0xa0 [libceph]
 [<ffffffffa0548abf>] ? __reset_osd+0x12f/0x170 [libceph]
 [<ffffffffa054a6de>] ? osd_reset+0x7e/0x2b0 [libceph]
 [<ffffffffa0541e21>] ? con_work+0x571/0x2d50 [libceph]
 [<ffffffff81080bb3>] ? idle_balance+0x133/0x180
 [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
 [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
 [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
 [<ffffffff8105f3da>] ? process_one_work+0x1da/0x540
 [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
 [<ffffffff810605bc>] ? worker_thread+0x11c/0x370
 [<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
 [<ffffffff8106727a>] ? kthread+0xea/0xf0
 [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
 [<ffffffff8163ff9c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
[dumpall]kdb>   -bta


but preceded 7 seconds earlier by
<4>[19778.015116] libceph: osd0 10.214.132.16:6801 socket closed (con state CONNECTING)
<3>[19799.355399] INFO: rcu_sched self-detected stall on CPU { 6}  (t=2100 jiffies g=245350 c=245349 q=2640)
<4>[19799.364789] CPU: 6 PID: 19284 Comm: kworker/6:2 Tainted: G        W    3.10.0-rc6-ceph-00091-g2dd322b #1
<4>[19799.374303] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<3>[19799.375424] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=2102 jiffies, g=245350, c=245349, q=2640)
<6>[19799.375425] Task dump for CPU 6:
<6>[19799.375429] kworker/6:2     R  running task        0 19284      2 0x00000000
<6>[19799.375442] Workqueue: ceph-msgr con_work [libceph]
<4>[19799.375445]  ffff880125ff7de8 ffffffff8105f3da ffffffff8105f36f ffff8802272d3a00
<4>[19799.375447]  0000000000000000 00000006272d2f98 ffff880125ff7fd8 ffff8802272d2f80
<4>[19799.375450]  ffffffffa05666d0 0000000000000000 0000000000000000 ffffffffa055940e
<4>[19799.375451] Call Trace:
<4>[19799.375457]  [<ffffffff8105f3da>] ? process_one_work+0x1da/0x540
<4>[19799.375459]  [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
<4>[19799.375462]  [<ffffffff810605bc>] worker_thread+0x11c/0x370
<4>[19799.375464]  [<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
<4>[19799.375468]  [<ffffffff8106727a>] kthread+0xea/0xf0
<4>[19799.375471]  [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.375476]  [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
<4>[19799.375478]  [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.480150] Workqueue: ceph-msgr con_work [libceph]
<4>[19799.485054]  ffffffff81c4ca00 ffff8802272c3db8 ffffffff81630b82 ffff8802272c3e38
<4>[19799.492523]  ffffffff810e285a 0000000000000006 ffff8802272cd4e0 ffff8802272c3de8
<4>[19799.499995]  ffffffff810e644c 0000000000000086 0000000000000001 0000000000000006
<4>[19799.507469] Call Trace:
<4>[19799.509929]  <IRQ>  [<ffffffff81630b82>] dump_stack+0x19/0x1b
<4>[19799.515718]  [<ffffffff810e285a>] rcu_check_callbacks+0x21a/0x710
<4>[19799.521831]  [<ffffffff810e644c>] ? acct_account_cputime+0x1c/0x20
<4>[19799.528034]  [<ffffffff81050f68>] update_process_times+0x48/0x80
<4>[19799.534062]  [<ffffffff8109b616>] tick_sched_handle.isra.10+0x36/0x50
<4>[19799.540524]  [<ffffffff8109b71c>] tick_sched_timer+0x4c/0x80
<4>[19799.546203]  [<ffffffff8106a841>] __run_hrtimer+0x81/0x1e0
<4>[19799.551709]  [<ffffffff8109b6d0>] ? tick_nohz_handler+0xa0/0xa0
<4>[19799.557647]  [<ffffffff8106b147>] hrtimer_interrupt+0x107/0x260
<4>[19799.563588]  [<ffffffff81641b69>] smp_apic_timer_interrupt+0x69/0x99
<4>[19799.569964]  [<ffffffff81640caf>] apic_timer_interrupt+0x6f/0x80
<4>[19799.575987]  <EOI>  [<ffffffff8112edec>] ? shrink_inactive_list+0x18c/0x400
<4>[19799.582995]  [<ffffffff81637590>] ? _raw_spin_unlock_irq+0x30/0x40
<4>[19799.589195]  [<ffffffff81637595>] ? _raw_spin_unlock_irq+0x35/0x40
<4>[19799.595397]  [<ffffffff8112edec>] shrink_inactive_list+0x18c/0x400
<4>[19799.601595]  [<ffffffff8112f66d>] shrink_lruvec+0x2cd/0x4d0
<4>[19799.607187]  [<ffffffff8119855b>] ? bdi_queue_work+0x8b/0xf0
<4>[19799.612869]  [<ffffffff8112fc1c>] do_try_to_free_pages+0x11c/0x3a0
<4>[19799.619068]  [<ffffffff81130066>] try_to_free_pages+0xd6/0x1b0
<4>[19799.624922]  [<ffffffff811375b0>] ? next_zone+0x30/0x40
<4>[19799.630165]  [<ffffffff81125406>] __alloc_pages_nodemask+0x596/0x8f0
<4>[19799.636541]  [<ffffffff8115bb1a>] alloc_pages_current+0xba/0x170
<4>[19799.642569]  [<ffffffff81516d3e>] sk_page_frag_refill+0x7e/0x130
<4>[19799.648593]  [<ffffffff8156f5a5>] tcp_sendmsg+0x305/0xe50
<4>[19799.654010]  [<ffffffff8159af99>] inet_sendmsg+0xb9/0xf0
<4>[19799.659339]  [<ffffffff8159aee5>] ? inet_sendmsg+0x5/0xf0
<4>[19799.664760]  [<ffffffff81510de2>] sock_sendmsg+0xc2/0xe0
<4>[19799.670090]  [<ffffffff812ee35b>] ? chksum_update+0x1b/0x30
<4>[19799.675686]  [<ffffffff812ea1e8>] ? crypto_shash_update+0x18/0x30
<4>[19799.681814]  [<ffffffffa0000056>] ? crc32c+0x56/0x7c [libcrc32c]
<4>[19799.687842]  [<ffffffff81510e40>] kernel_sendmsg+0x40/0x60
<4>[19799.693353]  [<ffffffffa05424d8>] con_work+0xc28/0x2d50 [libceph]
<4>[19799.699468]  [<ffffffff81080bb3>] ? idle_balance+0x133/0x180
<4>[19799.705145]  [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
<4>[19799.711257]  [<ffffffff81071b78>] ? finish_task_switch+0x48/0x110
<4>[19799.717371]  [<ffffffff8105f3da>] process_one_work+0x1da/0x540
<4>[19799.723220]  [<ffffffff8105f36f>] ? process_one_work+0x16f/0x540
<4>[19799.729247]  [<ffffffff810605bc>] worker_thread+0x11c/0x370
<4>[19799.734840]  [<ffffffff810604a0>] ? manage_workers.isra.20+0x2e0/0x2e0
<4>[19799.741388]  [<ffffffff8106727a>] kthread+0xea/0xf0
<4>[19799.746283]  [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<4>[19799.752658]  [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
<4>[19799.758076]  [<ffffffff81067190>] ? flush_kthread_worker+0x150/0x150
<3>[19820.788280] INFO: rcu_sched self-detected stall on CPU
<3>[19820.788286] INFO: rcu_sched self-detected stall on CP
<4>[19820.788287]  

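The timing between the stall warnings and the oops can be checked from the log timestamps. The following is a minimal sketch, not part of the original report: the helper names `parse_line` and `seconds_before` are hypothetical, and the three sample lines are copied from the logs above.

```python
import re

# Matches the "<level>[seconds.micros]" prefix on each kernel log line.
TS_RE = re.compile(r"^<\d>\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_line(line):
    """Return (timestamp_seconds, message), or None if there is no timestamp prefix."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return float(m.group(1)), m.group(2)


def seconds_before(lines, marker, reference):
    """Gap in seconds from each line containing `marker` to the first
    line containing `reference` (positive means the marker came first)."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    ref_ts = next(ts for ts, msg in parsed if reference in msg)
    return [round(ref_ts - ts, 6) for ts, msg in parsed if marker in msg]


# Sample lines copied from this report (hypothetical selection for illustration).
LOG = [
    "<1>[19828.585548] BUG: unable to handle kernel NULL pointer dereference at           (null)",
    "<3>[19799.355399] INFO: rcu_sched self-detected stall on CPU { 6}  (t=2100 jiffies g=245350 c=245349 q=2640)",
    "<3>[19820.788280] INFO: rcu_sched self-detected stall on CPU",
]

gaps = seconds_before(LOG, "self-detected stall", "NULL pointer dereference")
for gap in gaps:
    print(f"stall reported {gap:.3f} s before the oops")
```

On these lines, the first stall warning is about 29 seconds before the oops and the second is a little under 8 seconds before it, which matches the "7 seconds earlier" note if that note refers to the last stall message.
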
job was
ubuntu@teuthology:/a/teuthology-2013-06-22_01:00:51-kernel-next-testing-basic/42857$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        filestore flush min: 0
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
  install:
    ceph:
      sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
  s3tests:
    branch: next
  workunit:
    sha1: 94eada40460cc6010be23110ef8ce0e3d92691af
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh


Files

dump2.txt (141 KB), Sage Weil, 06/23/2013 10:23 AM
dump3.txt (144 KB), Sage Weil, 06/28/2013 10:51 AM