Bug #5636

closed

krbd: crash in image refresh

Added by Sage Weil almost 11 years ago. Updated over 10 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The dumpall output is attached (dump.txt).

[1]kdb> bt
Stack traceback for pid 19757
0xffff880223cadeb0    19757        2  1    1   R  0xffff880223cae338 *kworker/u64:3
 ffff88020b4e9cc8 0000000000000018 ffffffffa0248dbd ffff88014ea5f800
 ffff88020cb4bf60 0000000800000006 0000000800000006 ffff88020b4e9d18
 ffffffffa0248ec1 ffff88020c8e9360 ffff88020cb4bf60 0000000000000018
Call Trace:
 [<ffffffffa0248dbd>] ? rbd_dev_refresh+0x6d/0x130 [rbd]
 [<ffffffffa0248ec1>] ? rbd_watch_cb+0x41/0x140 [rbd]
 [<ffffffffa06a8d42>] ? do_event_work+0x52/0xc0 [libceph]
 [<ffffffff8105f3ea>] ? process_one_work+0x1da/0x540
 [<ffffffff8105f37f>] ? process_one_work+0x16f/0x540
 [<ffffffff810605cc>] ? worker_thread+0x11c/0x370
 [<ffffffff810604b0>] ? manage_workers.isra.20+0x2e0/0x2e0
 [<ffffffff8106728a>] ? kthread+0xea/0xf0
 [<ffffffff810671a0>] ? flush_kthread_worker+0x150/0x150
 [<ffffffff8164071c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff810671a0>] ? flush_kthread_worker+0x150/0x150

The job was:
ubuntu@teuthology:/a/teuthology-2013-07-15_01:01:11-kernel-next-testing-basic/67676$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 9baa66801ab02854c344eb2fd1a8da8c5806125b
  install:
    ceph:
      sha1: 9baa66801ab02854c344eb2fd1a8da8c5806125b
  s3tests:
    branch: next
  workunit:
    sha1: 9baa66801ab02854c344eb2fd1a8da8c5806125b
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- workunit:
    clients:
      all:
      - rbd/image_read.sh


Files

dump.txt (96.4 KB) Sage Weil, 07/15/2013 02:58 PM

Related issues 1 (0 open, 1 closed)

Has duplicate rbd - Bug #5391: krbd: crash in rbd_obj_request_create -> strlen (Duplicate, 06/18/2013)

Actions #1

Updated by Sage Weil almost 11 years ago

Actions #2

Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High

Actions #3

Updated by Sage Weil over 10 years ago

Hit again on ubuntu@teuthology:/a/teuthology-2013-08-22_01:01:30-krbd-next-testing-basic-plana/1020:

<1>[  257.820476] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
<1>[  257.828499] IP: [<ffffffffa00ffdde>] rbd_dev_refresh+0x8e/0x130 [rbd]
<4>[  257.835061] PGD 0 
<4>[  257.837183] Oops: 0002 [#1] SMP 

which is here:

        /* If it's a mapped snapshot, validate its EXISTS flag */

        rbd_exists_validate(rbd_dev);
        up_write(&rbd_dev->header_rwsem);
    5db5:       4c 89 ef                mov    %r13,%rdi
    5db8:       e8 00 00 00 00          callq  5dbd <rbd_dev_refresh+0x6d>
                        5db9: R_X86_64_PC32     up_write-0x4

        if (mapping_size != rbd_dev->mapping.size) {
    5dbd:       4c 8b ab 98 01 00 00    mov    0x198(%rbx),%r13
^^^^^
    5dc4:       4d 39 f5                cmp    %r14,%r13
    5dc7:       74 22                   je     5deb <rbd_dev_refresh+0x9b>
                sector_t size;

                size = (sector_t)rbd_dev->mapping.size / SECTOR_SIZE;
    5dc9:       49 c1 ed 09             shr    $0x9,%r13
                dout("setting size to %llu sectors", (unsigned long long)size);
    5dcd:       f6 05 00 00 00 00 04    testb  $0x4,0x0(%rip)        # 5dd4 <rbd_dev_refresh+0x84>
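
As a cross-check on the offsets in this listing (a sketch added for illustration, not part of the original analysis): the objdump annotation `callq 5dbd <rbd_dev_refresh+0x6d>` implies that rbd_dev_refresh starts at 0x5dbd - 0x6d = 0x5d50 in this object file, so a `symbol+offset` value from a trace can be mapped to a listing address by addition. The helper names `func_base` and `listing_addr_for` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Recover a function's base address in the objdump listing from one
 * annotated address, e.g. "5dbd <rbd_dev_refresh+0x6d>".
 */
uint64_t func_base(uint64_t listing_addr, uint64_t offset)
{
    assert(listing_addr >= offset);
    return listing_addr - offset;
}

/* Map a "symbol+offset" value from a stack trace back into the listing. */
uint64_t listing_addr_for(uint64_t base, uint64_t offset)
{
    return base + offset;
}
```

With the base taken as `func_base(0x5dbd, 0x6d)`, the `+0x84` and `+0x9b` annotations in the listing give the same base, and `listing_addr_for(base, 0x8e)` places the oops IP at 0x5dde, slightly past the last instruction shown here. The `+0x6d` frame from the original backtrace maps to the marked instruction at 5dbd, assuming both traces come from the same build.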

This runs as a work item, and we have just released the lock (header_rwsem). Maybe the reference to the rbd device went away in the meantime?
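
To illustrate the suspicion, here is a minimal single-threaded userspace sketch of the general pattern of pinning an object with a reference while a work item uses it. It is not the actual fix and does not use the real rbd structures. Every name in it (`demo_dev`, `demo_dev_get`, `demo_dev_put`, `demo_refresh`) is hypothetical, and the racing teardown is simulated by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device object; NOT the real struct rbd_device. */
struct demo_dev {
    int refcount;                     /* plain int: this sketch is single-threaded */
    unsigned long long mapping_size;
    bool *freed_flag;                 /* lets the caller observe when dev is freed */
};

struct demo_dev *demo_dev_create(unsigned long long size, bool *freed_flag)
{
    struct demo_dev *dev = malloc(sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refcount = 1;                /* the creator's reference */
    dev->mapping_size = size;
    dev->freed_flag = freed_flag;
    *freed_flag = false;
    return dev;
}

void demo_dev_get(struct demo_dev *dev)
{
    assert(dev->refcount > 0);
    dev->refcount++;
}

/* Drop one reference; free the object on the last put. Returns true if freed. */
bool demo_dev_put(struct demo_dev *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount > 0)
        return false;
    *dev->freed_flag = true;
    free(dev);
    return true;
}

/* Models the work item; teardown_races simulates an unmap in the middle. */
unsigned long long demo_refresh(struct demo_dev *dev, bool teardown_races)
{
    unsigned long long size;

    demo_dev_get(dev);                /* pin dev for the duration of the work */
    if (teardown_races)
        demo_dev_put(dev);            /* simulated teardown drops the creator's ref */
    size = dev->mapping_size;         /* still valid: our reference keeps dev alive */
    demo_dev_put(dev);                /* may free dev; do not touch it afterwards */
    return size;
}
```

Without the `demo_dev_get()` call, the simulated teardown would free the object and the read of `mapping_size` would be a use-after-free. In real concurrent code the reference would have to be taken before the work is queued rather than inside the work item, and the counter would need to be atomic.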

Actions #4

Updated by Josh Durgin over 10 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Josh Durgin

The branch wip-rbd-bugs-shutdown-lock contains a few fixes.

Actions #5

Updated by Sage Weil over 10 years ago

  • Status changed from Fix Under Review to Resolved