Bug #1990

closed

rbd: null pointer dereference during map

Added by Josh Durgin about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: rbd
Target version: -
% Done: 0%


Description

These commands from teuthology:

2012-01-25T02:23:30.606 INFO:teuthology.task.rbd:Creating rbd block devices...
2012-01-25T02:23:30.606 DEBUG:teuthology.orchestra.run:Running: 'echo \'KERNEL=="rbd[0-9]*", PROGRAM="/tmp/cephtest/binary/usr/local/bin/ceph-rbdnamer %n", SYMLINK+="rbd/%c{1}/%c{2}"\' > /tmp/cephtest/51-rbd.rules'
2012-01-25T02:23:30.613 DEBUG:teuthology.orchestra.run:Running: 'sudo mv /tmp/cephtest/51-rbd.rules /etc/udev/rules.d/'
2012-01-25T02:23:30.688 DEBUG:teuthology.orchestra.run:Running: '/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph-authtool --name=client.0 --print-key /tmp/cephtest/data/client.0.keyring > /tmp/cephtest/data/client.0.secret'
2012-01-25T02:23:30.762 DEBUG:teuthology.orchestra.run:Running: "sudo LD_LIBRARY_PATH=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/rbd -c /tmp/cephtest/ceph.conf --user 0 --secret /tmp/cephtest/data/client.0.secret -p rbd map testimage.client.0 && while test '!' -e /dev/rbd/rbd/testimage.client.0 ; do sleep 1 ; done" 

Cause this null dereference:

Jan 25 02:01:51 sepia71 kernel: [ 8054.327255] rbd: loaded rbd (rados block device)
Jan 25 02:01:51 sepia71 kernel: [ 8054.552795] libceph: client0 fsid e5db76be-7bee-40e4-bbeb-803e540d2c90
Jan 25 02:01:51 sepia71 kernel: [ 8054.554223] libceph: mon1 10.3.14.197:6789 session established
Jan 25 02:01:51 sepia71 kernel: [ 8054.554272] BUG: unable to handle kernel NULL pointer dereference at           (null)
Jan 25 02:01:51 sepia71 kernel: [ 8054.564212] IP: [<ffffffffa00fdbf3>] rbd_add+0x3a3/0xd04 [rbd]
Jan 25 02:01:51 sepia71 kernel: [ 8054.564212] PGD 10313e067 PUD 111ed9067 PMD 0 
Jan 25 02:01:51 sepia71 kernel: [ 8054.564212] Oops: 0000 [#1] SMP 
Jan 25 02:01:51 sepia71 kernel: [ 8054.564212] CPU 1 
Jan 25 02:01:51 sepia71 kernel: [ 8054.564212] Modules linked in: rbd cryptd aes_x86_64 aes_generic ceph libceph ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs radeon ttm drm_kms_helper drm psmouse i2c_algo_bit amd64_edac_mod lp shpchp edac_core edac_mce_amd i2c_piix4 k8temp parport serio_raw btrfs tg3 zlib_deflate sata_svw floppy pata_serverworks crc32c libcrc32c
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] 
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] Pid: 4496, comm: rbd Not tainted 3.2.0-ceph-00037-g7cb9059 #1 Supermicro H8SSL-I2/H8SSL-I2
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] RIP: 0010:[<ffffffffa00fdbf3>]  [<ffffffffa00fdbf3>] rbd_add+0x3a3/0xd04 [rbd]
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] RSP: 0018:ffff8800ed801dc8  EFLAGS: 00010292
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] RAX: ffff8800ed8b3ee0 RBX: ffff8800dbd31000 RCX: 0000000000000100
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] RDX: 0000000000000000 RSI: ffffffff820cc920 RDI: ffffffffa00ff740
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] RBP: ffff8800ed801e98 R08: ffffffff81f67d90 R09: 0000000000000000
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8800dbd31298
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] R13: ffff88002dcab000 R14: ffff88002dcaa800 R15: ffff8800dbd312a8
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] FS:  00007fe29708b760(0000) GS:ffff880111d00000(0000) knlGS:0000000000000000
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] CR2: 0000000000000000 CR3: 00000000dbc09000 CR4: 00000000000006e0
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] Process rbd (pid: 4496, threadinfo ffff8800ed800000, task ffff8800ed8b3ee0)
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] Stack:
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  ffff8800dbd31270 ffff8801032b9d10 0000000000000000 ffffffff811eb5fd
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  0000000000000002 0000000000000001 ffff8801055cc818 ffff8800dbd311b4
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  ffff8800dbd31020 ffff8800dbd31040 ffff8800dbd31270 ffff8800dbd310a8
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] Call Trace:
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  [<ffffffff811eb5fd>] ? sysfs_write_file+0xcd/0x170
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  [<ffffffff813e0487>] bus_attr_store+0x27/0x30
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  [<ffffffff811eb616>] sysfs_write_file+0xe6/0x170
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  [<ffffffff811763f8>] vfs_write+0xc8/0x190
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  [<ffffffff811765b1>] sys_write+0x51/0x90
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  [<ffffffff8160e1c2>] system_call_fastpath+0x16/0x1b
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] Code: 09 00 00 48 89 c7 e8 bd e8 37 00 85 c0 0f 88 6b 01 00 00 48 8b 45 b0 48 8b 4d b8 48 c7 c7 40 f7 0f a0 48 89 41 08 e8 4d 7b 50 e1 <f6> 04 25 00 00 00 00 02 0f 85 f8 03 00 00 48 8b 05 78 1b 00 00 
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] RIP  [<ffffffffa00fdbf3>] rbd_add+0x3a3/0xd04 [rbd]
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069]  RSP <ffff8800ed801dc8>
Jan 25 02:01:51 sepia71 kernel: [ 8054.640069] CR2: 0000000000000000
Jan 25 02:01:51 sepia71 kernel: [ 8054.660215] ---[ end trace f6b64936b6fb3499 ]---
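The IP/RIP lines in the trace above place the fault at offset 0x3a3 inside rbd_add, a function 0xd04 bytes long in the rbd module. As a rough illustration only, the following Python sketch pulls those fields out of an oops line so they can be passed to a symbolizer (for example gdb's `list *(rbd_add+0x3a3)` against a module built with debug info). The function name `parse_fault_location` and the regular expression are hypothetical helpers written for this note, not part of teuthology or the kernel tooling.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE [module]" part of an oops line.
# The trailing "[module]" is optional: built-in kernel symbols have none.
_SYMBOL_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_fault_location(line):
    """Return symbol, offset, size and module from an oops line, or None."""
    m = _SYMBOL_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes into the function
        "size": int(m.group("size"), 16),      # total function length
        "module": m.group("module"),           # None for built-in symbols
    }


if __name__ == "__main__":
    ip_line = (
        "Jan 25 02:01:51 sepia71 kernel: [ 8054.564212] "
        "IP: [<ffffffffa00fdbf3>] rbd_add+0x3a3/0xd04 [rbd]"
    )
    print(parse_fault_location(ip_line))
```

The same helper works on the Call Trace entries, where the module field comes back as None because those frames are in the core kernel.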

This was on commit 7cb90590532ebd04a0ec43ecc3a4e4832e34cb9c, which apparently doesn't exist anymore after a force push.
This worked fine the previous night, so it's probably related to the recent rbd changes.
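The last teuthology command above polls for /dev/rbd/rbd/testimage.client.0 with no upper bound on the wait. A minimal sketch of a bounded version of that wait, in Python, might look like the following. The helper name `wait_for_path` and the default timeout and interval values are assumptions made for illustration; this is not how teuthology implements the step.

```python
import os
import time


def wait_for_path(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists.

    Returns True if the path appeared, False if `timeout` seconds passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Device symlink created by the udev rule in the log above.
    device = "/dev/rbd/rbd/testimage.client.0"
    if not wait_for_path(device, timeout=60.0):
        print("timed out waiting for", device)
```

With a bound like this, a map that never produces the device node surfaces as an explicit failure rather than an indefinite wait.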

Actions #1

Updated by Alex Elder about 12 years ago

  • Status changed from New to In Progress
  • Assignee set to Alex Elder

I was already looking at another commit which I found after
review to be suspect:
rbd: adequately protect rbd client creation from dups
The underlying bug still exists, but it's not as bad as the
bug(s) I inserted by "fixing" this.

Anyway, I had reproduced the problem by running these tasks
in a teuthology file:
tasks:
- ceph:
- rbd:
    all:

I removed that commit and rebased everything after it and the
problem no longer appeared.

I'm not going to mark this resolved until we see how well
the more complete set of tests goes.

Actions #2

Updated by Alex Elder about 12 years ago

  • Status changed from In Progress to Resolved

Problem seems to be gone now.
