Project

General

Profile

Bug #4559

krbd: kernel BUG when mapping unexisting rbd device

Added by Alex Elder over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Taken from the ceph-users mailing list.

I told Dan I'd take a look at this.

-------- Original Message --------
Subject: [ceph-users] kernel BUG when mapping unexisting rbd device
Date: Mon, 25 Mar 2013 12:03:19 +0100
From: Dan van der Ster <>
To:

Hi,
Apologies if this is already a known bug (though I didn't find it).

If we try to map a device that doesn't exist, we get an immediate and
reproduceable kernel BUG (see the P.S.). We hit this by accident
because we forgot to add the --pool <ourpool>.

This works:

[root@afs245 /]# rbd map afs254-vicepa --pool afs --id afs --keyring
/etc/ceph/ceph.client.afs.keyring
[root@afs245 /]# rbd showmapped
id pool image snap device
1 afs afs254-vicepa - /dev/rbd1

But this BUGS:

[root@afs245 /]# rbd map afs254-vicepa
BUG...

Any clue?

Cheers,
Dan, CERN IT

Mar 25 11:48:25 afs245 kernel: kernel BUG at mm/slab.c:3130!
Mar 25 11:48:25 afs245 kernel: invalid opcode: 0000 [#1] SMP
Mar 25 11:48:25 afs245 kernel: Modules linked in: rbd libceph
libcrc32c cpufreq_ondemand ipv6 ext2 iTCO_wdt iTCO_vendor_support
coretemp acpi_cpufreq freq_tabl
e mperf kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode
pcspkr serio_raw i2c_i801 lpc_ich joydev e1000e ses enclosure sg ixgbe
hwmon dca ptp pps_core
mdio ext3 jbd mbcache sd_mod crc_t10dif aesni_intel ablk_helper cryptd
lrw aes_x86_64 xts gf128mul ahci libahci 3w_9xxx mpt2sas
scsi_transport_sas raid_class v
ideo mgag200 ttm drm_kms_helper dm_mirror dm_region_hash dm_log dm_mod
Mar 25 11:48:25 afs245 kernel: CPU 3
Mar 25 11:48:25 afs245 kernel: Pid: 7444, comm: rbd Not tainted
3.8.4-1.el6.elrepo.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
Mar 25 11:48:25 afs245 kernel: RIP: 0010:[<ffffffff81179790>]
[<ffffffff81179790>] cache_alloc_refill+0x270/0x3c0
Mar 25 11:48:25 afs245 kernel: RSP: 0018:ffff8808028e5c48 EFLAGS: 00010082
Mar 25 11:48:25 afs245 kernel: RAX: 00000000ffffffff RBX:
ffff88082f000e00 RCX: ffff88082f000e00
Mar 25 11:48:25 afs245 kernel: RDX: ffff8808055fba80 RSI:
ffff88082f0028d0 RDI: ffff88082f002900
Mar 25 11:48:25 afs245 kernel: RBP: ffff8808028e5ca8 R08:
ffff88082f0028e0 R09: ffff8808010068c0
Mar 25 11:48:25 afs245 kernel: R10: dead000000200200 R11:
0000000000000003 R12: 0000000000000000
Mar 25 11:48:25 afs245 kernel: R13: ffff880807a71ec0 R14:
ffff88082f0028c0 R15: 0000000000000004
Mar 25 11:48:25 afs245 kernel: FS: 00007ff85056e760(0000)
GS:ffff88082fd80000(0000) knlGS:0000000000000000
Mar 25 11:48:25 afs245 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 25 11:48:25 afs245 kernel: CR2: 0000000000428220 CR3:
00000007eee7e000 CR4: 00000000001407e0
Mar 25 11:48:25 afs245 kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Mar 25 11:48:25 afs245 kernel: DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Mar 25 11:48:25 afs245 kernel: Process rbd (pid: 7444, threadinfo
ffff8808028e4000, task ffff8807ef6fb520)
Mar 25 11:48:25 afs245 kernel: Stack:
Mar 25 11:48:25 afs245 kernel: ffff8808028e5d68 ffffffff8112fd5d
ffff8808028e5de8 ffff880800ac7000
Mar 25 11:48:25 afs245 kernel: ffffffff028e5c78 00000000000080d0
ffff8808028e5fd8 ffff88082f000e00
Mar 25 11:48:25 afs245 kernel: 0000000000001078 0000000000000010
00000000000080d0 00000000000080d0
Mar 25 11:48:25 afs245 kernel: Call Trace:
Mar 25 11:48:25 afs245 kernel: [<ffffffff8112fd5d>] ?
get_page_from_freelist+0x22d/0x710
Mar 25 11:48:25 afs245 kernel: [<ffffffff8117ab08>] kmalloc+0x168/0x340
Mar 25 11:48:25 afs245 kernel: [<ffffffffa0380a35>] ?
ceph_parse_options+0x65/0x410 [libceph]
Mar 25 11:48:25 afs245 kernel: [<ffffffffa03ad2a0>] ? kzalloc+0x20/0x20 [rbd]
Mar 25 11:48:25 afs245 kernel: [<ffffffffa0380a35>]
ceph_parse_options+0x65/0x410 [libceph]
Mar 25 11:48:25 afs245 kernel: [<ffffffff8117a2a4>] ?
kmem_cache_alloc_trace+0x214/0x2e0
Mar 25 11:48:25 afs245 kernel: [<ffffffff8117ac17>] ? _kmalloc+0x277/0x340
Mar 25 11:48:25 afs245 kernel: [<ffffffffa03ad28f>] ? kzalloc+0xf/0x20 [rbd]
Mar 25 11:48:25 afs245 kernel: [<ffffffffa03ae96a>]
rbd_add_parse_args+0x1fa/0x250 [rbd]
Mar 25 11:48:25 afs245 kernel: [<ffffffffa03b18d4>] rbd_add+0x84/0x2b4 [rbd]
Mar 25 11:48:25 afs245 kernel: [<ffffffff813bc9a7>] bus_attr_store+0x27/0x30
Mar 25 11:48:25 afs245 kernel: [<ffffffff81205d6f>] sysfs_write_file+0xef/0x170
Mar 25 11:48:25 afs245 kernel: [<ffffffff81191904>] vfs_write+0xb4/0x130
Mar 25 11:48:25 afs245 kernel: [<ffffffff811920af>] sys_write+0x5f/0xa0
Mar 25 11:48:25 afs245 kernel: [<ffffffff810df656>] ?
_audit_syscall_exit+0x246/0x2f0
Mar 25 11:48:25 afs245 kernel: [<ffffffff815e1499>]
system_call_fastpath+0x16/0x1b
Mar 25 11:48:25 afs245 kernel: Code: 48 8b 00 48 8b 55 b0 8b 4d b8 48
8b 75 a8 4c 8b 45 a0 4c 8b 4d c0 a8 40 0f 84 b8 fe ff ff 49 83 cf 01
e9 af fe ff ff 0f 0b eb fe <0f> 0b eb fe 8b 75 c8 8b 55 cc 31 c9 48 89
df 81 ce 00 12 04 00
Mar 25 11:48:25 afs245 kernel: RIP [<ffffffff81179790>]
cache_alloc_refill+0x270/0x3c0
Mar 25 11:48:25 afs245 kernel: RSP <ffff8808028e5c48>
Mar 25 11:48:25 afs245 kernel: ---[ end trace 46b67e5b8b69abcb ]---
___________________________________________
ceph-users mailing list

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

dmesg (222 KB) Alex Elder, 03/27/2013 07:53 PM

History

#1 Updated by Alex Elder over 9 years ago

  • Status changed from New to In Progress

I've requested more information. At this point though
my first step is reproducing the problem.

I did reproduce something a little different when I
did a quick attempt, but I need to step back and make
sure I know what I'm working with before making any
real claims here.

#2 Updated by Alex Elder over 9 years ago

  • Priority changed from Normal to High

I think I have reproduced the problem. I don't know
much more but it appears to have something to do with
the use of a non-standard pool name (i.e., not "rbd").

#3 Updated by Dan van der Ster over 9 years ago

Here is another trace of the BUG another slightly different way. It could be the lack of a keyring that causes the BUG.

[root@afs245 ~]# rbd ls -p afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring
afs254-vicepa
afs254-vicepb
vicepa
[root@afs245 ~]# rbd map afs254-vicepa --pool afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring
[root@afs245 ~]# rbd map afs254-vicepb --pool afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring
[root@afs245 ~]# rbd map afs254-vicepc --pool afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring
rbd: add failed: (2) No such file or directory
[root@afs245 ~]# rbd showmapped
id pool image snap device
1 afs afs254-vicepa - /dev/rbd1
2 afs afs254-vicepb - /dev/rbd2

[root@afs245 ~]# # all aboved worked OK as expected

[root@afs245 ~]# rbd unmap /dev/rbd1
[root@afs245 ~]# rbd unmap /dev/rbd2
[root@afs245 ~]# rmmod rbd
[root@afs245 ~]# rmmod libceph
[root@afs245 ~]# modprobe rbd
[root@afs245 ~]# rbd ls -p afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring
afs254-vicepa
afs254-vicepb
vicepa
[root@afs245 ~]# rbd map afs254-vicepa --pool afs
rbd: add failed: (22) Invalid argument
[root@afs245 ~]# # BUG here

#4 Updated by Alex Elder over 9 years ago

I think I know why the crash is occurring. I don't
fully understand the big picture yet, but here's
the problem I know. (I remember seeing this and
fixing it a while ago, but I think it was in the
midst of some changes that have not yet been committed.
That's bad news, I know.)

OK, the object completion code assumes that an ENOENT
response means there's a hole in an object that needs
to be zero-filled.

However, that's only the case for object requests that
are part of an image request. Objects that are not part
of an image that don't exist, well, they just don't exist,
period.

Furthermore, that same code assumes that a non-existent
object has its data represented as a bio, which is not
necessarily the case.

The fix should be to only do the zeroing if the object
request is part of an image request. And zeroing the
object data should be done in a way dependent on how
the data is represented for the object. I'll fix both
problems and see if it makes things all better.

What I still need to work out is why the image header
is showing up non-existent. (Reading the image header
object is where this problem arose.) That may be a
problem in the rbd CLI when creating an rbd image in a
non-standard pool.

#5 Updated by Dan van der Ster over 9 years ago

Thanks. Without looking at the code, what I don't understand is why the user-mode rbd tool sends anything to the kernel module before confirming that (a) the given rbd image exists (in the given pool) and (b) the authz is valid for the given image.

#6 Updated by Alex Elder over 9 years ago

  • Status changed from In Progress to Fix Under Review

I have posted the following patch for review:
[PATCH] rbd: don't zero-fill non-image object requests

Sage, this fix should get sent to Linus after it's committed
to the testing branch. It's in the new request code so there
should be no need to back-port it to any other Linux releases.

#7 Updated by Alex Elder over 9 years ago

OK, I've reproduced something using 3.8.4, but I don't
understand it yet. But it seems similar, the crash occurred a second
or two after the rbd command returned, which wasn't explicitly
stated but I have the impression it might be the case in the
original report from Dan.

#8 Updated by Alex Elder over 9 years ago

Not a lot can be immediately seen from the specifics
of the crash. But I suspect there's some memory
corruption, like a use-after-free, which eventually
leads to a crash.

[ 309.687289] libceph: no secret set (for auth_x protocol)
[ 309.692656] libceph: error -22 on auth protocol 2 init
[ 309.698067] libceph: client4123 fsid f1d418be-a183-42ff-84da-34da341d3f1a

Entering kdb (current=0xffff880225d1deb0, pid 0) on processor 2 Oops: (null)
due to oops @ 0xffffffff8143e598
CPU 2 dPid: 0, comm: swapper/2 Not tainted 3.8.4-ceph-00412-g405acc3 #1 Dell Inc. PowerEdge R410/01V648
dRIP: 0010:[<ffffffff8143e598>] [<ffffffff8143e598>] scsi_dispatch_cmd+0x58/0x300
dRSP: 0018:ffff880227243c30 EFLAGS: 00010206
dRAX: 0000000005000000 RBX: ffff88020d3d0000 RCX: ffff88020c8e3538
dRDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88020d3d0000
dRBP: ffff880227243c60 R08: fffffffff6e00248 R09: 0000000000000000
dR10: 0000000000000001 R11: 0000000000000000 R12: ffff88020c301530
dR13: 9990020000000000 R14: ffff88020d26a000 R15: 000000000000000f
dFS: 0000000000000000(0000) GS:ffff880227240000(0000) knlGS:0000000000000000
dCS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
dCR2: 00007fb0e24d3120 CR3: 0000000001c0c000 CR4: 00000000000007e0
dDR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
dDR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
dProcess swapper/2 (pid: 0, threadinfo ffff880225d28000, task ffff880225d1deb0)
dStack:
ffff880227243c40 ffff88020d26f000 ffff88020c301530 ffff88020c8e3440
ffff88020d26a000 000000000000000f ffff880227243cd0 ffffffff814454e6
ffffffff00000000 ffffea000832c400 ffff880227243cc0 ffff88020d26f190
dCall Trace:
d <IRQ> d [<ffffffff814454e6>] scsi_request_fn+0x3a6/0x570
d [<ffffffff8130c777>] __blk_run_queue+0x37/0x50
d [<ffffffff8130e3ff>] blk_run_queue+0x2f/0x50
d [<ffffffff814443d2>] scsi_run_queue+0xd2/0x290
d [<ffffffff814165f7>] ? put_device+0x17/0x20
d [<ffffffff81446882>] scsi_next_command+0x42/0x60
d [<ffffffff81446ab2>] scsi_io_completion+0x1b2/0x6a0
d [<ffffffff8143ca91>] scsi_finish_command+0xc1/0x120
d [<ffffffff814467ff>] scsi_softirq_done+0x13f/0x160
d [<ffffffff81318f13>] blk_done_softirq+0x83/0xa0
d [<ffffffff8105d4ad>] __do_softirq+0xcd/0x280
d [<ffffffff81341a3d>] ? do_raw_spin_unlock+0x5d/0xb0
d [<ffffffff8166947c>] call_softirq+0x1c/0x30
d [<ffffffff810162d5>] do_softirq+0xa5/0xe0
d [<ffffffff8105d7fe>] irq_exit+0x9e/0xc0
d [<ffffffff81669d33>] do_IRQ+0x63/0xe0
d [<ffffffff8165f9b2>] common_interrupt+0x72/0x72
d <EOI> d [<ffffffff8165fa73>] ? retint_restore_args+0x13/0x13
d [<ffffffff81517c65>] ? cpuidle_wrap_enter+0x55/0xa0
d [<ffffffff81517c61>] ? cpuidle_wrap_enter+0x51/0xa0
d [<ffffffff81517cc0>] cpuidle_enter_tk+0x10/0x20
d [<ffffffff8151785f>] cpuidle_idle_call+0xbf/0x2e0
d [<ffffffff8101d525>] cpu_idle+0xd5/0x130
d [<ffffffff8164d370>] start_secondary+0x26c/0x26e
dCode: 00 00 83 fa 04 0f 84 91 02 00 00 83 ea 08 83 fa 01 0f 86 62 02 00 00 0f b6 90 d9 00 00 00 84 d2 74 09 80 fa 03 0f 8e 98 00

[2]kdb>

#9 Updated by Alex Elder over 9 years ago

A few notes.

I just ran this, without any other setup, and that caused
a crash:
strace rbd map afs254-vicepa --pool afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring

The messages in the log showed:
. . .
[ 309.687289] libceph: no secret set (for auth_x protocol)
[ 309.692656] libceph: error -22 on auth protocol 2 init
[ 309.698067] libceph: client4123 fsid f1d418be-a183-42ff-84da-34da341d3f1a

Entering kdb (current=0xffff880225d1deb0, pid 0) on processor 2 Oops: (null)
due to oops @ 0xffffffff8143e598

Those indicate the messenger received an incoming message for
the monitor, so it called mon_client.c:dispatch() to handle it.

The type of the message was CEPH_MSG_AUTH_REPLY, so
handle_auth_reply(monc, msg) was called.

ceph_handle_auth_reply(ac, buf, len, reply_buf, reply_len)
ac->negotiating was true
ac->protocol was different from what came back from the server
ceph_auth_init_protocol(ac, CEPH_AUTH_CEPHX) calls ceph_x_init()
ceph_x_init(ac) returns -EINVAL because ac->key was null
...and as a result, ceph_auth_init_protocol() returns -EINVAL

==========

I rebooted the machine.

I ran this simple command on a newly-booted machine set up
with teuthology, with all debug output enabled:
strace rbd map afs254-vicepa

That showed that the options passed to ceph_parse_options was:
name=admin,key=client.admin
And the dev_name argument (i.e., monitor addresses) was:
10.214.133.28:6789,10.214.133.30:6789,10.214.133.28:6790

That did not crash.

Then I ran this:
strace rbd map afs254-vicepa --pool afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring

In this case the options argument passed to ceph_parse_options was:
name=afs
And the dev_name argument was the same.

That also did not crash.

However, shortly after the debug output finished getting
dumped (to the console, so it was slow), (basically 10 seconds
after the last message was time stamped) I saw that the
monc delayed work fired. That scheduled a keepalive,
validated the authentication, and found that the monitor
was authenticated, so it called __send_subscribe(). That
message went out and we got a monmap and then an acknowledgement
for the subscribe back.

10 seconds after that, monc delayed work fired, same
basic sequence, but no messages sent or received.

20 seconds later, again. And 20, 40, 60, and 80 seconds
later, same thing.

Meanwhile, every 30 seconds we get an osds timeout and
remove_old_osds() gets called.

I got a crash (in the SCSI driver) just after the last
report of the call to remove_old_osds(). (But I don't
know exactly how long between them.

I'm going to try again.

#10 Updated by Alex Elder over 9 years ago

OK, I did some more tests. Running this command by itself is fine:
strace rbd map afs254-vicepa
(Note, I haven't defined that "afs254-vicepa" as an image, just
running the commands shown after a newly-booted teuthology
system.)

So running that, and then waiting a few minutes, all is well.

Then, I ran the command with the additional flags:
strace rbd map afs254-vicepa --pool afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring

That seemed fine at first. But about 23 seconds later I got a
panic in unrelated code, while basically sitting idle.

This time I had debug messages on but did not send them to
the console. I have saved the 3391 lines of debug output
generated during these tests, and am going to try to attach
it to this issue.

#11 Updated by Alex Elder over 9 years ago

OK, clearly if you run the command with the additional
arguments specifying the pool, id, and keyring you get
a kernel crash a little more than 20 seconds later.

This is going to be a little tough to figure out, at least
for me, because I'm not very familiar with what's going on
in the mon client and the auth modules, etc. It appears
to me that something is happening that for some reason
takes a little more than 20 seconds to result in corruption
elsewhere in the system that causes a crash.

My plan is to put this away, at least until tomorrow
afternoon, and hope someone else has better ideas than
I do about it.

Messages on the console:

plana90 login: [ 1866.690915] kgdb: Registered I/O driver kgdboc.
[ 1977.441814] Key type ceph registered
[ 1977.445431] libceph: loaded (mon/osd proto 15/24, osdmap 5/6 5/6)
[ 1977.481786] rbd: loaded rbd (rados block device)
[ 1977.725244] libceph: client4110 fsid 9e76d672-979d-4628-ad40-18e1eb1adfb0
[ 1977.734384] libceph: mon0 10.214.133.28:6789 session established
[ 1977.768308] rbd1: unknown partition table
[ 1977.772931] rbd: rbd1: added with size 0x280000000
[ 1982.689814] kjournald starting. Commit interval 5 seconds
[ 1982.717312] EXT3-fs (rbd1): using internal journal
[ 1982.722193] EXT3-fs (rbd1): mounted filesystem with ordered data mode
[ 2082.368730] libceph: no secret set (for auth_x protocol)
[ 2082.374052] libceph: error 22 on auth protocol 2 init
[ 2082.379356] libceph: client4115 fsid 9e76d672-979d-4628-ad40-18e1eb1adfb0
---
This corresponds to the completion of the command -----
[ 2103.982250] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2103.990122] IP: [<ffffffff8154b459>] __skb_recv_datagram+0x1f9/0x300
[ 2103.996497] PGD 2231a1067 PUD 20d284067 PMD 0
[ 2104.000979] Oops: 0002 [#1] SMP

Entering kdb (current=0xffff880224375eb0, pid 1305) on processor 4 Oops: (null)

And here are the debug messages logged leading up to the crash:

<7>[ 2082.386700] rbd: Error adding device 10.214.133.28:6789,10.214.133.30:6789,10.214.133.28:6790 name=afs afs afs254-vicepa
<7>[ 2087.766378] libceph: monc delayed_work
<7>[ 2087.766384] libceph: con_keepalive ffff8802236b4410
<7>[ 2087.766390] libceph: queue_con_delay ffff8802236b4410 0
<7>[ 2087.766394] libceph: ceph_x_should_authenticate want=39 need=0 have=39
<7>[ 2087.766397] libceph: ceph_x_is_authenticated want=39 need=0 have=39
<7>[ 2087.766400] libceph: __send_subscribe sub_sent=0 exp=0 want_osd=0
<7>[ 2087.766403] libceph: __schedule_delayed after 2000
<7>[ 2087.766409] libceph: try_read start on ffff8802236b4410 state 5
<7>[ 2087.766412] libceph: try_read tag 1 in_base_pos 0
<7>[ 2087.766419] libceph: try_read done on ffff8802236b4410 ret 0
<7>[ 2087.766422] libceph: try_write start ffff8802236b4410 state 5
<7>[ 2087.766424] libceph: try_write out_kvec_bytes 0
<7>[ 2087.766427] libceph: prepare_write_keepalive ffff8802236b4410
<7>[ 2087.766429] libceph: try_write out_kvec_bytes 1
<7>[ 2087.766432] libceph: write_partial_kvec ffff8802236b4410 1 left
<7>[ 2087.766441] libceph: write_partial_kvec ffff8802236b4410 0 left in 0 kvecs ret = 1
<7>[ 2087.766443] libceph: try_write nothing else to write.
<7>[ 2087.766446] libceph: try_write done on ffff8802236b4410 ret 0
<7>[ 2097.349721] libceph: osds timeout
<7>[ 2097.349727] libceph: __remove_old_osds ffff8802236b4960
<1>[ 2103.982250] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[ 2103.990122] IP: [<ffffffff8154b459>] __skb_recv_datagram+0x1f9/0x300
<4>[ 2103.996497] PGD 2231a1067 PUD 20d284067 PMD 0
<4>[ 2104.000979] Oops: 0002 [#1] SMP

#12 Updated by Sage Weil over 9 years ago

my guess is we are putting too many references to the socket, and some lingering state is trying to call the socket state change callback on and finding nothing there. ensure we are calling con_close()? or maybe deliberately leak the reference and see if the problem is still there?

#13 Updated by Dan van der Ster over 9 years ago

Just awake now: I'll respond to #7: yes, the crash happens like 2-3 seconds after I run rbd map.

FYI: after today I'll be offline for 2 weeks so won't be able to test any patches until mid-April. Frankly, we can live with this bug for now since we'll just try not to make mistakes typing the image name or omit the keyring ;)

#14 Updated by Alex Elder over 9 years ago

For the hell of it I tried this using the 3.7 kernel and
found that I don't see the crash. That puts a bound
on where the problem occurred.

Strangely, the rbd CLI status indicated success.

root@mira032:~# root@mira032:~# rbd map afs254-vicepa --pool afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring
root@mira032:~#: command not found
root@mira032:~# rbd map afs254-vicepa --pool afs --id afs --keyring /etc/ceph/ceph.client.afs.keyring
root@mira032:~# echo $?
0
root@mira032:~# rbd ls -l
NAME SIZE PARENT FMT PROT LOCK
testimage.client.0 10240M 1
root@mira032:~#

Over on the console:
[ 744.175173] libceph: no secret set (for auth_x protocol)
[ 744.180551] libceph: error -22 on auth protocol 2 init
[ 744.185717] libceph: client4129 fsid 67435bba-4f79-4c7f-9413-8a8fc79ec015

#15 Updated by Alex Elder over 9 years ago

  • Status changed from Fix Under Review to 12
  • Priority changed from High to Normal

I'm updating this because the bug I identified in the
new code while trying to reproduce this has been
committed. It was committed to the ceph-client "master"
branch and sent to Linus, and was cherry-picked into
the "testing" branch as well. So I've moved it out
of "Need review" status.

The original problem I believe remains. However Dan
(the original reporter) has indicated that he can "live
with this" for now, so I'm changing this back to Normal
priority.

commit 3d2429d859a595593d39219ff9470e81f2551d4b
Author: Alex Elder <>
Date: Wed Mar 27 09:16:30 2013 -0500

rbd: don't zero-fill non-image object requests
A result of ENOENT from a read request for an object that's part of
an rbd image indicates that there is a hole in that portion of the
image. Similarly, a short read for such an object indicates that
the remainder of the read should be interpreted a full read with
zeros filling out the end of the request.
This behavior is not correct for objects that are not backing rbd
image data. Currently rbd_img_obj_request_callback() assumes it
should be done for all objects.
Change rbd_img_obj_request_callback() so it only does this zeroing
for image objects. Encapsulate that special handling in its own
function. Add an assertion that the image object request is a bio
request, since we assume that (and we currently don't support any
other types).
This resolves a problem identified here:
http://tracker.ceph.com/issues/4559
Reported-by: Dan van der Ster &lt;&gt;
Signed-off-by: Alex Elder &lt;&gt;
Reviewed-off-by: Sage Weil &lt;&gt;
(cherry picked from commit 86c58b1d2c195e86468366326f0fe2793f517bcb)

#16 Updated by Alex Elder over 9 years ago

  • Priority changed from Normal to High

I haven't checked, but I presume this one is still
happening. It's pretty important, considering it
produces a crash. I'm raising the priority.

#17 Updated by Maciej Galkiewicz over 9 years ago

I am affected by the same bug but only with ceph 0.61.1. Every time I try to map any volume (with proper keyring, id and pool name) my virtual machine crash. The same kernel (3.8.11) works fine in another cluster (bobtail).

#18 Updated by Maciej Galkiewicz over 9 years ago

Do you suggest any workaround? I dont think that it is possible to downgrade to bobtail.

#19 Updated by Alex Elder over 9 years ago

  • Status changed from 12 to In Progress

Finally starting to look at this again. I'm going to
start by trying to reproduce it in the newly-reported
context.

#20 Updated by Alex Elder over 9 years ago

I asked Maciej to report more specifically the platform he
was using (VM, and kernel/user space rbd). Answer:

Openstack with libvirt and Xen. We need to map many rbd volumes from
different pools on VMs. I have not integrated openstack and ceph.

#21 Updated by Alex Elder over 9 years ago

I believe I have found the problem.

In rbd_add(), if rbd_client_create() failed the error path
would call ceph_destroy_options() with the options structure
it had passed to rbd_client_create().

The problem is that rbd_client_create() destroys that options
structure too (whether it succeeds or fails).

Interestingly, a patch I just posted for review got rid of
the errant problem, basically by accident. (It has not been
reviewed yet nor committed yet.)

I will test it on a real machine shortly, and will post the
change for review pending that.

#22 Updated by Alex Elder over 9 years ago

I don't have enough information nor do I have a Xen setup so
it isn't easy for me to try to reproduce the problem reported
by Maciej. I have asked whether he can test the fix himself
to see if it makes his issue go away.

#23 Updated by Alex Elder over 9 years ago

  • Status changed from In Progress to Fix Under Review

The following has been posted for review:

[PATCH] rbd: don't destroy ceph_opts in rbd_add()

#24 Updated by Maciej Galkiewicz over 9 years ago

I am going to test the patch. I will let you know about results probably next week.

#25 Updated by Alex Elder over 9 years ago

  • Status changed from Fix Under Review to Need More Info

I have committed the following to the "testing" branch of
the ceph-client git respository:

7262cfc rbd: don't destroy ceph_opts in rbd_add()

That commit is also queued to be sent to Linus for the
3.10 Linux release cycle.

I have verified it resolves a crash I saw, and it explains
the crash originally reported by Dan van der Ster for this
bug.

For now though I'm not marking this resolved. I'm going to
wait to hear back from Maciej on the results of his test.

#26 Updated by Maciej Galkiewicz over 9 years ago

# cat /etc/ceph/keyring
[client.wtfcm]
        key = <mykey>
        caps mon = "allow r" 
        caps osd = "allow rwx pool=wtfcm" 

# rbd list -p wtfcm --id wtfcm --keyring /etc/ceph/keyring
mongodb
# rbd map mongodb --pool wtfcm --id wtfcm --keyring /etc/ceph/keyring
rbd: add failed: (95) Operation not supported

logs from mon:

2013-05-20 15:33:43.074178 7f7f56736700  0 -- 10.255.128.1:6789/0 >> 10.255.0.9:0/1279975636 pipe(0x3517b80 sd=24 :6789 s=0 pgs=0 cs=0 l=0).accept peer addr is really 10.255.0.9:0/1279975636 (socket is 10.255.0.9:47695/0)
2013-05-20 15:33:43.074417 7f7f57b3c700  1 mon.n1cc@0(leader).auth v3431 unknown.0 10.255.0.9:0/1279975636 supports cephx but not signatures and 'cephx [service] require signatures = true'; disallowing cephx
2013-05-20 15:33:43.074432 7f7f57b3c700  1 mon.n1cc@0(leader).auth v3431 client did not provide supported auth type

# cat /etc/ceph/ceph.conf
[global]
  fsid =  805f11f9-3c19-4be8-8c72-d6e5720a1215
  mon initial members = n1cc
  mon host = 10.255.128.1:6789
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
  cephx require signatures = true
  public network = 10.255.128.0/17
  cluster network = 10.255.128.0/17

[osd]
  osd journal size = 10000
  osd scrub load threshold = 4
  debug osd = 1

[mon]
  mon clock drift allowed = 0.5
  mon debug dump transactions = false

[mds]
  debug mds = 1

Removing "cephx require signatures = true" from mon/osd nodes fixes the problem. Looks like kernel rbd does not support signatures. I have tested cb53a7ee6c6a42a0d8b472eb1d296219def2d877 from testing branch and it is not crashing my VM anymore.

#27 Updated by Alex Elder over 9 years ago

  • Status changed from Need More Info to Resolved

Maciej is correct, message signatures are not yet supported
in the kernel client. This issue has been created to get
that implemented:
http://tracker.ceph.com/issues/5109

I believe his statement also means my fix got rid of the
problem he saw, so I'm marking this issue resolved.

Also available in: Atom PDF