rbd: crashes with 10Gbit network and fio
Assignee: Alex Elder
Hi,

We are currently testing Ceph with RBD on a cluster that has both 1 Gbit and 10 Gbit interfaces. When the cluster runs on the 1 Gbit interfaces we see no kernel crashes with RBD, but on the 10 Gbit network we see very frequent kernel crashes while running tests against the RBDs with, e.g., fio.

I've tested with kernel v3.0 and also with 3.3.0 (plus the patches from the 'for-linus' branch of ceph-client.git at git.kernel.org). With more client machines running tests, the crashes occur even faster. The issue is fully reproducible here.

Has anyone seen similar problems? See the backtrace below.

Regards
Danny

PID: 10902 TASK: ffff88032a9a2080 CPU: 0 COMMAND: "kworker/0:0"
#0 [ffff8803235fd950] machine_kexec at ffffffff810265ee
#1 [ffff8803235fd9a0] crash_kexec at ffffffff810a3bda
#2 [ffff8803235fda70] oops_end at ffffffff81444688
#3 [ffff8803235fda90] __bad_area_nosemaphore at ffffffff81032a35
#4 [ffff8803235fdb50] do_page_fault at ffffffff81446d3e
#5 [ffff8803235fdc50] page_fault at ffffffff81443865
    [exception RIP: read_partial_message+816]
    RIP: ffffffffa041e500 RSP: ffff8803235fdd00 RFLAGS: 00010246
    RAX: 0000000000000000 RBX: 00000000000009d7 RCX: 0000000000008000
    RDX: 0000000000000000 RSI: 00000000000009d7 RDI: ffffffff813c8d78
    RBP: ffff880328827030 R8: 00000000000009d7 R9: 0000000000004000
    R10: 0000000000000000 R11: ffffffff81205800 R12: 0000000000000000
    R13: 0000000000000069 R14: ffff88032a9bc780 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff8803235fdd38] thread_return at ffffffff81440e82
#7 [ffff8803235fdd78] try_read at ffffffffa041ed58 [libceph]
#8 [ffff8803235fddf8] con_work at ffffffffa041fb2e [libceph]
#9 [ffff8803235fde28] process_one_work at ffffffff8107487c
#10 [ffff8803235fde78] worker_thread at ffffffff8107740a
#11 [ffff8803235fdee8] kthread at ffffffff8107b736
#12 [ffff8803235fdf48] kernel_thread_helper at ffffffff8144c144
#1 Updated by Danny Kukawka about 1 year ago
Here is some more information from the crash:
[58113.180039] libceph: tid 387083 timed out on osd92, will reset osd
[58183.111592] libceph: tid 395388 timed out on osd92, will reset osd
[58268.028399] libceph: tid 399638 timed out on osd92, will reset osd
[61176.040372] libceph: tid 2197147 timed out on osd52, will reset osd
[61176.081394] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[61176.098840] IP: [<ffffffffa03bd500>] read_partial_message+0x330/0x730 [libceph]
[61176.115112] PGD 0
[61176.119617] Oops: 0000 [#1] SMP
[61176.126814] CPU 14
[61176.131038] Modules linked in: lp parport_pc joydev st ide_cd_mod ide_core ppdev parport rbd libceph crc32c libcrc32c edd af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode loop dm_mod ipv6 ipv6_lib ixgbe igb i7core_edac sg edac_core dca iTCO_wdt i2c_i801 i2c_core mdio iTCO_vendor_support sr_mod cdrom button rtc_cmos pcspkr acpi_power_meter container ac ext3 jbd mbcache uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh ata_generic ata_piix libata mptsas mptscsih mptbase scsi_transport_sas megaraid_sas scsi_mod [last unloaded: parport_pc]
[61176.264383] Supported: Yes
[61176.273754] Pid: 24133, comm: kworker/14:0 Not tainted 3.0.27-1-default #1 FUJITSU PRIMERGY RX300 S6 /D2619
[61176.302468] RIP: 0010:[<ffffffffa03bd500>] [<ffffffffa03bd500>] read_partial_message+0x330/0x730 [libceph]
[61176.324095] RSP: 0018:ffff880824c7fd00 EFLAGS: 00010246
[61176.335866] RAX: 0000000000000000 RBX: 0000000000000c97 RCX: 000000000000b000
[61176.351661] RDX: 0000000000000000 RSI: 0000000000000c97 RDI: ffffffff813c8d78
[61176.367455] RBP: ffff880825c6d830 R08: 0000000000000c97 R09: 0000000000004000
[61176.384508] R10: 0000000000000000 R11: ffffffff81205800 R12: 0000000000000000
[61176.400301] R13: 0000000000000069 R14: ffff8808259be380 R15: 0000000000000000
[61176.416095] FS: 0000000000000000(0000) GS:ffff88083fd80000(0000) knlGS:0000000000000000
[61176.434074] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[61176.446837] CR2: 0000000000000048 CR3: 0000000001a03000 CR4: 00000000000006e0
[61176.462632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[61176.478425] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[61176.494220] Process kworker/14:0 (pid: 24133, threadinfo ffff880824c7e000, task ffff880825c3c600)
[61176.518409] ffff880825c3c600 ffff88083fd91600 0000000000000000 0000000000000001
[61176.534895] 0001f00000000000 0000000000000002 ffff8807f6855000 ffffffff81440e82
[61176.551366] 0000000000000000 ffff880825c6d858 ffff880825c6d830 00000000ffffffff
[61176.567833] Call Trace:
[61176.573300] [<ffffffffa03bdd58>] try_read+0x458/0x680 [libceph]
[61176.586612] [<ffffffffa03beb2e>] con_work+0x6e/0x240 [libceph]
[61176.599730] [<ffffffff8107487c>] process_one_work+0x16c/0x350
[61176.612654] [<ffffffff8107740a>] worker_thread+0x17a/0x410
[61176.625005] [<ffffffff8107b736>] kthread+0x96/0xa0
[61176.635824] [<ffffffff8144c144>] kernel_thread_helper+0x4/0x10
[61176.648939] Code: c0 e8 a5 49 ea e0 e9 21 fd ff ff 49 83 be 90 00 00 00 00 0f 84 53 01 00 00 4d 63 a6 a0 00 00 00 49 8b 86 98 00 00 00 49 c1 e4 04 <4c> 03 60 48 49 81 fc 00 f0 ff ff 76 11 45 85 e4 44 89 e3 0f 8f
[61176.691509] RIP [<ffffffffa03bd500>] read_partial_message+0x330/0x730 [libceph]
[61176.707965] RSP <ffff880824c7fd00>
[61176.715713] CR2: 0000000000000048
If you need more information or the dump, let me know.
#2 Updated by Alex Elder about 1 year ago
- Assignee set to Alex Elder
This is one of a family of bugs we've been trying to understand.
Here is another one:
In some instances it's in try_read() and in others it's in try_write().
It's difficult to get very far, though, because it's doing its work
via a workqueue, which means we lose some of the context of what
produced the read or write operation that's leading to failure.
I expect to be digging into this next week though and I may
be able to get some more clues then about what's going on.
#4 Updated by Alex Elder about 1 year ago
A kernel dump would likely help, but there's no guarantee because
of the delayed execution of the operation. It wouldn't hurt.
If I remember right, xfstests #049 may be able to
reproduce this bug (or something similar), so I'm going to see
if that leads anywhere.
#5 Updated by Danny Kukawka about 1 year ago
- File bug-2287_libceph-bio-iter-fix.patch added
We used the attached patch to resolve the immediate problem.
But we still see other crashes over time. I haven't had time yet to
check whether they are related.
If you plan to integrate the patch into git, let me know and I will
attach a proper version with commit info and the author.
- Status changed from Verified to Resolved
- Priority changed from Urgent to Normal
This looks like the bio->iter problem, which is now fixed by commit 43643528cce60ca184fe8197efa8e8da7c89a037 in ceph-client.git (the sha1 will change once it goes into the master branch).
If you see problems beyond that, we should track it separately. thanks!