Bug #12160
Status: Closed
kernel BUG at fs/ceph/caps.c:2307, ceph_put_wrbuffer_cap_refs
% Done: 0%
Source: other
Regression: No
Severity: 2 - major
Description
I am seeing this bug again:
2015-06-25T14:11:52+02:00 kaa-29 kernel: kernel BUG at fs/ceph/caps.c:2307!
2015-06-25T14:11:52+02:00 kaa-29 kernel: invalid opcode: 0000 [#2] SMP
2015-06-25T14:11:52+02:00 kaa-29 kernel: Modules linked in: cbc ceph libceph ipmi_watchdog adm1026 w83795 w83793 hwmon_vid jc42 8021q garp mrp stp llc autofs4 xfs ipmi_devintf mgag200 syscopyarea sysfillrect sysimgblt ttm x86_pkg_temp_thermal drm_kms_helper coretemp iTCO_wdt drm iTCO_vendor_support microcode pcspkr sb_edac evdev lpc_ich edac_core ipmi_si i2c_i801 mfd_core rtc_cmos ipmi_msghandler processor thermal_sys ioatdma button dca rpcsec_gss_krb5 fuse nfsv4 nfs af_packet hid_generic usbhid hid bonding sd_mod ehci_pci ehci_hcd crc32c_intel ahci libahci usbcore libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
2015-06-25T14:11:52+02:00 kaa-29 kernel: CPU: 25 PID: 698 Comm: kworker/25:1 Tainted: P D W O 3.18.10-gentoo #1
2015-06-25T14:11:52+02:00 kaa-29 kernel: Hardware name: Supermicro X9DRT-HF+/X9DRT-HF+, BIOS 3.0c 05/21/2014
2015-06-25T14:11:52+02:00 kaa-29 kernel: Workqueue: ceph-msgr con_work [libceph]
2015-06-25T14:11:52+02:00 kaa-29 kernel: task: ffff88101dc4e400 ti: ffff88101ef70000 task.ti: ffff88101ef70000
2015-06-25T14:11:52+02:00 kaa-29 kernel: RIP: 0010:[<ffffffffa05233a7>] [<ffffffffa05233a7>] ceph_put_wrbuffer_cap_refs+0x252/0x26e [ceph]
2015-06-25T14:11:52+02:00 kaa-29 kernel: RSP: 0018:ffff88101ef73b98 EFLAGS: 00010246
2015-06-25T14:11:52+02:00 kaa-29 kernel: RAX: ffff880b621e2d30 RBX: ffff880b621e2b80 RCX: ffff880b621e2d30
2015-06-25T14:11:52+02:00 kaa-29 kernel: RDX: 000000000000132b RSI: 0000000000000020 RDI: 0000000000000000
2015-06-25T14:11:52+02:00 kaa-29 kernel: RBP: ffff88101ef73c08 R08: ffff88107ffc5468 R09: 0000000000013c84
2015-06-25T14:11:52+02:00 kaa-29 kernel: R10: 000000000000001f R11: 0000000000013c28 R12: ffff88101de54fa0
2015-06-25T14:11:52+02:00 kaa-29 kernel: R13: ffff8820241a0000 R14: ffff880b621e2d20 R15: 0000000000000000
2015-06-25T14:11:52+02:00 kaa-29 kernel: FS: 0000000000000000(0000) GS:ffff88103fba0000(0000) knlGS:0000000000000000
2015-06-25T14:11:52+02:00 kaa-29 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2015-06-25T14:11:52+02:00 kaa-29 kernel: CR2: 00007ff4564430f0 CR3: 0000000001811000 CR4: 00000000001407e0
2015-06-25T14:11:52+02:00 kaa-29 kernel: Stack:
2015-06-25T14:11:52+02:00 kaa-29 kernel: 0000000000000282 ffff88103f419400 00ffea001e075cc0 ffff8820241a0110
2015-06-25T14:11:52+02:00 kaa-29 kernel: ffff88101ef73be8 ffffea000f36d280 ffff880b621e2d30 ffff880b621e2eb0
2015-06-25T14:11:52+02:00 kaa-29 kernel: 0000000000000000 ffff880b621e2eb0 ffff8820241a0110 ffff8820241a0000
2015-06-25T14:11:52+02:00 kaa-29 kernel: Call Trace:
2015-06-25T14:11:52+02:00 kaa-29 kernel: [<ffffffffa051c41a>] writepages_finish+0x20a/0x26c [ceph]
2015-06-25T14:11:52+02:00 kaa-29 kernel: [<ffffffffa0148c0e>] dispatch+0x5a0/0x8af [libceph]
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffffa01410c3>] con_work+0x1087/0x239d [libceph]
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff8105a238>] ? update_next_balance.constprop.66+0x15/0x42
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff81008fe6>] ? native_sched_clock+0x35/0x37
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff810577fc>] ? sched_clock_cpu+0x1b/0xa4
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff81058087>] ? arch_vtime_task_switch+0x6b/0x70
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff81051a57>] ? finish_task_switch+0x95/0xea
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff810496a9>] process_one_work+0x154/0x213
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff81049bf0>] worker_thread+0x1c2/0x299
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff81049a2e>] ? cancel_delayed_work_sync+0x10/0x10
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff8104d8cd>] kthread+0xa0/0xa8
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff8104d82d>] ? __kthread_parkme+0x5c/0x5c
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff813ecd48>] ret_from_fork+0x58/0x90
2015-06-25T14:11:53+02:00 kaa-29 kernel: [<ffffffff8104d82d>] ? __kthread_parkme+0x5c/0x5c
2015-06-25T14:11:53+02:00 kaa-29 kernel: Code: df e8 49 e0 ff ff 48 8d bb 70 01 00 00 31 c9 31 d2 be 03 00 00 00 e8 dd fb b3 e0 45 85 e4 74 18 48 8b 7d c8 e8 6e 4a bf e0 eb 0d <0f> 0b 45 31 e4 45 31 ed e9 ee fe ff ff 48 83 c4 48 5b 41 5c 41
2015-06-25T14:11:53+02:00 kaa-29 kernel: RIP [<ffffffffa05233a7>] ceph_put_wrbuffer_cap_refs+0x252/0x26e [ceph]
2015-06-25T14:11:53+02:00 kaa-29 kernel: RSP <ffff88101ef73b98>
2015-06-25T14:11:53+02:00 kaa-29 kernel: ---[ end trace 3649ad6510e9703c ]---
2015-06-25T14:11:53+02:00 kaa-29 kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8
2015-06-25T14:11:53+02:00 kaa-29 kernel: IP: [<ffffffff8104dabe>] kthread_data+0xb/0x11
2015-06-25T14:11:53+02:00 kaa-29 kernel: PGD 1812067 PUD 1814067 PMD 0
2015-06-25T14:11:53+02:00 kaa-29 kernel: Oops: 0000 [#3] SMP
2015-06-25T14:11:53+02:00 kaa-29 kernel: Modules linked in: cbc ceph libceph ipmi_watchdog adm1026 w83795 w83793 hwmon_vid jc42 8021q garp mrp stp llc autofs4 xfs ipmi_devintf mgag200 syscopyarea sysfillrect sysimgblt ttm x86_pkg_temp_thermal drm_kms_helper coretemp iTCO_wdt drm iTCO_vendor_support microcode pcspkr sb_edac evdev lpc_ich edac_core ipmi_si i2c_i801 mfd_core rtc_cmos ipmi_msghandler processor thermal_sys ioatdma button dca rpcsec_gss_krb5 fuse nfsv4 nfs af_packet hid_generic usbhid hid bonding sd_mod ehci_pci ehci_hcd crc32c_intel ahci libahci usbcore libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
2015-06-25T14:11:53+02:00 kaa-29 kernel: CPU: 25 PID: 698 Comm: kworker/25:1 Tainted: P D W O 3.18.10-gentoo #1
2015-06-25T14:11:53+02:00 kaa-29 kernel: Hardware name: Supermicro X9DRT-HF+/X9DRT-HF+, BIOS 3.0c 05/21/2014
2015-06-25T14:11:53+02:00 kaa-29 kernel: task: ffff88101dc4e400 ti: ffff88101ef70000 task.ti: ffff88101ef70000
2015-06-25T14:11:53+02:00 kaa-29 kernel: RIP: 0010:[<ffffffff8104dabe>] [<ffffffff8104dabe>] kthread_data+0xb/0x11
2015-06-25T14:11:53+02:00 kaa-29 kernel: RSP: 0018:ffff88101ef73838 EFLAGS: 00010002
2015-06-25T14:11:53+02:00 kaa-29 kernel: RAX: 0000000000000000 RBX: ffff88103fbb0780 RCX: ffff88103fbb07f8
2015-06-25T14:11:53+02:00 kaa-29 kernel: RDX: 0000000000000000 RSI: 0000000000000019 RDI: ffff88101dc4e400
2015-06-25T14:11:53+02:00 kaa-29 kernel: RBP: ffff88101ef73838 R08: ffffffff819a6dc0 R09: 000000000000000f
2015-06-25T14:11:53+02:00 kaa-29 kernel: R10: 00000000ffffffff R11: 000000000000b80e R12: 0000000000000019
2015-06-25T14:11:53+02:00 kaa-29 kernel: R13: ffff88101dc4e960 R14: ffff881029378000 R15: ffff88101dc4e400
2015-06-25T14:11:53+02:00 kaa-29 kernel: FS: 0000000000000000(0000) GS:ffff88103fba0000(0000) knlGS:0000000000000000
2015-06-25T14:11:53+02:00 kaa-29 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2015-06-25T14:11:53+02:00 kaa-29 kernel: CR2: 0000000000000028 CR3: 0000000001811000 CR4: 00000000001407e0
2015-06-25T14:11:53+02:00 kaa-29 kernel: Stack:
2015-06-25T14:11:53+02:00 kaa-29 kernel: ffff88101ef73858 ffffffff81049f42 ffff88103fbb0780 0000000000000019
I previously saw this bug on kernel 3.14.y, and the patch "ceph: introduce global empty snap context" seemed to fix it. After upgrading to Ceph 0.94.2 and kernel 3.18.10, the problem reappeared today on one machine, even though the following patches are applied:
ceph: exclude setfilelock requests when calculating oldest tid (e8a7b8b12b13831467c6158c1e82801e25b5dd98)
ceph: fix dentry leaks (5cba372c0fe78d24e83d9e0556ecbeb219625c15)
libceph: kfree() in put_osd() shouldn't depend on authorizer (b28ec2f37e6a2bbd0bdf74b39cb89c74e4ad17f3)
libceph: request a new osdmap if lingering request maps to no osd (b0494532214bdfbf241e94fabab5dd46f7b82631)
ceph: re-send requests when MDS enters reconnecting stage (3de22be6771353241eaec237fe594dfea3daf30f)
Revert "libceph: clear r_req_lru_item in __unregister_linger_request()" (521a04d06a729e5971cdee7f84080387ed320527)
ceph: introduce global empty snap context (97c85a828f36bbfffe9d77b977b65a5872b6cad4)
Is there any other fix I could try?