Bug #12160

closed

kernel BUG at fs/ceph/caps.c:2307, ceph_put_wrbuffer_cap_refs

Added by Markus Blank-Burian almost 9 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I am seeing this bug again:

2015-06-25T14:11:52+02:00 kaa-29 kernel: kernel BUG at fs/ceph/caps.c:2307!
2015-06-25T14:11:52+02:00 kaa-29 kernel: invalid opcode: 0000 [#2] SMP 
2015-06-25T14:11:52+02:00 kaa-29 kernel: Modules linked in: cbc ceph libceph ipmi_watchdog adm1026 w83795 w83793 hwmon_vid jc42 8021q garp mrp stp llc autofs4 xfs ipmi_devintf mgag200 syscopyarea sysfillrect sysimgblt ttm x86_pkg_temp_thermal drm_kms_helper coretemp iTCO_wdt drm iTCO_vendor_support microcode pcspkr sb_edac evdev lpc_ich edac_core ipmi_si i2c_i801 mfd_core rtc_cmos ipmi_msghandler processor thermal_sys ioatdma button dca rpcsec_gss_krb5 fuse nfsv4 nfs af_packet hid_generic usbhid hid bonding sd_mod ehci_pci ehci_hcd crc32c_intel ahci libahci usbcore libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
2015-06-25T14:11:52+02:00 kaa-29 kernel: CPU: 25 PID: 698 Comm: kworker/25:1 Tainted: P      D W  O   3.18.10-gentoo #1
2015-06-25T14:11:52+02:00 kaa-29 kernel: Hardware name: Supermicro X9DRT-HF+/X9DRT-HF+, BIOS 3.0c 05/21/2014
2015-06-25T14:11:52+02:00 kaa-29 kernel: Workqueue: ceph-msgr con_work [libceph]
2015-06-25T14:11:52+02:00 kaa-29 kernel: task: ffff88101dc4e400 ti: ffff88101ef70000 task.ti: ffff88101ef70000
2015-06-25T14:11:52+02:00 kaa-29 kernel: RIP: 0010:[<ffffffffa05233a7>]  [<ffffffffa05233a7>] ceph_put_wrbuffer_cap_refs+0x252/0x26e [ceph]
2015-06-25T14:11:52+02:00 kaa-29 kernel: RSP: 0018:ffff88101ef73b98  EFLAGS: 00010246
2015-06-25T14:11:52+02:00 kaa-29 kernel: RAX: ffff880b621e2d30 RBX: ffff880b621e2b80 RCX: ffff880b621e2d30
2015-06-25T14:11:52+02:00 kaa-29 kernel: RDX: 000000000000132b RSI: 0000000000000020 RDI: 0000000000000000
2015-06-25T14:11:52+02:00 kaa-29 kernel: RBP: ffff88101ef73c08 R08: ffff88107ffc5468 R09: 0000000000013c84
2015-06-25T14:11:52+02:00 kaa-29 kernel: R10: 000000000000001f R11: 0000000000013c28 R12: ffff88101de54fa0
2015-06-25T14:11:52+02:00 kaa-29 kernel: R13: ffff8820241a0000 R14: ffff880b621e2d20 R15: 0000000000000000
2015-06-25T14:11:52+02:00 kaa-29 kernel: FS:  0000000000000000(0000) GS:ffff88103fba0000(0000) knlGS:0000000000000000
2015-06-25T14:11:52+02:00 kaa-29 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2015-06-25T14:11:52+02:00 kaa-29 kernel: CR2: 00007ff4564430f0 CR3: 0000000001811000 CR4: 00000000001407e0
2015-06-25T14:11:52+02:00 kaa-29 kernel: Stack:
2015-06-25T14:11:52+02:00 kaa-29 kernel:  0000000000000282 ffff88103f419400 00ffea001e075cc0 ffff8820241a0110
2015-06-25T14:11:52+02:00 kaa-29 kernel:  ffff88101ef73be8 ffffea000f36d280 ffff880b621e2d30 ffff880b621e2eb0
2015-06-25T14:11:52+02:00 kaa-29 kernel:  0000000000000000 ffff880b621e2eb0 ffff8820241a0110 ffff8820241a0000
2015-06-25T14:11:52+02:00 kaa-29 kernel: Call Trace:
2015-06-25T14:11:52+02:00 kaa-29 kernel:  [<ffffffffa051c41a>] writepages_finish+0x20a/0x26c [ceph]
2015-06-25T14:11:52+02:00 kaa-29 kernel:  [<ffffffffa0148c0e>] dispatch+0x5a0/0x8af [libceph]
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffffa01410c3>] con_work+0x1087/0x239d [libceph]
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff8105a238>] ? update_next_balance.constprop.66+0x15/0x42
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff81008fe6>] ? native_sched_clock+0x35/0x37
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff810577fc>] ? sched_clock_cpu+0x1b/0xa4
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff81058087>] ? arch_vtime_task_switch+0x6b/0x70
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff81051a57>] ? finish_task_switch+0x95/0xea
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff810496a9>] process_one_work+0x154/0x213
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff81049bf0>] worker_thread+0x1c2/0x299
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff81049a2e>] ? cancel_delayed_work_sync+0x10/0x10
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff8104d8cd>] kthread+0xa0/0xa8
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff8104d82d>] ? __kthread_parkme+0x5c/0x5c
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff813ecd48>] ret_from_fork+0x58/0x90
2015-06-25T14:11:53+02:00 kaa-29 kernel:  [<ffffffff8104d82d>] ? __kthread_parkme+0x5c/0x5c
2015-06-25T14:11:53+02:00 kaa-29 kernel: Code: df e8 49 e0 ff ff 48 8d bb 70 01 00 00 31 c9 31 d2 be 03 00 00 00 e8 dd fb b3 e0 45 85 e4 74 18 48 8b 7d c8 e8 6e 4a bf e0 eb 0d <0f> 0b 45 31 e4 45 31 ed e9 ee fe ff ff 48 83 c4 48 5b 41 5c 41 
2015-06-25T14:11:53+02:00 kaa-29 kernel: RIP  [<ffffffffa05233a7>] ceph_put_wrbuffer_cap_refs+0x252/0x26e [ceph]
2015-06-25T14:11:53+02:00 kaa-29 kernel:  RSP <ffff88101ef73b98>
2015-06-25T14:11:53+02:00 kaa-29 kernel: ---[ end trace 3649ad6510e9703c ]---
2015-06-25T14:11:53+02:00 kaa-29 kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8
2015-06-25T14:11:53+02:00 kaa-29 kernel: IP: [<ffffffff8104dabe>] kthread_data+0xb/0x11
2015-06-25T14:11:53+02:00 kaa-29 kernel: PGD 1812067 PUD 1814067 PMD 0 
2015-06-25T14:11:53+02:00 kaa-29 kernel: Oops: 0000 [#3] SMP 
2015-06-25T14:11:53+02:00 kaa-29 kernel: Modules linked in: cbc ceph libceph ipmi_watchdog adm1026 w83795 w83793 hwmon_vid jc42 8021q garp mrp stp llc autofs4 xfs ipmi_devintf mgag200 syscopyarea sysfillrect sysimgblt ttm x86_pkg_temp_thermal drm_kms_helper coretemp iTCO_wdt drm iTCO_vendor_support microcode pcspkr sb_edac evdev lpc_ich edac_core ipmi_si i2c_i801 mfd_core rtc_cmos ipmi_msghandler processor thermal_sys ioatdma button dca rpcsec_gss_krb5 fuse nfsv4 nfs af_packet hid_generic usbhid hid bonding sd_mod ehci_pci ehci_hcd crc32c_intel ahci libahci usbcore libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
2015-06-25T14:11:53+02:00 kaa-29 kernel: CPU: 25 PID: 698 Comm: kworker/25:1 Tainted: P      D W  O   3.18.10-gentoo #1
2015-06-25T14:11:53+02:00 kaa-29 kernel: Hardware name: Supermicro X9DRT-HF+/X9DRT-HF+, BIOS 3.0c 05/21/2014
2015-06-25T14:11:53+02:00 kaa-29 kernel: task: ffff88101dc4e400 ti: ffff88101ef70000 task.ti: ffff88101ef70000
2015-06-25T14:11:53+02:00 kaa-29 kernel: RIP: 0010:[<ffffffff8104dabe>]  [<ffffffff8104dabe>] kthread_data+0xb/0x11
2015-06-25T14:11:53+02:00 kaa-29 kernel: RSP: 0018:ffff88101ef73838  EFLAGS: 00010002
2015-06-25T14:11:53+02:00 kaa-29 kernel: RAX: 0000000000000000 RBX: ffff88103fbb0780 RCX: ffff88103fbb07f8
2015-06-25T14:11:53+02:00 kaa-29 kernel: RDX: 0000000000000000 RSI: 0000000000000019 RDI: ffff88101dc4e400
2015-06-25T14:11:53+02:00 kaa-29 kernel: RBP: ffff88101ef73838 R08: ffffffff819a6dc0 R09: 000000000000000f
2015-06-25T14:11:53+02:00 kaa-29 kernel: R10: 00000000ffffffff R11: 000000000000b80e R12: 0000000000000019
2015-06-25T14:11:53+02:00 kaa-29 kernel: R13: ffff88101dc4e960 R14: ffff881029378000 R15: ffff88101dc4e400
2015-06-25T14:11:53+02:00 kaa-29 kernel: FS:  0000000000000000(0000) GS:ffff88103fba0000(0000) knlGS:0000000000000000
2015-06-25T14:11:53+02:00 kaa-29 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2015-06-25T14:11:53+02:00 kaa-29 kernel: CR2: 0000000000000028 CR3: 0000000001811000 CR4: 00000000001407e0
2015-06-25T14:11:53+02:00 kaa-29 kernel: Stack:
2015-06-25T14:11:53+02:00 kaa-29 kernel:  ffff88101ef73858 ffffffff81049f42 ffff88103fbb0780 0000000000000019
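
To map the faulting RIP back to a source line, the symbol and offset can be pulled out of the trace and looked up in a module built with debug info. The following is only a minimal sketch, assuming the oops format shown in the log above; `RIP_RE` and `parse_rip` are hypothetical names chosen for illustration and are not part of any kernel tooling.

```python
import re

# Matches both RIP forms seen in the log above (assumed format):
#   "RIP: 0010:[<addr>]  [<addr>] func+0xOFF/0xSIZE [module]"
#   "RIP  [<addr>] func+0xOFF/0xSIZE [module]"
RIP_RE = re.compile(
    r"RIP:?\s+(?:[0-9a-f]{4}:)?\[<(?P<addr>[0-9a-f]+)>\]"
    r"(?:\s+\[<[0-9a-f]+>\])?"
    r"\s+(?P<func>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_rip(line):
    """Return the faulting symbol, offset, size and module, or None if no RIP is found."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "func": m.group("func"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None for symbols in the core kernel
    }


if __name__ == "__main__":
    sample = (
        "2015-06-25T14:11:52+02:00 kaa-29 kernel: RIP: 0010:[<ffffffffa05233a7>]  "
        "[<ffffffffa05233a7>] ceph_put_wrbuffer_cap_refs+0x252/0x26e [ceph]"
    )
    rip = parse_rip(sample)
    # Build an expression that gdb's "list" command accepts.
    print(f"list *({rip['func']}+{rip['offset']:#x})")
```

Given a ceph.ko that carries debug info, the printed expression can be run inside gdb to check that the offset resolves to the fs/ceph/caps.c:2307 reported in the log.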

Previously, I saw this bug on kernel 3.14.y, but the patch "ceph: introduce global empty snap context" seemed to have fixed it. After I upgraded to Ceph 0.94.2 and kernel 3.18.10, the problem reappeared today on one machine, even though the following patches are applied:

ceph: exclude setfilelock requests when calculating oldest tid
e8a7b8b12b13831467c6158c1e82801e25b5dd98

ceph: fix dentry leaks
5cba372c0fe78d24e83d9e0556ecbeb219625c15

libceph: kfree() in put_osd() shouldn't depend on authorizer
b28ec2f37e6a2bbd0bdf74b39cb89c74e4ad17f3

libceph: request a new osdmap if lingering request maps to no osd
b0494532214bdfbf241e94fabab5dd46f7b82631

ceph: re-send requests when MDS enters reconnecting stage
3de22be6771353241eaec237fe594dfea3daf30f

Revert "libceph: clear r_req_lru_item in __unregister_linger_request()" 
521a04d06a729e5971cdee7f84080387ed320527

ceph: introduce global empty snap context
97c85a828f36bbfffe9d77b977b65a5872b6cad4

Is there any other fix I could try?

