Bug #10450

NULL pointer dereference at send_mds_reconnect

Added by Markus Blank-Burian about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

When our mds crashes (see #10449), some kernel throw

History

#1 Updated by Markus Blank-Burian about 9 years ago

Submitted by pressing enter at the wrong time. So here are more details:

When our mds crashes (see #10449), some clients throw a NULL pointer dereference in send_mds_reconnect. The call stack is the following:

2014-12-20T21:42:23+01:00 kaa-14 kernel: [107876.727375] ceph: mds0 closed our session
2014-12-20T21:42:23+01:00 kaa-14 kernel: [107876.727376] ceph: mds0 reconnect start
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.727430] BUG: unable to handle kernel NULL pointer dereference at           (null)
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731402] IP: [<ffffffffa01ed3c3>] send_mds_reconnect+0x272/0x592 [ceph]
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731424] PGD 80918d067 PUD 806d43067 PMD 0
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731427] Oops: 0000 [#1] SMP
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731428] Modules linked in: cbc ceph libceph ipmi_watchdog w83627ehf adm1026 w83795 w83793 hwmon_vid jc42 8021q garp stp mrp llc autofs4 cpufreq_ondemand xfs ext4 mbcache jbd2 ipmi_si ipmi_devintf ipmi_msghandler sr_mod cdrom mgag200 syscopyarea sysfillrect sysimgblt ttm kvm_amd drm_kms_helper kvm drm microcode usb_storage amd64_edac_mod pcspkr psmouse evdev edac_mce_amd sp5100_tco acpi_cpufreq edac_core rtc_cmos k10temp i2c_piix4 button processor thermal_sys rpcsec_gss_krb5 fuse nfsv4 nfs af_packet hid_generic usbhid hid bonding sd_mod ata_generic ohci_pci ehci_pci ohci_hcd ehci_hcd ahci usbcore pata_atiixp libahci libata usb_common ipv6 dm_mirror dm_region_hash dm_log dm_mod unix
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731468] CPU: 7 PID: 30382 Comm: kworker/7:2 Tainted: P        W  O 3.14.26-gentoo #1
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731469] Hardware name: Supermicro H8DGU/H8DGU, BIOS 1.0c       10/14/10
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731483] Workqueue: ceph-msgr con_work [libceph]
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731485] task: ffff88080a128000 ti: ffff880809608000 task.ti: ffff880809608000
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731498] RIP: 0010:[<ffffffffa01ed3c3>]  [<ffffffffa01ed3c3>] send_mds_reconnect+0x272/0x592 [ceph]
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731499] RSP: 0000:ffff880809609ce8  EFLAGS: 00010246
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731500] RAX: 0000000000000000 RBX: ffff880c0835d000 RCX: 0000000000000000
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731501] RDX: 0000000000000000 RSI: ffffe8f7efcc2a00 RDI: ffff880c0835d4f0
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731501] RBP: ffff880809609d88 R08: 0000000000000001 R09: 0000000000000000
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731502] R10: ffff880809609bd0 R11: 000000000000bba7 R12: ffff880807d8c140
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731503] R13: ffff88040983bc00 R14: ffff880c0835d518 R15: ffff880c0835d2b8
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731504] FS:  00007fe82f169780(0000) GS:ffff88080fcc0000(0000) knlGS:0000000000000000
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731505] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731506] CR2: 0000000000000000 CR3: 0000000806d27000 CR4: 00000000000007e0
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731506] Stack:
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731508]  0000000700000000 0000000000000246 ffff880c0835d4f0 ffff88040983bc90
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731510]  ffff880c0835d020 ffff88040983bc08 ffff880c0835d048 000000000835d048
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731511]  ffff880806d9fe08 ffffffff813c63a1 0000000000000010 ffff880809609d98
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731511] Call Trace:
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731519]  [<ffffffff813c63a1>] ? printk+0x4a/0x52
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731521]  [<ffffffff813c63a1>] ? printk+0x4a/0x52
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731533]  [<ffffffffa01ed70c>] peer_reset+0x29/0x2e [ceph]
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731541]  [<ffffffffa0196a91>] con_work+0x1d2c/0x2374 [libceph]
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731547]  [<ffffffff8105db4c>] ? arch_vtime_task_switch+0x87/0x8c
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731549]  [<ffffffff8105db76>] ? vtime_common_task_switch+0x25/0x28
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731551]  [<ffffffff810579b1>] ? finish_task_switch+0xe4/0xff
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731555]  [<ffffffff8104ba9f>] process_one_work+0x154/0x221
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731557]  [<ffffffff8104c1e2>] worker_thread+0x13e/0x1d7
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731559]  [<ffffffff8104c0a4>] ? cancel_delayed_work_sync+0x10/0x10
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731560]  [<ffffffff81050cc5>] kthread+0xb2/0xba
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731562]  [<ffffffff81050c13>] ? __kthread_parkme+0x62/0x62
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731565]  [<ffffffff813cc13c>] ret_from_fork+0x7c/0xb0
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731567]  [<ffffffff81050c13>] ? __kthread_parkme+0x62/0x62
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731580] Code: 18 8b 53 08 48 c7 c6 30 dc 1f a0 48 c7 c7 b8 5e 20 a0 31 c0 e8 48 69 fe e0 4c 8b b3 18 05 00 00 f6 05 f3 8a 01 00 04 4d 8b 4e 80 <45> 8b 39 74 30 8b 53 08 49 8d 8e 38 ff ff ff 45 89 f8 48 c7 c6
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731591] RIP  [<ffffffffa01ed3c3>] send_mds_reconnect+0x272/0x592 [ceph]
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731592]  RSP <ffff880809609ce8>
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731592] CR2: 0000000000000000
2014-12-20T21:42:24+01:00 kaa-14 kernel: [107876.731594] ---[ end trace 245831ae66853a17 ]---

I am using the LTS kernel v3.14.26 and tested with ceph v0.80.4 and v0.87. I tested both with a single mds and with an additional standby-mds.
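
For triage, the faulting symbol and offset can be pulled out of the IP/RIP and Call Trace lines of a log like the one above. The following is only a minimal sketch: the regular expression and the helper name `parse_frame` are illustrative assumptions, not part of any Ceph or kernel tooling, and the pattern only targets the 3.x-style frame format seen in this report.

```python
import re

# Hypothetical helper for this report's log format, e.g.
#   "[<ffffffffa01ed3c3>] send_mds_reconnect+0x272/0x592 [ceph]"
#   "[<ffffffff813c63a1>] ? printk+0x4a/0x52"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<uncertain>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return the frame found in one oops line, or None if there is none."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # offset into the function
        "size": int(m.group("size"), 16),      # total size of the function
        "module": m.group("module"),           # None for built-in symbols
        # A leading "?" marks a frame the unwinder could not verify.
        "reliable": m.group("uncertain") is None,
    }


rip_line = "IP: [<ffffffffa01ed3c3>] send_mds_reconnect+0x272/0x592 [ceph]"
frame = parse_frame(rip_line)
print(frame["symbol"], hex(frame["offset"]), frame["module"])
```

With a matching kernel build that has debug info, the resulting symbol and offset can then be mapped to a source line using a debugger such as gdb.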

#2 Updated by Zheng Yan about 9 years ago

  • Status changed from New to 12

It's already fixed by commit 00bd8edb86 (ceph: fix null pointer dereference in discard_cap_releases()). This fix is included in the 3.15 kernel.

#3 Updated by Zheng Yan about 9 years ago

  • Status changed from 12 to Resolved
