Bug #23706

closed

NULL sock gets passed to ceph_tcp_sendmsg()

Added by Bertrand Gouny about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: libceph
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

We noticed some server crashes with logs like this:

<1>[48385.115715] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
<1>[48385.124186] IP: selinux_socket_sendmsg+0x5/0x20
<6>[48385.129060] PGD 0 P4D 0 
<4>[48385.131921] Oops: 0000 [#1] SMP PTI
<4>[48385.135745] Modules linked in: xfs cbc xt_statistic xt_physdev xt_nat ipt_REJECT nf_reject_ipv4 xt_addrtype xt_comment xt_mark br_netfilter veth tun bridge stp llc nf_conntrack_netlink xfrm_user xfrm_algo ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat overlay nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent xt_conntrack iptable_filter nls_ascii nls_cp437 vfat fat ipmi_ssif sb_edac edac_core coretemp x86_pkg_temp_thermal kvm_intel ipmi_si mei_me ipmi_devintf kvm irqbypass mousedev evdev mei i2c_i801 ipmi_msghandler sch_fq_codel button xt_set ip_set nfnetlink ip6_tables ip_vs nf_conntrack ceph libceph libcrc32c crc32c_generic fscache hid_generic usbhid hid ext4 crc16 mbcache jbd2 fscrypto dm_verity dm_bufio sd_mod crc32c_intel igb ahci libahci ehci_pci i2c_algo_bit aesni_intel ehci_hcd
<4>[48385.209746]  aes_x86_64 libata i2c_core crypto_simd hwmon cryptd usbcore ptp scsi_mod glue_helper pps_core usb_common dm_mirror dm_region_hash dm_log dm_mod dax
<4>[48385.225000] CPU: 0 PID: 22717 Comm: kworker/0:0 Not tainted 4.14.19-coreos #1
<4>[48385.232461] Hardware name: Supermicro SYS-5038MD-H24TRF-OS012/X10SDE-DF, BIOS 1.3 01/05/2018
<4>[48385.241541] Workqueue: ceph-msgr ceph_msg_new [libceph]
<4>[48385.247107] task: ffff9db493603c80 task.stack: ffffbc96ccbf0000
<4>[48385.253366] RIP: 0010:selinux_socket_sendmsg+0x5/0x20
<4>[48385.258770] RSP: 0018:ffffbc96ccbf3d48 EFLAGS: 00010206
<4>[48385.264319] RAX: 000000000000c040 RBX: ffffffff96e66388 RCX: 0000000000000003
<4>[48385.271780] RDX: 0000000000000113 RSI: ffffbc96ccbf3de8 RDI: 0000000000000000
<4>[48385.279242] RBP: 0000000000000113 R08: 0000000000000113 R09: 0000000000000003
<4>[48385.286737] R10: ffffde87dc369900 R11: ffff9db49ec03200 R12: ffffbc96ccbf3de8
<4>[48385.294191] R13: 0000000000000000 R14: ffff9db22de63830 R15: ffff9db22de63830
<4>[48385.301670] FS:  0000000000000000(0000) GS:ffff9db49f200000(0000) knlGS:0000000000000000
<4>[48385.310390] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[48385.316490] CR2: 0000000000000020 CR3: 000000048200a004 CR4: 00000000003606f0
<4>[48385.323962] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[48385.331424] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[48385.338911] Call Trace:
<4>[48385.341680]  security_socket_sendmsg+0x3c/0x50
<4>[48385.346541]  sock_sendmsg+0x15/0x40
<4>[48385.350388]  ceph_msg_new+0x108e/0x2470 [libceph]
<4>[48385.355451]  ? pick_next_task_fair+0x469/0x580
<4>[48385.360253]  ? __switch_to+0xa8/0x460
<4>[48385.364255]  process_one_work+0x144/0x350
<4>[48385.368605]  worker_thread+0x4d/0x3e0
<4>[48385.372608]  kthread+0xfc/0x130
<4>[48385.376108]  ? rescuer_thread+0x310/0x310
<4>[48385.380459]  ? kthread_park+0x60/0x60
<4>[48385.384472]  ? do_syscall_64+0x66/0x1d0
<4>[48385.388650]  ? SyS_exit_group+0x10/0x10
<4>[48385.392843]  ret_from_fork+0x35/0x40
<4>[48385.396761] Code: 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 7f 20 be 02 00 00 00 e9 bd fe ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <48> 8b 7f 20 be 04 00 00 00 e9 9d fe ff ff 0f 1f 00 66 2e 0f 1f 
<1>[48385.416505] RIP: selinux_socket_sendmsg+0x5/0x20 RSP: ffffbc96ccbf3d48
<4>[48385.423374] CR2: 0000000000000020
<4>[48385.427044] ---[ end trace f37cf512d0bc4b2f ]---
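
For anyone reading the oops above, here is a minimal sketch of how the faulting instruction lines up with the NULL pointer in the title. It decodes the four bytes marked `<48> 8b 7f 20` in the Code: line and checks that the resulting address matches CR2. The helper name `decode_mov_load_disp8` is made up for this note and handles only this one instruction form. Reading the load at offset 0x20 as `sock->sk` is an assumption based on the bug title and the usual x86_64 layout of `struct socket` in this kernel series, not something the log states.

```python
# Values copied from the oops above.
CODE_AT_RIP = bytes.fromhex("488b7f20")  # the bytes starting at <48> in the Code: line
RDI = 0x0000000000000000                 # first argument register at the time of the fault
CR2 = 0x0000000000000020                 # faulting address reported by the CPU

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load_disp8(code):
    """Decode only 'REX.W 8B /r' with an 8-bit displacement (hypothetical helper)."""
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a 64-bit 'mov r64, r/m64' load")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    if disp >= 0x80:  # sign-extend the 8-bit displacement
        disp -= 0x100
    return REG64[reg], REG64[rm], disp


dst, base, disp = decode_mov_load_disp8(CODE_AT_RIP)
fault_address = (RDI + disp) & 0xFFFFFFFFFFFFFFFF

# AT&T syntax, as objdump would print it.
print(f"mov {disp:#x}(%{base}),%{dst}")
print(f"computed fault address: {fault_address:#x}, CR2: {CR2:#x}")
```

The instruction loads from `0x20(%rdi)` with RDI = 0, which gives exactly the CR2 value 0x20. That is consistent with selinux_socket_sendmsg() receiving a NULL socket pointer as its first argument and faulting on its first field access.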

Not sure if this is related, but we also see a lot of:

<6>[33859.884045] libceph: osd2 up
<6>[34200.599876] libceph: osd1 down
<6>[34205.260202] libceph: osd0 down
<6>[34205.263644] libceph: osd1 up
<6>[34211.300661] libceph: osd0 up
<6>[34413.401422] libceph: osd1 down
<6>[34413.404841] libceph: osd0 down
<6>[34413.408291] libceph: osd1 up
<6>[34413.411539] libceph: osd0 up
<6>[34822.730351] libceph: osd2 down
<6>[34822.733804] libceph: osd2 up
<6>[35058.407999] libceph: osd2 down
<6>[35058.411415] libceph: osd2 up
<6>[35705.727712] libceph: osd1 down
<6>[35709.489983] libceph: osd1 up
<6>[35712.425469] libceph: osd0 down
<6>[35712.428871] libceph: osd0 up
<6>[35724.584535] libceph: osd1 down
...

and

Apr 13 08:53:07 black-mirror-osixia-cluster kernel: libceph: mon2 10.244.73.3:6789 session lost, hunting for new mon
Apr 13 08:53:07 black-mirror-osixia-cluster kernel: libceph: mon1 10.244.46.3:6789 session established
Apr 13 08:53:16 black-mirror-osixia-cluster kernel: libceph: mon2 10.244.73.3:6789 session lost, hunting for new mon
Apr 13 08:53:16 black-mirror-osixia-cluster kernel: libceph: mon0 10.244.43.12:6789 session established
Apr 13 08:53:37 black-mirror-osixia-cluster kernel: libceph: mon1 10.244.46.3:6789 session lost, hunting for new mon
Apr 13 08:53:37 black-mirror-osixia-cluster kernel: libceph: mon2 10.244.73.3:6789 session established
Apr 13 08:53:47 black-mirror-osixia-cluster kernel: libceph: mon0 10.244.43.12:6789 session lost, hunting for new mon
Apr 13 08:53:47 black-mirror-osixia-cluster kernel: libceph: mon1 10.244.46.3:6789 session established

Module info

filename:       /lib/modules/4.14.32-coreos/kernel/net/ceph/libceph.ko
license:        GPL
description:    Ceph core library
author:         Patience Warnick <patience@newdream.net>
author:         Yehuda Sadeh <yehuda@hq.newdream.net>
author:         Sage Weil <sage@newdream.net>
depends:        libcrc32c
retpoline:      Y
intree:         Y
name:           libceph
vermagic:       4.14.32-coreos SMP mod_unload


Files

vmcore-dmesg.txt (141 KB), Yong Wang, 04/25/2018 06:13 AM