Bug #55818

closed

kernel crash when trying to call sendpage on bogus page pointer

Added by Jeff Layton almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

In a teuthology run today, I hit crashes on two machines. Logs are here:

http://qa-proxy.ceph.com/teuthology/jlayton-2022-06-01_13:00:23-fs-wip-jlayton-qa-custom-kernel-wip-fscrypt-default-smithi/6858411/console_logs/smithi055.log
http://qa-proxy.ceph.com/teuthology/jlayton-2022-06-01_13:00:23-fs-wip-jlayton-qa-custom-kernel-wip-fscrypt-default-smithi/6858408/console_logs/smithi005.log

One machine was using msgr2 and the other msgr1, but I think both crashed while calling sendpage_ok on a bogus page pointer:

[ 2004.531981] BUG: unable to handle page fault for address: 0000000000400008
[ 2004.538949] #PF: supervisor read access in kernel mode
[ 2004.544166] #PF: error_code(0x0000) - not-present page
[ 2004.549386] PGD 0 P4D 0 
[ 2004.552003] Oops: 0000 [#1] PREEMPT SMP PTI
[ 2004.556275] CPU: 6 PID: 99368 Comm: kworker/6:0 Tainted: G S                5.18.0-ceph-geada6106e6cb #1
[ 2004.565862] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
[ 2004.573449] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[ 2004.579042] RIP: 0010:do_try_sendpage+0xc2/0x1c0 [libceph]
[ 2004.584623] Code: 8b 4c 24 18 49 8b 7c 24 08 8b 51 0c 48 8b 31 01 fa 48 89 34 24 89 54 24 0c 8b 49 08 48 29 f9 48 39 c1 48 0f 47 c8 89 4c 24 08 <48> 8b 46 08 a8 01 0f 85 a8 00 00 00 66 90 48 89 f0 48 8b 00 89 c9
[ 2004.603516] RSP: 0018:ffffc9000cd5bcf0 EFLAGS: 00010246
[ 2004.608821] RAX: 0000000000001000 RBX: ffff88815382a100 RCX: 0000000000001000
[ 2004.616037] RDX: 0000000000000000 RSI: 0000000000400000 RDI: 0000000000000000
[ 2004.623254] RBP: ffffc9000cd5bd00 R08: 0000000000001000 R09: 0000000000000000
[ 2004.630474] R10: 0000000000000000 R11: ffff8883a597cff0 R12: ffff888159eee348
[ 2004.637695] R13: ffff888159eee348 R14: ffffffffa08b04c8 R15: 0000000000000001
[ 2004.644913] FS:  0000000000000000(0000) GS:ffff88885fd80000(0000) knlGS:0000000000000000
[ 2004.653105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2004.658932] CR2: 0000000000400008 CR3: 00000001068c4001 CR4: 00000000001706e0
[ 2004.666146] Call Trace:
[ 2004.668677]  <TASK>
[ 2004.670861]  ceph_con_v2_try_write+0x73/0x4c0 [libceph]
[ 2004.676180]  ceph_con_workfn+0x2c0/0x6e0 [libceph]
[ 2004.681058]  process_one_work+0x240/0x5a0
[ 2004.685148]  worker_thread+0x3c/0x370
[ 2004.688890]  ? process_one_work+0x5a0/0x5a0
[ 2004.693155]  kthread+0xf2/0x120
[ 2004.696377]  ? kthread_complete_and_exit+0x20/0x20
[ 2004.701250]  ret_from_fork+0x1f/0x30
[ 2004.704909]  </TASK>
[ 2004.707173] Modules linked in: ceph libceph dns_resolver fscache netfs veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink bridge stp llc xfs libcrc32c sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass joydev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_i801 i2c_smbus lpc_ich mfd_core mei_me mei ioatdma wmi ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ixgbe igb mdio crc32c_intel ptp i2c_algo_bit nvme pps_core nvme_core dca fuse

Entering kdb (current=0xffff88811a552a40, pid 99368) on processor 6 Oops: (null)
due to oops @ 0xffffffffa08a4782
CPU: 6 PID: 99368 Comm: kworker/6:0 Tainted: G S                5.18.0-ceph-geada6106e6cb #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
Workqueue: ceph-msgr ceph_con_workfn [libceph]
RIP: 0010:do_try_sendpage+0xc2/0x1c0 [libceph]
Code: 8b 4c 24 18 49 8b 7c 24 08 8b 51 0c 48 8b 31 01 fa 48 89 34 24 89 54 24 0c 8b 49 08 48 29 f9 48 39 c1 48 0f 47 c8 89 4c 24 08 <48> 8b 46 08 a8 01 0f 85 a8 00 00 00 66 90 48 89 f0 48 8b 00 89 c9
RSP: 0018:ffffc9000cd5bcf0 EFLAGS: 00010246
RAX: 0000000000001000 RBX: ffff88815382a100 RCX: 0000000000001000
RDX: 0000000000000000 RSI: 0000000000400000 RDI: 0000000000000000
RBP: ffffc9000cd5bd00 R08: 0000000000001000 R09: 0000000000000000
R10: 0000000000000000 R11: ffff8883a597cff0 R12: ffff888159eee348
R13: ffff888159eee348 R14: ffffffffa08b04c8 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88885fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000400008 CR3: 00000001068c4001 CR4: 00000000001706e0
Call Trace:
 <TASK>
 ceph_con_v2_try_write+0x73/0x4c0 [libceph]

This almost certainly comes from writepages, so we may be looking at an array overrun or some sort of memory scribble.
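
As a sanity check on the "bogus pointer" reading, the faulting instruction can be decoded by hand from the Code: line. The sketch below is not part of the original report. It decodes only the four bytes starting at the marked byte (<48> 8b 46 08) and compares the resulting effective address with CR2, using the RSI and CR2 values from the oops. The helper name decode_mov_load_disp8 is made up for illustration, and it handles only this one instruction form.

```python
# Values copied from the oops above.
CODE_AT_RIP = bytes.fromhex("488b4608")  # the bytes starting at <48> in the Code: line
RSI = 0x0000000000400000
CR2 = 0x0000000000400008

# 64-bit register names indexed by their 3-bit encoding (no REX.R/REX.B extension).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load_disp8(code):
    """Decode the narrow case 'REX.W 8B /r' with mod=01 (disp8) and no SIB byte.

    Hypothetical helper for this one instruction, not a general disassembler.
    Returns (destination register, base register, signed displacement).
    """
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] without SIB is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG64[reg], REG64[rm], disp


dst, base, disp = decode_mov_load_disp8(CODE_AT_RIP)
print(f"mov {disp:#x}(%{base}),%{dst}")

# The load reads from base + disp; with base = RSI this should match CR2.
fault_addr = RSI + disp
print(hex(fault_addr), fault_addr == CR2)
```

The decoded load is an 8-byte read at offset 8 from RSI, and RSI + 8 equals the faulting address in CR2. RSI holds 0x400000, which is not a kernel address, so whatever was being dereferenced was not a valid struct page pointer. The next bytes in the Code: line (a8 01) test bit 0 of the loaded value, which would fit a compound-page head check of the kind sendpage_ok performs, but that mapping to source is an assumption here and has not been checked against this kernel build.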
