Project

General

Profile

Actions

Bug #53819

closed

Kernel null pointer derefecence during kernel mount fsync on Linux 5.15

Added by Niklas Hambuechen over 2 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
fs/ceph
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
kcephfs
Crash signature (v1):
Crash signature (v2):

Description

Extracted from https://tracker.ceph.com/issues/53809#note-8:

After upgrading from a 5.10 kernel to 5.15 kernel today, it crashed all 3 server/OSD nodes of my Ceph cluster simulatneously with a kernel null pointer dereference:

Jan 10 15:23:46 node-4 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Jan 10 15:23:46 node-4 kernel: #PF: supervisor read access in kernel mode
Jan 10 15:23:46 node-4 kernel: #PF: error_code(0x0000) - not-present page

I suspect that the Crash is in the Ceph kernel module, because these machines mainly run Ceph, and there wouldn't be much else that would be able to synchronise those machines to crash at exactly the same time.

2 more nodes that are Ceph clients (not servers) were also upgraded to the 5.15 kernel. Those did not crash.

The Ceph servers and OSDs run v16.2.7.

More evidence that it's related to Ceph, since there's fsync in the crash trace:

Call Trace:
 <TASK>
 ? __fget_files+0x97/0xc0
 __x64_sys_fsync+0x34/0x60
 do_syscall_64...

The crash appeared approximately 13:12 hours after the 5.15 kernel booted and mounted the CephFS.

This is a regression, since the 5.10 kernel did not crash.


Files

Actions

Also available in: Atom PDF