Bug #57986

ceph: ceph_fl_release_lock causes "unable to handle kernel paging request at ffffffffffffff34"

Added by Xiubo Li 3 months ago. Updated 19 days ago.

Status: Resolved
Priority: Normal
% Done: 0%
Regression: No
Severity: 3 - minor

Description

 #0 [ffff95202f33b960] machine_kexec at ffffffff890662f4
 #1 [ffff95202f33b9c0] __crash_kexec at ffffffff89122b82
 #2 [ffff95202f33ba90] crash_kexec at ffffffff89122c70
 #3 [ffff95202f33baa8] oops_end at ffffffff89791798
 #4 [ffff95202f33bad0] no_context at ffffffff89075d14
 #5 [ffff95202f33bb20] __bad_area_nosemaphore at ffffffff89075fe2
 #6 [ffff95202f33bb70] bad_area_nosemaphore at ffffffff89076104
 #7 [ffff95202f33bb80] __do_page_fault at ffffffff89794750
 #8 [ffff95202f33bbf0] do_page_fault at ffffffff89794975
 #9 [ffff95202f33bc20] page_fault at ffffffff89790778
    [exception RIP: ceph_fl_release_lock+20]
    RIP: ffffffffc08247a4  RSP: ffff95202f33bcd0  RFLAGS: 00010286
    RAX: ffff952d4ebd8a00  RBX: 0000000000000000  RCX: dead000000000200
    RDX: ffff95202f33bd60  RSI: ffff95202f33bd60  RDI: ffff9526b6ac5b00
    RBP: ffff95202f33bce0   R8: ffff9526b6ac5b18   R9: ffffffffc083c368
    R10: 0000000000001109  R11: 0000000000000000  R12: ffff95202f33bd60
    R13: ffff9526b6ac5b00  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff95202f33bce8] locks_release_private at ffffffff892ab3d7
#11 [ffff95202f33bd00] locks_free_lock at ffffffff892ac34d
#12 [ffff95202f33bd18] locks_dispose_list at ffffffff892ac44b
#13 [ffff95202f33bd40] __posix_lock_file at ffffffff892acdfa
#14 [ffff95202f33bda8] posix_lock_file at ffffffff892ad146
#15 [ffff95202f33bdb8] ceph_lock at ffffffffc0824e8a [ceph]
#16 [ffff95202f33bdf8] vfs_lock_file at ffffffff892ad185
#17 [ffff95202f33be08] locks_remove_posix at ffffffff892ad239
#18 [ffff95202f33bee0] locks_remove_posix at ffffffff892ad2a0
#19 [ffff95202f33bef0] filp_close at ffffffff8924baa6
#20 [ffff95202f33bf18] __close_fd at ffffffff8926f89c
#21 [ffff95202f33bf40] sys_close at ffffffff8924d503
#22 [ffff95202f33bf50] system_call_fastpath at ffffffff89799f92
    RIP: 00007f806ec446ab  RSP: 00007f80517f0d90  RFLAGS: 00010206
    RAX: 0000000000000003  RBX: 00007f8030001a20  RCX: 00007f80300386b0
    RDX: 00007f806ef0d880  RSI: 0000000000000001  RDI: 0000000000000006
    RBP: 00007f806ef0e3c0   R8: 00007f80517fa700   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000206  R12: 0000000000000000
    R13: 00007f80300035b0  R14: 00007f80517f1104  R15: 000000000000006c
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b

History

#1 Updated by Xiubo Li 3 months ago

  • Description updated

#2 Updated by Xiubo Li 3 months ago

There appears to be a race in 'filp_close()'. For example, a single process opens the same file twice, getting two different file descriptors used by two different threads: filpA for userspace ThreadA and filpB for userspace ThreadB. Both ThreadA and ThreadB set POSIX locks on the file, and then both close their descriptors at the same time:

Userspace ThreadA:                                      Userspace ThreadB:
 filp_close():                                           filp_close():
   ->locks_remove_posix(filpA):                            ->locks_remove_posix(filpB):
     ->vfs_lock_file():                                      ->vfs_lock_file():
       ->ceph_lock():                                          ->ceph_lock():
         ->posix_lock_file(): 
           ->__posix_lock_file():
             ->Iterate and remove all of the inode's
               POSIX locks with the same owner.
               All of them share one owner,
               current->files, so this also
               removes ThreadB's POSIX lock.
                                                                 ->posix_lock_file():
                                                                   ->__posix_lock_file():
                                                                     ->Will do nothing since there
                                                                       is no POSIX lock left on the
                                                                       inode
                                                                   ->locks_dispose_list():
                                                                     ->Do nothing too.
                                                           ->fput(filpB):
                                                             ->__fput(filpB):
                                                               ->file->f_path.dentry = NULL;
                                                               ->file->f_inode = NULL;
             ->locks_dispose_list():
               ->locks_free_lock():
                 ->locks_release_private():
                   ->Release both ThreadA's and ThreadB's
                     POSIX locks. Releasing ThreadB's
                     lock accesses filpB, which __fput()
                     has already torn down, so it crashes.
   ->fput(filpA)
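To make the interleaving concrete, the single-threaded C model below replays it step by step in program order. It is only a sketch: `model_file`, `model_lock` and the `model_*` functions are hypothetical stand-ins invented for illustration, not the kernel's `struct file`, `struct file_lock` or fs/locks.c code, and a `put` flag plays the role of the freed file. It assumes, as the diagram does, that a lock does not keep its file alive. The point it shows is that ThreadA's dispose list still holds ThreadB's lock after ThreadB has already put filpB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: only what the race touches. */
struct model_file {
    bool put;              /* set once model_fput() has run */
    const void *f_inode;   /* cleared by model_fput(), like __fput() */
};

/* Hypothetical stand-in for struct file_lock. */
struct model_lock {
    const void *owner;         /* lock owner: current->files in the comment */
    struct model_file *filp;   /* file the lock was taken through; no reference held */
    bool on_inode;             /* still on the inode's lock list? */
};

/* Like __posix_lock_file() unlocking the whole file: unlink every lock
 * held by @owner and collect it on @dispose. Returns how many were moved. */
static size_t model_remove_owner_locks(struct model_lock *locks, size_t n,
                                       const void *owner,
                                       struct model_lock **dispose)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (locks[i].on_inode && locks[i].owner == owner) {
            locks[i].on_inode = false;
            dispose[count++] = &locks[i];
        }
    }
    return count;
}

/* Like locks_dispose_list() -> locks_release_private() reaching
 * ceph_fl_release_lock(), which in this model reads the lock's file.
 * Counts locks whose file was already put (a use-after-free in the kernel). */
static size_t model_dispose(struct model_lock **dispose, size_t count)
{
    size_t stale = 0;
    for (size_t i = 0; i < count; i++) {
        const struct model_file *f = dispose[i]->filp;
        if (f->put || f->f_inode == NULL)
            stale++;
    }
    return stale;
}

/* Like __fput(): tears the file down. */
static void model_fput(struct model_file *f)
{
    f->put = true;
    f->f_inode = NULL;
}

/* Replays the ThreadA/ThreadB interleaving and returns how many stale
 * files ThreadA's dispose step would touch. */
static size_t model_replay_race(void)
{
    static int inode, files;   /* addresses used only as identities */
    struct model_file filpA = { false, &inode };
    struct model_file filpB = { false, &inode };
    struct model_lock locks[2] = {
        { &files, &filpA, true },   /* ThreadA's lock, taken via filpA */
        { &files, &filpB, true },   /* ThreadB's lock, taken via filpB */
    };
    struct model_lock *disposeA[2], *disposeB[2];

    /* ThreadA: same owner, so both locks land on ThreadA's dispose list. */
    size_t nA = model_remove_owner_locks(locks, 2, &files, disposeA);
    /* ThreadB: nothing left on the inode, so nothing to dispose. */
    size_t nB = model_remove_owner_locks(locks, 2, &files, disposeB);
    assert(nA == 2 && nB == 0);
    (void)model_dispose(disposeB, nB);
    /* ThreadB: fput(filpB) -> __fput(filpB). */
    model_fput(&filpB);
    /* ThreadA: locks_dispose_list() finally runs and reaches filpB. */
    return model_dispose(disposeA, nA);
}
```

Calling `model_replay_race()` returns 1: of the two locks ThreadA disposes of, exactly one (ThreadB's) refers to a file that has already been put.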

In kernel space ThreadA and ThreadB share the same file descriptor table, so, if my understanding is correct, both POSIX locks have the same owner: current->files.
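The ownership point can be checked from userspace with ordinary fcntl() record locks, independent of Ceph. POSIX locks belong to the process, so closing any descriptor for a file releases every lock the process holds on that file, including one taken through a different descriptor. The sketch below assumes a Linux or other POSIX system and a writable local scratch path; `other_process_sees_lock()` and `lock_survives_closing_other_fd()` are hypothetical helper names. A forked child is used as the observer because it is a different lock owner, so F_GETLK reports the parent's lock as a conflict.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child (a different lock owner) and asks, with F_GETLK, whether a
 * write lock on the whole file would conflict with an existing lock.
 * Returns 1 if a conflicting lock exists, 0 if not, -1 on error. */
static int other_process_sees_lock(int fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = { 0 };
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;   /* l_start = 0, l_len = 0: whole file */
        if (fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code <= 1 ? code : -1;
}

/* Opens @path twice, locks through the first descriptor, then closes only
 * the second one. Returns 1 if the lock survived, 0 if it was dropped,
 * -1 on error. */
static int lock_survives_closing_other_fd(const char *path)
{
    assert(path != NULL);
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    if (fd1 < 0)
        return -1;
    int fd2 = open(path, O_RDWR);   /* second descriptor, same file */
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }

    struct flock fl = { 0 };
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;         /* l_start = 0, l_len = 0: whole file */
    int result = -1;
    if (fcntl(fd1, F_SETLK, &fl) == 0 && other_process_sees_lock(fd1) == 1) {
        close(fd2);                 /* close only the *other* descriptor */
        fd2 = -1;
        result = other_process_sees_lock(fd1);
    }
    if (fd2 >= 0)
        close(fd2);
    close(fd1);
    return result;
}
```

On a local filesystem `lock_survives_closing_other_fd()` returns 0: the lock taken through fd1 is gone once fd2 is closed. Threads of one process share a single descriptor table, so the same holds when two threads each lock through their own descriptor: whichever thread closes first drops both locks, which is the situation in the interleaving above.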

#3 Updated by Xiubo Li 2 months ago

  • Status changed from In Progress to Fix Under Review

#4 Updated by Xiubo Li 19 days ago

  • Status changed from Fix Under Review to Resolved

Applied to mainline. Closing this tracker.
