Bug #52997

rhel: hanging umount

Added by Patrick Donnelly over 1 year ago. Updated about 1 year ago.

Severity: 3 - minor

2021-10-19T06:01:20.792209+00:00 smithi018 kernel: INFO: task umount:50325 blocked for more than 120 seconds.
2021-10-19T06:01:20.792336+00:00 smithi018 kernel:      Not tainted 4.18.0-305.el8.x86_64 #1
2021-10-19T06:01:20.792364+00:00 smithi018 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-10-19T06:01:20.800384+00:00 smithi018 kernel: umount          D    0 50325  50309 0x00004080
2021-10-19T06:01:20.806235+00:00 smithi018 kernel: Call Trace:
2021-10-19T06:01:20.809055+00:00 smithi018 kernel: __schedule+0x2c4/0x700
2021-10-19T06:01:20.812929+00:00 smithi018 kernel: schedule+0x38/0xa0
2021-10-19T06:01:20.816441+00:00 smithi018 kernel: ceph_mdsc_sync+0x2f1/0x350 [ceph]
2021-10-19T06:01:20.821265+00:00 smithi018 kernel: ? finish_wait+0x80/0x80
2021-10-19T06:01:20.825208+00:00 smithi018 kernel: ceph_sync_fs+0x2f/0xb0 [ceph]
2021-10-19T06:01:20.833782+00:00 smithi018 kernel: sync_filesystem+0x71/0x90
2021-10-19T06:01:20.833831+00:00 smithi018 kernel: generic_shutdown_super+0x22/0x100
2021-10-19T06:01:20.838599+00:00 smithi018 kernel: kill_anon_super+0x14/0x30
2021-10-19T06:01:20.842716+00:00 smithi018 kernel: ceph_kill_sb+0x39/0x70 [ceph]
2021-10-19T06:01:20.847197+00:00 smithi018 kernel: deactivate_locked_super+0x34/0x70
2021-10-19T06:01:20.852010+00:00 smithi018 kernel: cleanup_mnt+0x3b/0x70
2021-10-19T06:01:20.855748+00:00 smithi018 kernel: task_work_run+0x8a/0xb0
2021-10-19T06:01:20.859661+00:00 smithi018 kernel: exit_to_usermode_loop+0xeb/0xf0
2021-10-19T06:01:20.864281+00:00 smithi018 kernel: do_syscall_64+0x198/0x1a0
2021-10-19T06:01:20.868376+00:00 smithi018 kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
2021-10-19T06:01:20.873763+00:00 smithi018 kernel: RIP: 0033:0x7fb6d38bbdfb
2021-10-19T06:01:20.884568+00:00 smithi018 kernel: Code: Unable to access opcode bytes at RIP 0x7fb6d38bbdd1.
2021-10-19T06:01:20.884617+00:00 smithi018 kernel: RSP: 002b:00007ffdd1555918 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
2021-10-19T06:01:20.892491+00:00 smithi018 kernel: RAX: 0000000000000000 RBX: 0000558ed24fd5d0 RCX: 00007fb6d38bbdfb
2021-10-19T06:01:20.899980+00:00 smithi018 kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000558ed24fd7b0
2021-10-19T06:01:20.907467+00:00 smithi018 kernel: RBP: 0000000000000000 R08: 0000558ed24fd7e0 R09: 00007fb6d393f580
2021-10-19T06:01:20.914960+00:00 smithi018 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000558ed24fd7b0
2021-10-19T06:01:20.922456+00:00 smithi018 kernel: R13: 00007fb6d4669184 R14: 0000000000000000 R15: 00000000ffffffff
2021-10-19T06:02:09.066934+00:00 smithi018 kernel: libceph: mds0 (2) socket closed (con state OPEN)
2021-10-19T06:02:09.067075+00:00 smithi018 kernel: libceph: mds1 (2) socket closed (con state OPEN)
2021-10-19T06:02:09.090515+00:00 smithi018 kernel: libceph: mds3 (2) socket closed (con state OPEN)
2021-10-19T06:02:09.090645+00:00 smithi018 kernel: libceph: mds2 (2) socket closed (con state OPEN)
2021-10-19T06:02:09.178612+00:00 smithi018 kernel: libceph: mds4 (2) socket closed (con state OPEN)
2021-10-19T06:03:23.672205+00:00 smithi018 kernel: INFO: task umount:50325 blocked for more than 120 seconds.
2021-10-19T06:03:23.672336+00:00 smithi018 kernel:      Not tainted 4.18.0-305.el8.x86_64 #1
2021-10-19T06:03:23.672378+00:00 smithi018 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2021-10-19T06:03:23.686280+00:00 smithi018 kernel: umount          D    0 50325  50309 0x00004080

From: /ceph/teuthology-archive/pdonnell-2021-10-19_04:32:14-fs-wip-pdonnell-testing-20211019.013028-distro-basic-smithi/6450404/remote/smithi018/syslog/kern.log.gz

Related issues

Related to Linux kernel client - Bug #51279: kclient hangs on umount (testing branch) Resolved


#1 Updated by Jeff Layton over 1 year ago

I took a look at the log and it doesn't tell us much, unfortunately. The task attempting to umount is hung waiting for caps to be flushed:

(gdb) list *(ceph_mdsc_sync+0x2f1)
0x2d591 is in ceph_mdsc_sync (fs/ceph/mds_client.c:2040).

static void wait_caps_flush(struct ceph_mds_client *mdsc,
                            u64 want_flush_tid)
{
        dout("check_caps_flush want %llu\n", want_flush_tid);

        wait_event(mdsc->cap_flushing_wq,
                   check_caps_flush(mdsc, want_flush_tid));      <<<< STUCK HERE

        dout("check_caps_flush ok, flushed thru %llu\n", want_flush_tid);
}

Unfortunately, we don't know anything about the state of the caps being flushed or why they're not making progress.

Note too that this is a RHEL 8.4 kernel, and there is a pile of fixes in the 8.5 kernel that may be related.
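Roughly, the wait in the gdb listing above sleeps on a wait queue until check_caps_flush() reports that no cap flush with a transaction ID (tid) at or below want_flush_tid is still outstanding. wait_event() sleeps in TASK_UNINTERRUPTIBLE with no timeout, which matches the "D" state of umount in the log. Below is a minimal user-space sketch of that wait pattern. It assumes a simplified model: tids are small integers handed out sequentially and acknowledged, possibly out of order, by a simulated MDS. All names (struct flush_tracker, start_flush(), ack_flush(), wait_flush(), MAX_TIDS) are hypothetical and are not the kernel's data structures. Unlike the kernel, the wait has a timeout so the sketch cannot hang.

```c
/* Hypothetical user-space model of the wait in wait_caps_flush(); not kernel code. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MAX_TIDS 64 /* hypothetical cap on flush tids tracked by the model */

struct flush_tracker {
    pthread_mutex_t lock;
    pthread_cond_t cond;          /* plays the role of mdsc->cap_flushing_wq */
    uint64_t next_tid;            /* tids are handed out starting at 1 */
    bool pending[MAX_TIDS + 1];   /* pending[tid] stays true until acked */
};

void tracker_init(struct flush_tracker *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->next_tid = 1;
    for (int i = 0; i <= MAX_TIDS; i++)
        t->pending[i] = false;
}

/* Begin a flush and return its tid, or 0 if the model has run out of tids. */
uint64_t start_flush(struct flush_tracker *t)
{
    uint64_t tid = 0;
    pthread_mutex_lock(&t->lock);
    if (t->next_tid <= MAX_TIDS) {
        tid = t->next_tid++;
        t->pending[tid] = true;
    }
    pthread_mutex_unlock(&t->lock);
    return tid;
}

/* Record the (simulated) MDS acknowledgment for tid and wake all waiters. */
void ack_flush(struct flush_tracker *t, uint64_t tid)
{
    assert(tid >= 1 && tid <= MAX_TIDS);
    pthread_mutex_lock(&t->lock);
    t->pending[tid] = false;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/* Rough analogue of check_caps_flush(): true once no flush with
 * tid <= want is still pending. Caller must hold t->lock. */
static bool flushed_thru(const struct flush_tracker *t, uint64_t want)
{
    for (uint64_t tid = 1; tid <= want && tid <= MAX_TIDS; tid++)
        if (t->pending[tid])
            return false;
    return true;
}

/* Rough analogue of wait_caps_flush(), except that it gives up after
 * timeout_ms. Returns 0 once flushed through want, else ETIMEDOUT. */
int wait_flush(struct flush_tracker *t, uint64_t want, int timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&t->lock);
    /* Loop to tolerate spurious wakeups; stop on timeout or any error. */
    while (!flushed_thru(t, want) && rc == 0)
        rc = pthread_cond_timedwait(&t->cond, &t->lock, &deadline);
    bool done = flushed_thru(t, want);
    pthread_mutex_unlock(&t->lock);
    return done ? 0 : ETIMEDOUT;
}
```

In this model, acknowledging tid 2 while tid 1 is still outstanding is not enough for wait_flush(&t, 2, ...) to succeed. It returns 0 only after tid 1 is acknowledged too. If an older acknowledgment never arrives, the kernel's untimed wait_event() blocks indefinitely. That would be consistent with the repeated hung-task reports, although, as noted above, the log does not show which flush is stuck.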

#2 Updated by Jeff Layton over 1 year ago

I suspect that this is another manifestation of #51279, but I don't see any evidence of blocklisting in the logs. It may still be related, though.

#3 Updated by Patrick Donnelly over 1 year ago

  • Related to Bug #51279: kclient hangs on umount (testing branch) added

#4 Updated by Patrick Donnelly over 1 year ago

  • Subject changed from testing: hang ing umount to rhel: hanging umount

#5 Updated by Jeff Layton about 1 year ago

  • Status changed from New to Duplicate

Closing as a dup of #51279. Please reopen if you see it on kernels with the patches for that.
