Bug #63710 (closed)

client.5394 isn't responding to mclientcaps(revoke), ino 0x10000000001 pending pAsLsXs issued pAsLsXsFs, sent 30723.964282 seconds ago

Added by Xiubo Li 5 months ago. Updated 4 months ago.

Status: Duplicate
Priority: Normal
% Done: 0%
Backport: pacific, quincy, reef
Regression: No
Severity: 3 - minor

Description

This time the MDS claims that the Fs cap has not been released by the client, but on the client side, when I stat the corresponding file with kernel debug logs enabled:

[xiubli@ceph kcephfs.B]$ stat .
  File: .
  Size: 20            Blocks: 0          IO Block: 65536  directory
Device: 2dh/45d    Inode: 1099511627777  Links: 22
Access: (0755/drwxr-xr-x)  Uid: ( 1000/  xiubli)   Gid: ( 1000/  xiubli)
Access: 2023-11-30 17:23:51.320089270 +0800
Modify: 2023-11-30 21:24:04.227273296 +0800
Change: 2023-11-30 21:24:04.227273296 +0800
 Birth: 2023-11-30 17:23:51.320089270 +0800
<7>[84024.461142] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] delayed_work: mdsc delayed_work
<7>[84024.461166] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_check_delayed_caps: begin
<7>[84024.461173] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_check_delayed_caps: done
<7>[84024.461183] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_queue_cap_reclaim_work: caps reclaim work queued
<7>[84025.491515] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 4256] delayed_work: mdsc delayed_work
<7>[84025.491552] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 4256] ceph_check_delayed_caps: begin
<7>[84025.491560] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 4256] ceph_check_delayed_caps: done
<7>[84025.491572] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 4256] ceph_queue_cap_reclaim_work: caps reclaim work queued
<7>[84025.719421] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __ceph_do_getattr: inode 00000000745cb99f 10000000001.fffffffffffffffe mask As mode 040755
<7>[84025.719448] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __ceph_caps_issued_mask: mask 00000000745cb99f 10000000001.fffffffffffffffe cap 00000000ff1f815a issued pAsLsXs (mask As)
<7>[84025.719461] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __touch_cap: 00000000745cb99f 10000000001.fffffffffffffffe cap 00000000ff1f815a mds0
<7>[84025.719476] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __ceph_do_getattr: inode 00000000745cb99f 10000000001.fffffffffffffffe mask AsXsFs mode 040755
<7>[84025.719487] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __ceph_caps_issued: 00000000745cb99f 10000000001.fffffffffffffffe cap 00000000ff1f815a issued pAsLsXs
<7>[84025.719535] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_mdsc_do_request: do_request on 00000000018dbb8a
<7>[84025.719545] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_mdsc_submit_request: submit_request on 00000000018dbb8a for inode 0000000000000000
<7>[84025.719556] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_reserve_caps: ctx=00000000c3e0afdc need=1
<7>[84025.719598] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_reserve_caps: ctx=00000000c3e0afdc 86 = 3 used + 83 resv + 0 avail
<7>[84025.719607] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __register_request: 00000000018dbb8a tid 28
<7>[84025.719619] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __choose_mds: 00000000745cb99f 10000000001.fffffffffffffffe is_hash=0 (0x0) mode 2
<7>[84025.719630] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __choose_mds: 00000000745cb99f 10000000001.fffffffffffffffe mds0 (auth cap 00000000ff1f815a)
<7>[84025.719642] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __do_request: mds0 session 000000006dcba806 state open
<7>[84025.719653] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __prepare_send_request: 00000000018dbb8a tid 28 getattr (attempt 1)
<7>[84025.719663] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] set_request_path_attr:  inode 00000000745cb99f 10000000001.fffffffffffffffe
<7>[84025.719786] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __prepare_send_request:  r_parent = 0000000000000000
<7>[84025.719821] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_mdsc_wait_request: do_request waiting
<7>[84025.721451] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] handle_reply: handle_reply 00000000018dbb8a
<7>[84025.721483] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __unregister_request: 00000000018dbb8a tid 28
<7>[84025.721497] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] handle_reply: tid 28 result 0
<7>[84025.721519] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_get_inode: on 10000000001=10000000001.fffffffffffffffe got 00000000745cb99f new 0
<7>[84025.721539] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_update_snap_trace: deletion=0
<7>[84025.721554] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __lookup_snap_realm: 1 00000000183cd74c
<7>[84025.721568] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_update_snap_trace: 1 00000000183cd74c seq 1 unchanged
<7>[84025.721583] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_update_snap_trace: done with 1 00000000183cd74c, rebuild_snapcs=0, 00000000cdae45ec 00000000cdae45ec
<7>[84025.721601] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_fill_trace: 00000000018dbb8a is_dentry 0 is_target 1
<7>[84025.721617] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_fill_inode: 00000000745cb99f ino 10000000001.fffffffffffffffe v 157290 had 157290
<7>[84025.721636] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_get_cap: ctx=00000000c3e0afdc (1) 86 = 3 used + 83 resv + 0 avail
<7>[84025.721782] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __ceph_caps_issued: 00000000745cb99f 10000000001.fffffffffffffffe cap 00000000ff1f815a issued pAsLsXs
<7>[84025.721804] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_add_cap: 00000000745cb99f 10000000001.fffffffffffffffe mds0 cap 4f30 pAsLsXs seq 22
<7>[84025.721823] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __ceph_caps_issued: 00000000745cb99f 10000000001.fffffffffffffffe cap 00000000ff1f815a issued pAsLsXs
<7>[84025.721841] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_add_cap: inode 00000000745cb99f 10000000001.fffffffffffffffe cap 00000000ff1f815a pAsLsXs now pAsLsXs seq 22 mds0
<7>[84025.721863] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_put_cap: 00000000a3c5a7b4 86 = 4 used + 82 resv + 0 avail
<7>[84025.721951] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_fill_trace: done err=0
<7>[84025.722046] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_mdsc_wait_request: do_request waited, got 0
<7>[84025.722062] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] ceph_mdsc_do_request: do_request 00000000018dbb8a done, result 0
<7>[84025.722177] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __ceph_put_cap_refs: 00000000745cb99f 10000000001.fffffffffffffffe had p
<7>[84025.722210] ceph:  [6e016130-5830-4f25-b08d-dd1693d1b894 5424] __ceph_do_getattr: result=0

From the logs I can see that the client has already released the Fs cap for inode 10000000001; only pAsLsXs is still issued:

10000000001.fffffffffffffffe cap 00000000ff1f815a issued pAsLsXs

So it looks like the MDS missed the release in some corner case.
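For anyone hitting the same warning, a rough way to cross-check the cap state on both sides is sketched below (assuming a kernel client with debugfs mounted; the debugfs caps file and the MDS "dump inode" tell command exist in recent releases, but their exact output formats vary by version):

# Client side: list the caps the kernel client currently holds and look for the inode
# (the directory under /sys/kernel/debug/ceph is named <fsid>.client<id>)
sudo grep 10000000001 /sys/kernel/debug/ceph/*/caps

# MDS side: ask the MDS what it thinks is issued/pending for the same inode
# (1099511627777 is the decimal form of 0x10000000001, matching the stat output above)
ceph tell mds.a dump inode 1099511627777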


Related issues: 1 (0 open, 1 closed)

Related to CephFS - Bug #57244: [WRN] : client.408214273 isn't responding to mclientcaps(revoke), ino 0x10000000003 pending pAsLsXsFs issued pAsLsXsFs, sent 62.303702 seconds ago (Resolved; Xiubo Li)

Actions #1

Updated by Xiubo Li 5 months ago

[xiubli@ceph build]$ ./bin/ceph health detail
HEALTH_WARN 3 OSD(s) experiencing BlueFS spillover; 1 client(s) laggy due to laggy OSDs; 1 clients failing to respond to capability release; 1 MDSs report slow requests
[WRN] BLUEFS_SPILLOVER: 3 OSD(s) experiencing BlueFS spillover
     osd.0 spilled over 314 MiB metadata from 'db' device (298 MiB used of 1024 MiB) to slow device
     osd.1 spilled over 191 MiB metadata from 'db' device (321 MiB used of 1024 MiB) to slow device
     osd.2 spilled over 264 MiB metadata from 'db' device (350 MiB used of 1024 MiB) to slow device
[WRN] MDS_CLIENTS_LAGGY: 1 client(s) laggy due to laggy OSDs
    mds.a(mds.0): Client 5394 is laggy; not evicted because some OSD(s) is/are laggy
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
    mds.a(mds.0): Client ceph failing to respond to capability release client_id: 5394
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.a(mds.0): 6 slow requests are blocked > 30 secs
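To dig a bit further on the MDS side, tell commands along these lines can show which requests are stuck and which sessions still hold caps (command names as in recent releases; both return JSON):

# Show the requests the MDS currently considers blocked (the "6 slow requests" above)
./bin/ceph tell mds.a dump_blocked_ops

# List client sessions with their cap counts, to correlate with client id 5394
./bin/ceph tell mds.a session ls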
Actions #2

Updated by Xiubo Li 5 months ago

  • Status changed from New to In Progress

I couldn't reproduce it.

Actions #3

Updated by Xiubo Li 5 months ago

  • Status changed from In Progress to Need More Info
Actions #4

Updated by Venky Shankar 5 months ago

  • Related to Bug #57244: [WRN] : client.408214273 isn't responding to mclientcaps(revoke), ino 0x10000000003 pending pAsLsXsFs issued pAsLsXsFs, sent 62.303702 seconds ago added
Actions #5

Updated by Stefan Kooman 5 months ago

We have parsed our logs and occasionally see the same Ceph health warning "MDS_CLIENT_LATE_RELEASE" with exactly these pending / issued caps. We are not sure how often these caps are issued and whether they are indeed legitimate.

How would we enable kernel CephFS debug logging like you did here, so we can verify whether we are affected by this issue as well?

Actions #6

Updated by Xiubo Li 5 months ago

Stefan Kooman wrote:

We have parsed our logs and occasionally see the same Ceph health warning "MDS_CLIENT_LATE_RELEASE" with exactly these pending / issued caps. We are not sure how often these caps are issued and whether they are indeed legitimate.

How would we enable kernel CephFS debug logging like you did here, so we can verify whether we are affected by this issue as well?

Please do:

echo "module ceph +p" | sudo tee /sys/kernel/debug/dynamic_debug/control

Disable it:

echo "module ceph -p" | sudo tee /sys/kernel/debug/dynamic_debug/control
Actions #7

Updated by Xiubo Li 4 months ago

  • Status changed from Need More Info to Duplicate

This should be the same issue as https://tracker.ceph.com/issues/50223#note-20.
