Project

General

Profile

Actions

Bug #62567

closed

postgres workunit times out - MDS_SLOW_REQUEST in logs

Added by Venky Shankar 9 months ago. Updated 7 months ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
reef,quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds, qa, qa-failure, scrub
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/vshankar-2023-08-23_03:59:53-fs-wip-vshankar-testing-20230822.060131-testing-default-smithi/7377197

fs/workload/{0-rhel_8 begin/{0-install 1-cephadm 2-logrotate} clusters/1a11s-mds-1c-client-3node conf/{client mds mon osd} mount/kclient/{base/{mount-syntax/{v1} mount overrides/{distro/testing/k-testing ms-die-on-skipped}} ms_mode/crc wsync/no} objectstore-ec/bluestore-bitmap omap_limit/10000 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts session_timeout} ranks/multi/{balancer/automatic export-check n/5 replication/default} standby-replay tasks/{0-subvolume/{with-no-extra-options} 1-check-counter 2-scrub/yes 3-snaps/yes 4-flush/yes 5-workunit/postgres}}

postgres job times out. Lots of `MDS_SLOW_REQUEST` seen. needs investigation.

Actions #1

Updated by Venky Shankar 8 months ago

  • Assignee set to Xiubo Li

Xiubo, this might be related to the slow rename issue you have a PR for. Could you please check?

Actions #2

Updated by Xiubo Li 7 months ago

Venky Shankar wrote:

Xiubo, this might be related to the slow rename issue you have a PR for. Could you please check?

Sorry I think I missed this tracker. Will check it today.

Actions #3

Updated by Xiubo Li 7 months ago

There was a link request blocked for some reason:

2023-08-23T05:46:13.588 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:13 smithi033 ceph-mon[97369]: 1 slow requests, 1 included below; oldest blocked for > 180.777069 secs
2023-08-23T05:46:13.588 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:13 smithi033 ceph-mon[97369]: slow request 180.777069 seconds old, received at 2023-08-23T05:43:12.169484+0000: peer_request:client.25229:3464 currently acquired locks
2023-08-23T05:46:13.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:13 smithi153 ceph-mon[130626]: 1 slow requests, 1 included below; oldest blocked for > 180.777069 secs
2023-08-23T05:46:13.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:13 smithi153 ceph-mon[130626]: slow request 180.777069 seconds old, received at 2023-08-23T05:43:12.169484+0000: peer_request:client.25229:3464 currently acquired locks
2023-08-23T05:46:13.780 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:13 smithi158 ceph-mon[131093]: 1 slow requests, 1 included below; oldest blocked for > 180.777069 secs
2023-08-23T05:46:13.780 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:13 smithi158 ceph-mon[131093]: slow request 180.777069 seconds old, received at 2023-08-23T05:43:12.169484+0000: peer_request:client.25229:3464 currently acquired locks
2023-08-23T05:46:14.588 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:14 smithi033 ceph-mon[97369]: pgmap v156: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 20 KiB/s rd, 29 KiB/s wr, 26 op/s
2023-08-23T05:46:14.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:14 smithi153 ceph-mon[130626]: pgmap v156: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 20 KiB/s rd, 29 KiB/s wr, 26 op/s
2023-08-23T05:46:14.780 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:14 smithi158 ceph-mon[131093]: pgmap v156: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 20 KiB/s rd, 29 KiB/s wr, 26 op/s
2023-08-23T05:46:16.588 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:16 smithi033 ceph-mon[97369]: pgmap v157: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 9.9 KiB/s rd, 6.4 KiB/s wr, 9 op/s
2023-08-23T05:46:16.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:16 smithi153 ceph-mon[130626]: pgmap v157: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 9.9 KiB/s rd, 6.4 KiB/s wr, 9 op/s
2023-08-23T05:46:16.780 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:16 smithi158 ceph-mon[131093]: pgmap v157: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 9.9 KiB/s rd, 6.4 KiB/s wr, 9 op/s
2023-08-23T05:46:17.588 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:17 smithi033 ceph-mon[97369]: Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
2023-08-23T05:46:17.588 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:17 smithi033 ceph-mon[97369]: mds.? [v2:172.21.15.158:6832/1576882046,v1:172.21.15.158:6833/1576882046] up:active
2023-08-23T05:46:17.589 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:17 smithi033 ceph-mon[97369]: fsmap cephfs:5 5 up:standby-replay 2 up:standby 5 up:active
2023-08-23T05:46:17.589 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:17 smithi033 ceph-mon[97369]: 1 slow requests, 1 included below; oldest blocked for > 184.901300 secs
2023-08-23T05:46:17.589 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:17 smithi033 ceph-mon[97369]: slow request 184.901299 seconds old, received at 2023-08-23T05:43:12.158524+0000: client_request(client.25229:3464 link #0x10000000007/0000000100000000000000F1 #0x10000000007/0000000100000000000000C5 2023-08-23T05:43:12.157270+0000 caller_uid=26, caller_gid=26{26,}) currently acquired locks
2023-08-23T05:46:17.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:17 smithi153 ceph-mon[130626]: Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
2023-08-23T05:46:17.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:17 smithi153 ceph-mon[130626]: mds.? [v2:172.21.15.158:6832/1576882046,v1:172.21.15.158:6833/1576882046] up:active
2023-08-23T05:46:17.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:17 smithi153 ceph-mon[130626]: fsmap cephfs:5 5 up:standby-replay 2 up:standby 5 up:active
2023-08-23T05:46:17.759 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:17 smithi153 ceph-mon[130626]: 1 slow requests, 1 included below; oldest blocked for > 184.901300 secs
2023-08-23T05:46:17.759 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:17 smithi153 ceph-mon[130626]: slow request 184.901299 seconds old, received at 2023-08-23T05:43:12.158524+0000: client_request(client.25229:3464 link #0x10000000007/0000000100000000000000F1 #0x10000000007/0000000100000000000000C5 2023-08-23T05:43:12.157270+0000 caller_uid=26, caller_gid=26{26,}) currently acquired locks
2023-08-23T05:46:17.780 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:17 smithi158 ceph-mon[131093]: Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
2023-08-23T05:46:17.780 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:17 smithi158 ceph-mon[131093]: mds.? [v2:172.21.15.158:6832/1576882046,v1:172.21.15.158:6833/1576882046] up:active
2023-08-23T05:46:17.781 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:17 smithi158 ceph-mon[131093]: fsmap cephfs:5 5 up:standby-replay 2 up:standby 5 up:active
2023-08-23T05:46:17.781 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:17 smithi158 ceph-mon[131093]: 1 slow requests, 1 included below; oldest blocked for > 184.901300 secs
2023-08-23T05:46:17.781 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:17 smithi158 ceph-mon[131093]: slow request 184.901299 seconds old, received at 2023-08-23T05:43:12.158524+0000: client_request(client.25229:3464 link #0x10000000007/0000000100000000000000F1 #0x10000000007/0000000100000000000000C5 2023-08-23T05:43:12.157270+0000 caller_uid=26, caller_gid=26{26,}) currently acquired locks
2023-08-23T05:46:18.588 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:18 smithi033 ceph-mon[97369]: pgmap v158: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 9.9 KiB/s rd, 6.4 KiB/s wr, 9 op/s
2023-08-23T05:46:18.589 INFO:journalctl@ceph.mon.a.smithi033.stdout:Aug 23 05:46:18 smithi033 ceph-mon[97369]: 1 slow requests, 0 included below; oldest blocked for > 185.777110 secs
2023-08-23T05:46:18.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:18 smithi153 ceph-mon[130626]: pgmap v158: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 9.9 KiB/s rd, 6.4 KiB/s wr, 9 op/s
2023-08-23T05:46:18.758 INFO:journalctl@ceph.mon.b.smithi153.stdout:Aug 23 05:46:18 smithi153 ceph-mon[130626]: 1 slow requests, 0 included below; oldest blocked for > 185.777110 secs
2023-08-23T05:46:18.780 INFO:journalctl@ceph.mon.c.smithi158.stdout:Aug 23 05:46:18 smithi158 ceph-mon[131093]: pgmap v158: 97 pgs: 97 active+clean; 7.2 GiB data, 22 GiB used, 1.0 TiB / 1.0 TiB avail; 9.9 KiB/s rd, 6.4 KiB/s wr, 9 op/s

Actions #4

Updated by Xiubo Li 7 months ago

  • Status changed from New to Won't Fix

The logs the last time the link request went to sleep:

2023-08-23T05:43:12.179+0000 7f2ddfafd700  1 -- [v2:172.21.15.153:6838/4070841602,v1:172.21.15.153:6839/4070841602] <== mds.4 v2:172.21.15.158:6832/1576882046 808 ==== peer_request(client.25229:3464.0 xlock_ack) v1 ==== 187+0+0 (crc 0 0 0) 0x5600a902d880 con 0x5600a82a2c00
2023-08-23T05:43:12.179+0000 7f2ddfafd700  4 mds.0.server handle_peer_request client.25229:3464 from mds.4
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.cache request_get client.25229:3464 request(client.25229:3464 nref=3 cr=0x5600a8ff7500)
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.server got remote xlock on (ilink lock) on [inode 0x10000000534 [...3,head] ~mds4/stray2/10000000534 rep@4.1 v2945 snaprealm=0x5600a88698c0 s=16777216 n(v0 rc2023-08-23T05:42:52.319584+0000 b16777216 1=1+0) (iauth lock) (ilink lock) (ifile lock) (ixattr lock) (iversion lock) cr={25229=0-37748736@2} | request=1 lock=1 remoteparent=1 0x5600a92ea100]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.server dispatch_client_request client_request(client.25229:3464 link #0x10000000007/0000000100000000000000F1 #0x10000000007/0000000100000000000000C5 2023-08-23T05:43:12.157270+0000 caller_uid=26, caller_gid=26{26,}) v6
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.server handle_client_link #0x10000000007/0000000100000000000000F1 to #0x10000000007/0000000100000000000000C5
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.server rdlock_two_paths_xlock_destdn request(client.25229:3464 nref=5 cr=0x5600a8ff7500) #0x10000000007/0000000100000000000000F1 #0x10000000007/0000000100000000000000C5
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.server handle_client_link link 0000000100000000000000F1 in [dir 0x10000000007.101* /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ [2,head] auth{1=2,2=1,3=1,4=1} v=2934 cv=2931/2931 ap=0+7 state=1610874881|complete f(v5 m2023-08-23T05:43:02.934416+0000 10=10+0) n(v2 rc2023-08-23T05:43:02.934416+0000 b16777216 1=1+0) hs=10+9,ss=14+0 | child=1 frozen=0 replicated=1 dirty=1 waiter=0 authpin=0 0x5600a89bcd00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.server target is [inode 0x10000000534 [...3,head] ~mds4/stray2/10000000534 rep@4.1 v2945 snaprealm=0x5600a88698c0 s=16777216 n(v0 rc2023-08-23T05:42:52.319584+0000 b16777216 1=1+0) (iauth lock) (ilink lock x=1 by 0x5600a8817680) (ifile lock) (ixattr lock) (iversion lock) cr={25229=0-37748736@2} | request=1 lock=2 remoteparent=1 0x5600a92ea100]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker acquire_locks request(client.25229:3464 nref=5 cr=0x5600a8ff7500)
2023-08-23T05:43:12.179+0000 7f2ddfafd700 20 mds.0.locker  must xlock (isnap lock x=1 by 0x5600a8817680) [inode 0x10000000534 [...3,head] ~mds4/stray2/10000000534 rep@4.1 v2945 snaprealm=0x5600a88698c0 s=16777216 n(v0 rc2023-08-23T05:42:52.319584+0000 b16777216 1=1+0) (iauth lock) (ilink lock x=1 by 0x5600a8817680) (ifile lock) (ixattr lock) (iversion lock) cr={25229=0-37748736@2} | request=1 lock=2 remoteparent=1 0x5600a92ea100]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 20 mds.0.locker  must xlock (ilink lock x=1 by 0x5600a8817680) [inode 0x10000000534 [...3,head] ~mds4/stray2/10000000534 rep@4.1 v2945 snaprealm=0x5600a88698c0 s=16777216 n(v0 rc2023-08-23T05:42:52.319584+0000 b16777216 1=1+0) (iauth lock) (ilink lock x=1 by 0x5600a8817680) (ifile lock) (ixattr lock) (iversion lock) cr={25229=0-37748736@2} | request=1 lock=2 remoteparent=1 0x5600a92ea100]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker  must authpin [inode 0x10000000534 [...3,head] ~mds4/stray2/10000000534 rep@4.1 v2945 snaprealm=0x5600a88698c0 s=16777216 n(v0 rc2023-08-23T05:42:52.319584+0000 b16777216 1=1+0) (iauth lock) (ilink lock x=1 by 0x5600a8817680) (ifile lock) (ixattr lock) (iversion lock) cr={25229=0-37748736@2} | request=1 lock=2 remoteparent=1 0x5600a92ea100]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker  already auth_pinned [inode 0x10000000534 [...3,head] ~mds4/stray2/10000000534 rep@4.1 v2945 snaprealm=0x5600a88698c0 s=16777216 n(v0 rc2023-08-23T05:42:52.319584+0000 b16777216 1=1+0) (iauth lock) (ilink lock x=1 by 0x5600a8817680) (ifile lock) (ixattr lock) (iversion lock) cr={25229=0-37748736@2} | request=1 lock=2 remoteparent=1 0x5600a92ea100]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker  already xlocked (isnap lock x=1 by 0x5600a8817680) [inode 0x10000000534 [...3,head] ~mds4/stray2/10000000534 rep@4.1 v2945 snaprealm=0x5600a88698c0 s=16777216 n(v0 rc2023-08-23T05:42:52.319584+0000 b16777216 1=1+0) (iauth lock) (ilink lock x=1 by 0x5600a8817680) (ifile lock) (ixattr lock) (iversion lock) cr={25229=0-37748736@2} | request=1 lock=2 remoteparent=1 0x5600a92ea100]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker  already xlocked (ilink lock x=1 by 0x5600a8817680) [inode 0x10000000534 [...3,head] ~mds4/stray2/10000000534 rep@4.1 v2945 snaprealm=0x5600a88698c0 s=16777216 n(v0 rc2023-08-23T05:42:52.319584+0000 b16777216 1=1+0) (iauth lock) (ilink lock x=1 by 0x5600a8817680) (ifile lock) (ixattr lock) (iversion lock) cr={25229=0-37748736@2} | request=1 lock=2 remoteparent=1 0x5600a92ea100]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 20 Session check_access stray_prior_path /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/00000001000000000000008E
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 MDSAuthCap is_capable inode(path /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/00000001000000000000008E owner 26:26 mode 0100600) by caller 26:26 mask 2 new 0:0 cap: MDSAuthCaps[allow *]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 20 Session check_access path /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 MDSAuthCap is_capable inode(path /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal owner 26:26 mode 040700) by caller 26:26 mask 2 new 0:0 cap: MDSAuthCaps[allow *]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 20 mds.0.server fragment [dir 0x10000000007.101* /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ [2,head] auth{1=2,2=1,3=1,4=1} v=2934 cv=2931/2931 ap=0+7 state=1610874881|complete f(v5 m2023-08-23T05:43:02.934416+0000 10=10+0) n(v2 rc2023-08-23T05:43:02.934416+0000 b16777216 1=1+0) hs=10+9,ss=14+0 | child=1 frozen=0 replicated=1 dirty=1 waiter=0 authpin=0 0x5600a89bcd00] size 10 < 10000
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10  mds.0.cache.snaprealm(0x104 seq 1 0x5600a8868d80) build_snap_set on snaprealm(0x104 seq 1 lc 0 cr 0 cps 1 snaps={} last_modified 0.000000 change_attr 0 0x5600a8868d80)
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10  mds.0.cache.snaprealm(0x104 seq 1 0x5600a8868d80) build_snap_trace my_snaps []
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10  mds.0.cache.snaprealm(0x104 seq 1 0x5600a8868d80) check_cache rebuilt  seq 1 cached_seq 1 cached_last_created 0 cached_last_destroyed 1)
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x10000000005 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/ auth{1=1,2=1,3=1,4=1} v178 f(v0 m2023-08-23T05:41:10.439193+0000 1=0+1) n(v18 rc2023-08-23T05:43:03.017415+0000 b4358282429 1016=988+28) old_inodes=2 (inest mix) (ifile excl) (iversion lock) caps={25229=pAsLsXsFsx/pAsLsXsFsxcral@5},l=25229 | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a7391600]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x10000000005 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/ auth{1=1,2=1,3=1,4=1} v178 f(v0 m2023-08-23T05:41:10.439193+0000 1=0+1) n(v18 rc2023-08-23T05:43:03.017415+0000 b4358282429 1016=988+28) old_inodes=2 (inest mix) (ifile excl) (iversion lock) caps={25229=pAsLsXsFsx/pAsLsXsFsxcral@5},l=25229 | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a7391600]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x1 [...4,head] / auth{1=1,2=1,3=1,4=1} v80 snaprealm=0x5600a74778c0 f(v0 m2023-08-23T05:39:07.617519+0000 1=0+1) n(v19 rc2023-08-23T05:43:02.934416+0000 b4643495335 rs2 1044=1009+35)/n(v0 rc2023-08-23T05:39:02.361049+0000 1=0+1) old_inodes=2 (inest mix dirty) (iversion lock) caps={15777=pAsLsXsFs/-@8} | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=1 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a7392680]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x1 [...4,head] / auth{1=1,2=1,3=1,4=1} v80 snaprealm=0x5600a74778c0 f(v0 m2023-08-23T05:39:07.617519+0000 1=0+1) n(v19 rc2023-08-23T05:43:02.934416+0000 b4643495335 rs2 1044=1009+35)/n(v0 rc2023-08-23T05:39:02.361049+0000 1=0+1) old_inodes=2 (inest mix dirty) (iversion lock) caps={15777=pAsLsXsFs/-@8} | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=1 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a7392680]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x10000000000 [...4,head] /volumes/ auth{1=1,2=1,3=1,4=1} v183 f(v0 m2023-08-23T05:39:09.153514+0000 3=2+1) n(v17 rc2023-08-23T05:43:02.934416+0000 b4643495335 1043=1009+34) old_inodes=2 (inest mix dirty) (iversion lock) | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=0 dirtyrstat=1 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a82ff180]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x10000000000 [...4,head] /volumes/ auth{1=1,2=1,3=1,4=1} v183 f(v0 m2023-08-23T05:39:09.153514+0000 3=2+1) n(v17 rc2023-08-23T05:43:02.934416+0000 b4643495335 1043=1009+34) old_inodes=2 (inest mix dirty) (iversion lock) | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=0 dirtyrstat=1 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a82ff180]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x10000000001 [...4,head] /volumes/qa/ auth{1=1,2=1,3=1,4=1} v187 f(v0 m2023-08-23T05:39:09.133871+0000 2=0+2) n(v17 rc2023-08-23T05:43:02.934416+0000 b4643495335 1040=1007+33) old_inodes=2 (inest mix dirty) (iversion lock) | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=0 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a82ff700]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x10000000001 [...4,head] /volumes/qa/ auth{1=1,2=1,3=1,4=1} v187 f(v0 m2023-08-23T05:39:09.133871+0000 2=0+2) n(v17 rc2023-08-23T05:43:02.934416+0000 b4643495335 1040=1007+33) old_inodes=2 (inest mix dirty) (iversion lock) | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=0 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a82ff700]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x100000001fa [...4,head] /volumes/qa/sv_1/ auth{1=1,2=1,3=1,4=1} v200 snaprealm=0x5600a8301200 f(v0 m2023-08-23T05:39:09.151900+0000 2=1+1) n(v14 rc2023-08-23T05:43:02.934416+0000 b4643495218 1036=1006+30) old_inodes=2 (inest mix dirty) (iversion lock) caps={25229=pAsLsXsFs/-@1},l=25229 | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=1 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a8328c00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x100000001fa [...4,head] /volumes/qa/sv_1/ auth{1=1,2=1,3=1,4=1} v200 snaprealm=0x5600a8301200 f(v0 m2023-08-23T05:39:09.151900+0000 2=1+1) n(v14 rc2023-08-23T05:43:02.934416+0000 b4643495218 1036=1006+30) old_inodes=2 (inest mix dirty) (iversion lock) caps={25229=pAsLsXsFs/-@1},l=25229 | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=1 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a8328c00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x100000001fb [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/ auth{1=1,2=1,3=1,4=1} v182 f(v0 m2023-08-23T05:41:10.284195+0000 1=0+1) n(v15 rc2023-08-23T05:43:03.017415+0000 b4358282429 1017=988+29) old_inodes=2 (inest mix dirty) (iversion lock) caps={25229=pAsLsXsFs/pAsLsXsFsxcral@9} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a8329180]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x100000001fb [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/ auth{1=1,2=1,3=1,4=1} v182 f(v0 m2023-08-23T05:41:10.284195+0000 1=0+1) n(v15 rc2023-08-23T05:43:03.017415+0000 b4358282429 1017=988+29) old_inodes=2 (inest mix dirty) (iversion lock) caps={25229=pAsLsXsFs/pAsLsXsFsxcral@9} mcw={1=AsLsXsFs} | dirtyscattered=1 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a8329180]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x10000000006 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/ auth{1=1,2=1,3=1,4=1} v356 f(v0 m2023-08-23T05:41:17.958074+0000 26=8+18) n(v16 rc2023-08-23T05:43:03.017415+0000 b4358282429 1015=988+27) old_inodes=2 (inest mix) (ifile excl) (iversion lock) caps={25229=pAsLsXsFsx/pAsLsXsFsxcral@47},l=25229 | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a84e4580]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x10000000006 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/ auth{1=1,2=1,3=1,4=1} v356 f(v0 m2023-08-23T05:41:17.958074+0000 26=8+18) n(v16 rc2023-08-23T05:43:03.017415+0000 b4358282429 1015=988+27) old_inodes=2 (inest mix) (ifile excl) (iversion lock) caps={25229=pAsLsXsFsx/pAsLsXsFsxcral@47},l=25229 | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x5600a84e4580]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker wrlock_finish on (ifile mix w=1) on [inode 0x10000000007 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ auth{1=1,2=1,3=2,4=2} fragtree_t(*^3) v4048 ap=1 f(v5 m2023-08-23T05:43:11.618279+0000 66=65+1) n(v2 rc2023-08-23T05:43:03.017415+0000 b352321536 23=21+2) old_inodes=2 (isnap sync r=1) (inest mix w=1) (ifile mix w=1) (iversion lock) caps={25229=pAsLsXs/pAsLsXsFsxcral@2269} mcw={1=AsLsXsFs,2=AsLsXsFs,3=AsLsXsFs,4=AsLsXsFs} | dirtyscattered=0 request=0 lock=3 dirfrag=8 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 scrubqueue=0 0x5600a84e4b00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker file_eval wanted=sxcral loner_wanted= other_wanted=sxcral  filelock=(ifile mix) on [inode 0x10000000007 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ auth{1=1,2=1,3=2,4=2} fragtree_t(*^3) v4048 ap=1 f(v5 m2023-08-23T05:43:11.618279+0000 66=65+1) n(v2 rc2023-08-23T05:43:03.017415+0000 b352321536 23=21+2) old_inodes=2 (isnap sync r=1) (inest mix w=1) (ifile mix) (iversion lock) caps={25229=pAsLsXs/pAsLsXsFsxcral@2269} mcw={1=AsLsXsFs,2=AsLsXsFs,3=AsLsXsFs,4=AsLsXsFs} | dirtyscattered=0 request=0 lock=2 dirfrag=8 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 scrubqueue=0 0x5600a84e4b00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x10000000007 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ auth{1=1,2=1,3=2,4=2} fragtree_t(*^3) v4048 ap=1 f(v5 m2023-08-23T05:43:11.618279+0000 66=65+1) n(v2 rc2023-08-23T05:43:03.017415+0000 b352321536 23=21+2) old_inodes=2 (inest mix w=1) (ifile mix) (iversion lock) caps={25229=pAsLsXs/pAsLsXsFsxcral@2269} mcw={1=AsLsXsFs,2=AsLsXsFs,3=AsLsXsFs,4=AsLsXsFs} | dirtyscattered=0 request=0 lock=1 dirfrag=8 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 scrubqueue=0 0x5600a84e4b00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x10000000007 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ auth{1=1,2=1,3=2,4=2} fragtree_t(*^3) v4048 ap=1 f(v5 m2023-08-23T05:43:11.618279+0000 66=65+1) n(v2 rc2023-08-23T05:43:03.017415+0000 b352321536 23=21+2) old_inodes=2 (inest mix w=1) (ifile mix) (iversion lock) caps={25229=pAsLsXs/pAsLsXsFsxcral@2269} mcw={1=AsLsXsFs,2=AsLsXsFs,3=AsLsXsFs,4=AsLsXsFs} | dirtyscattered=0 request=0 lock=1 dirfrag=8 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 scrubqueue=0 0x5600a84e4b00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker wrlock_finish on (inest mix w=1) on [inode 0x10000000007 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ auth{1=1,2=1,3=2,4=2} fragtree_t(*^3) v4048 ap=1 f(v5 m2023-08-23T05:43:11.618279+0000 66=65+1) n(v2 rc2023-08-23T05:43:03.017415+0000 b352321536 23=21+2) old_inodes=2 (inest mix w=1) (ifile mix) (iversion lock) caps={25229=pAsLsXs/pAsLsXsFsxcral@2269} mcw={1=AsLsXsFs,2=AsLsXsFs,3=AsLsXsFs,4=AsLsXsFs} | dirtyscattered=0 request=0 lock=1 dirfrag=8 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 scrubqueue=0 0x5600a84e4b00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker scatter_eval (inest mix) on [inode 0x10000000007 [...4,head] /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ auth{1=1,2=1,3=2,4=2} fragtree_t(*^3) v4048 ap=1 f(v5 m2023-08-23T05:43:11.618279+0000 66=65+1) n(v2 rc2023-08-23T05:43:03.017415+0000 b352321536 23=21+2) old_inodes=2 (inest mix) (ifile mix) (iversion lock) caps={25229=pAsLsXs/pAsLsXsFsxcral@2269} mcw={1=AsLsXsFs,2=AsLsXsFs,3=AsLsXsFs,4=AsLsXsFs} | dirtyscattered=0 request=0 lock=0 dirfrag=8 caps=1 dirtyrstat=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 discoverbase=0 scrubqueue=0 0x5600a84e4b00]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker xlock_finish on (dn xlock x=1 by 0x5600a8817680) [dentry #0x1/volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/0000000100000000000000F1 [4,head] auth NULL (dn xlock x=1 by 0x5600a8817680) (dversion lock w=1 last_client=25229) pv=0 v=2934 ap=2 ino=(nil) state=1073741824 | request=1 lock=2 authpin=1 0x5600a87b0c80]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker eval_gather (dn xlock) on [dentry #0x1/volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/0000000100000000000000F1 [4,head] auth NULL (dn xlock) (dversion lock w=1 last_client=25229) pv=0 v=2934 ap=2 ino=(nil) state=1073741824 | request=1 lock=1 authpin=1 0x5600a87b0c80]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker eval_gather finished gather on (dn xlock) on [dentry #0x1/volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/0000000100000000000000F1 [4,head] auth NULL (dn xlock) (dversion lock w=1 last_client=25229) pv=0 v=2934 ap=2 ino=(nil) state=1073741824 | request=1 lock=1 authpin=1 0x5600a87b0c80]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.cache.den(0x10000000007.101* 0000000100000000000000F1) auth_unpin by 0x5600a87b0d68 on [dentry #0x1/volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/0000000100000000000000F1 [4,head] auth NULL (dversion lock w=1 last_client=25229) pv=0 v=2934 ap=1 ino=(nil) state=1073741824 | request=1 lock=1 authpin=1 0x5600a87b0c80] now 1
2023-08-23T05:43:12.179+0000 7f2ddfafd700 15 mds.0.cache.dir(0x10000000007.101*) adjust_nested_auth_pins -1 on [dir 0x10000000007.101* /volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/ [2,head] auth{1=2,2=1,3=1,4=1} v=2934 cv=2931/2931 ap=0+6 state=1610874881|complete f(v5 m2023-08-23T05:43:02.934416+0000 10=10+0) n(v2 rc2023-08-23T05:43:02.934416+0000 b16777216 1=1+0) hs=10+9,ss=14+0 | child=1 frozen=0 replicated=1 dirty=1 waiter=0 authpin=0 0x5600a89bcd00] by 0x5600a87b0d68 count now 0/6
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker simple_eval (dn sync) on [dentry #0x1/volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/0000000100000000000000F1 [4,head] auth NULL (dversion lock w=1 last_client=25229) pv=0 v=2934 ap=1 ino=(nil) state=1073741824 | request=1 lock=1 authpin=1 0x5600a87b0c80]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker local_wrlock_finish  on (dversion lock w=1 last_client=25229) on [dentry #0x1/volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/0000000100000000000000F1 [4,head] auth NULL (dversion lock w=1 last_client=25229) pv=0 v=2934 ap=1 ino=(nil) state=1073741824 | request=1 lock=1 authpin=1 0x5600a87b0c80]
2023-08-23T05:43:12.179+0000 7f2ddfafd700  7 mds.0.locker rdlock_finish on (dn sync) on [dentry #0x1/volumes/qa/sv_1/37334fc9-0706-434b-a409-057396d439e4/postgres/data/pg_wal/0000000100000000000000C5 [3,head] rep@3.1 REMOTE(reg) (dversion lock) pv=0 v=2823 ino=0x10000000534 state=0 0x5600a89f0c80]
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.locker _drop_non_rdlocks dropping remote locks on mds.4
2023-08-23T05:43:12.179+0000 7f2ddfafd700  1 -- [v2:172.21.15.153:6838/4070841602,v1:172.21.15.153:6839/4070841602] send_to--> mds [v2:172.21.15.158:6832/1576882046,v1:172.21.15.158:6833/1576882046] -- peer_request(client.25229:3464.0 drop_locks) v1 -- ?+0 0x5600a8b43880
2023-08-23T05:43:12.179+0000 7f2ddfafd700  1 -- [v2:172.21.15.153:6838/4070841602,v1:172.21.15.153:6839/4070841602] --> [v2:172.21.15.158:6832/1576882046,v1:172.21.15.158:6833/1576882046] -- peer_request(client.25229:3464.0 drop_locks) v1 -- 0x5600a8b43880 con 0x5600a82a2c00
2023-08-23T05:43:12.179+0000 7f2ddfafd700 10 mds.0.cache.ino(0x10000000534) add_waiter tag 10 0x5600a81dc7a0 !ambig 1 !frozen 1 !freezing 1
2023-08-23T05:43:12.179+0000 7f2ddfafd700 15 mds.0.cache.ino(0x10000000534) taking waiter here

And it was caused by: https://github.com/ceph/ceph-ci/blob/wip-vshankar-testing-20230822.060131/src/mds/Server.cc#L7396C35-L7396C46, which the corresponding commits have been reverted already. It makes no sense to debug it any further.

Actions

Also available in: Atom PDF