Bug #65603 (Open)
mds: quiesce timeout due to a freezing directory
% Done: 0%
Description
Analyzing one of the ETIMEDOUT errors for a quiesce, looking at
https://pulpito.ceph.com/leonidus-2024-04-21_11:37:13-fs-wip-lusov-quiescer-distro-default-smithi/7666603/
The problem was with the following quiesce request:
2024-04-21T12:04:11.496 INFO:teuthology.orchestra.run.smithi148.stderr:2024-04-21T12:04:11.493+0000 7f49bd39f740 4 client.5370 mds_command: new command op to 4282 tid=0 multi_id=0 [{"prefix": "quiesce db", "roots": ["/"], "timeout": 60.0, "expiration": 90.0, "await": true}]
2024-04-21T12:04:11.496 INFO:teuthology.orchestra.run.smithi148.stderr:2024-04-21T12:04:11.493+0000 7f49bd39f740 1 -- 172.21.15.148:0/714775596 --> [v2:172.21.15.157:6833/2409369305,v1:172.21.15.157:6835/2409369305] -- command(tid 0: {"prefix": "quiesce db", "roots": ["/"], "timeout": 60.0, "expiration": 90.0, "await": true}) -- 0x556bb10c3980 con 0x556bb116fbc0
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stderr:2024-04-21T12:05:11.495+0000 7f49bd39f740 1 -- 172.21.15.148:0/714775596 shutdown_connections
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stderr:2024-04-21T12:05:11.495+0000 7f49bd39f740 1 -- 172.21.15.148:0/714775596 wait complete.
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stderr:2024-04-21T12:05:11.495+0000 7f49bd39f740 20 client.5370 trim_cache size 0 max 16384
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stdout:{
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stdout:    "epoch": 9,
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stdout:    "leader": 4282,
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stdout:    "set_version": 22,
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stdout:    "sets": {
2024-04-21T12:05:11.496 INFO:teuthology.orchestra.run.smithi148.stdout:        "75b48996": {
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:            "version": 22,
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:            "age_ref": 0.0,
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:            "state": {
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:                "name": "TIMEDOUT",
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:                "age": 0.0
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:            },
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:            "timeout": 60.0,
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:            "expiration": 90.0,
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:            "members": {
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:                "file:/": {
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:                    "excluded": false,
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:                    "state": {
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:                        "name": "QUIESCING",
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:                        "age": 60.0
2024-04-21T12:05:11.497 INFO:teuthology.orchestra.run.smithi148.stdout:                    }
2024-04-21T12:05:11.498 INFO:teuthology.orchestra.run.smithi148.stdout:                }
2024-04-21T12:05:11.498 INFO:teuthology.orchestra.run.smithi148.stdout:            }
2024-04-21T12:05:11.498 INFO:teuthology.orchestra.run.smithi148.stdout:        }
2024-04-21T12:05:11.498 INFO:teuthology.orchestra.run.smithi148.stdout:    }
2024-04-21T12:05:11.498 INFO:teuthology.orchestra.run.smithi148.stdout:}
2024-04-21T12:05:11.498 DEBUG:teuthology.orchestra.run:got remote process result: 110
2024-04-21T12:05:11.499 INFO:teuthology.orchestra.run.smithi148.stderr:Error ETIMEDOUT:
2024-04-21T12:05:11.499 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't quiesce root with rc: -110 (ETIMEDOUT), stdout:
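For reference, the rc 110 surfaced by teuthology and by the quiescer task is simply errno ETIMEDOUT; a quick sanity check in Python (errno values assumed here are the Linux ones):

```python
import errno
import os

# The quiescer task reported rc -110; on Linux, errno 110 is ETIMEDOUT,
# matching the "Error ETIMEDOUT" line in the output above.
assert errno.ETIMEDOUT == 110
print(os.strerror(errno.ETIMEDOUT))  # "Connection timed out" on Linux/glibc
```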
Extracting the information from the logs and deducing incomplete requests yields:
[ 0x1] since 2024-04-21 12:04:11.493000: / [0x10000000000] since 2024-04-21 12:04:11.494000: /client.0/ [0x100000017b0] since 2024-04-21 12:04:11.494000: /client.0/tmp/ [0x100000017b1] since 2024-04-21 12:04:11.495000: /client.0/tmp/ceph/ [0x10000002fe1] since 2024-04-21 12:04:11.496000: /client.0/tmp/ceph/src/ [0x10000003733] since 2024-04-21 12:04:11.504000: /client.0/tmp/ceph/src/libcephfs.cc [0x10000003167] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_common.sh [0x100000035ca] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/dmclock/ [0x10000003043] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/btrfs_ioc_test.c [0x1000000305a] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x10000002fe7] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/TODO [0x10000003049] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x10000003419] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/crimson/ [0x100000036ca] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/isa [0x10000003041] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/blkin/ [0x10000003a99] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mgr/ [0x10000003b8f] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mypy.ini [0x10000003b9f] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/os/ [0x10000003b8b] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mstart.sh [0x10000003615] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/exporter/ [0x100000036cb] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/jaegertracing/ [0x100000036c9] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/init [0x10000003171] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/cephadm/ [0x10000003047] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x100000035e3] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/erasure [0x10000003726] since 2024-04-21 12:04:11.505000: 
/client.0/tmp/ceph/src/kv/ [0x100000036c8] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/init [0x100000035cb] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/doc/ [0x100000033f6] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/compressor/ [0x10000003b8c] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mstop.sh [0x10000003614] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/etc [0x10000003635] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/googletest/ [0x10000003734] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/libcephsqlite.cc [0x10000003956] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/loadclass.sh [0x10000003b91] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/neorados/ [0x10000003165] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph.conf.twoosds [0x10000003bde] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/osd/ [0x100000036f2] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/journal/ [0x10000003045] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x10000003b1c] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mount.fuse.ceph [0x10000003029] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/bash_completion/ [0x10000003b90] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/nasm [0x10000003597] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/crush/ [0x10000002ff3] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/auth/ [0x10000003725] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/krbd.cc [0x10000003b9b] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ocf/ [0x10000003636] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/include/ [0x10000003168] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_fuse.cc [0x10000003292] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/cls_acl.cc [0x10000003042] since 2024-04-21 12:04:11.505000: 
/client.0/tmp/ceph/src/boost_redis/ [0x10000003b1d] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mount/ [0x1000000304a] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x10000002ff2] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/arrow/ [0x10000003754] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/librbd/ [0x10000003736] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/librados [0x10000003b99] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/objsync/ [0x1000000316c] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_osd.cc [0x10000003294] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/cmonctl [0x1000000374d] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/libradosstriper/ [0x10000003044] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/c [0x1000000395f] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mds/ [0x1000000361e] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/extblkdev/ [0x1000000320a] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/cls/ [0x10000003735] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/libkmip/ [0x1000000316e] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_syn.cc [0x1000000316b] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_mon.cc [0x10000002fe8] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/arch/ [0x10000002fe3] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/CMakeLists.txt [0x1000000370f] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/json_spirit/ [0x1000000305b] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x10000003b24] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mrgw.sh [0x10000003b8e] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mypy [0x1000000371e] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/key_value_store/ [0x10000002fe5] since 2024-04-21 12:04:11.505000: 
/client.0/tmp/ceph/src/SimpleRADOSStriper.cc [0x1000000305d] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x10000003b96] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/objclass/ [0x100000031e3] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ckill.sh [0x1000000316f] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_ver.c [0x10000003ade] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mon/ [0x10000003048] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x1000000316a] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_mgr.cc [0x10000002fe6] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/SimpleRADOSStriper.h [0x100000035c9] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/cstart.sh [0x100000031e4] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/client/ [0x10000002fe4] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/README [0x10000003b26] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/msg/ [0x10000003737] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/librados/ [0x10000003b8d] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/multi [0x10000003169] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_mds.cc [0x1000000302f] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/blk/ [0x10000003295] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/common/ [0x10000003629] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/fmt/ [0x10000002fe2] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/.gitignore [0x1000000305c] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x10000003957] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/log/ [0x10000003166] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph.in [0x10000003293] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/cls_crypto.cc [0x1000000395e] since 2024-04-21 12:04:11.505000: 
/client.0/tmp/ceph/src/logrotate.conf [0x100000039ec] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/messages/ [0x1000000362a] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/global/ [0x100000035b0] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/crypto/ [0x100000036ce] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/java/ [0x100000035da] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/dokan/ [0x10000003b25] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/mrun [0x10000003046] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x10000003059] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph [0x1000000316d] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_release [0x10000003170] since 2024-04-21 12:04:11.505000: /client.0/tmp/ceph/src/ceph_ver.h.in.cmake
total pending: 108 complete (min/max/avg/stdev): 6175 (0.0/4.282/0.71/0.46)
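As an aside, the min/max/avg/stdev summary at the end of the list is easy to reproduce from per-inode quiesce latencies. A hypothetical sketch (`quiesce_latency_summary` is an illustrative helper, not part of qlog.py):

```python
import statistics

def quiesce_latency_summary(latencies):
    """Summarize per-inode quiesce completion latencies (in seconds)
    in the same min/max/avg/stdev shape as the report above."""
    return (
        min(latencies),
        max(latencies),
        statistics.mean(latencies),
        statistics.pstdev(latencies),  # population stdev over all completed ops
    )

# Hypothetical sample; the real run had 6175 completed quiesces.
sample = [0.0, 0.5, 0.7, 1.2, 4.282]
lo, hi, avg, sd = quiesce_latency_summary(sample)
print(f"({lo}/{hi}/{avg:.2f}/{sd:.2f})")
```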
We can see that all of the pending quiesces are under @[0x10000002fe1] /client.0/tmp/ceph/src@. All inodes under the directory complain about the inability to authpin:
2024-04-21T12:04:11.506+0000 7fbb7d1d4640 10 mds.0.locker can't auth_pin (freezing?), waiting to authpin [inode 0x10000003733 [1fa,head] /client.0/tmp/ceph/src/libcephfs.cc auth v4191 DIRTYPARENT s=65791 n(v0 rc2024-04-21T12:03:56.550134+0000 b65791 1=1+0) caps={4753=pAsLsXsFsc/-@2},l=4753(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 dirty=0 authpin=0 0x558d7a5ba680]
The parent dir appears to be frozen:
2024-04-21T12:04:11.603+0000 7fbb769c7640 10 mds.0.cache.dir(0x10000002fe1) _freeze_dir [dir 0x10000002fe1 /client.0/tmp/ceph/src/ [2,head] auth v=5484 cv=5322/5322 ap=1+0 state=1610895377|complete|freezingdir|fragmenting|dnpinnedfrag f(v0 m2024-04-21T12:04:09.612818+0000 103=51+52) n(v4 rc2024-04-21T12:04:10.288801+0000 b23531464 3168=2900+268)/n(v4 rc2024-04-21T12:04:09.612818+0000 b23531464 3065=2799+266) hs=103+2,ss=0+0 dirty=2 | child=1 dirty=1 waiter=1 authpin=1 0x558d780fe400]
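This is the classic freeze/auth_pin interaction: a freezing dirfrag refuses new auth_pins and queues waiters, while the freeze itself only completes once the existing pins drain, so a quiesce op arriving during fragmentation stalls until the fragment finishes and the dir thaws. A toy Python model of that state machine (illustrative names only, not the actual CDir code):

```python
class DirFrag:
    """Toy model of a dirfrag's freeze/auth_pin interaction."""
    def __init__(self):
        self.auth_pins = 0
        self.freezing = False
        self.frozen = False
        self.waiters = []

    def start_freeze(self):
        # e.g. triggered by fragmenting
        self.freezing = True
        self._maybe_freeze()

    def can_auth_pin(self):
        return not (self.freezing or self.frozen)

    def auth_pin(self, on_ready):
        if self.can_auth_pin():
            self.auth_pins += 1
            on_ready()
        else:
            # "can't auth_pin (freezing?), waiting to authpin"
            self.waiters.append(on_ready)

    def auth_unpin(self):
        self.auth_pins -= 1
        self._maybe_freeze()

    def _maybe_freeze(self):
        # freezingdir -> frozendir once all existing pins are gone
        if self.freezing and self.auth_pins == 0:
            self.freezing, self.frozen = False, True

d = DirFrag()
d.auth_pin(lambda: None)          # pre-existing pin (cf. ap=1 in the log)
d.start_freeze()                  # fragmenting begins: freezingdir
quiesced = []
d.auth_pin(lambda: quiesced.append(True))  # the quiesce op must wait
print(d.freezing, d.frozen, len(d.waiters))  # True False 1
```

Note that even once the last pin drops and the dir goes frozendir, the queued waiter is only released when the dir is eventually unfrozen, which in this run did not happen within the 60s quiesce timeout.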
Updated by Leonid Usov 13 days ago · Edited
The directory appears to be fragmenting, as we can see from a few messages in the log:
state=1610895377|complete|freezingdir|fragmenting|dnpinnedfrag:
2024-04-21T12:04:11.603+0000 7fbb769c7640 10 mds.0.cache.dir(0x10000002fe1) _mark_dirty (already dirty) [dir 0x10000002fe1 /client.0/tmp/ceph/src/ [2,head] auth pv=5484 v=5482 cv=5322/5322 ap=2+1 state=1610895377|complete|freezingdir|fragmenting|dnpinnedfrag f(v0 m2024-04-21T12:04:09.612818+0000 103=51+52)->f(v0 m2024-04-21T12:04:09.612818+0000 103=51+52) n(v4 rc2024-04-21T12:04:10.260802+0000 b23531464 3146=2878+268)/n(v4 rc2024-04-21T12:04:09.612818+0000 b23531464 3065=2799+266)->n(v4 rc2024-04-21T12:04:10.288801+0000 b23531464 3168=2900+268)/n(v4 rc2024-04-21T12:04:09.612818+0000 b23531464 3065=2799+266) hs=103+2,ss=0+0 dirty=2 | child=1 dirty=1 waiter=1 authpin=1 0x558d780fe400] version 5482
...
state=1610895369|complete|frozendir|fragmenting|dnpinnedfrag:
2024-04-21T12:04:15.279+0000 7fbb7b9d1640 7 mds.0.cache can_fragment: already fragmenting [dir 0x10000002fe1 /client.0/tmp/ceph/src/ [2,head] auth v=5484 cv=5322/5322 state=1610895369|complete|frozendir|fragmenting|dnpinnedfrag f(v0 m2024-04-21T12:04:09.612818+0000 103=51+52) n(v4 rc2024-04-21T12:04:10.288801+0000 b23531464 3168=2900+268)/n(v4 rc2024-04-21T12:04:09.612818+0000 b23531464 3065=2799+266) hs=103+2,ss=0+0 dirty=2 | child=1 frozen=1 dirty=1 waiter=1 authpin=0 0x558d780fe400]
Updated by Leonid Usov 13 days ago
leonidus@teuthology:/a/leonidus-2024-04-21_11:37:13-fs-wip-lusov-quiescer-distro-default-smithi/7666602/remote/smithi087/log$ zgrep -n 97a5832d ceph-mds.d.log
30034137:2024-04-21T12:10:45.860+0000 7fb9f60ef640 15 quiesce.mgr.4300 <leader_update_set> [97a5832d@0] updated to match the min state of the remaining (1) members: QS_QUIESCING (1)
...
30347412:2024-04-21T12:11:45.861+0000 7fb9f60ef640 10 quiesce.mgr.4300 <leader_upkeep_set> [97a5832d@45,file:/] detected a member quiesce timeout
30347413:2024-04-21T12:11:45.861+0000 7fb9f60ef640 10 quiesce.mgr.4300 <leader_upkeep_awaits> completing an await for the set '97a5832d' with rc: 110
[ 0x1] since 2024-04-21 12:10:45.860000: / [0x10000000000] since 2024-04-21 12:10:45.861000: /client.0/ [0x100000017b0] since 2024-04-21 12:10:45.862000: /client.0/tmp/ [0x100000017b1] since 2024-04-21 12:10:45.862000: /client.0/tmp/ceph/ [0x10000002fe2] since 2024-04-21 12:10:45.863000: /client.0/tmp/ceph/src/ [0x10000001889] since 2024-04-21 12:10:45.863000: /client.0/tmp/ceph/debian/ [0x100000057d8] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/cmonctl [0x100000052a3] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/jobs/ [0x10000005a0b] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/mkmonfs.cc [0x10000003960] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/mds/ [0x10000005a66] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/crushtool.cc [0x10000003c46] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/osdc/ [0x10000005cd2] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/Makefile.am [0x10000005a46] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/barclass.cc [0x10000005cee] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/csyn.cc [0x10000005cef] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/dumpjournal.cc [0x10000005ad3] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/objclass/ [0x100000058be] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/ceph_ver.c [0x10000005a6a] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/fooclass.cc [0x10000005a69] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/dupstore.cc [0x10000005ca5] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/psim.cc [0x10000005b7b] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/mkcephfs.in [0x10000005cd4] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/ceph.cc [0x10000005209] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/Makefile.orig [0x10000003ade] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/mon/ [0x100000058a5] since 2024-04-21 
12:10:45.873000: /client.0/tmp/ceph/src/verify [0x10000005ca7] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/radosgw/ [0x10000003b26] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/msg/ [0x100000039ed] since 2024-04-21 12:10:45.873000: /client.0/tmp/ceph/src/messages/ [0x10000005cc3] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/changelog [0x10000005968] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph.install [0x10000005b07] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/rules [0x100000055f1] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph [0x10000005cc5] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/libceph [0x10000005a43] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/rules.modules.in [0x100000051f3] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/dirs [0x100000051f6] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/emacsen [0x10000005202] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/prerm.ex [0x100000051ec] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph.doc [0x100000058b3] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph.docs [0x100000051fe] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/menu.ex [0x10000005200] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/postrm.ex [0x10000005967] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph [0x100000058b1] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph [0x10000005201] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/preinst.ex [0x10000005cc6] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/libceph [0x1000000596b] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/copyright [0x100000051f5] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/emacsen [0x100000051ef] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/compat [0x10000005ccb] 
since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/libcrush.dirs [0x100000051fb] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/manpage.1.ex [0x10000005cc8] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/libceph.install [0x100000051fa] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/init.d.lsb.ex [0x100000051f2] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/cron.d.ex [0x100000051f4] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/docs [0x10000005cc9] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/libcrush [0x10000005cce] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/librados [0x100000058b4] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph.init [0x10000005ccd] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/librados [0x10000005cc4] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/control [0x10000005ccf] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/librados.dirs [0x10000005cca] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/libcrush [0x10000005204] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/watch.ex [0x10000005cc7] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/libceph.dirs [0x10000005ccc] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/libcrush.install [0x100000058b2] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph.dirs [0x100000051ff] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/postinst.ex [0x100000051e9] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph [0x100000055ee] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/README.ceph [0x100000051fd] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/manpage.xml.ex [0x100000058b0] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/ceph [0x10000005600] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/control.modules.in 
[0x100000051f9] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/init.d.ex [0x10000005cd0] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/librados.install [0x100000051f7] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/emacsen [0x100000051fc] since 2024-04-21 12:10:45.888000: /client.0/tmp/ceph/debian/manpage.sgml.ex [0x10000005d3d] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/SimpleMessenger.cc [0x10000005d3a] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/Message.cc [0x10000005394] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/tcp.cc [0x10000005d3b] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/Message.h [0x1000000579f] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/Messenger.cc [0x1000000579c] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/FakeMessenger.cc [0x10000005d3e] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/SimpleMessenger.h [0x10000005389] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/Dispatcher.cc [0x10000005d39] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/Dispatcher.h [0x10000005d3f] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/msg_types.h [0x10000005395] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/tcp.h [0x10000005d3c] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/Messenger.h [0x100000055b9] since 2024-04-21 12:10:45.971000: /client.0/tmp/ceph/src/msg/FakeMessenger.h [0x10000005b60] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MGetPoolStatsReply.h [0x10000005583] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MMonElection.h [0x1000000568c] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MClientSnap.h [0x10000005c8e] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MPoolOpReply.h [0x10000005861] since 2024-04-21 12:10:45.988000: 
/client.0/tmp/ceph/src/messages/MOSDPing.h [0x10000005784] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDPGRemove.h [0x10000005b6b] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDBoot.h [0x10000005abb] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDSubOpReply.h [0x10000005a00] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MLock.h [0x10000005d22] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MClientReply.h [0x10000005b6f] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDMap.h [0x10000005b65] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MMonCommand.h [0x1000000584f] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MExportDirWarning.h [0x10000005b79] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MStatfsReply.h [0x10000005b66] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MMonCommandAck.h [0x10000005b56] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MClass.h [0x10000005d29] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MRoute.h [0x1000000584d] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MExportDirNotify.h [0x10000005697] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MMDSTableRequest.h [0x10000005866] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MPGStatsAck.h [0x1000000569c] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDPGNotify.h [0x10000005575] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MHeartbeat.h [0x10000005b6c] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDFailure.h [0x10000005b68] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MMonObserveNotify.h [0x10000005d25] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MMonSubscribe.h [0x10000005574] since 
2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MGenericMessage.h [0x10000005b5a] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MClientReconnect.h [0x10000005b78] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MStatfs.h [0x10000005b6a] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDAlive.h [0x10000005b72] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDScrub.h [0x10000005b59] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MClientMount.h [0x10000005560] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MClientRequestForward.h [0x10000005d23] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MExportCaps.h [0x10000005d27] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MOSDOp.h [0x1000000584b] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MExportDirDiscover.h [0x10000005849] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MDiscoverReply.h [0x10000005b64] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MMDSMap.h [0x10000005576] since 2024-04-21 12:10:45.988000: /client.0/tmp/ceph/src/messages/MInodeFileCaps.h [0x10000005b6d] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MOSDGetMap.h [0x10000005b73] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MOSDSubOp.h [0x10000005567] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MExportCapsAck.h [0x10000005b71] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MOSDOut.h [0x10000005848] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MDiscover.h [0x10000005b63] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MMDSGetMap.h [0x10000005b7a] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/PaxosServiceMessage.h [0x10000005579] since 2024-04-21 12:10:45.989000: 
/client.0/tmp/ceph/src/messages/MMDSBoot.h [0x10000005b57] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MClassAck.h [0x10000005690] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MExportDir.h [0x10000005b77] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MRemoveSnaps.h [0x10000005d26] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MMonSubscribeAck.h [0x10000005591] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MOSDPGLog.h [0x1000000584a] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MExportDirCancel.h [0x10000005a04] since 2024-04-21 12:10:45.989000: /client.0/tmp/ceph/src/messages/MMDSCacheRejoin.h
total pending: 142 complete (min/max/avg/stdev): 1029 (0.0/1.921/0.23/0.31)
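The pending lists in these updates come from scanning the MDS log for quiesce ops that were dispatched but never observed to complete. A rough sketch of that kind of extraction, using a simplified, hypothetical line format (the real qlog.py parses the full MDS debug output):

```python
import re

# Hypothetical, simplified line shapes; real MDS debug lines carry much more state.
DISPATCH = re.compile(r"(\S+ \S+).*dispatch_quiesce_inode.*\[inode (0x[0-9a-f]+).*? (/\S+)")
COMPLETE = re.compile(r"quiesce_inode.*complete.*\[inode (0x[0-9a-f]+)")

def pending_quiesces(lines):
    """Return {ino: (since, path)} for quiesces dispatched but not completed."""
    pending = {}
    for line in lines:
        m = DISPATCH.search(line)
        if m:
            since, ino, path = m.groups()
            pending[ino] = (since, path)
            continue
        m = COMPLETE.search(line)
        if m:
            pending.pop(m.group(1), None)
    return pending

log = [
    "2024-04-21 12:04:11.505 dispatch_quiesce_inode ... [inode 0x10000003733 ... /client.0/tmp/ceph/src/libcephfs.cc",
    "2024-04-21 12:04:11.505 dispatch_quiesce_inode ... [inode 0x10000002fe1 ... /client.0/tmp/ceph/src/",
    "2024-04-21 12:04:11.600 quiesce_inode ... complete ... [inode 0x10000003733 ...",
]
print(sorted(pending_quiesces(log)))  # ['0x10000002fe1']
```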
Updated by Patrick Donnelly 12 days ago
- Assignee set to Patrick Donnelly
- Priority changed from Normal to High
- Target version set to v20.0.0
- Source set to Q/A
- Backport set to squid
Updated by Leonid Usov 12 days ago
Another instance of the same at https://pulpito.ceph.com/leonidus-2024-04-22_12:36:42-fs-wip-lusov-quiescer-distro-default-smithi/7668264/
2024-04-22T13:18:13.128+0000 7ff5db5fd700 10 mds.0.cache.dir(0x10000001fb2) freeze_dir + wait [dir 0x10000001fb2 /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ [2,head] auth v=344 cv=115/115 ap=2+0 state=1610878993|complete|freezingdir|fragmenting f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1) n(v1 rc2024-04-22T13:17:56.179118+0000 b639400 57=56+1) hs=56+0,ss=55+0 dirty=111 | child=1 dirty=1 authpin=1 0x563bc6330d00]
However, following our talk about the recursive call to `can_auth_pin`, I'm now less confident that this could be our issue. The point is that here we can see the directory had been frozen before we began quiescing:
2024-04-22T13:18:13.128+0000 7ff5db5fd700 10 mds.0.cache.dir(0x10000001fb2) auth_unpin by 0x563bc6330d00 on [dir 0x10000001fb2 /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ [2,head] auth v=344 cv=115/115 state=1610895369|complete|frozendir|fragmenting|dnpinnedfrag f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1) n(v1 rc2024-04-22T13:17:56.179118+0000 b639400 57=56+1) hs=56+0,ss=55+0 dirty=111 | child=1 frozen=1 dirty=1 authpin=0 0x563bc6330d00] count now 0
...
2024-04-22T13:18:14.263+0000 7ff5dce00700 20 mds.0.cache dispatch_quiesce_inode request(mds.0:3370 nref=3) quiescing [inode 0x10000001fb2 [...5,head] /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ auth v620 pv624 ap=3 f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1)->f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1) n(v1 rc2024-04-22T13:17:56.179118+0000 b639400 58=56+2)->n(v2 rc2024-04-22T13:17:56.179118+0000 b639400 58=56+2) old_inodes=1 (inest lock->sync w=1 flushing) (ifile excl) (iquiesce lock x=1 by request(mds.0:3370 nref=4)) caps={15195=pAsLsXs/pAsLsXsFsxcral@58},l=15195 | ptrwaiter=0 dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 waiter=1 authpin=1 0x563bcc332c00]
...
2024-04-22T13:18:14.263+0000 7ff5dce00700 10 mds.0.locker already auth_pinned [inode 0x10000001fb2 [...5,head] /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ auth v620 pv624 ap=3 f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1)->f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1) n(v1 rc2024-04-22T13:17:56.179118+0000 b639400 58=56+2)->n(v2 rc2024-04-22T13:17:56.179118+0000 b639400 58=56+2) old_inodes=1 (inest lock->sync w=1 flushing) (ifile excl) (iquiesce lock x=1 by request(mds.0:3370 nref=5)) caps={15195=pAsLsXs/pAsLsXsFsxcral@58},l=15195 | ptrwaiter=0 dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 waiter=1 authpin=1 0x563bcc332c00]
...
(still frozen)
2024-04-22T13:18:19.265+0000 7ff5d65f3700 20 mds.0.scrubstack scrub_dir_inode freezing/frozen [dir 0x10000001fb2 /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ [2,head] auth v=344 cv=115/115 state=1610895369|complete|frozendir|fragmenting|dnpinnedfrag f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1) n(v1 rc2024-04-22T13:17:56.179118+0000 b639400 57=56+1) hs=56+0,ss=55+0 dirty=111 | child=1 frozen=1 dirty=1 waiter=1 authpin=0 0x563bc6330d00]
Why do we have authpin=1 on the inode there?
2024-04-22T13:18:14.122+0000 7ff5d65f3700 15 mds.0.journal try_to_expire waiting for unfreeze on [dir 0x10000001fb2 /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ [2,head] auth v=344 cv=115/115 state=1610895369|complete|frozendir|fragmenting|dnpinnedfrag f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1) n(v1 rc2024-04-22T13:17:56.179118+0000 b639400 57=56+1) hs=56+0,ss=55+0 dirty=111 | child=1 frozen=1 dirty=1 waiter=1 authpin=0 0x563bc6330d00]
2024-04-22T13:18:14.123+0000 7ff5d65f3700 10 mds.0.journal try_to_expire waiting for nest flush on [inode 0x10000001fb2 [...5,head] /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ auth v620 ap=3 f(v0 m2024-04-22T13:17:56.178118+0000 56=55+1) n(v1 rc2024-04-22T13:17:56.179118+0000 b639400 58=56+2) old_inodes=1 (inest lock dirty) (ifile excl->lock) (iquiesce lock x=1 by request(mds.0:3370 nref=4)) caps={15195=pAsLsXs/pAsLsXsFsx/pAsLsXsFsxcral@58},l=15195 | ptrwaiter=0 dirtyscattered=1 request=0 lock=1 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 waiter=1 authpin=1 0x563bcc332c00]
Here we see different authpin counts for the `dir` and the `inode`. Maybe this is what you mentioned before about freezing the inode vs. freezing the `CDir`?
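A quick way to spot this kind of mismatch across a large log is a throwaway scan of the debug dumps. The sketch below is hypothetical and not part of the Ceph tree; only the `[dir 0x...]`/`[inode 0x...]` dump prefixes and the `authpin=` field are taken from the output quoted above, and it assumes one dump per line:

```python
import re

# Pull the last-seen authpin count per cache object out of MDS debug dumps.
# The patterns mirror the "[dir 0x...]" / "[inode 0x...]" dumps quoted in
# this ticket; this helper itself is a hypothetical triage aid, not Ceph code.
OBJ_RE = re.compile(r'\[(dir|inode) (0x[0-9a-f]+)')
AUTHPIN_RE = re.compile(r'authpin=(\d+)')

def authpin_counts(log_lines):
    """Return {(kind, ino): last-seen authpin count} over an iterable of lines."""
    counts = {}
    for line in log_lines:
        obj = OBJ_RE.search(line)
        pin = AUTHPIN_RE.search(line)
        if obj and pin:
            counts[(obj.group(1), obj.group(2))] = int(pin.group(1))
    return counts
```

For dumps like the ones above, this would report authpin 0 for the `dir` of 0x10000001fb2 next to authpin 1 for its `inode`, which is exactly the discrepancy in question.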
Updated by Leonid Usov 12 days ago
leonidus@teuthology:/a/leonidus-2024-04-22_12:36:42-fs-wip-lusov-quiescer-distro-default-smithi/7668264$ gzip -cd remote/smithi178/log/e540e406-00a8-11ef-bc93-c7b262605968/ceph-mds.b.log.gz | python3 ~/qlog.py 16b08332
looking for the first mention of '16b08332'
4937493: 2024-04-22T13:18:05.771+0000 7ff5d3dee700 15 quiesce.mgr.24506 <leader_update_set> [16b08332@0] updated to match the min state of the remaining (1) members: QS_QUIESCING (1)
8126288: 2024-04-22T13:19:05.772+0000 7ff5d3dee700 10 quiesce.mgr.24506 <leader_upkeep_awaits> completing an await for the set '16b08332' with rc: 110
[ 0x1] since 2024-04-22 13:18:05.772000: /
[0x10000000000] since 2024-04-22 13:18:05.775000: /volumes/
[0x10000000001] since 2024-04-22 13:18:05.776000: /volumes/qa/
[0x100000001fa] since 2024-04-22 13:18:05.777000: /volumes/qa/sv_1/
[0x100000001fb] since 2024-04-22 13:18:05.777000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/
[0x10000000005] since 2024-04-22 13:18:08.926000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/
[0x10000000006] since 2024-04-22 13:18:09.854000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/
[0x10000000007] since 2024-04-22 13:18:09.863000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/
[0x10000000202] since 2024-04-22 13:18:09.874000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/
[0x10000000204] since 2024-04-22 13:18:09.897000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/
[0x10000001f2c] since 2024-04-22 13:18:09.967000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/
[0x10000001fa5] since 2024-04-22 13:18:10.239000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/
[0x10000001fab] since 2024-04-22 13:18:10.760000:
/volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ [0x10000001fb2] since 2024-04-22 13:18:11.691000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ [0x10000001fe8] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/fileops.h [0x10000001fe9] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/rbt.h [0x10000001fc8] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/rwlock.h [0x10000001fca] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/examples/ [0x10000001fd4] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/depcomp [0x10000001fb9] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/cirlist.c [0x10000001fe4] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/rand.c [0x10000001fe0] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/fileops.c [0x10000001fb4] since 2024-04-22 
13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb.h [0x10000001fc0] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/filelist.h [0x10000001fbf] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/filelist.c [0x10000001fc2] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/config.guess [0x10000001fe3] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/fh.c [0x10000001fe2] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/registration-description [0x10000001fbb] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_tg.c [0x10000001fc4] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/metaops.c [0x10000001fba] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/config.h.in [0x10000001fe5] since 2024-04-22 13:18:14.264000: 
/volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/aclocal.m4 [0x10000001fdd] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/config.sub [0x10000001fce] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_fs.c [0x10000001fb8] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/LICENSE [0x10000001fd6] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/stamp-h.in [0x10000001fd1] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_fc.c [0x10000001fd2] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/install-sh [0x10000001fd8] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/Makefile.in [0x10000001fcd] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_fs.h [0x10000001fc5] since 2024-04-22 13:18:14.264000: 
/volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/parser.h [0x10000001fbd] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_op.c [0x10000001fc7] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_op.h [0x10000001fdf] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/rbt.c [0x10000001fb6] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/util.h [0x10000001fd7] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/metaops.h [0x10000001fc3] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_stats.c [0x10000001fda] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/public-description [0x10000001fea] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/Makefile.am [0x10000001fde] since 2024-04-22 13:18:14.264000: 
/volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/INSTALL [0x10000001fdc] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/AUTHORS [0x10000001fe6] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/main.c [0x10000001fc1] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_thread.h [0x10000001fdb] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/configure.in [0x10000001fd9] since 2024-04-22 13:18:14.264000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_tg.h [0x10000001fb7] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/mkinstalldirs [0x10000001fcc] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/README [0x10000001fd5] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_thread.c [0x10000001fc9] since 2024-04-22 13:18:14.265000: 
/volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/cirlist.h [0x10000001fb5] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/util.c [0x10000001fbc] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/parser.c [0x10000001fc6] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/rwlock.c [0x10000001fe7] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/ffsb_stats.h [0x10000001fd0] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/missing [0x10000001fcf] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/list.c [0x10000001fbe] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/configure [0x10000001fd3] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/list.h [0x10000001feb] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/rand.h 
[0x10000001fb3] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/COPYING [0x10000001fe1] since 2024-04-22 13:18:14.265000: /volumes/qa/sv_1/4e8b897f-0e18-4fab-b6f5-72008b3927bc/client.0/tmp/fsstress/ltp-full-20091231/testcases/kernel/fs/ext4-new-features/ffsb-6.0-rc2/fh.h total pending: 70 complete (min/max/avg/stdev): 11949 (0.0/11.308/2.3/1.1)
Updated by Leonid Usov 12 days ago
Another case: https://pulpito.ceph.com/leonidus-2024-04-22_12:36:42-fs-wip-lusov-quiescer-distro-default-smithi/7668294
2024-04-22T13:58:22.827 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't quiesce root with rc: -110 (ETIMEDOUT), stdout:
{
    "epoch": 17,
    "leader": 4318,
    "set_version": 74,
    "sets": {
        "27dc9b58": {
            "version": 74,
            "age_ref": 0.0,
            "state": { "name": "TIMEDOUT", "age": 0.0 },
            "timeout": 60.0,
            "expiration": 78.6,
            "members": {
                "file:/": {
                    "excluded": false,
                    "state": { "name": "QUIESCING", "age": 60.0 }
                }
            }
        }
    }
}
2024-04-22T13:57:23.399+0000 7fd27a7bf640 10 mds.0.cache.dir(0x1000000cb95) freeze_dir + wait [dir 0x1000000cb95 /client.0/tmp/linux-2.6.33/arch/blackfin/include/asm/ [2,head] auth v=261 cv=0/0 ap=2+0 state=1610878993|complete|freezingdir|fragmenting f(v0 m2024-04-22T13:57:18.472722+0000 130=130+0) n(v0 rc2024-04-22T13:57:18.472722+0000 130=130+0) hs=130+0,ss=0+0 dirty=130 | child=1 dirty=1 authpin=1 0x55b26345ad80]
2024-04-22T13:57:23.399+0000 7fd27a7bf640 10 mds.0.cache fragment_mark_and_complete [0x55b26345ad80] on [inode 0x1000000cb95 [...300,head] /client.0/tmp/linux-2.6.33/arch/blackfin/include/asm/ auth v262 ap=2 DIRTYPARENT f(v0 m2024-04-22T13:57:18.472722+0000 130=130+0) n(v0 rc2024-04-22T13:57:18.472722+0000 131=130+1) (inest lock) (ifile excl->lock) (iquiesce lock x=1 by request(mds.0:51939 nref=4)) caps={4809=pAsLsXs/pAsLsXsFsx/-@133},l=4809 | request=0 lock=1 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x55b25b988000]
Updated by Leonid Usov 12 days ago · Edited
Another one: https://pulpito.ceph.com/leonidus-2024-04-22_12:36:42-fs-wip-lusov-quiescer-distro-default-smithi/7668295
2024-04-22T13:49:55.390 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't quiesce root with rc: -110 (ETIMEDOUT), stdout:
{
    "epoch": 17,
    "leader": 4290,
    "set_version": 34,
    "sets": {
        "4bf6425d": {
            "version": 34,
            "age_ref": 0.0,
            "state": { "name": "TIMEDOUT", "age": 0.0 },
            "timeout": 60.0,
            "expiration": 86.9,
            "members": {
                "file:/": {
                    "excluded": false,
                    "state": { "name": "QUIESCING", "age": 60.0 }
                }
            }
        }
    }
}
[ 0x1] since 2024-04-22 13:48:55.386000: /
[0x10000000000] since 2024-04-22 13:48:55.396000: /client.0/
[0x100000017b0] since 2024-04-22 13:48:55.421000: /client.0/tmp/
[0x100000017b1] since 2024-04-22 13:48:55.423000: /client.0/tmp/ceph/
[0x10000002fe2] since 2024-04-22 13:48:55.425000: /client.0/tmp/ceph/src/
[0x100000048ca] since 2024-04-22 13:48:55.444000: /client.0/tmp/ceph/src/test/
[0x100000078b0] since 2024-04-22 13:48:55.444000: /client.0/tmp/ceph/src/log/
[0x100000082a8] since 2024-04-22 13:48:55.444000: /client.0/tmp/ceph/src/sample.ceph.conf
[0x10000008548] since 2024-04-22 13:48:55.444000: /client.0/tmp/ceph/src/cls_crypto.cc
[0x1000000846f] since 2024-04-22 13:48:55.444000: /client.0/tmp/ceph/src/logrotate.conf
[0x100000083f1] since 2024-04-22 13:48:55.444000: /client.0/tmp/ceph/src/testmsgr.cc
[0x100000039ec] since 2024-04-22 13:48:55.444000: /client.0/tmp/ceph/src/messages/
[0x10000008394] since 2024-04-22 13:48:55.648000: /client.0/tmp/ceph/src/messages/MTimeCheck.h
total pending: 13 complete (min/max/avg/stdev): 1859 (0.0/1.041/0.36/0.18)
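For reference, the trailing `total pending: ... complete (min/max/avg/stdev): ...` summary line can be reproduced from a list of per-root completion durations. This is only a guess at the aggregation behind the helper's output; the helper name, the rounding, and the population-vs-sample stdev choice are all assumptions:

```python
import statistics

def summarize(pending, complete_durations):
    """Render a summary in the shape of the qlog output above.

    `complete_durations` holds the per-root quiesce completion times in
    seconds; `pending` is the count of roots still QUIESCING (hypothetical
    inputs, modeled on the dump format in this ticket).
    """
    mn = min(complete_durations)
    mx = max(complete_durations)
    avg = statistics.fmean(complete_durations)
    sd = statistics.pstdev(complete_durations)
    return (f"total pending: {pending} complete (min/max/avg/stdev): "
            f"{len(complete_durations)} ({mn:.1f}/{mx:.3f}/{avg:.2f}/{sd:.2f})")
```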
Updated by Patrick Donnelly 11 days ago
- Status changed from New to In Progress
- Pull request ID set to 57059
Updated by Patrick Donnelly 10 days ago
- Subject changed from Quiesce timeout due to a freezing directory to mds: quiesce timeout due to a freezing directory
- Status changed from In Progress to Fix Under Review
- Component(FS) MDS added
- Labels (FS) qa-failure added
Updated by Patrick Donnelly 3 days ago
- Category set to Correctness/Safety
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 3 days ago
- Copied to Backport #65738: squid: mds: quiesce timeout due to a freezing directory added