Bug #64707


suites/fsstress.sh hangs on one client - test times out

Added by Venky Shankar 2 months ago. Updated 3 days ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: quincy,reef,squid
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(FS): MDS, kceph
Labels (FS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

https://pulpito.ceph.com/vshankar-2024-03-04_08:26:39-fs-wip-vshankar-testing-20240304.042522-testing-default-smithi/7580951/

Description: fs/upgrade/mds_upgrade_sequence/{bluestore-bitmap centos_9.stream conf/{client mds mon osd} fail_fs/yes overrides/{ignorelist_health ignorelist_upgrade ignorelist_wrongly_marked_down pg-warn syntax} roles tasks/{0-from/quincy 1-volume/{0-create 1-ranks/1 2-allow_standby_replay/no 3-inline/yes 4-verify} 2-client/kclient 3-upgrade-mgr-staggered 4-config-upgrade/{fail_fs} 5-upgrade-with-workload 6-verify}}

client.1 runs to completion; however, fsstress on client.0 times out with the following last few operations:

2024-03-04T11:06:15.538 INFO:tasks.workunit.client.0.smithi078.stdout:5/993: creat d15/d64/d4d/d49/dda/ddc/f139 x:0 0 0
2024-03-04T11:06:15.541 INFO:tasks.workunit.client.0.smithi078.stdout:5/994: getdents d15/d2e/d11d 0
2024-03-04T11:06:15.554 INFO:tasks.workunit.client.0.smithi078.stdout:5/995: dread d15/d2e/f9a [0,4194304] 0
2024-03-04T11:06:15.554 INFO:tasks.workunit.client.0.smithi078.stdout:5/996: chown d15/d2e/l12e 14317335 1
2024-03-04T11:06:15.558 INFO:tasks.workunit.client.0.smithi078.stdout:5/997: mknod d15/d40/dbe/dcb/dcc/c13a 0
2024-03-04T11:06:15.558 INFO:tasks.workunit.client.0.smithi078.stdout:5/998: write d15/d40/fe0 [5130847,44776] 0
2024-03-04T11:06:15.560 INFO:tasks.workunit.client.0.smithi078.stdout:5/999: mkdir d15/d64/d4d/d49/d73/d13b 0
2024-03-04T11:06:16.193 INFO:journalctl@ceph.mon.smithi078.smithi078.stdout:Mar 04 11:06:15 smithi078 ceph-mon[29227]: pgmap v48: 65 pgs: 65 active+clean; 3.8
2024-03-04T14:04:14.958 DEBUG:teuthology.orchestra.run:got remote process result: 124
2024-03-04T14:04:14.959 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.0...
2024-03-04T14:04:14.959 DEBUG:teuthology.orchestra.run.smithi078:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0

Interestingly, the kernel ring buffers are empty:

- ./remote/smithi078/syslog/kern.log.gz
- ./remote/smithi175/syslog/kern.log.gz

which should not be the case - there should be something in them, shouldn't there?


Related issues: 2 (2 open, 0 closed)

Related to CephFS - Bug #50821: qa: untar_snap_rm failure during mds thrashing (Fix Under Review; Xiubo Li)

Related to CephFS - Fix #52916: mds,client: formally remove inline data support (In Progress; Milind Changire)

Actions #1

Updated by Venky Shankar about 1 month ago

Venky Shankar wrote:

https://pulpito.ceph.com/vshankar-2024-03-04_08:26:39-fs-wip-vshankar-testing-20240304.042522-testing-default-smithi/7580951/

[...]

client.1 runs to completion; however, fsstress on client.0 times out with the following last few operations:

[...]

[...]

Interestingly, the kernel ring buffers are empty:

- ./remote/smithi078/syslog/kern.log.gz
- ./remote/smithi175/syslog/kern.log.gz

which should not be the case - there should be something in them, shouldn't there?

That's also the case for untar_snap_rm failure - https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612983/

empty kernel ring buffer. I had communicated this to Xiubo a few weeks ago.

Actions #2

Updated by Venky Shankar about 1 month ago

Venky Shankar wrote:

Venky Shankar wrote:

https://pulpito.ceph.com/vshankar-2024-03-04_08:26:39-fs-wip-vshankar-testing-20240304.042522-testing-default-smithi/7580951/

[...]

client.1 runs to completion; however, fsstress on client.0 times out with the following last few operations:

[...]

[...]

Interestingly, the kernel ring buffers are empty:

- ./remote/smithi078/syslog/kern.log.gz
- ./remote/smithi175/syslog/kern.log.gz

which should not be the case - there should be something in them, shouldn't there?

That's also the case for untar_snap_rm failure - https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612983/

empty kernel ring buffer. I had communicated this to Xiubo a few weeks ago.

Xiubo mentioned over Slack that the empty kernel ring buffers aren't an issue (log copy issues from remotes) - we can always check the kernel logs from console_logs.

So, back to the issues:

They seem unrelated now - the only thing clubbing both failures together was the empty kernel ring buffer, which is now a non-issue (at least not a serious one).

Actions #3

Updated by Venky Shankar about 1 month ago

  • Related to Bug #50821: qa: untar_snap_rm failure during mds thrashing added
Actions #4

Updated by Xiubo Li about 1 month ago

The MDS log files are full of client_request(client.14524:1494739 getattr Fsr #0x1000000587b ...) entries - the same getattr requests from the same client:

2024-03-04T11:20:13.346+0000 7f6844000700  1 -- [v2:172.21.15.78:6828/2596812178,v1:172.21.15.78:6829/2596812178] <== client.14524 v1:192.168.0.1:0/3128030669 1523336 ==== client_request(client.14524:1494739 getattr Fsr #0x1000000587b 2024-03-04T11:20:13.346283+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6 ==== 200+0+0 (unknown 1479355263 0 0) 0x558c4fc64b00 con 0x558c4bd45c00
2024-03-04T11:20:13.346+0000 7f6844000700  4 mds.0.server handle_client_request client_request(client.14524:1494739 getattr Fsr #0x1000000587b 2024-03-04T11:20:13.346283+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.12 get_session have 0x558c4bd8cf00 client.14524 v1:192.168.0.1:0/3128030669 state open
2024-03-04T11:20:13.346+0000 7f6844000700 15 mds.0.server  oldest_client_tid=1494739
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.cache request_start request(client.14524:1494739 nref=2 cr=0x558c4fc64b00)
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.server dispatch_client_request client_request(client.14524:1494739 getattr Fsr #0x1000000587b 2024-03-04T11:20:13.346283+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.cache traverse: opening base ino 0x1000000587b snap head
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.cache no tail dentry, base [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478309},l=14524 | ptrwaiter=0 request=0 lock=0 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.cache found target [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478309},l=14524 | ptrwaiter=0 request=0 lock=0 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.cache path_traverse finish on snapid head
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.server rdlock_path_pin_ref request(client.14524:1494739 nref=3 cr=0x558c4fc64b00) #0x1000000587b
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.cache traverse: opening base ino 0x1000000587b snap head
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.locker try_rdlock_snap_layout request(client.14524:1494739 nref=4 cr=0x558c4fc64b00) [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478309},l=14524 | ptrwaiter=0 request=1 lock=0 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=1) [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478309},l=14524 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=2) [inode 0x10000005893 [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/ auth v173 DIRTYPARENT f(v0 m2024-03-04T11:05:41.918724+0000 6=3+3) n(v2 rc2024-03-04T11:05:43.069705+0000 b13262684 11=7+4) (isnap sync r=2) (inest lock) (ipolicy sync r=1) (iversion lock) caps={14524=pAsLsXsFs/-@23},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c51c49080]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=2) [inode 0x1000000582d [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/ auth v206 DIRTYPARENT f(v0 m2024-03-04T11:05:42.923707+0000 6=1+5) n(v4 rc2024-03-04T11:05:43.069705+0000 b28422893 40=29+11) (isnap sync r=2) (inest lock) (ipolicy sync r=1) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@17},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c536a4000]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=2) [inode 0x10000005829 [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/ auth v203 DIRTYPARENT f(v0 m2024-03-04T11:05:40.999740+0000 7=6+1) n(v5 rc2024-03-04T11:05:43.069705+0000 b36816291 45=33+12) (isnap sync r=2) (inest lock dirty) (ipolicy sync r=1) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@19},l=14524 | dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c509fc100]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=2) [inode 0x1000000583f [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/ auth v222 DIRTYPARENT f(v0 m2024-03-04T11:05:42.872708+0000 9=7+2) n(v4 rc2024-03-04T11:05:43.069705+0000 b47251207 56=41+15) (isnap sync r=2) (inest lock) (ipolicy sync r=1) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@19},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54fb0580]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=3) [inode 0x10000005828 [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/ auth v261 DIRTYPARENT f(v0 m2024-03-04T11:05:42.872708+0000 9=7+2) n(v10 rc2024-03-04T11:05:43.081705+0000 b47255954 65=48+17) (isnap sync r=3) (inest lock) (ipolicy sync r=2) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@15},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 authpin=0 0x558c5320c100]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=5) [inode 0x1000000581d [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/ auth v272 DIRTYPARENT f(v0 m2024-03-04T11:05:41.023739+0000 14=12+2) n(v9 rc2024-03-04T11:05:43.081705+0000 b59249536 84=65+19) (isnap sync r=5) (inest lock dirty) (ipolicy sync r=3) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@33},l=14524 | dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54f43080]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=5) [inode 0x100000057fd [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/ auth v1929 DIRTYPARENT f(v0 m2024-03-04T11:05:30.089925+0000 5=4+1) n(v11 rc2024-03-04T11:05:43.081705+0000 b71140230 88=68+20) (isnap sync r=5) (inest lock dirty) (ipolicy sync r=3) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@15},l=14524 | dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c543c9600]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=99) [inode 0x100000057e6 [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/ auth v472 DIRTYPARENT f(v0 m2024-03-04T11:05:23.141043+0000 10=0+10) n(v25 rc2024-03-04T11:06:15.563154+0000 b1926360674 2137=1696+441) (isnap sync r=99) (inest lock) (ipolicy sync r=80) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@15},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 authpin=0 0x558c524c4c00]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=1210) [inode 0x10000000001 [...2,head] /client.0/tmp/ auth v323 f(v0 m2024-03-04T11:05:23.086044+0000 2=0+2) n(v30 rc2024-03-04T11:06:15.563154+0000 b1996184534 14090=12343+1747) (isnap sync r=1210) (inest lock) (ipolicy sync r=1191) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@3},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 authpin=0 0x558c4be48580]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=1210) [inode 0x10000000000 [...2,head] /client.0/ auth v709 f(v0 m2024-03-04T11:04:13.692241+0000 1=0+1) n(v28 rc2024-03-04T11:06:15.563154+0000 b1996184534 14091=12343+1748) (isnap sync r=1210) (inest lock) (ipolicy sync r=1191) (iversion lock) caps={14524=pAsLsXsFs/-@3},l=14524 | ptrwaiter=0 dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 authpin=0 0x558c4bb87180]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.locker  got rdlock on (isnap sync r=1210) [inode 0x1 [...2,head] / auth v200 snaprealm=0x558c4bd46000 f(v0 m2024-03-04T11:02:44.773897+0000 2=0+2) n(v41 rc2024-03-04T11:07:35.753304+0000 b1996184534 14093=12343+1750)/n(v0 rc2024-03-04T11:01:52.968331+0000 1=0+1) (isnap sync r=1210) (inest lock) (ipolicy sync r=1191) (iversion lock) caps={14524=pAsLsXsFs/pAsLsXsFsxcral@82,24287=pAsLsXsFs/pAsLsXsFsxcral@13} | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 openingsnapparents=0 dirtyparent=0 dirty=1 waiter=0 authpin=0 0x558c4bd12100]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.cache no tail dentry, base [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478309},l=14524 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.cache found target [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478309},l=14524 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.cache path_traverse finish on snapid head
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.server ref is [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478309},l=14524 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.locker acquire_locks request(client.14524:1494739 nref=3 cr=0x558c4fc64b00)
2024-03-04T11:20:13.346+0000 7f6844000700 20 Session check_access path /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38
2024-03-04T11:20:13.346+0000 7f6844000700 10 MDSAuthCap is_capable inode(path /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 owner 1000:1301 mode 0100666) by caller 1000:1301 mask 1 new 0:0 cap: MDSAuthCaps[allow *]
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.bal hit_dir 0 pop is 14001.6, frag * size 6 [pop IRD:[C 1.40e+04] IWR:[C 0.00e+00] RDR:[C 0.00e+00] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:14001.6]

2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.server reply to stat on client_request(client.14524:1494739 getattr Fsr #0x1000000587b 2024-03-04T11:20:13.346283+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.server respond_to_request batch head request(client.14524:1494739 nref=3 cr=0x558c4fc64b00)
2024-03-04T11:20:13.346+0000 7f6844000700 20 respond: responding to batch ops with result=0: [batch front=request(client.14524:1494739 nref=3 cr=0x558c4fc64b00)]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.14524:1494739 getattr Fsr #0x1000000587b 2024-03-04T11:20:13.346283+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.server lat 0.000172
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.server set_trace_dist snapid head
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.server set_trace_dist snaprealm snaprealm(0x1000000587b seq 1 lc 0 cr 1 cps 2 snaps={} 0x558c5154c800) len=96
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.cache.ino(0x1000000587b)  pfile 1 pauth 1 plink 1 pxattr 1 plocal 0 mtime 2024-03-04T11:05:41.106738+0000 ctime 2024-03-04T11:05:41.106738+0000 change_attr 10 valid=1
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.cache.ino(0x1000000587b) encode_inodestat issuing pAsxLsXsxFsxcrwb seq 1478310
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.cache.ino(0x1000000587b) encode_inodestat caps pAsxLsXsxFsxcrwb seq 1478310 mseq 0 xattrv 0
2024-03-04T11:20:13.346+0000 7f6844000700 20 mds.0.server set_trace_dist added in   [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478310},l=14524 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.12 send_message_client client.14524 v1:192.168.0.1:0/3128030669 client_reply(???:1494739 = 0 (0) Success) v1
2024-03-04T11:20:13.346+0000 7f6844000700  1 -- [v2:172.21.15.78:6828/2596812178,v1:172.21.15.78:6829/2596812178] --> v1:192.168.0.1:0/3128030669 -- client_reply(???:1494739 = 0 (0) Success) v1 -- 0x558c4c432c40 con 0x558c4bd45c00
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.cache request_finish request(client.14524:1494739 nref=3 cr=0x558c4fc64b00)
2024-03-04T11:20:13.346+0000 7f6844000700 15 mds.0.cache request_cleanup request(client.14524:1494739 nref=3 cr=0x558c4fc64b00)
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=1209) on [inode 0x10000000000 [...2,head] /client.0/ auth v709 f(v0 m2024-03-04T11:04:13.692241+0000 1=0+1) n(v28 rc2024-03-04T11:06:15.563154+0000 b1996184534 14091=12343+1748) (isnap sync r=1209) (inest lock) (ipolicy sync r=1191) (iversion lock) caps={14524=pAsLsXsFs/-@3},l=14524 | ptrwaiter=0 dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 authpin=0 0x558c4bb87180]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=1209) on [inode 0x1 [...2,head] / auth v200 snaprealm=0x558c4bd46000 f(v0 m2024-03-04T11:02:44.773897+0000 2=0+2) n(v41 rc2024-03-04T11:07:35.753304+0000 b1996184534 14093=12343+1750)/n(v0 rc2024-03-04T11:01:52.968331+0000 1=0+1) (isnap sync r=1209) (inest lock) (ipolicy sync r=1191) (iversion lock) caps={14524=pAsLsXsFs/pAsLsXsFsxcral@82,24287=pAsLsXsFs/pAsLsXsFsxcral@13} | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 openingsnapparents=0 dirtyparent=0 dirty=1 waiter=0 authpin=0 0x558c4bd12100]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=1209) on [inode 0x10000000001 [...2,head] /client.0/tmp/ auth v323 f(v0 m2024-03-04T11:05:23.086044+0000 2=0+2) n(v30 rc2024-03-04T11:06:15.563154+0000 b1996184534 14090=12343+1747) (isnap sync r=1209) (inest lock) (ipolicy sync r=1191) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@3},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=0 dirty=1 authpin=0 0x558c4be48580]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=1) on [inode 0x10000005829 [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/ auth v203 DIRTYPARENT f(v0 m2024-03-04T11:05:40.999740+0000 7=6+1) n(v5 rc2024-03-04T11:05:43.069705+0000 b36816291 45=33+12) (isnap sync r=1) (inest lock dirty) (ipolicy sync r=1) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@19},l=14524 | dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c509fc100]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=1) on [inode 0x10000005893 [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/ auth v173 DIRTYPARENT f(v0 m2024-03-04T11:05:41.918724+0000 6=3+3) n(v2 rc2024-03-04T11:05:43.069705+0000 b13262684 11=7+4) (isnap sync r=1) (inest lock) (ipolicy sync r=1) (iversion lock) caps={14524=pAsLsXsFs/-@23},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c51c49080]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=98) on [inode 0x100000057e6 [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/ auth v472 DIRTYPARENT f(v0 m2024-03-04T11:05:23.141043+0000 10=0+10) n(v25 rc2024-03-04T11:06:15.563154+0000 b1926360674 2137=1696+441) (isnap sync r=98) (inest lock) (ipolicy sync r=80) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@15},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 authpin=0 0x558c524c4c00]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=2) on [inode 0x10000005828 [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/ auth v261 DIRTYPARENT f(v0 m2024-03-04T11:05:42.872708+0000 9=7+2) n(v10 rc2024-03-04T11:05:43.081705+0000 b47255954 65=48+17) (isnap sync r=2) (inest lock) (ipolicy sync r=2) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@15},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 authpin=0 0x558c5320c100]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=1) on [inode 0x1000000582d [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/ auth v206 DIRTYPARENT f(v0 m2024-03-04T11:05:42.923707+0000 6=1+5) n(v4 rc2024-03-04T11:05:43.069705+0000 b28422893 40=29+11) (isnap sync r=1) (inest lock) (ipolicy sync r=1) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@17},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c536a4000]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=4) on [inode 0x100000057fd [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/ auth v1929 DIRTYPARENT f(v0 m2024-03-04T11:05:30.089925+0000 5=4+1) n(v11 rc2024-03-04T11:05:43.081705+0000 b71140230 88=68+20) (isnap sync r=4) (inest lock dirty) (ipolicy sync r=3) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@15},l=14524 | dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c543c9600]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync) on [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478310},l=14524 | ptrwaiter=0 request=1 lock=0 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x1000000587b [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/d7c/d13/d2b/f38 auth v95 snaprealm=0x558c5154c800 DIRTYPARENT s=8388608 n(v0 rc2024-03-04T11:05:41.106738+0000 b8388608 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={14524=0-20971520@1} caps={14524=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@1478310},l=14524 | ptrwaiter=0 request=1 lock=0 caps=1 remoteparent=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54cc7b80]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=4) on [inode 0x1000000581d [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/ auth v272 DIRTYPARENT f(v0 m2024-03-04T11:05:41.023739+0000 14=12+2) n(v9 rc2024-03-04T11:05:43.081705+0000 b59249536 84=65+19) (isnap sync r=4) (inest lock dirty) (ipolicy sync r=3) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@33},l=14524 | dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54f43080]
2024-03-04T11:20:13.346+0000 7f6844000700  7 mds.0.locker rdlock_finish on (isnap sync r=1) on [inode 0x1000000583f [...2,head] /client.0/tmp/tmp.Yz8EkYlwmi/p7/d8/d10/d1a/ auth v222 DIRTYPARENT f(v0 m2024-03-04T11:05:42.872708+0000 9=7+2) n(v4 rc2024-03-04T11:05:43.069705+0000 b47251207 56=41+15) (isnap sync r=1) (inest lock) (ipolicy sync r=1) (ifile excl) (iversion lock) caps={14524=pAsLsXsFsx/pAsLsXsFsxcral@19},l=14524 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x558c54fb0580]
2024-03-04T11:20:13.347+0000 7f6844000700  1 -- [v2:172.21.15.78:6828/2596812178,v1:172.21.15.78:6829/2596812178] <== client.14524 v1:192.168.0.1:0/3128030669 1523337 ==== client_request(client.14524:1494740 getattr Fsr #0x1000000587b 2024-03-04T11:20:13.346283+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6 ==== 200+0+0 (unknown 2752217041 0 0) 0x558c4f718dc0 con 0x558c4bd45c00
2024-03-04T11:20:13.347+0000 7f6844000700  4 mds.0.server handle_client_request client_request(client.14524:1494740 getattr Fsr #0x1000000587b 2024-03-04T11:20:13.346283+0000 caller_uid=1000, caller_gid=1301{6,36,1000,1301,}) v6
2024-03-04T11:20:13.347+0000 7f6844000700 20 mds.0.12 get_session have 0x558c4bd8cf00 client.14524 v1:192.168.0.1:0/3128030669 state open
2024-03-04T11:20:13.347+0000 7f6844000700 15 mds.0.server  oldest_client_tid=1494740
...

This should be the reason why the test timed out, but it's still not clear why the kclient did this.
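
As a quick sanity check (a rough sketch, not from the original analysis - the log path and pattern below are illustrative), the getattr requests in the MDS log can be grouped by tid to see whether one request is being retransmitted or whether the client keeps issuing new ones:

# count getattr client_requests per tid in a locally downloaded MDS log (filename is hypothetical)
zgrep -o 'client_request(client\.14524:[0-9]* getattr' ceph-mds.*.log.gz | sort | uniq -c | sort -rn | head

If each tid shows up only once but the tids keep increasing, the client is generating a stream of fresh getattr requests rather than retrying the same one.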

Actions #5

Updated by Xiubo Li about 1 month ago

The inline_data feature, which is buggy as I mentioned before, is enabled:

2024-03-04T11:10:01.360 INFO:journalctl@ceph.mon.smithi175.smithi175.stdout:Mar 04 11:10:00 smithi175 ceph-mon[36098]: Health detail: HEALTH_WARN 1 filesystem with deprecated feature inline_data
2024-03-04T11:10:01.360 INFO:journalctl@ceph.mon.smithi175.smithi175.stdout:Mar 04 11:10:00 smithi175 ceph-mon[36098]: [WRN] FS_INLINE_DATA_DEPRECATED: 1 filesystem with deprecated feature inline_data
2024-03-04T11:10:01.360 INFO:journalctl@ceph.mon.smithi175.smithi175.stdout:Mar 04 11:10:00 smithi175 ceph-mon[36098]:     fs cephfs has deprecated feature inline_data enabled.
2024-03-04T11:10:01.362 INFO:journalctl@ceph.mon.smithi078.smithi078.stdout:Mar 04 11:10:00 smithi078 ceph-mon[29227]: pgmap v91: 65 pgs: 65 active+clean; 2.0 GiB data, 7.6 GiB used, 529 GiB / 536 GiB avail
2024-03-04T11:10:01.362 INFO:journalctl@ceph.mon.smithi078.smithi078.stdout:Mar 04 11:10:00 smithi078 ceph-mon[29227]: Health detail: HEALTH_WARN 1 filesystem with deprecated feature inline_data
2024-03-04T11:10:01.362 INFO:journalctl@ceph.mon.smithi078.smithi078.stdout:Mar 04 11:10:00 smithi078 ceph-mon[29227]: [WRN] FS_INLINE_DATA_DEPRECATED: 1 filesystem with deprecated feature inline_data
2024-03-04T11:10:01.362 INFO:journalctl@ceph.mon.smithi078.smithi078.stdout:Mar 04 11:10:00 smithi078 ceph-mon[29227]:     fs cephfs has deprecated feature inline_data enabled.

Not very sure whether it was caused by this.
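
For reference, a minimal sketch of how to confirm and clear the flag on the filesystem named in the warning above (assuming the standard ceph CLI and the fs name cephfs from the health detail):

# check whether inline_data is enabled in the MDSMap
ceph fs get cephfs | grep inline_data

# stop inlining data for new writes
ceph fs set cephfs inline_data false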

Actions #6

Updated by Xiubo Li about 1 month ago

Venky Shankar wrote:

Venky Shankar wrote:

https://pulpito.ceph.com/vshankar-2024-03-04_08:26:39-fs-wip-vshankar-testing-20240304.042522-testing-default-smithi/7580951/

[...]

client.1 runs to completion; however, fsstress on client.0 times out with the following last few operations:

[...]

[...]

Interestingly, the kernel ring buffers are empty:

- ./remote/smithi078/syslog/kern.log.gz
- ./remote/smithi175/syslog/kern.log.gz

which should not be the case - there should be something in them, shouldn't there?

That's also the case for untar_snap_rm failure - https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612983/

Actually, the above one should be a different issue from this tracker. It doesn't fail because of a timeout; it just failed when copying the snapshots.

Actions #7

Updated by Venky Shankar about 1 month ago

Xiubo Li wrote:

Venky Shankar wrote:

Venky Shankar wrote:

https://pulpito.ceph.com/vshankar-2024-03-04_08:26:39-fs-wip-vshankar-testing-20240304.042522-testing-default-smithi/7580951/

[...]

client.1 runs to completion; however, fsstress on client.0 times out with the following last few operations:

[...]

[...]

Interestingly, the kernel ring buffers are empty:

- ./remote/smithi078/syslog/kern.log.gz
- ./remote/smithi175/syslog/kern.log.gz

which should not be the case - there should be something in them, shouldn't there?

That's also the case for untar_snap_rm failure - https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612983/

Actually, the above one should be a different issue from this tracker. It doesn't fail because of a timeout; it just failed when copying the snapshots.

That's right - the failure is status 1 (a timeout would have 124 as the error code) and it's an MDS thrash task.

Actions #8

Updated by Venky Shankar about 1 month ago

Xiubo Li wrote:

The MDS log files are full of client_request(client.14524:1494739 getattr Fsr #0x1000000587b ...) entries - the same getattr requests from the same client:

[...]

This should be the reason why the test timed out, but it's still not clear why the kclient did this.

The request tid is not the same - so it's a new request. 1494739 has been replied to by the MDS; the next one is 1494740.

Actions #9

Updated by Xiubo Li about 1 month ago

Venky Shankar wrote:

Xiubo Li wrote:

The MDS log files are full of client_request(client.14524:1494739 getattr Fsr #0x1000000587b ...) entries - the same getattr requests from the same client:

[...]

This should be the reason why the test timed out, but it's still not clear why the kclient did this.

The request tid is not the same - so it's a new request. 1494739 has been replied to by the MDS; the next one is 1494740.

Yeah, correct. I have seen some cases where the kclient retries by issuing getattr requests in a loop, but I still couldn't be sure exactly which case this is. I know inline_data is buggy and we are trying to remove this feature, so could we just disable the corresponding tests? They may introduce some noise here.

Actions #10

Updated by Venky Shankar about 1 month ago

  • Related to Fix #52916: mds,client: formally remove inline data support added
Actions #11

Updated by Venky Shankar about 1 month ago

Xiubo Li wrote:

Venky Shankar wrote:

Xiubo Li wrote:

The MDS log files are full of client_request(client.14524:1494739 getattr Fsr #0x1000000587b ...) entries - the same getattr requests from the same client:

[...]

This should be the reason why the test timed out, but it's still not clear why the kclient did this.

The request tid is not the same - so it's a new request. 1494739 has been replied to by the MDS; the next one is 1494740.

Yeah, correct. I have seen some cases where the kclient retries by issuing getattr requests in a loop, but I still couldn't be sure exactly which case this is. I know inline_data is buggy and we are trying to remove this feature, so could we just disable the corresponding tests? They may introduce some noise here.

For the tests' sake, yes, disabling inline data and rerunning the test is a good idea. Let me see what I can do.

BTW, I added the inline-data removal prerequisite tracker to this one for reference, in case this really turns out to be related to inline data.

Actions #12

Updated by Laura Flores about 1 month ago

  • Tags set to test-failure

Looks like a case of this tracker:

/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616007

2024-03-22T07:03:17.809 INFO:teuthology.orchestra.run.smithi112.stdout:HEALTH_WARN 1 filesystem with deprecated feature inline_data
2024-03-22T07:03:17.809 INFO:teuthology.orchestra.run.smithi112.stdout:[WRN] FS_INLINE_DATA_DEPRECATED: 1 filesystem with deprecated feature inline_data
2024-03-22T07:03:17.809 INFO:teuthology.orchestra.run.smithi112.stdout:    fs cephfs has deprecated feature inline_data enabled.

...
2024-03-22T10:03:02.367 INFO:journalctl@ceph.osd.4.smithi175.stdout:Mar 22 10:03:02 smithi175 ceph-a9f8bebc-e818-11ee-95cd-87774f69a715-osd-4[48542]: 2024-03-22T10:03:02.097+0000 7f9e40f2b700 -1 osd.4 46 heartbeat_check: no reply from 172.21.15.112:6806 osd.0 ever on either front or back, first ping sent 2024-03-22T09:52:58.341924+0000 (oldest deadline 2024-03-22T09:53:18.341924+0000)
2024-03-22T10:03:02.367 INFO:journalctl@ceph.osd.4.smithi175.stdout:Mar 22 10:03:02 smithi175 ceph-a9f8bebc-e818-11ee-95cd-87774f69a715-osd-4[48542]: 2024-03-22T10:03:02.097+0000 7f9e40f2b700 -1 osd.4 46 heartbeat_check: no reply from 172.21.15.112:6814 osd.1 ever on either front or back, first ping sent 2024-03-22T09:52:58.341924+0000 (oldest deadline 2024-03-22T09:53:18.341924+0000)
2024-03-22T10:03:02.367 INFO:journalctl@ceph.osd.4.smithi175.stdout:Mar 22 10:03:02 smithi175 ceph-a9f8bebc-e818-11ee-95cd-87774f69a715-osd-4[48542]: 2024-03-22T10:03:02.097+0000 7f9e40f2b700 -1 osd.4 46 heartbeat_check: no reply from 172.21.15.112:6822 osd.2 ever on either front or back, first ping sent 2024-03-22T09:52:58.341924+0000 (oldest deadline 2024-03-22T09:53:18.341924+0000)
2024-03-22T10:03:02.367 INFO:journalctl@ceph.osd.4.smithi175.stdout:Mar 22 10:03:02 smithi175 ceph-a9f8bebc-e818-11ee-95cd-87774f69a715-osd-4[48542]: 2024-03-22T10:03:02.097+0000 7f9e40f2b700 -1 osd.4 46 get_health_metrics reporting 37 slow ops, oldest is osd_op(client.14534.0:7061 3.3 3:c68a693c:::20000000d6c.00000000:head [write 0~1743 [1@-1] in=1743b] snapc 1=[] ondisk+write+known_if_redirected e46)
2024-03-22T10:03:02.367 INFO:journalctl@ceph.osd.4.smithi175.stdout:Mar 22 10:03:02 smithi175 ceph-a9f8bebc-e818-11ee-95cd-87774f69a715-osd-4[48542]: 2024-03-22T10:03:02.288+0000 7f9e3854e700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2024-03-22T09:03:02.290251+0000)
2024-03-22T10:03:02.793 DEBUG:teuthology.orchestra.run:got remote process result: 124
2024-03-22T10:03:02.794 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.1...
2024-03-22T10:03:02.794 DEBUG:teuthology.orchestra.run.smithi175:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.1 /home/ubuntu/cephtest/clone.client.1
2024-03-22T10:03:03.086 INFO:journalctl@ceph.osd.3.smithi175.stdout:Mar 22 10:03:02 smithi175 ceph-a9f8bebc-e818-11ee-95cd-87774f69a715-osd-3[42120]: 2024-03-22T10:03:02.974+0000 7f587eb54700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2024-03-22T09:03:02.975892+0000)
2024-03-22T10:03:03.086 INFO:journalctl@ceph.osd.5.smithi175.stdout:Mar 22 10:03:02 smithi175 ceph-a9f8bebc-e818-11ee-95cd-87774f69a715-osd-5[55019]: 2024-03-22T10:03:02.962+0000 7fbbc52e0700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2024-03-22T09:03:02.964635+0000)
2024-03-22T10:03:03.088 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):

Actions #13

Updated by Laura Flores about 1 month ago

/a/teuthology-2024-03-29_02:08:11-upgrade-squid-distro-default-smithi/7629115

Actions #14

Updated by Venky Shankar about 1 month ago

Laura Flores wrote:

Looks like a case of this tracker:

/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616007
[...]

This run and the one from the previous comment have inline-data enabled (for which cephfs has dropped support and asked users to stop using it). It's likely that this is causing the failures seen here. I have a custom qa branch that disables the inline-data tests, which I plan to put to test. If these failures don't show up in that run, then we'll have to expedite @mchangir's PR (for tracker #52916) so that we can drop those bits from cephfs entirely.

Actions #15

Updated by Xiubo Li about 1 month ago

Venky Shankar wrote:

Laura Flores wrote:

Looks like a case of this tracker:

/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616007
[...]

This run and the one from the previous comment have inline-data enabled (for which cephfs has dropped support and asked users to stop using it). It's likely that this is causing the failures seen here. I have a custom qa branch that disables the inline-data tests, which I plan to put to test. If these failures don't show up in that run, then we'll have to expedite @mchangir's PR (for tracker #52916) so that we can drop those bits from cephfs entirely.

Sounds good.

Actions #16

Updated by Laura Flores 24 days ago

/a/yuriw-2024-04-09_14:58:21-upgrade-wip-yuri4-testing-2024-04-08-1432-distro-default-smithi/7648870

Actions #17

Updated by Xiubo Li 12 days ago

Laura Flores wrote in #note-16:

/a/yuriw-2024-04-09_14:58:21-upgrade-wip-yuri4-testing-2024-04-08-1432-distro-default-smithi/7648870

Actually, this should be the same issue as, or one similar to, https://tracker.ceph.com/issues/50260.

Actions #18

Updated by Laura Flores 3 days ago

/a/lflores-2024-04-29_20:31:53-upgrade-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7679200
