Bug #63473

open

fsstress.sh fails with errno 124

Added by Rishabh Dave 6 months ago. Updated 4 months ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
quincy,reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/rishabh-2023-11-04_04:30:51-fs-rishabh-2023nov3-testing-default-smithi/7447198

Relevant log entries:

2023-11-04T06:58:04.178 INFO:tasks.workunit.client.0.smithi006.stdout:9/996: rmdir d4/d5/db/d16/da2/d46/d86 39
2023-11-04T06:58:04.179 INFO:tasks.workunit.client.0.smithi006.stdout:9/997: stat d4/d5/db/d16/da2/dfb 0
2023-11-04T06:58:04.179 INFO:tasks.workunit.client.0.smithi006.stdout:9/998: creat d4/d8/d9/da0/f13a x:0 0 0
2023-11-04T06:58:04.179 INFO:tasks.workunit.client.0.smithi006.stdout:9/999: symlink d4/d8/d9/d67/d9b/d40/l13b 0
2023-11-04T06:58:04.185 INFO:tasks.workunit.client.0.smithi006.stderr:+ rm -rf -- ./tmp.DkKJUm7uTu
2023-11-04T06:59:00.099 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.0...
2023-11-04T06:59:00.099 DEBUG:teuthology.orchestra.run.smithi006:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2023-11-04T09:56:47.707 DEBUG:teuthology.orchestra.run:got remote process result: 124
2023-11-04T09:56:47.709 INFO:tasks.workunit:Stopping ['suites/fsstress.sh'] on client.1...

The cause of this failure is not yet clear, but none of the PRs in the testing batch seem to be related to it.
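For context, exit status 124 is what GNU `timeout` returns when it kills a command that ran past its limit. Assuming the workunit was run under such a timeout wrapper (an assumption; the excerpt does not show the command line), the gap between the last workunit output and the `got remote process result: 124` line indicates how long the job hung. A minimal sketch that computes that gap from the two timestamps quoted above (the helper name `gap` is illustrative):

```python
from datetime import datetime

# Timestamp layout used by the teuthology log lines quoted above.
FMT = "%Y-%m-%dT%H:%M:%S.%f"

def gap(start, end):
    """Return the time elapsed between two teuthology log timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

# Last line of workunit output ("+ rm -rf -- ./tmp.DkKJUm7uTu").
last_output = "2023-11-04T06:58:04.185"
# Line reporting "got remote process result: 124".
result_seen = "2023-11-04T09:56:47.707"

elapsed = gap(last_output, result_seen)
print(elapsed)
```

The result is just under three hours, which would be consistent with a multi-hour timeout expiring rather than the script itself failing.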

Actions #1

Updated by Venky Shankar 5 months ago

  • Category set to Correctness/Safety
  • Assignee set to Xiubo Li
  • Target version set to v19.0.0

Rishabh, could you please link the failed job here?

Actions #2

Updated by Venky Shankar 5 months ago

  • Status changed from New to Triaged
  • Backport set to quincy,reef

Actions #3

Updated by Rishabh Dave 5 months ago

  • Description updated (diff)

Actions #4

Updated by Xiubo Li 4 months ago

  • Status changed from Triaged to Need More Info

Actions #5

Updated by Xiubo Li 4 months ago

  • Status changed from Need More Info to In Progress

Actions #6

Updated by Xiubo Li 4 months ago

From the mds.0 logs, client.24371 keeps sending the same client request indefinitely, as shown below:

2023-11-04T07:04:34.446+0000 7f3244beb700  1 -- [v2:172.21.15.6:6826/283082234,v1:172.21.15.6:6827/283082234] <== client.24371 v1:192.168.0.1:0/419062970 14825 ==== client_request(client.24371:1273772 getattr Fsr #0x200000027d7 2023-11-04T07:04:34.388233+0000 caller_uid=1000, caller_gid=1285{6,36,1000,1285,}) v6 ==== 188+0+0 (unknown 3615823349 0 0) 0x55d1965b8580 con 0x55d193e5dc00
2023-11-04T07:04:34.446+0000 7f3244beb700  4 mds.0.server handle_client_request client_request(client.24371:1273772 getattr Fsr #0x200000027d7 2023-11-04T07:04:34.388233+0000 caller_uid=1000, caller_gid=1285{6,36,1000,1285,}) v6
2023-11-04T07:04:34.446+0000 7f3244beb700 20 mds.0.4 get_session have 0x55d193e47400 client.24371 v1:192.168.0.1:0/419062970 state open
...
2023-11-04T07:04:34.446+0000 7f3244beb700 10 mds.0.server reply to stat on client_request(client.24371:1273772 getattr Fsr #0x200000027d7 2023-11-04T07:04:34.388233+0000 caller_uid=1000, caller_gid=1285{6,36,1000,1285,}) v6
2023-11-04T07:04:34.446+0000 7f3244beb700 20 mds.0.server respond_to_request batch head request(client.24371:1273772 nref=3 cr=0x55d1965b8580)
2023-11-04T07:04:34.446+0000 7f3244beb700 20 respond: responding to batch ops with result=0: [batch front=request(client.24371:1273772 nref=3 cr=0x55d1965b8580)]
2023-11-04T07:04:34.446+0000 7f3244beb700  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.24371:1273772 getattr Fsr #0x200000027d7 2023-11-04T07:04:34.388233+0000 caller_uid=1000, caller_gid=1285{6,36,1000,1285,}) v6
...
2023-11-04T07:12:06.441+0000 7fc991ffe700  1 -- [v2:172.21.15.6:6826/2440777638,v1:172.21.15.6:6827/2440777638] <== client.24371 v1:192.168.0.1:0/419062970 881830 ==== client_request(client.24371:2261584 getattr Fsr #0x200000027d7 2023-11-04T07:12:06.441976+0000 caller_uid=1000, caller_gid=1285{6,36,1000,1285,}) v6 ==== 200+0+0 (unknown 1071190131 0 0) 0x55e9c337cf00 con 0x55e9bc3fbc00
...

However, I didn't see any RETRY= entries in the logs for this client request, so it does not appear to be caused by the kclient retrying the request. The bad news is that there are no useful logs from the kclient.
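The check described above can be sketched as a small log scan. This is only an illustration, not an existing Ceph tool: the function name `summarize_requests` is hypothetical, the regular expressions are derived from the log lines quoted in this comment, and matching `RETRY=` anywhere on the line is an assumption about where that marker would appear.

```python
import re
from collections import defaultdict

# Matches the client_request(...) payload as printed in the MDS log lines above.
REQ_RE = re.compile(
    r"client_request\((?P<client>client\.\d+):(?P<tid>\d+) "
    r"(?P<op>\w+) (?P<rest>[^)]*)\)"
)
# Inode number, printed as "#0x..." inside the request.
INO_RE = re.compile(r"#(0x[0-9a-f]+)")

def summarize_requests(lines):
    """Group requests received by the MDS by (client, op, inode)."""
    groups = defaultdict(lambda: {"tids": set(), "retries": 0})
    for line in lines:
        if "<==" not in line:  # only count messages arriving at the MDS
            continue
        m = REQ_RE.search(line)
        if not m:
            continue
        ino = INO_RE.search(m.group("rest"))
        key = (m.group("client"), m.group("op"), ino.group(1) if ino else None)
        groups[key]["tids"].add(int(m.group("tid")))
        if "RETRY=" in line:  # assumed marker for a resent request
            groups[key]["retries"] += 1
    return groups

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:  # path to an MDS log file
        for key, info in summarize_requests(f).items():
            print(key, "distinct tids:", len(info["tids"]), "retries:", info["retries"])
```

A group with many distinct tids and zero retries would match what is seen here: the two requests quoted above carry different tids (1273772 and 2261584) for the same getattr on #0x200000027d7, which suggests new requests being issued rather than one request being resent.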

Actions #7

Updated by Xiubo Li 4 months ago

I am afraid this is also a messenger issue, similar to https://tracker.ceph.com/issues/63586.
