Bug #57656

[testing] dbench: write failed on handle 10009 (Resource temporarily unavailable)

Added by Patrick Donnelly 2 months ago. Updated about 1 month ago.

Status: Need More Info
Priority: High
Assignee:
Category: fs/ceph
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

When testing with my postgres changes:

https://github.com/ceph/ceph/labels/wip-pdonnell-testing2

I've observed failures with the kernel testing branch. Notably, this set of PRs introduces periodic snapshots to fs:workload, which seems to trigger this behavior:

2022-09-22T14:04:02.650 INFO:tasks.workunit.client.0.smithi124.stdout:[1415095] write failed on handle 10009 (Resource temporarily unavailable)
2022-09-22T14:04:02.650 INFO:tasks.workunit.client.0.smithi124.stdout:Child failed with status 1
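"Resource temporarily unavailable" is the C library's strerror() text for `EAGAIN`, so dbench appears to be reporting that a write(2) on handle 10009 returned -1 with `errno == EAGAIN`. Below is a minimal sketch, assuming the teuthology line prefix shown above, for parsing such failure lines and mapping the message back to an errno value. The regex and the `parse_dbench_failure` and `errno_from_message` helpers are hypothetical and are not part of teuthology or dbench:

```python
import errno
import os
import re

# Hypothetical pattern for the dbench failure line quoted above. It assumes the
# "<timestamp> INFO:tasks.workunit.<client>.<host>.stdout:" prefix seen in the log.
DBENCH_WRITE_FAIL = re.compile(
    r"^(?P<ts>\S+) INFO:tasks\.workunit\.(?P<client>client\.\d+)\.(?P<host>[^.]+)\.stdout:"
    r"\[\d+\] write failed on handle (?P<handle>\d+) \((?P<msg>[^)]*)\)$"
)


def errno_from_message(msg):
    """Return the lowest errno value whose strerror() text equals msg, or None."""
    for code in sorted(errno.errorcode):
        if os.strerror(code) == msg:
            return code
    return None


def parse_dbench_failure(line):
    """Split one dbench failure line into its fields (hypothetical helper)."""
    m = DBENCH_WRITE_FAIL.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["handle"] = int(fields["handle"])
    fields["errno"] = errno_from_message(fields["msg"])
    return fields


line = ("2022-09-22T14:04:02.650 INFO:tasks.workunit.client.0.smithi124.stdout:"
        "[1415095] write failed on handle 10009 (Resource temporarily unavailable)")
info = parse_dbench_failure(line)
print(info["host"], info["handle"], info["errno"] == errno.EAGAIN)
```

Matching on the strerror() text rather than hard-coding 11 keeps the sketch portable across platforms where `EAGAIN` has a different numeric value.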

See also: https://sentry.ceph.com/organizations/ceph/issues/29106/events/01bb0d49cff145f8bf9aac52b2065400/events/?project=2

I haven't gone through all of these events, but the failures seem to share two traits: (a) the kernel testing branch; (b) snapshots turned on.


Related issues

Related to CephFS - Bug #51410: kclient: fails to finish reconnect during MDS thrashing (testing branch) New

History

#1 Updated by Patrick Donnelly 2 months ago

/ceph/teuthology-archive/pdonnell-2022-09-26_19:11:10-fs-wip-pdonnell-testing-20220923.171109-distro-default-smithi/7044485/teuthology.log

/ceph/teuthology-archive/pdonnell-2022-09-26_19:11:10-fs-wip-pdonnell-testing-20220923.171109-distro-default-smithi/7044491/teuthology.log

#2 Updated by Xiubo Li about 2 months ago

  • Status changed from New to In Progress

#3 Updated by Xiubo Li about 2 months ago

Another failure:

2022-09-26T20:03:01.601 INFO:tasks.workunit.client.0.smithi066.stdout:Wrote -1 instead of 4096 bytes.
2022-09-26T20:03:01.601 INFO:tasks.workunit.client.0.smithi066.stdout:Probably out of disk space
2022-09-26T20:03:01.602 INFO:tasks.workunit.client.0.smithi066.stderr:write: Resource temporarily unavailable
2022-09-26T20:03:01.639 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-09-26T20:03:01.640 INFO:tasks.workunit:Stopping ['suites/ffsb.sh'] on client.0...
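The "Probably out of disk space" line appears to be a generic hint printed for any failed write; the errno reported right after it is `EAGAIN`, not `ENOSPC`. Below is a minimal sketch of a write check that reports the actual errno instead of guessing. `checked_write` is a hypothetical helper, not ffsb code. The full non-blocking pipe in the demo is only a convenient local way to produce `EAGAIN` and says nothing about why the kernel client returned it:

```python
import errno
import os


def checked_write(fd, buf):
    """Write buf in one call and report the real errno on failure.

    Hypothetical helper: unlike a generic "probably out of disk space" hint,
    it tells ENOSPC and EAGAIN apart.
    """
    try:
        n = os.write(fd, buf)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            kind = "ENOSPC (out of disk space)"
        elif e.errno == errno.EAGAIN:
            kind = "EAGAIN (transient, not a disk-space error)"
        else:
            kind = errno.errorcode.get(e.errno, "unknown errno")
        raise RuntimeError(
            f"write of {len(buf)} bytes failed: {kind}: {os.strerror(e.errno)}"
        ) from e
    if n != len(buf):
        # A short but non-negative write is a different condition from -1.
        raise RuntimeError(f"short write: wrote {n} instead of {len(buf)} bytes")
    return n


# Demo: fill a non-blocking pipe until the kernel refuses more data with EAGAIN.
# 512-byte writes are below PIPE_BUF, so each one is all-or-nothing.
r, w = os.pipe()
os.set_blocking(w, False)
last_error = None
try:
    while True:
        checked_write(w, b"x" * 512)
except RuntimeError as err:
    last_error = str(err)
finally:
    os.close(r)
    os.close(w)
print(last_error)
```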

This is the kclient, and I didn't see any errors in the kernel logs. I also checked the MDS and OSD logs and didn't find anything!

This should be a known issue: https://tracker.ceph.com/issues/51410

Patrick,

Could you reproduce this?

Thanks!

#4 Updated by Xiubo Li about 2 months ago

  • Related to Bug #51410: kclient: fails to finish reconnect during MDS thrashing (testing branch) added

#5 Updated by Xiubo Li about 1 month ago

  • Status changed from In Progress to Need More Info

Today I spent more than half a day reading the MDS and OSD logs, but still couldn't find anything suspicious. Usually, if we can get some logs from the kernel, it is easy to figure out exactly what happened, as with the previous tracker https://tracker.ceph.com/issues/54461.

I tried but couldn't reproduce it locally. Let's wait for new failures and see whether we can get any useful info from them.
