Bug #57656

open

[testing] dbench: write failed on handle 10009 (Resource temporarily unavailable)

Added by Patrick Donnelly over 1 year ago. Updated 13 days ago.

Status:
Need More Info
Priority:
High
Assignee:
Category:
fs/ceph
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

When testing with my postgres changes:

https://github.com/ceph/ceph/labels/wip-pdonnell-testing2

I've observed failures with the kernel testing branch. The set of PRs notably introduces periodic snapshots to fs:workload, which seems to catch this behavior:

2022-09-22T14:04:02.650 INFO:tasks.workunit.client.0.smithi124.stdout:[1415095] write failed on handle 10009 (Resource temporarily unavailable)
2022-09-22T14:04:02.650 INFO:tasks.workunit.client.0.smithi124.stdout:Child failed with status 1

See also: https://sentry.ceph.com/organizations/ceph/issues/29106/events/01bb0d49cff145f8bf9aac52b2065400/events/?project=2

I've not gone through all of these events, but the failures seem to share two traits: (a) the kernel testing branch; (b) snapshots turned on.
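For anyone triaging further runs, here is a minimal sketch of how a teuthology.log could be scanned for this failure signature. The line layout (`<timestamp> <LEVEL>:<logger>:<message>`) is an assumption inferred only from the excerpts quoted in this report, and `find_eagain_lines` is a hypothetical helper, not part of teuthology.

```python
import re

# Assumed layout, inferred from the excerpts in this report:
#   "<timestamp> <LEVEL>:<logger>:<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+):(?P<logger>[^:]+):(?P<msg>.*)$"
)

# Error text seen in the dbench and ffsb failures quoted in this tracker.
EAGAIN_TEXT = "Resource temporarily unavailable"


def find_eagain_lines(lines):
    """Yield (timestamp, logger, message) for log lines reporting the error."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and EAGAIN_TEXT in match.group("msg"):
            yield match.group("ts"), match.group("logger"), match.group("msg")


# The two lines quoted in the description, used as sample input.
sample = [
    "2022-09-22T14:04:02.650 INFO:tasks.workunit.client.0.smithi124.stdout:"
    "[1415095] write failed on handle 10009 (Resource temporarily unavailable)",
    "2022-09-22T14:04:02.650 INFO:tasks.workunit.client.0.smithi124.stdout:"
    "Child failed with status 1",
]

hits = list(find_eagain_lines(sample))
for ts, logger, msg in hits:
    print(ts, logger, msg)
```

To run it over a real log, pass an open file object instead of `sample`, e.g. `find_eagain_lines(open(path))`, where `path` points at one of the teuthology.log files listed in the comments.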


Related issues 1 (1 open, 0 closed)

Related to CephFS - Bug #51410: kclient: fails to finish reconnect during MDS thrashing (testing branch) - New

Actions #1

Updated by Patrick Donnelly over 1 year ago

/ceph/teuthology-archive/pdonnell-2022-09-26_19:11:10-fs-wip-pdonnell-testing-20220923.171109-distro-default-smithi/7044485/teuthology.log

/ceph/teuthology-archive/pdonnell-2022-09-26_19:11:10-fs-wip-pdonnell-testing-20220923.171109-distro-default-smithi/7044491/teuthology.log

Actions #2

Updated by Xiubo Li over 1 year ago

  • Status changed from New to In Progress
Actions #3

Updated by Xiubo Li over 1 year ago

Another failure:

2022-09-26T20:03:01.601 INFO:tasks.workunit.client.0.smithi066.stdout:Wrote -1 instead of 4096 bytes.
2022-09-26T20:03:01.601 INFO:tasks.workunit.client.0.smithi066.stdout:Probably out of disk space
2022-09-26T20:03:01.602 INFO:tasks.workunit.client.0.smithi066.stderr:write: Resource temporarily unavailable
2022-09-26T20:03:01.639 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-09-26T20:03:01.640 INFO:tasks.workunit:Stopping ['suites/ffsb.sh'] on client.0...

This is the kclient, and I didn't see any errors in the kernel logs. I also checked the MDS and OSD logs and found nothing.

This should be a known issue: https://tracker.ceph.com/issues/51410

Patrick,

Could you reproduce this?

Thanks!

Actions #4

Updated by Xiubo Li over 1 year ago

  • Related to Bug #51410: kclient: fails to finish reconnect during MDS thrashing (testing branch) added
Actions #5

Updated by Xiubo Li over 1 year ago

  • Status changed from In Progress to Need More Info

Today I spent more than half a day reading the MDS- and OSD-side logs, but still couldn't find anything suspicious. Usually, if we can get some logs from the kernel, it is easy to tell exactly what happened, as in the previous tracker https://tracker.ceph.com/issues/54461.

I tried but couldn't reproduce it locally. Let's wait for new occurrences and see whether we can get any useful info from them.

Actions #8

Updated by Patrick Donnelly 24 days ago

/teuthology/pdonnell-2024-03-24_04:56:01-fs-wip-batrick-testing-20240323.003144-squid-distro-default-smithi/7619048/teuthology.log

Actions #9

Updated by Venky Shankar 13 days ago
