Bug #17517

closed

ceph-fuse: does not handle ENOMEM during remount

Added by Patrick Donnelly over 7 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%
Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When running 8 ceph-fuse clients on a single Linode VM with 2GB memory, I observed the clients would die under load with this assertion:

2016-10-05 21:01:32.159767 7fc11de19700 -1 client.15772 tried to remount (to trim kernel dentries) and got error -1
2016-10-05 21:01:32.207265 7fc11de19700 -1 /srv/autobuild-ceph/gitbuilder.git/build/rpmbuild/BUILD/ceph-11.0.0/src/client/Client.cc: In function 'virtual void C_Client_Remount::finish(int)' thread 7fc11de19700 time 2016-10-05 21:01:32.159874
/srv/autobuild-ceph/gitbuilder.git/build/rpmbuild/BUILD/ceph-11.0.0/src/client/Client.cc: 3949: FAILED assert(0 == "failed to remount for kernel dentry trimming")

 ceph version v11.0.0-3132-g1b90ab3 (1b90ab375eaa3dba76c5dda8b504a3c2d77faa58)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x55ae80ccebd5]
 2: (C_Client_Remount::finish(int)+0xdd) [0x55ae80bf9b4d]
 3: (Context::complete(int)+0x9) [0x55ae80bf6b49]
 4: (Finisher::finisher_thread_entry()+0x216) [0x55ae80ccde36]
 5: (()+0x7dc5) [0x7fc12b53edc5]
 6: (clone()+0x6d) [0x7fc12a425ced]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I tracked this down via strace to a failure to fork (https://github.com/ceph/ceph/blob/cf1785881fda92187b4996a9d839100e8513b61a/src/client/fuse_ll.cc#L891):

17719 clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fc11de17e50) = -1 ENOMEM (Cannot allocate memory)

(Not really unexpected to get that error!) This may be a situation where we want to retry or at least give a better error message.
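As a rough illustration of the "retry or at least give a better error message" idea, here is a minimal sketch. It is not the Ceph implementation: `retry_on_enomem` and `describe_remount_error` are hypothetical helpers, and the sketch assumes the remount callback reports a failed fork as a negative errno value (for example -ENOMEM), whereas the log above shows only a bare -1.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <string>
#include <system_error>
#include <thread>

// Hypothetical helper, not part of Ceph. Runs `op`, which is assumed to
// return 0 on success or a negative errno value on failure, and retries
// only while it reports -ENOMEM.
int retry_on_enomem(const std::function<int()>& op,
                    int max_attempts,
                    std::chrono::milliseconds initial_delay)
{
  int r = -EINVAL;  // returned unchanged if max_attempts < 1
  auto delay = initial_delay;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    r = op();
    if (r != -ENOMEM)
      return r;  // success, or an error that retrying will not fix
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(delay);
      delay *= 2;  // simple exponential backoff between attempts
    }
  }
  return r;  // still -ENOMEM after the last attempt
}

// Hypothetical helper: build a message that names the errno instead of
// logging only a numeric code. Expects a negative errno value in `r`.
std::string describe_remount_error(int r)
{
  return "failed to remount (to trim kernel dentries): " +
         std::generic_category().message(-r) +
         " (" + std::to_string(r) + ")";
}
```

A caller could wrap the remount callback as `retry_on_enomem(remount_cb, 3, std::chrono::milliseconds(100))` (where `remount_cb` is a stand-in name) and log `describe_remount_error(r)` before deciding whether to assert. Whether retrying helps at all on a host this short of memory is an open question.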


Related issues 1 (0 open, 1 closed)

Related to CephFS - Bug #22254: client: give more descriptive error message for remount failures (Resolved, Patrick Donnelly, 11/27/2017)

Actions #1

Updated by Patrick Donnelly over 6 years ago

  • Related to Bug #22254: client: give more descriptive error message for remount failures added
Actions #2

Updated by Patrick Donnelly about 6 years ago

  • Subject changed from ceph-fuse does not handle ENOMEM during remount to ceph-fuse: does not handle ENOMEM during remount
  • Status changed from New to Resolved
  • Component(FS) Client added

I'm going to mark this as resolved: handling ENOMEM is generally a waste of time, since another operation will fail similarly (or with a segfault) soon afterwards.
