Bug #23446

closed

ceph-fuse: getgroups failure causes exception

Added by Patrick Donnelly about 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Problem described here:

https://github.com/ceph/ceph-csi/pull/30#issuecomment-375331907

2018-03-22 13:47:50.337463 7f9cb0101080  0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 286
2018-03-22 13:47:50.341988 7f9cb0101080 -1 init, newargv = 0x560e639a3f20 newargc=9
ceph-fuse[286]: starting ceph client
ceph-fuse[286]: starting fuse
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
*** Caught signal (Aborted) **
 in thread 7f9ca21b9700 thread_name:ceph-fuse
 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0x6d4784) [0x560e5a280784]
 2: (()+0x11390) [0x7f9caee89390]
 3: (gsignal()+0x38) [0x7f9cadc15428]
 4: (abort()+0x16a) [0x7f9cadc1702a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f9cae55884d]
 6: (()+0x8d6b6) [0x7f9cae5566b6]
 7: (()+0x8d701) [0x7f9cae556701]
 8: (()+0x8d919) [0x7f9cae556919]
 9: (()+0x8c662) [0x7f9cae555662]
 10: (()+0x23ef1b) [0x560e59deaf1b]
 11: (Client::_opendir(Inode*, dir_result_t**, UserPerm const&)+0x5e) [0x560e59e3112e]
 12: (Client::ll_opendir(Inode*, int, dir_result_t**, UserPerm const&)+0xd5) [0x560e59e31935]
 13: (()+0x213917) [0x560e59dbf917]
 14: (()+0x15f4c) [0x7f9cafa53f4c]
 15: (()+0x15679) [0x7f9cafa53679]
 16: (()+0x11e38) [0x7f9cafa4fe38]
 17: (()+0x76ba) [0x7f9caee7f6ba]
 18: (clone()+0x6d) [0x7f9cadce741d]
2018-03-22 13:47:57.499793 7f9ca21b9700 -1 *** Caught signal (Aborted) **
 in thread 7f9ca21b9700 thread_name:ceph-fuse

Looks like gid_count in UserPerm gets set to a negative value. Most likely cause appears to be here:

https://github.com/ceph/ceph/blob/44f16c903a88bd02cdf27100eaeb47dd67a1d802/src/client/fuse_ll.cc#L142

Note: problem does not exist on Jewel.
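For anyone reproducing this outside ceph-fuse: the abort in the log is what a plain array new produces when it is handed a negative signed count. The sketch below is not Ceph code. `AllocResult` and `alloc_gids` are made-up names, and it only assumes that somewhere a gid array is allocated from a signed count without a range check.

```cpp
#include <cassert>
#include <new>
#include <sys/types.h>  // gid_t

enum class AllocResult { Ok, BadLength, OtherFailure };

// Hypothetical helper (not Ceph code): allocate `count` gid_t entries the
// way a caller would if it trusted a signed count without checking it.
AllocResult alloc_gids(int count, gid_t **out) {
  assert(out);
  *out = nullptr;
  try {
    // A negative count cannot be a valid array length, so this throws
    // instead of returning memory.
    *out = new gid_t[count];
    return AllocResult::Ok;
  } catch (const std::bad_array_new_length &) {
    return AllocResult::BadLength;     // the exception named in the log
  } catch (const std::bad_alloc &) {
    return AllocResult::OtherFailure;  // some toolchains report it this way
  }
}
```

In ceph-fuse nothing catches the exception, so it reaches the terminate handler and the process aborts, which matches frames 4 and 5 of the backtrace.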


Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #23638: luminous: ceph-fuse: getgroups failure causes exception (Resolved, Patrick Donnelly)
Actions #1

Updated by Jeff Layton about 6 years ago

Looks like bad error handling. Here's getgroups:

static int getgroups(fuse_req_t req, gid_t **sgids)
{
#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
  assert(sgids);
  int c = fuse_req_getgroups(req, 0, NULL);
  if (c < 0) {
    return c;
  }
  if (c == 0) {
    return 0;
  }

  gid_t *gids = new (std::nothrow) gid_t[c];
  if (!gids) {
    return -ENOMEM;
  }
  c = fuse_req_getgroups(req, c, gids);
  if (c < 0) {
    delete[] gids;
  } else {
    *sgids = gids;
  }
  return c;
#endif
  return -ENOSYS;
}

...and the GET_GROUPS macro doesn't check for errors (negative return code) from getgroups(). Probably we want to just check for that and return an error if it occurs.
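A minimal sketch of the kind of check meant here, under stated assumptions: `Perms` is a stand-in for the real permission object (not `UserPerm`), `GroupLookup` stands in for the `getgroups(req, &gids)` call, and `init_groups_checked` is a made-up name. It is not the actual patch.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <sys/types.h>  // gid_t

// Stand-in for the permission object; NOT the real UserPerm.
struct Perms {
  gid_t *gids = nullptr;
  int gid_count = 0;
  Perms() = default;
  Perms(const Perms &) = delete;
  Perms &operator=(const Perms &) = delete;
  ~Perms() { delete[] gids; }
};

// Stand-in for getgroups(req, &gids): returns a count >= 0 and sets *out
// when the count is positive, or returns a negative errno and leaves
// *out untouched.
using GroupLookup = std::function<int(gid_t **)>;

// Returns 0 on success, or the negative errno from the lookup. On error
// the Perms object is left untouched, so a negative count is never stored.
int init_groups_checked(Perms &perms, const GroupLookup &lookup) {
  gid_t *gids = nullptr;
  int count = lookup(&gids);
  if (count < 0) {
    return count;  // propagate instead of storing a negative gid_count
  }
  delete[] perms.gids;  // drop any previous list before taking ownership
  perms.gids = gids;
  perms.gid_count = count;
  return 0;
}
```

With something like this in place, each FUSE handler would bail out early on a negative return and report the error for that request (with the libfuse low-level API, presumably via `fuse_reply_err(req, -r)`) rather than going on to use the group list.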

Actions #2

Updated by Jeff Layton about 6 years ago

Ok, draft patch is building in shaman now. It should make ceph-fuse send the error back to the kernel when this occurs and prevent the crash.

That may not really help the person who reported this though. Since this happened inside docker, I'm assuming it ran afoul of this in some fashion:

 *
 * The current fuse kernel module in linux (as of 2.6.30) doesn't pass
 * the group list to userspace, hence this function needs to parse
 * "/proc/$TID/task/$TID/status" to get the group IDs.
 *

So, the ceph.conf may need to set fuse_set_user_groups to "false" in order to function inside a container.
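Something along these lines, as a sketch only. The option name is the one mentioned above; placing it under `[client]` is an assumption about where ceph-fuse client options are normally set, so adjust it to match your existing ceph.conf layout.

```ini
# Assumed placement: the [client] section of ceph.conf
[client]
fuse_set_user_groups = false
```

This only sidesteps the group lookup. It does not make the /proc lookup itself work inside the container.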

Actions #3

Updated by Jeff Layton about 6 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Patrick Donnelly about 6 years ago

  • Status changed from In Progress to Pending Backport
Actions #6

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23638: luminous: ceph-fuse: getgroups failure causes exception added
Actions #8

Updated by Jeff Layton over 5 years ago

  • Status changed from Pending Backport to Resolved