Bug #23446

ceph-fuse: getgroups failure causes exception

Added by Patrick Donnelly over 1 year ago. Updated 8 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 03/23/2018
Due date:
% Done: 0%
Source: Development
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): ceph-fuse
Labels (FS):
Pull request ID:

Description

Problem described here:

https://github.com/ceph/ceph-csi/pull/30#issuecomment-375331907

2018-03-22 13:47:50.337463 7f9cb0101080  0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 286
2018-03-22 13:47:50.341988 7f9cb0101080 -1 init, newargv = 0x560e639a3f20 newargc=9
ceph-fuse[286]: starting ceph client
ceph-fuse[286]: starting fuse
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
*** Caught signal (Aborted) **
 in thread 7f9ca21b9700 thread_name:ceph-fuse
 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0x6d4784) [0x560e5a280784]
 2: (()+0x11390) [0x7f9caee89390]
 3: (gsignal()+0x38) [0x7f9cadc15428]
 4: (abort()+0x16a) [0x7f9cadc1702a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f9cae55884d]
 6: (()+0x8d6b6) [0x7f9cae5566b6]
 7: (()+0x8d701) [0x7f9cae556701]
 8: (()+0x8d919) [0x7f9cae556919]
 9: (()+0x8c662) [0x7f9cae555662]
 10: (()+0x23ef1b) [0x560e59deaf1b]
 11: (Client::_opendir(Inode*, dir_result_t**, UserPerm const&)+0x5e) [0x560e59e3112e]
 12: (Client::ll_opendir(Inode*, int, dir_result_t**, UserPerm const&)+0xd5) [0x560e59e31935]
 13: (()+0x213917) [0x560e59dbf917]
 14: (()+0x15f4c) [0x7f9cafa53f4c]
 15: (()+0x15679) [0x7f9cafa53679]
 16: (()+0x11e38) [0x7f9cafa4fe38]
 17: (()+0x76ba) [0x7f9caee7f6ba]
 18: (clone()+0x6d) [0x7f9cadce741d]
2018-03-22 13:47:57.499793 7f9ca21b9700 -1 *** Caught signal (Aborted) **
 in thread 7f9ca21b9700 thread_name:ceph-fuse

It looks like gid_count in UserPerm gets set to a negative value. The most likely cause appears to be here:

https://github.com/ceph/ceph/blob/44f16c903a88bd02cdf27100eaeb47dd67a1d802/src/client/fuse_ll.cc#L142

Note: problem does not exist on Jewel.
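
To illustrate why a negative gid_count would surface as the exception in the log above, here is a minimal sketch. It is not Ceph code: copy_gids_throws is a hypothetical helper, and it assumes a C++11 or later compiler, where a negative runtime array length in a new-expression throws std::bad_array_new_length.

```cpp
#include <cassert>
#include <new>
#include <sys/types.h>

// Hypothetical helper (not part of Ceph): copy `count` group IDs into a
// freshly allocated array, the way a credentials object might when copied.
// Returns true if the allocation threw std::bad_array_new_length.
static bool copy_gids_throws(const gid_t *src, int count) {
  try {
    // A negative runtime length makes this new-expression throw.
    gid_t *copy = new gid_t[count];
    for (int i = 0; i < count; ++i) {
      copy[i] = src[i];
    }
    delete[] copy;
    return false;
  } catch (const std::bad_array_new_length &) {
    return true;
  }
}
```

With nothing catching the exception, as in the backtrace above, the runtime calls terminate and the process aborts.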


Related issues

Copied to fs - Backport #23638: luminous: ceph-fuse: getgroups failure causes exception (Resolved)

History

#1 Updated by Jeff Layton over 1 year ago

Looks like bad error handling. Here's getgroups:

static int getgroups(fuse_req_t req, gid_t **sgids)
{
#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
  assert(sgids);
  int c = fuse_req_getgroups(req, 0, NULL);
  if (c < 0) {
    return c;
  }
  if (c == 0) {
    return 0;
  }

  gid_t *gids = new (std::nothrow) gid_t[c];
  if (!gids) {
    return -ENOMEM;
  }
  c = fuse_req_getgroups(req, c, gids);
  if (c < 0) {
    delete[] gids;
  } else {
    *sgids = gids;
  }
  return c;
#endif
  return -ENOSYS;
}

...and the GET_GROUPS macro doesn't check for errors (a negative return code) from getgroups(). We probably just want to check for that and return an error if it occurs.
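
A rough sketch of the kind of check being suggested, under stated assumptions: the GET_GROUPS macro is not shown in this report, so the code below does not reproduce it. fake_getgroups, Perms, and init_perms are hypothetical stand-ins used only to show a negative return code being propagated instead of being stored as a group count.

```cpp
#include <cassert>
#include <cerrno>
#include <new>
#include <sys/types.h>

// Hypothetical stand-in for the getgroups() quoted above: returns the
// number of groups on success, or a negative errno value on failure.
static int fake_getgroups(bool fail, gid_t **sgids) {
  assert(sgids);
  if (fail) {
    return -EIO;
  }
  gid_t *gids = new (std::nothrow) gid_t[2];
  if (!gids) {
    return -ENOMEM;
  }
  gids[0] = 100;
  gids[1] = 200;
  *sgids = gids;
  return 2;
}

// Hypothetical holder for the caller's group list.
struct Perms {
  gid_t *gids = nullptr;
  int gid_count = 0;
};

// Returns 0 on success or a negative errno value. The key point is the
// early return: an error code never ends up in gid_count.
static int init_perms(bool fail, Perms *out) {
  assert(out);
  gid_t *gids = nullptr;
  int r = fake_getgroups(fail, &gids);
  if (r < 0) {
    return r;  // hand the error back to the caller
  }
  out->gids = gids;
  out->gid_count = r;
  return 0;
}
```

A caller in a FUSE handler would then reply to the request with the error rather than continuing with a bogus count.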

#2 Updated by Jeff Layton over 1 year ago

Ok, draft patch is building in shaman now. It should make ceph-fuse send the error back to the kernel when this occurs and prevent the crash.

That may not really help the person who reported this though. Since this happened inside docker, I'm assuming it ran afoul of this in some fashion:

 *
 * The current fuse kernel module in linux (as of 2.6.30) doesn't pass
 * the group list to userspace, hence this function needs to parse
 * "/proc/$TID/task/$TID/status" to get the group IDs.
 *

So ceph.conf may need to set fuse_set_user_groups to "false" for ceph-fuse to function inside a container.
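
For reference, the setting mentioned above might look like this in ceph.conf. This is a sketch: placing it under the [client] section is an assumption about where client options normally go, not something stated in this report.

```ini
[client]
# Assumed placement; disables passing the caller's supplementary groups
fuse_set_user_groups = false
```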

#3 Updated by Jeff Layton over 1 year ago

  • Status changed from New to In Progress

#5 Updated by Patrick Donnelly about 1 year ago

  • Status changed from In Progress to Pending Backport

#6 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #23638: luminous: ceph-fuse: getgroups failure causes exception added

#8 Updated by Jeff Layton 8 months ago

  • Status changed from Pending Backport to Resolved
