Bug #23446
closed
ceph-fuse: getgroups failure causes exception
Description
Problem described here:
https://github.com/ceph/ceph-csi/pull/30#issuecomment-375331907
2018-03-22 13:47:50.337463 7f9cb0101080 0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 286
2018-03-22 13:47:50.341988 7f9cb0101080 -1 init, newargv = 0x560e639a3f20 newargc=9
ceph-fuse[286]: starting ceph client
ceph-fuse[286]: starting fuse
terminate called after throwing an instance of 'std::bad_array_new_length'
what(): std::bad_array_new_length
*** Caught signal (Aborted) **
in thread 7f9ca21b9700 thread_name:ceph-fuse
ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
1: (()+0x6d4784) [0x560e5a280784]
2: (()+0x11390) [0x7f9caee89390]
3: (gsignal()+0x38) [0x7f9cadc15428]
4: (abort()+0x16a) [0x7f9cadc1702a]
5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f9cae55884d]
6: (()+0x8d6b6) [0x7f9cae5566b6]
7: (()+0x8d701) [0x7f9cae556701]
8: (()+0x8d919) [0x7f9cae556919]
9: (()+0x8c662) [0x7f9cae555662]
10: (()+0x23ef1b) [0x560e59deaf1b]
11: (Client::_opendir(Inode*, dir_result_t**, UserPerm const&)+0x5e) [0x560e59e3112e]
12: (Client::ll_opendir(Inode*, int, dir_result_t**, UserPerm const&)+0xd5) [0x560e59e31935]
13: (()+0x213917) [0x560e59dbf917]
14: (()+0x15f4c) [0x7f9cafa53f4c]
15: (()+0x15679) [0x7f9cafa53679]
16: (()+0x11e38) [0x7f9cafa4fe38]
17: (()+0x76ba) [0x7f9caee7f6ba]
18: (clone()+0x6d) [0x7f9cadce741d]
2018-03-22 13:47:57.499793 7f9ca21b9700 -1 *** Caught signal (Aborted) **
in thread 7f9ca21b9700 thread_name:ceph-fuse
Looks like gid_count in UserPerm gets set to a negative value. Most likely cause appears to be here:
Note: problem does not exist on Jewel.
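The link between a negative gid_count and the std::bad_array_new_length in the log can be reproduced in isolation. The sketch below is not Ceph code: alloc_throws_bad_length is a hypothetical helper that only shows how a C++11 (or later) array new-expression reacts to a negative element count.

```cpp
#include <new>
#include <sys/types.h>

// Hypothetical helper, not part of Ceph: reports whether allocating
// `count` gid_t elements throws std::bad_array_new_length.
bool alloc_throws_bad_length(int count) {
  try {
    gid_t *gids = new gid_t[count];  // a negative count is rejected here
    delete[] gids;
    return false;
  } catch (const std::bad_array_new_length &) {
    return true;
  }
}
```

If nothing catches the exception, as in the backtrace above, the runtime calls terminate and the process aborts.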
Updated by Jeff Layton about 6 years ago
Looks like bad error handling. Here's getgroups:
static int getgroups(fuse_req_t req, gid_t **sgids)
{
#if FUSE_VERSION >= FUSE_MAKE_VERSION(2, 8)
  assert(sgids);
  int c = fuse_req_getgroups(req, 0, NULL);
  if (c < 0) {
    return c;
  }
  if (c == 0) {
    return 0;
  }

  gid_t *gids = new (std::nothrow) gid_t[c];
  if (!gids) {
    return -ENOMEM;
  }
  c = fuse_req_getgroups(req, c, gids);
  if (c < 0) {
    delete[] gids;
  } else {
    *sgids = gids;
  }
  return c;
#endif
  return -ENOSYS;
}
...and the GET_GROUPS macro doesn't check for errors (negative return code) from getgroups(). Probably we want to just check for that and return an error if it occurs.
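A rough sketch of the kind of check meant here, written as standalone C++ rather than as the actual patch. GroupFetcher, GroupResult, collect_groups, fetch_fails and fetch_two are all hypothetical names introduced for illustration; the fetcher stands in for getgroups() and follows the same convention (group count on success, negative errno on failure).

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <vector>

// Hypothetical stand-in for getgroups(req, &gids): returns the number of
// groups and hands back a new[]-allocated array, or a negative errno.
using GroupFetcher = int (*)(gid_t **);

struct GroupResult {
  int err;                  // 0 on success, negative errno on failure
  std::vector<gid_t> gids;  // empty unless err == 0
};

GroupResult collect_groups(GroupFetcher fetch) {
  assert(fetch != nullptr);
  gid_t *raw = nullptr;
  int count = fetch(&raw);
  if (count < 0) {
    // Propagate the error instead of treating it as a group count.
    return {count, {}};
  }
  GroupResult result{0, std::vector<gid_t>(raw, raw + count)};
  delete[] raw;  // deleting a null pointer (count == 0) is a no-op
  return result;
}

// Illustrative fetchers: one that fails, one that returns two groups.
int fetch_fails(gid_t **) { return -ENOSYS; }

int fetch_two(gid_t **out) {
  *out = new gid_t[2]{10, 20};
  return 2;
}
```

With this shape, collect_groups(fetch_fails) yields err == -ENOSYS and an empty list, so the caller can return the error to the requester rather than carrying a negative count forward.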
Updated by Jeff Layton about 6 years ago
Ok, draft patch is building in shaman now. It should make ceph-fuse send the error back to the kernel when this occurs and prevent the crash.
That may not really help the person who reported this though. Since this happened inside docker, I'm assuming it ran afoul of this in some fashion:
 *
 * The current fuse kernel module in linux (as of 2.6.30) doesn't pass
 * the group list to userspace, hence this function needs to parse
 * "/proc/$TID/task/$TID/status" to get the group IDs.
 *
So, the ceph.conf may need to set fuse_set_user_groups to "false" in order to function inside a container.
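As a sketch of that workaround, the ceph.conf fragment could look something like this. Putting the option under the [client] section is an assumption on my part; adjust the section to match how the client in the container is configured.

```
[client]
# Assumed placement: stop ceph-fuse from looking up the caller's
# supplementary groups for each request.
fuse_set_user_groups = false
```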
Updated by Jeff Layton about 6 years ago
- Status changed from New to In Progress
Pull request here:
https://github.com/ceph/ceph/pull/21025
Updated by Jeff Layton about 6 years ago
New PR here:
Updated by Patrick Donnelly about 6 years ago
- Status changed from In Progress to Pending Backport
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23638: luminous: ceph-fuse: getgroups failure causes exception added
Updated by Jeff Layton over 5 years ago
- Status changed from Pending Backport to Resolved