Bug #2846
closedMalformed keyring file causes kernel null pointer deref on "rbd map"
0%
Description
Reported by Sylvain Munaut ("tnt" on OFTC):
(12:30:27) tnt: Is mounting a RBD on a machine that has an OSD supposed to kernel panic ?
(12:33:20) fghaas: I'd say kernel panics are never supposed to happen
(12:33:49) fghaas: however are you sure it's a panic and not just a stack trace?
(12:33:49) tnt: http://pastebin.com/RgnapSvk
(12:33:56) tnt: Well the machine is dead
(12:34:34) tnt: and it's 100% reproductible on that setup ... (I rebooted it and retried a rbd map and bam ... dead again)
(12:34:41) fghaas: is this right after "rbd map" or after actually opening your /dev/rbdX?
(12:35:12) tnt: right after rbd map. I don't even see it returning, the ssh connection dies before that.
(12:35:47) fghaas: 0.48, ubuntu 3.2.0 kernel? does this also occur when you're mapping from a different box?
(12:37:38) tnt: It's 0.48 and a up to date ubuntu 12.04 ( so yes, 3.2.0-24-generic kernel ). I'll try from another box.
(12:42:10) tnt: fghaas: seems to work fine from another box
(12:42:36) fghaas: ok, that's what I thought. I'd be surprised if it didn't, it's been working fine for me that way
(12:42:45) fghaas: never tried mounting from an osd though
(12:43:15) fghaas: now it does say somewhere that you shouldn't be mounting cephfs from a ceph cluster nodes, but afair nothing about rbd
(12:43:38) fghaas: but the problem seems to be in libceph, which both use
(12:44:14) tnt: strangly I used the manual "echo ..." method and ... it worked fine.
(12:44:38) fghaas: you mean via the sysfs file?
(12:45:36) tnt: Yup ...
(12:45:40) fghaas: what's your exact "rbd map" command line?
(12:45:52) tnt: rbd map es-data-0 --pool rbd --name client.admin --secret /etc/ceph/keyring
(12:46:04) tnt: and for the echo I do:
(12:46:09) tnt: echo "10.208.3.67 name=admin,secret=AQDgARFQ8MzbLxAA5bouqkjb26sZk5eGqfkKcQ== rbd es-data-0" > /sys/bus/rbd/add
(12:47:07) fghaas: um, wait. it's not supposed to work with --secret pointing to the standard keyring file
(12:47:46) fghaas: do echo AQDgARFQ8MzbLxAA5bouqkjb26sZk5eGqfkKcQ== > /tmp/secret, and then rbd map es-data-0 --pool rbd --name client.admin --secret /tmp/secret
(12:48:09) tnt: Ah yes indeed, this way it works !
Files
Updated by Sylvain Munaut over 11 years ago
I was pointing to a keyring file directly that happened to start with an empty line. So in rbd.cc, the function read_secret_from_file only read the first line (which was empty) and was validated as proper base64 by set_kernel_secret.
So the first fix is to the user space program to report empty secret as errors. (If the user specified a secret, he probably wanted a non-empty one).
Then of course bad input to the sys file should not trigger a panic. I'm still tracing that ...
As a side note when reading the rbd.cc code, I stumbled at line 698 on :
char key_name[strlen(user) + strlen("client.")];
snprintf(key_name, sizeof(key_name), "client.%s", user);
It seems key_name is gonna be 1 char too short ... it should be
char key_name[strlen(user) + strlen("client.")+1];
snprintf(key_name, sizeof(key_name), "client.%s", user);
Patches will follow when I confirmed they work.
Updated by Sylvain Munaut over 11 years ago
- File fix-empty-secret.diff fix-empty-secret.diff added
- File fix-off-by-one-keyname.diff fix-off-by-one-keyname.diff added
wrt to kernel crash, here's a minimal test case that will crash any machine that has rbd module loaded (works as user, no need to be root):
@
/* Compile with gcc rbd-crash.c -o rbd-crash -lkeyutils */
#include <sys/types.h>
#include <keyutils.h>
int main(void)
{
const char pl4 = { 0, 0, 0, 0 };
add_key("user", "bug", pl, 0, KEY_SPEC_PROCESS_KEYRING);
return 0;
}
@
Updated by Sylvain Munaut over 11 years ago
Damnit ... first it didn't take the formatting and second I pasted the wrong code :p
#include <sys/types.h> #include <keyutils.h> int main(void) { const char pl[4] = { 0, 0, 0, 0 }; add_key("ceph", "bug", pl, 4, KEY_SPEC_PROCESS_KEYRING); return 0; }
Updated by Sylvain Munaut over 11 years ago
- File fix-rbd-client fix-rbd-client added
Ok, I finally know the failing path.
So when you call add_key with an invalid payload, it will be parsed by ceph_key_instantiate in linux/net/ceph/crypto.c
During the decoding, it will be correctly identified as a bad payload and will return -EINVAL
BUT the destroy method will still be called by the kernel and so it need to be resilient to the case where payload.data is NULL. Note that I think payload.data itself should be freed as well.
Patch attached.
Updated by Sage Weil over 11 years ago
- Assignee set to Sage Weil
- Priority changed from Normal to Urgent
Updated by Sage Weil over 11 years ago
- Status changed from New to 12
kernel patch is in testing branch.
Updated by Sage Weil over 11 years ago
- Status changed from 12 to Resolved
userland fixes applied to stable, next.
thanks!