Bug #2846: Malformed keyring file causes kernel null pointer deref on "rbd map" - Ceph - Ceph

Actions

Copy link

Bug #2846

closed

Malformed keyring file causes kernel null pointer deref on "rbd map"

Added by Florian Haas over 11 years ago. Updated over 11 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

Sage Weil

Category:

Target version:

% Done:

Source:

Community (user)

Tags:

Backport:

Regression:

Severity:

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

Reported by Sylvain Munaut ("tnt" on OFTC):

(12:30:27) tnt: Is mounting a RBD on a machine that has an OSD supposed to kernel panic ?
(12:33:20) fghaas: I'd say kernel panics are never supposed to happen
(12:33:49) fghaas: however are you sure it's a panic and not just a stack trace?
(12:33:49) tnt: http://pastebin.com/RgnapSvk
(12:33:56) tnt: Well the machine is dead
(12:34:34) tnt: and it's 100% reproductible on that setup ... (I rebooted it and retried a rbd map and bam ... dead again)
(12:34:41) fghaas: is this right after "rbd map" or after actually opening your /dev/rbdX?
(12:35:12) tnt: right after rbd map. I don't even see it returning, the ssh connection dies before that.
(12:35:47) fghaas: 0.48, ubuntu 3.2.0 kernel? does this also occur when you're mapping from a different box?
(12:37:38) tnt: It's 0.48 and a up to date ubuntu 12.04 ( so yes, 3.2.0-24-generic kernel ). I'll try from another box.
(12:42:10) tnt: fghaas: seems to work fine from another box
(12:42:36) fghaas: ok, that's what I thought. I'd be surprised if it didn't, it's been working fine for me that way
(12:42:45) fghaas: never tried mounting from an osd though
(12:43:15) fghaas: now it does say somewhere that you shouldn't be mounting cephfs from a ceph cluster nodes, but afair nothing about rbd
(12:43:38) fghaas: but the problem seems to be in libceph, which both use
(12:44:14) tnt: strangly I used the manual "echo ..." method and ... it worked fine.
(12:44:38) fghaas: you mean via the sysfs file?
(12:45:36) tnt: Yup ...
(12:45:40) fghaas: what's your exact "rbd map" command line?
(12:45:52) tnt: rbd map es-data-0 --pool rbd --name client.admin --secret /etc/ceph/keyring
(12:46:04) tnt: and for the echo I do:
(12:46:09) tnt: echo "10.208.3.67 name=admin,secret=AQDgARFQ8MzbLxAA5bouqkjb26sZk5eGqfkKcQ== rbd es-data-0" > /sys/bus/rbd/add
(12:47:07) fghaas: um, wait. it's not supposed to work with --secret pointing to the standard keyring file
(12:47:46) fghaas: do echo AQDgARFQ8MzbLxAA5bouqkjb26sZk5eGqfkKcQ== > /tmp/secret, and then rbd map es-data-0 --pool rbd --name client.admin --secret /tmp/secret
(12:48:09) tnt: Ah yes indeed, this way it works !

Files

Download all files

fix-empty-secret.diff (551 Bytes) fix-empty-secret.diff	Fix empty secret (i.e. wrong secret file) handling	Sylvain Munaut, 07/26/2012 08:05 AM
fix-off-by-one-keyname.diff (447 Bytes) fix-off-by-one-keyname.diff	Fix off by one in constructing the key name	Sylvain Munaut, 07/26/2012 08:05 AM
fix-rbd-client (632 Bytes) fix-rbd-client	Fix RBD client by making ceph_crypto_key_destroy resilient to NULL and fix a memory leak as well.	Sylvain Munaut, 07/26/2012 08:26 AM

Actions

Copy link

Updated by Sylvain Munaut over 11 years ago

I was pointing to a keyring file directly that happened to start with an empty line. So in rbd.cc, the function read_secret_from_file only read the first line (which was empty) and was validated as proper base64 by set_kernel_secret.

So the first fix is to the user space program to report empty secret as errors. (If the user specified a secret, he probably wanted a non-empty one).

Then of course bad input to the sys file should not trigger a panic. I'm still tracing that ...

As a side note when reading the rbd.cc code, I stumbled at line 698 on :

char key_name[strlen(user) + strlen("client.")]; snprintf(key_name, sizeof(key_name), "client.%s", user);

It seems key_name is gonna be 1 char too short ... it should be

char key_name[strlen(user) + strlen("client.")+1]; snprintf(key_name, sizeof(key_name), "client.%s", user);

Patches will follow when I confirmed they work.

Actions

Copy link Download all files

Updated by Sylvain Munaut over 11 years ago

File fix-empty-secret.diff fix-empty-secret.diff added
File fix-off-by-one-keyname.diff fix-off-by-one-keyname.diff added

wrt to kernel crash, here's a minimal test case that will crash any machine that has rbd module loaded (works as user, no need to be root):

@
/* Compile with gcc rbd-crash.c -o rbd-crash -lkeyutils */

#include <sys/types.h>
#include <keyutils.h>

int main(void) {
const char pl⁴ = { 0, 0, 0, 0 };

add_key("user", "bug", pl, 0, KEY_SPEC_PROCESS_KEYRING);

return 0;
}
@

Actions

Copy link

Updated by Sylvain Munaut over 11 years ago

Damnit ... first it didn't take the formatting and second I pasted the wrong code :p

#include <sys/types.h>
#include <keyutils.h>

int main(void)
{
    const char pl[4] = { 0, 0, 0, 0 };

    add_key("ceph", "bug", pl, 4, KEY_SPEC_PROCESS_KEYRING);

    return 0;
}

Actions

Copy link

Updated by Sylvain Munaut over 11 years ago

File fix-rbd-client fix-rbd-client added

Ok, I finally know the failing path.

So when you call add_key with an invalid payload, it will be parsed by ceph_key_instantiate in linux/net/ceph/crypto.c
During the decoding, it will be correctly identified as a bad payload and will return -EINVAL

BUT the destroy method will still be called by the kernel and so it need to be resilient to the case where payload.data is NULL. Note that I think payload.data itself should be freed as well.

Patch attached.

Actions

Copy link