Bug #40362

open

amending caps with "ceph auth caps" vs using an existing client

Added by Ilya Dryomov almost 5 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
rbd
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

$ ceph osd pool create pool1 4
pool 'pool1' created
$ ceph osd pool create pool2 4
pool 'pool2' created
$ ceph auth add client.foo mon 'allow r' osd 'allow rwx pool=pool1'
added key for client.foo
$ ceph auth get-or-create client.foo >keyring
$ rbd create --id foo --keyring keyring --size 1 pool1/img
$ sudo rbd map --id foo --keyring keyring pool1/img
/dev/rbd0

$ ceph auth caps client.foo mon 'allow r' osd 'allow rwx pool=pool1,allow rwx pool=pool2'
updated caps for client.foo
$ rbd create --id foo --keyring keyring --size 1 pool2/img
$ sudo rbd map --id foo --keyring keyring pool2/img
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted

This is because the existing libceph instance (created for /dev/rbd0) has tickets that it considers valid and, more importantly, has open connections to some or all OSDs. Invalidating tickets when using an existing libceph instance won't help unless we also force reset all OSD connections, which is probably not a good idea.
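One way to sidestep the shared-instance behavior (a workaround sketch, not a fix for this bug) is the `noshare` map option, which forces a fresh libceph instance that acquires tickets under the updated caps instead of reusing the instance created for /dev/rbd0:

```shell
# Workaround sketch: map with a dedicated libceph instance so the
# amended caps take effect immediately. Note that a noshare instance
# is not shared with existing mappings, so each such map consumes
# extra resources (monitor/OSD sessions, memory).
sudo rbd map --id foo --keyring keyring -o noshare pool2/img
```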

Actions #1

Updated by Ilya Dryomov almost 5 years ago

If we simply invalidate, things become super confusing. If the OSD that the pool2/img header resides on is open, "rbd map" fails as above. If that OSD isn't open, we open it with the new authorizer and "rbd map" succeeds, but further I/O fails sporadically:

$ xfs_io -d -c 'pwrite 28M 512' /dev/rbd1  # to the OSD that isn't open
wrote 512/512 bytes at offset 29360128
512.000000 bytes, 1 ops; 0.0118 sec (42.102 KiB/sec and 84.2034 ops/sec)
$ xfs_io -d -c 'pwrite 32M 512' /dev/rbd1  # to the OSD that is open
pwrite: Input/output error

[ 1785.481381] rbd: rbd1: write at objno 8 0~512 result -1
[ 1785.482762] print_req_error: I/O error, dev rbd1, sector 65536 flags 8801
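The two writes land on different RBD objects, and therefore potentially on different OSDs. Assuming the default 4 MiB object size (the image here uses no custom striping), the object number is just the offset divided by the object size:

```shell
# Sketch: map a block-device offset to an RBD object number,
# assuming the default 4 MiB object size.
object_size=$((4 * 1024 * 1024))
for offset in $((28 * 1024 * 1024)) $((32 * 1024 * 1024)); do
    echo "offset $offset -> objno $((offset / object_size))"
done
```

This matches the transcript: the 28M write hits objno 7 (an OSD opened with the new authorizer, so it succeeds), while the 32M write hits objno 8 on an already-open OSD and fails, as shown by the "write at objno 8" line in dmesg.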
Actions #2

Updated by Ilya Dryomov almost 5 years ago

Note that these irregular I/O errors can occur today (i.e. with no invalidation in the kernel client) if enough time passes after the "ceph auth caps" invocation. The service tickets are rotated every hour, so the kernel client gets a new authorizer after at most an hour anyway, and the user gets the ability to interact with only a subset of OSDs (read: attempt to map and, if that succeeds, experience EPERM errors on random parts of the image). This subset is dynamic: it expands over time as OSD sessions get closed due to inactivity or get reset, so after a while the image can "clear up" entirely.
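The hourly rotation mentioned above corresponds to the `auth_service_ticket_ttl` option (default 3600 seconds). On clusters with the centralized config database, the effective value can be checked with (a sketch; on older releases the option lives in ceph.conf instead):

```shell
# Show the service ticket TTL that governs how long stale
# authorizers can persist after "ceph auth caps".
ceph config get mon auth_service_ticket_ttl
```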

Actions #3

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
Actions #4

Updated by Ilya Dryomov over 4 years ago

  • Assignee deleted (Ilya Dryomov)