Bug #37486
closed
tmpfs in /var/lib/ceph/osd/X sometimes created with wrong permissions
Description
I encountered a strange issue on a customer's system today: some OSDs didn't come up after a reboot, and which ones failed was somewhat random. I tracked it down to the permissions of /var/lib/ceph/osd/ceph-*.
The tmpfs was created and mounted properly, and all files existed and seemed correct (except the block.db symlink, for some reason; presumably activation failed before it was created). However, most files had the wrong owner:
- /var/lib/ceph/osd/ceph-X was owned by root:root
- /var/lib/ceph/osd/ceph-X/block was owned by ceph:ceph
- other files in the directory were owned by root:root
This wouldn't be a problem if the block symlink was also owned by root.
But it causes ceph-volume to fail because of fs.protected_symlinks which prevents dereferencing the symlink in this case (world-writable tmpfs mountpoint owned by root, symlink owned by ceph, target of the symlink owned by root).
Work-around: disable fs.protected_symlinks or chown the directory to ceph:ceph
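To make the fs.protected_symlinks condition concrete, here is a minimal sketch of the rule as the kernel documentation describes it: a symlink in a sticky, world-writable directory may only be followed if the follower's uid matches the link owner, or the directory owner matches the link owner. The function name, its parameters, and the uid value used for ceph are illustrative assumptions, not part of ceph-volume or the kernel API.

```python
import stat

def may_follow_symlink(follower_uid, link_uid, dir_uid, dir_mode, protected=True):
    """Approximate the fs.protected_symlinks decision (illustrative only)."""
    if not protected:
        # fs.protected_symlinks=0: no restriction applies.
        return True
    if follower_uid == link_uid:
        # The follower owns the symlink.
        return True
    sticky_world_writable = stat.S_ISVTX | stat.S_IWOTH
    if (dir_mode & sticky_world_writable) != sticky_world_writable:
        # The restriction only applies inside sticky, world-writable directories.
        return True
    # Otherwise the directory owner must match the symlink owner.
    return dir_uid == link_uid

ROOT = 0
CEPH = 64045  # placeholder uid for the ceph user; the real value depends on the system

# The situation from this report: root follows a ceph-owned link in a root-owned 1777 dir.
print(may_follow_symlink(ROOT, CEPH, ROOT, 0o1777))  # False
# After chown'ing the directory to ceph, the same access is allowed.
print(may_follow_symlink(ROOT, CEPH, CEPH, 0o1777))  # True
```

This matches both work-arounds above: disabling the sysctl corresponds to protected=False, and chown'ing the directory makes the directory owner equal to the link owner.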
I have no idea how it got into this situation in the first place. The weird thing is that this only happens on one specific hardware platform and only with kernel 4.18. It works fine with kernel 4.9, and it works fine with 4.18 on all other hardware we have ever encountered. We boot the exact same image on a lot of servers.
Our image is Debian, but there's nothing special in our boot routine that could cause this; we leave that part to ceph-volume.
Anyway, what causes these permissions in the first place is beside the point, I think. Should we try to make ceph-volume more robust so that it handles this case?
Steps to kind of reproduce:
root@x /var/lib/ceph/osd/ceph-14 $ systemctl stop ceph-osd@14
root@x /var/lib/ceph/osd/ceph-14 $ chown root:root .
root@x /var/lib/ceph/osd/ceph-14 $ ceph-volume lvm activate --all
--> Activating OSD ID 14 FSID 2f8651bb-d404-44bf-b4d2-67c1aa3d5be1
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-20802f35-15ad-438c-aab1-df5b72325ce1/osd-block-2f8651bb-d404-44bf-b4d2-67c1aa3d5be1 --path /var/lib/ceph/osd/ceph-14 --no-mon-config
 stderr: error symlinking /var/lib/ceph/osd/ceph-14/block: (13) Permission denied
--> RuntimeError: command returned non-zero exit status: 1
I think it should automatically fix the ownership here, or at least show a better error message. It took me an hour or so to track this down to fs.protected_symlinks.
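As a rough idea of what a "better error message" could be based on, the following sketch lists the entries of an OSD directory (and the directory itself) that are not owned by an expected uid, using lstat so that symlinks are inspected rather than their targets. The helper name entries_not_owned_by is hypothetical; nothing like it is known to exist in ceph-volume.

```python
import os

def entries_not_owned_by(osd_path, uid):
    """Return '.' and/or entry names under osd_path whose owner is not uid.

    os.lstat is used so that a symlink such as 'block' is checked itself,
    not the device it points to.
    """
    mismatched = []
    if os.lstat(osd_path).st_uid != uid:
        mismatched.append(".")
    for name in sorted(os.listdir(osd_path)):
        if os.lstat(os.path.join(osd_path, name)).st_uid != uid:
            mismatched.append(name)
    return mismatched
```

A caller could run this before priming the OSD directory and, if the result is non-empty, print the offending names together with a hint about fs.protected_symlinks instead of the bare "Permission denied".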
Updated by Alfredo Deza over 5 years ago
- Status changed from New to 12
- Assignee set to Alfredo Deza
Updated by Alfredo Deza over 5 years ago
What version of Ceph was this? There was a problem with permissions a while ago, see issue: http://tracker.ceph.com/issues/24661
Specifically, this change helped: https://github.com/ceph/ceph/pull/22462/files#diff-d96a27274e642aeac837f66e2d406dc5R103
Using 13.2.0 I can't see the problem:
root@node9:/var/lib/ceph/osd/ceph-13# ls -alh
total 28K
drwxrwxrwt  2 ceph ceph  200 Dec  7 23:38 .
drwxr-xr-x 55 ceph ceph 4.0K Dec  5 17:04 ..
lrwxrwxrwx  1 ceph ceph   99 Dec  7 23:38 block -> /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62
lrwxrwxrwx  1 root root  106 Dec  7 23:38 block.db -> /dev/ceph-block-dbs-843d6828-7c62-4545-91bb-5a185c0dd829/osd-block-db-d5d5f1ae-6bbf-41da-86ea-4eeeb7be9214
-rw-------  1 ceph ceph   37 Dec  7 23:38 ceph_fsid
-rw-------  1 ceph ceph   37 Dec  7 23:38 fsid
-rw-------  1 ceph ceph   56 Dec  7 23:38 keyring
-rw-------  1 ceph ceph    6 Dec  7 23:38 ready
-rw-------  1 ceph ceph   10 Dec  7 23:38 type
-rw-------  1 ceph ceph    3 Dec  7 23:38 whoami
root@node9:/var/lib/ceph/osd/ceph-13# systemctl stop ceph-osd@13
root@node9:/var/lib/ceph/osd/ceph-13# chown -R root:root .
root@node9:/var/lib/ceph/osd/ceph-13# ls -alh
total 28K
drwxrwxrwt  2 root root  200 Dec  7 23:38 .
drwxr-xr-x 55 ceph ceph 4.0K Dec  5 17:04 ..
lrwxrwxrwx  1 root root   99 Dec  7 23:38 block -> /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62
lrwxrwxrwx  1 root root  106 Dec  7 23:38 block.db -> /dev/ceph-block-dbs-843d6828-7c62-4545-91bb-5a185c0dd829/osd-block-db-d5d5f1ae-6bbf-41da-86ea-4eeeb7be9214
-rw-------  1 root root   37 Dec  7 23:38 ceph_fsid
-rw-------  1 root root   37 Dec  7 23:38 fsid
-rw-------  1 root root   56 Dec  7 23:38 keyring
-rw-------  1 root root    6 Dec  7 23:38 ready
-rw-------  1 root root   10 Dec  7 23:38 type
-rw-------  1 root root    3 Dec  7 23:38 whoami
root@node9:/var/lib/ceph/osd# ceph --version
ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
root@node9:/var/lib/ceph/osd/ceph-13# cat fsid
059d5aa6-bbfa-42b2-9c52-fe0bf23d9647
root@node9:/var/lib/ceph/osd/ceph-13# ceph-volume lvm activate 13 059d5aa6-bbfa-42b2-9c52-fe0bf23d9647
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62 --path /var/lib/ceph/osd/ceph-13
Running command: /bin/ln -snf /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62 /var/lib/ceph/osd/ceph-13/block
Running command: /bin/chown -R ceph:ceph /dev/dm-8
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-13
Running command: /bin/ln -snf /dev/ceph-block-dbs-843d6828-7c62-4545-91bb-5a185c0dd829/osd-block-db-d5d5f1ae-6bbf-41da-86ea-4eeeb7be9214 /var/lib/ceph/osd/ceph-13/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-5
Running command: /bin/systemctl enable ceph-volume@lvm-13-059d5aa6-bbfa-42b2-9c52-fe0bf23d9647
Running command: /bin/systemctl start ceph-osd@13
--> ceph-volume lvm activate successful for osd ID: 13
root@node9:/var/lib/ceph/osd/ceph-13# ls -alh
total 28K
drwxrwxrwt  2 ceph ceph  200 Dec  7 23:41 .
drwxr-xr-x 55 ceph ceph 4.0K Dec  5 17:04 ..
lrwxrwxrwx  1 ceph ceph   99 Dec  7 23:41 block -> /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62
lrwxrwxrwx  1 root root  106 Dec  7 23:41 block.db -> /dev/ceph-block-dbs-843d6828-7c62-4545-91bb-5a185c0dd829/osd-block-db-d5d5f1ae-6bbf-41da-86ea-4eeeb7be9214
-rw-------  1 ceph ceph   37 Dec  7 23:41 ceph_fsid
-rw-------  1 ceph ceph   37 Dec  7 23:41 fsid
-rw-------  1 ceph ceph   56 Dec  7 23:41 keyring
-rw-------  1 ceph ceph    6 Dec  7 23:41 ready
-rw-------  1 ceph ceph   10 Dec  7 23:41 type
-rw-------  1 ceph ceph    3 Dec  7 23:41 whoami
# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
Updated by Paul Emmerich over 5 years ago
13.2.2 on Debian Stretch.
The problem occurs when root owns . and ceph owns the symlink; in your example root owns everything, which is fine. Run chown without -R to reproduce.
Updated by Alfredo Deza over 5 years ago
- Status changed from 12 to In Progress
root@node9:/var/lib/ceph/osd/ceph-13# ls -alh
total 28K
drwxrwxrwt  2 root root  200 Dec 10 19:23 .
drwxr-xr-x 55 ceph ceph 4.0K Dec  5 17:04 ..
lrwxrwxrwx  1 ceph ceph   99 Dec 10 19:23 block -> /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62
lrwxrwxrwx  1 root root  106 Dec 10 19:23 block.db -> /dev/ceph-block-dbs-843d6828-7c62-4545-91bb-5a185c0dd829/osd-block-db-d5d5f1ae-6bbf-41da-86ea-4eeeb7be9214
-rw-------  1 root root   37 Dec 10 19:23 ceph_fsid
-rw-------  1 root root   37 Dec 10 19:23 fsid
-rw-------  1 root root   56 Dec 10 19:23 keyring
-rw-------  1 root root    6 Dec 10 19:23 ready
-rw-------  1 root root   10 Dec 10 19:23 type
-rw-------  1 root root    3 Dec 10 19:23 whoami
root@node9:/var/lib/ceph/osd/ceph-13# ceph-volume lvm activate 13 059d5aa6-bbfa-42b2-9c52-fe0bf23d9647
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62 --path /var/lib/ceph/osd/ceph-13
 stderr: error symlinking
 stderr: /var/lib/ceph/osd/ceph-13/block: (13) Permission denied
--> RuntimeError: command returned non-zero exit status: 1
root@node9:/var/lib/ceph/osd/ceph-13# CEPH_VOLUME_DEBUG=1 ceph-volume lvm activate 13 059d5aa6-bbfa-42b2-9c52-fe0bf23d9647
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62 --path /var/lib/ceph/osd/ceph-13
 stderr: error symlinking /var/lib/ceph/osd/ceph-13/block: (13) Permission denied
Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 6, in <module>
    main.Volume()
  File "/usr/lib/python2.7/dist-packages/ceph_volume/main.py", line 37, in __init__
    self.main(self.argv)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/main.py", line 38, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/activate.py", line 318, in main
    self.activate(args)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/activate.py", line 242, in activate
    activate_bluestore(lvs, no_systemd=args.no_systemd)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/activate.py", line 154, in activate_bluestore
    '--path', osd_path])
  File "/usr/lib/python2.7/dist-packages/ceph_volume/process.py", line 149, in run
    raise RuntimeError(msg)
RuntimeError: command returned non-zero exit status: 1
Updated by Alfredo Deza over 5 years ago
Potential working fix:
diff --git a/src/ceph-volume/ceph_volume/devices/lvm/activate.py b/src/ceph-volume/ceph_volume/devices/lvm/activate.py
index acebfe123b..d13de5d9cc 100644
--- a/src/ceph-volume/ceph_volume/devices/lvm/activate.py
+++ b/src/ceph-volume/ceph_volume/devices/lvm/activate.py
@@ -152,6 +152,7 @@ def activate_bluestore(lvs, no_systemd=False):
     wal_device_path = get_osd_device_path(osd_lv, lvs, 'wal', dmcrypt_secret=dmcrypt_secret)
 
     # Once symlinks are removed, the osd dir can be 'primed again.
+    system.chown(osd_path)
     prime_command = [
         'ceph-bluestore-tool', '--cluster=%s' % conf.cluster,
         'prime-osd-dir', '--dev', osd_lv_path,
Was able to activate correctly:
(tmp) root@node9:/var/lib/ceph/osd/ceph-13# chown -h ceph:ceph block
(tmp) root@node9:/var/lib/ceph/osd/ceph-13# ls -alh
total 28K
drwxrwxrwt  2 root root  200 Dec 10 21:05 .
drwxr-xr-x 55 ceph ceph 4.0K Dec  5 17:04 ..
lrwxrwxrwx  1 ceph ceph   99 Dec 10 21:05 block -> /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62
lrwxrwxrwx  1 root root  106 Dec 10 21:05 block.db -> /dev/ceph-block-dbs-843d6828-7c62-4545-91bb-5a185c0dd829/osd-block-db-d5d5f1ae-6bbf-41da-86ea-4eeeb7be9214
-rw-------  1 root root   37 Dec 10 21:05 ceph_fsid
-rw-------  1 root root   37 Dec 10 21:05 fsid
-rw-------  1 root root   56 Dec 10 21:05 keyring
-rw-------  1 root root    6 Dec 10 21:05 ready
-rw-------  1 root root   10 Dec 10 21:05 type
-rw-------  1 root root    3 Dec 10 21:05 whoami
(tmp) root@node9:/var/lib/ceph/osd/ceph-13# cd ../
(tmp) root@node9:/var/lib/ceph/osd# systemctl stop ceph-osd@13
(tmp) root@node9:/var/lib/ceph/osd# ceph-volume lvm activate 13 059d5aa6-bbfa-42b2-9c52-fe0bf23d9647
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-13
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62 --path /var/lib/ceph/osd/ceph-13 --no-mon-config
Running command: /bin/ln -snf /dev/ceph-block-c5019075-26a5-4ccc-af1e-006598f7ee64/osd-block-d64b56fc-db27-4510-81ee-7a224ffdbb62 /var/lib/ceph/osd/ceph-13/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-13/block
Running command: /bin/chown -R ceph:ceph /dev/dm-8
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-13
Running command: /bin/ln -snf /dev/ceph-block-dbs-843d6828-7c62-4545-91bb-5a185c0dd829/osd-block-db-d5d5f1ae-6bbf-41da-86ea-4eeeb7be9214 /var/lib/ceph/osd/ceph-13/block.db
Running command: /bin/chown -h ceph:ceph /dev/ceph-block-dbs-843d6828-7c62-4545-91bb-5a185c0dd829/osd-block-db-d5d5f1ae-6bbf-41da-86ea-4eeeb7be9214
Running command: /bin/chown -R ceph:ceph /dev/dm-5
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-13/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-5
Running command: /bin/systemctl enable ceph-volume@lvm-13-059d5aa6-bbfa-42b2-9c52-fe0bf23d9647
Running command: /bin/systemctl enable --runtime ceph-osd@13
Running command: /bin/systemctl start ceph-osd@13
--> ceph-volume lvm activate successful for osd ID: 13
(tmp) root@node9:/var/lib/ceph/osd# echo $?
0
Will follow up with functional tests
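For readers who want to see the idea behind the fix in isolation, here is a minimal sketch of resetting ownership on a directory tree before priming it, in the spirit of the `chown -R ceph:ceph /var/lib/ceph/osd/ceph-13` call that now appears first in the log above. It is not the code from the patch: chown_tree is a hypothetical helper, and it uses os.lchown so that symlinks such as block are changed themselves rather than their targets.

```python
import os

def chown_tree(path, uid, gid):
    """Set owner/group on path and everything below it without following symlinks.

    Returns the number of entries touched. Passing -1 for uid or gid leaves
    that id unchanged, as with os.lchown itself.
    """
    os.lchown(path, uid, gid)
    changed = 1
    # os.walk does not descend into symlinked directories by default, but it
    # still lists them, so every entry is visited exactly once.
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            os.lchown(os.path.join(root, name), uid, gid)
            changed += 1
    return changed
```

In a real activation path this would run as root with the uid and gid of the ceph user (for example looked up with pwd.getpwnam("ceph"), assuming that account exists), before the prime-osd-dir step.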
Updated by Alfredo Deza over 5 years ago
master PR https://github.com/ceph/ceph/pull/25477
Updated by Alfredo Deza over 5 years ago
mimic PR: https://github.com/ceph/ceph/pull/25777
luminous PR: https://github.com/ceph/ceph/pull/25778
Updated by Alfredo Deza over 5 years ago
- Status changed from In Progress to Resolved