Support #17722
OSDs don't start after reboot
Description
Hi,
There are 3 servers:
[ceph@CloudKVM-2 ~]$ uname -a
Linux CloudKVM-2 3.10.0-327.36.2.el7.x86_64 #1 SMP Mon Oct 10 23:08:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[ceph@CloudKVM-2 ~]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
After a reboot, the OSDs don't start.
[ceph@CloudKVM-2 ~]$ systemctl --failed
UNIT                  LOAD   ACTIVE SUB    DESCRIPTION
● ceph-osd@1.service  loaded failed failed Ceph object storage daemon
● ceph-osd@10.service loaded failed failed Ceph object storage daemon
● ceph-osd@11.service loaded failed failed Ceph object storage daemon
● ceph-osd@5.service  loaded failed failed Ceph object storage daemon
● ceph-osd@6.service  loaded failed failed Ceph object storage daemon
● ceph-osd@7.service  loaded failed failed Ceph object storage daemon
● ceph-osd@8.service  loaded failed failed Ceph object storage daemon
● ceph-osd@9.service  loaded failed failed Ceph object storage daemon
[ceph@CloudKVM-2 ~]$ systemctl is-enabled ceph-mon.target
enabled
[ceph@CloudKVM-2 ~]$ systemctl is-enabled ceph-osd.target
enabled
[ceph@CloudKVM-2 ~]$ systemctl is-enabled ceph-osd@.service
enabled
[ceph@CloudKVM-2 ~]$ systemctl is-enabled ceph-mon@.service
enabled
[ceph@CloudKVM-2 ~]$ systemctl status ceph-osd@1.service
● ceph-osd@1.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2016-10-27 13:20:17 MSK; 4min 41s ago
  Process: 4390 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
  Process: 4070 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 4390 (code=exited, status=1/FAILURE)

Oct 27 13:20:17 CloudKVM-2 systemd[1]: ceph-osd@1.service failed.
Oct 27 13:22:15 CloudKVM-2 systemd[1]: start request repeated too quickly for ceph-osd@1.service
Oct 27 13:22:15 CloudKVM-2 systemd[1]: Failed to start Ceph object storage daemon.
Oct 27 13:22:15 CloudKVM-2 systemd[1]: ceph-osd@1.service failed.
Oct 27 13:23:27 CloudKVM-2 systemd[1]: start request repeated too quickly for ceph-osd@1.service
Oct 27 13:23:27 CloudKVM-2 systemd[1]: Failed to start Ceph object storage daemon.
Oct 27 13:23:27 CloudKVM-2 systemd[1]: ceph-osd@1.service failed.
Oct 27 13:24:50 CloudKVM-2 systemd[1]: start request repeated too quickly for ceph-osd@1.service
Oct 27 13:24:50 CloudKVM-2 systemd[1]: Failed to start Ceph object storage daemon.
Oct 27 13:24:50 CloudKVM-2 systemd[1]: ceph-osd@1.service failed.
All permissions are right.
It is a clean install of Jewel; on Infernalis there were no problems.
What could the problem be?
Updated by Nathan Cutler over 7 years ago
Are the OSD journals in separate partitions? If so, can you post the output of the following command for the journal partition devices?
blkid -o udev -p $JOURNAL_PARTITION_DEVICE
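For example, if a journal happened to live on /dev/sdb2 (a placeholder device name; substitute your actual journal partition for $JOURNAL_PARTITION_DEVICE):
blkid -o udev -p /dev/sdb2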
Updated by Nathan Cutler over 7 years ago
For more information, see http://ceph.com/planet/creating-a-ceph-osd-from-a-designated-disk-partition/
Updated by Roman Bogachev over 7 years ago
[ceph@CloudKVM-2 ~]$ blkid -o udev -p $JOURNAL_PARTITION_DEVICE
The low-level probing mode requires a device
Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1  4294967295  2147483647+  ee  GPT

Disk /dev/sdf: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1  4294967295  2147483647+  ee  GPT

Disk /dev/sdh: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1               1  4294967295  2147483647+  ee  GPT

Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1  4294967295  2147483647+  ee  GPT

Disk /dev/sdg: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1  4294967295  2147483647+  ee  GPT

Disk /dev/sde: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1  4294967295  2147483647+  ee  GPT
So, I am using ssd caching.
[ceph@CloudManage ceph]$ ceph osd tree
ID   WEIGHT  TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 9.00000 root hdd
-102 3.00000     host CloudKVM-1-hdd
   3 3.00000         osd.3                         up  1.00000          1.00000
   4 3.00000         osd.4                         up  1.00000          1.00000
-103 3.00000     host CloudKVM-2-hdd
   5 3.00000         osd.5                         up  1.00000          1.00000
   6 3.00000         osd.6                         up  1.00000          1.00000
   7 3.00000         osd.7                         up  1.00000          1.00000
   8 3.00000         osd.8                         up  1.00000          1.00000
   9 3.00000         osd.9                         up  1.00000          1.00000
  10 3.00000         osd.10                        up  1.00000          1.00000
-104 3.00000     host CloudKVM-3-hdd
  11 3.00000         osd.11                        up  1.00000          1.00000
  12 3.00000         osd.12                        up  1.00000          1.00000
  -1 3.00000 root ssd-cache
  -2 1.00000     host CloudKVM-1-ssd-cache
   0 0.50000         osd.0                         up  1.00000          1.00000
  -3 1.00000     host CloudKVM-2-ssd-cache
   1 0.50000         osd.1                         up  1.00000          1.00000
  -4 1.00000     host CloudKVM-3-ssd-cache
   2 0.50000         osd.2                         up  1.00000          1.00000
Updated by Roman Bogachev over 7 years ago
Sorry, I didn't see that.
All devices are `ID_PART_TABLE_TYPE=gpt`.
There are no journal partitions.
Updated by Nathan Cutler over 7 years ago
What happens when you start an OSD from the command line? For example, run the following command on host CloudKVM-2-hdd as root:
/usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
Updated by Nathan Cutler over 7 years ago
Also, please run ceph-disk list on CloudKVM-2-hdd and post the output here.
Updated by Roman Bogachev over 7 years ago
[root@CloudKVM-2 ceph]# /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
2016-10-28 13:06:12.485242 7efe2948f800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-5: (2) No such file or directory
[root@CloudKVM-2 ceph]# ceph-disk list
/dev/sda :
 /dev/sda4 other, 0x5
 /dev/sda5 swap, swap
 /dev/sda3 other, ext4, mounted on /
 /dev/sda1 other, ext4, mounted on /boot
 /dev/sda2 other, ext4, mounted on /var
/dev/sdb :
 /dev/sdb2 ceph journal, for /dev/sdb1
 /dev/sdb1 ceph data, prepared, cluster ceph, osd.5, journal /dev/sdb2
/dev/sdc :
 /dev/sdc2 ceph journal, for /dev/sdc1
 /dev/sdc1 ceph data, prepared, cluster ceph, osd.8, journal /dev/sdc2
/dev/sdd :
 /dev/sdd2 ceph journal, for /dev/sdd1
 /dev/sdd1 ceph data, prepared, cluster ceph, osd.1, journal /dev/sdd2
/dev/sde :
 /dev/sde2 ceph journal, for /dev/sde1
 /dev/sde1 ceph data, prepared, cluster ceph, osd.7, journal /dev/sde2
/dev/sdf :
 /dev/sdf2 ceph journal, for /dev/sdf1
 /dev/sdf1 ceph data, prepared, cluster ceph, osd.10, journal /dev/sdf2
/dev/sdg :
 /dev/sdg2 ceph journal, for /dev/sdg1
 /dev/sdg1 ceph data, prepared, cluster ceph, osd.6, journal /dev/sdg2
/dev/sdh :
 /dev/sdh2 ceph journal, for /dev/sdh1
 /dev/sdh1 ceph data, prepared, cluster ceph, osd.9, journal /dev/sdh2
[ceph@CloudKVM-2 osd]$ pwd
/var/lib/ceph/osd
[ceph@CloudKVM-2 osd]$ ls -lah
total 36K
drwxr-x---  9 ceph ceph 4.0K Oct 26 17:12 .
drwxr-x--- 10 ceph ceph 4.0K Oct 26 17:02 ..
drwxr-xr-x  2 root root 4.0K Oct 26 17:08 ceph-1
drwxr-xr-x  2 root root 4.0K Oct 26 17:12 ceph-10
drwxr-xr-x  2 root root 4.0K Oct 26 17:10 ceph-5
drwxr-xr-x  2 root root 4.0K Oct 26 17:11 ceph-6
drwxr-xr-x  2 root root 4.0K Oct 26 17:11 ceph-7
drwxr-xr-x  2 root root 4.0K Oct 26 17:11 ceph-8
drwxr-xr-x  2 root root 4.0K Oct 26 17:12 ceph-9
Updated by Nathan Cutler over 7 years ago
drwxr-xr-x 2 root root 4.0K Oct 26 17:08 ceph-1
drwxr-xr-x 2 root root 4.0K Oct 26 17:12 ceph-10
drwxr-xr-x 2 root root 4.0K Oct 26 17:10 ceph-5
drwxr-xr-x 2 root root 4.0K Oct 26 17:11 ceph-6
drwxr-xr-x 2 root root 4.0K Oct 26 17:11 ceph-7
drwxr-xr-x 2 root root 4.0K Oct 26 17:11 ceph-8
drwxr-xr-x 2 root root 4.0K Oct 26 17:12 ceph-9
The OSD directories have the wrong ownership.
On each node, make sure all the OSDs are shut down (I guess they aren't running). Then, run the following command as root:
chown -R ceph.ceph /var/lib/ceph
Then reboot.
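To verify that the chown took effect before rebooting, the following should print nothing (a quick sanity check, run as root):
find /var/lib/ceph ! -user ceph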
Updated by Roman Bogachev over 7 years ago
The same error.
[root@CloudKVM-2 ceph]# ls -lah
total 40K
drwxr-x---  10 ceph ceph 4.0K Oct 26 17:02 .
drwxr-xr-x. 39 root root 4.0K Oct 30 20:00 ..
drwxr-x---   2 ceph ceph 4.0K Oct 26 17:05 bootstrap-mds
drwxr-x---   2 ceph ceph 4.0K Oct 26 17:05 bootstrap-osd
drwxr-x---   2 ceph ceph 4.0K Oct 26 17:05 bootstrap-rgw
drwxr-x---   2 ceph ceph 4.0K Sep 21 14:35 mds
drwxr-x---   3 ceph ceph 4.0K Oct 26 17:05 mon
drwxr-x---   9 ceph ceph 4.0K Oct 26 17:12 osd
drwxr-xr-x   2 ceph ceph 4.0K Sep 21 14:35 radosgw
drwxr-x---   2 ceph ceph 4.0K Oct 30 19:29 tmp
[root@CloudKVM-2 ceph]# /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
2016-10-30 20:14:51.163177 7f93d352e800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-5: (2) No such file or directory
Updated by Nathan Cutler over 7 years ago
What happens after you reboot?
The OSD will not start if the data and journal partitions are not mounted with the correct permissions. This is done automatically by the udev rules when the system boots.
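If you want to check whether the data partitions are actually being mounted, something like the following should narrow it down (the device name is just an example taken from the ceph-disk list output above, where /dev/sdb1 is osd.5's data partition; run as root):
mount | grep /var/lib/ceph/osd      # shows which OSD data partitions are currently mounted
ceph-disk activate /dev/sdb1        # try mounting and starting osd.5 by hand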
Updated by Ivan Guan over 7 years ago
Nathan Cutler: why can't the permissions on /var/lib/ceph/osd/ceph-* be root.root?
Updated by Nathan Cutler over 7 years ago
- Status changed from New to 4
Because the daemons run as ceph.ceph and they need to access the data under /var/lib/ceph.
See http://docs.ceph.com/docs/jewel/release-notes/#upgrading-from-hammer
If you installed Ceph from a distro package, the ownership should have been set correctly.
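If you want to double-check, something like this shows whether the daemon user and the on-disk ownership agree (just a sketch, assuming the OSDs are running):
ps -o user,cmd -C ceph-osd             # running OSD daemons should show "ceph", not "root"
find /var/lib/ceph ! -user ceph -ls    # lists anything under /var/lib/ceph not owned by the ceph user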
Updated by Nathan Cutler over 7 years ago
The OSD will not start if the data and journal partitions are not mounted with the correct permissions.
Sorry, that statement is not entirely correct. What I meant to say was:
The OSD will not start if the data and journal partition devices do not have the correct (ceph.ceph) ownership. Udev rules, which are installed with the ceph-osd package, get triggered at each boot and set the ownership of OSD data and journal partition devices to ceph.ceph based on partition GUID codes that are written to the GPT partition table by ceph-disk when the OSD is provisioned.
This is all for naught if you change the ownership of the OSD directories to root.root. Although the OSD process starts as root, one of the first things it does is drop privileges to the ceph user. After the privilege drop it can no longer access its filestore, so it terminates with an error.
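If the ownership under /var/lib/ceph is correct but the OSDs still fail after a reboot, it can also help to look at the device side of that chain. A rough sketch, assuming sgdisk is installed and using /dev/sdb (osd.5) from the ceph-disk list output above; run as root:
sgdisk -i 1 /dev/sdb                                  # the "Partition GUID code" should be the Ceph data type code written by ceph-disk
ls -l /dev/sdb1 /dev/sdb2                             # after the udev rules fire, both device nodes should be owned by ceph.ceph
udevadm trigger --action=add --name-match=/dev/sdb1   # re-run the udev rules for the data partition by hand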