Bug #63319

open

Different UID of ceph user on system and in containers is causing issues

Added by Martin Lacko 6 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,
A different UID for the ceph user on the host system and inside the containers causes issues when, for example, ceph-bluestore-tool is used to extend an OSD.
I installed a new cluster, added nodes and OSDs, then grew one OSD disk and wanted to extend the OSD onto the new space. I rescanned the PV (its size had increased), extended the logical volume, stopped the OSD service managing this disk, and used ceph-bluestore-tool to extend the OSD. All of this worked; a sketch of the sequence is below.
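
For reference, the extend sequence looked roughly like this (the device, VG and LV names are placeholders, not the exact ones from my cluster):

# grow the backing PV and LV (placeholder names: /dev/sdb, ceph-vg/osd-block-1)
pvresize /dev/sdb
lvextend -l +100%FREE /dev/ceph-vg/osd-block-1
# stop the containerized OSD and expand BlueFS onto the new space
systemctl stop ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1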

The problem appeared when I tried to start the OSD service again. It failed with "permission denied". After some investigation I found that the cause was the different UID of the ceph user on the system and in the containers. When I set the UID and GID of the ceph user to the same values as in the containers, the OSD service started and the problem was solved. A quick way to compare the UIDs is shown below.
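
The commands below are just one way to compare the ceph UID/GID on the host with the values inside a container, and assume cephadm is available on the node:

# ceph UID/GID as the host sees them
id ceph
ls -ln /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5
# ceph UID/GID inside a container started by cephadm
cephadm shell -- id ceph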

root@ceph01:~# ceph version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
root@ceph03:~# cephadm install ceph-common ceph-osd
Installing packages ['ceph-common', 'ceph-osd']...
root@ceph03:~#
root@ceph03:~# systemctl stop ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1
root@ceph03:~# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1
inferring bluefs devices from bluestore path
1 : device size 0x257fc00000 : using 0x5c142f000(23 GiB)
Expanding DB/WAL...
1 : expanding  from 0x1fbfc00000 to 0x257fc00000
2023-10-22T14:32:41.955+0000 7ff7418e6a80 -1 bluestore(/var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1) _read_bdev_label failed to read from /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1: (21) Is a directory
2023-10-22T14:32:41.955+0000 7ff7418e6a80 -1 bluestore(/var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1) unable to read label for /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1: (21) Is a directory
root@ceph03:~# systemctl start ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1
root@ceph03:~# systemctl list-units --type=service | grep ceph
  ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@crash.ceph03.service                            loaded active running Ceph crash.ceph03 for cbd96647-70de-11ee-a34a-df5a876dc2c5
  ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@mon.ceph03.service                              loaded active running Ceph mon.ceph03 for cbd96647-70de-11ee-a34a-df5a876dc2c5
  ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1.service                                   loaded active running Ceph osd.1 for cbd96647-70de-11ee-a34a-df5a876dc2c5
  ceph-crash.service                                                                        loaded active running Ceph crash dump collector
root@ceph03:~#

root@ceph03:~# systemctl status ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1
× ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1.service - Ceph osd.1 for cbd96647-70de-11ee-a34a-df5a876dc2c5
     Loaded: loaded (/etc/systemd/system/ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2023-10-22 14:33:57 UTC; 59s ago
    Process: 10878 ExecStart=/bin/bash /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1/unit.run (code=exited, status=1/FAILURE)
    Process: 11198 ExecStopPost=/bin/bash /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1/unit.poststop (code=exited, status=0/SUCCESS)
   Main PID: 10878 (code=exited, status=1/FAILURE)
        CPU: 198ms

Oct 22 14:33:57 ceph03 systemd[1]: ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1.service: Scheduled restart job, restart counter is at 5.
Oct 22 14:33:57 ceph03 systemd[1]: Stopped Ceph osd.1 for cbd96647-70de-11ee-a34a-df5a876dc2c5.
Oct 22 14:33:57 ceph03 systemd[1]: ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1.service: Start request repeated too quickly.
Oct 22 14:33:57 ceph03 systemd[1]: ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1.service: Failed with result 'exit-code'.
Oct 22 14:33:57 ceph03 systemd[1]: Failed to start Ceph osd.1 for cbd96647-70de-11ee-a34a-df5a876dc2c5.
root@ceph03:~# systemctl reset-failed ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1
root@ceph03:~# systemctl start ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1
root@ceph03:~# systemctl status ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1
● ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@osd.1.service - Ceph osd.1 for cbd96647-70de-11ee-a34a-df5a876dc2c5
     Loaded: loaded (/etc/systemd/system/ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5@.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Sun 2023-10-22 14:35:50 UTC; 8s ago
    Process: 11365 ExecStart=/bin/bash /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1/unit.run (code=exited, status=1/FAILURE)
    Process: 11686 ExecStopPost=/bin/bash /var/lib/ceph/cbd96647-70de-11ee-a34a-df5a876dc2c5/osd.1/unit.poststop (code=exited, status=0/SUCCESS)
   Main PID: 11365 (code=exited, status=1/FAILURE)
        CPU: 209ms
root@ceph03:~#
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1  ** ERROR: osd init failed: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 osd.1 0 OSD:init: unable to mount object store
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 bdev(0x558d4c4c0000 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 bdev(0x558d4c4c0000 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 bdev(0x558d4c4c0000 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 bdev(0x558d4c4c0000 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 bdev(0x558d4c4c0000 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.475+0000 7f42fa572540 -1 bdev(0x558d4c4c0000 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.471+0000 7f42fa572540 -1 bdev(0x558d4c4c0000 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
Oct 22 14:46:36 ceph03 bash[14942]: debug 2023-10-22T14:46:36.467+0000 7f42fa572540 -1 Falling back to public interface
Oct 22 14:46:35 ceph03 systemd[1]: Started libcontainer container 615250af234a698a87c66ef3391429d57590ffb368adbb6072c5a165efb96c0d.
Oct 22 14:46:35 ceph03 systemd[1]: var-lib-docker-overlay2-2c2e759d559a2f0f4023c83f0ff3712784b361af6d8abd286235cb5e37df39da-merged.mount: Deactivated successfully.

/var/lib/ceph was owned by UID 617 (the ceph UID inside the containers), but the ceph UID on the system was different (I think 64045).

So I stopped all Ceph processes, changed the system ceph UID and GID to 617, and started Ceph again. The OSD service now starts; the rough sequence is below.
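
The workaround was roughly the following (617 is the value my containers use; this is just what worked for me, not an official procedure):

# stop all containerized Ceph services on the node
systemctl stop ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5.target
# re-map the host ceph user/group to the container's UID/GID
usermod -u 617 ceph
groupmod -g 617 ceph
# fix ownership of anything still owned by the old IDs
chown -R ceph:ceph /var/lib/ceph
systemctl start ceph-cbd96647-70de-11ee-a34a-df5a876dc2c5.target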
