Bug #42313
Status: closed
"No space left on device" errors following 41a13eca480e38cfeeba7a180b4516b90598c39b
Description
We are seeing lots of jobs fail due to this issue.
It's blocking point release testing:
http://pulpito.ceph.com/yuriw-2019-10-11_12:58:22-rbd-wip-yuri6-testing-2019-10-10-2057-mimic-distro-basic-smithi/
http://pulpito.ceph.com/yuriw-2019-10-11_19:41:35-rbd-wip-yuri8-testing-2019-10-11-1347-luminous-distro-basic-smithi/
2019-10-13T09:31:17.097 INFO:tasks.ceph.mon.a.smithi167.stderr:2019-10-13 09:31:17.095566 7f18a21be700 -1 log_channel(cluster) log [ERR] : Health check failed: 4 full osd(s) (OSD_FULL)
2019-10-13T09:31:22.117 INFO:tasks.ceph.mon.a.smithi167.stderr:2019-10-13 09:31:22.115362 7f18a21be700 -1 log_channel(cluster) log [ERR] : Health check update: 1 full osd(s) (OSD_FULL)
2019-10-13T09:31:27.632 INFO:tasks.ceph.mon.a.smithi167.stderr:2019-10-13 09:31:27.630483 7f189f9b9700 -1 log_channel(cluster) log [ERR] : Health check failed: mon b is very low on available space (MON_DISK_CRIT)
2019-10-13T09:31:28.672 INFO:tasks.ceph.mon.a.smithi167.stderr:2019-10-13 09:31:28.670676 7f18a21be700 -1 log_channel(cluster) log [ERR] : Health check failed: 4 full osd(s) (OSD_FULL)
2019-10-13T09:31:32.980 INFO:tasks.ceph.mon.a.smithi167.stderr:2019-10-13 09:31:32.978369 7f18a21be700 -1 log_channel(cluster) log [ERR] : Health check update: mons a,b,c are very low on available space (MON_DISK_CRIT)
2019-10-13T09:31:34.061 INFO:tasks.ceph.mon.a.smithi167.stderr:2019-10-13 09:31:34.053935 7f18a21be700 -1 log_channel(cluster) log [ERR] : Health check update: 2 full osd(s) (OSD_FULL)
2019-10-13T09:31:40.153 INFO:tasks.ceph.osd.0.smithi167.stderr:2019-10-13 09:31:40.151323 7fd400753700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 97% full
2019-10-13T09:31:40.328 INFO:tasks.ceph.mon.a.smithi167.stderr:2019-10-13 09:31:40.326090 7f18a21be700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2019-10-13T09:31:40.494 INFO:tasks.ceph.osd.3.smithi167.stderr:2019-10-13 09:31:40.492169 7fda29d9c700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 97% full
2019-10-13T09:31:40.900 INFO:tasks.ceph.osd.1.smithi167.stderr:2019-10-13 09:31:40.898979 7f10785bb700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 98% full
2019-10-13T09:31:41.279 INFO:tasks.ceph.osd.4.smithi174.stderr:2019-10-13 09:31:41.277991 7f8651337700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 98% full
2019-10-13T09:31:41.525 INFO:tasks.ceph.osd.2.smithi167.stderr:2019-10-13 09:31:41.523075 7f1df905a700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 98% full
2019-10-13T09:31:42.493 INFO:tasks.ceph.osd.7.smithi174.stderr:2019-10-13 09:31:42.491132 7febd19f8700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 98% full
2019-10-13T09:31:43.281 INFO:tasks.ceph.osd.5.smithi174.stderr:2019-10-13 09:31:43.279310 7f9c80f21700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 98% full
2019-10-13T09:31:44.399 INFO:tasks.ceph.osd.6.smithi174.stderr:2019-10-13 09:31:44.398004 7f4c2bf61700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 98% full
Updated by David Galloway over 4 years ago
Oct 11 11:24:54 <dgalloway> jdillaman, i'm looking at the "no space left on device" error you were messaging about yesterday
Oct 11 11:25:18 <dgalloway> ceph-cm-ansible is configured to create 4x ~90GB logical volumes for OSDs and a 15GB logical volume for journals
Oct 11 11:25:35 <dgalloway> i see the 15GB mounted at /var/lib/ceph and the other 4 LVs aren't in use at all
Oct 11 11:25:58 <dgalloway> that seems wrong
Oct 11 11:27:07 <dgalloway> i SSHed to a random smithi and it's also set up that way. making me think something in a yaml somewhere got misconfigured
If the tests are writing data to /var/lib/ceph/osd/$id and there's no volume mounted there, of course /var/lib/ceph is going to fill up, and quickly.
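A quick way to check for this failure mode on a node is to test whether the OSD data path is actually a mount point. A minimal sketch (the helper name and demo paths are invented for illustration):

```python
import os

def has_dedicated_volume(path: str) -> bool:
    """True if a filesystem is mounted directly at `path`; False means
    writes under `path` land on whatever filesystem holds the parent
    directory (the failure mode described above)."""
    return os.path.ismount(path)

# "/" is always a mount point; a plain subdirectory of an existing
# filesystem is not.
print(has_dedicated_volume("/"))                  # True
os.makedirs("/tmp/fake-osd-dir", exist_ok=True)
print(has_dedicated_volume("/tmp/fake-osd-dir"))  # False
```

On a correctly provisioned smithi node, each /var/lib/ceph/osd/ceph-$id directory would pass this check once its LV is mounted.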
I was wondering if maybe https://github.com/ceph/teuthology/pull/1332 is related?
Updated by Nathan Cutler over 4 years ago
Due to some recent change [1], what used to be:
> ls -l /dev/disk/by-id/wwn-*
lrwxrwxrwx 1 root root 9 Oct 11 12:22 /dev/disk/by-id/wwn-0x5000c50091e316b4 -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 11 12:22 /dev/disk/by-id/wwn-0x5000c50091e316b4-part1 -> ../../sda1
now shows up in teuthology.log as:
> ls -l '/dev/disk/by-id/wwn-*'
ls: cannot access '/dev/disk/by-id/wwn-*': No such file or directory
Note the single quotes around the device path, which includes a glob character. I tried the following experiment on my laptop:
$ ls -l '*'
ls: cannot access '*': No such file or directory
[1] https://github.com/ceph/teuthology/commit/41a13eca480e38cfeeba7a180b4516b90598c39b being the obvious candidate, but I've been staring at the code and I can't figure out how the quoting is getting triggered.
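The quoting behaviour is easy to reproduce outside teuthology. A minimal sketch (temporary directory and file names invented for the demo): when the glob reaches the shell unquoted it expands, but once it passes through shell quoting (e.g. shlex.quote), ls receives the literal pattern and exits non-zero, matching the failures in the logs:

```python
import os
import shlex
import subprocess
import tempfile

# Set up a directory with files the glob should match (names chosen to
# resemble the /dev/disk/by-id symlinks in the logs).
d = tempfile.mkdtemp()
for name in ("wwn-0x5000c50091e316b4", "wwn-0x5000c50091e316b4-part1"):
    open(os.path.join(d, name), "w").close()

pattern = os.path.join(d, "wwn-*")

# Unquoted: the shell expands the glob before ls runs, so ls sees the
# matching file names and succeeds.
unquoted = subprocess.run(f"ls -l {pattern}", shell=True,
                          capture_output=True, text=True)
print(unquoted.returncode)        # 0

# Quoted: ls receives the literal string with the '*' in it, finds no
# such file, and exits non-zero.
quoted = subprocess.run(f"ls -l {shlex.quote(pattern)}", shell=True,
                        capture_output=True, text=True)
print(quoted.returncode != 0)     # True
print(quoted.stderr.strip())
```

So any code path that starts quoting each argument (rather than passing the raw command string to the remote shell) would produce exactly the symptom above.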
Updated by Nathan Cutler over 4 years ago
Looking at http://pulpito.ceph.com, I can see that some users are getting
> ls -l /dev/disk/by-id/wwn-*
while others (notably yuriw) are getting the buggy version:
> ls -l '/dev/disk/by-id/wwn-*'
This might indicate that the buggy code is getting read from the teuthology code in the user's virtualenv. That would make it easy to verify (just try it with https://github.com/ceph/teuthology/commit/41a13eca480e38cfeeba7a180b4516b90598c39b and without).
Updated by Nathan Cutler over 4 years ago
Testing that hypothesis here: http://pulpito.ceph.com/smithfarm-2019-10-15_14:05:10-rbd:mirror-thrash-mimic-distro-basic-smithi/
(My teuthology clone does not have https://github.com/ceph/teuthology/commit/41a13eca480e38cfeeba7a180b4516b90598c39b in it.)
And, sure enough:
2019-10-15T14:32:24.660 INFO:teuthology.orchestra.run.smithi012:Running:
2019-10-15T14:32:24.660 INFO:teuthology.orchestra.run.smithi012:> ls -l /dev/disk/by-id/wwn-*
Updated by Kyrylo Shatskyy over 4 years ago
David Galloway wrote:
[...]
If the tests are writing data to
/var/lib/ceph/osd/$id
and there's no volume mounted there, of course /var/lib/ceph is going to fill up, and quickly.

I was wondering if maybe https://github.com/ceph/teuthology/pull/1332 is related?
Why do you think it is related? PR 1332 is not merged. And if you think 41a13ec is related to this issue, that is not true either, because the log in the description shows that the patch had not yet been merged at the time this run was executed.
Updated by Yuri Weinstein over 4 years ago
Updated by David Galloway over 4 years ago
Kyrylo Shatskyy wrote:
Why do you think it is related? PR 1332 is not merged. And if you think 41a13ec is related to this issue, that is not true either, because the log in the description shows that the patch had not yet been merged at the time this run was executed.
I'm wondering if PR 1332 will fix the issue described in this bug. Not the other way around.
Updated by David Galloway over 4 years ago
I just checked a log from last month and see:
2019-09-11T10:08:58.047 INFO:tasks.ceph:fs option selected, checking for scratch devs
2019-09-11T10:08:58.047 INFO:tasks.ceph:found devs: ['/dev/vg_nvme/lv_4', '/dev/vg_nvme/lv_3', '/dev/vg_nvme/lv_2', '/dev/vg_nvme/lv_1']
2019-09-11T10:08:58.047 INFO:teuthology.orchestra.run.smithi116:Running:
2019-09-11T10:08:58.047 INFO:teuthology.orchestra.run.smithi116:> ls -l '/dev/disk/by-id/wwn-*'
2019-09-11T10:08:58.119 INFO:teuthology.orchestra.run.smithi116.stderr:ls: cannot access '/dev/disk/by-id/wwn-*': No such file or directory
2019-09-11T10:08:58.120 DEBUG:teuthology.orchestra.run:got remote process result: 2
2019-09-11T10:08:58.120 INFO:teuthology.misc:Failed to get wwn devices! Using /dev/sd* devices...
2019-09-11T10:08:58.120 INFO:tasks.ceph:dev map: {'osd.1': '/dev/vg_nvme/lv_4', 'osd.3': '/dev/vg_nvme/lv_2', 'osd.2': '/dev/vg_nvme/lv_1'}
So the LVs we use on the smithi were being used correctly then. http://qa-proxy.ceph.com/teuthology/prsrivas-2019-09-11_08:36:08-rgw-wip-rgw-omap-offload-distro-basic-smithi/4298148/teuthology.log
I suspect "No space left on device" is a side effect of the problem this PR fixes: https://github.com/ceph/teuthology/pull/1332
I'm not comfortable reviewing and merging a PR I don't understand though.
EDIT: PR 1332 DOES NOT fix "no space left on device"
The LVs are recognized and added to the dev map but they don't get mounted or anything.
2019-10-16T15:42:06.527 INFO:tasks.ceph:found devs: ['/dev/vg_nvme/lv_4', '/dev/vg_nvme/lv_3', '/dev/vg_nvme/lv_2', '/dev/vg_nvme/lv_1']
2019-10-16T15:42:06.527 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:06.527 INFO:teuthology.orchestra.run.smithi044:> ls -l /dev/disk/by-id/dm-name-*
2019-10-16T15:42:06.606 INFO:teuthology.orchestra.run.smithi044.stdout:lrwxrwxrwx 1 root root 10 Oct 16 15:39 /dev/disk/by-id/dm-name-vg_nvme-lv_1 -> ../../dm-4
2019-10-16T15:42:06.607 INFO:teuthology.orchestra.run.smithi044.stdout:lrwxrwxrwx 1 root root 10 Oct 16 15:39 /dev/disk/by-id/dm-name-vg_nvme-lv_2 -> ../../dm-3
2019-10-16T15:42:06.607 INFO:teuthology.orchestra.run.smithi044.stdout:lrwxrwxrwx 1 root root 10 Oct 16 15:39 /dev/disk/by-id/dm-name-vg_nvme-lv_3 -> ../../dm-2
2019-10-16T15:42:06.607 INFO:teuthology.orchestra.run.smithi044.stdout:lrwxrwxrwx 1 root root 10 Oct 16 15:39 /dev/disk/by-id/dm-name-vg_nvme-lv_4 -> ../../dm-1
2019-10-16T15:42:06.607 INFO:teuthology.orchestra.run.smithi044.stdout:lrwxrwxrwx 1 root root 10 Oct 16 15:39 /dev/disk/by-id/dm-name-vg_nvme-lv_5 -> ../../dm-0
2019-10-16T15:42:06.607 INFO:tasks.ceph:dev map: {}
2019-10-16T15:42:06.608 INFO:tasks.ceph:Generating config...
2019-10-16T15:42:06.614 INFO:tasks.ceph:[global] ms inject socket failures = 5000
2019-10-16T15:42:06.614 INFO:tasks.ceph:[client] rbd default features = 125
2019-10-16T15:42:06.615 INFO:tasks.ceph:[client] rbd cache = True
2019-10-16T15:42:06.615 INFO:tasks.ceph:[osd] debug ms = 1
2019-10-16T15:42:06.615 INFO:tasks.ceph:[osd] debug journal = 20
2019-10-16T15:42:06.615 INFO:tasks.ceph:[osd] osd shutdown pgref assert = True
2019-10-16T15:42:06.615 INFO:tasks.ceph:[osd] debug osd = 25
2019-10-16T15:42:06.615 INFO:tasks.ceph:[osd] debug filestore = 20
2019-10-16T15:42:06.615 INFO:tasks.ceph:[osd] osd objectstore = filestore
2019-10-16T15:42:06.615 INFO:tasks.ceph:[osd] osd sloppy crc = True
2019-10-16T15:42:06.616 INFO:tasks.ceph:[mon] debug mon = 20
2019-10-16T15:42:06.616 INFO:tasks.ceph:[mon] debug paxos = 20
2019-10-16T15:42:06.616 INFO:tasks.ceph:[mon] debug ms = 1
2019-10-16T15:42:06.616 INFO:tasks.ceph:Setting up mon.a...
2019-10-16T15:42:06.616 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:06.616 INFO:teuthology.orchestra.run.smithi044:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-authtool --create-keyring /etc/ceph/ceph.keyring
2019-10-16T15:42:06.760 INFO:teuthology.orchestra.run.smithi044.stdout:creating /etc/ceph/ceph.keyring
2019-10-16T15:42:06.763 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:06.763 INFO:teuthology.orchestra.run.smithi044:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-authtool --gen-key --name=mon. /etc/ceph/ceph.keyring
2019-10-16T15:42:06.808 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:06.808 INFO:teuthology.orchestra.run.smithi044:> sudo chmod 0644 /etc/ceph/ceph.keyring
2019-10-16T15:42:06.893 DEBUG:teuthology.misc:Ceph mon addresses: [('a', '172.21.15.44:6789'), ('c', '172.21.15.44:6790'), ('b', '172.21.15.39:6789')]
2019-10-16T15:42:06.893 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:06.893 INFO:teuthology.orchestra.run.smithi044:> adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage monmaptool --create --clobber --add a 172.21.15.44:6789 --add c 172.21.15.44:6790 --add b 172.21.15.39:6789 --print /home/ubuntu/cephtest/ceph.monmap
2019-10-16T15:42:06.991 INFO:teuthology.orchestra.run.smithi044.stdout:monmaptool: monmap file /home/ubuntu/cephtest/ceph.monmap
2019-10-16T15:42:06.992 INFO:teuthology.orchestra.run.smithi044.stdout:monmaptool: generated fsid 9c51fdd3-14bb-4b92-96b9-ffaede16be28
2019-10-16T15:42:06.992 INFO:teuthology.orchestra.run.smithi044.stdout:epoch 0
2019-10-16T15:42:06.992 INFO:teuthology.orchestra.run.smithi044.stdout:fsid 9c51fdd3-14bb-4b92-96b9-ffaede16be28
2019-10-16T15:42:06.992 INFO:teuthology.orchestra.run.smithi044.stdout:last_changed 2019-10-16 15:42:06.993945
2019-10-16T15:42:06.992 INFO:teuthology.orchestra.run.smithi044.stdout:created 2019-10-16 15:42:06.993945
2019-10-16T15:42:06.993 INFO:teuthology.orchestra.run.smithi044.stdout:0: 172.21.15.39:6789/0 mon.b
2019-10-16T15:42:06.993 INFO:teuthology.orchestra.run.smithi044.stdout:1: 172.21.15.44:6789/0 mon.a
2019-10-16T15:42:06.993 INFO:teuthology.orchestra.run.smithi044.stdout:2: 172.21.15.44:6790/0 mon.c
2019-10-16T15:42:06.993 INFO:teuthology.orchestra.run.smithi044.stdout:monmaptool: writing epoch 0 to /home/ubuntu/cephtest/ceph.monmap (3 monitors)
2019-10-16T15:42:06.994 INFO:tasks.ceph:Writing /etc/ceph/ceph.conf for FSID 9c51fdd3-14bb-4b92-96b9-ffaede16be28...
2019-10-16T15:42:06.996 INFO:teuthology.orchestra.run.smithi039:Running:
2019-10-16T15:42:06.996 INFO:teuthology.orchestra.run.smithi039:> sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c 'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf
2019-10-16T15:42:07.001 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:07.001 INFO:teuthology.orchestra.run.smithi044:> sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c 'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf
2019-10-16T15:42:07.070 INFO:teuthology.orchestra.run.smithi168:Running:
2019-10-16T15:42:07.071 INFO:teuthology.orchestra.run.smithi168:> sudo mkdir -p /etc/ceph && sudo chmod 0755 /etc/ceph && sudo python -c 'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))' /etc/ceph/ceph.conf && sudo chmod 0644 /etc/ceph/ceph.conf
2019-10-16T15:42:07.124 INFO:tasks.ceph:Creating admin key on mon.a...
2019-10-16T15:42:07.124 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:07.124 INFO:teuthology.orchestra.run.smithi044:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-authtool --gen-key --name=client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *' /etc/ceph/ceph.keyring
2019-10-16T15:42:07.237 INFO:tasks.ceph:Copying monmap to all nodes...
2019-10-16T15:42:07.310 DEBUG:teuthology.orchestra.remote:smithi044:/etc/ceph/ceph.keyring is 216B
2019-10-16T15:42:07.331 DEBUG:teuthology.orchestra.remote:smithi044:/home/ubuntu/cephtest/ceph.monmap is 357B
2019-10-16T15:42:07.342 INFO:tasks.ceph:Sending monmap to node ubuntu@smithi039.front.sepia.ceph.com
2019-10-16T15:42:07.342 INFO:teuthology.orchestra.run.smithi039:Running:
2019-10-16T15:42:07.342 INFO:teuthology.orchestra.run.smithi039:> sudo sh -c 'cat > /etc/ceph/ceph.keyring' && sudo chmod 0644 /etc/ceph/ceph.keyring
2019-10-16T15:42:07.398 INFO:teuthology.orchestra.run.smithi039:Running:
2019-10-16T15:42:07.399 INFO:teuthology.orchestra.run.smithi039:> cat > /home/ubuntu/cephtest/ceph.monmap
2019-10-16T15:42:07.512 INFO:tasks.ceph:Sending monmap to node ubuntu@smithi044.front.sepia.ceph.com
2019-10-16T15:42:07.512 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:07.512 INFO:teuthology.orchestra.run.smithi044:> sudo sh -c 'cat > /etc/ceph/ceph.keyring' && sudo chmod 0644 /etc/ceph/ceph.keyring
2019-10-16T15:42:07.571 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:07.571 INFO:teuthology.orchestra.run.smithi044:> cat > /home/ubuntu/cephtest/ceph.monmap
2019-10-16T15:42:07.684 INFO:tasks.ceph:Sending monmap to node ubuntu@smithi168.front.sepia.ceph.com
2019-10-16T15:42:07.684 INFO:teuthology.orchestra.run.smithi168:Running:
2019-10-16T15:42:07.684 INFO:teuthology.orchestra.run.smithi168:> sudo sh -c 'cat > /etc/ceph/ceph.keyring' && sudo chmod 0644 /etc/ceph/ceph.keyring
2019-10-16T15:42:07.742 INFO:teuthology.orchestra.run.smithi168:Running:
2019-10-16T15:42:07.742 INFO:teuthology.orchestra.run.smithi168:> cat > /home/ubuntu/cephtest/ceph.monmap
2019-10-16T15:42:07.857 INFO:tasks.ceph:Setting up mon nodes...
2019-10-16T15:42:07.858 INFO:tasks.ceph:Setting up mgr nodes...
2019-10-16T15:42:07.858 INFO:teuthology.orchestra.run.smithi039:Running:
2019-10-16T15:42:07.858 INFO:teuthology.orchestra.run.smithi039:> sudo mkdir -p /var/lib/ceph/mgr/ceph-y && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-authtool --create-keyring --gen-key --name=mgr.y /var/lib/ceph/mgr/ceph-y/keyring
2019-10-16T15:42:07.940 INFO:teuthology.orchestra.run.smithi039.stdout:creating /var/lib/ceph/mgr/ceph-y/keyring
2019-10-16T15:42:07.943 INFO:teuthology.orchestra.run.smithi044:Running:
2019-10-16T15:42:07.943 INFO:teuthology.orchestra.run.smithi044:> sudo mkdir -p /var/lib/ceph/mgr/ceph-x && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-authtool --create-keyring --gen-key --name=mgr.x /var/lib/ceph/mgr/ceph-x/keyring
2019-10-16T15:42:07.986 INFO:teuthology.orchestra.run.smithi044.stdout:creating /var/lib/ceph/mgr/ceph-x/keyring
2019-10-16T15:42:07.989 INFO:tasks.ceph:Setting up mds nodes...
2019-10-16T15:42:07.989 INFO:tasks.ceph_client:Setting up client nodes...
2019-10-16T15:42:07.989 INFO:teuthology.orchestra.run.smithi168:Running:
2019-10-16T15:42:07.990 INFO:teuthology.orchestra.run.smithi168:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-authtool --create-keyring --gen-key --name=client.0 /etc/ceph/ceph.client.0.keyring && sudo chmod 0644 /etc/ceph/ceph.client.0.keyring
2019-10-16T15:42:08.066 INFO:teuthology.orchestra.run.smithi168.stdout:creating /etc/ceph/ceph.client.0.keyring
2019-10-16T15:42:08.079 INFO:tasks.ceph:Running mkfs on osd nodes...
2019-10-16T15:42:08.079 INFO:tasks.ceph:ctx.disk_config.remote_to_roles_to_dev: {Remote(name='ubuntu@smithi039.front.sepia.ceph.com'): {}, Remote(name='ubuntu@smithi044.front.sepia.ceph.com'): {}}
2019-10-16T15:42:08.079 INFO:teuthology.orchestra.run.smithi039:Running:
2019-10-16T15:42:08.079 INFO:teuthology.orchestra.run.smithi039:> sudo mkdir -p /var/lib/ceph/osd/ceph-4
2019-10-16T15:42:08.096 INFO:tasks.ceph:{}
2019-10-16T15:42:08.097 INFO:tasks.ceph:{}
2019-10-16T15:42:08.097 INFO:tasks.ceph:osd.4
2019-10-16T15:42:08.097 INFO:teuthology.orchestra.run.smithi039:Running:
2019-10-16T15:42:08.097 INFO:teuthology.orchestra.run.smithi039:> sudo mkdir -p /var/lib/ceph/osd/ceph-5
2019-10-16T15:42:08.184 INFO:tasks.ceph:{}
2019-10-16T15:42:08.184 INFO:tasks.ceph:{}
2019-10-16T15:42:08.184 INFO:tasks.ceph:osd.5
2019-10-16T15:42:08.185 INFO:teuthology.orchestra.run.smithi039:Running:
2019-10-16T15:42:08.185 INFO:teuthology.orchestra.run.smithi039:> sudo mkdir -p /var/lib/ceph/osd/ceph-6
2019-10-16T15:42:08.267 INFO:tasks.ceph:{}
2019-10-16T15:42:08.268 INFO:tasks.ceph:{}
2019-10-16T15:42:08.268 INFO:tasks.ceph:osd.6
2019-10-16T15:42:08.268 INFO:teuthology.orchestra.run.smithi039:Running:
2019-10-16T15:42:08.268 INFO:teuthology.orchestra.run.smithi039:> sudo mkdir -p /var/lib/ceph/osd/ceph-7
2019-10-16T15:42:08.355 INFO:tasks.ceph:{}
2019-10-16T15:42:08.356 INFO:tasks.ceph:{}
2019-10-16T15:42:08.356 INFO:tasks.ceph:osd.7
Updated by David Galloway over 4 years ago
This is what it should look like:
2019-09-11T10:08:59.262 INFO:tasks.ceph:ctx.disk_config.remote_to_roles_to_dev: {Remote(name='ubuntu@smithi103.front.sepia.ceph.com'): {'osd.0': '/dev/vg_nvme/lv_4'}, Remote(name='ubuntu@smithi116.front.sepia.ceph.com'): {'osd.1': '/dev/vg_nvme/lv_4', 'osd.3': '/dev/vg_nvme/lv_2', 'osd.2': '/dev/vg_nvme/lv_1'}}
2019-09-11T10:08:59.262 INFO:teuthology.orchestra.run.smithi103:Running:
2019-09-11T10:08:59.262 INFO:teuthology.orchestra.run.smithi103:> sudo mkdir -p /var/lib/ceph/osd/ceph-0
2019-09-11T10:08:59.279 INFO:tasks.ceph:{'osd.0': '/dev/vg_nvme/lv_4'}
2019-09-11T10:08:59.279 INFO:tasks.ceph:{}
2019-09-11T10:08:59.279 INFO:tasks.ceph:osd.0
2019-09-11T10:08:59.279 INFO:tasks.ceph:['mkfs.xfs', '-f', '-i', 'size=2048'] on /dev/vg_nvme/lv_4 on ubuntu@smithi103.front.sepia.ceph.com
2019-09-11T10:08:59.279 INFO:teuthology.orchestra.run.smithi103:Running:
2019-09-11T10:08:59.280 INFO:teuthology.orchestra.run.smithi103:> yes | sudo mkfs.xfs -f -i size=2048 /dev/vg_nvme/lv_4
2019-09-11T10:08:59.828 INFO:teuthology.orchestra.run.smithi103.stdout:meta-data=/dev/vg_nvme/lv_4 isize=2048 agcount=4, agsize=5859072 blks
2019-09-11T10:08:59.828 INFO:teuthology.orchestra.run.smithi103.stdout: = sectsz=512 attr=2, projid32bit=1
2019-09-11T10:08:59.829 INFO:teuthology.orchestra.run.smithi103.stdout: = crc=1 finobt=1, sparse=0, rmapbt=0, reflink=0
2019-09-11T10:08:59.829 INFO:teuthology.orchestra.run.smithi103.stdout:data = bsize=4096 blocks=23436288, imaxpct=25
2019-09-11T10:08:59.829 INFO:teuthology.orchestra.run.smithi103.stdout: = sunit=0 swidth=0 blks
2019-09-11T10:08:59.829 INFO:teuthology.orchestra.run.smithi103.stdout:naming =version 2 bsize=4096 ascii-ci=0 ftype=1
2019-09-11T10:08:59.829 INFO:teuthology.orchestra.run.smithi103.stdout:log =internal log bsize=4096 blocks=11443, version=2
2019-09-11T10:08:59.829 INFO:teuthology.orchestra.run.smithi103.stdout: = sectsz=512 sunit=0 blks, lazy-count=1
2019-09-11T10:08:59.830 INFO:teuthology.orchestra.run.smithi103.stdout:realtime =none extsz=4096 blocks=0, rtextents=0
2019-09-11T10:08:59.830 INFO:tasks.ceph:mount /dev/vg_nvme/lv_4 on ubuntu@smithi103.front.sepia.ceph.com -o noatime
2019-09-11T10:08:59.831 INFO:teuthology.orchestra.run.smithi103:Running:
2019-09-11T10:08:59.831 INFO:teuthology.orchestra.run.smithi103:> sudo mount -t xfs -o noatime /dev/vg_nvme/lv_4 /var/lib/ceph/osd/ceph-0
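For reference, the device-attach sequence in the healthy log above reduces to three commands per OSD (mkdir, mkfs.xfs, mount). A sketch that just assembles those command lines, with device and option values taken from the log; the helper is hypothetical, and the real teuthology task runs these over SSH via teuthology.orchestra:

```python
def osd_attach_commands(dev: str, osd_id: int,
                        fs_opts=("-f", "-i", "size=2048")):
    """Build the mkdir/mkfs/mount command lines issued for one OSD
    device, mirroring the 2019-09-11 log above (sketch only)."""
    mount_point = f"/var/lib/ceph/osd/ceph-{osd_id}"
    return [
        ["sudo", "mkdir", "-p", mount_point],
        ["sudo", "mkfs.xfs", *fs_opts, dev],
        ["sudo", "mount", "-t", "xfs", "-o", "noatime", dev, mount_point],
    ]

for cmd in osd_attach_commands("/dev/vg_nvme/lv_4", 0):
    print(" ".join(cmd))
```

In the broken runs the dev map is empty, so only the mkdir step happens and the mkfs/mount steps are skipped entirely.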
Updated by David Galloway over 4 years ago
So things start to go wrong with the remote_to_roles_to_dev
function.
Good:
2019-09-11T10:08:59.262 INFO:tasks.ceph:ctx.disk_config.remote_to_roles_to_dev: {Remote(name='ubuntu@smithi103.front.sepia.ceph.com'): {'osd.0': '/dev/vg_nvme/lv_4'}, Remote(name='ubuntu@smithi116.front.sepia.ceph.com'): {'osd.1': '/dev/vg_nvme/lv_4', 'osd.3': '/dev/vg_nvme/lv_2', 'osd.2': '/dev/vg_nvme/lv_1'}}
Bad:
2019-10-16T15:42:08.079 INFO:tasks.ceph:ctx.disk_config.remote_to_roles_to_dev: {Remote(name='ubuntu@smithi039.front.sepia.ceph.com'): {}, Remote(name='ubuntu@smithi044.front.sepia.ceph.com'): {}}
I suspect this broke things: https://github.com/ceph/ceph/pull/30792
Updated by Nathan Cutler over 4 years ago
Here's an emergency fix we could try: https://github.com/ceph/teuthology/pull/1334
Updated by Kyrylo Shatskyy over 4 years ago
Nathan Cutler wrote:
Here's an emergency fix we could try: https://github.com/ceph/teuthology/pull/1334
I can confirm that, from my point of view, this temporary workaround is the most likely working solution for now.
Updated by Kyrylo Shatskyy over 4 years ago
Kyrylo Shatskyy wrote:
Nathan Cutler wrote:
Here's an emergency fix we could try: https://github.com/ceph/teuthology/pull/1334
I can confirm that, from my point of view, this temporary workaround is the most likely working solution for now.
However, I still don't understand how the log in the bug description relates to the issue, because the fix in #1334 is not related to it; I can only suggest that it is a coincidence.
Updated by Jason Dillaman over 4 years ago
see mailing list thread: https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/Z3ZW4UHGRBN4LU2JFC4LBGYIN6BOKDWB/
Updated by Jason Dillaman over 4 years ago
- Project changed from sepia to teuthology
- Status changed from New to Fix Under Review
Updated by Nathan Cutler over 4 years ago
- Subject changed from "No space left on device" errors to "No space left on device" errors following 41a13eca480e38cfeeba7a180b4516b90598c39b
- Status changed from Fix Under Review to Resolved
optimistically resolving, even though the test run http://pulpito.ceph.com/smithfarm-2019-10-17_12:35:03-rbd-wip-yuri8-testing-2019-10-11-1347-luminous-distro-basic-smithi/ is still pending