Bug #16360
Image not cleanly deleted with rbd on bluestore
Status: Closed
Description
Issue
I was playing with rbd, creating, deleting, and snapshotting images so that I can have VM disks on Ceph. I have three images (Debian, Ubuntu trusty, and xenial cloud images) that were snapshotted and protected so that I can clone them.
For each test I did with VMs (a scripted version of the loop is sketched after the list):
- clone a disk with
rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test1
- spin the VM and do some tests
- delete the VM
- delete the disk with
rbd rm libvirt/test1
- repeat
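This cycle is easy to script; here is a minimal sketch of the loop described above (pool and image names as in this report, the VM start/test/delete steps are elided):

    # Minimal sketch of the reported test cycle; VM steps elided
    for i in $(seq 1 20); do
        rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test1
        # ... spin up the VM, run tests, delete the VM ...
        rbd rm libvirt/test1
    done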
Everything was fine until it didn't cleanly delete test1:
# rbd rm libvirt/test1; rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test1 && rbd ls -l libvirt
2016-06-17 10:31:32.353948 7fbfc3b7e100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.353988 7fbfc3b7e100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.355525 7fbfc3b7e100 -1 WARNING: the following dangerous and experimental features are enabled: *
Removing image: 100% complete...done.
2016-06-17 10:31:32.454006 7efca05e2100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.454049 7efca05e2100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.455920 7efca05e2100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.494994 7efca05e2100 -1 librbd: error adding image to directory: (17) File exists
2016-06-17 10:31:32.497393 7efca05e2100 -1 librbd: error creating child: (17) File exists
rbd: clone error: (17) File exists
Is it possible to purge test1 without having to delete the pools and/or the OSDs?
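For reference (not from the report): in format 2, the image listing lives as omap entries on the rbd_directory object (name_<image> and id_<image-id> keys), and each image also has rbd_id.<name> and rbd_header.<id> objects. A hedged sketch of a manual purge, assuming a stale name_test1 omap key is what was left behind; inspect first, since the key layout here is an assumption:

    # Dump the image directory's omap entries to see what is still registered (assumed format 2 layout)
    rados -p libvirt listomapvals rbd_directory
    # If a stale name_test1 key remains, remove it to unregister the image
    rados -p libvirt rmomapkey rbd_directory name_test1
    # Remove any leftover per-image objects
    rados -p libvirt rm rbd_id.test1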
Detail
NB: the warnings have been removed from the console outputs below.
I have a small lab cluster with 3 nodes. Each has:
- a MON
- an MDS
- 12 bluestore OSDs created with:
ceph-disk -v prepare --bluestore --dmcrypt /dev/sda
Each OSD has an entry in ceph.conf to store RocksDB on SSDs (they were used for journals before):
[osd.0]
host = storage0
bluestore block db path = /dev/mapper/vgCephJournal_md125-ceph_block_db_0
bluestore block wal path = /dev/mapper/vgCephJournal_md125-ceph_block_wal_0
It automatically created the symlinks:
# ls -l /var/lib/ceph/osd/ceph-0/block*
lrwxrwxrwx 1 ceph ceph 48 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block -> /dev/mapper/2a0d2cb0-afc2-4cb8-a64d-6c558a341318
lrwxrwxrwx 1 ceph ceph 47 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block.db -> /dev/mapper/vgCephJournal_md125-ceph_block_db_0
lrwxrwxrwx 1 ceph ceph 58 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block_dmcrypt -> /dev/disk/by-partuuid/2a0d2cb0-afc2-4cb8-a64d-6c558a341318
-rw-r--r-- 1 ceph ceph 37 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block_uuid
lrwxrwxrwx 1 ceph ceph 48 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block.wal -> /dev/mapper/vgCephJournal_md125-ceph_block_wal_0
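A quick sanity check that these symlinks resolve to real block devices (a trivial sketch, using osd.0 as the example):

    # Verify each block symlink for osd.0 resolves to an existing block device
    for l in /var/lib/ceph/osd/ceph-0/block /var/lib/ceph/osd/ceph-0/block.db /var/lib/ceph/osd/ceph-0/block.wal; do
        t=$(readlink -f "$l")
        [ -b "$t" ] && echo "$l -> $t OK" || echo "$l -> $t BROKEN"
    done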
I have a fourth node which also has a MON.
All 4 machines are up-to-date Ubuntu Xenial with the following Ceph version:
# ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
The cluster has 3 pools:
# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    23265G     20283G     2982G        12.82
POOLS:
    NAME         ID     USED       %USED     MAX AVAIL     OBJECTS
    libvirt      4      3344M      0.04      5548G         850
    data         5      679G       8.76      5548G         173977
    metadata     6      25278k     0         5548G         26
libvirt and data have 512 PGs each and metadata has 128.
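The per-pool PG counts can be confirmed with the standard pool query:

    # Confirm pg_num for each pool
    for p in libvirt data metadata; do
        ceph osd pool get "$p" pg_num
    done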
The cluster is healthy:
# ceph -s
    cluster 2127ffaa-3d77-4f88-87ee-8e2278e4c074
     health HEALTH_OK
     monmap e10: 4 mons at {compute0=172.16.1.10:6789/0,storage0=172.16.1.20:6789/0,storage1=172.16.1.21:6789/0,storage2=172.16.1.22:6789/0}
            election epoch 78, quorum 0,1,2,3 compute0,storage0,storage1,storage2
      fsmap e75: 1/1/1 up {0=storage1.public=up:active}, 2 up:standby
     osdmap e645: 36 osds: 36 up, 36 in
            flags sortbitwise
      pgmap v8395: 1152 pgs, 3 pools, 682 GB data, 170 kobjects
            2982 GB used, 20283 GB / 23265 GB avail
                1152 active+clean
Now when I tried to get information about test1, I got the following:
# rbd ls libvirt
_IMG_debian-8.5.0-openstack-amd64
_IMG_trusty-server-cloudimg-amd64
_IMG_xenial-server-cloudimg-amd64
test1
# rbd ls -l libvirt
NAME                                     SIZE PARENT FMT PROT LOCK
_IMG_debian-8.5.0-openstack-amd64       2048M          2
_IMG_debian-8.5.0-openstack-amd64@snap  2048M          2 yes
_IMG_trusty-server-cloudimg-amd64       2252M          2
_IMG_trusty-server-cloudimg-amd64@snap  2252M          2 yes
_IMG_xenial-server-cloudimg-amd64       2252M          2
_IMG_xenial-server-cloudimg-amd64@snap  2252M          2 yes
rbd: list: (2) No such file or directory
# rbd info libvirt/test1
rbd: error opening image test1: (2) No such file or directory
# ceph osd map libvirt test1
osdmap e645 pool 'libvirt' (4) object 'test1' -> pg 4.bddbf0b9 (4.b9) -> up ([26,16,9], p26) acting ([26,16,9], p26)
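For what it's worth: rbd ls enumerates omap keys on the rbd_directory object, while rbd info resolves the name through the rbd_id.<name> object, which is why a half-deleted image can appear in the listing yet be unopenable. Assuming the format 2 object naming, the mismatch can be checked directly:

    # rbd ls reads rbd_directory's omap; rbd info needs rbd_id.<name>
    rados -p libvirt listomapkeys rbd_directory    # a name_test1 key would explain the listing
    rados -p libvirt stat rbd_id.test1             # expected to fail with ENOENT here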
I searched for test1 with rados and removed it, but that didn't change anything (there was no rbd_id.test1 object):
# rados -p libvirt ls | grep test1
test1
# rados -p libvirt rm test1
When I retry deleting the image, nothing changes:
# rbd rm libvirt/test1
Removing image: 100% complete...done.
# rbd info libvirt/test1
rbd: error opening image test1: (2) No such file or directory
You may find attached the outputs of all four rbd commands with the debug level increased.
I remain at your disposal for further information.
Updated by Alexandre Becholey almost 8 years ago
Furthermore, I can't create new images:
# rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test
# rbd ls libvirt
_IMG_debian-8.5.0-openstack-amd64
_IMG_trusty-server-cloudimg-amd64
_IMG_xenial-server-cloudimg-amd64
test1
# rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test
2016-06-17 12:55:27.660513 7f0994c4c100 -1 librbd: rbd image test already exists
rbd: clone error: (17) File exists
root@compute0:~# rbd ls libvirt
_IMG_debian-8.5.0-openstack-amd64
_IMG_trusty-server-cloudimg-amd64
_IMG_xenial-server-cloudimg-amd64
test1
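The "rbd image test already exists" error suggests the first, half-failed clone left an rbd_id.test object behind even though the image never made it into the directory listing; assuming the format 2 naming, that can be checked directly:

    # Check for a leftover rbd_id object from the half-failed clone
    rados -p libvirt stat rbd_id.test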
Updated by Alexandre Becholey almost 8 years ago
The cluster is not healthy any more:
    cluster 2127ffaa-3d77-4f88-87ee-8e2278e4c074
     health HEALTH_ERR
            3 pgs inconsistent
            3 scrub errors
     monmap e10: 4 mons at {compute0=172.16.1.10:6789/0,storage0=172.16.1.20:6789/0,storage1=172.16.1.21:6789/0,storage2=172.16.1.22:6789/0}
            election epoch 78, quorum 0,1,2,3 compute0,storage0,storage1,storage2
      fsmap e75: 1/1/1 up {0=storage1.public=up:active}, 2 up:standby
     osdmap e645: 36 osds: 36 up, 36 in
            flags sortbitwise
      pgmap v10745: 1152 pgs, 3 pools, 682 GB data, 170 kobjects
            2982 GB used, 20283 GB / 23265 GB avail
                1149 active+clean
                   3 active+clean+inconsistent
Updated by Alexandre Becholey almost 8 years ago
After repairing the 3 PGs, two of them keep failing:
2016-06-21 11:26:19.905700 mon.1 [INF] from='client.? 172.16.1.10:0/3698790893' entity='client.admin' cmd=[{"prefix": "pg repair", "pgid": "4.1c"}]: dispatch
2016-06-21 11:26:20.131239 mon.1 [INF] from='client.? 172.16.1.10:0/3358205872' entity='client.admin' cmd=[{"prefix": "pg repair", "pgid": "4.18b"}]: dispatch
2016-06-21 11:26:22.716280 mon.0 [INF] pgmap v12439: 1152 pgs: 1 active+clean+inconsistent, 1151 active+clean; 4596 MB data, 946 GB used, 22318 GB / 23265 GB avail
2016-06-21 11:26:24.750069 mon.0 [INF] pgmap v12440: 1152 pgs: 1152 active+clean; 4596 MB data, 946 GB used, 22318 GB / 23265 GB avail; 0 B/s, 0 keys/s, 0 objects/s recovering
2016-06-21 11:26:20.643613 osd.28 [INF] 4.1c repair starts
2016-06-21 11:26:20.828711 osd.28 [ERR] 4.1c shard 28: soid 4:3831950c:::rbd_directory:head omap_digest 0xffffffff != best guess omap_digest 0x918c395a from auth shard 6
2016-06-21 11:26:20.828764 osd.28 [ERR] 4.1c repair 0 missing, 1 inconsistent objects
2016-06-21 11:26:20.828780 osd.28 [ERR] 4.1c repair 1 errors, 1 fixed
2016-06-21 11:26:21.018938 osd.1 [INF] 4.18b repair starts
2016-06-21 11:26:21.147409 osd.1 [ERR] 4.18b shard 28 missing 4:d1f82166:::rbd_header.119a279e2a9e3:head
2016-06-21 11:26:21.147465 osd.1 [ERR] 4.18b repair 1 missing, 0 inconsistent objects
2016-06-21 11:26:21.147481 osd.1 [ERR] 4.18b repair 1 errors, 1 fixed
2016-06-21 11:26:50.754169 mon.0 [INF] HEALTH_OK
2016-06-21 11:27:11.853995 mon.1 [INF] from='client.? 172.16.1.10:0/3807871970' entity='client.admin' cmd=[{"prefix": "pg deep-scrub", "pgid": "4.1c"}]: dispatch
2016-06-21 11:27:12.073902 mon.1 [INF] from='client.? 172.16.1.10:0/2018208149' entity='client.admin' cmd=[{"prefix": "pg deep-scrub", "pgid": "4.18b"}]: dispatch
2016-06-21 11:27:12.653499 osd.28 [INF] 4.1c deep-scrub starts
2016-06-21 11:27:12.841558 osd.28 [ERR] 4.1c shard 28: soid 4:3831950c:::rbd_directory:head omap_digest 0xffffffff != best guess omap_digest 0x918c395a from auth shard 6
2016-06-21 11:27:12.841662 osd.28 [ERR] 4.1c deep-scrub 0 missing, 1 inconsistent objects
2016-06-21 11:27:12.841668 osd.28 [ERR] 4.1c deep-scrub 1 errors
2016-06-21 11:27:18.068798 mon.0 [INF] pgmap v12441: 1152 pgs: 1 active+clean+inconsistent, 1151 active+clean; 4596 MB data, 946 GB used, 22318 GB / 23265 GB avail
2016-06-21 11:27:14.027263 osd.1 [INF] 4.18b deep-scrub starts
2016-06-21 11:27:14.151964 osd.1 [ERR] 4.18b shard 28 missing 4:d1f82166:::rbd_header.119a279e2a9e3:head
2016-06-21 11:27:14.152072 osd.1 [ERR] 4.18b deep-scrub 1 missing, 0 inconsistent objects
2016-06-21 11:27:14.152077 osd.1 [ERR] 4.18b deep-scrub 1 errors
2016-06-21 11:27:19.089094 mon.0 [INF] pgmap v12442: 1152 pgs: 2 active+clean+inconsistent, 1150 active+clean; 4596 MB data, 946 GB used, 22318 GB / 23265 GB avail
2016-06-21 11:27:50.754768 mon.0 [INF] HEALTH_ERR; 2 pgs inconsistent; 2 scrub errors
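On Jewel the inconsistencies can be enumerated before repairing; a sketch of the inspection workflow (PG ids taken from the log above):

    # Locate and inspect the inconsistent PGs, then repair and re-verify
    ceph health detail
    rados list-inconsistent-pg libvirt
    rados list-inconsistent-obj 4.1c --format=json-pretty
    ceph pg repair 4.1c
    ceph pg deep-scrub 4.1c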
Here is the log of the PG that has been successfully repaired:
2016-06-21 09:28:12.438673 osd.2 172.16.1.20:6804/27826 321 : cluster [INF] 4.78 repair starts
2016-06-21 09:28:12.565706 osd.2 172.16.1.20:6804/27826 322 : cluster [ERR] 4.78 shard 19 missing 4:1e2ef459:::rbd_id.test:head
2016-06-21 09:28:12.565757 osd.2 172.16.1.20:6804/27826 323 : cluster [ERR] 4.78 repair 1 missing, 0 inconsistent objects
2016-06-21 09:28:12.565773 osd.2 172.16.1.20:6804/27826 324 : cluster [ERR] 4.78 repair 1 errors, 1 fixed
2016-06-21 11:31:24.348320 mon.1 [INF] from='client.? 172.16.1.10:0/2068510417' entity='client.admin' cmd=[{"prefix": "pg deep-scrub", "pgid": "4.78"}]: dispatch
2016-06-21 11:31:25.822731 osd.2 [INF] 4.78 deep-scrub starts
2016-06-21 11:31:25.955523 osd.2 [INF] 4.78 deep-scrub ok
Updated by Sage Weil about 7 years ago
- Status changed from New to Can't reproduce
If you see anything like this on kraken or later, please reopen!