Bug #16360 (closed): Image not cleanly deleted with rbd on bluestore

Added by Alexandre Becholey almost 8 years ago. Updated about 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Issue

I was playing with rbd to create, delete and snapshot images so that I can keep VM disks on Ceph. I have three images (Debian, Ubuntu Trusty and Xenial cloud images) that were snapshotted and protected so that I can clone them (a sketch of the full cycle follows the list below).

For each VM test, I did the following:

  • clone a disk with rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test1
  • spin the VM and do some tests
  • delete the VM
  • delete the disk with rbd rm libvirt/test1
  • repeat
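
For reference, a minimal sketch of the full cycle mentioned above, assuming the base image was imported from the cloud image file before being snapshotted and protected (the import command and source file name are assumptions; the clone and rm lines match what was actually run in the loop):

# rbd import trusty-server-cloudimg-amd64-disk1.img libvirt/_IMG_trusty-server-cloudimg-amd64
# rbd snap create libvirt/_IMG_trusty-server-cloudimg-amd64@snap
# rbd snap protect libvirt/_IMG_trusty-server-cloudimg-amd64@snap
# rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test1
# rbd rm libvirt/test1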

Everything was fine until one iteration failed to cleanly delete test1:

# rbd rm libvirt/test1; rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test1 && rbd ls -l libvirt
2016-06-17 10:31:32.353948 7fbfc3b7e100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.353988 7fbfc3b7e100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.355525 7fbfc3b7e100 -1 WARNING: the following dangerous and experimental features are enabled: *
Removing image: 100% complete...done.
2016-06-17 10:31:32.454006 7efca05e2100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.454049 7efca05e2100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.455920 7efca05e2100 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-17 10:31:32.494994 7efca05e2100 -1 librbd: error adding image to directory: (17) File exists
2016-06-17 10:31:32.497393 7efca05e2100 -1 librbd: error creating child: (17) File exists
rbd: clone error: (17) File exists

Is it possible to purge test1 without having to delete the pools and/or the OSDs?
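
A possible way to purge the leftover directory entry by hand, shown only as a sketch (the name_*/id_* omap key naming of the rbd_directory object is assumed from the RBD format 2 layout and has not been verified here):

# rados -p libvirt listomapvals rbd_directory
# rados -p libvirt rmomapkey rbd_directory name_test1
# rados -p libvirt rmomapkey rbd_directory id_<id reported for test1 in the listing above>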

Details

NB: the experimental-feature warnings have been removed from the console outputs below.

I have a small lab cluster with 3 nodes. Each has:

  • a MON
  • a MDS
  • 12 bluestore OSDs created with:
    ceph-disk -v prepare --bluestore --dmcrypt /dev/sda
    Each OSD has an entry in ceph.conf to store its RocksDB on the SSDs (which were previously used for journals):
      [osd.0]
      host = storage0
      bluestore block db path = /dev/mapper/vgCephJournal_md125-ceph_block_db_0
      bluestore block wal path = /dev/mapper/vgCephJournal_md125-ceph_block_wal_0
      

    It automatically created the symlinks:
    # ls -l /var/lib/ceph/osd/ceph-0/block*
    lrwxrwxrwx 1 ceph ceph 48 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block -> /dev/mapper/2a0d2cb0-afc2-4cb8-a64d-6c558a341318
    lrwxrwxrwx 1 ceph ceph 47 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block.db -> /dev/mapper/vgCephJournal_md125-ceph_block_db_0
    lrwxrwxrwx 1 ceph ceph 58 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block_dmcrypt -> /dev/disk/by-partuuid/2a0d2cb0-afc2-4cb8-a64d-6c558a341318
    -rw-r--r-- 1 ceph ceph 37 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block_uuid
    lrwxrwxrwx 1 ceph ceph 48 Jun 16 15:20 /var/lib/ceph/osd/ceph-0/block.wal -> /dev/mapper/vgCephJournal_md125-ceph_block_wal_0
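
To double-check that a running OSD actually picked up the db/wal overrides, the admin socket on the OSD's host can be queried; this is only a suggested verification (the underscore option names correspond to the ceph.conf keys above):

# ceph daemon osd.0 config show | grep bluestore_block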
    

I have a fourth node which also hosts a MON.

All 4 machines are up-to-date Ubuntu Xenial with the following Ceph version:

# ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

The cluster has 3 pools:

# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED 
    23265G     20283G        2982G         12.82 
POOLS:
    NAME         ID     USED       %USED     MAX AVAIL     OBJECTS 
    libvirt      4       3344M      0.04         5548G         850 
    data         5        679G      8.76         5548G      173977 
    metadata     6      25278k         0         5548G          26

libvirt and data each have 512 PGs, and metadata has 128.
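
For completeness, pools with these PG counts would be created with commands of the following form (the exact original commands are not recorded here; the pg_num/pgp_num values are just the ones reported above):

# ceph osd pool create libvirt 512 512
# ceph osd pool create data 512 512
# ceph osd pool create metadata 128 128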

The cluster is healthy:

# ceph -s
    cluster 2127ffaa-3d77-4f88-87ee-8e2278e4c074
     health HEALTH_OK
     monmap e10: 4 mons at {compute0=172.16.1.10:6789/0,storage0=172.16.1.20:6789/0,storage1=172.16.1.21:6789/0,storage2=172.16.1.22:6789/0}
            election epoch 78, quorum 0,1,2,3 compute0,storage0,storage1,storage2
      fsmap e75: 1/1/1 up {0=storage1.public=up:active}, 2 up:standby
     osdmap e645: 36 osds: 36 up, 36 in
            flags sortbitwise
      pgmap v8395: 1152 pgs, 3 pools, 682 GB data, 170 kobjects
            2982 GB used, 20283 GB / 23265 GB avail
                1152 active+clean

Now, when I try to get information about test1, I get the following:

# rbd ls libvirt
_IMG_debian-8.5.0-openstack-amd64
_IMG_trusty-server-cloudimg-amd64
_IMG_xenial-server-cloudimg-amd64
test1
# rbd ls -l libvirt
NAME                                    SIZE PARENT FMT PROT LOCK 
_IMG_debian-8.5.0-openstack-amd64      2048M          2           
_IMG_debian-8.5.0-openstack-amd64@snap 2048M          2 yes       
_IMG_trusty-server-cloudimg-amd64      2252M          2           
_IMG_trusty-server-cloudimg-amd64@snap 2252M          2 yes       
_IMG_xenial-server-cloudimg-amd64      2252M          2           
_IMG_xenial-server-cloudimg-amd64@snap 2252M          2 yes       
rbd: list: (2) No such file or directory
# rbd info libvirt/test1
rbd: error opening image test1: (2) No such file or directory
# ceph osd map libvirt test1
osdmap e645 pool 'libvirt' (4) object 'test1' -> pg 4.bddbf0b9 (4.b9) -> up ([26,16,9], p26) acting ([26,16,9], p26)
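
Note that ceph osd map only computes the placement of a name and does not check that the object actually exists; a direct existence check would be (shown as a suggestion):

# rados -p libvirt stat test1
# rados -p libvirt stat rbd_id.test1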

I searched for test1 with rados and removed it, but that didn't change anything (there was no rbd_id.test1 object):

# rados -p libvirt ls | grep test1
test1
# rados -p libvirt rm test1

When I retry deleting the image, nothing changes:

# rbd rm libvirt/test1
Removing image: 100% complete...done.
# rbd info libvirt/test1
rbd: error opening image test1: (2) No such file or directory

Attached you will find the logs of all four rbd commands run with an increased debug level.
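
For reference, these logs were captured by raising the client-side debug levels on the rbd command line, roughly like this (the --log-file option is an assumption about how the output was written to a file):

# rbd rm libvirt/test1 --debug-rbd=20 --debug-ms=1 --log-file=rm.log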

I remain at your disposal for further information.


Files

ls.log (4.19 KB): rbd ls --debug-rbd=20 --debug-ms=1 (Alexandre Becholey, 06/17/2016 10:35 AM)
info.log (6.16 KB): rbd info ubuntu1204_base --debug-rbd=20 --debug-ms=1 (Alexandre Becholey, 06/17/2016 10:35 AM)
ls-l.log (43.8 KB): rbd ls -l --debug-rbd=20 --debug-ms=1 (Alexandre Becholey, 06/17/2016 10:35 AM)
rm.log (10.7 KB): rbd rm libvirt/test1 --debug-rbd=20 --debug-ms=1 (Alexandre Becholey, 06/17/2016 10:47 AM)
#1

Updated by Alexandre Becholey almost 8 years ago

Furthermore, I can't create new images:

# rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test
# rbd ls libvirt
_IMG_debian-8.5.0-openstack-amd64
_IMG_trusty-server-cloudimg-amd64
_IMG_xenial-server-cloudimg-amd64
test1
# rbd clone libvirt/_IMG_trusty-server-cloudimg-amd64@snap libvirt/test
2016-06-17 12:55:27.660513 7f0994c4c100 -1 librbd: rbd image test already exists
rbd: clone error: (17) File exists
root@compute0:~# rbd ls libvirt
_IMG_debian-8.5.0-openstack-amd64
_IMG_trusty-server-cloudimg-amd64
_IMG_xenial-server-cloudimg-amd64
test1

#2

Updated by Jason Dillaman almost 8 years ago

  • Assignee set to Sage Weil
#3

Updated by Alexandre Becholey almost 8 years ago

The cluster is not healthy any more:

    cluster 2127ffaa-3d77-4f88-87ee-8e2278e4c074
     health HEALTH_ERR
            3 pgs inconsistent
            3 scrub errors
     monmap e10: 4 mons at {compute0=172.16.1.10:6789/0,storage0=172.16.1.20:6789/0,storage1=172.16.1.21:6789/0,storage2=172.16.1.22:6789/0}
            election epoch 78, quorum 0,1,2,3 compute0,storage0,storage1,storage2
      fsmap e75: 1/1/1 up {0=storage1.public=up:active}, 2 up:standby
     osdmap e645: 36 osds: 36 up, 36 in
            flags sortbitwise
      pgmap v10745: 1152 pgs, 3 pools, 682 GB data, 170 kobjects
            2982 GB used, 20283 GB / 23265 GB avail
                1149 active+clean
                   3 active+clean+inconsistent
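
To find out exactly which PGs and objects are affected, something along these lines can be used (the list-inconsistent commands are available in jewel; <pgid> is a placeholder):

# ceph health detail
# rados list-inconsistent-pg libvirt
# rados list-inconsistent-obj <pgid> --format=json-pretty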

#4

Updated by Alexandre Becholey almost 8 years ago

After repairing the 3 PGs, two of them keep failing:

2016-06-21 11:26:19.905700 mon.1 [INF] from='client.? 172.16.1.10:0/3698790893' entity='client.admin' cmd=[{"prefix": "pg repair", "pgid": "4.1c"}]: dispatch
2016-06-21 11:26:20.131239 mon.1 [INF] from='client.? 172.16.1.10:0/3358205872' entity='client.admin' cmd=[{"prefix": "pg repair", "pgid": "4.18b"}]: dispatch
2016-06-21 11:26:22.716280 mon.0 [INF] pgmap v12439: 1152 pgs: 1 active+clean+inconsistent, 1151 active+clean; 4596 MB data, 946 GB used, 22318 GB / 23265 GB avail
2016-06-21 11:26:24.750069 mon.0 [INF] pgmap v12440: 1152 pgs: 1152 active+clean; 4596 MB data, 946 GB used, 22318 GB / 23265 GB avail; 0 B/s, 0 keys/s, 0 objects/s recovering
2016-06-21 11:26:20.643613 osd.28 [INF] 4.1c repair starts
2016-06-21 11:26:20.828711 osd.28 [ERR] 4.1c shard 28: soid 4:3831950c:::rbd_directory:head omap_digest 0xffffffff != best guess omap_digest 0x918c395a from auth shard 6
2016-06-21 11:26:20.828764 osd.28 [ERR] 4.1c repair 0 missing, 1 inconsistent objects
2016-06-21 11:26:20.828780 osd.28 [ERR] 4.1c repair 1 errors, 1 fixed
2016-06-21 11:26:21.018938 osd.1 [INF] 4.18b repair starts
2016-06-21 11:26:21.147409 osd.1 [ERR] 4.18b shard 28 missing 4:d1f82166:::rbd_header.119a279e2a9e3:head
2016-06-21 11:26:21.147465 osd.1 [ERR] 4.18b repair 1 missing, 0 inconsistent objects
2016-06-21 11:26:21.147481 osd.1 [ERR] 4.18b repair 1 errors, 1 fixed
2016-06-21 11:26:50.754169 mon.0 [INF] HEALTH_OK

2016-06-21 11:27:11.853995 mon.1 [INF] from='client.? 172.16.1.10:0/3807871970' entity='client.admin' cmd=[{"prefix": "pg deep-scrub", "pgid": "4.1c"}]: dispatch
2016-06-21 11:27:12.073902 mon.1 [INF] from='client.? 172.16.1.10:0/2018208149' entity='client.admin' cmd=[{"prefix": "pg deep-scrub", "pgid": "4.18b"}]: dispatch
2016-06-21 11:27:12.653499 osd.28 [INF] 4.1c deep-scrub starts
2016-06-21 11:27:12.841558 osd.28 [ERR] 4.1c shard 28: soid 4:3831950c:::rbd_directory:head omap_digest 0xffffffff != best guess omap_digest 0x918c395a from auth shard 6
2016-06-21 11:27:12.841662 osd.28 [ERR] 4.1c deep-scrub 0 missing, 1 inconsistent objects
2016-06-21 11:27:12.841668 osd.28 [ERR] 4.1c deep-scrub 1 errors
2016-06-21 11:27:18.068798 mon.0 [INF] pgmap v12441: 1152 pgs: 1 active+clean+inconsistent, 1151 active+clean; 4596 MB data, 946 GB used, 22318 GB / 23265 GB avail
2016-06-21 11:27:14.027263 osd.1 [INF] 4.18b deep-scrub starts
2016-06-21 11:27:14.151964 osd.1 [ERR] 4.18b shard 28 missing 4:d1f82166:::rbd_header.119a279e2a9e3:head
2016-06-21 11:27:14.152072 osd.1 [ERR] 4.18b deep-scrub 1 missing, 0 inconsistent objects
2016-06-21 11:27:14.152077 osd.1 [ERR] 4.18b deep-scrub 1 errors
2016-06-21 11:27:19.089094 mon.0 [INF] pgmap v12442: 1152 pgs: 2 active+clean+inconsistent, 1150 active+clean; 4596 MB data, 946 GB used, 22318 GB / 23265 GB avail
2016-06-21 11:27:50.754768 mon.0 [INF] HEALTH_ERR; 2 pgs inconsistent; 2 scrub errors

Here is the log of the PG that has been successfully repaired:

2016-06-21 09:28:12.438673 osd.2 172.16.1.20:6804/27826 321 : cluster [INF] 4.78 repair starts
2016-06-21 09:28:12.565706 osd.2 172.16.1.20:6804/27826 322 : cluster [ERR] 4.78 shard 19 missing 4:1e2ef459:::rbd_id.test:head
2016-06-21 09:28:12.565757 osd.2 172.16.1.20:6804/27826 323 : cluster [ERR] 4.78 repair 1 missing, 0 inconsistent objects
2016-06-21 09:28:12.565773 osd.2 172.16.1.20:6804/27826 324 : cluster [ERR] 4.78 repair 1 errors, 1 fixed

2016-06-21 11:31:24.348320 mon.1 [INF] from='client.? 172.16.1.10:0/2068510417' entity='client.admin' cmd=[{"prefix": "pg deep-scrub", "pgid": "4.78"}]: dispatch
2016-06-21 11:31:25.822731 osd.2 [INF] 4.78 deep-scrub starts
2016-06-21 11:31:25.955523 osd.2 [INF] 4.78 deep-scrub ok
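
For reference, the repair and verification cycle shown above corresponds to the commands visible in the mon dispatch lines:

# ceph pg repair 4.1c
# ceph pg repair 4.18b
# ceph pg deep-scrub 4.1c
# ceph pg deep-scrub 4.18b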

#5

Updated by Jason Dillaman almost 8 years ago

  • Project changed from rbd to Ceph
#6

Updated by Sage Weil about 7 years ago

  • Status changed from New to Can't reproduce

If you see anything like this on kraken or later, please reopen!
