Bug #24528

Missing snapshot after upgrade from Kraken (11.2.0) to Luminous (12.2.5)

Added by Remi Vichery almost 6 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

I just upgraded from Kraken to Luminous. After the upgrade, I noticed that the openstack-compute-node service was unable to clone (CoW) some Glance images/snapshots, and was instead creating the base image on the root disk before doing an "rbd import".
Some images are still working fine, but others are not.

In the example below everything looks consistent (the snapshot is present), yet "snap protect" fails with an error, whereas I would normally expect something like "the snapshot is already protected". I worked around this by recreating the snapshot (using "snap create") and protecting it again (using "snap protect"), as sketched after the failing command below.

[root@os-controller01 ~(admin@nuagex-labs)]# rbd --name client.admin --pool glance-images snap protect --snap snap --image 3dc4dfda-223a-40eb-9a72-b820fc2d8d2f 
rbd: protecting snap failed: (2) No such file or directory
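
For reference, the workaround was roughly the following; a sketch that reuses the snapshot name "snap" from above, which is why two same-named snapshots show up later:

rbd --name client.admin --pool glance-images snap create --snap snap --image 3dc4dfda-223a-40eb-9a72-b820fc2d8d2f
rbd --name client.admin --pool glance-images snap protect --snap snap --image 3dc4dfda-223a-40eb-9a72-b820fc2d8d2f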

[root@os-controller01 ~(admin@nuagex-labs)]# qemu-img info rbd:glance-images/3dc4dfda-223a-40eb-9a72-b820fc2d8d2f:id=glance                                                                                                                   
image: rbd:glance-images/3dc4dfda-223a-40eb-9a72-b820fc2d8d2f:id=glance
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: unavailable
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
snap      snap                    20G 1969-12-31 19:00:00   00:00:00.000

[root@os-controller01 ~(admin@nuagex-labs)]# rbd --name client.glance --pool glance-images snap ls 3dc4dfda-223a-40eb-9a72-b820fc2d8d2f                                                                                                       
SNAPID NAME     SIZE TIMESTAMP 
   744 snap 20480 MB   

[root@os-controller01 ~(admin@nuagex-labs)]# rbd --name client.admin --pool glance-images info 3dc4dfda-223a-40eb-9a72-b820fc2d8d2f                                                                                                           
rbd image '3dc4dfda-223a-40eb-9a72-b820fc2d8d2f':
        size 20480 MB in 5120 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.6a7eac2eb141f2
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:

My image now lists two snapshots with the same name. Is that a problem?

[root@os-controller01 ~(keystone_admin)]# rbd --name client.glance --pool glance-images snap ls 3dc4dfda-223a-40eb-9a72-b820fc2d8d2f 
SNAPID NAME     SIZE TIMESTAMP                
   744 snap 20480 MB                          
  1021 snap 20480 MB Thu Jun 14 23:18:40 2018
#1

Updated by Greg Farnum almost 6 years ago

  • Project changed from Ceph to rbd
  • Category deleted (openstack)
#2

Updated by Jason Dillaman almost 6 years ago

  • Status changed from New to Need More Info

@Remi: can you please provide the output from the following command?

rados -p glance-images listomapvals rbd_header.6a7eac2eb141f2
#3

Updated by Remi Vichery almost 6 years ago

Here is the output:

```
[root@os-controller01 ~]$ rados -p glance-images listomapvals rbd_header.6a7eac2eb141f2
features
value (8 bytes) :
00000000 3d 00 00 00 00 00 00 00 |=.......|
00000008

object_prefix
value (27 bytes) :
00000000 17 00 00 00 72 62 64 5f 64 61 74 61 2e 36 61 37 |....rbd_data.6a7|
00000010 65 61 63 32 65 62 31 34 31 66 32 |eac2eb141f2|
0000001b

order
value (1 bytes) :
00000000 16 |.|
00000001

size
value (8 bytes) :
00000000 00 00 00 00 05 00 00 00 |........|
00000008

snap_seq
value (8 bytes) :
00000000 fd 03 00 00 00 00 00 00 |........|
00000008

snapshot_00000000000002e8
value (91 bytes) :
00000000 05 01 55 00 00 00 e8 02 00 00 00 00 00 00 04 00 |..U.............|
00000010 00 00 73 6e 61 70 00 00 00 00 05 00 00 00 3d 00 |..snap........=.|
00000020 00 00 00 00 00 00 01 01 1c 00 00 00 ff ff ff ff |................|
00000030 ff ff ff ff 00 00 00 00 fe ff ff ff ff ff ff ff |................|
00000040 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 |................|
00000050 00 01 01 04 00 00 00 ff ff ff ff |...........|
0000005b

snapshot_00000000000003fd
value (99 bytes) :
00000000 06 01 5d 00 00 00 fd 03 00 00 00 00 00 00 04 00 |..].............|
00000010 00 00 73 6e 61 70 00 00 00 00 05 00 00 00 3d 00 |..snap........=.|
00000020 00 00 00 00 00 00 01 01 1c 00 00 00 ff ff ff ff |................|
00000030 ff ff ff ff 00 00 00 00 fe ff ff ff ff ff ff ff |................|
00000040 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 |................|
00000050 00 01 01 04 00 00 00 00 00 00 00 10 30 23 5b 71 |............0#[q|
00000060 8c 17 15 |...|
00000063

```

Jason Dillaman wrote:

@Remi: can you please provide the output from the following command?

[...]

#4

Updated by Jason Dillaman almost 6 years ago

Thanks -- you are hitting issue #19413. After you upgraded your OSDs to v11.2.0, a pre-Kraken client created a snapshot, which resulted in an invalid snapshot namespace id being inserted into the record (the last 4 0xff bytes). The easiest fix is to flatten all children of that snapshot and then use a pre-Kraken client to delete the snapshot, roughly as sketched below.
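
A rough sketch of that route; the image/snapshot names come from this report, and the child image spec is a placeholder for whatever "rbd children" lists:

# find the clones that still reference the bad snapshot
rbd children glance-images/3dc4dfda-223a-40eb-9a72-b820fc2d8d2f@snap
# flatten each child so it no longer depends on the snapshot
rbd flatten <child-pool>/<child-image>
# then, from a host with a pre-Kraken (e.g. Jewel) rbd CLI, unprotect the
# snapshot (if it is still marked protected) and remove it
rbd snap unprotect glance-images/3dc4dfda-223a-40eb-9a72-b820fc2d8d2f@snap
rbd snap rm glance-images/3dc4dfda-223a-40eb-9a72-b820fc2d8d2f@snap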

Alternatively, you could delete/rename the "snap" snapshot that you just added and then use a hex editor and the 'rados' CLI to manipulate the omap value for the original snapshot in order to fix the corrupt entry.
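
Something like this, assuming the header object and key shown in comment #3 (the output file name is illustrative):

# dump the corrupt snapshot record to a file
rados -p glance-images getomapval rbd_header.6a7eac2eb141f2 snapshot_00000000000002e8 snap_record.bin
# fix the bogus trailing ff ff ff ff namespace bytes in a hex editor, then
# write the value back; assumption: your rados build reads the value from
# stdin when <val> is omitted -- verify with `rados --help` first
rados -p glance-images setomapval rbd_header.6a7eac2eb141f2 snapshot_00000000000002e8 < snap_record.bin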

#5

Updated by Jason Dillaman about 5 years ago

  • Status changed from Need More Info to Closed