Bug #19413

Cannot delete some snapshots after upgrade from jewel to kraken

Added by Benoit Loriot 5 months ago. Updated 4 months ago.

Status: Pending Backport
Priority: High
Target version: -
Start date: 03/29/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release: kraken
Needs Doc: No

Description

Hello,

After upgrading from Jewel to Kraken, I have snapshots on several images that I cannot delete.

# rbd -p qa-volumes snap ls --image volume-51883b81-87a9-4353-b317-9bef54e2e92f
SNAPID NAME SIZE
485 snapshot-d5d9b870-2416-4200-bd6d-d9a40be892d6 10240 MB
# rbd -p qa-volumes snap rm --image volume-51883b81-87a9-4353-b317-9bef54e2e92f --snap snapshot-d5d9b870-2416-4200-bd6d-d9a40be892d6
Removing snap: 0% complete...failed.
rbd: failed to remove snapshot: (22) Invalid argument
# ceph --version
ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)

Related issues

Copied to rbd - Backport #19833: kraken: Cannot delete some snapshots after upgrade from jewel to kraken Resolved

History

#1 Updated by Denis Horbunov 4 months ago

Hi there. I've also stumbled upon this issue. My Ceph cluster was migrated from 10.2.5 to 11.2.0 last week (ceph --version reports ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)). I was aware of this issue in advance, and all snapshots were deleted prior to the migration as a precautionary measure, but one image has an undeletable snapshot that was taken after the migration to Kraken.
# rbd info oneimages-pool/one-73
rbd image 'one-73':
size 200 GB in 51200 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.10425036238e1f29
format: 2
features: layering, exclusive-lock
flags:

# rbd snap ls oneimages-pool/one-73
SNAPID NAME SIZE
35637 snapmirror.1 200 GB
35704 snapmirror.2 200 GB

# rbd snap unprotect
rbd: snap is already unprotected

# rbd snap rm
Removing snap: 0% complete...failed.
rbd: failed to remove snapshot: (22) Invalid argument

#2 Updated by Denis Horbunov 4 months ago

Maybe this will be more helpful.

rbd snap rm --debug-rbd 50
2017-04-26 13:55:07.388358 7f89d2b88000 5 librbd::AioImageRequestWQ: 0x55887fef8a70 : ictx=0x55887ff677b0
2017-04-26 13:55:07.388485 7f89d2b88000 20 librbd::ImageState: 0x55887ff66e80 open
2017-04-26 13:55:07.388500 7f89d2b88000 10 librbd::ImageState: 0x55887ff66e80 0x55887ff66e80 send_open_unlock
2017-04-26 13:55:07.388519 7f89d2b88000 10 librbd::image::OpenRequest: 0x55887ff68d70 send_v2_detect_header
2017-04-26 13:55:07.390219 7f89ab7fe700 10 librbd::image::OpenRequest: handle_v2_detect_header: r=0
2017-04-26 13:55:07.390237 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 send_v2_get_id
2017-04-26 13:55:07.390986 7f89ab7fe700 10 librbd::image::OpenRequest: handle_v2_get_id: r=0
2017-04-26 13:55:07.391005 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 send_v2_get_immutable_metadata
2017-04-26 13:55:07.392548 7f89ab7fe700 10 librbd::image::OpenRequest: handle_v2_get_immutable_metadata: r=0
2017-04-26 13:55:07.392559 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 send_v2_get_stripe_unit_count
2017-04-26 13:55:07.393214 7f89ab7fe700 10 librbd::image::OpenRequest: handle_v2_get_stripe_unit_count: r=-8
2017-04-26 13:55:07.393219 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 send_v2_get_data_pool
2017-04-26 13:55:07.393855 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 handle_v2_get_data_pool: r=0
2017-04-26 13:55:07.393870 7f89ab7fe700 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.10425036238e1f29 format rbd_data.10425036238e1f29.%016llx
2017-04-26 13:55:07.393877 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 send_v2_apply_metadata: start_key=conf_
2017-04-26 13:55:07.394675 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 handle_v2_apply_metadata: r=0
2017-04-26 13:55:07.394694 7f89ab7fe700 20 librbd::ImageCtx: apply_metadata
2017-04-26 13:55:07.395198 7f89ab7fe700 20 librbd::ImageCtx: enabling caching...
2017-04-26 13:55:07.395203 7f89ab7fe700 20 librbd::ImageCtx: Initial cache settings: size=1024000 num_objects=10 max_dirty=768000 target_dirty=512000 max_dirty_age=1
2017-04-26 13:55:07.395290 7f89ab7fe700 10 librbd::ImageCtx: cache bytes 1024000 -> about 32 objects
2017-04-26 13:55:07.395330 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 send_register_watch
2017-04-26 13:55:07.395433 7f89ab7fe700 10 librbd::ImageWatcher: 0x7f8998034c50 registering image watcher
2017-04-26 13:55:07.401575 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 handle_register_watch: r=0
2017-04-26 13:55:07.401586 7f89ab7fe700 10 librbd::image::OpenRequest: 0x55887ff68d70 send_refresh
2017-04-26 13:55:07.401589 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 send_v2_get_mutable_metadata
2017-04-26 13:55:07.402798 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 handle_v2_get_mutable_metadata: r=0
2017-04-26 13:55:07.402821 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 send_v2_get_flags
2017-04-26 13:55:07.403775 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 handle_v2_get_flags: r=0
2017-04-26 13:55:07.403788 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 send_v2_get_group
2017-04-26 13:55:07.404642 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 handle_v2_get_group: r=0
2017-04-26 13:55:07.404652 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 send_v2_get_snapshots
2017-04-26 13:55:07.406129 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 handle_v2_get_snapshots: r=0
2017-04-26 13:55:07.406143 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 send_v2_get_snap_namespaces
2017-04-26 13:55:07.407092 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 handle_v2_get_snap_namespaces: r=0
2017-04-26 13:55:07.407115 7f89ab7fe700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 send_v2_init_exclusive_lock
2017-04-26 13:55:07.407123 7f89ab7fe700 10 librbd::ExclusiveLock: 0x7f8998041710 init
2017-04-26 13:55:07.407126 7f89ab7fe700 5 librbd::AioImageRequestWQ: block_writes: 0x55887ff677b0, num=1
2017-04-26 13:55:07.407198 7f89aaffd700 10 librbd::ExclusiveLock: 0x7f8998041710 handle_init_complete
2017-04-26 13:55:07.407208 7f89aaffd700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 handle_v2_init_exclusive_lock: r=0
2017-04-26 13:55:07.407211 7f89aaffd700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 send_v2_apply
2017-04-26 13:55:07.407215 7f89aaffd700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 handle_v2_apply
2017-04-26 13:55:07.407222 7f89aaffd700 20 librbd::image::RefreshRequest: 0x7f8998036ad0 apply
2017-04-26 13:55:07.407228 7f89aaffd700 20 librbd::image::RefreshRequest: new snapshot id=35704 name=snapmirror.2 size=214748364800
2017-04-26 13:55:07.407231 7f89aaffd700 20 librbd::image::RefreshRequest: new snapshot id=35637 name=snapmirror.1 size=214748364800
2017-04-26 13:55:07.407244 7f89aaffd700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 send_flush_aio
2017-04-26 13:55:07.407249 7f89aaffd700 10 librbd::image::RefreshRequest: 0x7f8998036ad0 handle_flush_aio: r=0
2017-04-26 13:55:07.407254 7f89aaffd700 10 librbd::image::OpenRequest: handle_refresh: r=0
2017-04-26 13:55:07.407259 7f89aaffd700 10 librbd::ImageState: 0x55887ff66e80 0x55887ff66e80 handle_open: r=0
2017-04-26 13:55:07.407315 7f89d2b88000 20 librbd: snap_remove 0x55887ff677b0 snapmirror.1 flags: 0
2017-04-26 13:55:07.407327 7f89d2b88000 20 librbd: get_snap_namespace 0x55887ff677b0 snapmirror.1
Removing snap: 0% complete...failed.
rbd: failed to remove snapshot: (22) Invalid argument
2017-04-26 13:55:07.407495 7f89d2b88000 20 librbd::ImageState: 0x55887ff66e80 close
2017-04-26 13:55:07.407502 7f89d2b88000 10 librbd::ImageState: 0x55887ff66e80 0x55887ff66e80 send_close_unlock
2017-04-26 13:55:07.407504 7f89d2b88000 10 librbd::image::CloseRequest: 0x55887ff68a50 send_shut_down_update_watchers
2017-04-26 13:55:07.407505 7f89d2b88000 20 librbd::ImageState: 0x55887ff66e80 shut_down_update_watchers
2017-04-26 13:55:07.407506 7f89d2b88000 20 librbd::ImageState: 0x55887ff66f60 ImageUpdateWatchers::shut_down
2017-04-26 13:55:07.407508 7f89d2b88000 20 librbd::ImageState: 0x55887ff66f60 ImageUpdateWatchers::shut_down: completing shut down
2017-04-26 13:55:07.407542 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 handle_shut_down_update_watchers: r=0
2017-04-26 13:55:07.407550 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 send_unregister_image_watcher
2017-04-26 13:55:07.407554 7f89aaffd700 10 librbd::ImageWatcher: 0x7f8998034c50 unregistering image watcher
2017-04-26 13:55:07.412588 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 handle_unregister_image_watcher: r=0
2017-04-26 13:55:07.412597 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 send_shut_down_aio_queue
2017-04-26 13:55:07.412600 7f89aaffd700 5 librbd::AioImageRequestWQ: shut_down: in_flight=0
2017-04-26 13:55:07.412604 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 handle_shut_down_aio_queue: r=0
2017-04-26 13:55:07.412607 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 send_shut_down_exclusive_lock
2017-04-26 13:55:07.412608 7f89aaffd700 10 librbd::ExclusiveLock: 0x7f8998041710 shut_down
2017-04-26 13:55:07.412612 7f89aaffd700 10 librbd::ExclusiveLock: 0x7f8998041710 handle_shutdown: r=0
2017-04-26 13:55:07.412615 7f89aaffd700 20 librbd::AioImageRequestWQ: clear_require_lock_on_read
2017-04-26 13:55:07.412617 7f89aaffd700 5 librbd::AioImageRequestWQ: unblock_writes: 0x55887ff677b0, num=0
2017-04-26 13:55:07.412628 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 handle_shut_down_exclusive_lock: r=0
2017-04-26 13:55:07.412634 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 send_flush_readahead
2017-04-26 13:55:07.412640 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 handle_flush_readahead: r=0
2017-04-26 13:55:07.412644 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 send_shut_down_cache
2017-04-26 13:55:07.412700 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 handle_shut_down_cache: r=0
2017-04-26 13:55:07.412706 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 send_flush_op_work_queue
2017-04-26 13:55:07.412710 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 handle_flush_op_work_queue: r=0
2017-04-26 13:55:07.412713 7f89aaffd700 10 librbd::image::CloseRequest: 0x55887ff68a50 handle_flush_image_watcher: r=0
2017-04-26 13:55:07.412734 7f89aaffd700 10 librbd::ImageState: 0x55887ff66e80 0x55887ff66e80 handle_close: r=0

#3 Updated by Jason Dillaman 4 months ago

  • Project changed from Ceph to rbd
  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
  • Priority changed from Normal to High

#4 Updated by Jason Dillaman 4 months ago

  • Backport set to kraken

#5 Updated by Jason Dillaman 4 months ago

  • Backport deleted (kraken)

#6 Updated by Jason Dillaman 4 months ago

  • Status changed from In Progress to Need More Info

@Denis: can you do me a favor, run the following, and attach the results?

# rados -p oneimages-pool listomapvals rbd_header.10425036238e1f29

#7 Updated by Jason Dillaman 4 months ago

  • Status changed from Need More Info to Need Review
  • Backport set to kraken

#8 Updated by Benoit Loriot 4 months ago

# rados -p qa-volumes listomapvals rbd_data.3d91f429d7ac52
error getting omap keys qa-volumes/rbd_data.3d91f429d7ac52: (2) No such file or directory

#9 Updated by Jason Dillaman 4 months ago

@Benoit: switch "rbd_data" for "rbd_header" in your command above.

#10 Updated by Benoit Loriot 4 months ago

Here is the correct output

# rados -p qa-volumes listomapvals rbd_header.3d91f429d7ac52 --debug-rbd 50
features
value (8 bytes) :
00000000  3d 00 00 00 00 00 00 00                           |=.......|
00000008

object_prefix
value (27 bytes) :
00000000  17 00 00 00 72 62 64 5f  64 61 74 61 2e 33 64 39  |....rbd_data.3d9|
00000010  31 66 34 32 39 64 37 61  63 35 32                 |1f429d7ac52|
0000001b

order
value (1 bytes) :
00000000  16                                                |.|
00000001

size
value (8 bytes) :
00000000  00 00 00 80 02 00 00 00                           |........|
00000008

snap_seq
value (8 bytes) :
00000000  e8 01 00 00 00 00 00 00                           |........|
00000008

snapshot_00000000000001e5
value (132 bytes) :
00000000  05 01 7e 00 00 00 e5 01  00 00 00 00 00 00 2d 00  |..~...........-.|
00000010  00 00 73 6e 61 70 73 68  6f 74 2d 64 35 64 39 62  |..snapshot-d5d9b|
00000020  38 37 30 2d 32 34 31 36  2d 34 32 30 30 2d 62 64  |870-2416-4200-bd|
00000030  36 64 2d 64 39 61 34 30  62 65 38 39 32 64 36 00  |6d-d9a40be892d6.|
00000040  00 00 80 02 00 00 00 3d  00 00 00 00 00 00 00 01  |.......=........|
00000050  01 1c 00 00 00 ff ff ff  ff ff ff ff ff 00 00 00  |................|
00000060  00 fe ff ff ff ff ff ff  ff 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 01 01 04 00 00 00  |................|
00000080  ff ff ff ff                                       |....|
00000084
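
As a cross-check of the dump above, the fixed-width values can be decoded as little-endian integers. This is only an illustrative sketch: the byte strings are copied from the output above, the helper names are made up for the example, and the feature-bit table lists the usual RBD feature names rather than anything stated in this ticket.

```python
import struct

# Raw omap values copied from the hex dump above.
FEATURES = bytes.fromhex("3d00000000000000")
ORDER = bytes.fromhex("16")
SIZE = bytes.fromhex("0000008002000000")
SNAP_SEQ = bytes.fromhex("e801000000000000")
SNAPSHOT_KEY = "snapshot_00000000000001e5"

# Usual RBD feature bits (assumption: not taken from this ticket).
FEATURE_BITS = {
    1: "layering",
    2: "striping",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
    64: "journaling",
}


def u64_le(raw: bytes) -> int:
    """Decode an 8-byte little-endian unsigned integer."""
    return struct.unpack("<Q", raw)[0]


def feature_names(mask: int) -> list:
    """Return the names of all feature bits set in mask."""
    return [name for bit, name in FEATURE_BITS.items() if mask & bit]


features = u64_le(FEATURES)
order = ORDER[0]                      # single byte: log2 of the object size
size = u64_le(SIZE)
snap_seq = u64_le(SNAP_SEQ)
object_size = 1 << order
snap_id = int(SNAPSHOT_KEY.split("_", 1)[1], 16)

print("features:", features, feature_names(features))
print("order:", order, "object size:", object_size, "bytes")
print("size:", size // (1024 * 1024), "MB in", size // object_size, "objects")
print("snap_seq:", snap_seq, "snapshot id:", snap_id)
```

The decoded size (10240 MB) and snapshot id (485) agree with the `rbd snap ls` output in the description.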

#11 Updated by Jason Dillaman 4 months ago

@Benoit: Great, thanks. I was expecting the problem to be four 0xFF bytes at the end of your snapshot_XYZ value. The associated PR should prevent that issue from occurring in the future. In the meantime, the easiest way to resolve the issue would be to use a pre-Kraken rbd CLI to remove the snapshot. Alternatively, you can use the rados CLI to write the key's value to a file, use a hex editor to change the last four bytes to 0x00, and then use the rados CLI again to update the value with the contents of that file.

The issue occurred when the OSDs were upgraded to Kraken and a Jewel RBD client created a snapshot. The Kraken OSD incorrectly wrote four 0xFF bytes (-1) instead of four 0x00 bytes, since the Jewel RBD client doesn't understand snapshot namespaces. When the RBD client was later upgraded to Kraken, it interpreted that value as an invalid namespace and returned a -EINVAL error code.
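
The hex-editor step in the workaround above can be sketched in a few lines of Python. This is a hedged illustration, not a supported tool: the function name is hypothetical, the sample bytes are the 132-byte snapshot value from comment #10, and it assumes the value has already been exported to a file and will be written back with the rados CLI as described. Keep a copy of the original value before changing anything.

```python
# Snapshot omap value from the hex dump in comment #10 (132 bytes).
SNAPSHOT_VALUE = bytes.fromhex(
    "05017e000000e5010000000000002d00"
    "0000736e617073686f742d6435643962"
    "3837302d323431362d343230302d6264"
    "36642d64396134306265383932643600"
    "000080020000003d0000000000000001"
    "011c000000ffffffffffffffff000000"
    "00feffffffffffffff00000000000000"
    "00000000000000000000010104000000"
    "ffffffff"
)

BAD_TAIL = b"\xff\xff\xff\xff"   # the four 0xFF bytes (-1) described above
GOOD_TAIL = b"\x00\x00\x00\x00"  # the four 0x00 bytes the client expects


def patch_snapshot_value(value: bytes) -> bytes:
    """Return a copy of value with a trailing 0xFFFFFFFF replaced by zeros.

    Raises ValueError if the value does not end in four 0xFF bytes, so an
    unaffected (or already patched) value is never modified.
    """
    if not value.endswith(BAD_TAIL):
        raise ValueError("value does not end in four 0xFF bytes; not patching")
    return value[:-len(BAD_TAIL)] + GOOD_TAIL


patched = patch_snapshot_value(SNAPSHOT_VALUE)
print(len(patched), patched[-4:].hex())
```

Only the last four bytes change, so the length of the value and every other field stay as they were.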

#12 Updated by Benoit Loriot 4 months ago

Thanks Jason, snapshot deletion worked from a Jewel client.

#13 Updated by Mykola Golub 4 months ago

  • Status changed from Need Review to Pending Backport

#14 Updated by Jason Dillaman 4 months ago

  • Copied to Backport #19833: kraken: Cannot delete some snapshots after upgrade from jewel to kraken added

#15 Updated by Denis Horbunov 4 months ago

I'm sorry for being late. Thank you very much indeed Jason! This solved my problem too.
