Bug #24102

snapshot of RBD image is found to be all zero.

Added by 宏伟 唐 almost 6 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After successfully creating a snapshot of an RBD image in a replicated pool, I exported the new snapshot with the command "rbd export rbdpool/xxxx@snap ./snapshot". Unfortunately, the exported snapshot data does not match the source image when the two are compared with md5sum. Moreover, when I inspected the exported snapshot with od, its contents were all zeros.
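
The two checks described above (checksum comparison and an all-zero test) could be scripted roughly as follows. This is a minimal sketch, not part of the original report: the helper names `is_all_zero` and `same_md5` and the file names in the usage comments are hypothetical, the helpers only inspect local files that have already been exported, and a `cmp` that supports `-n` (as in GNU diffutils) is assumed.

```shell
# Hypothetical helper: succeeds if FILE is non-empty and contains only zero bytes.
is_all_zero() {
    size=$(( $(wc -c < "$1") ))      # byte count, normalized to a plain integer
    [ "$size" -gt 0 ] || return 1    # treat an empty file as "not all zero"
    # Compare exactly $size bytes of the file against the zero device.
    cmp -s -n "$size" "$1" /dev/zero
}

# Hypothetical helper: succeeds if two files have the same MD5 checksum.
same_md5() {
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Example usage (file names are placeholders for two local exports):
#   same_md5 ./image ./snapshot || echo "snapshot differs from image"
#   is_all_zero ./snapshot && echo "snapshot is all zeros"
```

Reading through `cmp` against /dev/zero avoids producing a large od or hexdump listing just to confirm that every byte is zero.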


Files

rbd.log.txt (127 KB), 宏伟 唐, 06/08/2018 12:52 PM
Actions #1

Updated by Greg Farnum almost 6 years ago

  • Project changed from Ceph to rbd
Actions #2

Updated by Jason Dillaman almost 6 years ago

  • Status changed from New to Need More Info

Any chance the image is a clone or you are using cache tiering? What Ceph version are you running? Can you attach the resulting 'rbd.log' file from running "rbd export rbdpool/xxxx@snap ./snapshot-new --debug-rbd=20 > rbd.log"?

Actions #3

Updated by 宏伟 唐 almost 6 years ago

Yes, I am using cache tiering.
The Ceph version is 12.2.2.

Attached is the log from running the command "rbd export rbdpool/xxxx@snap ./snapshot-new --debug-rbd=20 > rbd.log" against a corrupted snapshot.

Actions #4

Updated by 宏伟 唐 almost 6 years ago

I fetched the rbd_object_map objects for the image and for the snapshot, and the two object maps differ.

The commands I used are as follows:

rados -p reppool get rbd_object_map.<id> image_map
rados -p reppool get rbd_object_map.<id>.<snapseq> snap_map

hexdump -Cv image_map
hexdump -Cv snap_map
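
To quantify how much the two fetched maps differ, rather than reading both hex dumps side by side, something like the following could be used. This is only a sketch: `count_diff_bytes` is a hypothetical helper, it operates on the local files `image_map` and `snap_map` written by the `rados get` commands above, and it assumes nothing about the internal layout of the object map.

```shell
# Hypothetical helper: print the number of byte positions at which two files differ.
count_diff_bytes() {
    # cmp -l prints one line per differing byte (offset plus both byte values),
    # so counting those lines gives the number of differing bytes.
    # A length mismatch is reported on stderr, which is discarded here.
    n=$( { cmp -l "$1" "$2" 2>/dev/null || true; } | wc -l )
    echo $(( n ))
}

# Example usage with the files fetched above:
#   count_diff_bytes image_map snap_map
```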

Actions #5

Updated by 宏伟 唐 almost 6 years ago

On the OSD side, the snapshot is marked as removed (it is in the removed_snaps set). As a result, the find_object_context function returns -ENOENT and librbd returns all-zero blocks to the client.

However, we did NOT remove the snapshot: it has been protected the whole time, and the rbd_header.<id> object in the "reppool" pool has recorded the snapshot the whole time.

Actions #6

Updated by Jason Dillaman almost 6 years ago

Are you using the RBD data-pool feature on this image? The initial version of Luminous had a bug where the snapshots were recorded in the image header pool instead of the data pool.

Actions #7

Updated by 宏伟 唐 almost 6 years ago

How can I check whether the data-pool feature is enabled?

I ran "rbd info reppool/<image-id>" on the image, and the output lists the following features:

features: layering, exclusive-lock, object-map, fast-diff, deep-flatten

Jason Dillaman wrote:

Are you using the RBD data-pool feature on this image? The initial version of Luminous had a bug where the snapshots were recorded in the image header pool instead of the data pool.

Actions #8

Updated by Jason Dillaman almost 6 years ago

OK, then it's not enabled.

At this point, I don't have enough information to help determine how your snapshot was deleted. The only way librbd could do it would be if you had an unprotected snapshot, started the snap remove process, aborted it after it deleted the RADOS snapshot but before it removed the snapshot record from the image header, and then protected the snapshot. If you can discover how to reproduce this issue, we will look into it.

Actions #9

Updated by 宏伟 唐 almost 6 years ago

This issue can be triggered by deleting an RBD image from a cache tiering pool, although it does not occur every time.

When I delete an RBD image, other images in the same cache tiering pool may be destroyed.

Actions #10

Updated by Jason Dillaman almost 6 years ago

So you are saying that if you remove image X via 'rbd rm <base-tier-pool>/<image-name>', running "ceph osd pool ls detail --format json-pretty | grep removed" for your pool will show new, unrelated snapshots added to the collection?
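
Once the removed-snapshot set has been read from that output, checking whether a particular snapshot ID falls inside it could be sketched as below. This rests on an assumption that should be verified against the actual output on the cluster: that the set is written in interval notation as `[start~length,...]` with hexadecimal values, so that `[1~3]` covers snapshot IDs 1 through 3. The helper name `snap_in_removed` is hypothetical, and the snapshot ID argument is taken as a decimal number.

```shell
# Hypothetical helper: succeeds if decimal snapshot ID $1 lies inside the
# interval set $2, assumed to look like "[1~3,a~2]" (hex start~length pairs).
snap_in_removed() {
    snap=$1
    # Strip the surrounding brackets and split the intervals on commas.
    for iv in $(printf '%s' "$2" | tr -d '[]' | tr ',' ' '); do
        start_hex=${iv%%"~"*}        # text before the tilde
        len_hex=${iv##*"~"}          # text after the tilde
        start=$(( 0x$start_hex ))
        len=$(( 0x$len_hex ))
        if [ "$snap" -ge "$start" ] && [ "$snap" -lt $(( start + len )) ]; then
            return 0
        fi
    done
    return 1
}

# Example usage (values are illustrative only):
#   snap_in_removed 11 "[1~3,a~2]" && echo "snapshot 11 is marked removed"
```

Comparing the set before and after an 'rbd rm' with a helper like this would show whether a snapshot that is still listed in the image header has newly appeared in the removed set.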

Actions #11

Updated by 宏伟 唐 almost 6 years ago

Yes, that is correct.

Jason Dillaman wrote:

So you are saying that if you remove image X via 'rbd rm <base-tier-pool>/<image-name>', running "ceph osd pool ls detail --format json-pretty | grep removed" for your pool will show new, unrelated snapshots added to the collection?

Actions #12

Updated by Jason Dillaman almost 6 years ago

I cannot reproduce what you are seeing. Can you please provide exact CLI commands and CLI output?

Actions #13

Updated by Jason Dillaman about 5 years ago

  • Status changed from Need More Info to Closed

Closing due to a lack of activity.
