Bug #62438 (open): object-map rebuild might cause rbd data corruption

Added by Igor Fedotov 9 months ago. Updated 9 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: quincy, reef
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following data corruption scenario is 100% reproducible in my vstart cluster (built from the main branch). I have also seen reports of the same issue on real 17.2.5 clusters.

1) ceph osd pool create rbd 64 64; rbd create --image-feature layering,exclusive-lock -s 10G rbd/corruption ; rbd map rbd/corruption; mkfs.ext4 /dev/rbd0 ; rbd feature enable rbd/corruption object-map,fast-diff

2) rbd object-map rebuild rbd/corruption; mount /dev/rbd0 /mnt/rbd0 ; dd if=/dev/urandom of=/mnt/rbd0/random bs=4k count=1000000 ; umount /mnt/rbd0 ; rbd unmap /dev/rbd0 ;

3) rbd map rbd/corruption ; e2fsck -n -f /dev/rbd0 ; rbd unmap /dev/rbd0

As a result, e2fsck in step 3 reports errors:

e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Inode 12 has an invalid extent node (blk 33795, lblk 0)
Clear? no

Inode 12 extent tree (at level 1) could be shorter. Optimize? no

Inode 12, i_blocks is 8000008, should be 0. Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(9264--31791) -33795 -(34816--98303) -(100352--163839) -(165888--229375) -(231424--294911) -(296960--524287) -(532512--555039) -(557056--819199) -(821248--884735) -(886784--969279) -(2555904--2621439)
Fix? no

/dev/rbd0: *** WARNING: Filesystem still has errors ***

Actions #1

Updated by Igor Fedotov 9 months ago

A few side notes:
1) Apparently similar issues were mentioned a while ago, e.g. https://www.spinics.net/lists/ceph-users/msg26158.html
2) The issue isn't reproducible with rbd-nbd: there, "rbd feature enable" stalls, apparently due to exclusive locking, which is probably what the kernel implementation misses (see the sketch below).
3) Kernel versions attempted: 5.10.?, 6.1.6, 6.4.4
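
For reference, a minimal sketch of the rbd-nbd variant from note 2 (assuming rbd-nbd is installed; the assigned device may be other than /dev/nbd0):

$ rbd-nbd map rbd/corruption
$ mkfs.ext4 /dev/nbd0
$ rbd feature enable rbd/corruption object-map,fast-diff

Here the last command stalls while rbd-nbd holds the exclusive lock, instead of silently succeeding as in the kernel-client case.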

Actions #2

Updated by Ilya Dryomov 9 months ago

Hi Igor,

1) ceph osd pool create rbd 64 64; rbd create --image-feature layering,exclusive-lock -s 10G rbd/corruption ; rbd map rbd/corruption; mkfs.ext4 /dev/rbd0 ; rbd feature enable rbd/corruption object-map,fast-diff

The kernel client doesn't support enabling and disabling image features while the image is mapped. So only the following is expected to work:

$ ceph osd pool create rbd 64 64
$ rbd create --image-feature layering,exclusive-lock -s 10G rbd/corruption
$ rbd feature enable rbd/corruption object-map,fast-diff
$ rbd map rbd/corruption
$ mkfs.ext4 /dev/rbd0
$ rbd object-map rebuild rbd/corruption
...
Actions #3

Updated by Igor Fedotov 9 months ago

Ilya Dryomov wrote:

Hi Igor,

1) ceph osd pool create rbd 64 64; rbd create --image-feature layering,exclusive-lock -s 10G rbd/corruption ; rbd map rbd/corruption; mkfs.ext4 /dev/rbd0 ; rbd feature enable rbd/corruption object-map,fast-diff

The kernel client doesn't support enabling and disabling image features while the image is mapped. So only the following is expected to work:

[...]

So the client has to return an error in this case, right?

Actions #4

Updated by Igor Fedotov 9 months ago

btw: rbd info indicates the updated feature set after step 1:
rbd image 'corruption':
	size 10 GiB in 2560 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 10cc91bb3ff8
	block_name_prefix: rbd_data.10cc91bb3ff8
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff
	op_features:
	flags: object map invalid, fast diff invalid
	create_timestamp: Tue Aug 15 19:40:15 2023
	access_timestamp: Tue Aug 15 19:40:15 2023
	modify_timestamp: Tue Aug 15 19:40:15 2023

Actions #5

Updated by Alexander Patrakov 9 months ago

Ilya Dryomov wrote:

The kernel client doesn't support enabling and disabling image features while the image is mapped.

Is it realistic for the `rbd feature enable` command to detect that the image is used by a kernel client (possibly on a different host)? If so, it would be nice for it to error out.
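
For what it's worth, a rough manual check is already possible today by listing the image's watchers, e.g.:

$ rbd status rbd/corruption

which prints any active watchers for the image; note that a watcher entry alone doesn't say whether it belongs to a kernel client or to librbd.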

Also, I would expect the RBD documentation to mention this data-corruption possibility, especially since there are posts by Ceph developers that could be misinterpreted as stating that it is supposed to work. See e.g. https://tracker.ceph.com/issues/15007

Actions #6

Updated by Ilya Dryomov 9 months ago

Igor Fedotov wrote:

The kernel client doesn't support enabling and disabling image features while the image is mapped. So only the following is expected to work:

[...]

So the client has to return an error in this case, right?

Right. The kernel client would have no choice but to refuse any further I/O after an image feature change is detected, but that is definitely better than silently ignoring it. I'll work on that.

Actions #7

Updated by Ilya Dryomov 9 months ago

Igor Fedotov wrote:

btw: rbd info indicates the updated feature set after step 1:
rbd image 'corruption':
	size 10 GiB in 2560 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 10cc91bb3ff8
	block_name_prefix: rbd_data.10cc91bb3ff8
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff
	op_features:
	flags: object map invalid, fast diff invalid
	create_timestamp: Tue Aug 15 19:40:15 2023
	access_timestamp: Tue Aug 15 19:40:15 2023
	modify_timestamp: Tue Aug 15 19:40:15 2023

This is expected. The object-map and fast-diff image features get enabled, but the kernel client simply ignores that.

Actions #8

Updated by Ilya Dryomov 9 months ago

Alexander Patrakov wrote:

Is it realistic for the `rbd feature enable` command to detect that the image is used by a kernel client (possibly on a different host)? If so, it would be nice for it to error out.

It's feasible, but I'm not sure it's desirable. Ceph and the Linux kernel are separate projects with widely different release schedules. If the kernel client grows support for enabling and disabling (some) image features, we wouldn't want an older librbd/CLI to error out.

Also, I would expect the RBD documentation to mention this data-corruption possibility, especially since there are posts by Ceph developers that could be misinterpreted as stating that it is supposed to work. See e.g. https://tracker.ceph.com/issues/15007

It's not data corruption per se, at least not in this particular case -- all the data is there. The object map just needs to be rebuilt again prior to running e2fsck in step 3.
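
A corrected step 3 would then look something like this (a sketch, reusing the image name from the description):

$ rbd object-map rebuild rbd/corruption
$ rbd map rbd/corruption
$ e2fsck -n -f /dev/rbd0
$ rbd unmap /dev/rbd0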
