Bug #61472: [krbd] volume data corruption when using rbd-mirror w/failover
Status: Closed
Description
This bug creates holes (runs of zeros, in this case) in the image where there should be data, causing corruption at both the block and filesystem level. A reproducer was created to illustrate the steps that trigger the problem and to show the associated symptoms.
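Not part of the ticket's reproducer, but the symptom can be checked mechanically. A minimal sketch follows, using a plain temp file in place of the mapped /dev/rbdX device; the 4 KiB block size and the check_block helper are assumptions for illustration only:

```shell
# Sketch: detect a 4 KiB block that is unexpectedly all zeros (a "hole").
# A plain temp file stands in for the mapped /dev/rbdX device.
img=$(mktemp)
head -c 4096 /dev/zero | tr '\0' 'A' >  "$img"   # block 0: data
head -c 4096 /dev/zero               >> "$img"   # block 1: a zeroed hole
head -c 4096 /dev/zero | tr '\0' 'B' >> "$img"   # block 2: data

# check_block <file> <block-index>: count non-NUL bytes in one 4 KiB block.
check_block() {
  n=$(dd if="$1" bs=4096 skip="$2" count=1 2>/dev/null | tr -d '\0' | wc -c)
  if [ "$n" -eq 0 ]; then
    echo "block $2: all zeros"
  else
    echo "block $2: has data"
  fi
}

check_block "$img" 0   # block 0: has data
check_block "$img" 1   # block 1: all zeros
check_block "$img" 2   # block 2: has data
```

Running the same kind of scan over the promoted image would flag the zeroed extents that fsck later trips over.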
The environment:
There are two sites, site-a and site-b. To start the scenario, an image is created with mirroring enabled (snapshot mode).
Here's how we arrive at the bug:
1. Map the image on the primary.
2. Format, then mount the block device.
3a. Start manually creating 40 mirror image snapshots, one every 3 seconds. This step ends when 3b completes.
3b. Untar via the provided script, with a timeout of 5m.
4. Unmap, then demote the image on the primary.
5. Wait for the demoted image to be synced to the secondary.
6. Promote the image on the secondary.
7. Map the image on the secondary.
8. Run fsck -fn on the image.
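The numbered steps can be sketched with the rbd CLI. This is a dry run: the run helper only echoes each command, and the pool/image names and the site-a/site-b cluster names are placeholders, so the flow can be read without a live cluster:

```shell
# Dry-run sketch of the failover sequence; nothing is executed against a cluster.
POOL=mirror_pool
IMAGE=test_img
run() { echo "+ $*"; }

# 1-2: map, format and mount on the primary (site-a)
run rbd --cluster site-a device map "$POOL/$IMAGE"
run mkfs.ext4 /dev/rbd0
run mount /dev/rbd0 /mnt
# 3a: 40 manual mirror snapshots, one every 3 seconds, while 3b untars in /mnt
i=1
while [ "$i" -le 40 ]; do
  run rbd --cluster site-a mirror image snapshot "$POOL/$IMAGE"
  run sleep 3
  i=$((i + 1))
done
# 4: unmount, unmap, then demote on the primary
run umount /mnt
run rbd --cluster site-a device unmap "$POOL/$IMAGE"
run rbd --cluster site-a mirror image demote "$POOL/$IMAGE"
# 5: poll until the demotion snapshot is reported as synced on the secondary
run rbd --cluster site-b mirror image status "$POOL/$IMAGE"
# 6-8: promote, map on the secondary, and fsck read-only
run rbd --cluster site-b mirror image promote "$POOL/$IMAGE"
run rbd --cluster site-b device map "$POOL/$IMAGE"
run fsck -fn /dev/rbd0
```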
After running the steps above, data corruption occurred. This can be seen by running fsck on the volume:
+ sudo fsck -fn /dev/rbd0
fsck from util-linux 2.38.1
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
HTREE directory inode 83315 has an invalid root node.
Clear HTree index? no
HTREE directory inode 83315 has an unsupported hash version (20)
Clear HTree index? no
HTREE directory inode 83315 uses an incompatible htree root node flag.
Clear HTree index? no
...
Steps to reproduce are:
1. In the build directory of a clone of upstream Ceph, run:
sh ./rbd_reproducer_corrupt1.sh
2. This script will run until an error code is returned (most likely by fsck).
Files
Updated by Christopher Hoffman 11 months ago
- File demote-promote snap fsck.txt added
- File keep-mirror-snaps.patch added
- File rbd_reproducer_corrupt2.sh added
- Description updated (diff)
I was able to reproduce the above within a shorter time frame.
The key differences from the above procedure are:
1. Instead of a 1m snap schedule interval, there are now 40 manual snapshots, one every 3 seconds. These are taken during I/O.
2. Use the script: rbd_reproducer_corrupt2.sh
Then, after running the script, I verified the last known snapshot on the original primary side (the demoted snap) and on the original secondary side (the promoted snap). The former passes fsck, while the latter does not. See the attached "demote-promote snap fsck.txt" for more details.
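The per-side verification described above can be sketched as another echoed dry run; the snapshot names here are placeholders (the real demotion snapshot names are generated by rbd-mirror), and run only prints the commands:

```shell
# Dry-run sketch: fsck the last known snapshot as seen from each side.
run() { echo "+ $*"; }

# Demoted snap on the original primary (site-a): fsck passes
run rbd --cluster site-a device map --read-only "mirror_pool/test_img@demote_snap"
run fsck -fn /dev/rbd0
run rbd --cluster site-a device unmap /dev/rbd0

# Promoted snap on the original secondary (site-b): fsck reports corruption
run rbd --cluster site-b device map --read-only "mirror_pool/test_img@promote_snap"
run fsck -fn /dev/rbd0
run rbd --cluster site-b device unmap /dev/rbd0
```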
I've also attached a working patch, "keep-mirror-snaps.patch", that doesn't remove synced snapshots, to allow for back-in-time analysis if needed.
To make things easier, one can use the working branch: "wip-dcorruption1" from: https://github.com/chrisphoffman/ceph.git.
Updated by Christopher Hoffman 11 months ago
- File deleted (keep-mirror-snaps.patch)
Updated by Christopher Hoffman 11 months ago
- File keep-mirror-snaps.patch added
Updated by Ilya Dryomov 11 months ago
- Project changed from rbd to Linux kernel client
- Subject changed from volume data corruption when using rbd-mirror w/failover to [krbd] volume data corruption when using rbd-mirror w/failover
- Category set to rbd
- Status changed from New to Fix Under Review
- Assignee set to Ilya Dryomov
Updated by Ilya Dryomov 11 months ago
- Copied to Bug #61616: [librbd] volume data corruption when using rbd-mirror w/failover added
Updated by Ilya Dryomov 11 months ago
- Copied to Bug #61617: [test][krbd] volume data corruption when using rbd-mirror w/failover added
Updated by Ilya Dryomov 11 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Ilya Dryomov 11 months ago
- Status changed from Pending Backport to Resolved
Fixed in stable kernels 5.4.247, 5.10.184, 5.15.117, 6.1.34 and 6.3.8.
Updated by Ilya Dryomov 7 months ago
- Related to Bug #63010: [test] reproducer for a deadlock which can occur when a watch error is hit while krbd is recovering from a previous watch error added