Bug #35974


Apparent export-diff/import-diff corruption

Added by Jason Dillaman over 5 years ago. Updated over 3 years ago.

Status: Need More Info
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From the ML:

We utilize Ceph RBDs for our users' storage and need to keep data synchronized across data centres. For this we rely on 'rbd export-diff / import-diff'. Lately we have been noticing cases in which the file system on the 'destination RBD' is corrupt. We have been trying to isolate the issue, which may or may not be due to Ceph. We suspect the problem could be in 'rbd export-diff / import-diff' and are wondering if people have been seeing issues with these tools. Let me explain our use case and issue in more detail.

We have a number of data centres, each with a Ceph cluster storing tens of thousands of RBDs. We maintain extra copies of each RBD in other data centres. After we are 'done' using an RBD, we create a snapshot and use 'rbd export-diff' to create a diff from the most recent snapshot that is 'common' with the other data centre. We send the data over the network and use 'rbd import-diff' on the destination. When we apply a diff to a destination RBD, we can guarantee its 'HEAD' is clean. Of course, we guarantee that an RBD is only used in one data centre at a time.
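
As a minimal sketch of one round of this sync cycle (the pool, image, and snapshot names here are hypothetical placeholders, not our actual naming):

# at the source data centre: take the new snapshot and export the delta
rbd snap create rbd/vol1@sync-101
rbd export-diff --from-snap sync-100 rbd/vol1@sync-101 vol1_sync-100_to_101.diff

# ship the diff file to the destination, then apply it there;
# import-diff replays the changes and recreates the sync-101 snapshot
rbd import-diff vol1_sync-100_to_101.diff rbd/vol1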

We noticed corruption on the destination RBD based on fsck failures; further investigation showed that checksums on the RBD mismatch as well. Somehow the data is sometimes getting corrupted, either by our software or by 'rbd export-diff / import-diff'. Our investigation suggests that the problem is in 'rbd export-diff/import-diff'. The main evidence of this is that we occasionally sync an RBD to multiple data centres, each sync being a separate job with its own 'rbd export-diff'. We noticed that both destination locations have the same corruption (and the same checksum) while the source is healthy.
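
For concreteness, one minimal way to compare contents like this (a sketch, not necessarily our exact verification tooling; names are placeholders) is to hash a full export of the same snapshot on both sides:

# run the same command against the source and the destination cluster;
# the two hashes should match for an uncorrupted copy
rbd export rbd/vol1@sync-101 - | sha256sum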

In addition to this, we are seeing a similar type of corruption in another use case when we migrate RBDs and snapshots across pools. In this case we clone a version of an RBD (e.g. HEAD-3) to a new pool and rely on 'rbd export-diff/import-diff' to restore the last 3 snapshots on top. Here too we see cases of fsck and RBD checksum failures.
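
A rough sketch of that migration flow (again with placeholder names; snap0 is the cloned version and snap1..snap3 are the snapshots restored on top):

# clone the older version (e.g. HEAD-3) into the new pool, flatten it,
# and recreate the base snapshot so diffs can be applied against it
rbd snap protect oldpool/vol1@snap0
rbd clone oldpool/vol1@snap0 newpool/vol1
rbd flatten newpool/vol1
rbd snap create newpool/vol1@snap0

# replay the remaining snapshots on top via export-diff / import-diff
rbd export-diff --from-snap snap0 oldpool/vol1@snap1 - | rbd import-diff - newpool/vol1
rbd export-diff --from-snap snap1 oldpool/vol1@snap2 - | rbd import-diff - newpool/vol1
rbd export-diff --from-snap snap2 oldpool/vol1@snap3 - | rbd import-diff - newpool/vol1
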
We maintain various metrics and logs. Looking back at our data, we have seen the issue at a small scale for a while on Jewel, but the frequency increased recently. The timing may have coincided with a move to Luminous, but this may be a coincidence. We are currently on Ceph 12.2.5.

We are wondering if people are experiencing similar issues with 'rbd export-diff / import-diff'. I'm sure many people use it to keep backups in sync, and since these are backups, many people may not inspect the data often. In our use case, we use this mechanism to keep data in sync and actually need the data in the other location often. We are wondering if anyone else has encountered issues; it's quite possible that many people have this problem but simply don't realize it. We are likely hitting it much more frequently due to the scale of our operation (tens of thousands of syncs a day).


Files

export-diff_3x.tar.gz (199 KB) - Tarball of the export-diff logs - Patrick McLean, 09/13/2018 08:51 PM
export-diff_7.log.tar.gz (99.5 KB) - export-diff logs with rbd_balance_snap_reads = false - Patrick McLean, 09/14/2018 06:42 PM
export-diff-balance.tar.gz.00 (1000 KB) - Cliff Pajaro, 09/20/2018 09:10 PM
export-diff-balance.tar.gz.01 (1000 KB) - Cliff Pajaro, 09/20/2018 09:10 PM
export-diff-balance.tar.gz.02 (449 KB) - Cliff Pajaro, 09/20/2018 09:10 PM
Actions #1

Updated by Jason Dillaman over 5 years ago

  • Description updated (diff)
Actions #2

Updated by Patrick McLean over 5 years ago

I have attached logs of the export-diffs run with "--debug-rbd=20" and "--debug-rados=20"
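
For reference, a run like the attached ones could be invoked along these lines (image, snapshot, and log path are placeholders):

rbd export-diff --from-snap snap1 rbd/vol1@snap2 vol1.diff \
    --debug-rbd=20 --debug-rados=20 --log-file=/tmp/export-diff.log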

Actions #3

Updated by Jason Dillaman over 5 years ago

  • Status changed from New to Need More Info

@Patrick: were the resulting exports different between run 3b and 3c? The logs indicate that they read the same data from the source image. Also, are you using krbd or librbd as the client?

Actions #4

Updated by Patrick McLean over 5 years ago

The exports between 3b and 3c were identical.
All the clients that are mounting the filesystems are currently using krbd. The exports are being done with "rbd export-diff" on the command line from a different machine (a large amount of time later), which would be using krbd as the client.

Doing further investigation last night, we discovered that setting

rbd_balance_snap_reads = false

(it was true before) seems to stop export-diff from creating corrupted exports. We are thinking we should also set

rbd_balance_parent_reads = false

I am attaching a log of the export-diff with

rbd_balance_snap_reads = false
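
For anyone following along, a sketch of the corresponding ceph.conf change on the librbd client hosts (our actual deployment mechanism may differ):

[client]
rbd balance snap reads = false
rbd balance parent reads = false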

Actions #5

Updated by Jason Dillaman over 5 years ago

@Patrick: Interesting find. If it truly is related to just that option, we will have to get a RADOS core team member to look at this since the only delta is that the read request is sent to a random OSD instead of the primary for the PG. That could indicate either (1) you have a data inconsistency between the OSDs that a deep-scrub should detect or (2) there is a very odd bug in read handling against a secondary.

Assuming the issue is still at the same LBA, the new log doesn't show any difference in the diff stats. Is there any chance you could re-run the "export-diff" adding "--debug-objecter=20"? It would be great to get a run where "rbd_balance_snap_reads = false" (and therefore the export is not corrupt) and one where "rbd_balance_snap_reads = true" and the export is corrupt.
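
The two runs could look roughly like this, toggling the option per invocation (names and paths are placeholders; this assumes the option can be overridden as a command-line argument like other Ceph config options):

# expected-good run: balanced snap reads disabled
rbd export-diff --from-snap snap1 rbd/vol1@snap2 vol1_nobalance.diff \
    --rbd-balance-snap-reads=false \
    --debug-rbd=20 --debug-rados=20 --debug-objecter=20 \
    --log-file=/tmp/export-diff_nobalance.log

# run that reproduces the corrupt export: balanced snap reads enabled
rbd export-diff --from-snap snap1 rbd/vol1@snap2 vol1_balance.diff \
    --rbd-balance-snap-reads=true \
    --debug-rbd=20 --debug-rados=20 --debug-objecter=20 \
    --log-file=/tmp/export-diff_balance.log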

Actions #6

Updated by Cliff Pajaro over 5 years ago

@Jason Dillaman
For the logs sent previously, I performed export-diff between snapshot1 and snapshot2. When I did an rbd export on snapshot1 with read balance set to true, I got the following error:

2018-09-17 22:36:20.286978 7f6d64ff9700 -1 librbd::io::ObjectRequest: 0x7f6d5409a650 handle_read_object: failed to read from object: (5) Input/output error
rbd: error reading from source image at offset 142606336: (5) Input/output error
2018-09-17 22:36:20.290172 7f6d5ffff700 -1 librbd::io::ObjectRequest: 0x7f6d54001630 handle_read_cache: failed to read from cache: (5) Input/output error
Exporting image: 0% complete...failed.
rbd: export error: (5) Input/output error

I identified the PG that byte 142606336 corresponds to and performed a deep-scrub on it. Now I don't get an export error or have discrepancies with export-diff between snapshot1 and snapshot2 with read balance set to true.
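
For reference, the offset-to-PG mapping can be done roughly like this (the 4 MiB default object size and the object-name prefix are assumptions; the real values come from 'rbd info'):

# 142606336 / 4194304 = 34 = 0x22, so the byte lives in object
# rbd_data.<prefix>.0000000000000022 (prefix from the image's block_name_prefix)
ceph osd map rbd rbd_data.<prefix>.0000000000000022   # prints the PG id
ceph pg deep-scrub <pgid>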

I still get a discrepancy with export-diff between the start of the RBD and snapshot1. I have attached the logs for that with the debug option you specified and for both read balance set to true and false.

Actions #7

Updated by Cliff Pajaro over 5 years ago

I did a deep analysis of the export-diff files created with read-balance set to true and false. When read-balance is true, even though the export-diff command is configured to be between snapshot0 and snapshot1, I see data from a future snapshot (e.g. snapshot3). It seems like a deep scrub fixes this issue.

Actions #8

Updated by Jason Dillaman over 5 years ago

@Cliff: are you saying the deep-scrub fixed the corruption between snapshot0 and snapshot1 -- or are you saying that you are still seeing data from snapshot3?

Actions #9

Updated by Cliff Pajaro over 5 years ago

@Jason Dillaman: If rbd_balance_snap_reads is enabled, deep-scrub fixes the issue.

Actions #10

Updated by Jason Dillaman over 5 years ago

  • Project changed from rbd to RADOS
  • Component(RADOS) OSD added

Moving to OSD team in case they have any follow-up -- but it just sounds like your OSDs were inconsistent for some reason. The other question is whether or not we should internally disable balanced and localized reads when a cache tier is in use (since it's two lightly tested features being used in combination).
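
For what it's worth, a sketch of how such an inconsistency could be surfaced from the RADOS side (pool name and PG id are placeholders):

ceph health detail                                        # flags inconsistent PGs
rados list-inconsistent-pg rbd                            # inconsistent PGs in the pool
rados list-inconsistent-obj <pgid> --format=json-pretty   # per-object details
ceph pg deep-scrub <pgid>                                 # or 'ceph pg repair <pgid>'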

Actions #11

Updated by Cliff Pajaro over 5 years ago

Ultimately we have disabled the rbd_balance_snap_reads feature.

To help the developers troubleshoot the problem, here is a summary of what we've found (a reproduction sketch follows below):
- Enable the feature and work on an RBD with deep-scrub problems.
- librbd sees data corruption (rbd-nbd map, rbd export, rbd export-diff); the issue is not seen with krbd (rbd map).
- Data appears to be taken from incorrect snapshots (e.g. snapshots from the future).
- A deep scrub fixes the problem.
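
A reproduction sketch of the librbd-vs-krbd comparison (placeholder names; the device path comes from the 'rbd map' output):

# librbd read path (affected when rbd_balance_snap_reads = true)
rbd export rbd/vol1@snap1 - | sha256sum

# krbd read path (not affected; krbd ignores rbd_balance_snap_reads)
DEV=$(rbd map rbd/vol1@snap1)
dd if=$DEV bs=4M status=none | sha256sum
rbd unmap $DEV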

Actions #12

Updated by Cliff Pajaro over 5 years ago

Jason Dillaman wrote:

The other question is whether or not we should internally disable balanced and localized reads when a cache tier is in use (since it's two lightly tested features being used in combination).

You mention cache tiering, but that is not something we use.

Actions #13

Updated by Neha Ojha over 5 years ago

Should disable balanced and localized reads when a cache tier is in use.

Actions #14

Updated by Mykola Golub over 5 years ago

Cliff Pajaro wrote:

the issue is not seen with krbd (rbd map)

rbd_balance_snap_reads config option is only for librbd clients, krbd does not use it.

Actions #15

Updated by Jason Dillaman over 5 years ago

Cliff Pajaro wrote:

[...]
You mention cache tiering but that is not something we use.

Sorry, I mixed this one up with another case. This one just sounds like one or more OSDs got out-of-sync with snapshot metadata somehow and the balanced read is what exposed it.

Actions #16

Updated by Josh Durgin over 4 years ago

This sounds like it may be due to balanced reads not behaving properly for the diff rados op, since that op operates not on a snapshot but on SNAP_DIR, which is not immutable and could therefore be out of sync on a replica.

Jason, can librbd avoid balancing reads for the diff operations?
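
For illustration, the per-object snapshot metadata the diff calculation depends on can be inspected directly (object name is a placeholder built from the image's block_name_prefix); this is the state that could differ between primary and replica if the snap set is stale:

rados -p rbd listsnaps rbd_data.<prefix>.0000000000000022
# lists each clone, the snapshots it belongs to, and its overlap with head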

Actions #17

Updated by Jason Dillaman over 4 years ago

@Josh Durgin: AFAIK, the diff calculations do not set the LOCALIZED/BALANCED read flags. Those are only (optionally) set on the IO read path.

Actions #18

Updated by Josh Durgin over 3 years ago

Looking at this again, it seems like a potential bug when reading from replicas and encountering an EIO - this should not have been transmitted back to the client.
