Bug #17545

closed

Data corruption using RBD with caching enabled

Added by Wido den Hollander over 7 years ago. Updated over 7 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: librbd
Target version: -
% Done: 0%
Source: other
Tags: rbd,corruption,windows,sqlserver,caching,writeback
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This was reported on Launchpad, but I think it is better suited to be reported here: https://bugs.launchpad.net/mos/+bug/1627775

When Windows 2012R2 runs on top of RBD with caching enabled, it reports page corruption.

We tested with both Firefly and Hammer: the problem only occurs on RBD-backed volumes with caching enabled. When the writeback cache is disabled, the problem does NOT occur.

The issue is not reproducible on LVM/file based storage.

Steps to reproduce: run SQL Server on Windows 2012R2, or run SQLioSim (a stress-test utility that emulates SQL Server I/O).

Expected results: no errors

Actual result:
Expected FileId: 0x0
Received FileId: 0x0
Expected PageId: 0xCB19C
Received PageId: 0xCB19A (does not match expected)
Received CheckSum: 0x9F444071
Calculated CheckSum: 0x89603EC9 (does not match expected)
Received Buffer Length: 0x2000
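To make the page ID mismatch above concrete, the short Python sketch below converts the reported page IDs into byte offsets within the data file. It assumes the standard 8 KiB SQL Server page size, which is consistent with the reported buffer length of 0x2000. The names are illustrative only, and the results are offsets inside the guest's data file, not RBD image or object offsets, because the NTFS layout in between is unknown.

```python
# Assumption: 8 KiB SQL Server pages, matching "Received Buffer Length: 0x2000".
PAGE_SIZE = 0x2000

# Values copied from the SQLioSim error output above.
expected_page = 0xCB19C
received_page = 0xCB19A


def page_offset(page_id: int, page_size: int = PAGE_SIZE) -> int:
    """Byte offset of a page within its data file (FileId 0x0 here)."""
    return page_id * page_size


expected_off = page_offset(expected_page)
received_off = page_offset(received_page)
delta_pages = expected_page - received_page
delta_bytes = expected_off - received_off

print(f"expected page {expected_page:#x} at file offset {expected_off:#x}")
print(f"received page {received_page:#x} at file offset {received_off:#x}")
print(f"offset delta: {delta_pages} pages = {delta_bytes} bytes")
# prints: offset delta: 2 pages = 16384 bytes
```

With these values, the buffer that came back carries the header of a page 2 pages (16 KiB) before the requested one. The arithmetic alone does not explain why; it only shows that the read returned data whose header points to a different location in the file.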

Reproducibility: consistently reproducible with SQLioSim

As mentioned, the current workaround is to disable RBD caching, but that severely degrades the system's performance.

The issue has been reproduced with OpenStack on Ubuntu 12.04 and 14.04, and also on Proxmox. This points towards an RBD issue rather than a QEMU issue.

We still have to test this with the Jewel client (librbd) on these systems, but so far Firefly and Hammer show the same result.


Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Backport #16546: hammer: ObjectCacher doesn't correctly handle read replies on split BufferHeads (Resolved, Alexey Sheplyakov)
#1

Updated by Wido den Hollander over 7 years ago

  • Release set to firefly
  • Release set to hammer
#2

Updated by Wido den Hollander over 7 years ago

It seems this has been fixed by #16002.

Tests have been running with that fix applied on a Hammer client, and after 24 hours we have not seen the issue return.

#3

Updated by Greg Farnum over 7 years ago

  • Is duplicate of Backport #16546: hammer: ObjectCacher doesn't correctly handle read replies on split BufferHeads added
#4

Updated by Greg Farnum over 7 years ago

  • Status changed from New to Duplicate