Bug #11898

closed

Writing on forced umount causes silent data loss

Added by Kevin Lamontagne almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: fs/ceph
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Writing to a CephFS bind mount (for example, inside an LXC container) whose parent mount has been unmounted or is broken emits the following kernel message on the client machine:

ceph: writepage_start ffff883exxxxxxxx on forced umount

The dmesg output quickly fills up with such messages. The written file has a seemingly correct size but contains only null bytes instead of the expected data. The writing process neither blocks nor reports an error. After writing a small file on this client machine, I see that other clients reading the file may block.
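To gauge how many writes were affected, the kernel log can be filtered for the message quoted above. The following is a minimal sketch, not part of the original report: the function name `count_forced_umount_writes` is hypothetical, and the pattern is taken only from the single message shown here, so other kernel versions may word it differently.

```python
import re
from collections import Counter

# Pattern derived from the one message quoted in this report (assumption:
# the page address is printed as hex, possibly redacted with 'x').
FORCED_UMOUNT_RE = re.compile(
    r"ceph: writepage_start (?P<page>[0-9a-fx]+) on forced umount"
)


def count_forced_umount_writes(lines):
    """Return (total matching lines, Counter keyed by page address)."""
    pages = Counter()
    for line in lines:
        match = FORCED_UMOUNT_RE.search(line)
        if match:
            pages[match.group("page")] += 1
    return sum(pages.values()), pages
```

The `lines` argument can be any iterable of strings, for example an open file holding saved dmesg output.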

Ceph: 0.87.1
Kernel: kernel.org 3.18.0 with Ubuntu deb package configuration

I've attached the logs for the 2 MDS daemons. The machine was booted at 2015-06-04 16:43. Data loss started when a container with a bind mount rebooted around 2015-06-04 ~18:28 and ended with a restart of all containers and a remount of the parent mount at 2015-06-05 ~11:21.

There's nothing out of the ordinary in mon and osd logs.

To be honest, I'm not entirely sure of the chain of events because I'm not able to replicate this. I'd be happy to provide more info.


Files

cephmds-0.log (42.4 KB), Kevin Lamontagne, 06/05/2015 04:21 PM
cephmds-1.log (418 KB), Kevin Lamontagne, 06/05/2015 04:21 PM
Actions #1

Updated by Greg Farnum almost 9 years ago

Is this in any way different from what happens against any other filesystem? This sounds more like something is bad in how forced umount and bind mounts interact in general (although it definitely could be us as well).

Actions #2

Updated by Zheng Yan almost 9 years ago

Why do you use 'force umount'?

Actions #3

Updated by Kevin Lamontagne almost 9 years ago

I don't unmount directly; it's what lxc-stop does.

I hope to have some time to create a repeatable test case soon. A quick test using a local ext4 filesystem, bind mounted and then force unmounted, shows that the directory structure becomes unavailable and files can't be opened. The "original" mount is untouched.

Meanwhile I found a related open Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613904#15

It wouldn't be too bad if the mount simply became unavailable, but I've seen a broken mount create new files and overwrite existing files, filling them with nulls, for a long period of time.
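Since the writer gets no error, one way to find damaged files afterwards is to scan for non-empty files that contain nothing but null bytes. This is a rough sketch under assumptions: the helper names `is_all_nulls` and `find_null_files` are made up for illustration, and a file that is legitimately all zeros would be reported as well.

```python
import os


def is_all_nulls(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is 0x00."""
    seen_data = False
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            # Any byte other than 0x00 means the file holds real data.
            if chunk.count(b"\x00") != len(chunk):
                return False
    return seen_data


def find_null_files(root):
    """Yield paths under root whose contents are entirely null bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and is_all_nulls(path):
                    yield path
            except OSError:
                # Unreadable files are skipped, not reported as damaged.
                continue
```

Because reads of affected files were seen to block on other clients, it may be safer to run such a scan on a healthy mount and under a timeout.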

Actions #5

Updated by Zheng Yan almost 9 years ago

  • Status changed from New to 7
Actions #6

Updated by Zheng Yan over 8 years ago

  • Status changed from 7 to Resolved