Bug #11898
Writing on forced umount causes silent data loss
Status: closed
Description
Writing to a CephFS bind mount (for example, inside an LXC container) whose parent mount has been unmounted or is broken emits the following kernel message on the client machine:
ceph: writepage_start ffff883exxxxxxxx on forced umount
The dmesg output quickly fills up with such messages. The file appears to have the correct size but contains only null bytes instead of the expected data. The writing process neither blocks nor reports an error. After writing a small file on this client machine, I see that other clients reading the file may block.
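To check whether a given file was affected, something like the following Python sketch could be used. It only tests the symptom described above (a non-empty file consisting entirely of null bytes); the function name `is_all_nulls` and the chunk size are my own choices for illustration, not part of any Ceph tool. Note that a file that is legitimately zero-filled would also match, and that reading an affected file from another client may block, as mentioned above.

```python
import os

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time (arbitrary choice)


def is_all_nulls(path, chunk_size=CHUNK_SIZE):
    """Return True if the file is non-empty and every byte is 0x00.

    Hypothetical helper: it detects the symptom from this report,
    it does not prove that the file was hit by this bug.
    """
    if os.path.getsize(path) == 0:
        # An empty file has the wrong size, not null content.
        return False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a non-null byte.
                return True
            if chunk.count(b"\0") != len(chunk):
                return False
```

For example, `is_all_nulls("/path/to/file")` could be run over the files written between the container reboot and the remount to list the ones that need to be restored.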
Ceph: 0.87.1
Kernel: kernel.org 3.18.0 with Ubuntu deb package configuration
I've attached the logs for the 2 MDS daemons. The machine was booted at 2015-06-04 16:43. Data loss started when a container with a bind mount rebooted at around 2015-06-04 18:28, and ended with a restart of all containers and a remount of the parent mount at around 2015-06-05 11:21.
There's nothing out of the ordinary in mon and osd logs.
To be honest, I'm not entirely sure of the chain of events because I have not been able to reproduce this. I'd be happy to provide more info.