Bug #16028

File >100GB crash OSDs(?)

Added by Georg Stergiou about 6 years ago. Updated almost 6 years ago.

Won't Fix
2 - major

Hi all,

I encountered some strange behaviour this morning which I think is a bug. I could not find anything similar here, so I am posting my experience.

I use Ceph to store my VM backups but also mail server filesystems and so on. All servers are Debian (7.10), Ceph is v0.94.7. The VMs I store (rados -p VMs put ...) are usually about 40-60GB each. The VM backup pool is configured with size 2 (min_size 1). Ceph is running on 4 hosts (3 hosts have 4 OSDs each, 1 host has 8 OSDs). My general RBD pool is configured with size 3, min_size 2.

Ceph was installed in the standard way using ceph-deploy, and the configuration is quite vanilla.

Last night I uploaded an image of about 104GB (the first file over 100GB). In the morning I noticed that Ceph was going crazy. One OSD was marked down but all my RBDs were blocking, which I thought could not happen with pool size 3 (min_size 2). I thought one disk had failed, so I waited for the rebalance.
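
The expectation in the paragraph above rests on the rule that a placement group keeps serving I/O as long as at least min_size of its replicas are available. A minimal Python sketch of that check follows. The function name `pg_accepts_io` is made up for illustration and is not a Ceph API, and the sketch ignores peering, recovery and slow or blocked requests, any of which can still stall clients.

```python
def pg_accepts_io(replicas_up: int, min_size: int) -> bool:
    """Return True if a PG with this many available replicas may serve I/O.

    Simplified model only: a real PG must also have finished peering.
    """
    return replicas_up >= min_size


# RBD pool from the report: size 3, min_size 2, one OSD down -> 2 replicas left.
print(pg_accepts_io(replicas_up=3 - 1, min_size=2))  # True
# Backup pool: size 2, min_size 1, one OSD down -> 1 replica left.
print(pg_accepts_io(replicas_up=2 - 1, min_size=1))  # True
```

Under this simplified model, a single down OSD should not block either pool, which matches the reporter's expectation.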

While rebalancing (after maybe 10-15 minutes), I noticed that a second OSD was suddenly flagged as down, but the first OSD was available again. The logs showed that the OSDs reconnected by themselves because the disks were working perfectly ("log_channel(cluster) log [WRN] : map e8682 wrongly marked me down").

From then on, one of those two OSDs kept getting marked "down" while the other came back, in an endless loop every 10-15 minutes. Between those cycles the Ceph cluster worked correctly for about a minute before an OSD got dropped again.

I set both OSDs to "down" (as I still thought broken disks were the cause) and everything synced fine. But there is one stuck PG that contains the >100GB file, and it is located on exactly those two OSDs that caused the trouble. In my eyes that is no coincidence.

All OSDs are mounted as follows:

  Host ceph01:
    /dev/xvdf1 on /var/lib/ceph/osd/ceph-13 type xfs (rw,noatime,attr2,delaylog,inode64,noquota)
  Host ceph02:
    /dev/xvdc1 on /var/lib/ceph/osd/ceph-21 type xfs (rw,noatime,attr2,delaylog,inode64,noquota)

I have attached the pg query; maybe it helps.

Could it be that an XFS file size limitation or xattr limitation causes the OSDs to be wrongly marked down?

Thanks for any feedback.

pg_query.txt - ceph pg query 10.31 (stuck active+remapped) (32.6 KB) Georg Stergiou, 05/25/2016 04:57 PM


#1 Updated by Abhishek Lekshmanan almost 6 years ago

  • Project changed from rgw to Ceph

#2 Updated by Samuel Just almost 6 years ago

  • Status changed from New to Won't Fix

Yeah, that's not gonna work. RADOS-level objects are supposed to be of bounded size (think 4 MB). You want to be using RGW's S3 interface to upload the objects (it breaks large images down into smaller pieces, 4 MB by default, behind the scenes).
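
The splitting described above can be sketched in Python as follows. This is only an illustration of the idea, not how RGW implements it: the helper `put_in_chunks`, the `<name>.<index>` naming scheme and the `put` callback are all assumptions made for this example. With the python-rados bindings, `put` could be something like `ioctx.write_full`.

```python
import io
from typing import BinaryIO, Callable

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, the piece size mentioned above


def put_in_chunks(stream: BinaryIO, name: str,
                  put: Callable[[str, bytes], None],
                  chunk_size: int = CHUNK_SIZE) -> int:
    """Store `stream` as a series of bounded-size objects; return the count.

    `put(object_name, data)` is a caller-supplied writer (hypothetical here).
    Pieces are named "<name>.<16-digit index>" so they sort in upload order.
    """
    count = 0
    while True:
        data = stream.read(chunk_size)
        if not data:  # end of input
            break
        put(f"{name}.{count:016d}", data)
        count += 1
    return count


# Demo with an in-memory dict as the store and a tiny chunk size.
store = {}
n = put_in_chunks(io.BytesIO(b"abcdefghij"), "vm-backup", store.__setitem__,
                  chunk_size=4)
print(n, sorted(store))
```

At 4 MiB per piece, a 104 GiB image would become 104 × 1024 / 4 = 26,624 objects (assuming binary units), each well within the bounded size RADOS expects.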
