File >100GB crashes OSDs(?)
This morning I encountered some strange behaviour which I think is a bug. I could not find anything similar here, so I am posting my experience.
I use Ceph to store my VM backups, but also mail server filesystems and so on. All servers run Debian (7.10), and Ceph is v0.94.7. The VM images I store (rados -p VMs put ...) are usually about 40-60GB each. The VM backup pool is configured with size 2 (min_size 1). Ceph runs on 4 hosts (3 hosts with 4 OSDs each, 1 host with 8 OSDs). My general RBD pool is configured with size 3 (min_size 2).
Ceph was installed the standard way using ceph-deploy, and the configuration is fairly vanilla.
Last night I uploaded an image of about 104GB (the first file over 100GB). In the morning I noticed that Ceph was going crazy. One OSD was marked down, yet all my RBDs were blocking, which I thought could not happen with pool size 3 (min_size 2). I assumed a disk had failed, so I waited for the rebalance.
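The expectation above rests on how min_size works. As a simplified model, and assuming nothing else is wrong with the PG, a placement group keeps serving client I/O only while at least min_size replicas are available. The function name `pg_accepts_io` is hypothetical and only illustrates that rule for the two pools in this report.

```python
def pg_accepts_io(available_replicas: int, min_size: int) -> bool:
    """Simplified model (hypothetical helper): a PG serves client I/O only
    while at least min_size replicas are available. It ignores peering and
    the window before an unresponsive OSD is actually marked down."""
    return available_replicas >= min_size


# RBD pool: size 3, min_size 2
print(pg_accepts_io(available_replicas=2, min_size=2))  # one OSD lost
print(pg_accepts_io(available_replicas=1, min_size=2))  # two OSDs lost

# VM backup pool: size 2, min_size 1
print(pg_accepts_io(available_replicas=1, min_size=1))  # one OSD lost
print(pg_accepts_io(available_replicas=0, min_size=1))  # both OSDs lost
```

Under this model, losing a single OSD should not block the RBD pool. One thing the model leaves out is that requests already sent to an OSD that hangs but has not yet been marked down can stall until the cluster marks it down.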
While it was rebalancing (after maybe 10-15 minutes), I noticed that a second OSD was suddenly flagged as down, but the first OSD was up again. The logs showed that the OSDs had rejoined on their own, as the disks were working perfectly ("log_channel(cluster) log [WRN] : map e8682 wrongly marked me down").
From then on, the two OSDs kept being marked "down" in turn: one went down while the other came back, in an endless loop every 10-15 minutes. Between those cycles the cluster worked correctly for about a minute before the OSDs were dropped again.
I set both OSDs to "down" (as I still thought broken disks were the cause) and everything synced fine. But there is one stuck PG which contains that >100GB file, and it is located exactly on the two OSDs that caused the trouble. So that is no coincidence in my eyes.
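Why a single file ends up on exactly two OSDs: RADOS places an object by hashing its name into a PG, and CRUSH then maps that PG to a fixed set of OSDs (two of them in a size-2 pool). The object's size plays no part. The sketch below illustrates only the name-to-PG step. It assumes `zlib.crc32` and plain modulo as stand-ins for Ceph's actual rjenkins hash and `ceph_stable_mod`, and the pg_num and object name are hypothetical.

```python
import zlib


def object_to_pg(object_name: str, pg_num: int) -> int:
    """Simplified name-to-PG mapping.

    ASSUMPTION: crc32 stands in for Ceph's rjenkins hash and plain modulo
    for ceph_stable_mod. Only the name is hashed; the size is irrelevant,
    so a 104GB object still lands in exactly one PG.
    """
    return zlib.crc32(object_name.encode("utf-8")) % pg_num


PG_NUM = 64  # hypothetical pg_num for the VMs pool
print(object_to_pg("backup-104G.img", PG_NUM))
```

The same name always gives the same PG, so every byte of the upload is written to, and recovered from, that one PG's OSDs.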
All OSDs are mounted as follows:
- Host ceph01
/dev/xvdf1 on /var/lib/ceph/osd/ceph-13 type xfs (rw,noatime,attr2,delaylog,inode64,noquota)
- Host ceph02
/dev/xvdc1 on /var/lib/ceph/osd/ceph-21 type xfs (rw,noatime,attr2,delaylog,inode64,noquota)
I have attached the pg query; maybe it helps.
Could an XFS file size limit or xattr limit be causing the OSDs to be marked down by mistake?
Thanks for any feedback.
#2 Updated by Samuel Just almost 6 years ago
- Status changed from New to Won't Fix
Yeah, that's not gonna work. RADOS-level objects are supposed to be of bounded size (think 4 MB). You want to be using RGW's S3 interface to upload the objects (it breaks large images down into smaller pieces, 4 MB by default, behind the scenes).
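To give a sense of scale for the reply: splitting the upload into 4 MB pieces turns one huge object into tens of thousands of small ones. The sketch below reads a stream incrementally and yields fixed-size pieces. The helpers `iter_chunks` and `chunk_count` and the `<name>.<index>` naming scheme are hypothetical illustrations and do not reflect RGW's actual internal layout.

```python
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, the default piece size mentioned above


def iter_chunks(name: str, stream: BinaryIO,
                chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (piece_name, piece_bytes) pairs, reading the stream
    chunk_size bytes at a time so the whole file is never held in memory.
    The '<name>.<index>' naming scheme is hypothetical."""
    index = 0
    while True:
        piece = stream.read(chunk_size)
        if not piece:
            break
        yield f"{name}.{index:08d}", piece
        index += 1


def chunk_count(total_bytes: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of pieces needed (ceiling division)."""
    return -(-total_bytes // chunk_size)


# Treating the 104GB image as 104 GiB:
print(chunk_count(104 * 1024**3))  # 26624 pieces of 4 MiB
```

Because each piece has its own name, the pieces hash to many different PGs and spread across the cluster instead of concentrating on one pair of OSDs. For a real file, open it in binary mode (`open(path, "rb")`) and pass the file object as `stream`.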