Bug #8453
closed
OSD crash "error (28) No space left on device not handled on operation 10 (31417988.0.5, or op 5, counting from 0)" but disk space is available
Description
Hello,
Tonight one of the OSDs crashed with "error (28) No space left on device not handled on operation 10 (31417988.0.5, or op 5, counting from 0)", but according to monitoring the disk did not run out of space.
After a manual start, the OSD still crashes with the same error (some integers differ) during journal replay. There is more than 100 GB of free disk space, and inode usage is about 2%.
I am attaching 2 log files:
ceph-osd.12.log.1.gz - original log with crash (at ~2:08)
ceph-osd.12.log.gz - log with "debug filestore = 15" in ceph.conf. Truncated before OSD launch.
I am using Ceph 0.80.1 (Ubuntu 14.04, from the proposed repo). The OSD was deployed with ceph-deploy and was running with the "filestore max sync interval = 18" setting.
Updated by Greg Farnum almost 10 years ago
- Status changed from New to Rejected
- Source changed from other to Community (user)
The OSD is definitely getting ENOSPC back from the local filesystem. Many filesystems keep quite a lot of reserved space, which shows up as free until you try to use it (and they don't generally handle filling up gracefully anyway). You'll want to consult a configuration guide for btrfs, XFS, or ext4 to figure out how to handle it (if, indeed, it can be worked around).
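
For anyone hitting the same symptom, here is a rough sketch of one way to look at the gap between "free" and "actually usable" space on the OSD's filesystem. It only uses Python's standard `os.statvfs`; the function name `space_report` and the `SpaceReport` type are made up for illustration and are not part of Ceph. Note the limits: this shows only the block reserve that the filesystem reports through `statvfs` (for example the root-reserved blocks on ext4). It will not reveal other causes of ENOSPC, such as metadata exhaustion on btrfs, so treat a clean result as inconclusive.

```python
import os
from typing import NamedTuple


class SpaceReport(NamedTuple):
    """Hypothetical helper type, for illustration only."""
    free_bytes: int        # all free blocks, including reserved ones
    avail_bytes: int       # free blocks usable by unprivileged processes
    reserved_bytes: int    # difference between the two figures above
    inode_used_pct: float  # share of inodes in use (0.0 if not reported)


def space_report(path: str) -> SpaceReport:
    """Summarize free, available, and reserved space for the filesystem at path."""
    st = os.statvfs(path)
    free_bytes = st.f_bfree * st.f_frsize
    avail_bytes = st.f_bavail * st.f_frsize
    # Some filesystems report 0 total inodes; avoid dividing by zero there.
    if st.f_files:
        inode_used_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    else:
        inode_used_pct = 0.0
    return SpaceReport(free_bytes, avail_bytes,
                       free_bytes - avail_bytes, inode_used_pct)


if __name__ == "__main__":
    # "/" keeps the sketch runnable anywhere; point it at the OSD data
    # directory on a real node (the exact path is an assumption here).
    report = space_report("/")
    gib = 1024 ** 3
    print(f"free:      {report.free_bytes / gib:.1f} GiB")
    print(f"available: {report.avail_bytes / gib:.1f} GiB")
    print(f"reserved:  {report.reserved_bytes / gib:.1f} GiB")
    print(f"inodes used: {report.inode_used_pct:.1f}%")
```

If monitoring graphs the "free" figure rather than the "available" one, a large reserved value would be one possible explanation for the mismatch described in this ticket, though that is a guess and not something the logs here confirm.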