Bug #22796
closed
bluestore gets to ENOSPC with small devices
Added by David Turner over 6 years ago.
Updated about 5 years ago.
Description
I have a 3 node cluster with mon, mds, mgr, and osds all running on each node. The steps I've recently performed on my cluster all went well, until all 3 of my BlueStore SSD OSDs started crashing with the error in the subject.
I upgraded to 12.2.2 from 10.2.10.
Migrated my 9 HDD OSDs to bluestore (without flash media for rocksdb or WAL).
Configured my crush rules to specifically use class HDD.
Was unable to remove the previously required cache tier on top of an EC cephfs data pool, due to this issue: http://tracker.ceph.com/issues/22754
Created 3 new SSD OSDs with accompanying crush rules to use class SSD.
Updated the pools cephfs_metadata and cephfs_cache to use the replicated-ssd crush rule.
2 days after making this change, the 3 SSD OSDs all segfaulted at the same time and refused to come back up. I generated a `debug bluestore = 20` log for each of these OSDs, but don't know how you would like me to provide them, since they're 80 MB each.
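For reference, the device-class crush rules and pool changes described in the steps above correspond to commands along these lines (rule and pool names are taken from the description; the exact invocations used are an assumption):

    # Replicated crush rules restricted to a device class (Luminous device classes)
    ceph osd crush rule create-replicated replicated-hdd default host hdd
    ceph osd crush rule create-replicated replicated-ssd default host ssd

    # Point the SSD-backed pools at the SSD rule
    ceph osd pool set cephfs_metadata crush_rule replicated-ssd
    ceph osd pool set cephfs_cache crush_rule replicated-ssd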
Can you attach logs with a lower debug level? E.g. `debug bluestore = 5`.
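One way to produce such a log for an OSD that crashes at startup is to raise the debug level before starting the daemon; a sketch (osd.9 is taken from the log excerpts further down, paths and ids are otherwise placeholders):

    # Either set it in ceph.conf on the OSD host and restart the OSD...
    [osd]
        debug bluestore = 5/5

    # ...or pass it on the command line while running the daemon in the foreground
    ceph-osd -f -i 9 --debug_bluestore=5/5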
Here's a log with `debug bluestore 5`.
David Turner wrote:
Here's a log with `debug bluestore = 5`.
- Project changed from Ceph to bluestore
- Category deleted (OSD)
- Priority changed from Normal to High
Please use ceph-post-file to upload the full logs.
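ceph-post-file uploads a file somewhere the Ceph developers can reach and prints an id to paste into the tracker; for example (the log path assumes the default location):

    # Prints a "ceph-post-file: <uuid>" tag to share here
    ceph-post-file /var/log/ceph/ceph-osd.9.log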
debug bluestore = 20 log for the same OSD as before.
ceph-post-file: 06b467b7-4a91-4263-85e0-c89268b694e3
This might be a red herring. I think Nick Fisk on the ML found the problem. Originally the output of `ceph osd df` showed the OSDs as 45% full; now it shows them as completely full.
I was able to resolve this issue by using ceph-objectstore-tool to remove copies of PGs so the OSDs could start. It would be helpful if the crash at this point reported full OSDs instead of an unknown error.
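For anyone in the same spot: the workaround described here amounts to exporting and then removing PG copies from the stopped OSD with ceph-objectstore-tool, roughly as below. The pgid 2.7 is a placeholder, flags vary somewhat by version, and exporting first means the data can be re-imported later if needed:

    # With the OSD stopped, list the PGs it holds
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --op list-pgs

    # Export a copy of a PG, then remove it to free space
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --pgid 2.7 --op export --file /root/pg.2.7.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --pgid 2.7 --op remove   # some versions also require --force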
David Turner wrote:
I was able to resolve this issue by using ceph-objectstore-tool to remove copies of PGs so the OSDs could start. It would be helpful if the crash at this point reported full OSDs instead of an unknown error.
It does, "2018-01-25 06:05:56.325462 7f3803f9c700 -1 bluestore(/var/lib/ceph/osd/ceph-9) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 0, counting from 0)"
- Subject changed from BlueStore.cc: 9363: FAILED assert(0 == "unexpected error") to BlueStore.cc: 9363: FAILED assert(0 == "unexpected error") (ENOSPC)
- Status changed from New to Need More Info
-147> 2018-01-25 05:36:54.471301 7fd8eb27e700 5 osd.9 14828 heartbeat: osd_stat(22560 MB used, 16383 PB avail, 22312 MB total, peers [] op hist [])
OK, clearly 16 PB free isn't right. Is 22 GB total for the OSD correct, though?
- Related to Bug #23040: bluestore: statfs available can go negative added
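For what it's worth, 16383 PB is essentially 2^64 bytes expressed in binary petabytes, which is what a small negative "available" value stored in an unsigned 64-bit field would display as (the statfs underflow tracked in #23040); note the same heartbeat line also shows used (22560 MB) slightly exceeding total (22312 MB). A quick arithmetic check, not a Ceph command:

    # 2^64 bytes expressed in PiB
    python3 -c 'print(2**64 / 2**50)'    # -> 16384.0; a tiny negative value wrapped into a uint64 shows as just under 16384 PB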
Yes, the 22 GB is correct; the 16 PB is not. I created a quick set of SSD OSDs, on the SSDs that had previously served as filestore journals, to test the new crush rules. The cephfs_cache pool hadn't used even 5 GB at a time in the previous few months, but overnight it filled up completely when I put it on the small SSD crush rule.
I'm a little confused that the OSDs were able to fill up to 100%. I'm using default ratio settings.
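For reference, in Luminous these ratios live in the OSD map (defaults: nearfull 0.85, backfillfull 0.90, full 0.95) and can be inspected and adjusted as below, though any ratio check is only as good as the statfs numbers feeding it:

    # The ratios are recorded in the OSD map
    ceph osd dump | grep ratio

    # They can be changed at runtime if necessary
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-full-ratio 0.95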
- Subject changed from BlueStore.cc: 9363: FAILED assert(0 == "unexpected error") (ENOSPC) to bluestore gets to ENOSPC with small devices
- Status changed from Need More Info to 12
The full checks rely on a (slow) feedback loop. For small devices, it's easy to go faster than the "set the full flag" operation. This could be improved!
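For readers hitting this: the loop in question is roughly that the OSD reports its usage to the mons, the mons compare it against the full ratio and mark the OSD full in the OSD map, and only then are writes refused, so on a very small OSD client writes can outrun that round trip. There is also an OSD-local failsafe (osd_failsafe_full_ratio, default 0.97) that can be lowered as a stopgap, though it relies on the OSD's own statfs numbers, which were wrong in this case. A sketch:

    # OSD-local stopgap: have the OSD refuse writes at a lower local usage,
    # without waiting for the mon to mark it full (default is 0.97)
    ceph tell 'osd.*' injectargs '--osd-failsafe-full-ratio 0.9'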
- Priority changed from High to Normal
- Status changed from 12 to Resolved
With Igor's recent changes we no longer rely on the slow feedback loop, so I think we can close this now.