Bug #22467
closed
osd boot has stuck for 10min because of clear_temp_object
Description
When rebooting a host, the OSD runs through osd::init => clear_temp_objects => collection_list => _collection_list => ( it = db->get_iterator(PREFIX_OBJ) ; it->lower_bound(temp_start_key) ) for each PG. Each it->lower_bound(temp_start_key) call takes about 7 s, so if an OSD has 100 PGs, clear_temp_objects costs more than 10 min in total.
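The estimate in the description can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and it assumes one lower_bound() seek per PG, executed serially, with the roughly 7 s per seek reported above.

```python
def clear_temp_cost_seconds(num_pgs: int, seek_seconds: float) -> float:
    """Estimate startup cost assuming one serial lower_bound() seek per PG."""
    return num_pgs * seek_seconds


# Figures from the description: 100 PGs per OSD, about 7 s per seek.
total = clear_temp_cost_seconds(100, 7.0)
print(f"{total:.0f} s = {total / 60:.1f} min")
```

Under the same assumptions the cost grows linearly with the PG count, so an OSD with many more PGs would see a proportionally longer startup.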
Updated by jinxiang cheng over 6 years ago
I always hit this problem: after restarting the Ceph cluster, the OSD, which has 15000 PGs, takes almost 25 min to reach the up state. After tracing the OSD log, I got this: osd::init => clear_temp_objects => collection_list => _collection_list, just like yours. Have you found a solution to this problem?
Updated by Josh Durgin over 6 years ago
- Project changed from RADOS to bluestore
This is bluestore, right? It sounds like you've got too large/slow a rocksdb - you want that metadata on an ssd.
Updated by jinxiang cheng over 6 years ago
Josh Durgin wrote:
This is bluestore, right? It sounds like you've got too large/slow a rocksdb - you want that metadata on an ssd.
Filestore. And why are these temp objects cleared during the OSD initialization process? That should be optimized when there is a large number of PGs on one OSD.
Updated by Sage Weil over 6 years ago
- Status changed from New to Can't reproduce
25,000 is way too many PGs for one OSD. I suspect the problem is that the cache for leveldb or rocksdb is way too small to accommodate that many PGs. Try increasing leveldb_cache_size or rocksdb_cache_size to be 10x bigger (the defaults are only 128MB or 256MB or similar).
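As a rough illustration of the suggestion above, the following sketch prints ceph.conf-style lines with the cache sizes scaled by 10x. The starting values of 128 MiB and 256 MiB are assumptions taken from the "128MB or 256MB or similar" remark, not confirmed defaults. Placing the options under an [osd] section and the helper name are also assumptions, so check the actual defaults and units for your release before applying anything.

```python
MIB = 1024 * 1024


def scaled_cache_conf(option: str, current_mib: int, factor: int = 10) -> str:
    """Return a ceph.conf-style line with the cache size scaled by factor, in bytes."""
    return f"{option} = {current_mib * MIB * factor}"


# Assumed starting points (not verified defaults): 128 MiB and 256 MiB.
print("[osd]")
print(scaled_cache_conf("leveldb_cache_size", 128))
print(scaled_cache_conf("rocksdb_cache_size", 256))
```

Only the option matching the key-value backend actually in use on the OSD would be relevant.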