Bug #47044
PG::_delete_some isn't optimal iterating objects
Description
On every batch, PG::_delete_some (https://github.com/ceph/ceph/blob/v14.2.11/src/osd/PG.cc#L7132) does not pass the "next" object returned by the previous collection_list call back in as the start of the next one, so each call to _collection_list (https://github.com/ceph/ceph/blob/v14.2.11/src/os/bluestore/BlueStore.cc#L9822) iterates from the root of the PG up to the first remaining object, wasting resources stepping over objects that are already deleted.
The log shows this on every iteration:
2020-08-14 12:49:33.825 7feecd488700 15 bluestore(/var/lib/ceph/osd/ceph-672) collection_list 17.236_head start GHMIN end GHMAX max 64
2020-08-14 12:49:33.825 7feecd488700 20 bluestore(/var/lib/ceph/osd/ceph-672) _collection_list range 0x7f7fffffffffffffed6c400000 to 0x7f7fffffffffffffed6c600000 and 0x7f80000000000000116c400000 to 0x7f80000000000000116c600000 start GHMIN
2020-08-14 12:49:33.825 7feecd488700 20 bluestore(/var/lib/ceph/osd/ceph-672) _collection_list pend 0x7f7fffffffffffffed6c600000
2020-08-14 12:49:33.825 7feecd488700 20 bluestore(/var/lib/ceph/osd/ceph-672) _collection_list key 0x7f7fffffffffffffff20002dfb216f73'dmap.163033!='0x0000000000000000ffffffffffffffff'o' >= GHMAX
2020-08-14 12:49:33.825 7feecd488700 20 bluestore(/var/lib/ceph/osd/ceph-672) _collection_list oid #17:6c400000::::head# end GHMAX
That is, #17:6c400000::::head# is the root of the PG, and every iteration starts over from this point.
We have about 1M objects per PG, so deleting a PG from an OSD after backfill takes a long time and burns CPU and disk activity iterating over objects that were already removed in previous steps. The sketch below illustrates the cost.
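To make the cost concrete, here is a small self-contained C++ model. It is not Ceph code: Store, list, and drain are invented names for illustration. A sorted map stands in for the PG's object keys, with deleted objects kept as tombstones the way RocksDB keeps them until compaction, and it compares restarting every batch from GHMIN (the current behaviour) against resuming from the returned "next" cursor.

#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct Store {
  // key -> alive? `false` models a RocksDB tombstone not yet compacted away.
  std::map<std::string, bool> keys;
  long steps = 0;  // total iterator positions visited; our cost metric

  // List up to `max` live keys at or after `start`; *next = resume point.
  std::vector<std::string> list(const std::string& start, size_t max,
                                std::string* next) {
    std::vector<std::string> out;
    auto it = keys.lower_bound(start);
    for (; it != keys.end() && out.size() < max; ++it) {
      ++steps;                                  // tombstones cost a step too
      if (it->second) out.push_back(it->first);
    }
    *next = (it == keys.end()) ? "\x7f" : it->first;  // "\x7f" ~ GHMAX
    return out;
  }
};

// Drain the store in batches of 64 (the "max 64" seen in the log above),
// either restarting at GHMIN each batch (the bug) or resuming at `next`.
static long drain(bool resume_from_next) {
  Store s;
  for (int i = 0; i < 2000; ++i) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "obj.%06d", i);
    s.keys.emplace(buf, true);
  }
  std::string cursor;  // "" ~ GHMIN
  while (true) {
    std::string next;
    auto batch = s.list(cursor, 64, &next);
    if (batch.empty()) break;
    for (const auto& k : batch) s.keys[k] = false;  // delete -> tombstone
    cursor = resume_from_next ? next : std::string();
  }
  return s.steps;
}

int main() {
  std::printf("restart from GHMIN: %ld iterator steps\n", drain(false));
  std::printf("resume from next:   %ld iterator steps\n", drain(true));
}

With 2000 objects and batches of 64 the restart strategy already does roughly 16x the iterator work of the resuming one; at ~1M objects per PG the difference is what shows up as the CPU and disk activity described above.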
Related issues
- Related to Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Duplicated by Bug #47174: [BlueStore] Pool/PG deletion(space reclamation) is very slow
- Copied to Backport #48480: octopus: PG::_delete_some isn't optimal iterating objects
- Copied to Backport #48481: mimic: PG::_delete_some isn't optimal iterating objects
- Copied to Backport #48482: nautilus: PG::_delete_some isn't optimal iterating objects
History
#1 Updated by Serg D over 3 years ago
For example, ceph-objectstore-tool uses OSD::recursive_remove_collection (https://github.com/ceph/ceph/blob/v14.2.11/src/osd/OSD.cc#L4387), which correctly tracks the start position across collection_list calls.
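For comparison, that loop has roughly the following shape. This is a paraphrased sketch of the v14.2.11 function linked above, with SnapMapper bookkeeping and error handling omitted, not a verbatim excerpt:

// Paraphrased sketch of the cursor-tracking pattern in
// OSD::recursive_remove_collection (v14.2.11); store, ch, tmp and max
// come from the surrounding function.
ghobject_t next;                       // starts at GHMIN
vector<ghobject_t> objects;
while (true) {
  objects.clear();
  store->collection_list(ch, next,     // resume where the last batch ended
                         ghobject_t::get_max(), max,
                         &objects, &next);
  if (objects.empty())
    break;
  ObjectStore::Transaction t;
  for (auto& p : objects)
    t.remove(tmp, p);                  // queue removals for this batch
  store->queue_transaction(ch, std::move(t));
}

The important detail is that next is declared outside the loop, so each collection_list call resumes where the previous batch stopped instead of rescanning from GHMIN.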
#2 Updated by Neha Ojha over 3 years ago
- Related to Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion added
#3 Updated by Neha Ojha over 3 years ago
- Priority changed from Normal to Urgent
#4 Updated by Igor Fedotov over 3 years ago
- Duplicated by Bug #47174: [BlueStore] Pool/PG deletion(space reclamation) is very slow added
#5 Updated by Igor Fedotov over 3 years ago
- Status changed from New to In Progress
- Assignee set to Igor Fedotov
- Pull request ID set to 37314
#6 Updated by Dan van der Ster over 3 years ago
Can confirm this makes replacing hardware for an S3 cluster quite intrusive: the block-db devices get overloaded (100% I/O utilization) whenever a PG needs to be deleted.
We worked around it with osd_delete_sleep = 20.
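For anyone hitting this before the fix lands: the throttle Dan mentions is the osd_delete_sleep config option (recent releases also have device-specific variants such as osd_delete_sleep_hdd and osd_delete_sleep_ssd), settable at runtime with, e.g., "ceph config set osd osd_delete_sleep 20".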
#7 Updated by Kefu Chai over 3 years ago
- Status changed from In Progress to Resolved
#8 Updated by Igor Fedotov over 3 years ago
- Status changed from Resolved to Pending Backport
- Backport set to octopus, nautilus, mimic
#9 Updated by Igor Fedotov over 3 years ago
- Copied to Backport #48480: octopus: PG::_delete_some isn't optimal iterating objects added
#10 Updated by Igor Fedotov over 3 years ago
- Copied to Backport #48481: mimic: PG::_delete_some isn't optimal iterating objects added
#11 Updated by Igor Fedotov over 3 years ago
- Copied to Backport #48482: nautilus: PG::_delete_some isn't optimal iterating objects added
#12 Updated by Nathan Cutler about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".