Bug #54396
Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snaptrim queue
Status:
Closed
% Done:
0%
Backport:
octopus,pacific,quincy
Regression:
No
Severity:
3 - minor
Description
See https://www.spinics.net/lists/ceph-users/msg71061.html
This time around, after a few hours of snaptrimming, users complained of high IO latency, and indeed Ceph reported "slow ops" on a number of OSDs and on the active MDS. I attributed this to the snaptrimming and decided to throttle it, initially setting osd_pg_max_concurrent_snap_trims to 1, which didn't seem to help much. I then set it to 0, which had the surprising effect of transitioning all PGs back to active+clean (is this intended?). I also restarted the MDS, which seemed to be struggling. IO latency went back to normal immediately.
In the code, when osd_pg_max_concurrent_snap_trims is 0, PrimaryLogPG::AwaitAsyncWork::react(const DoSnapWork&) calls pg->snap_mapper.get_next_objects_to_trim asking for 0 objects to trim. In that case pg->snap_mapper.get_next_objects_to_trim returns -ENOENT, which DoSnapWork interprets as "nothing left to trim" and erases the remaining snap_to_trim from the queue.
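The faulty control flow can be sketched as a minimal standalone simulation (this is not Ceph's actual code; the function names mirror the real ones but the logic is heavily reduced, and the integer "objects" stand in for real object identifiers):

```cpp
#include <cassert>
#include <cerrno>
#include <set>
#include <vector>

// Simplified stand-in for SnapMapper::get_next_objects_to_trim():
// collects up to `max` objects mapped to the snap being trimmed and
// returns -ENOENT when it finds none -- which trivially happens when
// max == 0, even if untrimmed objects remain.
int get_next_objects_to_trim(const std::vector<int>& objects,
                             unsigned max, std::vector<int>* out) {
  for (unsigned i = 0; i < max && i < objects.size(); ++i)
    out->push_back(objects[i]);
  if (out->empty())
    return -ENOENT;  // reported as "nothing left to trim"
  return 0;
}

// Simplified AwaitAsyncWork::react(DoSnapWork): on -ENOENT the PG
// assumes the snap is fully trimmed and erases it from the trim queue.
void do_snap_work(const std::vector<int>& objects, unsigned max_concurrent,
                  std::set<int>* snap_trimq, int snap_to_trim) {
  std::vector<int> to_trim;
  int r = get_next_objects_to_trim(objects, max_concurrent, &to_trim);
  if (r == -ENOENT) {
    snap_trimq->erase(snap_to_trim);  // queue entry dropped prematurely
  }
  // (real code would now trim the objects in to_trim)
}
```

With max_concurrent set to 0, a snap with objects still to trim is erased from the queue without any trimming having been done, which matches the observed jump to active+clean and the SNAPTRIMQ_LEN of 0.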
Updated by Dan van der Ster about 2 years ago
- Status changed from New to Fix Under Review
- Assignee set to Dan van der Ster
- Pull request ID set to 45140
Updated by Dan van der Ster about 2 years ago
More context:
ceph pg dump reports a SNAPTRIMQ_LEN of 0 on all PGs. Did CephFS just leak a massive 12 TiB worth of objects...? It seems to me that the snaptrim operation did not complete at all.
Updated by Radoslaw Zarzynski about 2 years ago
- Related to Bug #52026: osd: pgs went back into snaptrim state after osd restart added
Updated by Radoslaw Zarzynski about 2 years ago
- Priority changed from Normal to High
Updated by Laura Flores about 2 years ago
- Status changed from Fix Under Review to Resolved
Updated by Neha Ojha about 2 years ago
- Status changed from Resolved to Pending Backport
Updated by Backport Bot about 2 years ago
- Copied to Backport #54466: pacific: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snaptrim queue added
Updated by Backport Bot about 2 years ago
- Copied to Backport #54467: quincy: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snaptrim queue added
Updated by Backport Bot about 2 years ago
- Copied to Backport #54468: octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snaptrim queue added
Updated by Neha Ojha almost 2 years ago
- Status changed from Pending Backport to Resolved