Bug #58008

mds/PurgeQueue: don't consider filer_max_purge_ops when _calculate_ops

Added by yixing hao 3 months ago. Updated about 2 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

_calculate_ops depends on filer_max_purge_ops, a config option that can be changed at runtime, and this leads to a bug. For example:

  1. A file has 20 objects and filer_max_purge_ops is 10.
  2. PurgeQueue::_execute_item is called and _calculate_ops returns 10, so ops_in_flight is incremented by 10.
  3. filer_max_purge_ops is changed to 20 on the fly.
  4. PurgeQueue::_execute_item_complete is called and _calculate_ops returns 20, so ops_in_flight is decremented by 20.
  5. Since ops_in_flight is a uint64, the subtraction wraps around below zero, leaving ops_in_flight far greater than max_purge_ops with no way to return to a reasonable value.

filer_max_purge_ops still takes effect in _do_purge_range, so it is safe to ignore it here.
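
The sequence above can be replayed in a small standalone sketch. This is not the Ceph code: `Config`, `calculate_ops_clamped`, `calculate_ops_fixed`, and the two `replay_*` helpers are hypothetical names, and the clamped cost function is a simplified model that assumes the op count is the object count capped by the config value. It only illustrates why recomputing the cost from mutable config at completion time can wrap an unsigned counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime-tunable filer_max_purge_ops option.
struct Config {
  uint64_t filer_max_purge_ops;
};

// Simplified model of the problematic behaviour (an assumption, not the
// actual Ceph code): the cost of an item is its object count capped by the
// *current* config value.
uint64_t calculate_ops_clamped(uint64_t num_objects, const Config& conf) {
  assert(conf.filer_max_purge_ops > 0);
  return std::min(num_objects, conf.filer_max_purge_ops);
}

// Model of the proposed behaviour: the cost depends only on the item itself.
uint64_t calculate_ops_fixed(uint64_t num_objects) {
  return num_objects;
}

// Replays steps 1-5 with the clamped cost and returns the final counter.
uint64_t replay_clamped() {
  Config conf{10};                                   // step 1
  uint64_t ops_in_flight = 0;
  ops_in_flight += calculate_ops_clamped(20, conf);  // step 2: +10
  conf.filer_max_purge_ops = 20;                     // step 3
  ops_in_flight -= calculate_ops_clamped(20, conf);  // step 4: -20, wraps
  return ops_in_flight;                              // step 5
}

// Same sequence with the config-independent cost; the config change has no
// effect on the accounting, so the counter returns to zero.
uint64_t replay_fixed() {
  uint64_t ops_in_flight = 0;
  ops_in_flight += calculate_ops_fixed(20);
  ops_in_flight -= calculate_ops_fixed(20);
  return ops_in_flight;
}
```

In this model the clamped replay ends 10 ops below zero, which an unsigned 64-bit counter represents as 2^64 - 10, while the config-independent replay ends at 0.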


Related issues

Copied to CephFS - Backport #58253: quincy: mds/PurgeQueue: don't consider filer_max_purge_ops when _calculate_ops In Progress
Copied to CephFS - Backport #58254: pacific: mds/PurgeQueue: don't consider filer_max_purge_ops when _calculate_ops In Progress

History

#1 Updated by yixing hao 3 months ago

When filer_max_purge_ops is increased on a Pacific MDS, pq_executing_ops and pq_executing_ops_high_water of purge_queue become abnormal immediately. I think this also applies to the main branch.

ceph daemon mds.x perf dump | jq .'purge_queue'
{
  "pq_executing_ops": 18446744073709552000,
  "pq_executing_ops_high_water": 18446744073709552000,
  "pq_executing": 0,
  "pq_executing_high_water": 512,
  "pq_executed": 687769701,
  "pq_item_in_journal": 0
}
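
The reported value 18446744073709552000 is consistent with a 64-bit counter that wrapped slightly below zero and was then rendered through a double-precision float. That rendering step is an assumption about the tooling, and the exact deficit is not known from the output. A minimal sketch with hypothetical helper names `wrapped_counter` and `as_json_double`:

```cpp
#include <cassert>
#include <cstdint>

// Value held by an unsigned 64-bit counter after it is decremented
// `deficit` ops below zero (wraps modulo 2^64).
uint64_t wrapped_counter(uint64_t deficit) {
  assert(deficit > 0);  // a zero deficit would not wrap
  return uint64_t{0} - deficit;
}

// What a consumer that stores JSON numbers as IEEE 754 doubles would hold.
// Values just below 2^64 are not representable and round to 2^64 itself.
double as_json_double(uint64_t v) {
  return static_cast<double>(v);
}
```

For any small deficit, the wrapped counter rounds to 2^64 as a double, which is the same double that the literal 18446744073709552000 denotes, so the exact deficit cannot be read back from the perf dump output.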

#2 Updated by Venky Shankar 3 months ago

  • Category set to Correctness/Safety
  • Status changed from New to Fix Under Review
  • Assignee set to yixing hao
  • Target version set to v18.0.0
  • Backport set to pacific,quincy

#3 Updated by Venky Shankar about 2 months ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Backport Bot about 2 months ago

  • Copied to Backport #58253: quincy: mds/PurgeQueue: don't consider filer_max_purge_ops when _calculate_ops added

#5 Updated by Backport Bot about 2 months ago

  • Copied to Backport #58254: pacific: mds/PurgeQueue: don't consider filer_max_purge_ops when _calculate_ops added

#6 Updated by Backport Bot about 2 months ago

  • Tags set to backport_processed
