Feature #41080

rgw: break up user reset-stats into multiple cls ops

Added by J. Eric Ivancich over 1 year ago. Updated 3 months ago.

Currently, when a user requests a reset of user stats via radosgw-admin, a single write op is sent to the OSD holding the user's info object. Within that op, the object's omap entries are read in a loop and the final result is written to the object's header.

The advantage to this technique is that it is atomic, manipulating a single object with a write operation.

The downside, though, is that on OSDs that may be bogged down for other reasons, this write operation may take a while and limit access to the PG on which this object resides.

There are a couple of ideas that might mitigate this. However, given how infrequent this operation is, these changes are likely not worth implementing, at least at this point in time. Instead, this tracker is here primarily to capture these ideas for the future.

It's important to understand that one object is read from and written to for this op.

For a user that has a lot of buckets this operation incrementally reads through their buckets, totaling the stats as it goes along, with one final write. Bucket stats are read in groups of 1000, so if a user had 100,000 buckets, this would involve 100 reads.
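The paged read described above can be sketched as a toy model. This is only an illustration in Python, not the actual cls code: `UserObject`, `read_page`, and `total_stats` are hypothetical names, and each bucket's stats are reduced to a single integer.

```python
import bisect
import math
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 1000  # group size mentioned in the description


class UserObject:
    """Toy stand-in for the user info object: sorted omap keys plus a header."""

    def __init__(self, entries: Dict[str, int]) -> None:
        self.entries = dict(entries)
        self.keys = sorted(self.entries)
        self.header_total = 0


def read_page(obj: UserObject, marker: Optional[str],
              limit: int) -> Tuple[List[Tuple[str, int]], bool]:
    """Return up to `limit` entries whose keys sort after `marker`,
    plus a flag saying whether more entries remain."""
    start = 0 if marker is None else bisect.bisect_right(obj.keys, marker)
    keys = obj.keys[start:start + limit]
    truncated = start + limit < len(obj.keys)
    return [(k, obj.entries[k]) for k in keys], truncated


def total_stats(obj: UserObject) -> Tuple[int, int]:
    """Read every page, sum the stats, and do one final header write.
    Returns (total, number_of_reads)."""
    total, reads, marker = 0, 0, None
    while True:
        page, truncated = read_page(obj, marker, PAGE_SIZE)
        reads += 1
        total += sum(value for _, value in page)
        if not truncated:
            break
        marker = page[-1][0]  # resume after the last key seen
    obj.header_total = total  # the single write at the end
    return total, reads


# 100,000 buckets, each contributing 1 to the total.
obj = UserObject({f"bucket-{i:06d}": 1 for i in range(100_000)})
total, reads = total_stats(obj)
```

With 100,000 entries and a page size of 1000, the loop performs 100 reads before the single header write, matching the arithmetic above.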

So the first idea is to do the reads as one op to determine the total, and the write as a second op to update the header. Presumably other reads on the PG could take place during the read op. The primary challenge here is to make sure there were no intervening writes between the read op and the write op. A generation number and/or timestamp of the header write could be used to ensure that the write op is OK to complete. Otherwise the write op would fail with an error, possibly followed by a set of retries.
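A minimal sketch of this read-then-guarded-write idea follows, as a toy Python model rather than real cls or librados code. All names (`UserObject`, `read_op`, `write_op`, `reset_stats`, `StaleGeneration`) are hypothetical, and the model assumes the generation number is bumped on every write to the object, which is one possible reading of the proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


class StaleGeneration(Exception):
    """Raised when the object changed between the read op and the write op."""


@dataclass
class UserObject:
    omap: Dict[str, int] = field(default_factory=dict)
    header_total: int = 0
    generation: int = 0  # assumption: bumped on every write to the object

    def add_bucket_stats(self, bucket: str, size: int) -> None:
        self.omap[bucket] = size
        self.generation += 1

    def read_op(self) -> Tuple[int, int]:
        """First op: total the stats and report the generation seen."""
        return sum(self.omap.values()), self.generation

    def write_op(self, total: int, expected_generation: int) -> None:
        """Second op: update the header only if nothing changed since the read."""
        if self.generation != expected_generation:
            raise StaleGeneration("object modified since read op")
        self.header_total = total
        self.generation += 1


def reset_stats(obj: UserObject, max_retries: int = 3,
                between_ops: Optional[Callable[[int], None]] = None) -> int:
    """Run read op + guarded write op, retrying on a detected race.
    Returns the attempt number that succeeded."""
    for attempt in range(1, max_retries + 1):
        total, generation = obj.read_op()
        if between_ops is not None:
            between_ops(attempt)  # test hook: simulate a concurrent writer
        try:
            obj.write_op(total, generation)
            return attempt
        except StaleGeneration:
            continue
    raise RuntimeError("reset-stats gave up after retries")


obj = UserObject()
obj.add_bucket_stats("a", 10)
obj.add_bucket_stats("b", 5)


def racing_writer(attempt: int) -> None:
    # Simulate another client writing between the two ops, first attempt only.
    if attempt == 1:
        obj.add_bucket_stats("c", 7)


attempts = reset_stats(obj, between_ops=racing_writer)
```

In this run the first attempt is rejected because a bucket was added between the two ops, and the second attempt succeeds with the updated total. The trade-off is that atomicity is replaced by detection plus retry, so a busy object could in principle need several attempts.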

The second idea would be to go further and break the reads into multiple ops, with enough information returned from each to continue the operation with more reads, followed by a single write. The same challenge as listed above applies here, although with more opportunities for races with other write ops.
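One way to picture the multi-op variant is the toy Python model below. It is a sketch under assumptions, not the implementation that was eventually merged: every name is hypothetical, each read op returns a partial total, a continuation marker, and the generation it observed, and the client abandons the attempt as soon as the generation changes.

```python
import bisect
from typing import Callable, Dict, NamedTuple, Optional


class ReadResult(NamedTuple):
    partial_total: int
    next_marker: Optional[str]  # None when there is nothing left to read
    generation: int


class UserObject:
    def __init__(self, stats: Dict[str, int]) -> None:
        self.stats = dict(stats)
        self.keys = sorted(self.stats)
        self.header_total = 0
        self.generation = 0  # assumption: bumped on every write

    def update_bucket(self, bucket: str, size: int) -> None:
        if bucket not in self.stats:
            bisect.insort(self.keys, bucket)
        self.stats[bucket] = size
        self.generation += 1

    def read_chunk(self, marker: Optional[str], limit: int) -> ReadResult:
        """One short read op: sum up to `limit` entries after `marker`."""
        start = 0 if marker is None else bisect.bisect_right(self.keys, marker)
        chunk = self.keys[start:start + limit]
        more = start + limit < len(self.keys)
        partial = sum(self.stats[k] for k in chunk)
        return ReadResult(partial, chunk[-1] if more else None, self.generation)

    def write_header(self, total: int, expected_generation: int) -> bool:
        """Final write op, refused if the object changed since the first read."""
        if self.generation != expected_generation:
            return False
        self.header_total = total
        self.generation += 1
        return True


def reset_stats(obj: UserObject, limit: int = 1000,
                on_chunk: Optional[Callable[[int], None]] = None) -> Optional[int]:
    """One attempt. Returns the number of read ops on success,
    or None if a racing write was detected."""
    total, marker, first_gen, reads = 0, None, None, 0
    while True:
        res = obj.read_chunk(marker, limit)
        reads += 1
        if first_gen is None:
            first_gen = res.generation
        elif res.generation != first_gen:
            return None  # race detected early, before the write
        total += res.partial_total
        if on_chunk is not None:
            on_chunk(reads)  # test hook: other work interleaves here
        if res.next_marker is None:
            break
        marker = res.next_marker
    return reads if obj.write_header(total, first_gen) else None


obj = UserObject({f"b{i:04d}": 2 for i in range(2500)})
clean = reset_stats(obj)  # no concurrent writers


def racer(reads_so_far: int) -> None:
    # Simulate a write landing after the second read op.
    if reads_so_far == 2:
        obj.update_bucket("b0000", 10)


raced = reset_stats(obj, on_chunk=racer)
```

Here the clean run takes three read ops for 2500 entries and writes a total of 5000, while the second run is abandoned when the third read op sees a changed generation. A caller would presumably wrap this in a retry loop; the longer the sequence of read ops, the more likely such a retry becomes.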

Related issues

Copied to rgw - Backport #46968: octopus: rgw: break up user reset-stats into multiple cls ops Resolved


#1 Updated by J. Eric Ivancich over 1 year ago

Comments to this tracker are invited.

#2 Updated by Josh Durgin over 1 year ago

There are a finite number of OSD op threads. If the 100 reads in a single op take a while, they will block one of those threads. By default there are 2 threads per shard for SSD, and 8 shards, so if these kinds of ops were more common, they could end up blocking I/O for 1/8th of the PGs.

The first idea doesn't help much since reads to the same PG would still be blocked on each other - there's no parallelism there today. The 2nd idea, with multiple ops, would get around this and let other work happen interspersed with these operations.

#3 Updated by J. Eric Ivancich over 1 year ago

Thank you, Josh. That's very helpful info.

#4 Updated by Matt Benjamin 7 months ago

  • Pull request ID set to 34869

#5 Updated by Matt Benjamin 7 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Matt Benjamin
  • Backport set to octopus

#6 Updated by J. Eric Ivancich 4 months ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #46968: octopus: rgw: break up user reset-stats into multiple cls ops added

#8 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
