Bug #7520
closed
Lock contention during scrubbing which could potentially hang the OSD for a couple of seconds
Description
We are using Ceph as an object store (via radosgw), and each time the cluster starts scrubbing, performance degrades (e.g. latency increases by 20%).
After some investigation, one pattern we found that slowed the cluster down was lock contention during scrubbing. Here is the data flow:
1. The replica OSD receives an MOSDRepScrub message for a PG and then locks the PG to process it (https://github.com/ceph/ceph/blob/master/src/osd/OSD.h#L1799)
2. The OSD tick thread:
2.1 locks the OSD and runs a scheduled scrub (while holding the OSD lock)
2.2 foreach pg in OSDService::last_scrub_pg
2.2.1 locks the pg // if this happens to be the pg from item 1, the thread can block here with the OSD lock held for a while (in our cluster, up to several seconds)
2.2.2 tries to get a local / remote reserver and queues the scrub
2.2.3 unlocks the pg
2.3 unlocks the OSD
The lock contention can happen at step 2.2.1: the tick thread may try to acquire the PG lock already taken in step 1, and it blocks. Because it still holds the OSD lock while it waits, the messenger cannot dispatch and enqueue ops, and as a result the OSD hangs for a couple of seconds.
An easy fix is, at step 2.2, before locking the pg, to check whether this OSD is the primary for that PG; if it is not, just skip it.
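The proposed check can be sketched roughly as follows. This is a toy model, not Ceph code: `ToyPG` and `sched_scrub_pass` are hypothetical names, and it assumes the "is primary" flag can be read safely without taking the PG lock, which would need to be verified against the real PG class.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a placement group; not the real Ceph PG class.
struct ToyPG {
    std::mutex lock;            // per-PG lock, also taken by the replica scrub handler (step 1)
    bool primary = false;       // assumed readable without holding the PG lock
    bool scrub_queued = false;  // set when the scrub has been queued (step 2.2.2)
};

// Hypothetical model of the tick thread's scrub scheduling pass (step 2).
// Returns how many PG locks were taken while the OSD lock was held.
std::size_t sched_scrub_pass(std::mutex& osd_lock,
                             std::vector<std::unique_ptr<ToyPG>>& pgs) {
    std::lock_guard<std::mutex> osd_guard(osd_lock);     // step 2.1
    std::size_t locked = 0;
    for (auto& pg : pgs) {                               // step 2.2
        // Proposed fix: skip PGs this OSD is not primary for, before
        // touching the PG lock, so a busy replica scrub cannot block us.
        if (!pg->primary) {
            continue;
        }
        std::lock_guard<std::mutex> pg_guard(pg->lock);  // step 2.2.1
        ++locked;
        pg->scrub_queued = true;                         // step 2.2.2
    }                                                    // step 2.2.3: pg_guard released each iteration
    return locked;                                       // step 2.3: osd_guard released on return
}
```

With this ordering, a replica PG whose lock is held by the MOSDRepScrub handler is never waited on while the OSD lock is held; only PGs that pass the primary check are locked.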