Bug #9202

closed

Performance degradation during recovering and backfilling

Added by Zhi Zhang over 9 years ago. Updated over 7 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From recent tests and analysis, we find that slow requests during recovery and backfilling mainly fall into two patterns.

1. Ops wait a long time in the op_wq with no op thread available to process them. These are reported as 'no flag points reached'.

2. An op thread has dequeued an op to process but is blocked on the pg lock. This can be seen in dump_historic_ops: note the long gap between the "waiting_for_osdmap" and "reached_pg" events below.

{ "description": "osd_op(client.48687.0:41007649
default.45832.251_709e27f683b5a2160634c45ec79bf260
[] 22.b4445a81 ondisk+write e15711)",
"received_at": "2014-08-20 03:05:20.324103",
"age": "229.608474",
"duration": "141.505679",
"type_data": [
"started", { "client": "client.48687",
"tid": 41007649},
[ { "time": "2014-08-20 03:05:20.370175",
"event": "waiting_for_osdmap"}, { "time": "2014-08-20 03:07:41.829093",
"event": "reached_pg"}, { "time": "2014-08-20 03:07:41.829486",
"event": "started"}, { "time": "2014-08-20 03:07:41.829536",
"event": "started"}, { "time": "2014-08-20 03:07:41.829782",
"event": "done"}]]},

The two patterns above are related to some extent. During backfilling there are pg scan ops, and when an op thread starts processing a pg scan op it holds the pg lock until the scan finishes. Processing a pg scan op can take a long time.

As the logs below show, it takes more than a minute (05:40:10 to 05:41:23) to finish this op.
--------------------------------------------------------------------------
2014-08-20 05:40:10.614246 7f552081a700 10 osd.148 15817 dequeue_op 0x30a634a0 prio 127 cost 0 latency 7.181219 pg_scan(get_digest 22.79cs0
d79c//head//22-0//0//22 e 15817/15817) v2 pg pg[22.79cs0( v 15817'23973(12435'20642,15817'23973] lb 858ec79c/default.45834.877_c218526299dbf02a43375aafca843965/head//22 local-les=0 n=19260 ec=11132 les/c 14898/14893 14896/14896/14264)
[148,76,62,137,152,19,38,74,129,54,71]/[37,76,62,137,152,19,38,74,129,54,71]
r=-1 lpr=14896 pi=11132-14895/25 luod=0'0 crt=15815'23972 active+remapped]
...
2014-08-20 05:41:23.071930 7f552081a700 10 osd.148 15817 dequeue_op 0x30a634a0 finish
--------------------------------------------------------------------------

During this time, if another op thread tries to process any op belonging to this pg, it first tries to take the pg lock and then gets stuck, because ceph uses mutex::lock (a blocking acquire) here. This is pattern 2.

So when this happens, not only can the current op not be processed, but that op thread also cannot process any other ops in the op_wq. The pool of available op threads shrinks, and many ops sitting in the op_wq turn into slow requests. This is pattern 1.
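To make the interaction concrete, here is a minimal standalone sketch (not Ceph code; all names are made up) of how a single long-held pg lock takes op threads out of service:
--------------------------------------------------------------------------
// One worker holds the per-PG lock for a long scan; every other worker that
// dequeues an op for the same PG blocks inside mutex lock(), so it can no
// longer drain the shared op queue.
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex pg_lock;   // stands in for the per-PG lock

void scan_worker() {
    std::lock_guard<std::mutex> l(pg_lock);                 // take the PG lock...
    std::this_thread::sleep_for(std::chrono::seconds(5));   // ...and hold it for the whole "scan"
}

void op_worker(int id) {
    // Pattern 2: blocks right here until the scan is done, instead of moving
    // on to other queued ops -- which is what produces pattern 1.
    std::lock_guard<std::mutex> l(pg_lock);
    std::cout << "op worker " << id << " finally reached the PG\n";
}

int main() {
    std::thread scan(scan_worker);
    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // let the scan grab the lock first
    std::thread w1(op_worker, 1), w2(op_worker, 2);
    scan.join(); w1.join(); w2.join();
    return 0;
}
--------------------------------------------------------------------------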

To mitigate the performance degradation we can tune some backfill/recovery parameters and temporarily increase the number of op threads, but recovery then takes several times longer than before.

For a further solution, we could investigate the following to see whether they help:
1. use mutex::try_lock instead of mutex::lock (see the sketch after this list)?
2. reduce the pg scan op's processing time, either by reducing its scope or by dividing it into many smaller ops?
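To illustrate option 1, here is a hypothetical sketch (not Ceph's actual op_wq code; Op, PG, lookup_pg and kMaxSkips are invented names): when the pg lock is contended the op is pushed back and the thread moves on, with a small cap so an op cannot be skipped forever.
--------------------------------------------------------------------------
#include <deque>
#include <mutex>

struct Op { int pg_id = 0; int skips = 0; };

struct PG {
    std::mutex lock;
    void process(const Op&) { /* real work happens here, under `lock` */ }
};

constexpr int kMaxSkips = 3;   // arbitrary bound for this sketch

void drain_queue(std::deque<Op>& op_wq, PG& (*lookup_pg)(int)) {
    while (!op_wq.empty()) {
        Op op = op_wq.front();
        op_wq.pop_front();
        PG& pg = lookup_pg(op.pg_id);

        bool locked;
        if (op.skips < kMaxSkips) {
            locked = pg.lock.try_lock();   // don't block on a contended PG
        } else {
            pg.lock.lock();                // skipped too often: fall back to blocking
            locked = true;
        }
        if (!locked) {
            ++op.skips;
            op_wq.push_back(op);           // come back to this PG later
            continue;
        }
        pg.process(op);
        pg.lock.unlock();
    }
}
--------------------------------------------------------------------------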

Actions #1

Updated by Sage Weil over 9 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Samuel Just over 9 years ago

  • Assignee set to Samuel Just
Actions #3

Updated by Samuel Just over 9 years ago

There are two issues here. First is that we stubbornly wait on a contended pg lock instead of moving to the next item in the list. That's not great, but I argue that well behaved op_wq items should always complete quickly.

The more significant problem is that satisfying the pg scan takes too long. The number of items we grab during a single backfill scan doesn't really need to be particularly large if the objects in the pg are reasonably sized.

I think the first step is to reduce osd_backfill_scan_min to something like 4 and osd_backfill_scan_max to something like 16 by default. https://github.com/athanatos/ceph/tree/wip-9202 has a version of that patch.
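For anyone hitting this on an affected build, those suggested values would go in the [osd] section of ceph.conf (the option names are the standard backfill scan settings; 4 and 16 are the values proposed above), or can be pushed to running OSDs with ceph tell osd.* injectargs:
--------------------------------------------------------------------------
[osd]
    osd backfill scan min = 4
    osd backfill scan max = 16
--------------------------------------------------------------------------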

A longer term solution would be to satisfy a larger scan request via multiple trips through the queue.
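Roughly, that could look like a scan op that carries a cursor and handles only a bounded chunk per dequeue, requeueing itself until it reaches the end. A hypothetical sketch, not Ceph's actual backfill/scan code (PG, ScanOp and scan_chunk are illustrative names):
--------------------------------------------------------------------------
#include <map>
#include <mutex>
#include <string>
#include <vector>

struct PG {
    std::mutex lock;
    std::map<std::string, int> objects;   // stand-in for the PG's object index
};

struct ScanOp {
    std::string cursor;                   // first object not yet scanned
    std::string end;                      // upper bound requested by the primary
    std::size_t chunk = 16;               // objects per trip (cf. osd_backfill_scan_max)
    std::vector<std::string> result;
};

// Returns true when the scan has covered [cursor, end); otherwise the caller
// requeues the op and the next dequeue resumes from the updated cursor, so
// the PG lock is released between chunks and client ops can interleave.
bool scan_chunk(PG& pg, ScanOp& op) {
    std::lock_guard<std::mutex> l(pg.lock);
    auto it = pg.objects.lower_bound(op.cursor);
    std::size_t n = 0;
    while (n < op.chunk && it != pg.objects.end() && it->first < op.end) {
        op.result.push_back(it->first);
        ++it;
        ++n;
    }
    if (it == pg.objects.end() || it->first >= op.end)
        return true;                      // done: reply to the primary
    op.cursor = it->first;                // resume here on the next trip
    return false;
}
--------------------------------------------------------------------------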

Finally, we could choose to skip a contended pg a set number of times in the queue.

Thoughts?

Actions #4

Updated by Zhi Zhang over 9 years ago

Hi Samuel,

Thanks for the short-term fix of tuning those two backfill scan parameters. Combined with tuning other backfill/recovery parameters, it clearly helps and stabilizes user traffic during recovery/backfilling, although recovery takes several times longer than before. That is the trade-off for the moment.

After stabilizing this issue, we saw another problem at the peering stage (that is, right after an OSD comes back up and in). It is a different topic and I will create another tracker for it.

Actions #5

Updated by Guang Yang over 9 years ago

Actions #6

Updated by Samuel Just over 7 years ago

  • Status changed from New to Can't reproduce

This has aged out; most of this works differently now.
