Documentation #44958

Ceph v12.2.13 causes extreme high number of blocked operations

Added by Chris Jones about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee: Casey Bodley
Target version: -
% Done: 0%
Tags: gc
Backport: nautilus octopus
Reviewed:
Affected Versions:
Pull request ID: 34952

Description

Ceph v12.2.13 yields an extremely high number of blocked requests.

We are using Ceph v12.2.12 on some of our clusters with relatively few issues. We attempted an upgrade to v12.2.13 on several clusters and immediately started seeing an extremely high number of blocked requests under load. This in turn caused OSD suicides and a major overall reduction in cluster performance. For a cluster used to handling thousands of requests per minute, with extremely rare blocked requests on v12.2.12, this was very concerning. Blocked requests would shoot up into the thousands across 10 to 100 OSDs at a time.

The condition was so severe that, while there was no data loss, the cluster became impractical to use. We reverted to v12.2.12 and the problem went away.

The condition was not consistent; it appeared sporadically as the load on the cluster changed. We suspect the blocked operations may have been related to leveldb compaction triggering a cascading effect, but we are not certain.

This cluster was freshly installed as Jewel v10.2.11 approximately 1-2 years ago and has been in operation since. We first upgraded to v12.2.12 when it was released, and then to v12.2.13.

There were no issues with v10.2.11 or v12.2.12, but v12.2.13 immediately yielded performance problems, primarily in the form of blocked OSD requests and OSD suicides due to the suicide timeout. Increasing the suicide timeout only exacerbated the issue by allowing blocked OSD requests to block for longer periods of time.

All v12.2.13 clusters have since been downgraded to v12.2.12, so I do not have an active cluster on which to debug, but I am very willing to provide any additional detail you might need to diagnose or explain the issue, or to replicate it.

Cluster size is approx 2 PB, with 9 cluster nodes of approx 60x 6 TB HDD spinning disks per node (540 total disks in the cluster), using 6/3 erasure coding. Some of the upgraded clusters have SSD journals, while some do not. We are using FileStore and XFS.

Our main concern is that v12.2.12 runs perfectly well, while v12.2.13 does not.

Please let us know if this is a known issue, and how we can help resolve this.


Related issues 2 (0 open, 2 closed)

Copied to rgw - Backport #45479: octopus: Ceph v12.2.13 causes extreme high number of blocked operations (Resolved, Nathan Cutler)
Copied to rgw - Backport #45480: nautilus: Ceph v12.2.13 causes extreme high number of blocked operations (Resolved, Nathan Cutler)

#1

Updated by Chris Jones about 4 years ago

It appears that the increased efficiency of garbage collection in v12.2.13 versus v12.2.12 is the root cause of the blocked/slow requests. In v12.2.12 we had very aggressive garbage collection settings in order to keep up with garbage collection. In v12.2.13, those same settings caused extremely high numbers of garbage items to be removed in a short period of time. This in turn led to high rates of leveldb compaction, which was causing the slow requests and eventually the OSD suicides.

By reverting our garbage collection configuration to a much more conservative rate, we have resolved the situation.
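
For illustration, the kind of conservative tuning involved looks like this in ceph.conf; the section name and values below are only examples of the standard RGW garbage collection options, not our exact configuration:

    [client.rgw.gateway1]              # example RGW instance section
    rgw_gc_max_objs = 32               # number of shard objects backing the GC queue
    rgw_gc_obj_min_wait = 7200         # seconds a deleted object waits before it is GC-eligible
    rgw_gc_processor_period = 3600     # seconds between GC processing cycles
    rgw_gc_processor_max_time = 3600   # maximum seconds a single GC cycle may run

Longer wait and period values spread the deletes, and the resulting leveldb compaction, over a wider window.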

#2

Updated by Dan Hill about 4 years ago

Yeah, there were several improvements to GC processing in 12.2.13:
  • rgw: gc use aio (issue#24592, pr#28784, Yehuda Sadeh, Zhang Shaowen, Yao Zongyou, Jesse Williamson)
  • rgw: resolve bugs and clean up garbage collection code (issue#38454, pr#31664, Dan Hill, J. Eric Ivancich)

Perhaps the AIO GC change should be mentioned more prominently in the release notes?

#3

Updated by Chris Jones about 4 years ago

There is an undocumented setting, rgw gc max concurrent io, that appears to have been introduced in this release. I did not find it in any official Ceph documentation, but I did see it in Red Hat's documentation. It would be good to document this config value and note its effect.

I have been experimenting with different settings of garbage collection with varying degrees of success.
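
For reference, the newer concurrency-related options sit alongside the classic GC settings in ceph.conf. The section name below is an example and the values are purely illustrative, not a recommendation:

    [client.rgw.gateway1]            # example RGW instance section
    rgw_gc_max_concurrent_io = 10    # maximum concurrent IO operations during GC processing
    rgw_gc_max_trim_chunk = 16       # maximum GC log entries removed per trim operation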

#4

Updated by Casey Bodley almost 4 years ago

  • Tracker changed from Bug to Documentation
  • Assignee set to Casey Bodley
  • Tags set to gc

#5

Updated by Casey Bodley almost 4 years ago

  • Backport set to nautilus octopus
  • Pull request ID set to 34952

#6

Updated by Casey Bodley almost 4 years ago

  • Status changed from New to Pending Backport

#7

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45479: octopus: Ceph v12.2.13 causes extreme high number of blocked operations added

#8

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45480: nautilus: Ceph v12.2.13 causes extreme high number of blocked operations added

#9

Updated by Nathan Cutler almost 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
