Bug #48909

clog slow request overwhelm monitors

Added by gerald yang about 3 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus, octopus
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): MonClient
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A recent change (https://tracker.ceph.com/issues/43975) logs details for each slow request and sends them to the monitors.
On a large cluster, this can overwhelm the monitors with spurious log entries when a performance issue occurs, causing further instability in the cluster.
In our case, ceph.log quickly grew to more than 14 GB, and we had to restart all monitors to recover.
The change was added in Nautilus (14.2.10) and Octopus (15.2.0).

Would it be better to have a configuration option to turn this on or off, so that logging details for every slow request to the monitors can be disabled if necessary?


Related issues

Duplicates RADOS - Feature #52424: [RFE] Limit slow request details to mgr log Resolved

History

#1 Updated by Dan Hill about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to gerald yang
  • Backport set to nautilus, octopus
  • Regression changed from No to Yes
  • Affected Versions v14.2.10, v15.0.0 added
  • Component(RADOS) MonClient added
  • Component(RADOS) deleted (Monitor, OSD)

The clog introduced by issue #43975 should either be removed or made configurable to prevent issues on large, high-throughput clusters.

I suggest adding a new setting that is disabled by default:
clog_slow_request_detail

This setting can then be enabled to capture slow request detail on smaller clusters.
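
The gating suggested above could look roughly like the following. This is only an illustrative sketch in Python, not Ceph's actual C++ code: `Config`, `ClusterLog`, and `report_slow_requests` are hypothetical names, and only the option name `clog_slow_request_detail` comes from the suggestion itself. It assumes a single summary line is still sent to the cluster log, while per-request detail is sent only when the option is enabled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Config:
    # Option name taken from the suggestion above; disabled by default.
    clog_slow_request_detail: bool = False


@dataclass
class ClusterLog:
    # Hypothetical stand-in for the cluster log forwarded to the monitors.
    messages: List[str] = field(default_factory=list)

    def warn(self, msg: str) -> None:
        self.messages.append(msg)


def report_slow_requests(slow_ops: List[str], conf: Config, clog: ClusterLog) -> None:
    """Send one summary line; send per-request detail only if enabled."""
    if not slow_ops:
        return
    # The summary is a single message regardless of how many ops are slow.
    clog.warn(f"{len(slow_ops)} slow requests")
    # Per-request detail scales with the number of slow ops, so it is gated.
    if conf.clog_slow_request_detail:
        for op in slow_ops:
            clog.warn(f"slow request {op}")
```

With the default setting, the number of messages sent to the monitors stays constant per report no matter how many requests are slow, which is the property that matters on large clusters.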

#3 Updated by Prashant D over 2 years ago

This is being handled in https://tracker.ceph.com/issues/52424.

#4 Updated by Neha Ojha over 2 years ago

  • Status changed from In Progress to Duplicate

#5 Updated by Neha Ojha over 2 years ago

  • Duplicates Feature #52424: [RFE] Limit slow request details to mgr log added
