Bug #48909
clog slow request overwhelm monitors
Description
A recent change (https://tracker.ceph.com/issues/43975) logs details for each slow request and sends them to the monitors.
On a large cluster, this can overwhelm the monitors with spurious log messages when a performance issue occurs, and cause further instability in the cluster.
In our case, ceph.log quickly grew to more than 14 GB, and we had to restart all monitors to recover.
This was added in Nautilus (14.2.10) and Octopus (15.2.0).
Would it be better to have a configuration option to turn this on or off, so that we can disable logging details for every slow request to the monitors if necessary?
Related issues
History
#1 Updated by Dan Hill about 3 years ago
- Status changed from New to In Progress
- Assignee set to gerald yang
- Backport set to nautilus, octopus
- Regression changed from No to Yes
- Affected Versions v14.2.10, v15.0.0 added
- Component(RADOS) MonClient added
- Component(RADOS) deleted (Monitor, OSD)
The clog introduced by issue#43975 should either be removed or made configurable to prevent issues on large, high-throughput clusters.
I suggest adding a new setting that is disabled by default:
clog_slow_request_detail
This setting can then be enabled to capture slow request detail on smaller clusters.
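To illustrate the intent of the proposed setting, here is a minimal sketch in Python. It is not Ceph code: the SlowRequest type, the build_clog_messages function, and the message wording are all hypothetical. Only the option name clog_slow_request_detail comes from the suggestion above. The idea is that a single summary line is always sent, and per-request detail is sent only when the option is enabled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlowRequest:
    """Hypothetical stand-in for one slow op tracked by an OSD."""
    description: str
    age_seconds: float


def build_clog_messages(slow_requests: List[SlowRequest],
                        clog_slow_request_detail: bool = False) -> List[str]:
    """Return the cluster-log lines that would be sent to the monitors.

    The message text is invented for illustration and does not match
    Ceph's actual log format.
    """
    if not slow_requests:
        return []

    oldest = max(r.age_seconds for r in slow_requests)
    # The summary is always sent: one line no matter how many ops are slow.
    messages = [
        f"{len(slow_requests)} slow requests (oldest blocked for {oldest:.0f} sec)"
    ]

    # Per-request detail is opt-in (disabled by default, as proposed),
    # so log volume grows with the number of slow ops only when enabled.
    if clog_slow_request_detail:
        for r in slow_requests:
            messages.append(
                f"slow request {r.description} blocked for {r.age_seconds:.0f} sec"
            )
    return messages
```

With the option left at its default, the number of lines sent per reporting interval stays constant regardless of how many requests are slow, which is the property that matters on large clusters.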
#2 Updated by gerald yang about 3 years ago
#3 Updated by Prashant D over 2 years ago
This is being handled in https://tracker.ceph.com/issues/52424.
#4 Updated by Neha Ojha over 2 years ago
- Status changed from In Progress to Duplicate
#5 Updated by Neha Ojha over 2 years ago
- Duplicates Feature #52424: [RFE] Limit slow request details to mgr log added