Bug #53542

Ceph Metadata Pool disk throughput usage increasing

Added by Andras Sali over 2 years ago. Updated over 2 years ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): task(medium)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi All,

We have been observing that if we let our MDS run for some time, the bandwidth usage of the disks in the metadata pool starts increasing significantly (whilst IOPS stays about constant), even though the number of clients, the workloads, and everything else remain unchanged.

However, after restarting the MDS, the issue goes away for some time and the same workloads require 1/10th of the metadata disk bandwidth whilst doing the same IOPS.

We run our CephFS cluster in a cloud environment where the disk throughput / bandwidth capacity is quite expensive to increase and we are hitting bandwidth / throughput limits, even though we still have a lot of IOPS capacity left.

We suspect that somehow the journaling of the MDS becomes more extensive (i.e. larger journal updates for each operation), but we couldn't really pin down which parameter might affect this.
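A minimal way to check whether the journal is the part that grows (assuming the MDS admin socket is reachable on the MDS host; mds.a below is just a placeholder daemon name) is to sample the mds_log perf counters and the journal sizing options:

# journal perf counters: current segments (seg), events (ev), and the write/expire positions
ceph daemon mds.a perf dump mds_log

# journal sizing options that bound how much state is kept in the journal
ceph config get mds mds_log_max_segments
ceph config get mds mds_log_events_per_segment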

I attach a plot of how the bytes per operation (throughput in bytes per second divided by IOPS) evolves over time: when we restart the MDS, it drops to around 32 KB (even though the minimum block size for the metadata pool OSDs is 4 KB in our settings) and then increases over time to around 300 KB.
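For reference, the same ratio can be sampled from the per-pool client I/O statistics (cephfs_metadata below is a placeholder for the metadata pool name):

# reports client I/O for the pool as throughput and op/s;
# dividing the two gives the bytes-per-operation shown in the plot
ceph osd pool stats cephfs_metadata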

Any insight into whether this is "normal" behaviour or how to tune this would be really appreciated.


Files

Bytes_per_op.png (10.6 KB) - Bytes per operation - Andras Sali, 12/08/2021 04:54 PM

Related issues (3): 2 open, 1 closed

Related to CephFS - Bug #40002: mds: not trim log under heavy load (Fix Under Review, Xiubo Li)

Related to CephFS - Bug #52280: Mds crash and fails with assert on prepare_new_inode (Resolved, Xiubo Li)

Related to CephFS - Bug #53623: mds: LogSegment will only save one ESubtreeMap event if the ESubtreeMap event size is large enough. (Fix Under Review, Xiubo Li)
