Bug #58022

open

Fragmentation score rising by seemingly stuck thread

Added by Kevin Fox over 1 year ago. Updated 9 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source: Community (user)
Tags: backport_processed
Backport: reef quincy pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Because of issue https://tracker.ceph.com/issues/57672, we've been monitoring our other clusters closely to make sure they don't run into the same problem. One of our clusters, running 16.2.9, is showing some odd and worrying behavior.

We've noticed that some OSDs suddenly start increasing their fragmentation at a constant rate and keep doing so until they are restarted. After the restart they settle down and their fragmentation decreases very slowly.

In a brief discussion with @Vikhyat Umrao, one theory was that RocksDB compaction was kicking in repeatedly. We ran ceph_rocksdb_log_parser.py against one of the runaway OSDs and did not see a significant number of compaction events during its period of runaway fragmentation, so compaction is unlikely to be the cause.

Please see the attached screenshot. The runaway OSDs keep climbing over multiple days; when we restart them, they level off and then slowly decrease.

If this were workload-related, we would expect fragmentation to keep rising after the restart, since the workload carries on. Instead, the behavior stops immediately on restart, so it looks like some thread in the OSD is doing something unusual until the OSD is restarted.
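The reasoning above (a runaway OSD climbs at a steady rate and flattens once restarted, whereas a workload-driven trend would resume) can be checked numerically. Below is a minimal Python sketch, assuming per-OSD fragmentation scores (0 to 1) have already been exported from monitoring as (timestamp, score) samples for windows before and after a restart. The function names, the sample format and the 0.005-per-day threshold are hypothetical and not part of Ceph.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (unix_timestamp_seconds, fragmentation_score).
Sample = Tuple[float, float]


def slope_per_day(samples: Sequence[Sample]) -> float:
    """Least-squares slope of fragmentation vs. time, in score units per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    cov = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    # Slope is score per second; convert to score per day.
    return cov / var * 86400.0


def classify_restart(before: Sequence[Sample],
                     after: Sequence[Sample],
                     runaway_threshold: float = 0.005) -> str:
    """Compare fragmentation trends on either side of an OSD restart.

    runaway_threshold (score per day) is a hypothetical cut-off, not a Ceph default.
    """
    s_before = slope_per_day(before)
    s_after = slope_per_day(after)
    if s_before > runaway_threshold and s_after <= runaway_threshold:
        # Steady climb that ends at the restart: suggests OSD-internal state.
        return "stops_after_restart"
    if s_before > runaway_threshold and s_after > runaway_threshold:
        # Climb that resumes after the restart: suggests workload.
        return "continues_after_restart"
    return "no_runaway"


if __name__ == "__main__":
    day = 86400.0
    # Synthetic data: +0.02/day before the restart, slowly falling afterwards.
    before = [(d * day, 0.50 + 0.02 * d) for d in range(5)]
    after = [(d * day, 0.60 - 0.001 * d) for d in range(5)]
    print(classify_restart(before, after))
```

A least-squares slope is used rather than comparing the first and last samples so that a single noisy scrape does not flip the classification.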


Files

Screenshot from 2022-11-14 08-53-56.png (396 KB), Kevin Fox, 11/14/2022 05:03 PM
Screenshot from 2022-11-28 11-15-20.png (436 KB): Newer picture, taken just after I restarted the current batch of runaways. Kevin Fox, 11/28/2022 07:17 PM
Screenshot from 2023-01-18 09-11-04-1.png (437 KB), Kevin Fox, 01/18/2023 05:15 PM
Screenshot from 2023-01-20 09-02-13.png (185 KB), Kevin Fox, 01/20/2023 05:04 PM
Screenshot from 2023-01-20 09-36-41.png (219 KB), Kevin Fox, 01/20/2023 05:37 PM
Screenshot from 2023-01-25 09-50-11.png (232 KB), Kevin Fox, 01/25/2023 05:58 PM
Screenshot from 2023-01-25 09-50-41.png (228 KB), Kevin Fox, 01/25/2023 05:58 PM
osd.3.gz (554 KB), Kevin Fox, 01/27/2023 05:52 PM
osd.4.gz (501 KB), Kevin Fox, 01/27/2023 05:52 PM
Screenshot from 2023-07-31 10-36-29-3.png (427 KB), Kevin Fox, 07/31/2023 07:23 PM

Related issues: 3 (2 open, 1 closed)

Copied to bluestore - Backport #61463: quincy: Fragmentation score rising by seemingly stuck thread (New, Adam Kupczyk)
Copied to bluestore - Backport #61464: pacific: Fragmentation score rising by seemingly stuck thread (Rejected, Adam Kupczyk)
Copied to bluestore - Backport #61465: reef: Fragmentation score rising by seemingly stuck thread (New, Adam Kupczyk)