Bug #46969: Octopus OSDs deadlock with slow ops and make the whole cluster unresponsive - RADOS - Ceph


Added by Vitaliy Filippov over 3 years ago. Updated almost 3 years ago.

Status: New

Priority: Normal

Severity: 2 - major

Affected Versions: Ceph - v15.2.4

Description

Hi,

I have another unpleasant bug to report.

Right after I upgraded my cluster to Octopus 15.2.4, I started experiencing deadlocks that produce a lot of "slow ops" and make the whole cluster unresponsive until I restart some OSDs.

It happens about once a day; yesterday it happened twice. It's usually caused by one specific OSD, osd.7, and restarting it is usually sufficient. However, yesterday evening it was a different OSD, and I ended up restarting the whole cluster.

The cluster has 3 SAS SSDs and 11 NVMe drives, with 1 OSD per drive. All drives look healthy according to SMART. I'm also using a configuration that looks like "bug bingo": EC 2+1 plus compression.

ceph daemon osd.7 dump_blocked_ops shows a number of blocked ops in the "queued for pg" state and one "started" operation (see attachment). Other OSDs also show a lot of blocked ops. Sometimes it's obvious that they're waiting for osd.7 (there's something like "waiting for sub ops from 7"), sometimes not.
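To make a dump like the attached one easier to read, the blocked ops can be grouped by state and by the OSDs they wait on. The following is only a rough sketch: it assumes the dump is JSON with an "ops" list whose entries carry "type_data.flag_point" and "type_data.events[].event" fields, and the function name summarize_blocked_ops is made up for illustration. The exact layout and event wording may differ between Ceph versions.

```python
import json
import re
import sys
from collections import Counter

# Matches event text such as "waiting for sub ops from 7" or
# "waiting for subops from 7,9" (wording assumed, may vary by version).
WAIT_RE = re.compile(r"waiting for sub\s?ops from ([\d,\s]+)")


def summarize_blocked_ops(dump_text):
    """Return (ops per flag_point, number of ops waiting on each OSD id)."""
    data = json.loads(dump_text)
    states = Counter()
    waited_on = Counter()
    for op in data.get("ops", []):
        type_data = op.get("type_data", {})
        states[type_data.get("flag_point", "unknown")] += 1
        # Collect the OSD ids this op waits on; count each id once per op.
        osd_ids = set()
        for event in type_data.get("events", []):
            match = WAIT_RE.search(event.get("event", ""))
            if match:
                osd_ids.update(int(n) for n in re.findall(r"\d+", match.group(1)))
        waited_on.update(osd_ids)
    return states, waited_on


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_ops.py ops7.txt
    with open(sys.argv[1]) as handle:
        states, waited_on = summarize_blocked_ops(handle.read())
    print("ops per state:", dict(states))
    print("ops waiting on OSD:", dict(waited_on))
```

Running this over the dumps of several OSDs should show whether most waiting ops point at a single OSD, such as osd.7, or whether the blockage is spread out.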

What other details should I provide to help start looking into this bug?

Now I basically restart my Octopus cluster every day, which is pretty annoying :)

Files

ops7.txt (489 KB)

Vitaliy Filippov, 08/14/2020 08:03 AM

Updated by Vitaliy Filippov over 3 years ago

It seems the problem went away after I removed the following non-default settings from the configuration:

#bluestore_prefer_deferred_size_ssd = 16384
#bluestore_sync_submit_transaction = true
#bdev_enable_discard = true
#bdev_async_discard = true
#bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=8,recycle_log_file_num=32,write_buffer_size=33554432,writable_file_max_buffer_size=0,compaction_readahead_size=2097152

At least the cluster has now been alive for several days without restarts. Before these changes it required manual intervention every day.
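For anyone who wants to check whether their own ceph.conf still sets any of the options listed above, here is a minimal sketch. The option names are copied from this comment; the function name find_suspect_options and the simple line-based parsing are my own assumptions, and the report does not establish which of these options (if any single one) triggers the deadlock.

```python
# Option names copied from the list above. Which one, if any, is
# responsible for the deadlock has not been determined in this report.
SUSPECT_OPTIONS = {
    "bluestore_prefer_deferred_size_ssd",
    "bluestore_sync_submit_transaction",
    "bdev_enable_discard",
    "bdev_async_discard",
    "bluestore_rocksdb_options",
}


def find_suspect_options(conf_text):
    """Return {option: value} for suspect options that are actively set."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, section headers and lines without a value.
        if not line or line[0] in "#;[" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # ceph.conf accepts spaces, dashes and underscores in option names,
        # so normalize them all to underscores before comparing.
        name = key.strip().replace(" ", "_").replace("-", "_")
        if name in SUSPECT_OPTIONS:
            found[name] = value.strip()
    return found
```

Commented-out lines (as in the list above) are ignored, so an empty result means none of these options is set in the text passed in. Options set through the monitor config database would not be seen by this check.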
