Bug #46969: Octopus OSDs deadlock with slow ops and make the whole cluster unresponsive

Added by Vitaliy Filippov over 3 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Hi,

I have another unpleasant bug to report.

Right after I upgraded my cluster to Octopus 15.2.4, I started experiencing deadlocks that result in a lot of "slow ops" and make the whole cluster unresponsive until I restart some OSDs.

It happens every day or so; yesterday it happened twice. It's usually caused by one specific OSD (osd.7), and restarting it is usually sufficient. However, yesterday evening it was a different OSD, and I ended up restarting the whole cluster.

The cluster has 3 SAS SSDs + 11 NVMe drives, 1 OSD per drive, and all drives look healthy according to SMART. I'm also using a configuration that seems like "bug bingo": EC 2+1 plus compression.

ceph daemon osd.7 dump_blocked_ops shows a number of blocked ops in the "queued for pg" state and one "started" operation (see attachment). Other OSDs also show a lot of blocked ops; sometimes it's obvious that they're waiting for osd.7 (there's something like "waiting for sub ops from 7"), sometimes it isn't.
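For reference, this is roughly how I'm collecting the data (the admin socket commands have to be run on the host where the OSD lives; ceph health detail gives the cluster-wide view of which OSDs report slow ops):

ceph daemon osd.7 dump_blocked_ops
ceph daemon osd.7 dump_ops_in_flight
ceph health detail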

What other details should I provide to help you start looking into this bug?

Right now I basically restart my Octopus cluster every day, which is pretty annoying :)

ops7.txt (489 KB) Vitaliy Filippov, 08/14/2020 08:03 AM

History

#1 Updated by Vitaliy Filippov over 3 years ago

It seems the problem went away after removing the following non-default options from the configuration:

#bluestore_prefer_deferred_size_ssd = 16384
#bluestore_sync_submit_transaction = true
#bdev_enable_discard = true
#bdev_async_discard = true
#bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=8,recycle_log_file_num=32,write_buffer_size=33554432,writable_file_max_buffer_size=0,compaction_readahead_size=2097152

At least the cluster has now been alive for several days without restarts; before these changes it required manual intervention every day.
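In case it helps anyone else, a sketch of how the options can be reverted, assuming they were set through the monitors' central config database and the OSDs run as systemd ceph-osd units (if they only live in ceph.conf, commenting them out there and restarting the OSDs is enough):

ceph config rm osd bluestore_prefer_deferred_size_ssd
ceph config rm osd bluestore_sync_submit_transaction
ceph config rm osd bdev_enable_discard
ceph config rm osd bdev_async_discard
ceph config rm osd bluestore_rocksdb_options
# most of these are only read at OSD startup, so restart the OSDs on each host afterwards
systemctl restart ceph-osd.target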

#2 Updated by Vitaliy Filippov over 3 years ago

Oops, sorry, there was one more change: I changed shards*threads from the default 2*8 to 1*16:

osd_op_num_threads_per_shard = 16
osd_op_num_shards = 1

It could also be the thing that helped.

I did it after looking at https://github.com/ceph/ceph/pull/36032/commits/51d3e7f4877b97717bce15e93f691f273da325df and seeing the word "wakeup" :) Where there's a missing wakeup, there may also be a deadlock... :)
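For completeness, roughly how that change can be applied and verified, assuming the values are set through the central config database (both options are read at OSD startup, so a restart is needed before they take effect):

ceph config set osd osd_op_num_shards 1
ceph config set osd osd_op_num_threads_per_shard 16
# after restarting, check the effective value through the admin socket
ceph daemon osd.7 config get osd_op_num_shards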

#3 Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to RADOS
