Bug #21180
closed
Bluestore throttler causes down OSD
Description
Writing a large amount of data to an EC RBD pool via NBD causes down OSDs and PGs, and a drop in traffic due to the unhealthy cluster. The OSDs themselves are running, and the disks seem to be idle. The logs show "heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f05d4c90700' had timed out after 60" (I increased the timeout from 15 during testing), and slow requests can be observed.
Setting bluestore_throttle_bytes to 0 resolves the issue.
Attaching a gdb thread backtrace of one of the OSDs.
Updated by Henrik Korkuc over 6 years ago
Just an update: sometimes, even with bluestore_throttle_bytes set to 0, I still get down OSDs, but it is much rarer and usually recovers.
Updated by Sage Weil over 6 years ago
Can you try setting bluestore_deferred_throttle_bytes = 0 along with bluestore_throttle_bytes = 0 and see if that resolves it? Thanks!
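For anyone reproducing this, a minimal sketch of how the two settings might look in ceph.conf. The placement under the [osd] section and the need to restart the OSDs afterwards are assumptions, not something stated in this report; whether the values are also picked up at runtime may depend on the release. This is a test setup for narrowing down the problem, not a recommended production configuration.

```ini
# Sketch only: disables both BlueStore throttles for testing.
# Section placement ([osd]) is an assumption; adjust to your deployment.
[osd]
bluestore_throttle_bytes = 0
bluestore_deferred_throttle_bytes = 0
```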
Updated by Sage Weil over 6 years ago
- Related to Bug #21171: bluestore: aio submission deadlock added
Updated by Henrik Korkuc over 6 years ago
The pool used for this workload is blocked by a down PG (#21287), but I'll try to replicate on the same cluster with a newly created pool.
Updated by Sage Weil over 6 years ago
- Status changed from Need More Info to Resolved
Pretty sure this was #21171; the fix is merged to master and luminous and will be in 12.2.1.