Bug #21475

closed

12.2.0 bluestore - OSD down/crash: "internal heartbeat not healthy, dropping ping request"

Added by Nokia ceph-users over 6 years ago. Updated over 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Component(RADOS): BlueStore
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

~~~

2017-09-18 14:51:59.895746 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.068824 seconds old, received at 2017-09-18 14:50:59.826849: MOSDECSubOpWriteReply(1.132s0 1350/1344 ECSubWriteReply(tid=971, last_complete=1350'153, committed=1, applied=0)) currently queued_for_pg
2017-09-18 14:51:59.895749 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.068737 seconds old, received at 2017-09-18 14:50:59.826936: MOSDECSubOpWriteReply(1.132s0 1350/1344 ECSubWriteReply(tid=971, last_complete=0'0, committed=0, applied=1)) currently queued_for_pg
2017-09-18 14:51:59.895754 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.067539 seconds old, received at 2017-09-18 14:50:59.828134: MOSDECSubOpWriteReply(1.132s0 1350/1344 ECSubWriteReply(tid=971, last_complete=1350'153, committed=1, applied=0)) currently queued_for_pg
2017-09-18 14:51:59.923825 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1359 k (1083 k + 276 k)
2017-09-18 14:51:59.923835 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1066 k (1066 k + 0 )
2017-09-18 14:51:59.923837 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 643 k (643 k + 0 )
2017-09-18 14:51:59.923840 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1049 k (1049 k + 0 )
2017-09-18 14:51:59.923842 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 896 k (896 k + 0 )
2017-09-18 14:51:59.940780 7f1e77ca5700 20 osd.181 1350 share_map_peer 0x7f1e8dbf2800 already has epoch 1350
2017-09-18 14:51:59.940855 7f1e78ca7700 20 osd.181 1350 share_map_peer 0x7f1e8dbf2800 already has epoch 1350
2017-09-18 14:52:00.081390 7f1e6f572700 20 osd.181 1350 OSD::ms_dispatch: ping magic: 0 v1
2017-09-18 14:52:00.081393 7f1e6f572700 10 osd.181 1350 do_waiters -- start
2017-09-18 14:52:00.081394 7f1e6f572700 10 osd.181 1350 do_waiters -- finish
2017-09-18 14:52:00.081395 7f1e6f572700 20 osd.181 1350 _dispatch 0x7f1e90923a40 ping magic: 0 v1
2017-09-18 14:52:00.081397 7f1e6f572700 10 osd.181 1350 ping from client.414556
2017-09-18 14:52:00.123908 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1359 k (1083 k + 276 k)
2017-09-18 14:52:00.123926 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1066 k (1066 k + 0 )
2017-09-18 14:52:00.123932 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 643 k (643 k + 0 )
2017-09-18 14:52:00.123937 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1049 k (1049 k + 0 )
2017-09-18 14:52:00.123942 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 896 k (896 k + 0 )
2017-09-18 14:52:00.145445 7f1e784a6700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e61cbb700' had timed out after 60
2017-09-18 14:52:00.145450 7f1e784a6700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e624bc700' had timed out after 60
2017-09-18 14:52:00.145496 7f1e784a6700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e63cbf700' had timed out after 60
2017-09-18 14:52:00.145534 7f1e784a6700 10 osd.181 1350 internal heartbeat not healthy, dropping ping request
2017-09-18 14:52:00.146224 7f1e78ca7700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e61cbb700' had timed out after 60
2017-09-18 14:52:00.146226 7f1e78ca7700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e624bc700' had timed out after 60

~~~
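
To see which worker threads are stuck, the `heartbeat_map is_healthy` lines in the excerpt above can be tallied per thread. The following is a minimal sketch, assuming the log format shown in this ticket; `TIMEOUT_RE` and `summarize_timeouts` are hypothetical names used for illustration and are not part of Ceph.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the heartbeat_map lines quoted in this ticket.
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>[^ ]+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had timed out after (?P<secs>\d+)"
)


def summarize_timeouts(lines):
    """Count heartbeat timeout reports per (thread pool, thread id) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("pool"), match.group("thread"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical log path): python summarize.py < ceph-osd.181.log
    for (pool, thread), n in sorted(summarize_timeouts(sys.stdin).items()):
        print(f"{pool} {thread}: {n} timeout report(s)")
```

A thread that is reported repeatedly across consecutive checks, such as `0x7f1e61cbb700` here, is a likely candidate to look for in the attached gdb output.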

Workaround:

~~~
bluestore_deferred_throttle_bytes = 0
bluestore_throttle_bytes = 0
~~~

A gcore dump is attached to this ticket.


Files

gdb.txt.gz (6.73 KB) Nokia ceph-users, 09/20/2017 11:16 AM

Related issues 1 (0 open, 1 closed)

Related to RADOS - Bug #21171: bluestore: aio submission deadlock (Resolved, Sage Weil, 08/29/2017)

Actions #1

Updated by Nokia ceph-users over 6 years ago

This seems to be a duplicate of http://tracker.ceph.com/issues/21180. Please verify.

Actions #2

Updated by Sage Weil over 6 years ago

  • Status changed from New to Duplicate
Actions #3

Updated by Sage Weil over 6 years ago

  • Related to Bug #21171: bluestore: aio submission deadlock added