Bug #36473

closed

hung osd_repop, bluestore committed but failed to trigger repop_commit

Added by Sage Weil over 5 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2018-10-16_18:31:27-rados-wip-sage-testing-2018-10-16-0724-distro-basic-smithi/3148851

Usually, after the bluestore commit succeeds, the op wq wakes up and calls something like:

2018-10-16 20:57:10.468 7f26725c0700 10 osd.2 pg_epoch: 45 pg[1.6( v 45'14646 (45'14623,45'14646] local-lis/les=29/30 n=12628 ec=9/9 lis/c 29/29 les/c/f 30/30/0 26/29/29) [1,2] r=1 lpr=29 luod=0'0 lua=25'12628 crt=45'14646 lcod 45'14632 active mbc={}] repop_commit on op osd_repop(osd.1.0:64467 1.6 e45/29 1:64f1278e:::benchmark_data_smithi176_34325_object9600:head v 45'14634) v2, sending commit to osd.1
2018-10-16 20:57:10.468 7f26725c0700 20 osd.2 45 share_map_peer 0x55f95b2c5080 already has epoch 45
2018-10-16 20:57:10.468 7f26725c0700  1 -- 172.21.15.176:6806/31292 --> 172.21.15.176:6802/39730 -- osd_repop_reply(osd.1.0:64467 1.6 e45/29 ondisk, result = 0) v2 -- ?+0 0x55f97436a780 con 0x55f95b2c5080

but for this op it didn't:
2018-10-16 20:57:41.245 7f268c5f4700 20 slow op osd_repop(osd.1.0:64485 1.6 e45/29 1:64f950b4:::benchmark_data_smithi176_38726_object47030:head v 45'14652) initiated 2018-10-16 20:57:10.470116
2018-10-16 20:57:41.245 7f268c5f4700 -1 osd.2 45 get_health_metrics reporting 1 slow ops, oldest is osd_repop(osd.1.0:64485 1.6 e45/29 1:64f950b4:::benchmark_data_smithi176_38726_object47030:head v 45'14652)
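
For triage, the comparison above (a "repop_commit on op" line for a healthy op, only "slow op" lines for the stuck one) can be automated over an OSD log. The following is a minimal sketch, not part of Ceph: the function name `find_uncommitted_slow_repops` is hypothetical, and the regular expression assumes the log line shapes shown in the excerpts above.

```python
import re

# Captures the op identifier that follows "osd_repop(" (e.g. "osd.1.0:64485").
# It does not match "osd_repop_reply(", because "(" must directly follow "osd_repop".
REPOP_ID = re.compile(r"osd_repop\((\S+)")


def find_uncommitted_slow_repops(lines):
    """Return ids of osd_repop ops reported slow with no repop_commit line.

    Assumes the line formats from the excerpts in this report; other
    Ceph versions or debug levels may log differently.
    """
    committed = set()
    slow = []  # keeps first-seen order, without duplicates
    for line in lines:
        match = REPOP_ID.search(line)
        if match is None:
            continue
        op_id = match.group(1)
        if "repop_commit on op" in line:
            committed.add(op_id)
        elif "slow op" in line and op_id not in slow:
            slow.append(op_id)
    return [op_id for op_id in slow if op_id not in committed]
```

Passing the lines of the osd.2 log from this run would be expected to flag osd.1.0:64485 and not osd.1.0:64467, provided the rest of the log contains no commit line for the former.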

maybe a race condition on the queue wakeup?
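
To make the hypothesis concrete, here is a generic illustration of a lost wakeup on a condition variable. It is a sketch only: it is not Ceph's op queue code, the class `CommitQueue` and its methods are hypothetical, and nothing in this report confirms that this is the actual cause.

```python
import threading
from collections import deque


class CommitQueue:
    """Toy queue used only to illustrate a lost wakeup (not Ceph code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = deque()

    def push(self, item):
        # Producer: enqueue the item and signal any thread currently waiting.
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def wait_naive(self, timeout):
        # Buggy consumer: waits for a signal without checking the queue.
        # If push() already ran, the notify had no waiter and is lost.
        with self._cond:
            return self._cond.wait(timeout)

    def wait_checked(self, timeout):
        # Safer consumer: re-checks the predicate under the lock, so an
        # item queued before the wait began is still observed.
        with self._cond:
            return self._cond.wait_for(lambda: bool(self._items), timeout)
```

If `push()` runs before the consumer starts waiting, `wait_naive()` times out even though work is queued, while `wait_checked()` returns immediately. A hang of this shape would look like the symptom described here: the commit has happened, but the thread that should act on it is never woken.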


Related issues 2 (0 open, 2 closed)

Has duplicate bluestore - Bug #36284: Bluestore might be hanging OSD (Duplicate, 10/01/2018)

Copied to RADOS - Backport #45025: mimic: hung osd_repop, bluestore committed but failed to trigger repop_commit (Rejected)