Bug #18373

osd: repop vs push race

Added by Sage Weil about 7 years ago. Updated about 7 years ago.

Status:
Duplicate
Priority:
Immediate
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

osd.0 creates 1:d117688f:::smithi10313463-24:32c at 904'633 by cloning from head.

then it pushes a recovery op to osd.5 for 1:d117688f:::smithi10313463-24:329 with clone_subset: {1:d117688f:::smithi10313463-24:32c=[0~3815420]} (cloning from the fresh clone).

on osd.5, we get the repop, and queue it at prio 127

2016-12-30 13:16:39.887474 7f7b32e7b700 15 osd.5 907 enqueue_op 0x7f7b4ac40c60 prio 127 cost 1260 latency 0.000052 osd_repop(client.4136.0:8854 1.b) v1

and then the push op at prio 63

2016-12-30 13:16:39.888447 7f7b32e7b700 15 osd.5 907 enqueue_op 0x7f7b4ac42d00 prio 63 cost 1000 latency 0.000068 MOSDPGPush(1.b 907 [PushOp(1:d117688f:::smithi10313463-24:329, version: 904'632, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:d117688f:::smithi10313463-24:329@904'632, size: 3815420, copy_subset: [], clone_subset: {1:d117688f:::smithi10313463-24:32c=[0~3815420]}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2

and the push op gets dequeued first:

2016-12-30 13:16:39.939069 7f7b13892700 10 osd.5 907 dequeue_op 0x7f7b4ac42d00 prio 63 cost 1000 latency 0.050689 MOSDPGPush(1.b 907 [PushOp(1:d117688f:::smithi10313463-24:329, version: 904'632, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:d117688f:::smithi10313463-24:329@904'632, size: 3815420, copy_subset: [], clone_subset: {1:d117688f:::smithi10313463-24:32c=[0~3815420]}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 pg pg[1.b( v 904'633 lc 902'631 (0'0,904'633] local-les=907 n=2 ec=10 les/c/f 907/902/0 905/906/488) [0,2,5] r=2 lpr=907 pi=901-905/2 luod=0'0 crt=904'633 active m=1]

and we crash a bit later with
   -48> 2016-12-30 13:16:39.939860 7f7b13892700 -1 bluestore(/var/lib/ceph/osd/ceph-5) ENOENT on clone suggests osd bug

/a/teuthology-2016-12-29_11:30:03-rados-kraken-distro-basic-smithi/674738


Related issues

Duplicates: Ceph - Bug #17831: osd: ENOENT on clone (Resolved, 11/08/2016)

History

#1 Updated by Sage Weil about 7 years ago

  • Backport set to kraken

#2 Updated by Samuel Just about 7 years ago

  • Duplicates Bug #17831: osd: ENOENT on clone added

#3 Updated by Samuel Just about 7 years ago

  • Status changed from 12 to Duplicate
