Bug #18373
osd: repop vs push race
% Done: 0%
Backport: kraken
Regression: No
Severity: 3 - minor
Description
osd.0 creates 1:d117688f:::smithi10313463-24:32c at 904'633 by cloning from head.
then it pushes a recovery op to osd.5 for 1:d117688f:::smithi10313463-24:329 with clone_subset: {1:d117688f:::smithi10313463-24:32c=[0~3815420]} (i.e. cloning from the fresh clone).
on osd.5, we get the repop and queue it at prio 127:
2016-12-30 13:16:39.887474 7f7b32e7b700 15 osd.5 907 enqueue_op 0x7f7b4ac40c60 prio 127 cost 1260 latency 0.000052 osd_repop(client.4136.0:8854 1.b) v1
and then we queue the push op at prio 63:
2016-12-30 13:16:39.888447 7f7b32e7b700 15 osd.5 907 enqueue_op 0x7f7b4ac42d00 prio 63 cost 1000 latency 0.000068 MOSDPGPush(1.b 907 [PushOp(1:d117688f:::smithi10313463-24:329, version: 904'632, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:d117688f:::smithi10313463-24:329@904'632, size: 3815420, copy_subset: [], clone_subset: {1:d117688f:::smithi10313463-24:32c=[0~3815420]}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2
but the push op gets dequeued first:
2016-12-30 13:16:39.939069 7f7b13892700 10 osd.5 907 dequeue_op 0x7f7b4ac42d00 prio 63 cost 1000 latency 0.050689 MOSDPGPush(1.b 907 [PushOp(1:d117688f:::smithi10313463-24:329, version: 904'632, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:d117688f:::smithi10313463-24:329@904'632, size: 3815420, copy_subset: [], clone_subset: {1:d117688f:::smithi10313463-24:32c=[0~3815420]}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 pg pg[1.b( v 904'633 lc 902'631 (0'0,904'633] local-les=907 n=2 ec=10 les/c/f 907/902/0 905/906/488) [0,2,5] r=2 lpr=907 pi=901-905/2 luod=0'0 crt=904'633 active m=1]
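and we crash a bit later, presumably because the clone source 32c does not exist on osd.5 until the repop has been applied, with: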
-48> 2016-12-30 13:16:39.939860 7f7b13892700 -1 bluestore(/var/lib/ceph/osd/ceph-5) ENOENT on clone suggests osd bug
/a/teuthology-2016-12-29_11:30:03-rados-kraken-distro-basic-smithi/674738
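The ordering described above can be checked mechanically in an OSD log. The following is a minimal sketch, not part of Ceph: it assumes the `enqueue_op` / `dequeue_op` line layout seen in the excerpts above, and the names `OP_RE`, `OpEvent`, `parse_line` and `find_overtakes` are hypothetical helpers introduced for illustration. It reports every op that was dequeued while an op enqueued earlier was still pending. It does not filter by PG, so on a full log it would also flag harmless reordering between unrelated PGs.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed line layout, taken from the excerpts in this report, e.g.
#   "... enqueue_op 0x7f7b4ac40c60 prio 127 cost 1260 latency 0.000052 osd_repop(..."
OP_RE = re.compile(
    r"(?P<event>enqueue_op|dequeue_op) (?P<addr>0x[0-9a-f]+) "
    r"prio (?P<prio>\d+) cost \d+ latency [\d.]+ (?P<msg>\w+)"
)


class OpEvent(NamedTuple):
    event: str  # "enqueue_op" or "dequeue_op"
    addr: str   # op pointer as printed in the log, used to pair the two events
    prio: int
    msg: str    # message type, e.g. "osd_repop" or "MOSDPGPush"


def parse_line(line: str) -> Optional[OpEvent]:
    """Return the op event found on a log line, or None if there is none."""
    m = OP_RE.search(line)
    if m is None:
        return None
    return OpEvent(m["event"], m["addr"], int(m["prio"]), m["msg"])


def find_overtakes(lines: Iterable[str]) -> List[Tuple[OpEvent, OpEvent]]:
    """Return (overtaker, overtaken) pairs of enqueue events.

    An op overtakes another if it is dequeued while an op that was
    enqueued before it is still waiting in the queue.
    """
    pending: List[OpEvent] = []  # enqueued, not yet dequeued, in arrival order
    overtakes: List[Tuple[OpEvent, OpEvent]] = []
    for line in lines:
        ev = parse_line(line)
        if ev is None:
            continue
        if ev.event == "enqueue_op":
            pending.append(ev)
            continue
        for i, queued in enumerate(pending):
            if queued.addr == ev.addr:
                overtakes.extend((queued, earlier) for earlier in pending[:i])
                del pending[i]
                break
    return overtakes


if __name__ == "__main__":
    # Abbreviated copies of the three log lines quoted above
    # (the PushOp payload is shortened to "[...]" here).
    sample = [
        "2016-12-30 13:16:39.887474 7f7b32e7b700 15 osd.5 907 enqueue_op 0x7f7b4ac40c60 prio 127 cost 1260 latency 0.000052 osd_repop(client.4136.0:8854 1.b) v1",
        "2016-12-30 13:16:39.888447 7f7b32e7b700 15 osd.5 907 enqueue_op 0x7f7b4ac42d00 prio 63 cost 1000 latency 0.000068 MOSDPGPush(1.b 907 [...]) v2",
        "2016-12-30 13:16:39.939069 7f7b13892700 10 osd.5 907 dequeue_op 0x7f7b4ac42d00 prio 63 cost 1000 latency 0.050689 MOSDPGPush(1.b 907 [...]) v2",
    ]
    for winner, loser in find_overtakes(sample):
        print(f"{winner.msg} (prio {winner.prio}) dequeued before "
              f"{loser.msg} (prio {loser.prio})")
    # prints: MOSDPGPush (prio 63) dequeued before osd_repop (prio 127)
```

On the three lines from this report, the sketch flags the prio 63 push overtaking the prio 127 repop, which is the inversion described above.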
History
#1 Updated by Sage Weil about 7 years ago
- Backport set to kraken
#2 Updated by Samuel Just about 7 years ago
- Duplicates Bug #17831: osd: ENOENT on clone added
#3 Updated by Samuel Just about 7 years ago
- Status changed from 12 to Duplicate