Bug #10847
closed
stuck recovering, MOSDPGPush took 25 minutes from send to receive
Description
2015-02-11 09:49:17.972106 7f050f0c8700 20 osd.1 pg_epoch: 41 pg[1.12( v 29'12 (0'0,29'12] local-les=34 n=9 ec=8 les/c 34/22 32/32/32) [4,1,3] r=1 lpr=32 pi=21-31/4 luod=0'0 crt=22'8 lcod 29'11 active] send_pushes: sending push PushOp(921295f2/benchmark_data_burnupi27_14622_object153/head//1, version: 22'8, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(921295f2/benchmark_data_burnupi27_14622_object153/head//1@22'8, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)) to osd.4
2015-02-11 09:49:17.972148 7f050f0c8700 1 -- 10.214.135.36:6805/13635 --> 10.214.137.128:6809/25506 -- MOSDPGPush(1.12 41 [PushOp(921295f2/benchmark_data_burnupi27_14622_object153/head//1, version: 22'8, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(921295f2/benchmark_data_burnupi27_14622_object153/head//1@22'8, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x4356000 con 0x441cdc0
2015-02-11 09:49:17.972174 7f050f0c8700 10 osd.1 41 dequeue_op 0x4779000 finish
...
2015-02-11 10:16:55.501102 7f69bed51700 1 -- 10.214.137.128:6809/25506 <== osd.1 10.214.135.36:6805/13635 1861 ==== MOSDPGPush(1.12 41 [PushOp(921295f2/benchmark_data_burnupi27_14622_object153/head//1, version: 22'8, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(921295f2/benchmark_data_burnupi27_14622_object153/head//1@22'8, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 ==== 1049498+0+0 (4092764375 0 0) 0x6389a00 con 0x5a018c0
2015-02-11 10:16:55.501210 7f69bed51700 10 osd.4 41 handle_replica_op MOSDPGPush(1.12 41 [PushOp(921295f2/benchmark_data_burnupi27_14622_object153/head//1, version: 22'8, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(921295f2/benchmark_data_burnupi27_14622_object153/head//1@22'8, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:1048576, data_complete:false, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 epoch 41
ubuntu@teuthology:/a/samuelj-2015-02-10_21:50:39-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/749893/remote
wip-sam-testing, but I don't think it's related
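For reference, the gap between the send on osd.1 and the receive on osd.4 can be read straight off the two messenger timestamps above. The snippet below is only a small sketch of that arithmetic; the helper name `delivery_delay_seconds` is made up for illustration and is not part of any Ceph tooling.

```python
from datetime import datetime

# Timestamp layout used by the OSD log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def delivery_delay_seconds(sent, received):
    """Seconds elapsed between two log timestamps (hypothetical helper)."""
    return (datetime.strptime(received, FMT) - datetime.strptime(sent, FMT)).total_seconds()

# "-->" line on osd.1 and "<==" line on osd.4, copied from the logs above.
sent = "2015-02-11 09:49:17.972148"
received = "2015-02-11 10:16:55.501102"

delay = delivery_delay_seconds(sent, received)
print(f"{delay:.6f} s ({delay / 60:.1f} min)")  # 1657.528954 s (27.6 min)
```

So the push sat for a bit under 28 minutes, the same order of magnitude as the "25 minutes" in the title.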
Updated by Samuel Just about 9 years ago
ea5d1b370e534520ad686d3764bbe269c08cec8a
Saved as wip-sam-testing-10847
Updated by Sage Weil about 9 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
Updated by Sage Weil about 9 years ago
The recovery message is queued with a lower priority than other traffic. It looks like it was starved.
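To illustrate what starvation means here, the following is a toy model of a strict-priority queue. It is not Ceph's messenger or op queue code. The priority values, the one-dequeue-per-tick worker, and the function name `ticks_until_served` are all assumptions made for the sketch.

```python
import heapq
import itertools

def ticks_until_served(high_per_tick, max_ticks):
    """Return the tick at which the low-priority message is dequeued,
    or None if it is still waiting after max_ticks (toy model)."""
    counter = itertools.count()  # tie-breaker keeps FIFO order within a priority
    queue = []
    # Lower number is dequeued first: 0 = client op, 1 = recovery push
    # (hypothetical values, chosen only for this sketch).
    heapq.heappush(queue, (1, next(counter), "recovery-push"))
    for tick in range(max_ticks):
        # New higher-priority work arrives every tick.
        for _ in range(high_per_tick):
            heapq.heappush(queue, (0, next(counter), "client-op"))
        # The worker dequeues exactly one item per tick, strictly by priority.
        _, _, name = heapq.heappop(queue)
        if name == "recovery-push":
            return tick
    return None

print(ticks_until_served(high_per_tick=0, max_ticks=1000))  # 0
print(ticks_until_served(high_per_tick=1, max_ticks=1000))  # None
```

With strict ordering, a steady arrival rate of higher-priority items that matches the service rate is enough to keep the low-priority message queued indefinitely. Whether that is what actually happened in this run is not established by the ticket.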
Updated by Sage Weil about 9 years ago
- Status changed from In Progress to 12
- Assignee deleted (Sage Weil)
- Source changed from other to Q/A
Updated by Samuel Just almost 9 years ago
- Priority changed from Urgent to High
- Regression set to No
Updated by Samuel Just over 7 years ago
- Status changed from 12 to Can't reproduce