Bug #35075

open

copy-get stuck sending osd_op

Added by Sage Weil over 5 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-08-30 21:12:16.918 7f73758d3700 10 osd.7 pg_epoch: 581 pg[2.12( v 568'662 (293'372,568'662] local-lis/les=567/568 n=37 ec=474/18 lis/c 567/548 les/c/f 568/549/0 566/567/526) [7,3]/[7,4] backfill=[3] r=0 lpr=567 pi=[548,567)/1 bft=3 crt=568'662 lcod 568'661 mlcod 568'661 active+remapped+backfill_toofull mbc={} trimq=71 ps=[11e~1,148~3,14c~1,157~1,15b~1]] _copy_some obc(2:4a49aa23:::smithi13436822-966:head rwstate(write n=1 w=0)) 0x558e612f9810

but the objecter op doesn't get sent until much later:
2018-08-31 00:01:20.472 7f737f8e7700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- osd_op(osd.7.13:98 2.17 2:ec7202bc:::smithi13436822-95 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [assert-version v228,copy-get max 8388608] snapc 0=[] ondisk+read+known_if_redirected e4657) v8 -- ?+0 0x558e47d082c0 con 0x558e45178c00

/a/sage-2018-08-30_13:54:10-rados-wip-sage2-testing-2018-08-29-1402-distro-basic-smithi/2959744

There are some objecter pings going out...

2018-08-30 21:11:48.613 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6813/31160 -- ping magic: 0 v1 -- ?+0 0x558e48f7c700 con 0x558e49663300
2018-08-30 21:12:18.619 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4e1dc1c0 con 0x558e4775a200
2018-08-30 21:12:23.620 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4e1dce00 con 0x558e4775a200
2018-08-30 21:12:28.621 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e49849a40 con 0x558e4775a200
2018-08-30 21:12:33.622 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e49ca8000 con 0x558e4775a200
2018-08-30 21:12:38.623 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4f8aea80 con 0x558e4775a200
2018-08-30 21:12:43.624 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e479b0fc0 con 0x558e4775a200
2018-08-30 21:12:48.625 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e48629340 con 0x558e4775a200
2018-08-30 21:12:53.626 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4fd1d880 con 0x558e4775a200
2018-08-30 21:12:58.627 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4bd79340 con 0x558e4775a200
2018-08-30 21:13:03.628 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4bd79180 con 0x558e4775a200
2018-08-30 21:13:08.629 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4bd78fc0 con 0x558e4775a200
2018-08-30 21:13:13.630 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4bd78e00 con 0x558e4775a200
2018-08-30 21:13:18.631 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4890f6c0 con 0x558e4775a200
2018-08-30 21:13:23.632 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e492996c0 con 0x558e4775a200
2018-08-30 21:13:28.633 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e479b0fc0 con 0x558e4775a200
2018-08-30 21:13:33.634 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e479b0c40 con 0x558e4775a200
2018-08-30 21:13:38.635 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e479b1340 con 0x558e4775a200
2018-08-30 21:13:43.636 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6dcbc000 con 0x558e4775a200
2018-08-30 21:13:48.636 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6dcbc380 con 0x558e4775a200
...
2018-08-31 00:00:40.464 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6dcaf180 con 0x558e4775a200
2018-08-31 00:00:45.465 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e60d79340 con 0x558e4775a200
2018-08-31 00:00:50.466 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4b2eddc0 con 0x558e4775a200
2018-08-31 00:00:55.467 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6e73d6c0 con 0x558e4775a200
2018-08-31 00:01:00.468 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e7325ee00 con 0x558e4775a200
2018-08-31 00:01:05.469 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e7167bc00 con 0x558e4775a200
2018-08-31 00:01:10.470 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6e75fdc0 con 0x558e4775a200
2018-08-31 00:01:15.471 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6e73b340 con 0x558e4775a200
2018-08-31 00:01:20.472 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e5077ce00 con 0x558e4775a200

That time range matches the period between when the copy starts and when the op is finally sent.
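To put a number on the stall, here is a minimal sketch that parses the leading timestamps of the log lines quoted above and computes the delay between the `_copy_some` line and the `osd_op` send, plus the spacing between pings. The helper names (`parse_ts`, `stall_seconds`, `gaps`) are made up for illustration and are not part of Ceph; the inputs are only the timestamp prefixes copied from the excerpts, not full log lines.

```python
from datetime import datetime

# Timestamp layout at the start of each log line quoted in this report.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' of a log line."""
    date, time = line.split()[:2]
    return datetime.strptime(f"{date} {time}", TS_FORMAT)


def stall_seconds(start_line, sent_line):
    """Seconds elapsed between two log lines."""
    return (parse_ts(sent_line) - parse_ts(start_line)).total_seconds()


def gaps(lines):
    """Seconds between each consecutive pair of log lines."""
    stamps = [parse_ts(line) for line in lines]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Timestamp prefixes copied from the excerpts above (rest of each line omitted).
copy_start = "2018-08-30 21:12:16.918"  # _copy_some
op_sent = "2018-08-31 00:01:20.472"     # osd_op ... copy-get
pings = [
    "2018-08-30 21:12:18.619",
    "2018-08-30 21:12:23.620",
    "2018-08-30 21:12:28.621",
]

if __name__ == "__main__":
    print(f"stall: {stall_seconds(copy_start, op_sent):.3f} s")
    print("ping gaps:", [round(g, 3) for g in gaps(pings)])
```

With these inputs the stall comes out to roughly 2 h 49 min, while the pings to the same peer keep going out about every 5 seconds throughout.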

Maybe we are hitting some objecter throttle loop?
