Bug #35075

open

copy-get stuck sending osd_op

Added by Sage Weil over 5 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-08-30 21:12:16.918 7f73758d3700 10 osd.7 pg_epoch: 581 pg[2.12( v 568'662 (293'372,568'662] local-lis/les=567/568 n=37 ec=474/18 lis/c 567/548 les/c/f 568/549/0 566/567/526) [7,3]/[7,4] backfill=[3] r=0 lpr=567 pi=[548,567)/1 bft=3 crt=568'662 lcod 568'661 mlcod 568'661 active+remapped+backfill_toofull mbc={} trimq=71 ps=[11e~1,148~3,14c~1,157~1,15b~1]] _copy_some obc(2:4a49aa23:::smithi13436822-966:head rwstate(write n=1 w=0)) 0x558e612f9810

but the objecter op doesn't get sent until much later, nearly three hours afterwards:
2018-08-31 00:01:20.472 7f737f8e7700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- osd_op(osd.7.13:98 2.17 2:ec7202bc:::smithi13436822-95 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [assert-version v228,copy-get max 8388608] snapc 0=[] ondisk+read+known_if_redirected e4657) v8 -- ?+0 0x558e47d082c0 con 0x558e45178c00

/a/sage-2018-08-30_13:54:10-rados-wip-sage2-testing-2018-08-29-1402-distro-basic-smithi/2959744

There are some objecter pings going out in the meantime (every 5 seconds to 172.21.15.44:6809/31159):

2018-08-30 21:11:48.613 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6813/31160 -- ping magic: 0 v1 -- ?+0 0x558e48f7c700 con 0x558e49663300
2018-08-30 21:12:18.619 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4e1dc1c0 con 0x558e4775a200
2018-08-30 21:12:23.620 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4e1dce00 con 0x558e4775a200
2018-08-30 21:12:28.621 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e49849a40 con 0x558e4775a200
2018-08-30 21:12:33.622 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e49ca8000 con 0x558e4775a200
2018-08-30 21:12:38.623 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4f8aea80 con 0x558e4775a200
2018-08-30 21:12:43.624 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e479b0fc0 con 0x558e4775a200
2018-08-30 21:12:48.625 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e48629340 con 0x558e4775a200
2018-08-30 21:12:53.626 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4fd1d880 con 0x558e4775a200
2018-08-30 21:12:58.627 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4bd79340 con 0x558e4775a200
2018-08-30 21:13:03.628 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4bd79180 con 0x558e4775a200
2018-08-30 21:13:08.629 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4bd78fc0 con 0x558e4775a200
2018-08-30 21:13:13.630 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4bd78e00 con 0x558e4775a200
2018-08-30 21:13:18.631 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4890f6c0 con 0x558e4775a200
2018-08-30 21:13:23.632 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e492996c0 con 0x558e4775a200
2018-08-30 21:13:28.633 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e479b0fc0 con 0x558e4775a200
2018-08-30 21:13:33.634 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e479b0c40 con 0x558e4775a200
2018-08-30 21:13:38.635 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e479b1340 con 0x558e4775a200
2018-08-30 21:13:43.636 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6dcbc000 con 0x558e4775a200
2018-08-30 21:13:48.636 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6dcbc380 con 0x558e4775a200
...
2018-08-31 00:00:40.464 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6dcaf180 con 0x558e4775a200
2018-08-31 00:00:45.465 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e60d79340 con 0x558e4775a200
2018-08-31 00:00:50.466 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e4b2eddc0 con 0x558e4775a200
2018-08-31 00:00:55.467 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6e73d6c0 con 0x558e4775a200
2018-08-31 00:01:00.468 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e7325ee00 con 0x558e4775a200
2018-08-31 00:01:05.469 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e7167bc00 con 0x558e4775a200
2018-08-31 00:01:10.470 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6e75fdc0 con 0x558e4775a200
2018-08-31 00:01:15.471 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e6e73b340 con 0x558e4775a200
2018-08-31 00:01:20.472 7f7397b19700  1 -- 172.21.15.44:0/31160 --> 172.21.15.44:6809/31159 -- ping magic: 0 v1 -- ?+0 0x558e5077ce00 con 0x558e4775a200

That time range matches the period between the start of the copy and the point where the op is finally sent.
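To put a number on the stall, here is a minimal sketch that computes the gap between the `_copy_some` line and the `osd_op` send line quoted above. The helper name `log_time` is hypothetical, and the two log lines are abbreviated copies that keep only the original timestamps and thread ids; it assumes every line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as in this log.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def log_time(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' timestamp of a log line.

    Hypothetical helper for illustration; not part of Ceph.
    """
    date, clock = line.split()[:2]
    return datetime.strptime(f"{date} {clock}", TS_FORMAT)


# Abbreviated copies of the two log lines from the description
# (payload trimmed, timestamps unchanged).
copy_start = "2018-08-30 21:12:16.918 7f73758d3700 10 osd.7 ... _copy_some obc(...)"
op_sent = "2018-08-31 00:01:20.472 7f737f8e7700  1 -- ... osd_op(osd.7.13:98 ...)"

# Time between the copy being started and the op going out on the wire.
delay = log_time(op_sent) - log_time(copy_start)
print(delay)                  # 2:49:03.554000
print(delay.total_seconds())  # 10143.554
```

The same helper can be pointed at the ping lines to confirm that the pings to 172.21.15.44:6809/31159 span the same window.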

Maybe this is hitting some objecter throttle loop?

Actions #1

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New