Bug #9221

LibRadosTwoPoolsPP.PromoteOn2ndRead

Added by Sage Weil over 9 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2014-08-24T14:31:25.511 INFO:tasks.workunit.client.0.burnupi25.stdout:[ RUN      ] LibRadosTwoPoolsPP.PromoteOn2ndRead
2014-08-24T14:31:33.620 INFO:tasks.workunit.client.0.burnupi25.stdout:test/librados/tier.cc:2133: Failure
2014-08-24T14:31:33.620 INFO:tasks.workunit.client.0.burnupi25.stdout:Value of: it == cache_ioctx.objects_end()
2014-08-24T14:31:33.620 INFO:tasks.workunit.client.0.burnupi25.stdout:  Actual: false
2014-08-24T14:31:33.620 INFO:tasks.workunit.client.0.burnupi25.stdout:Expected: true
2014-08-24T14:31:35.534 INFO:tasks.workunit.client.0.burnupi25.stdout:[  FAILED  ] LibRadosTwoPoolsPP.PromoteOn2ndRead (10024 ms)

ubuntu@teuthology:/a/teuthology-2014-08-24_02:30:02-rados-next-testing-basic-multi/446787

Related issues

Duplicated by Ceph - Bug #9140: [ FAILED ] LibRadosTwoPoolsPP.PromoteOn2ndRead (9913 ms) Duplicate 08/15/2014

Associated revisions

Revision c7e1b9e1 (diff)
Added by Sage Weil over 9 years ago

ceph_test_rados_api_tier: make PromoteOn2ndRead test tolerate retries

If there is an ill-timed connection reset our read could get sent twice.
Weaken our assertion if the read was slow to tolerate this case.

Fixes: #9221
Signed-off-by: Sage Weil <>
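
A minimal sketch of the "weaken the assertion if the read was slow" idea, not the actual change in c7e1b9e1. The helper names and the threshold parameter are hypothetical; the real test works against librados and may pick its cutoff differently. The point is only that the "object must not be in the cache pool yet" check is enforced when the first read completed quickly, and skipped when it was slow enough that a reset and resend could plausibly have happened.

```cpp
#include <cassert>
#include <chrono>

using ms = std::chrono::milliseconds;

// Hypothetical helper: enforce the "no promote after one read" check only
// when the read was fast. `threshold` is an assumed cutoff, not a value
// taken from the actual fix.
inline bool should_enforce_no_promote(ms read_duration, ms threshold) {
  assert(threshold.count() > 0);
  return read_duration < threshold;
}

// Returns true if the outcome of the first read is acceptable.
// object_in_cache: whether listing the cache pool found the object.
inline bool first_read_outcome_ok(bool object_in_cache,
                                  ms read_duration,
                                  ms threshold) {
  if (!should_enforce_no_promote(read_duration, threshold)) {
    // Slow read: the op may have been sent twice, so a promote is tolerated.
    return true;
  }
  // Fast read: a single read should not have promoted the object.
  return !object_in_cache;
}
```

With this shape, a fast read that still finds the object in the cache pool is reported as a failure, while a slow read is accepted either way.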

Revision 347ac0f8 (diff)
Added by Sage Weil over 8 years ago

ceph_test_rados_api_tier: make PromoteOn2ndRead tolerate thrashing

Repeat the test up to 20 times until we get a read that doesn't
trigger promote.

Fixes: #9221 (again)
Signed-off-by: Sage Weil <>
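
A rough sketch of the retry approach described in this commit message, assuming nothing about the real code in 347ac0f8 beyond "up to 20 attempts". `retry_until_clean` is a hypothetical name, and `attempt` stands in for one round of the test (write to the base pool, read once through the cache tier, check that the cache pool is still empty); it returns true when the read did not trigger a promote.

```cpp
#include <cassert>
#include <functional>

// Runs `attempt` up to `max_attempts` times. Returns the 1-based index of
// the first attempt that returned true, or 0 if every attempt failed.
inline int retry_until_clean(const std::function<bool()>& attempt,
                             int max_attempts = 20) {
  assert(max_attempts > 0);
  for (int i = 1; i <= max_attempts; ++i) {
    if (attempt()) {
      return i;  // got a read that was not resent and did not promote
    }
  }
  return 0;  // every attempt was disturbed, e.g. by thrashing
}
```

A test built this way would fail only when the return value is 0, i.e. when none of the attempts produced a clean single read.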

History

#1 Updated by Sage Weil over 9 years ago

remote/burnupi27/log/ceph-osd.5.log:2014-08-24 14:31:33.247455 7f27effbb700  1 -- 10.214.135.36:6803/14471 <== client.4283 10.214.134.2:0/3033066 460 ==== osd_op(client.4283.0:2537 test-rados-api-burnupi25-33066-24/foo [read 0~1] 83.43c091f1 ack+read+known_if_redirected e534) v4 ==== 187+0+0 (454238802 0 0) 0x3828780 con 0x3555600
remote/burnupi27/log/ceph-osd.5.log:2014-08-24 14:31:33.262267 7f27effbb700  1 -- 10.214.135.36:6803/14471 <== client.4283 10.214.134.2:0/3033066 1 ==== osd_op(client.4283.0:2537 test-rados-api-burnupi25-33066-24/foo [read 0~1] 83.43c091f1 RETRY=1 ack+retry+read+known_if_redirected e534) v4 ==== 187+0+0 (1943489374 0 0) 0x4953400 con 0x3555fa0
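
The two lines above carry the same op id (client.4283.0:2537); the second one has RETRY=1, so the read reached the OSD twice. A small sketch for spotting this in OSD logs follows. `OpInfo` and `parse_osd_op` are hypothetical helpers, and the regular expressions only assume the `osd_op(<id> ... RETRY=<n> ...)` layout visible in these lines, not any documented log format.

```cpp
#include <regex>
#include <string>

// Result of scanning one log line for an osd_op message.
struct OpInfo {
  std::string op_id;  // e.g. "client.4283.0:2537"
  int retry;          // value of RETRY=<n>, or 0 when the tag is absent
  bool valid;         // false when the line contains no osd_op(...)
};

inline OpInfo parse_osd_op(const std::string& line) {
  static const std::regex op_re(R"(osd_op\((\S+) )");
  static const std::regex retry_re(R"(RETRY=(\d+))");

  OpInfo info{"", 0, false};
  std::smatch m;
  if (!std::regex_search(line, m, op_re)) {
    return info;  // not an osd_op line
  }
  info.op_id = m[1].str();
  info.valid = true;
  if (std::regex_search(line, m, retry_re)) {
    info.retry = std::stoi(m[1].str());
  }
  return info;
}
```

Two lines with the same `op_id`, one of them with `retry > 0`, indicate that the client resent the op.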

#2 Updated by Sage Weil over 9 years ago

  • Status changed from New to Fix Under Review

#3 Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved

#4 Updated by Sage Weil over 8 years ago

  • Status changed from Resolved to 12
  • Regression set to No

This can still happen if there is a connection reset and the client reconnects quickly, so the previous workaround is not sufficient.

Weaken/remove the test? Or try multiple times? Or make Objecter have a "don't resend this op" flag? :/

/a/sage-2015-08-19_19:13:53-rados-wip-sage-testing-distro-basic-multi/1023237

remote/mira012/log/ceph-osd.1.log.gz:2015-08-20 06:53:19.248960 7f82c1f27700  1 -- 10.214.131.116:6808/17657 <== client.4305 10.214.131.116:0/3022407 1 ==== osd_op(client.4305.0:3456 test-rados-api-mira012-22407-44/foo [read 0~1] 109.37ccae ack+read+known_if_redirected e614) v5 ==== 193+0+0 (3571128857 0 0) 0x7f82f0519680 con 0x7f82f0b842c0
remote/plana73/log/ceph-osd.5.log.gz:2015-08-20 06:53:19.249186 7fea69958700  1 -- 10.214.132.5:6800/8845 <== osd.1 10.214.131.116:0/17657 1 ==== osd_op(osd.1.333:311 test-rados-api-mira012-22407-44/foo [read 0~1] 90.37ccae ack+read+ignore_cache+ignore_overlay+known_if_redirected e614) v5 ==== 193+0+0 (1666144091 0 0) 0x7fea95f88c80 con 0x7fea971dc940
remote/mira012/log/ceph-osd.1.log.gz:2015-08-20 06:53:19.289014 7f82c1f27700  1 -- 10.214.131.116:6808/17657 <== client.4305 10.214.131.116:0/3022407 1 ==== osd_op(client.4305.0:3456 test-rados-api-mira012-22407-44/foo [read 0~1] 109.37ccae RETRY=1 ack+retry+read+known_if_redirected e614) v5 ==== 193+0+0 (2983510787 0 0) 0x7f82f0518280 con 0x7f82f0563fa0
remote/plana73/log/ceph-osd.5.log.gz:2015-08-20 06:53:19.289164 7fea69958700  1 -- 10.214.132.5:6800/8845 <== osd.1 10.214.131.116:0/17657 2 ==== osd_op(osd.1.333:312 test-rados-api-mira012-22407-44/foo [read 0~1] 90.37ccae ack+read+ignore_cache+ignore_overlay+known_if_redirected e614) v5 ==== 193+0+0 (1666144091 0 0) 0x7fea9607be00 con 0x7fea971dc940
remote/plana73/log/ceph-osd.5.log.gz:2015-08-20 06:53:19.289533 7fea69958700  1 -- 10.214.132.5:6800/8845 <== osd.1 10.214.131.116:0/17657 3 ==== osd_op(osd.1.333:313 test-rados-api-mira012-22407-44/foo@snapdir [list-snaps] 90.37ccae ack+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e614) v5 ==== 193+0+0 (2046055422 0 0) 0x7fea95f88a00 con 0x7fea971dc940
remote/plana73/log/ceph-osd.5.log.gz:2015-08-20 06:53:19.289639 7fea69958700  1 -- 10.214.132.5:6800/8845 <== osd.1 10.214.131.116:0/17657 4 ==== osd_op(osd.1.333:314 test-rados-api-mira012-22407-44/foo [copy-get max 8388608] 90.37ccae ack+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e614) v5 ==== 193+0+29 (1186049805 0 750716084) 0x7fea95f8a580 con 0x7fea971dc940
...

#5 Updated by Sage Weil over 8 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Sage Weil

#6 Updated by Sage Weil over 8 years ago

  • Status changed from In Progress to 7

#7 Updated by Sage Weil over 8 years ago

  • Status changed from 7 to Resolved
