
Bug #23006

repair_test.yaml fails reproducibly in jewel integration testing

Added by Nathan Cutler 11 months ago. Updated 11 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
02/15/2018
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

The failure reason looks like this: "2018-02-07 09:08:23.888224 osd.3 172.21.15.148:6808/25279 13 : cluster [ERR] 5.0 : soid 5:a0216fbc:::repair_test_obj:head size 256 != size 1 from shard 3" in cluster log

The failure is reproducible in the current 10.2.11 integration branch (wip-jewel-backports), and a similar failure appeared in a 10.2.6 integration run.

See http://tracker.ceph.com/issues/21742#note-14 for the list of PRs included in the 10.2.11 integration branch.

Logs 10.2.11:

The 10.2.6 failure text was: "2017-01-14 09:13:29.649276 osd.3 172.21.15.44:6800/865390 12 : cluster [ERR] 5.0 shard 3: soid 5:a0216fbc:::repair_test_obj:head size 1 != size 223 from auth oi 5:a0216fbc:::repair_test_obj:head(20'1 client.4352.0:1 dirty|data_digest|omap_digest s 223 uv 1 dd 9a3a59aa od ffffffff)" in cluster log

Log 10.2.6:

History

#1 Updated by Nathan Cutler 11 months ago

@David, could you have a look? Are these two failures even related?

#2 Updated by Nathan Cutler 11 months ago

  • Description updated (diff)

#3 Updated by Nathan Cutler 11 months ago

  • Description updated (diff)

#4 Updated by Nathan Cutler 11 months ago

  • Description updated (diff)

#5 Updated by Nathan Cutler 11 months ago

  • Description updated (diff)

#6 Updated by Nathan Cutler 11 months ago

  • Subject changed from repair_test.yaml sometimes fails in jewel to repair_test.yaml fails reproducibly in jewel integration testing
  • Description updated (diff)

#7 Updated by David Zafman 11 months ago

When I do what I think is an equivalent truncation of a shard on current jewel, I get a single line showing all the errors:

2018-02-15 17:39:51.305625 7fef06ed5700 10 log_client logged 2018-02-15 17:39:49.264333 osd.1 127.0.0.1:6804/71530 14 : cluster [ERR] 1.0 shard 1: soid 1:602f83fe:::foo:head data_digest 0x7c5a95e8 != data_digest 0xe850d09b from shard 0, data_digest 0x7c5a95e8 != data_digest 0xe850d09b from auth oi 1:602f83fe:::foo:head(12'36 client.4114.0:36 dirty|data_digest|omap_digest s 147300440 uv 36 dd e850d09b od ffffffff alloc_hint [0 0]), size 1 != size 147300440 from auth oi 1:602f83fe:::foo:head(12'36 client.4114.0:36 dirty|data_digest|omap_digest s 147300440 uv 36 dd e850d09b od ffffffff alloc_hint [0 0]), size 1 != size 147300440 from shard 0

In the teuthology log of one of the runs in this tracker, we see two lines. The yaml only ignores lines containing "size 1 != size":

2018-02-07T09:39:29.261 INFO:tasks.ceph.osd.3.smithi035.stderr:2018-02-07 09:39:29.263169 7f2086feb700 -1 log_channel(cluster) log [ERR] : 5.0 shard 3: soid 5:a0216fbc:::repair_test_obj:head data_digest 0x6f0a661c != data_digest 0x3d1d6f0 from auth oi 5:a0216fbc:::repair_test_obj:head(22'1 client.4257.0:1 dirty|data_digest|omap_digest s 258 uv 1 dd 3d1d6f0 od ffffffff alloc_hint [0 0]), size 1 != size 258 from auth oi 5:a0216fbc:::repair_test_obj:head(22'1 client.4257.0:1 dirty|data_digest|omap_digest s 258 uv 1 dd 3d1d6f0 od ffffffff alloc_hint [0 0])
2018-02-07T09:39:29.261 INFO:tasks.ceph.osd.3.smithi035.stderr:2018-02-07 09:39:29.263178 7f2086feb700 -1 log_channel(cluster) log [ERR] : 5.0 : soid 5:a0216fbc:::repair_test_obj:head data_digest 0x3d1d6f0 != data_digest 0x6f0a661c from shard 3, size 258 != size 1 from shard 3
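To illustrate why only one of the two lines is ignored, here is a minimal sketch of the whitelist matching. The strings are abbreviated from the log lines above, and plain substring search stands in for teuthology's pattern matching, which is an assumption made for brevity:

```python
# Existing whitelist entry from the yaml (per the comment above)
whitelist = ["size 1 != size"]

# Abbreviated versions of the two cluster-log error lines
line1 = ("5.0 shard 3: soid 5:a0216fbc:::repair_test_obj:head "
         "size 1 != size 258 from auth oi")
line2 = ("5.0 : soid 5:a0216fbc:::repair_test_obj:head "
         "size 258 != size 1 from shard 3")

def is_whitelisted(line, patterns):
    # A line is ignored if any whitelist pattern occurs in it
    return any(p in line for p in patterns)

print(is_whitelisted(line1, whitelist))  # True: line is ignored
print(is_whitelisted(line2, whitelist))  # False: this line fails the test
```

The second line reverses the operand order ("size 258 != size 1"), so the existing pattern never matches it.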

I'm not very concerned about this, but I need to look at the exact code some more.

#8 Updated by David Zafman 11 months ago

I can reproduce this with the exact sha1 (a95e2c4958b256d75ea1c732c2f2cce45f024081) you are testing. One fix is to backport 5f58301; however, that introduces another minor problem, which is fixed in pull request #20450.

I'm going to fix this by ignoring "size 258 != size" in the yaml.
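A sketch of what that fix might look like in the suite yaml, assuming the usual `log-whitelist` list consumed by the teuthology ceph task (the surrounding file layout and the existing entries shown are assumptions, not the actual repair_test.yaml contents):

```yaml
overrides:
  ceph:
    log-whitelist:
      - 'size 1 != size'      # existing entry; matches the first error line
      - 'size 258 != size'    # proposed addition; ignores the second line
```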

#9 Updated by David Zafman 11 months ago

  • Status changed from New to Need Review
  • Release set to jewel

#10 Updated by David Zafman 11 months ago

  • Status changed from Need Review to Resolved
