Project

General

Profile

Actions

Bug #12983

closed

handle_sub_read: Bad hash for 116/b1271c9d:test-rados-api-burnupi34-7183-66/foo/head

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-09-06T23:36:57.664 INFO:tasks.ceph.osd.4.plana34.stderr:2015-09-06 23:36:57.655690 7f6f4a868700 -1 log_channel(cluster) log [ERR] : handle_sub_read: Bad hash for 116/b1271c9d:test-rados-api-burnupi34-7183-66/foo/head digest
 0x56d5a3a6 expected 0xcf9b62cb
2015-09-06T23:36:57.673 INFO:tasks.ceph.osd.0.burnupi34.stderr:2015-09-06 23:36:57.674749 7f1539cb7700 -1 log_channel(cluster) log [ERR] : handle_sub_read: Bad hash for 116/b1271c9d:test-rados-api-burnupi34-7183-66/foo/head dige
st 0x56d5a3a6 expected 0xcf9b62cb
2015-09-06T23:36:57.675 INFO:tasks.ceph.osd.1.burnupi34.stderr:2015-09-06 23:36:57.675847 7f9b6ca8f700 -1 osd.1 pg_epoch: 834 pg[123.5( empty local-les=828 n=0 ec=827 les/c 828/828 827/827/827) [1,2] r=0 lpr=827 crt=0'0 mlcod 0'
0 active+clean NIBBLEWISE] finish_promote unexpected promote error (5) Input/output error
2015-09-06T23:36:57.675 INFO:tasks.workunit.client.0.burnupi34.stdout:test/librados/tier.cc:3117: Failure
2015-09-06T23:36:57.675 INFO:tasks.workunit.client.0.burnupi34.stdout:Value of: ioctx.read("foo", bl, 1, 0)
2015-09-06T23:36:57.675 INFO:tasks.workunit.client.0.burnupi34.stdout:  Actual: -5
2015-09-06T23:36:57.675 INFO:tasks.workunit.client.0.burnupi34.stdout:Expected: 1
2015-09-06T23:37:01.035 INFO:tasks.workunit.client.0.burnupi34.stdout:[  FAILED  ] LibRadosTwoPoolsECPP.EvictSnap (17607 ms)


/a/sage-2015-09-06_20:14:37-rados-wip-sage-testing---basic-multi/1045602

Related issues 2 (0 open2 closed)

Related to Ceph - Bug #12967: osd/ECBackend.cc: 201: FAILED assert(res.errors.empty())DuplicateDavid Zafman09/05/2015

Actions
Related to Ceph - Bug #13016: For testing should add config to report errors on any EC shard in error ResolvedDavid Zafman09/09/2015

Actions
Actions #1

Updated by Loïc Dachary over 8 years ago

This job run is mostly to convince myself that it runs ok on infernalis. It can be killed if it is known to fail.

filter='rados/thrash/{hobj-sort.yaml 0-size-min-size-overrides/2-size-1-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml clusters/fixed-2.yaml fs/ext4.yaml msgr/async.yaml msgr-failures/fastclose.yaml thrashers/default.yaml workloads/rados_api_tests.yaml}'
./virtualenv/bin/teuthology-suite --priority 101 --suite rados --filter="$filter" --suite-branch infernalis --distro ubuntu --email loic@dachary.org --ceph infernalis --machine-type plana,burnupi,mira
teuthology-openstack --throttle 5 --verbose --simultaneous-jobs 50 --key-name myself --key-filename ~/Downloads/myself --priority 101 --suite rados --filter="$filter" --suite-branch infernalis --distro ubuntu --email loic@dachary.org --ceph infernalis
Actions #2

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #3

Updated by Loïc Dachary over 8 years ago

Difference between the wip-sage-testing and infernalis

$ git --no-pager log --stat --oneline --graph ceph/infernalis..2d7e4803038ccac959659ebada301d78e77bb0b1
* 2d7e480 Merge remote-tracking branch 'gh/master' into wip-sage-testing
*   e01c90c Merge branch 'wip-12861' of https://github.com/liewegas/ceph into wip-sage-testing
|\  
| * 64962aa qa/workunits/rados/test_alloc_hint.sh: sudo to list files
|    qa/workunits/rados/test_alloc_hint.sh | 2 +-
|    1 file changed, 1 insertion(+), 1 deletion(-)
*   b63e308 Merge branch 'tmap-update' of git://github.com/wonzhq/ceph into wip-sage-testing
|\  
| * f5bf2c5 osd: force promote for ops which ec base pool can't handle
|    src/osd/OSD.cc | 58 +++++++++++++++++++++++++++++++++++++++++-----------------
|    1 file changed, 41 insertions(+), 17 deletions(-)
*   430ce9d Merge branch 'wip-12912' of git://github.com/yuyuyu101/ceph into wip-sage-testing
|\  
| * 9cc1055 AsyncConnection: Close connection when unregistered connection met WAIT
|    src/msg/async/AsyncConnection.cc | 4 ++++
|    1 file changed, 4 insertions(+)
* 80636ba Merge branch 'wip-scrub-promote' of git://github.com/XinzeChi/ceph into wip-sage-testing
* f77949f bug fix: osd: requeue_scrub when kick_object_context_blocked
   src/osd/ReplicatedPG.cc | 15 +++++++--------
   1 file changed, 7 insertions(+), 8 deletions(-)

Actions #4

Updated by Sage Weil over 8 years ago

  • Assignee set to David Zafman
Actions #5

Updated by Sage Weil over 8 years ago

/a/sage-2015-09-07_18:07:23-rados-wip-sage-testing---basic-multi/1046245

Actions #6

Updated by David Zafman over 8 years ago

/a/sage-2015-09-07_18:07:23-rados-wip-sage-testing---basic-multi/1046245

2015-09-07 22:57:03.865461 7f163e0e4700 -1 log_channel(cluster) log [ERR] : handle_sub_read: Bad hash for 116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head digest 0x56d5a3a6 expected 0xcf9b62cb
2015-09-07 22:57:03.863327 7fd18d474700 -1 log_channel(cluster) log [ERR] : handle_sub_read: Bad hash for 116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head digest 0x56d5a3a6 expected 0xcf9b62cb

This logging shows EIO from a 2 out of 3 shards. errors={2(2)=-5,3(0)=-5} due to bad hashes.

It is interesting that the expected digest is the same for both shards and that the computed digest of both shards are the same.

Actions #7

Updated by Sage Weil over 8 years ago

/a/sage-2015-09-15_18:56:52-rados-wip-sage-testing---basic-multi/1058556

Actions #8

Updated by David Zafman over 8 years ago

Compared to a simple run of just the single test case there is an extra setattr. In addition we can see the extra sequencer op count 14530.1.8.

2015-09-07 22:57:00.390817 7fd194be7700 15 filestore(/var/lib/ceph/osd/ceph-3) setattrs 116.2s0_head/0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head
2015-09-07 22:57:00.390882 7fd194be7700 10 filestore(/var/lib/ceph/osd/ceph-3) setattrs 116.2s0_head/0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head = 0

2015-09-07 22:57:00.390922 7fd194be7700 20 filestore(/var/lib/ceph/osd/ceph-3) fgetattrs 164 getting 'hinfo_key'
2015-09-07 22:57:00.390937 7fd194be7700 15 filestore(/var/lib/ceph/osd/ceph-3) setattrs 116.2s0_head/0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head
2015-09-07 22:57:00.391213 7fd194be7700 20 filestore dbobjectmap: seq is 1779
2015-09-07 22:57:00.391308 7fd194be7700 20 filestore set_map_header: setting 1778 oid 0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head parent seq 0
2015-09-07 22:57:00.391341 7fd194be7700 10 filestore oid: 0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head not skipping op, *spos 14530.1.6
2015-09-07 22:57:00.391344 7fd194be7700 10 filestore  > header.spos 0.0.0
2015-09-07 22:57:00.391418 7fd194be7700 10 filestore(/var/lib/ceph/osd/ceph-3) setattrs 116.2s0_head/0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head = 0

2015-09-07 22:57:00.391462 7fd194be7700 20 filestore(/var/lib/ceph/osd/ceph-3) fgetattrs 164 getting '_'
2015-09-07 22:57:00.391477 7fd194be7700 20 filestore(/var/lib/ceph/osd/ceph-3) fgetattrs 164 getting 'hinfo_key'
2015-09-07 22:57:00.391490 7fd194be7700 15 filestore(/var/lib/ceph/osd/ceph-3) setattrs 116.2s0_head/0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head
2015-09-07 22:57:00.391572 7fd194be7700 10 filestore oid: 0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head not skipping op, *spos 14530.1.7
2015-09-07 22:57:00.391576 7fd194be7700 10 filestore  > header.spos 0.0.0
2015-09-07 22:57:00.391609 7fd194be7700 10 filestore(/var/lib/ceph/osd/ceph-3) setattrs 116.2s0_head/0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head = 0

Extra setattr:

2015-09-07 22:57:00.391648 7fd194be7700 20 filestore(/var/lib/ceph/osd/ceph-3) fgetattrs 164 getting '_'
2015-09-07 22:57:00.391664 7fd194be7700 20 filestore(/var/lib/ceph/osd/ceph-3) fgetattrs 164 getting 'hinfo_key'
2015-09-07 22:57:00.391676 7fd194be7700 15 filestore(/var/lib/ceph/osd/ceph-3) setattrs 116.2s0_head/0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head
2015-09-07 22:57:00.391687 7fd194be7700 10 filestore oid: 0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head not skipping op, *spos 14530.1.8
2015-09-07 22:57:00.391689 7fd194be7700 10 filestore  > header.spos 0.0.0
2015-09-07 22:57:00.391723 7fd194be7700 10 filestore(/var/lib/ceph/osd/ceph-3) setattrs 116.2s0_head/0:116/eb77ba0a:test-rados-api-mira033-22974-66/bam/head = 0
2015-09-07 22:57:00.391730 7fd194be7700 10 journal op_apply_finish 14530 open_ops 1 -> 0, max_applied_seq 14529 -> 14530
2015-09-07 22:57:00.391732 7fd194be7700 10 filestore(/var/lib/ceph/osd/ceph-3) _do_op 0x55d2a64d20f0 seq 14530 r = 0, finisher 0x55d2a5fd23a0 0
2015-09-07 22:57:00.391735 7fd194be7700 10 filestore(/var/lib/ceph/osd/ceph-3) _finish_op 0x55d2a64d20f0 seq 14530 osr(116.2s0 0x55d2a4ff3120)/0x55d2a4ff3120 lat 0.386367
Actions #9

Updated by David Zafman over 8 years ago

  • Status changed from New to 12
Actions #10

Updated by Sage Weil over 8 years ago

The 'Bad hash' message triggered twice on my wip-sage-testing run that included the wip-12983 branch:

/a/sage-2015-09-24_20:43:22-rados-wip-sage-testing---basic-multi/1067946
/a/sage-2015-09-24_20:43:22-rados-wip-sage-testing---basic-multi/1068002

Actions #11

Updated by Samuel Just over 8 years ago

Harumph, /a/sage-2015-09-24_20:43:22-rados-wip-sage-testing---basic-multi/1068002 has no recovery. This should be fairly straightforward, looking...

Actions #12

Updated by Samuel Just over 8 years ago

2015-09-24 22:48:12.489539 7ff05c984700 1 -- 10.214.132.108:6808/25735 <== client.4345 10.214.132.108:0/2866085996 11 ==== osd_op(client.4345.0:26 test-rados-api-mira034-28047-42/foo [read 0~128,read 0~128] 76.81ece4b2 ack+read+known_if_redirected e328) v6 ==== 258+0+0 (87309738 0 0) 0x7ff08a0bedc0 con 0x7ff08a3e3080

I think sage's failures are a bug in the first patch in david's branch, I'll make a new version of that commit.

Actions #13

Updated by Sage Weil over 8 years ago

/var/lib/teuthworker/archive/sage-2015-09-28_08:14:59-rados-wip-sage-testing---basic-multi/1073896

another build that includes whatever is referenced by the PR

Actions #14

Updated by Samuel Just over 8 years ago

  • Status changed from 12 to 7
Actions #15

Updated by Sage Weil over 8 years ago

  • Status changed from 7 to Resolved
Actions

Also available in: Atom PDF