Bug #12967

closed

osd/ECBackend.cc: 201: FAILED assert(res.errors.empty())

Added by Sage Weil over 8 years ago. Updated almost 8 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-09-04T20:59:51.434 INFO:tasks.ceph.osd.4.burnupi56.stderr:2015-09-04 20:59:51.434841 7feaf9dd4700 -1 log_channel(cluster) log [ERR] : handle_sub_read: Bad hash for 116/936bc794:test-rados-api-burnupi49-12862-66/foo/head digest 0x56d5a3a6 expected 0xcf9b62cb
2015-09-04T20:59:51.438 INFO:tasks.ceph.osd.2.burnupi49.stderr:osd/ECBackend.cc: In function 'virtual void OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f3f33263700 time 2015-09-04 20:59:51.435764
2015-09-04T20:59:51.438 INFO:tasks.ceph.osd.2.burnupi49.stderr:osd/ECBackend.cc: 201: FAILED assert(res.errors.empty())
2015-09-04T20:59:51.438 INFO:tasks.ceph.osd.2.burnupi49.stderr: ceph version 9.0.3-1372-g210a4b7 (210a4b7dbb0f914d45e2c194a3ec683782f37c4f)
2015-09-04T20:59:51.439 INFO:tasks.ceph.osd.2.burnupi49.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f3f48f5821b]
2015-09-04T20:59:51.439 INFO:tasks.ceph.osd.2.burnupi49.stderr: 2: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x192) [0x7f3f48dc8e82]
2015-09-04T20:59:51.439 INFO:tasks.ceph.osd.2.burnupi49.stderr: 3: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x7f3f48db6a59]
2015-09-04T20:59:51.439 INFO:tasks.ceph.osd.2.burnupi49.stderr: 4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x63) [0x7f3f48dacff3]
2015-09-04T20:59:51.440 INFO:tasks.ceph.osd.2.burnupi49.stderr: 5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xe71) [0x7f3f48daded1]
2015-09-04T20:59:51.440 INFO:tasks.ceph.osd.2.burnupi49.stderr: 6: (ECBackend::handle_message(std::shared_ptr<OpRequest>)+0x206) [0x7f3f48db4fd6]
2015-09-04T20:59:51.440 INFO:tasks.ceph.osd.2.burnupi49.stderr: 7: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x167) [0x7f3f48b58c27]
2015-09-04T20:59:51.440 INFO:tasks.ceph.osd.2.burnupi49.stderr: 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x7f3f489bb69d]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 9: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7f3f489bb8bd]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8c4) [0x7f3f489e0444]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x7f3f48f48c3f]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f3f48f4ab40]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 13: (()+0x8182) [0x7f3f47312182]
2015-09-04T20:59:51.442 INFO:tasks.ceph.osd.2.burnupi49.stderr: 14: (clone()+0x6d) [0x7f3f4565a38d]

/a/sage-2015-09-04_11:01:27-rados-wip-sage-testing---basic-multi/1044108
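
For readers unfamiliar with this code path, below is a minimal, self-contained sketch of the failure mode described above. It is not the actual Ceph implementation: compute_digest, handle_sub_read, and on_recovery_read_complete are simplified stand-ins. It only illustrates how a per-shard digest mismatch detected during a sub-read becomes a recorded read error, which the recovery-read completion callback then refuses to tolerate, tripping assert(res.errors.empty()).

// Hypothetical illustration only, not Ceph code.
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

using shard_id_t = int;

struct read_result_t {
  std::map<shard_id_t, int> errors;                 // shard -> errno, empty on success
  std::map<shard_id_t, std::string> returned_data;  // shard -> payload
};

// Stand-in digest helper; the real OSD keeps a CRC digest in a hash-info xattr.
uint32_t compute_digest(const std::string& data) {
  uint32_t h = 2166136261u;                         // simple FNV-1a stand-in
  for (unsigned char c : data) { h ^= c; h *= 16777619u; }
  return h;
}

// Models the check in handle_sub_read(): verify the data read from the shard
// against the expected digest; on mismatch, record an EIO-style error.
void handle_sub_read(shard_id_t shard, const std::string& data,
                     uint32_t expected_digest, read_result_t& res) {
  uint32_t actual = compute_digest(data);
  if (actual != expected_digest) {
    std::cerr << "handle_sub_read: Bad hash, digest 0x" << std::hex << actual
              << " expected 0x" << expected_digest << std::dec << "\n";
    res.errors[shard] = -5;  // EIO
    return;
  }
  res.returned_data[shard] = data;
}

// Models OnRecoveryReadComplete::finish(): recovery reads assume any shard
// error was already handled upstream, so the error set must be empty.
void on_recovery_read_complete(read_result_t& res) {
  assert(res.errors.empty());  // the assert that fires at ECBackend.cc:201
  std::cout << "recovered from " << res.returned_data.size() << " shard(s)\n";
}

int main() {
  read_result_t res;
  uint32_t expected = compute_digest("original shard contents");

  handle_sub_read(0, "corrupted shard contents", expected, res);  // mismatch recorded
  on_recovery_read_complete(res);                                 // aborts on the assert
  return 0;
}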

Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #12983: handle_sub_read: Bad hash for 116/b1271c9d:test-rados-api-burnupi34-7183-66/foo/head (Resolved, David Zafman, 09/07/2015)

Actions #1

Updated by Sage Weil over 8 years ago

  • Status changed from New to Duplicate
Actions #2

Updated by Aaron T about 8 years ago

I'm encountering an issue that looks very similar. Four OSDs are crashing almost immediately after startup, both with 9.2.0 and 9.2.1. Logfiles attached with 'debug osd = 20/20'.

Actions #3

Updated by Aaron T about 8 years ago

Aaron Ten Clay wrote:

I'm encountering an issue that looks very similar. Four OSDs are crashing almost immediately after startup, both with 9.2.0 and 9.2.1. Logfiles attached with 'debug osd = 20/20'.

Attaching to this bug per ddn in #ceph-devel - file uploads seem to have failed. External links:
https://aarontc.com/ceph/12967/ceph-osd.2.log.bz2
https://aarontc.com/ceph/12967/ceph-osd.5.log.bz2
https://aarontc.com/ceph/12967/ceph-osd.22.log.bz2
https://aarontc.com/ceph/12967/ceph-osd.33.log.bz2

Actions #4

Updated by Shinobu Kinjo about 8 years ago

Hm, not sure what's best - the "duplicated" bug mentions error messages I'm not seeing (handle_sub_read: Bash hash)

s/Bash/Bad

Actions #5

Updated by Aaron T about 8 years ago

Shinobu Kinjo wrote:

Hm, not sure what's best - the "duplicated" bug mentions error messages I'm not seeing (handle_sub_read: Bash hash)

s/Bash/Bad

Just to confirm, there are no occurrences of "handle_sub_read: Bad hash for" in any of the log files, nor any occurrences of the word "Bad".

Grepping for 'handle_sub_read', I do see some messages that might be helpful, e.g.

ceph-osd.5.log.bz2:2016-02-28 00:50:44.610637 7f3fa3fef700 20 osd.5 pg_epoch: 61307 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61307 n=88761 ec=136 les/c/f 61307/49243/50004 61305/61305/61303) [22,2,33,5,9,10,36,35,17,23,7,11,14,13]/[22,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61305 pi=136-61304/13500 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read read request=6656 r=6656 len=6656
ceph-osd.5.log.bz2:2016-02-28 00:50:44.861750 7f3fa17ea700 -1 log_channel(cluster) log [ERR] : handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:50:44.861757 7f3fa17ea700  5 osd.5 pg_epoch: 61307 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61307 n=88761 ec=136 les/c/f 61307/49243/50004 61305/61305/61303) [22,2,33,5,9,10,36,35,17,23,7,11,14,13]/[22,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61305 pi=136-61304/13500 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:50:46.393875 7f3fa3fef700 20 osd.5 pg_epoch: 61307 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61307 n=88761 ec=136 les/c/f 61307/49243/50004 61305/61305/61303) [22,2,33,5,9,10,36,35,17,23,7,11,14,13]/[22,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61305 pi=136-61304/13500 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read read request=419744 r=419744 len=419744
ceph-osd.5.log.bz2:2016-02-28 00:50:46.393903 7f3fa3fef700 20 osd.5 pg_epoch: 61307 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61307 n=88761 ec=136 les/c/f 61307/49243/50004 61305/61305/61303) [22,2,33,5,9,10,36,35,17,23,7,11,14,13]/[22,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61305 pi=136-61304/13500 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: Checking hash of 2/b9127495/1000006dd39.000000f1/head
ceph-osd.5.log.bz2:2016-02-28 00:51:48.492283 7f3fa17ea700 20 osd.5 pg_epoch: 61309 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61309 n=88761 ec=136 les/c/f 61309/49243/50004 61308/61308/61308) [2147483647,2,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61308 pi=136-61307/13501 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read read request=419744 r=419744 len=419744
ceph-osd.5.log.bz2:2016-02-28 00:51:48.492301 7f3fa17ea700 20 osd.5 pg_epoch: 61309 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61309 n=88761 ec=136 les/c/f 61309/49243/50004 61308/61308/61308) [2147483647,2,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61308 pi=136-61307/13501 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: Checking hash of 2/b9127495/1000006dd39.000000f1/head
ceph-osd.5.log.bz2:2016-02-28 00:51:48.580278 7f3fa3fef700 -1 log_channel(cluster) log [ERR] : handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:51:48.580281 7f3fa3fef700  5 osd.5 pg_epoch: 61309 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61309 n=88761 ec=136 les/c/f 61309/49243/50004 61308/61308/61308) [2147483647,2,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61308 pi=136-61307/13501 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:52:50.354423 7f3fa17ea700 -1 log_channel(cluster) log [ERR] : handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:52:50.354432 7f3fa17ea700  5 osd.5 pg_epoch: 61311 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61311 n=88761 ec=136 les/c/f 61311/49243/50004 61310/61310/61310) [2147483647,2147483647,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2147483647,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61310 pi=136-61309/13502 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:52:50.354443 7f3fa17ea700 10 osd.5 pg_epoch: 61311 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61311 n=88761 ec=136 les/c/f 61311/49243/50004 61310/61310/61310) [2147483647,2147483647,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2147483647,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61310 pi=136-61309/13502 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: fulfilling attr request on 2/ab64d095/100000578e9.00001458/head
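
For context, these logs show a different trigger than the original report: instead of a digest mismatch, the hash-info ("hinfo") attribute is missing entirely, so the shard cannot be verified at all. Below is a minimal hypothetical sketch (not actual Ceph code; get_hash_info and the error map are simplified stand-ins) of how such a missing attribute would feed the same assert during recovery.

// Hypothetical illustration only, not Ceph code.
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <string>

struct hash_info_t { uint32_t digest = 0; };   // stand-in for the stored per-shard digest

// Stand-in for reading the hinfo attribute from the object store; returning
// nullopt simulates the "handle_sub_read: No hinfo for <object>" case above.
std::optional<hash_info_t> get_hash_info(const std::string& object) {
  return std::nullopt;
}

int main() {
  std::map<int, int> errors;                   // shard -> errno, as in read_result_t
  const std::string object = "2/ab64d095/100000578e9.00001458/head";

  if (auto hinfo = get_hash_info(object); !hinfo) {
    std::cerr << "handle_sub_read: No hinfo for " << object << "\n";
    errors[3] = -5;                            // EIO recorded for this shard
  }

  // In the recovery completion path, any recorded shard error aborts the OSD.
  assert(errors.empty());
  return 0;
}
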
Actions #6

Updated by Mustafa Muhammad almost 8 years ago

I am having this issue on 10.2.0
