Bug #12967
osd/ECBackend.cc: 201: FAILED assert(res.errors.empty())
Status: Closed
Description
2015-09-04T20:59:51.434 INFO:tasks.ceph.osd.4.burnupi56.stderr:2015-09-04 20:59:51.434841 7feaf9dd4700 -1 log_channel(cluster) log [ERR] : handle_sub_read: Bad hash for 116/936bc794:test-rados-api-burnupi49-12862-66/foo/head digest 0x56d5a3a6 expected 0xcf9b62cb
2015-09-04T20:59:51.438 INFO:tasks.ceph.osd.2.burnupi49.stderr:osd/ECBackend.cc: In function 'virtual void OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f3f33263700 time 2015-09-04 20:59:51.435764
2015-09-04T20:59:51.438 INFO:tasks.ceph.osd.2.burnupi49.stderr:osd/ECBackend.cc: 201: FAILED assert(res.errors.empty())
2015-09-04T20:59:51.438 INFO:tasks.ceph.osd.2.burnupi49.stderr: ceph version 9.0.3-1372-g210a4b7 (210a4b7dbb0f914d45e2c194a3ec683782f37c4f)
2015-09-04T20:59:51.439 INFO:tasks.ceph.osd.2.burnupi49.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f3f48f5821b]
2015-09-04T20:59:51.439 INFO:tasks.ceph.osd.2.burnupi49.stderr: 2: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x192) [0x7f3f48dc8e82]
2015-09-04T20:59:51.439 INFO:tasks.ceph.osd.2.burnupi49.stderr: 3: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x7f3f48db6a59]
2015-09-04T20:59:51.439 INFO:tasks.ceph.osd.2.burnupi49.stderr: 4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x63) [0x7f3f48dacff3]
2015-09-04T20:59:51.440 INFO:tasks.ceph.osd.2.burnupi49.stderr: 5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xe71) [0x7f3f48daded1]
2015-09-04T20:59:51.440 INFO:tasks.ceph.osd.2.burnupi49.stderr: 6: (ECBackend::handle_message(std::shared_ptr<OpRequest>)+0x206) [0x7f3f48db4fd6]
2015-09-04T20:59:51.440 INFO:tasks.ceph.osd.2.burnupi49.stderr: 7: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x167) [0x7f3f48b58c27]
2015-09-04T20:59:51.440 INFO:tasks.ceph.osd.2.burnupi49.stderr: 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x7f3f489bb69d]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 9: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7f3f489bb8bd]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8c4) [0x7f3f489e0444]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x7f3f48f48c3f]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f3f48f4ab40]
2015-09-04T20:59:51.441 INFO:tasks.ceph.osd.2.burnupi49.stderr: 13: (()+0x8182) [0x7f3f47312182]
2015-09-04T20:59:51.442 INFO:tasks.ceph.osd.2.burnupi49.stderr: 14: (clone()+0x6d) [0x7f3f4565a38d]
/a/sage-2015-09-04_11:01:27-rados-wip-sage-testing---basic-multi/1044108
Updated by Aaron T about 8 years ago
I'm encountering an issue that looks very similar. Four OSDs are crashing almost immediately after startup, both with 9.2.0 and 9.2.1. Logfiles attached with 'debug osd = 20/20'.
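For anyone reproducing that logging level: the 'debug osd = 20/20' setting mentioned above corresponds to a ceph.conf fragment along these lines (a sketch; placing it under the [osd] section is the conventional spot):

```ini
[osd]
    # verbose OSD logging: memory-log level / file-log level
    debug osd = 20/20
```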
Updated by Aaron T about 8 years ago
Aaron Ten Clay wrote:
I'm encountering an issue that looks very similar. Four OSDs are crashing almost immediately after startup, both with 9.2.0 and 9.2.1. Logfiles attached with 'debug osd = 20/20'.
Attaching to this bug per ddn in #ceph-devel - file uploads seem to have failed. External links:
https://aarontc.com/ceph/12967/ceph-osd.2.log.bz2
https://aarontc.com/ceph/12967/ceph-osd.5.log.bz2
https://aarontc.com/ceph/12967/ceph-osd.22.log.bz2
https://aarontc.com/ceph/12967/ceph-osd.33.log.bz2
Updated by Shinobu Kinjo about 8 years ago
Hm, not sure what's best - the "duplicated" bug mentions error messages I'm not seeing (handle_sub_read: Bash hash)
s/Bash/Bad
Updated by Aaron T about 8 years ago
Shinobu Kinjo wrote:
Hm, not sure what's best - the "duplicated" bug mentions error messages I'm not seeing (handle_sub_read: Bash hash)
s/Bash/Bad
Just to confirm, there are no occurrences of "handle_sub_read: Bad hash for" in any of the log files. Nor any occurrences of the word "Bad".
Grepping for 'handle_sub_read' I do see some messages that might be helpful, e.g.:
ceph-osd.5.log.bz2:2016-02-28 00:50:44.610637 7f3fa3fef700 20 osd.5 pg_epoch: 61307 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61307 n=88761 ec=136 les/c/f 61307/49243/50004 61305/61305/61303) [22,2,33,5,9,10,36,35,17,23,7,11,14,13]/[22,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61305 pi=136-61304/13500 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read read request=6656 r=6656 len=6656
ceph-osd.5.log.bz2:2016-02-28 00:50:44.861750 7f3fa17ea700 -1 log_channel(cluster) log [ERR] : handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:50:44.861757 7f3fa17ea700 5 osd.5 pg_epoch: 61307 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61307 n=88761 ec=136 les/c/f 61307/49243/50004 61305/61305/61303) [22,2,33,5,9,10,36,35,17,23,7,11,14,13]/[22,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61305 pi=136-61304/13500 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:50:46.393875 7f3fa3fef700 20 osd.5 pg_epoch: 61307 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61307 n=88761 ec=136 les/c/f 61307/49243/50004 61305/61305/61303) [22,2,33,5,9,10,36,35,17,23,7,11,14,13]/[22,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61305 pi=136-61304/13500 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read read request=419744 r=419744 len=419744
ceph-osd.5.log.bz2:2016-02-28 00:50:46.393903 7f3fa3fef700 20 osd.5 pg_epoch: 61307 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61307 n=88761 ec=136 les/c/f 61307/49243/50004 61305/61305/61303) [22,2,33,5,9,10,36,35,17,23,7,11,14,13]/[22,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61305 pi=136-61304/13500 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: Checking hash of 2/b9127495/1000006dd39.000000f1/head
ceph-osd.5.log.bz2:2016-02-28 00:51:48.492283 7f3fa17ea700 20 osd.5 pg_epoch: 61309 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61309 n=88761 ec=136 les/c/f 61309/49243/50004 61308/61308/61308) [2147483647,2,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61308 pi=136-61307/13501 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read read request=419744 r=419744 len=419744
ceph-osd.5.log.bz2:2016-02-28 00:51:48.492301 7f3fa17ea700 20 osd.5 pg_epoch: 61309 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61309 n=88761 ec=136 les/c/f 61309/49243/50004 61308/61308/61308) [2147483647,2,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61308 pi=136-61307/13501 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: Checking hash of 2/b9127495/1000006dd39.000000f1/head
ceph-osd.5.log.bz2:2016-02-28 00:51:48.580278 7f3fa3fef700 -1 log_channel(cluster) log [ERR] : handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:51:48.580281 7f3fa3fef700 5 osd.5 pg_epoch: 61309 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61309 n=88761 ec=136 les/c/f 61309/49243/50004 61308/61308/61308) [2147483647,2,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61308 pi=136-61307/13501 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:52:50.354423 7f3fa17ea700 -1 log_channel(cluster) log [ERR] : handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:52:50.354432 7f3fa17ea700 5 osd.5 pg_epoch: 61311 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61311 n=88761 ec=136 les/c/f 61311/49243/50004 61310/61310/61310) [2147483647,2147483647,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2147483647,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61310 pi=136-61309/13502 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head
ceph-osd.5.log.bz2:2016-02-28 00:52:50.354443 7f3fa17ea700 10 osd.5 pg_epoch: 61311 pg[2.15s3( v 54651'119961 (51651'109862,54651'119961] local-les=61311 n=88761 ec=136 les/c/f 61311/49243/50004 61310/61310/61310) [2147483647,2147483647,33,5,9,10,36,35,17,23,7,11,14,13]/[2147483647,2147483647,33,5,34,10,16,11,17,23,7,2147483647,14,13] r=3 lpr=61310 pi=136-61309/13502 luod=0'0 crt=54325'119953 lcod 0'0 active+remapped+inconsistent NIBBLEWISE] handle_sub_read: fulfilling attr request on 2/ab64d095/100000578e9.00001458/head
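As a side note, the compressed logs can be searched in place with bzgrep instead of decompressing them first. A minimal sketch; the file path and log lines below are fabricated stand-ins for the attached ceph-osd.*.log.bz2 files:

```shell
# Illustrative only: recreate a tiny compressed log, then search it the
# same way one would search the attached ceph-osd.*.log.bz2 files.
printf '%s\n' \
  'handle_sub_read: No hinfo for 2/ab64d095/100000578e9.00001458/head' \
  'handle_sub_read read request=6656 r=6656 len=6656' \
  | bzip2 > /tmp/ceph-osd.example.log.bz2
# bzgrep reads bzip2-compressed files without a separate decompress step;
# -c counts matching lines (drop -c to see the messages themselves)
bzgrep -c 'No hinfo' /tmp/ceph-osd.example.log.bz2
```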