Bug #13937

closed

osd/ECBackend.cc: 201: FAILED assert(res.errors.empty())

Added by Markus Blank-Burian over 8 years ago. Updated about 7 years ago.

Status: Resolved
Priority: High
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I am getting the following error with ceph v9.2.0 on two different OSDs. The error occurs shortly after startup.

 osd/ECBackend.cc: 201: FAILED assert(res.errors.empty())

 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55b5305f5e73]
 2: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0xe3) [0x55b5304987af]
 3: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x59) [0x55b530484ad1]
 4: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0x1008) [0x55b530485ea6]
 5: (ECBackend::handle_message(std::shared_ptr<OpRequest>)+0x216) [0x55b530489c66]
 6: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x1c0) [0x55b53029ea4e]
 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x40c) [0x55b5300ffc1e]
 8: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x52) [0x55b5300ffe52]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x61f) [0x55b5301172c7]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x7ae) [0x55b5305e3b16]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b5305e7060]
 12: (Thread::entry_wrapper()+0x64) [0x55b5305d7c2c]
 13: (()+0x7176) [0x7f4cc5cdd176]
 14: (clone()+0x6d) [0x7f4cc3fb149d]
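
The trace above ends in OnRecoveryReadComplete::finish, where the assert requires that no shard read returned an error. The following toy model is a sketch only, not Ceph code: it contrasts that strict check with a hypothetical tolerant one that gives up only when fewer than k readable shards remain. The function names, return values, and the retry policy are assumptions made for illustration; k=8 and m=3 come from the "8+3 jerasure" pool in this report.

```python
K, M = 8, 3  # from the "8+3 jerasure" pool described in this report


def finish_strict(errors):
    """Mirror the shape of the failing check: any shard error aborts."""
    # Corresponds in spirit to assert(res.errors.empty()) in the trace.
    assert not errors, "res.errors.empty()"
    return "decode"


def finish_tolerant(read_ok, untried):
    """Hypothetical alternative: abort only if k good shards are unreachable.

    read_ok -- set of shard ids whose read succeeded
    untried -- set of shard ids that have not been read yet
    """
    if len(read_ok) >= K:
        return "decode"          # enough shards to reconstruct the object
    if len(read_ok) + len(untried) >= K:
        return "retry"           # read the remaining shards before giving up
    return "unrecoverable"       # fewer than k shards can ever be read


if __name__ == "__main__":
    # One of the 8 shards chosen for the recovery read returned an error.
    good = {0, 1, 2, 4, 5, 6, 7}
    print(finish_tolerant(good, untried={8, 9, 10}))
    try:
        finish_strict(errors={3})
    except AssertionError as exc:
        print("strict check failed:", exc)
```

In this model a single bad shard takes down the strict path, which matches the symptom described in the duplicate bug #13493, while the tolerant path still has three untried shards to fall back on.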

Configuration is rather simple:
3x mirror ("data")
3x mirror ("datahot", cache-tier for "dataec")
8+3 jerasure ("dataec")

pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 72574 flags hashpspool crash_replay_interval 45 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 1 'metadata' replicated size 4 min_size 2 crush_ruleset 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 72577 flags hashpspool min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 31 flags hashpspool min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
pool 5 'datahot' replicated size 3 min_size 2 crush_ruleset 3 object_hash rjenkins pg_num 128 pgp_num 128 last_change 185334 flags hashpspool,incomplete_clones tier_of 6 cache_mode writeback target_bytes 1099511627776 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0
pool 6 'dataec' erasure size 11 min_size 8 crush_ruleset 4 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 185334 lfor 185334 flags hashpspool tiers 5 read_tier 5 write_tier 5 stripe_width 131072

data and datahot are used as data-pools for cephfs. All pools share the same OSDs.
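
As a quick cross-check of the pool dump above, the sketch below parses two of the pool lines (shortened to the fields it uses) and compares the 'dataec' pool against the stated 8+3 profile. The helper names parse_pool and ec_geometry are hypothetical, and the per-shard figure assumes that stripe_width covers the k data chunks of one stripe.

```python
import re

# Two lines copied from the pool dump above, shortened to the fields used here.
POOL_LINES = [
    "pool 5 'datahot' replicated size 3 min_size 2 crush_ruleset 3 pg_num 128 stripe_width 0",
    "pool 6 'dataec' erasure size 11 min_size 8 crush_ruleset 4 pg_num 1024 stripe_width 131072",
]

POOL_RE = re.compile(
    r"pool (?P<id>\d+) '(?P<name>[^']+)' (?P<type>replicated|erasure) "
    r"size (?P<size>\d+) min_size (?P<min_size>\d+).*?stripe_width (?P<stripe_width>\d+)"
)


def parse_pool(line):
    """Extract the id, name, type, size, min_size and stripe_width of a pool line."""
    m = POOL_RE.match(line)
    if m is None:
        raise ValueError("unrecognised pool line: " + line)
    d = m.groupdict()
    return {
        "id": int(d["id"]),
        "name": d["name"],
        "type": d["type"],
        "size": int(d["size"]),
        "min_size": int(d["min_size"]),
        "stripe_width": int(d["stripe_width"]),
    }


def ec_geometry(pool, k, m):
    """Check a k+m profile against an erasure pool and derive per-shard figures."""
    if pool["type"] != "erasure":
        raise ValueError(pool["name"] + " is not an erasure-coded pool")
    if pool["size"] != k + m:
        raise ValueError("pool size does not equal k + m")
    return {
        "tolerated_shard_losses": pool["size"] - k,
        # Assumption: stripe_width spans the k data chunks of one stripe.
        "bytes_per_shard_per_stripe": pool["stripe_width"] // k,
    }


if __name__ == "__main__":
    dataec = parse_pool(POOL_LINES[1])
    print(dataec["name"], ec_geometry(dataec, k=8, m=3))
```

With size 11 and k=8, the pool can lose up to three shards of an object and still reconstruct it, and min_size 8 equals k, so a PG may stay active with no redundancy left.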


Files

logs.zip.001 (1 MB) Markus Blank-Burian, 12/01/2015 04:00 PM
logs.zip.002 (1 MB) Markus Blank-Burian, 12/01/2015 04:00 PM
logs.zip.003 (1 MB) Markus Blank-Burian, 12/01/2015 04:00 PM
logs.zip.004 (1 MB) Markus Blank-Burian, 12/01/2015 04:00 PM
logs.zip.005 (503 KB) Markus Blank-Burian, 12/01/2015 04:00 PM

Related issues 3 (0 open, 3 closed)

Related to Ceph - Feature #14513: Test and improve ec handling of reads on objects with shards unexpectedly missing on a replica (Resolved, David Zafman, 01/26/2016)
Has duplicate RADOS - Bug #13493: osd: for ec, cascading crash during recovery if one shard is corrupted (Duplicate, David Zafman, 10/15/2015)
Copied to Ceph - Backport #17970: jewel: osd/ECBackend.cc: 201: FAILED assert(res.errors.empty()) (Resolved, David Zafman)
