Bug #9150 (closed)

osd/ECBackend.cc: 529: FAILED assert(pop.data.length() == sinfo.aligned_logical_offset_to_chunk_offset( after_progress.data_recovered_to - op.recovery_progress.data_recovered_to))

Added by Sage Weil over 9 years ago. Updated about 9 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -26> 2014-08-14 21:54:50.429079 7f232f5f6700  5 -- op tracker -- , seq: 35842, time: 2014-08-14 21:54:50.429075, event: reached_pg, op: MOSDECSubOpReadReply(1.ds0 616 ECSubReadReply(tid=2536, attrs_read=0))
   -25> 2014-08-14 21:54:50.429087 7f232f5f6700 10 osd.3 pg_epoch: 616 pg[1.ds0( v 613'272 lc 613'271 (0'0,613'272] local-les=616 n=2 ec=7 les/c 616/593 614/615/615) [2147483647,4,5]/[3,4,5] r=0 lpr=615 pi=570-614/4 rops=1 crt=613'272 lcod 552'269 mlcod 0'0 active+recovering+remapped m=1] handle_message: MOSDECSubOpReadReply(1.ds0 616 ECSubReadReply(tid=2536, attrs_read=0)) v1
   -24> 2014-08-14 21:54:50.429108 7f232f5f6700 10 osd.3 pg_epoch: 616 pg[1.ds0( v 613'272 lc 613'271 (0'0,613'272] local-les=616 n=2 ec=7 les/c 616/593 614/615/615) [2147483647,4,5]/[3,4,5] r=0 lpr=615 pi=570-614/4 rops=1 crt=613'272 lcod 552'269 mlcod 0'0 active+recovering+remapped m=1] handle_sub_read_reply: reply ECSubReadReply(tid=2536, attrs_read=0)
   -23> 2014-08-14 21:54:50.429128 7f232f5f6700 10 osd.3 pg_epoch: 616 pg[1.ds0( v 613'272 lc 613'271 (0'0,613'272] local-les=616 n=2 ec=7 les/c 616/593 614/615/615) [2147483647,4,5]/[3,4,5] r=0 lpr=615 pi=570-614/4 rops=1 crt=613'272 lcod 552'269 mlcod 0'0 active+recovering+remapped m=1] handle_sub_read_reply readop complete: ReadOp(tid=2536, to_read={be132d0d/plana5016685-48/head//1=read_request_t(to_read=[1052672,1052672], need=4(1),5(2), want_attrs=0)}, complete={be132d0d/plana5016685-48/head//1=read_result_t(r=0, errors={}, noattrs, returned=(1052672, 1052672, [4(1),526336, 5(2),526336])}, priority=127, obj_to_source={be132d0d/plana5016685-48/head//1=4(1),5(2)}, source_to_obj={4(1)=be132d0d/plana5016685-48/head//1,5(2)=be132d0d/plana5016685-48/head//1}, in_progress=)
   -22> 2014-08-14 21:54:50.429157 7f232f5f6700 10 osd.3 pg_epoch: 616 pg[1.ds0( v 613'272 lc 613'271 (0'0,613'272] local-les=616 n=2 ec=7 les/c 616/593 614/615/615) [2147483647,4,5]/[3,4,5] r=0 lpr=615 pi=570-614/4 rops=1 crt=613'272 lcod 552'269 mlcod 0'0 active+recovering+remapped m=1] handle_recovery_read_complete: returned be132d0d/plana5016685-48/head//1 (1052672, 1052672, [4(1),526336, 5(2),526336])
   -21> 2014-08-14 21:54:50.429178 7f232f5f6700 10 osd.3 pg_epoch: 616 pg[1.ds0( v 613'272 lc 613'271 (0'0,613'272] local-les=616 n=2 ec=7 les/c 616/593 614/615/615) [2147483647,4,5]/[3,4,5] r=0 lpr=615 pi=570-614/4 rops=1 crt=613'272 lcod 552'269 mlcod 0'0 active+recovering+remapped m=1] handle_recovery_read_complete: [1,526336, 2,526336]
   -20> 2014-08-14 21:54:50.430405 7f232f5f6700 10 osd.3 pg_epoch: 616 pg[1.ds0( v 613'272 lc 613'271 (0'0,613'272] local-les=616 n=2 ec=7 les/c 616/593 614/615/615) [2147483647,4,5]/[3,4,5] r=0 lpr=615 pi=570-614/4 rops=1 crt=613'272 lcod 552'269 mlcod 0'0 active+recovering+remapped m=1] continue_recovery_op: continuing RecoveryOp(hoid=be132d0d/plana5016685-48/head//1 v=613'272 missing_on=3(0) missing_on_shards=0 recovery_info=ObjectRecoveryInfo(be132d0d/plana5016685-48/head//1@613'272, copy_subset: [], clone_subset: {}) recovery_progress=ObjectRecoveryProgress(!first, data_recovered_to:1052672, data_complete:false, omap_recovered_to:, omap_complete:true) pending_read=0 obc refcount=1 state=READING waiting_on_pushes= extent_requested=1052672,1052672
   -19> 2014-08-14 21:54:50.430466 7f232f5f6700 10 osd.3 pg_epoch: 616 pg[1.ds0( v 613'272 lc 613'271 (0'0,613'272] local-les=616 n=2 ec=7 les/c 616/593 614/615/615) [2147483647,4,5]/[3,4,5] r=0 lpr=615 pi=570-614/4 rops=1 crt=613'272 lcod 552'269 mlcod 0'0 active+recovering+remapped m=1] continue_recovery_op: before_progress=ObjectRecoveryProgress(!first, data_recovered_to:1052672, data_complete:false, omap_recovered_to:, omap_complete:true), after_progress=ObjectRecoveryProgress(!first, data_recovered_to:1155072, data_complete:true, omap_recovered_to:, omap_complete:true), pop.data.length()=526336, size=1155072
     0> 2014-08-14 21:54:50.435361 7f232f5f6700 -1 osd/ECBackend.cc: In function 'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)' thread 7f232f5f6700 time 2014-08-14 21:54:50.430493
osd/ECBackend.cc: 529: FAILED assert(pop.data.length() == sinfo.aligned_logical_offset_to_chunk_offset( after_progress.data_recovered_to - op.recovery_progress.data_recovered_to))

 ceph version 0.83-450-g6a55543 (6a555434ee3edaf742ee7e5910bcba8dd0de46dd)
 1: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)+0x2593) [0x938463]
 2: (ECBackend::handle_recovery_read_complete(hobject_t const&, boost::tuples::tuple<unsigned long, unsigned long, std::map<pg_shard_t, ceph::buffer::list, std::less<pg_shard_t>, std::allocator<std::pair<pg_shard_t const, ceph::buffer::list> > >, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>&, boost::optional<std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > > >, RecoveryMessages*)+0xa5b) [0x938fab]
 3: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x86) [0x94d2f6]
 4: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x940609]
 5: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x5b) [0x92f62b]
 6: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xcd5) [0x93a0d5]
 7: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x516) [0x93dde6]
 8: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x15a) [0x7bd8ba]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1a2) [0x6470a2]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x6c1) [0x647b71]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x6fc) [0xa7774c]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa79030]
 13: (()+0x7e9a) [0x7f234d052e9a]
 14: (clone()+0x6d) [0x7f234ba033fd]

/a/sage-2014-08-14_15:41:43-rados-next-testing-basic-multi/424657

Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #9135: ENOENT on collection_add (Can't reproduce, 08/15/2014)

Actions #1

Updated by Sage Weil over 9 years ago

  • Priority changed from Urgent to High
Actions #2

Updated by Sage Weil over 9 years ago

Suspect this and #9135 are ghosts caused by a misbehaving underlying fs.

Actions #3

Updated by Samuel Just about 9 years ago

  • Status changed from New to Can't reproduce
