Bug #9978

closed

keyvaluestore: void ECBackend::handle_sub_read

Added by Dmitry Smirnov over 9 years ago. Updated over 8 years ago.

Status: Closed
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On 0.87 "Giant" I repeatedly hit the following assert, which typically crashes 4 OSDs at once:

   -10> 2014-11-01 04:48:38.513707 7fbf06370700  1 -- 192.168.0.204:6802/29215 --> 192.168.0.201:6804/15191 -- MOSDPGPush(19.11 83067 [PushOp(b2a6ced1/rbd_data.8b861374b0dc51.0000000000012cf4/head//19, version: 67156'1718071, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(b2a6ced1/rbd_data.8b861374b0dc51.0000000000012cf4/head//19@67156'1718071, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 -- ?+0 0x7fbf35a83600 con 0x7fbf4d647b20
    -9> 2014-11-01 04:48:38.513716 7fbf06370700  5 osd.2 pg_epoch: 83067 pg[19.11( v 78404'1880414 (78135'1877414,78404'1880414] local-les=83009 n=3248 ec=44362 les/c 83009/82202 82987/82998/82998) [6,12,11]/[2,12] r=0 lpr=82998 pi=80090-82997/49 rops=1 bft=6,11 crt=0'0 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling] backfill_pos is 292aced1/rbd_data.5014112ae8944a.00000000000086d2/head//19
    -8> 2014-11-01 04:48:38.513768 7fbf06370700  1 -- 192.168.0.204:6802/29215 --> 192.168.0.204:6807/29400 -- pg_backfill(progress 19.11 e 83067/83067 lb 8651ced1/rbd_data.5014112ae8944a.000000000000907f/head//19) v3 -- ?+0 0x7fbf4ed68d00 con 0x7fbf3bf9d1e0
    -7> 2014-11-01 04:48:38.513801 7fbf06370700  1 -- 192.168.0.204:6802/29215 --> 192.168.0.201:6804/15191 -- pg_backfill(progress 19.11 e 83067/83067 lb 8651ced1/rbd_data.5014112ae8944a.000000000000907f/head//19) v3 -- ?+0 0x7fbf38c78480 con 0x7fbf4d647b20
    -6> 2014-11-01 04:48:38.604249 7fbefcb1e700  1 -- 192.168.0.204:6802/29215 <== osd.7 192.168.0.2:6818/31087 1095 ==== MOSDECSubOpRead(18.5s4 83067 ECSubRead(tid=2, to_read={20cf7305/10000252fc3.00000059/head//18=0,4194304}, attrs_to_read=20cf7305/10000252fc3.00000059/head//18)) v1 ==== 199+0+0 (328060131 0 0) 0x7fbf2fd9e6c0 con 0x7fbf4da28160
    -5> 2014-11-01 04:48:38.604279 7fbefcb1e700  5 -- op tracker -- seq: 26245, time: 2014-11-01 04:48:38.604149, event: header_read, op: MOSDECSubOpRead(18.5s4 83067 ECSubRead(tid=2, to_read={20cf7305/10000252fc3.00000059/head//18=0,4194304}, attrs_to_read=20cf7305/10000252fc3.00000059/head//18))
    -4> 2014-11-01 04:48:38.604294 7fbefcb1e700  5 -- op tracker -- seq: 26245, time: 2014-11-01 04:48:38.604153, event: throttled, op: MOSDECSubOpRead(18.5s4 83067 ECSubRead(tid=2, to_read={20cf7305/10000252fc3.00000059/head//18=0,4194304}, attrs_to_read=20cf7305/10000252fc3.00000059/head//18))
    -3> 2014-11-01 04:48:38.604302 7fbefcb1e700  5 -- op tracker -- seq: 26245, time: 2014-11-01 04:48:38.604244, event: all_read, op: MOSDECSubOpRead(18.5s4 83067 ECSubRead(tid=2, to_read={20cf7305/10000252fc3.00000059/head//18=0,4194304}, attrs_to_read=20cf7305/10000252fc3.00000059/head//18))
    -2> 2014-11-01 04:48:38.604308 7fbefcb1e700  5 -- op tracker -- seq: 26245, time: 0.000000, event: dispatched, op: MOSDECSubOpRead(18.5s4 83067 ECSubRead(tid=2, to_read={20cf7305/10000252fc3.00000059/head//18=0,4194304}, attrs_to_read=20cf7305/10000252fc3.00000059/head//18))
    -1> 2014-11-01 04:48:38.604357 7fbf08b75700  5 -- op tracker -- seq: 26245, time: 2014-11-01 04:48:38.604356, event: reached_pg, op: MOSDECSubOpRead(18.5s4 83067 ECSubRead(tid=2, to_read={20cf7305/10000252fc3.00000059/head//18=0,4194304}, attrs_to_read=20cf7305/10000252fc3.00000059/head//18))
     0> 2014-11-01 04:48:38.605682 7fbf08b75700 -1 osd/ECBackend.cc: In function 'void ECBackend::handle_sub_read(pg_shard_t, ECSubRead&, ECSubReadReply*)' thread 7fbf08b75700 time 2014-11-01 04:48:38.604443
osd/ECBackend.cc: 876: FAILED assert(0)

 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x7fbf2b7e1122]
 2: (ECBackend::handle_sub_read(pg_shard_t, ECSubRead&, ECSubReadReply*)+0x649) [0x7fbf2b62ece9]
 3: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4b5) [0x7fbf2b637b55]
 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x36e) [0x7fbf2b42116e]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x408) [0x7fbf2b279ad8]
 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x350) [0x7fbf2b27a070]
 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x848) [0x7fbf2b7d06a8]
 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fbf2b7d2b00]
 9: (()+0x80a4) [0x7fbf29e4b0a4]
 10: (clone()+0x6d) [0x7fbf283a5cbd]

Files

ceph-osd.1.log.xz (125 KB), Dmitry Smirnov, 11/05/2014 06:32 PM
ceph-osd.5.log.xz (145 KB), Dmitry Smirnov, 11/06/2014 12:32 AM
ceph-osd.2.log.xz (265 KB), Dmitry Smirnov, 11/18/2014 11:31 PM
ceph-osd.14.log.xz (259 KB), Dmitry Smirnov, 11/18/2014 11:31 PM

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #9727: 0.86 EC+ KV OSDs crashing (Duplicate, Haomai Wang, 10/10/2014)
