Bug #23031
Updated by David Zafman over 5 years ago
Used vstart.sh to start 3 OSDs with -o 'filestore debug inject read err = 1'.
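For reference, the cluster was brought up roughly like this (a sketch; the exact vstart.sh invocation may have differed, and 'osd objectstore = filestore' is my assumption since the read-error injection used here is a filestore hook):
<pre>
MON=1 OSD=3 ../src/vstart.sh -n \
    -o 'osd objectstore = filestore' \
    -o 'filestore debug inject read err = 1'
</pre>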
Manually ran injectdataerr on all replicas of object "foo".
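The injection was along these lines (a sketch: $POOL stands for whatever pool holds "foo", which is not recorded here, and the deep scrub is my assumption for how the inconsistency was surfaced; injectdataerr arms an EIO read error for that object on that OSD):
<pre>
# inject a read error for "foo" on every OSD holding a replica
for i in 0 1 2; do bin/ceph tell osd.$i injectdataerr $POOL foo; done
# a subsequent deep scrub reads every replica and hits the injected errors
bin/ceph pg deep-scrub 1.0
</pre>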
<pre>
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
1.0 1 1 1 0 1 72750205 18 18 active+recovery_unfound+degraded+inconsistent 2018-02-16 18:01:18.212552 12'18 32:289 [1,0] 1 [1,0] 1 12'18 2018-02-16 17:44:18.140880 12'18 2018-02-16 17:44:18.140880 0
</pre>
Marked all OSDs out and back in.
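That is, something along the lines of:
<pre>
for i in 0 1 2; do bin/ceph osd out osd.$i; done
# wait for the PG to remap, then bring them back
for i in 0 1 2; do bin/ceph osd in osd.$i; done
</pre>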
<pre>
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
1.0 1 1 2 0 1 72750205 18 18 active+recovery_unfound+degraded+remapped 2018-02-16 18:03:44.924607 12'18 41:301 [1] 1 [1,2] 1 12'18 2018-02-16 17:44:18.140880 12'18 2018-02-16 17:44:18.140880 0
</pre>
Killed all OSDs and then restarted them in the order 0, 1, 2; osd.0 crashed as below.
<pre>
# killall ceph-osd
# for i in $(seq 0 2); do bin/ceph-osd -i $i -c ceph.conf; done
</pre>
<pre>
-2> 2018-02-16 18:11:49.233 7f2643a9e700 10 osd.0 pg_epoch: 44 pg[1.0( v 12'18 lc 12'17 (0'0,12'18] local-lis/les=43/44 n=1 ec=10/10 lis/c 43/31 les/c/f 44/32/0 43/43/43) [1,0] r=1 lpr=43 pi=[31,43)/3 luod=0'0 crt=12'18 lcod 0'0 active m=1] _handle_message: 0x55644b334bc0
-1> 2018-02-16 18:11:49.233 7f2643a9e700 10 osd.0 pg_epoch: 44 pg[1.0( v 12'18 lc 12'17 (0'0,12'18] local-lis/les=43/44 n=1 ec=10/10 lis/c 43/31 les/c/f 44/32/0 43/43/43) [1,0] r=1 lpr=43 pi=[31,43)/3 luod=0'0 crt=12'18 lcod 0'0 active m=1] do_repop 1:602f83fe:::foo:head v 44'19 (transaction) 143
0> 2018-02-16 18:11:49.233 7f2643a9e700 -1 /home/dzafman/ceph/src/osd/ReplicatedBackend.cc: In function 'void ReplicatedBackend::do_repop(OpRequestRef)' thread 7f2643a9e700 time 2018-02-16 18:11:49.235446
/home/dzafman/ceph/src/osd/ReplicatedBackend.cc: 1085: FAILED assert(!parent->get_log().get_missing().is_missing(soid))
ceph version 13.0.1-2022-gcb1cf5e (cb1cf5ef1b559f2bffe54abdf6b6cae453db6ebf) mimic (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x7f26633c3615]
2: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0x11c) [0x5564494375ac]
3: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x237) [0x55644943a347]
4: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x55644934bee7]
5: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x675) [0x5564492f9605]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x341) [0x556449155d21]
7: (PGOpItem::run(OSD*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x5564493cdb52]
8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xf24) [0x55644915df94]
9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4f2) [0x7f26633c90e2]
10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f26633cb440]
11: (()+0x76ba) [0x7f2661edb6ba]
12: (clone()+0x6d) [0x7f266116582d]
</pre>
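For anyone retracing this, the unfound object the assert trips over can be confirmed just before the restart with something like:
<pre>
bin/ceph pg 1.0 list_unfound    # lists the unfound object(s), foo:head here
bin/ceph health detail          # reports OBJECT_UNFOUND for the PG
</pre>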