Actions
Bug #23857
closedflush (manifest) vs async recovery causes out of order op
Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
0> 2018-04-24 21:30:43.570 7fc59bce1700 -1 /build/ceph-13.0.2-1742-g8c053ae/src/osd/PrimaryLogPG.cc: In function 'void PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)' thread 7fc59bce1700 time 2018-04-24 21:30:43.575095 /build/ceph-13.0.2-1742-g8c053ae/src/osd/PrimaryLogPG.cc: 3942: FAILED assert(0 == "out of order op") ceph version 13.0.2-1742-g8c053ae (8c053ae078dc13c766dde187006d7d6161d4bca9) mimic (dev) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fc5c4b72252] 2: (()+0x2e4417) [0x7fc5c4b72417] 3: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x1663) [0x564d0b1b22a3] 4: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3615) [0x564d0b1b59b5] 5: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xcc3) [0x564d0b1b9cd3] 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b8) [0x564d0b010698] 7: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x564d0b28cdc2] 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9ec) [0x564d0b0301bc] 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7fc5c4b7718e] 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fc5c4b79210] 11: (()+0x76ba) [0x7fc5c361b6ba] 12: (clone()+0x6d) [0x7fc5c2e4441d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
/a/sage-2018-04-24_16:58:54-rados-wip-sage3-testing-2018-04-24-1010-distro-basic-smithi/2435204
first, client.4334.0:11569 triggers a flush, and the op goes on the flushing waitlist
-1809> 2018-04-24 21:30:43.522 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] do_op: op osd_op(client.4334.0:11569 3.36 3:6d6399eb:::smithi03814407-98:head [write 1947263~638108] snapc 0=[] ondisk+write+known_if_redirected e106) v8 -1808> 2018-04-24 21:30:43.522 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] op_has_sufficient_caps session=0x564d10d12d00 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=1 classes=[] -> yes -1807> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] do_op osd_op(client.4334.0:11569 3.36 3:6d6399eb:::smithi03814407-98:head [write 1947263~638108] snapc 0=[] ondisk+write+known_if_redirected e106) v8 m ay_write -> write-ordered flags ondisk+write+known_if_redirected -1806> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] get_object_context: found obc in cache: 0x564d10a46c00 -1805> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] get_object_context: 0x564d10a46c00 3:6d6399eb:::smithi03814407-98:head rwstate(write n=1 w=0) oi: 3:6d6399eb:::smithi03814407-98:head(106'483 client.43 34.0:11568 dirty|manifest s 3764900 uv 483 alloc_hint [0 0 0] manifest(chunked {468480=(len: 1090048 oid: 2:a3e960ef:::smithi03814407-98 low_tier:0 offset: 468480 flags: dirty)})) exists: 1 ssc: 0x564d128bed60 snapset: 0=[]:{} -1804> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] find_object_context 3:6d6399eb:::smithi03814407-98:head @head oi=3:6d6399eb:::smithi03814407-98:head(106'483 client.4334.0:11568 dirty|manifest s 37649 00 uv 483 alloc_hint [0 0 0] manifest(chunked {468480=(len: 1090048 oid: 2:a3e960ef:::smithi03814407-98 low_tier:0 offset: 468480 flags: dirty)})) -1803> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] start_flush 3:6d6399eb:::smithi03814407-98:head v106'483 uv483 blocking
a bit later, that object (3:6d6399eb:::smithi03814407-98:head) starts async-recovering and becomes degraded:
-1624> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_replicas(1) -1623> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_primary recovering 0 in pg, missing missing(0 may_include_deletes = 1) -1622> 2018-04-24 21:30:43.530 7fc59fce9700 25 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_primary {} -1621> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_backfill (1) bft=0 last_backfill_started 3:6d182d29:::smithi03814407-227 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head -1620> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] peer osd.0 info 3.36( v 106'483 (27'127,106'483] lb MIN (bitwise) local-lis/les=105/106 n=0 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) interval MAX-MAX 0 objects -1619> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] update_range: bi is old, (106'482) can be updated with log to projected_last_update 106'483 -1618> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] scanning pg log first -1617> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] operator(): updating from version 106'483 -1616> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] operator(): 3:6d6399eb:::smithi03814407-98:head updated to version 106'483 -1615> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] scanning projected log -1614> 2018-04-24 21:30:43.530 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] my backfill interval BackfillInfo(3:6d6399eb:::smithi03814407-98:head-MAX 6 objects {3:6d6399eb:::smithi03814407-98:head=106'483,3:6df132a0:::smithi03814407-157 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head=106'482,3:6e1b86ab:::smithi03814407-52:head=97'472,3:6ebc0f04:::smithi03814407-150:head=28'282,3:6f8f49e3:::smithi03814407-242:head=101'479,3:6ff528d8:::smithi03814407-30:head=95'465}) -1613> 2018-04-24 21:30:43.530 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] peer shard 0 backfill BackfillInfo(MAX-MAX 0 objects) -1612> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] get_object_context: found obc in cache: 0x564d10a46c00 -1611> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] get_object_context: 0x564d10a46c00 3:6d6399eb:::smithi03814407-98:head rwstate(none n=0 w=0) oi: 3:6d6399eb:::smithi03814407-98:head(106'483 client.4334.0:11568 dirty|manifest s 3764900 uv 483 alloc_hint [0 0 0] manifest(chunked {468480=(len: 1090048 oid: 2:a3e960ef:::smithi03814407-98 low_tier:0 offset: 468480 flags: dirty)})) exists: 1 ssc: 0x564d128bed60 snapset: 0=[]:{} -1610> 2018-04-24 21:30:43.530 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] BACKFILL pushing 3:6d6399eb:::smithi03814407-98:head with ver 106'483 to peers [0] -1609> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] prep_backfill_object_push 3:6d6399eb:::smithi03814407-98:head v 106'483 to peers [0] -1608> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] start_recovery_op 3:6d6399eb:::smithi03814407-98:head -1607> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 106 start_recovery_op pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] 3:6d6399eb:::smithi03814407-98:head (0/3 rops) -1606> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_object: 3:6d6399eb:::smithi03814407-98:head -1605> 2018-04-24 21:30:43.530 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] start_pushes soid 3:6d6399eb:::smithi03814407-98:head -1604> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] prep_push_to_replica: 3:6d6399eb:::smithi03814407-98:head v106'483 size 3764900 to osd.0 -1603> 2018-04-24 21:30:43.530 7fc59fce9700 15 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] push_to_replica snapset is 0=[]:{} -1602> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] calc_head_subsets 3:6d6399eb:::smithi03814407-98:head clone_overlap {} -1601> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] calc_head_subsets 3:6d6399eb:::smithi03814407-98:head data_subset [0~3764900] clone_subsets {} -1600> 2018-04-24 21:30:43.530 7fc59fce9700 7 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] build_push_op 3:6d6399eb:::smithi03814407-98:head v 106'483 size 3764900 recovery_info: ObjectRecoveryInfo(3:6d6399eb:::smithi03814407-98:head@106'483, size: 3764900, copy_subset: [0~3764900], clone_subset: {}, snapset: 0=[]:{})
the next ops come along,
-1390> 2018-04-24 21:30:43.534 7fc59bce1700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] _handle_message: 0x564d0d0c52c0 -1389> 2018-04-24 21:30:43.534 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op: op osd_op(client.4334.0:11570 3.36 3:6d6399eb:::smithi03814407-98:head [write 3351151~447528] snapc 0=[] ondisk+write+known_if_redirected e106) v8 -1388> 2018-04-24 21:30:43.534 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] op_has_sufficient_caps session=0x564d10d12d00 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=1 classes=[] -> yes -1387> 2018-04-24 21:30:43.534 7fc59bce1700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op osd_op(client.4334.0:11570 3.36 3:6d6399eb:::smithi03814407-98:head [write 3351151~447528] snapc 0=[] ondisk+write+known_if_redirected e106) v8 may_write -> write-ordered flags ondisk+write+known_if_redirected ... -1375> 2018-04-24 21:30:43.534 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op: op osd_op(client.4334.0:11571 3.36 3:6d6399eb:::smithi03814407-98:head [setxattr _header (56),truncate 3798679] snapc 0=[] ondisk+write+known_if_redirected e106) v8 -1374> 2018-04-24 21:30:43.534 7fc5b420a700 20 filestore(/var/lib/ceph/osd/ceph-2) lfn_unlink(485): clearing omap on #3:d458472c:::smithi03814407-285 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# in cid 3.2b_head -1373> 2018-04-24 21:30:43.534 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] op_has_sufficient_caps session=0x564d10d12d00 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=1 classes=[] -> yes -1372> 2018-04-24 21:30:43.534 7fc59bce1700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op osd_op(client.4334.0:11571 3.36 3:6d6399eb:::smithi03814407-98:head [setxattr _header (56),truncate 3798679] snapc 0=[] ondisk+write+known_if_redirected e106) v8 may_write -> write-ordered flags ondisk+write+known_if_redirected
..and go on the waiting for degraded list.
a bit later, the flush completes,
-1307> 2018-04-24 21:30:43.538 7fc5994dc700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] finish_manifest_flush 3:6d6399eb:::smithi03814407-98:head tid 443 (0) Success last_offset: 468480 -1306> 2018-04-24 21:30:43.538 7fc5994dc700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] finish_flush 3:6d6399eb:::smithi03814407-98:head tid 443 (0) Success -1305> 2018-04-24 21:30:43.538 7fc5994dc700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] kick_object_context_blocked 3:6d6399eb:::smithi03814407-98:head requeuing 1 requests -1304> 2018-04-24 21:30:43.538 7fc5994dc700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] requeue_op 0x564d0d0c5d40
which makes up the first op, which then is processed, sees degraded, and goes on the degraded waitlist too:
-1292> 2018-04-24 21:30:43.538 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op: op osd_op(client.4334.0:11569 3.36 3:6d6399eb:::smithi03814407-98:head [write 1947263~638108] snapc 0=[] ondisk+write+known_if_redirected e106) v8 -1291> 2018-04-24 21:30:43.538 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] op_has_sufficient_caps session=0x564d10d12d00 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=1 classes=[] -> yes -1290> 2018-04-24 21:30:43.538 7fc59bce1700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op osd_op(client.4334.0:11569 3.36 3:6d6399eb:::smithi03814407-98:head [write 1947263~638108] snapc 0=[] ondisk+write+known_if_redirected e106) v8 may_write -> write-ordered flags ondisk+write+known_if_redirected
and this point the waitlist ordering is 11570 11571 11569.
when recovery finishes and we wake them up, they are in the wrong order. we crash a bit later.
Actions