Project

General

Profile

Actions

Bug #23857

closed

flush (manifest) vs async recovery causes out of order op

Added by Sage Weil about 6 years ago. Updated almost 4 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

     0> 2018-04-24 21:30:43.570 7fc59bce1700 -1 /build/ceph-13.0.2-1742-g8c053ae/src/osd/PrimaryLogPG.cc: In function 'void PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)' thread 7fc59bce1700 time 2018-04-24 21:30:43.575095
/build/ceph-13.0.2-1742-g8c053ae/src/osd/PrimaryLogPG.cc: 3942: FAILED assert(0 == "out of order op")

 ceph version 13.0.2-1742-g8c053ae (8c053ae078dc13c766dde187006d7d6161d4bca9) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fc5c4b72252]
 2: (()+0x2e4417) [0x7fc5c4b72417]
 3: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x1663) [0x564d0b1b22a3]
 4: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3615) [0x564d0b1b59b5]
 5: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xcc3) [0x564d0b1b9cd3]
 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b8) [0x564d0b010698]
 7: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x564d0b28cdc2]
 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9ec) [0x564d0b0301bc]
 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7fc5c4b7718e]
 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fc5c4b79210]
 11: (()+0x76ba) [0x7fc5c361b6ba]
 12: (clone()+0x6d) [0x7fc5c2e4441d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/a/sage-2018-04-24_16:58:54-rados-wip-sage3-testing-2018-04-24-1010-distro-basic-smithi/2435204

first, client.4334.0:11569 triggers a flush, and the op goes on the flushing waitlist

 -1809> 2018-04-24 21:30:43.522 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] do_op: op osd_op(client.4334.0:11569 3.36 3:6d6399eb:::smithi03814407-98:head [write 1947263~638108] snapc 0=[] ondisk+write+known_if_redirected e106) 
v8
 -1808> 2018-04-24 21:30:43.522 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] op_has_sufficient_caps session=0x564d10d12d00 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=1 classes=[] 
-> yes
 -1807> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] do_op osd_op(client.4334.0:11569 3.36 3:6d6399eb:::smithi03814407-98:head [write 1947263~638108] snapc 0=[] ondisk+write+known_if_redirected e106) v8 m
ay_write -> write-ordered flags ondisk+write+known_if_redirected
 -1806> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] get_object_context: found obc in cache: 0x564d10a46c00
 -1805> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] get_object_context: 0x564d10a46c00 3:6d6399eb:::smithi03814407-98:head rwstate(write n=1 w=0) oi: 3:6d6399eb:::smithi03814407-98:head(106'483 client.43
34.0:11568 dirty|manifest s 3764900 uv 483 alloc_hint [0 0 0] manifest(chunked {468480=(len: 1090048 oid: 2:a3e960ef:::smithi03814407-98 low_tier:0 offset: 468480 flags: dirty)})) exists: 1 ssc: 0x564d128bed60 snapset: 0=[]:{}
 -1804> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] find_object_context 3:6d6399eb:::smithi03814407-98:head @head oi=3:6d6399eb:::smithi03814407-98:head(106'483 client.4334.0:11568 dirty|manifest s 37649
00 uv 483 alloc_hint [0 0 0] manifest(chunked {468480=(len: 1090048 oid: 2:a3e960ef:::smithi03814407-98 low_tier:0 offset: 468480 flags: dirty)}))
 -1803> 2018-04-24 21:30:43.522 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 luod=106'482 rops=1 bft=0 crt=106'483 lcod 106'481 mlcod 106'481 active+remapped+backfilling mbc={}] start_flush 3:6d6399eb:::smithi03814407-98:head v106'483 uv483 blocking

a bit later, that object (3:6d6399eb:::smithi03814407-98:head) starts async-recovering and becomes degraded:
 -1624> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_replicas(1)
 -1623> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_primary recovering 0 in pg, missing missing(0 may_include_deletes = 1)
 -1622> 2018-04-24 21:30:43.530 7fc59fce9700 25 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_primary {}
 -1621> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_backfill (1) bft=0 last_backfill_started 3:6d182d29:::smithi03814407-227 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head
 -1620> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] peer osd.0 info 3.36( v 106'483 (27'127,106'483] lb MIN (bitwise) local-lis/les=105/106 n=0 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) interval MAX-MAX 0 objects
 -1619> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] update_range: bi is old, (106'482) can be updated with log to projected_last_update 106'483
 -1618> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] scanning pg log first
 -1617> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] operator(): updating from version 106'483
 -1616> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] operator(): 3:6d6399eb:::smithi03814407-98:head updated to version 106'483
 -1615> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] scanning projected log
 -1614> 2018-04-24 21:30:43.530 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}]    my backfill interval BackfillInfo(3:6d6399eb:::smithi03814407-98:head-MAX 6 objects {3:6d6399eb:::smithi03814407-98:head=106'483,3:6df132a0:::smithi03814407-157 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head=106'482,3:6e1b86ab:::smithi03814407-52:head=97'472,3:6ebc0f04:::smithi03814407-150:head=28'282,3:6f8f49e3:::smithi03814407-242:head=101'479,3:6ff528d8:::smithi03814407-30:head=95'465})
 -1613> 2018-04-24 21:30:43.530 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}]  peer shard 0 backfill BackfillInfo(MAX-MAX 0 objects)
 -1612> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] get_object_context: found obc in cache: 0x564d10a46c00
 -1611> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] get_object_context: 0x564d10a46c00 3:6d6399eb:::smithi03814407-98:head rwstate(none n=0 w=0) oi: 3:6d6399eb:::smithi03814407-98:head(106'483 client.4334.0:11568 dirty|manifest s 3764900 uv 483 alloc_hint [0 0 0] manifest(chunked {468480=(len: 1090048 oid: 2:a3e960ef:::smithi03814407-98 low_tier:0 offset: 468480 flags: dirty)})) exists: 1 ssc: 0x564d128bed60 snapset: 0=[]:{}
 -1610> 2018-04-24 21:30:43.530 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}]  BACKFILL pushing 3:6d6399eb:::smithi03814407-98:head with ver 106'483 to peers [0]
 -1609> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] prep_backfill_object_push 3:6d6399eb:::smithi03814407-98:head v 106'483 to peers [0]
 -1608> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] start_recovery_op 3:6d6399eb:::smithi03814407-98:head
 -1607> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 106 start_recovery_op pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] 3:6d6399eb:::smithi03814407-98:head (0/3 rops)
 -1606> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] recover_object: 3:6d6399eb:::smithi03814407-98:head
 -1605> 2018-04-24 21:30:43.530 7fc59fce9700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] start_pushes soid 3:6d6399eb:::smithi03814407-98:head
 -1604> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] prep_push_to_replica: 3:6d6399eb:::smithi03814407-98:head v106'483 size 3764900 to osd.0
 -1603> 2018-04-24 21:30:43.530 7fc59fce9700 15 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] push_to_replica snapset is 0=[]:{}
 -1602> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] calc_head_subsets 3:6d6399eb:::smithi03814407-98:head clone_overlap {}
 -1601> 2018-04-24 21:30:43.530 7fc59fce9700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] calc_head_subsets 3:6d6399eb:::smithi03814407-98:head  data_subset [0~3764900]  clone_subsets {}
 -1600> 2018-04-24 21:30:43.530 7fc59fce9700  7 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] build_push_op 3:6d6399eb:::smithi03814407-98:head v 106'483 size 3764900 recovery_info: ObjectRecoveryInfo(3:6d6399eb:::smithi03814407-98:head@106'483, size: 3764900, copy_subset: [0~3764900], clone_subset: {}, snapset: 0=[]:{})

the next ops come along,
 -1390> 2018-04-24 21:30:43.534 7fc59bce1700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] _handle_message: 0x564d0d0c52c0
 -1389> 2018-04-24 21:30:43.534 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op: op osd_op(client.4334.0:11570 3.36 3:6d6399eb:::smithi03814407-98:head [write 3351151~447528] snapc 0=[] ondisk+write+known_if_redirected e106) v8
 -1388> 2018-04-24 21:30:43.534 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] op_has_sufficient_caps session=0x564d10d12d00 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=1 classes=[] -> yes
 -1387> 2018-04-24 21:30:43.534 7fc59bce1700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op osd_op(client.4334.0:11570 3.36 3:6d6399eb:::smithi03814407-98:head [write 3351151~447528] snapc 0=[] ondisk+write+known_if_redirected e106) v8 may_write -> write-ordered flags ondisk+write+known_if_redirected
...
 -1375> 2018-04-24 21:30:43.534 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op: op osd_op(client.4334.0:11571 3.36 3:6d6399eb:::smithi03814407-98:head [setxattr _header (56),truncate 3798679] snapc 0=[] ondisk+write+known_if_redirected e106) v8
 -1374> 2018-04-24 21:30:43.534 7fc5b420a700 20 filestore(/var/lib/ceph/osd/ceph-2) lfn_unlink(485): clearing omap on #3:d458472c:::smithi03814407-285 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# in cid 3.2b_head
 -1373> 2018-04-24 21:30:43.534 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] op_has_sufficient_caps session=0x564d10d12d00 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=1 classes=[] -> yes
 -1372> 2018-04-24 21:30:43.534 7fc59bce1700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op osd_op(client.4334.0:11571 3.36 3:6d6399eb:::smithi03814407-98:head [setxattr _header (56),truncate 3798679] snapc 0=[] ondisk+write+known_if_redirected e106) v8 may_write -> write-ordered flags ondisk+write+known_if_redirected

..and go on the waiting for degraded list.

a bit later, the flush completes,

 -1307> 2018-04-24 21:30:43.538 7fc5994dc700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] finish_manifest_flush 3:6d6399eb:::smithi03814407-98:head tid 443 (0) Success last_offset: 468480
 -1306> 2018-04-24 21:30:43.538 7fc5994dc700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] finish_flush 3:6d6399eb:::smithi03814407-98:head tid 443 (0) Success
 -1305> 2018-04-24 21:30:43.538 7fc5994dc700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] kick_object_context_blocked 3:6d6399eb:::smithi03814407-98:head requeuing 1 requests
 -1304> 2018-04-24 21:30:43.538 7fc5994dc700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] requeue_op 0x564d0d0c5d40

which makes up the first op, which then is processed, sees degraded, and goes on the degraded waitlist too:
 -1292> 2018-04-24 21:30:43.538 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op: op osd_op(client.4334.0:11569 3.36 3:6d6399eb:::smithi03814407-98:head [write 1947263~638108] snapc 0=[] ondisk+write+known_if_redirected e106) v8
 -1291> 2018-04-24 21:30:43.538 7fc59bce1700 20 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] op_has_sufficient_caps session=0x564d10d12d00 pool=3 (unique_pool_0 ) owner=0 pool_app_metadata={rados={}} need_read_cap=0 need_write_cap=1 classes=[] -> yes
 -1290> 2018-04-24 21:30:43.538 7fc59bce1700 10 osd.2 pg_epoch: 106 pg[3.36( v 106'483 (27'127,106'483] local-lis/les=105/106 n=9 ec=88/18 lis/c 105/88 les/c/f 106/89/0 104/105/105) [0,2]/[2,3] backfill=[0] r=0 lpr=105 pi=[88,105)/1 rops=1 bft=0 crt=106'483 lcod 106'482 mlcod 106'482 active+remapped+backfilling mbc={255={}}] do_op osd_op(client.4334.0:11569 3.36 3:6d6399eb:::smithi03814407-98:head [write 1947263~638108] snapc 0=[] ondisk+write+known_if_redirected e106) v8 may_write -> write-ordered flags ondisk+write+known_if_redirected

and this point the waitlist ordering is 11570 11571 11569.

when recovery finishes and we wake them up, they are in the wrong order. we crash a bit later.


Related issues 1 (1 open0 closed)

Related to Ceph - Bug #23947: ceph_test_rados dumped core, Error: finished tid 1 when last_acked_tid was 6New

Actions
Actions

Also available in: Atom PDF