Project

General

Profile

Bug #21823

on_flushed: object ... obc still alive (ec + cache tiering)

Added by Sage Weil about 2 years ago. Updated 3 months ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
Start date:
10/17/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

 -2114> 2017-10-17 17:17:26.628161 7f23b81b2700 10 osd.1 pg_epoch: 68 pg[2.2s1( v 68'613 (0'0,68'613] local-lis/les=57/58 n=613 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'612 lua=68'612 crt=68'612 lcod 68'611 mlcod 68'611 active+undersized+degraded] start_rmw: Op(2:512819fe:::smithi04510283-1653 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head v=68'614 tt=0'0 tid=6382 reqid=osd.3.13:3287 client_op=osd_op(osd.3.13:3287 2.2s1 2:512819fe:::smithi04510283-1653 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [copy-from ver 1792] snapc 0=[] ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e68) roll_forward_to=68'612 temp_added= temp_cleared= pending_read={} remote_read={} remote_read_result={} pending_apply= pending_commit= plan.to_read={} plan.will_write={2:512819fe:::smithi04510283-1653 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head=[0~3768320]})
...
 -1310> 2017-10-17 17:17:26.652999 7f23cb1d8700 10 osd.1 pg_epoch: 68 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'612 crt=68'612 lcod 68'612 mlcod 68'611 active+undersized+degraded] op_applied version 68'613
...
 -1203> 2017-10-17 17:17:26.653732 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active+undersized+degraded] state<Started/Primary/Active>: Active advmap
 -1202> 2017-10-17 17:17:26.653740 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active+undersized+degraded] state<Started>: Started advmap
 -1201> 2017-10-17 17:17:26.653748 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active+undersized+degraded] new interval newup [5,1,0,3] newacting [5,1,0,3]
 -1200> 2017-10-17 17:17:26.653756 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active+undersized+degraded] state<Started>: should_restart_peering, transitioning to Reset
 -1199> 2017-10-17 17:17:26.653764 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active+undersized+degraded] exit Started/Primary/Active/Clean 30.453020 31 0.001211
 -1198> 2017-10-17 17:17:26.653775 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active+undersized+degraded] exit Started/Primary/Active 30.464621 0 0.000000
 -1197> 2017-10-17 17:17:26.653783 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active] agent_stop
 -1196> 2017-10-17 17:17:26.653791 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active] exit Started/Primary 31.466804 0 0.000000
 -1195> 2017-10-17 17:17:26.653799 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=68'613 lua=68'613 crt=68'612 lcod 68'612 mlcod 68'612 active] clear_primary_state
 -1194> 2017-10-17 17:17:26.653952 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] release_backoffs [2:40000000::::head,2:80000000::::head)
 -1193> 2017-10-17 17:17:26.653962 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] agent_stop
 -1192> 2017-10-17 17:17:26.653969 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] exit Started 31.467009 0 0.000000
 -1191> 2017-10-17 17:17:26.653977 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] enter Reset
 -1190> 2017-10-17 17:17:26.653983 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=57 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] set_last_peering_reset 69
 -1189> 2017-10-17 17:17:26.653990 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=69 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] Clearing blocked outgoing recovery messages
 -1188> 2017-10-17 17:17:26.653997 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=69 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] Beginning to block outgoing recovery messages
 -1187> 2017-10-17 17:17:26.654004 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=69 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] state<Reset>: Reset advmap
 -1186> 2017-10-17 17:17:26.654011 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=69 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] new interval newup [5,1,0,3] newacting [5,1,0,3]
 -1185> 2017-10-17 17:17:26.654018 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=69 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] state<Reset>: should restart peering, calling start_peering_interval again
 -1184> 2017-10-17 17:17:26.654024 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [2147483647,1,0,3] r=1 lpr=69 pi=[50,57)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 mlcod 0'0 active] set_last_peering_reset 69
 -1183> 2017-10-17 17:17:26.654047 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 active] start_peering_interval: check_new_interval output: check_new_interval interval(57-68 up [2147483647,1,0,3](1) acting [2147483647,1,0,3](1)): not rw, up_thru 57 up_from 13 last_epoch_clean 51
 -1182> 2017-10-17 17:17:26.654059 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 57/57/57) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 active]  noting past ([50,68] intervals=([57,68] acting 0(2),1(1),3(3)))
 -1181> 2017-10-17 17:17:26.654066 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 active] _on_new_intervalchecking missing set deletes flag. missing = missing(0 may_include_deletes = 1)
 -1180> 2017-10-17 17:17:26.654072 7f23c31c8700  1 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 luod=0'0 lua=68'613 crt=68'612 lcod 68'612 active] start_peering_interval up [2147483647,1,0,3] -> [5,1,0,3], acting [2147483647,1,0,3] -> [5,1,0,3], acting_primary 1(1) -> 5, up_primary 1(1) -> 5, role 1 -> 1, features acting 2305244844532236283 upacting 2305244844532236283
 -1179> 2017-10-17 17:17:26.654081 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] clear_primary_state
 -1178> 2017-10-17 17:17:26.654087 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] release_backoffs [2:40000000::::head,2:80000000::::head)
 -1177> 2017-10-17 17:17:26.654094 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] agent_stop
 -1176> 2017-10-17 17:17:26.654099 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] on_change
 -1175> 2017-10-17 17:17:26.654104 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] cancel_copy_ops
 -1174> 2017-10-17 17:17:26.654109 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] cancel_flush_ops
 -1173> 2017-10-17 17:17:26.654114 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] cancel_proxy_ops
 -1172> 2017-10-17 17:17:26.654120 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown]  canceling repop tid 6382
 -1171> 2017-10-17 17:17:26.654126 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] remove_repop repgather(0x7f23f3ee8d00 68'614 rep_tid=6382 committed?=0 applied?=0 r=0)
 -1170> 2017-10-17 17:17:26.654195 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] on_change_cleanup
 -1169> 2017-10-17 17:17:26.654203 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] on_change
 -1168> 2017-10-17 17:17:26.654223 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] exit NotTrimming
 -1167> 2017-10-17 17:17:26.654229 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown] enter NotTrimming
 -1166> 2017-10-17 17:17:26.654579 7f23c31c8700 15 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] clear_stats
 -1165> 2017-10-17 17:17:26.654587 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] on_role_change
 -1164> 2017-10-17 17:17:26.654590 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] cancel_recovery
 -1163> 2017-10-17 17:17:26.654593 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] clear_recovery_state
 -1162> 2017-10-17 17:17:26.654598 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] check_recovery_sources no source osds () went down
 -1161> 2017-10-17 17:17:26.654605 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] handle_activate_map 
 -1160> 2017-10-17 17:17:26.654610 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] take_waiters
 -1159> 2017-10-17 17:17:26.654614 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] exit Reset 0.000637 1 2017-10-17 17:17:26.654603
 -1158> 2017-10-17 17:17:26.654620 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] enter Started
 -1157> 2017-10-17 17:17:26.654624 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] enter Start
 -1155> 2017-10-17 17:17:26.654627 7f23c31c8700  1 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] state<Start>: transitioning to Stray
 -1154> 2017-10-17 17:17:26.654632 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] exit Start 0.000008 0 0.000000
 -1153> 2017-10-17 17:17:26.654636 7f23c31c8700  5 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] enter Started/Stray
 -1152> 2017-10-17 17:17:26.654641 7f23c31c8700 20 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] handle_activate_map: Not dirtying info: last_persisted is 68 while current is 69
 -1151> 2017-10-17 17:17:26.654644 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] handle_peering_event: epoch_sent: 69 epoch_requested: 69 NullEvt
 -1149> 2017-10-17 17:17:26.654661 7f23c31c8700 10 bluestore(/var/lib/ceph/osd/ceph-1) queue_transactions existing 0x7f23f06439c0 osr(2.2s1 0x7f23f04f8c40)
 -1147> 2017-10-17 17:17:26.654669 7f23c31c8700 15 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys 2.2s1_head 1#2:40000000::::head#
 -1146> 2017-10-17 17:17:26.654677 7f23c31c8700 10 bluestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys 2.2s1_head 1#2:40000000::::head# = 0
 -1111> 2017-10-17 17:17:26.654881 7f23cd893700 10 osd.1 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY]handle_pg_peering_evt acting changed in 69 (msg from 66)
 -1099> 2017-10-17 17:17:26.654902 7f23cd893700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] handle_query query(info 0'0) from replica 5(0)
 -1089> 2017-10-17 17:17:26.654922 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] handle_peering_event: epoch_sent: 69 epoch_requested: 69 MQuery from 5(0) query_epoch 69 query: query(info 0'0)
 -1084> 2017-10-17 17:17:26.654932 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] sending info
 -1071> 2017-10-17 17:17:26.654968 7f23c29c7700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] handle_peering_event: epoch_sent: 69 epoch_requested: 69 IntervalFlush
 -1066> 2017-10-17 17:17:26.654977 7f23c29c7700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] state<Started>: Ending blocked outgoing recovery messages
 -1046> 2017-10-17 17:17:26.655000 7f23c29c7700  1 -- 172.21.15.45:6805/7222 --> 172.21.15.96:6817/26009 -- pg_notify((query:69 sent:69 2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) 1->0)=([50,68] intervals=([57,68] acting 0(2),1(1),3(3))) (query:69 sent:69 2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) 1->0)=([50,68] intervals=([57,68] acting 0(2),1(1),3(3))) epoch 69) v6 -- ?+0 0x7f23edfc7c00 con 0x7f23f3a81200
  -944> 2017-10-17 17:17:26.655453 7f23cb1d8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] flushed
  -942> 2017-10-17 17:17:26.655475 7f23c31c8700 10 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] handle_peering_event: epoch_sent: 69 epoch_requested: 69 FlushedEvt
  -938> 2017-10-17 17:17:26.655493 7f23c31c8700 -1 osd.1 pg_epoch: 69 pg[2.2s1( v 68'614 (0'0,68'614] local-lis/les=57/58 n=614 ec=19/19 lis/c 57/50 les/c/f 58/51/0 69/69/69) [5,1,0,3] r=1 lpr=69 pi=[50,69)/1 crt=68'612 lcod 68'612 unknown NOTIFY] on_flushed: object 2:512819fe:::smithi04510283-1653 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head obc still alive

/a/sage-2017-10-17_15:09:41-rados-wip-sage-testing-2017-10-16-2116-distro-basic-smithi/1742469

Related issues

Related to RADOS - Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty()) Can't reproduce 05/22/2017

History

#1 Updated by Myoungwon Oh about 2 years ago

It seems that SharedLRU causes refs leak. My guess is that this is same cause as http://tracker.ceph.com/issues/20000.
In this bug log, assert() occurs at object_contexts.empty().
SharedLRU.empty() only check weak_refs.empty(). Therefore, if refs leak happen, assert() occurs.

To avoid this, how about using
pair<hobject_t, ObjectContextRef> i;
while (object_contexts.get_next(i.first, &i)) {
object_contexts.purge(i.first);
}
instead of object_contexts.clear() in PrimaryLogPG::on_change() ?

#2 Updated by Myoungwon Oh about 2 years ago

It seems that SharedLRU causes refs leak. My guess is that this is the same cause as http://tracker.ceph.com/issues/20000.
In this bug log, assert() occurs at object_contexts.empty().
SharedLRU.empty() only check weak_refs.empty(). Therefore, if refs leak happen, assert() occurs.

To avoid this, how about using
pair<hobject_t, ObjectContextRef> i;
while (object_contexts.get_next(i.first, &i)) {
object_contexts.purge(i.first);
}
instead of object_contexts.clear() in PrimaryLogPG::on_change() ?

#3 Updated by Sage Weil almost 2 years ago

The problem is that we want the (weak) refs to go away. If they still exist, then someone has a hadnle to an obc from a previous interval, and that will lead to bugs.. thats' why the check exists. Unfortunatley it's super hard to track down where they come from.

#4 Updated by Sage Weil almost 2 years ago

  • Related to Bug #20000: osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty()) added

#5 Updated by Josh Durgin 3 months ago

  • Status changed from Verified to Can't reproduce

Also available in: Atom PDF