Project

General

Profile

Bug #39441

osd acting cycle

Added by Sage Weil about 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

osd.9 (mimic)

2019-04-23 22:04:08.542 7fb87ff35700  5 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] enter Started/Primary/Peering/GetLog
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] choose_acting all_info osd.0 7.a( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0/0/0 0/0/0)
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] choose_acting all_info osd.6 7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071)
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] choose_acting all_info osd.9 7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071)
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] choose_acting all_info osd.11 7.a( v 1267'2440 lc 1255'993 (0'0,1267'2440] local-lis/les=1252/1253 n=2440 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071)
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] calc_replicated_acting newest update on osd.6 with 7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071)
calc_replicated_acting primary is osd.9 with 7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071)
 osd.6 (up) accepted 7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071)
2019-04-23 22:04:08.542 7fb87ff35700 20 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] choose_async_recovery_replicated candidates by cost are: 122,9
2019-04-23 22:04:08.542 7fb87ff35700 20 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] choose_async_recovery_replicated result want=[6] async_recovery=9
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] choose_acting want [6] != acting [9], requesting pg_temp change
2019-04-23 22:04:08.542 7fb87ff35700  5 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] exit Started/Primary/Peering/GetLog 0.000198 0 0.000000
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] state<Started/Primary/Peering>: Leaving Peering
2019-04-23 22:04:08.542 7fb87ff35700  5 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped+peering mbc={}] exit Started/Primary/Peering 0.005481 0 0.000000
2019-04-23 22:04:08.542 7fb87ff35700  5 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped mbc={}] exit Started/Primary 0.005504 0 0.000000
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped mbc={}] clear_primary_state
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped mbc={}] release_backoffs [7:50000000::::0,7:60000000::::head)
2019-04-23 22:04:08.542 7fb87ff35700 20 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped mbc={}] agent_stop
2019-04-23 22:04:08.542 7fb87ff35700  5 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped mbc={}] enter Started/Primary
2019-04-23 22:04:08.542 7fb87ff35700  5 osd.9 pg_epoch: 12071 pg[7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12071/12071) [9,6]/[9] r=0 lpr=12071 pi=[1252,12071)/2 crt=1263'2319 lcod 1263'2317 mlcod 0'0 remapped mbc={}] enter Started/Primary/WaitActingChange
2019-04-23 22:04:08.542 7fb87ff35700 10 osd.9 12071 send_pg_temp {7.a=[6]}

osd.6 (octopus)

2019-04-23 22:08:45.618 7f892e235700  5 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] enter Started/Primary/Peering/GetLog
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] choose_acting all_info osd.0 7.a( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0/0/0 0/0/0)
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] choose_acting all_info osd.6 7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348)
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] choose_acting all_info osd.9 7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348)
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] choose_acting all_info osd.11 7.a( v 1267'2440 lc 1255'993 (0'0,1267'2440] local-lis/les=1252/1253 n=2440 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348)
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] calc_replicated_acting newest update on osd.6 with 7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348)
up_primary: 9) selected as primary
calc_replicated_acting primary is osd.9 with 7.a( v 1263'2319 (0'0,1263'2319] local-lis/les=1257/1258 n=2319 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348)
 osd.6 (up) accepted 7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348)

2019-04-23 22:08:45.618 7f892e235700 20 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] choose_async_recovery_replicated candidates by cost are: 122,9,363,6
2019-04-23 22:08:45.618 7f892e235700 20 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] choose_async_recovery_replicated result want=[9] async_recovery=6
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] choose_acting want [9] != acting [6], requesting pg_temp change
2019-04-23 22:08:45.618 7f892e235700  5 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] exit Started/Primary/Peering/GetLog 0.000055 0 0.000000
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] state<Started/Primary/Peering>: Leaving Peering
2019-04-23 22:08:45.618 7f892e235700  5 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped+peering m=363 mbc={}] exit Started/Primary/Peering 0.002765 0 0.000000
2019-04-23 22:08:45.618 7f892e235700  5 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped m=363 mbc={}] exit Started/Primary 0.002776 0 0.000000
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped m=363 mbc={}] clear_primary_state
2019-04-23 22:08:45.618 7f893ba50700 20 osd.6 12348 get_map 1263 - loading and decoding 0x56354e263a80
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped m=363 mbc={}] release_backoffs [7:50000000::::0,7:60000000::::head)
2019-04-23 22:08:45.618 7f892e235700 20 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped m=363 mbc={}] agent_stop
2019-04-23 22:08:45.618 7f893ba50700 15 filestore(/var/lib/ceph/osd/ceph-6) read(3393): meta/#-1:18588850:::osdmap.1263:0# 0~0
2019-04-23 22:08:45.618 7f893ba50700 20 filestore.osr(0x56354b201840) wait_for_apply #-1:18588850:::osdmap.1263:0# done
2019-04-23 22:08:45.618 7f892e235700  5 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped m=363 mbc={}] enter Started/Primary
2019-04-23 22:08:45.618 7f892e235700  5 osd.6 pg_epoch: 12348 pg[7.a( v 1267'2441 lc 1255'630 (0'0,1267'2441] local-lis/les=1265/1266 n=2441 ec=1252/1252 lis/c 1265/1252 les/c/f 1266/1253/0 1268/12348/12348) [9,6]/[6] r=0 lpr=12348 pi=[1252,12348)/2 crt=1267'2441 mlcod 0'0 remapped m=363 mbc={}] enter Started/Primary/WaitActingChange
2019-04-23 22:08:45.618 7f892e235700 10 log is not dirty
2019-04-23 22:08:45.618 7f892e235700 10 osd.6 12348 send_pg_temp {7.a=[9]}

/a/sage-2019-04-23_06:50:56-rados-wip-sage4-testing-2019-04-22-2147-distro-basic-smithi/3882551


Related issues

Copied to RADOS - Backport #39512: nautilus: osd acting cycle Resolved

History

#1 Updated by Neha Ojha about 2 years ago

https://github.com/ceph/ceph/pull/24004 was not backported to mimic, which might explain why the octopus osd is calculating the cost differently from the mimic osd, resulting in a cycle because of this inconsistency.

#2 Updated by Neha Ojha about 2 years ago

  • Status changed from 12 to Fix Under Review
  • Pull request ID set to 27798

#3 Updated by Neha Ojha about 2 years ago

  • Assignee set to Neha Ojha

#4 Updated by Neha Ojha about 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus

#5 Updated by Nathan Cutler about 2 years ago

#6 Updated by Nathan Cutler almost 2 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF