Bug #52925

closed

PG keeps peering forever after async recovery is triggered

Added by yite gu over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I am running Ceph 14.2.21 and wanted to test the PG async recovery function. To trigger async recovery early, I set "osd_async_recovery_min_cost" to 20 on osd.9 only.
I then stopped the osd.9 daemon, wrote some data to RBD, and restarted osd.9 after the writes had finished.
When a PG whose primary is osd.9 starts peering, I can see choose_async_recovery_replicated:
2021-09-22 14:23:49.597 7f38d423f700 20 osd.9 pg_epoch: 11494 pg[25.2c7( v 11489'46976 (11466'43900,11489'46976] local-lis/les=11490/11491 n=45 ec=11459/11459 lis/c 11492/11490 les/c/f 11493/11491/0 11494/11494/11494) [9,8,27] r=0 lpr=11494 pi=[11490,11494)/1 crt=11489'46976 lcod 0'0 mlcod 0'0 peering mbc={}] choose_async_recovery_replicated candidates by cost are: 52,9
Next, choose_acting requests a pg_temp change:
2021-09-22 14:23:49.597 7f38d423f700 10 osd.9 pg_epoch: 11494 pg[25.2c7( v 11489'46976 (11466'43900,11489'46976] local-lis/les=11490/11491 n=45 ec=11459/11459 lis/c 11492/11490 les/c/f 11493/11491/0 11494/11494/11494) [9,8,27] r=0 lpr=11494 pi=[11490,11494)/1 crt=11489'46976 lcod 0'0 mlcod 0'0 peering mbc={}] choose_acting want [8,27] != acting [9,8,27], requesting pg_temp change
Then osd.9 received the new osdmap:
2021-09-22 14:23:50.260 7f38d423f700 20 osd.9 pg_epoch: 11495 pg[25.2c7( v 11489'46976 (11466'43900,11489'46976] local-lis/les=11490/11491 n=45 ec=11459/11459 lis/c 11492/11490 les/c/f 11493/11491/0 11494/11494/11494) [9,8,27] r=0 lpr=11494 pi=[11490,11494)/1 crt=11489'46976 lcod 0'0 mlcod 0'0 unknown mbc={}] new interval newup [9,8,27] newacting [8,27]
osd.9 is no longer the primary for pg 25.2c7, so the PG transitions to Stray:
2021-09-22 14:23:50.261 7f38d423f700 1 osd.9 pg_epoch: 11495 pg[25.2c7( v 11489'46976 (11466'43900,11489'46976] local-lis/les=11490/11491 n=45 ec=11459/11459 lis/c 11492/11490 les/c/f 11493/11491/0 11494/11495/11495) [9,8,27]/[8,27] r=-1 lpr=11495 pi=[11490,11495)/1 crt=11489'46976 lcod 0'0 remapped NOTIFY mbc={}] state<Start>: transitioning to Stray
------------------------------
osd.8 log:
2021-09-30 18:52:41.654 7f3c12d87700 5 osd.8 pg_epoch: 708811 pg[25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811) [9,8,27]/[8,27] r=0 lpr=708811 pi=[11490,708811)/2 crt=11493'47028 lcod 11493'47027 mlcod 0'0 remapped+peering mbc={}] enter Started/Primary/Peering/GetLog

2021-09-30 18:52:41.654 7f3c12d87700 10 osd.8 pg_epoch: 708811 pg[25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811) [9,8,27]/[8,27] r=0 lpr=708811 pi=[11490,708811)/2 crt=11493'47028 lcod 11493'47027 mlcod 0'0 remapped+peering mbc={}] choose_acting all_info osd.8 25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811)

2021-09-30 18:52:41.654 7f3c12d87700 10 osd.8 pg_epoch: 708811 pg[25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811) [9,8,27]/[8,27] r=0 lpr=708811 pi=[11490,708811)/2 crt=11493'47028 lcod 11493'47027 mlcod 0'0 remapped+peering mbc={}] choose_acting all_info osd.9 25.2c7( v 11489'46976 (11466'43900,11489'46976] local-lis/les=11490/11491 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811)

2021-09-30 18:52:41.654 7f3c12d87700 10 osd.8 pg_epoch: 708811 pg[25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811) [9,8,27]/[8,27] r=0 lpr=708811 pi=[11490,708811)/2 crt=11493'47028 lcod 11493'47027 mlcod 0'0 remapped+peering mbc={}] choose_acting all_info osd.27 25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811)

2021-09-30 18:52:41.654 7f3c12d87700 10 osd.8 pg_epoch: 708811 pg[25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811) [9,8,27]/[8,27] r=0 lpr=708811 pi=[11490,708811)/2 crt=11493'47028 lcod 11493'47027 mlcod 0'0 remapped+peering mbc={}] calc_replicated_acting newest update on osd.8 with 25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811)
up_primary: 9) selected as primary
calc_replicated_acting primary is osd.9 with 25.2c7( v 11489'46976 (11466'43900,11489'46976] local-lis/les=11490/11491 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811)
osd.8 (up) accepted 25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811)
osd.27 (up) accepted 25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811)

2021-09-30 18:52:41.654 7f3c12d87700 20 osd.8 pg_epoch: 708811 pg[25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811) [9,8,27]/[8,27] r=0 lpr=708811 pi=[11490,708811)/2 crt=11493'47028 lcod 11493'47027 mlcod 0'0 remapped+peering mbc={}] choose_async_recovery_replicated candidates by cost are:

2021-09-30 18:52:41.654 7f3c12d87700 20 osd.8 pg_epoch: 708811 pg[25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811) [9,8,27]/[8,27] r=0 lpr=708811 pi=[11490,708811)/2 crt=11493'47028 lcod 11493'47027 mlcod 0'0 remapped+peering mbc={}] choose_async_recovery_replicated result want=[9,8,27] async_recovery=

2021-09-30 18:52:41.654 7f3c12d87700 10 osd.8 pg_epoch: 708811 pg[25.2c7( v 11493'47028 (11466'44000,11493'47028] local-lis/les=102989/102990 n=45 ec=11459/11459 lis/c 102989/11490 les/c/f 102990/11491/0 102991/708811/708811) [9,8,27]/[8,27] r=0 lpr=708811 pi=[11490,708811)/2 crt=11493'47028 lcod 11493'47027 mlcod 0'0 remapped+peering mbc={}] choose_acting want [9,8,27] != acting [8,27], requesting pg_temp change
So the PG never stops peering: osd.9 asks for acting [8,27], and once osd.8 becomes primary it asks for [9,8,27] again.
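
The back-and-forth in the two logs can be sketched with a toy model. This is a simplified illustration, not the Ceph implementation: the function names `choose_want` and `simulate` are hypothetical, the default threshold of 100 for osd_async_recovery_min_cost is an assumption, and the only rule modelled is "an OSD whose cost exceeds the current primary's threshold is taken out of the wanted acting set". The cost of 52 for osd.9 is taken from the osd.9 log line above; it also equals the gap between the two log heads (47028 - 46976).

```python
# Toy model of the pg_temp ping-pong seen in the logs above.
# NOT the Ceph code: names and the default threshold are assumptions.

DEFAULT_MIN_COST = 100  # assumed default of osd_async_recovery_min_cost


def choose_want(up, costs, min_cost):
    """Return the acting set the current primary would ask for.

    Any OSD whose recovery cost exceeds the primary's local threshold
    is treated as an async recovery target and left out of the set.
    """
    async_targets = [osd for osd in up if costs.get(osd, 0) > min_cost]
    return [osd for osd in up if osd not in async_targets]


def simulate(up, costs, min_cost_by_osd, rounds=6):
    """Replay peering rounds; each round is evaluated by the acting primary
    using that OSD's own threshold. Stops early once want == acting."""
    acting = list(up)
    history = []
    for _ in range(rounds):
        primary = acting[0]
        threshold = min_cost_by_osd.get(primary, DEFAULT_MIN_COST)
        want = choose_want(up, costs, threshold)
        history.append((primary, want))
        if want == acting:
            break  # peering can proceed, no pg_temp change needed
        acting = want  # pg_temp change starts a new interval
    return history


if __name__ == "__main__":
    up = [9, 8, 27]
    costs = {9: 52}  # cost reported for osd.9 in its own log

    # Threshold lowered on osd.9 only, as in this report.
    for primary, want in simulate(up, costs, {9: 20}):
        print(f"primary osd.{primary} wants acting {want}")

    # Same threshold on every OSD in the set.
    for primary, want in simulate(up, costs, {9: 20, 8: 20, 27: 20}):
        print(f"primary osd.{primary} wants acting {want}")
```

Under these assumptions the first run never settles, alternating between [8,27] and [9,8,27] until the round limit, while the second run settles as soon as osd.8 becomes primary. In this model the loop comes from the threshold differing between osd.9 and osd.8, not from async recovery itself.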

Actions #2

Updated by Neha Ojha over 2 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
Actions #3

Updated by Neha Ojha over 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 43534
Actions #4

Updated by Neha Ojha over 2 years ago

  • Status changed from Fix Under Review to Closed