Project

General

Profile

Bug #52509

PG merge: PG stuck in premerge+peered state

Added by Konstantin Shalygin over 2 years ago. Updated over 1 year ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
Peering
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi, we get a couple of outages with two stuck PG it premerge+peered state:

PG_AVAILABILITY Reduced data availability: 2 pgs inactive
    pg 4.1d9 is stuck inactive for 1000.400328, current state remapped+premerge+backfilling+peered, last acting [328,315,352]
    pg 4.9d9 is stuck inactive for 1000.400333, current state remapped+premerge+backfill_wait+peered, last acting [328,315,352]

When I was restart osd.352 - issue was resolved immediately. I was query this PG before perform restart (json's attached). After restart PG 4.9d9 was deleted
backfill target is osd.362 - this OSD was added at 2021-09-03, and also added some upmaps:

ceph-mon.mon1.log.2.gz:2021-09-03 08:41:05.988 7fa960ed2700  0 mon.mon1@0(leader) e6 handle_command mon_command({"prefix": "osd pg-upmap-items", "pgid": "4.9d9", "id": [362, 352]} v 0) v1
ceph-mon.mon1.log.2.gz:2021-09-03 08:41:05.988 7fa960ed2700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"prefix": "osd pg-upmap-items", "pgid": "4.9d9", "id": [362, 352]}]: dispatch
ceph-mon.mon1.log.2.gz:2021-09-03 08:41:06.824 7fa95f6cf700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd='[{"prefix": "osd pg-upmap-items", "pgid": "4.9d9", "id": [362, 352]}]': finished

Before this PG merge was runs (decreasing PG for pool_id 4)
Logs for this time for all OSD's:

osd.328.log

2021-09-05 11:50:49.561 7ff9340e4700  1 osd.328 pg_epoch: 494692 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494692/485476) [328,362,315]
 r=0 lpr=494692 pi=[493206,494692)/1 luod=0'0 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,352,315] -> [328,362,315], acting [328,352,315] -> [328,362,315], acting_primary 3
28 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:49.561 7ff9330e2700  1 osd.328 pg_epoch: 494692 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494692/328335) [328,362,315]
 r=0 lpr=494692 pi=[493179,494692)/1 luod=0'0 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,352,315] -> [328,362,315], acting [328,352,315] -> [328,362,315], acting_primary 3
28 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:49.561 7ff9340e4700  1 osd.328 pg_epoch: 494692 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494692/485476) [328,362,315]
 r=0 lpr=494692 pi=[493206,494692)/1 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 unknown mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 11:50:49.561 7ff9330e2700  1 osd.328 pg_epoch: 494692 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494692/328335) [328,362,315]
 r=0 lpr=494692 pi=[493179,494692)/1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 unknown mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 11:50:50.585 7ff9340e4700  1 osd.328 pg_epoch: 494693 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494693/485476) [328,362,315]
/[328,315,352] r=0 lpr=494693 pi=[493206,494693)/1 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,362,315] -> [328,315,352], acting_pr
imary 328 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:50.585 7ff9358e7700  1 osd.328 pg_epoch: 494693 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494693/328335) [328,362,315]
/[328,315,352] r=0 lpr=494693 pi=[493179,494693)/1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,362,315] -> [328,315,352], acting_pr
imary 328 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:50.585 7ff9340e4700  1 osd.328 pg_epoch: 494693 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494693/485476) [328,362,315]
/[328,315,352] r=0 lpr=494693 pi=[493206,494693)/1 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 11:50:50.585 7ff9358e7700  1 osd.328 pg_epoch: 494693 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494693/328335) [328,362,315]
/[328,315,352] r=0 lpr=494693 pi=[493179,494693)/1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 11:50:51.617 7ff9340e4700  1 osd.328 pg_epoch: 494694 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494694/485476) [328,362,315]
/[328,315,352] backfill=[362] r=0 lpr=494694 pi=[493206,494694)/1 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315,
352], acting_primary 328 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:51.617 7ff9340e4700  1 osd.328 pg_epoch: 494694 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494694/485476) [328,362,315]
/[328,315,352] r=0 lpr=494694 pi=[493206,494694)/1 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 11:50:51.617 7ff9358e7700  1 osd.328 pg_epoch: 494694 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494694/328335) [328,362,315]
/[328,315,352] backfill=[362] r=0 lpr=494694 pi=[493179,494694)/1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315,
352], acting_primary 328 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:51.617 7ff9358e7700  1 osd.328 pg_epoch: 494694 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494694/328335) [328,362,315]
/[328,315,352] r=0 lpr=494694 pi=[493179,494694)/1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 11:50:52.657 7ff9340e4700  0 log_channel(cluster) log [DBG] : 4.9d9 starting backfill to osd.362 from (0'0,0'0] MAX to 494690'26564689
2021-09-05 11:50:52.657 7ff9330e2700  0 log_channel(cluster) log [DBG] : 4.1d9 starting backfill to osd.362 from (0'0,0'0] MAX to 494691'30825984
2021-09-05 12:10:03.354 7ff9358e7700  1 osd.328 pg_epoch: 494696 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=494694/494695 n=5765 ec=2302/2302 lis/c 494694/493179 les/c/f 494695/493180/0 494692/494696/328335) [328,362,315]
/[328,315] backfill=[362] r=0 lpr=494696 pi=[493179,494696)/1 luod=0'0 rops=1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 remapped+premerge+peered mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], actin
g [328,315,352] -> [328,315], acting_primary 328 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:03.354 7ff9368e9700  1 osd.328 pg_epoch: 494696 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=494694/494695 n=5699 ec=2307/2302 lis/c 494694/493206 les/c/f 494695/493207/0 494692/494696/485476) [328,362,315]
/[328,315] backfill=[362] r=0 lpr=494696 pi=[493206,494696)/1 luod=0'0 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 remapped+premerge+peered mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,
315,352] -> [328,315], acting_primary 328 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:03.354 7ff9368e9700  1 osd.328 pg_epoch: 494696 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=494694/494695 n=5699 ec=2307/2302 lis/c 494694/493206 les/c/f 494695/493207/0 494692/494696/485476) [328,362,315]
/[328,315] r=0 lpr=494696 pi=[493206,494696)/1 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 12:10:03.354 7ff9358e7700  1 osd.328 pg_epoch: 494696 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=494694/494695 n=5765 ec=2302/2302 lis/c 494694/493179 les/c/f 494695/493180/0 494692/494696/328335) [328,362,315]
/[328,315] r=0 lpr=494696 pi=[493179,494696)/1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 12:10:03.630 7ff9368e9700  0 log_channel(cluster) log [INF] : 4.9d9 continuing backfill to osd.362 from (494679'26561689,494690'26564689] MIN to 494690'26564689
2021-09-05 12:10:03.630 7ff9330e2700  0 log_channel(cluster) log [INF] : 4.1d9 continuing backfill to osd.362 from (494673'30822984,494691'30825984] 4:9b8e3888:::rbd_data.2c48cf82c3faa3.0000000000000dcd:head to 494691'30825984
2021-09-05 12:10:04.738 7ff9330e2700  1 osd.328 pg_epoch: 494698 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=494696/494697 n=5765 ec=2302/2302 lis/c 494696/493179 les/c/f 494697/493180/0 494692/494698/328335) [328,362,315]
/[328,315] backfill=[362] r=0 lpr=494698 pi=[493179,494698)/1 luod=0'0 rops=1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 remapped+premerge+peered mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], actin
g [328,315] -> [328,315], acting_primary 328 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:04.738 7ff9330e2700  1 osd.328 pg_epoch: 494698 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=494696/494697 n=5765 ec=2302/2302 lis/c 494696/493179 les/c/f 494697/493180/0 494692/494698/328335) [328,362,315]
/[328,315] r=0 lpr=494698 pi=[493179,494698)/1 crt=494691'30825984 lcod 494687'30825982 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 12:10:04.738 7ff9340e4700  1 osd.328 pg_epoch: 494698 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=494696/494697 n=5699 ec=2307/2302 lis/c 494696/493206 les/c/f 494697/493207/0 494692/494698/485476) [328,362,315]
/[328,315] backfill=[362] r=0 lpr=494698 pi=[493206,494698)/1 luod=0'0 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 remapped+premerge+peered mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,
315] -> [328,315], acting_primary 328 -> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:04.738 7ff9340e4700  1 osd.328 pg_epoch: 494698 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=494696/494697 n=5699 ec=2307/2302 lis/c 494696/493206 les/c/f 494697/493207/0 494692/494698/485476) [328,362,315]
/[328,315] r=0 lpr=494698 pi=[493206,494698)/1 crt=494690'26564689 lcod 494690'26564688 mlcod 0'0 remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary
2021-09-05 12:10:05.726 7ff9368e9700  0 log_channel(cluster) log [INF] : 4.9d9 continuing backfill to osd.362 from (494679'26561689,494690'26564689] MIN to 494690'26564689
2021-09-05 12:10:05.726 7ff9330e2700  0 log_channel(cluster) log [INF] : 4.1d9 continuing backfill to osd.362 from (494673'30822984,494691'30825984] 4:9b8e3d6a:::rbd_data.c1ed8e3723dd3d.000000000001f461:head to 494691'30825984
2021-09-05 12:11:40.543 7ff9320e0700  0 osd.328 494705 do_command r=0
2021-09-05 12:11:40.551 7ff9320e0700  0 osd.328 494705 do_command r=0
2021-09-05 12:13:51.909 7ff9358e7700  1 osd.328 pg_epoch: 494716 pg[4.1d9( v 494713'30826259 (494675'30823255,494713'30826259] local-lis/les=494698/494699 n=5766 ec=2302/2302 lis/c 494698/493179 les/c/f 494699/493180/0 494692/494716/328335) [328,362,315]
 r=0 lpr=494716 pi=[493179,494716)/1 luod=0'0 crt=494713'30826259 lcod 494713'30826258 mlcod 0'0 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315] -> [328,362,315], acting_primary 328 -
> 328, up_primary 328 -> 328, role 0 -> 0, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:13:51.909 7ff9358e7700  1 osd.328 pg_epoch: 494716 pg[4.1d9( v 494713'30826259 (494675'30823255,494713'30826259] local-lis/les=494698/494699 n=5766 ec=2302/2302 lis/c 494698/493179 les/c/f 494699/493180/0 494692/494716/328335) [328,362,315]
 r=0 lpr=494716 pi=[493179,494716)/1 crt=494713'30826259 lcod 494713'30826258 mlcod 0'0 unknown mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Primary

osd.315.log

2021-09-05 11:50:49.525 7f506df40700  1 osd.315 pg_epoch: 494692 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494692/328335) [328,362,315]
 r=2 lpr=494692 pi=[493179,494692)/1 luod=0'0 lua=494547'30756656 crt=494691'30825984 lcod 494687'30825982 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,352,315] -> [328,362,315], acting [328,352,315] -> [328,362,315], acting
_primary 328 -> 328, up_primary 328 -> 328, role 2 -> 2, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:49.525 7f506ef42700  1 osd.315 pg_epoch: 494692 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494692/485476) [328,362,315]
 r=2 lpr=494692 pi=[493206,494692)/1 luod=0'0 lua=494547'26450040 crt=494690'26564689 lcod 494690'26564688 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,352,315] -> [328,362,315], acting [328,352,315] -> [328,362,315], acting
_primary 328 -> 328, up_primary 328 -> 328, role 2 -> 2, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:49.525 7f506df40700  1 osd.315 pg_epoch: 494692 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494692/328335) [328,362,315]
 r=2 lpr=494692 pi=[493179,494692)/1 crt=494691'30825984 lcod 494687'30825982 unknown NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:49.525 7f506ef42700  1 osd.315 pg_epoch: 494692 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494692/485476) [328,362,315]
 r=2 lpr=494692 pi=[493206,494692)/1 crt=494690'26564689 lcod 494690'26564688 unknown NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:50.581 7f506c73d700  1 osd.315 pg_epoch: 494693 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494693/485476) [328,362,315]
/[328,315,352] r=1 lpr=494693 pi=[493206,494693)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,362,315] -> [328,315,352], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role 2 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:50.581 7f506c73d700  1 osd.315 pg_epoch: 494693 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494693/485476) [328,362,315]
/[328,315,352] r=1 lpr=494693 pi=[493206,494693)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:50.581 7f506df40700  1 osd.315 pg_epoch: 494693 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494693/328335) [328,362,315]
/[328,315,352] r=1 lpr=494693 pi=[493179,494693)/1 crt=494691'30825984 lcod 494687'30825982 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,362,315] -> [328,315,352], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role 2 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:50.581 7f506df40700  1 osd.315 pg_epoch: 494693 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494693/328335) [328,362,315]
/[328,315,352] r=1 lpr=494693 pi=[493179,494693)/1 crt=494691'30825984 lcod 494687'30825982 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:51.645 7f506ef42700  1 osd.315 pg_epoch: 494694 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494694/485476) [328,362,315]
/[328,315,352] r=1 lpr=494694 pi=[493206,494694)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315,352], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role 1 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:51.645 7f506ef42700  1 osd.315 pg_epoch: 494694 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494694/485476) [328,362,315]
/[328,315,352] r=1 lpr=494694 pi=[493206,494694)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:51.645 7f506b73b700  1 osd.315 pg_epoch: 494694 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494694/328335) [328,362,315]
/[328,315,352] r=1 lpr=494694 pi=[493179,494694)/1 crt=494691'30825984 lcod 494687'30825982 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315,352], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role 1 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:51.645 7f506b73b700  1 osd.315 pg_epoch: 494694 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494694/328335) [328,362,315]
/[328,315,352] r=1 lpr=494694 pi=[493179,494694)/1 crt=494691'30825984 lcod 494687'30825982 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:02.597 7f506df40700  1 osd.315 pg_epoch: 494696 pg[4.7a6( v 494695'31404168 (494685'31401125,494695'31404168] local-lis/les=367719/367720 n=11285 ec=2305/2302 lis/c 367719/367719 les/c/f 367720/367720/0 494696/494696/494696) [333,315] r=
1 lpr=494696 pi=[367719,494696)/1 luod=0'0 lua=494547'31097439 crt=494695'31404168 lcod 494695'31404167 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [352,333,315] -> [333,315], acting [352,333,315] -> [333,315], acting_primary 35
2 -> 333, up_primary 352 -> 333, role 2 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:02.597 7f506ef42700  1 osd.315 pg_epoch: 494696 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=494694/494695 n=5699 ec=2307/2302 lis/c 494694/493206 les/c/f 494695/493207/0 494692/494696/485476) [328,362,315]
/[328,315] r=1 lpr=494696 pi=[493206,494696)/1 luod=0'0 crt=494690'26564689 lcod 494690'26564688 active+remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315], acting_prim
ary 328 -> 328, up_primary 328 -> 328, role 1 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:02.597 7f506df40700  1 osd.315 pg_epoch: 494696 pg[4.7a6( v 494695'31404168 (494685'31401125,494695'31404168] local-lis/les=367719/367720 n=11285 ec=2305/2302 lis/c 367719/367719 les/c/f 367720/367720/0 494696/494696/494696) [333,315] r=
1 lpr=494696 pi=[367719,494696)/1 crt=494695'31404168 lcod 494695'31404167 unknown NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:02.597 7f506df40700  1 osd.315 pg_epoch: 494696 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=494694/494695 n=5765 ec=2302/2302 lis/c 494694/493179 les/c/f 494695/493180/0 494692/494696/328335) [328,362,315]
/[328,315] r=1 lpr=494696 pi=[493179,494696)/1 luod=0'0 crt=494691'30825984 lcod 494687'30825982 active+remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315], acting_prim
ary 328 -> 328, up_primary 328 -> 328, role 1 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:02.597 7f506df40700  1 osd.315 pg_epoch: 494696 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=494694/494695 n=5765 ec=2302/2302 lis/c 494694/493179 les/c/f 494695/493180/0 494692/494696/328335) [328,362,315]
/[328,315] r=1 lpr=494696 pi=[493179,494696)/1 crt=494691'30825984 lcod 494687'30825982 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:02.597 7f506ef42700  1 osd.315 pg_epoch: 494696 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=494694/494695 n=5699 ec=2307/2302 lis/c 494694/493206 les/c/f 494695/493207/0 494692/494696/485476) [328,362,315]
/[328,315] r=1 lpr=494696 pi=[493206,494696)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:04.765 7f506b73b700  1 osd.315 pg_epoch: 494698 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=494696/494697 n=5765 ec=2302/2302 lis/c 494696/493179 les/c/f 494697/493180/0 494692/494698/328335) [328,362,315]
/[328,315] r=1 lpr=494698 pi=[493179,494698)/1 luod=0'0 crt=494691'30825984 lcod 494687'30825982 active+remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315] -> [328,315], acting_primary
328 -> 328, up_primary 328 -> 328, role 1 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:04.765 7f506c73d700  1 osd.315 pg_epoch: 494698 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=494696/494697 n=5699 ec=2307/2302 lis/c 494696/493206 les/c/f 494697/493207/0 494692/494698/485476) [328,362,315]
/[328,315] r=1 lpr=494698 pi=[493206,494698)/1 luod=0'0 crt=494690'26564689 lcod 494690'26564688 active+remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315] -> [328,315], acting_primary
328 -> 328, up_primary 328 -> 328, role 1 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:04.765 7f506c73d700  1 osd.315 pg_epoch: 494698 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=494696/494697 n=5699 ec=2307/2302 lis/c 494696/493206 les/c/f 494697/493207/0 494692/494698/485476) [328,362,315]
/[328,315] r=1 lpr=494698 pi=[493206,494698)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:04.765 7f506b73b700  1 osd.315 pg_epoch: 494698 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=494696/494697 n=5765 ec=2302/2302 lis/c 494696/493179 les/c/f 494697/493180/0 494692/494698/328335) [328,362,315]
/[328,315] r=1 lpr=494698 pi=[493179,494698)/1 crt=494691'30825984 lcod 494687'30825982 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:19.449 7f506af3a700  1 osd.315 pg_epoch: 494700 pg[4.5c2( v 494699'46116981 (494683'46113941,494699'46116981] local-lis/les=493362/493363 n=11376 ec=2305/2302 lis/c 493362/493362 les/c/f 493363/494699/0 493361/494700/494700) [376,315,356
] r=1 lpr=494700 pi=[493362,494700)/1 luod=0'0 lua=494547'45939362 crt=494699'46116981 lcod 494699'46116979 active mbc={} ps=[23f~1]] start_peering_interval up [376,315,356] -> [376,315,356], acting [113,315,356] -> [376,315,356], acting_primary 113 -> 3
76, up_primary 376 -> 376, role 1 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:19.453 7f506af3a700  1 osd.315 pg_epoch: 494700 pg[4.5c2( v 494699'46116981 (494683'46113941,494699'46116981] local-lis/les=493362/493363 n=11376 ec=2305/2302 lis/c 493362/493362 les/c/f 493363/494699/0 493361/494700/494700) [376,315,356
] r=1 lpr=494700 pi=[493362,494700)/1 crt=494699'46116981 lcod 494699'46116979 unknown NOTIFY mbc={} ps=[23f~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:57.345 7f506b73b700  1 osd.315 pg_epoch: 494702 pg[4.7a6( v 494701'31404229 (494685'31401225,494701'31404229] local-lis/les=494696/494697 n=11285 ec=2305/2302 lis/c 494696/367719 les/c/f 494697/367720/0 494702/494702/494702) [352,333,315
] r=2 lpr=494702 pi=[367719,494702)/1 luod=0'0 lua=494697'31404169 crt=494701'31404229 lcod 494701'31404228 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [333,315] -> [352,333,315], acting [333,315] -> [352,333,315], acting_primar
y 333 -> 352, up_primary 333 -> 352, role 1 -> 2, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:57.345 7f506b73b700  1 osd.315 pg_epoch: 494702 pg[4.7a6( v 494701'31404229 (494685'31401225,494701'31404229] local-lis/les=494696/494697 n=11285 ec=2305/2302 lis/c 494696/367719 les/c/f 494697/367720/0 494702/494702/494702) [352,333,315
] r=2 lpr=494702 pi=[367719,494702)/1 crt=494701'31404229 lcod 494701'31404228 unknown NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:13:51.939 7f506df40700  1 osd.315 pg_epoch: 494716 pg[4.1d9( v 494713'30826259 (494675'30823255,494713'30826259] local-lis/les=494698/494699 n=5766 ec=2302/2302 lis/c 494698/493179 les/c/f 494699/493180/0 494692/494716/328335) [328,362,315]
 r=2 lpr=494716 pi=[493179,494716)/1 luod=0'0 lua=494699'30826084 crt=494713'30826259 lcod 494713'30826258 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315] -> [328,362,315], acting_pri
mary 328 -> 328, up_primary 328 -> 328, role 1 -> 2, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:13:51.939 7f506df40700  1 osd.315 pg_epoch: 494716 pg[4.1d9( v 494713'30826259 (494675'30823255,494713'30826259] local-lis/les=494698/494699 n=5766 ec=2302/2302 lis/c 494698/493179 les/c/f 494699/493180/0 494692/494716/328335) [328,362,315]
 r=2 lpr=494716 pi=[493179,494716)/1 crt=494713'30826259 lcod 494713'30826258 unknown NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray

osd.352.log

2021-09-05 11:50:49.540 7f60f5ce4700  1 osd.352 pg_epoch: 494692 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494692/485476) [328,362,315]
 r=-1 lpr=494692 pi=[493206,494692)/1 luod=0'0 lua=494547'26450040 crt=494690'26564689 lcod 494690'26564688 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,352,315] -> [328,362,315], acting [328,352,315] -> [328,362,315], actin
g_primary 328 -> 328, up_primary 328 -> 328, role 1 -> -1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:49.540 7f60f5ce4700  1 osd.352 pg_epoch: 494692 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494692/485476) [328,362,315]
 r=-1 lpr=494692 pi=[493206,494692)/1 crt=494690'26564689 lcod 494690'26564688 unknown NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:49.540 7f60f24dd700  1 osd.352 pg_epoch: 494692 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494692/328335) [328,362,315]
 r=-1 lpr=494692 pi=[493179,494692)/1 luod=0'0 lua=494547'30756656 crt=494691'30825984 lcod 494687'30825983 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,352,315] -> [328,362,315], acting [328,352,315] -> [328,362,315], actin
g_primary 328 -> 328, up_primary 328 -> 328, role 1 -> -1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:49.540 7f60f24dd700  1 osd.352 pg_epoch: 494692 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494692/328335) [328,362,315]
 r=-1 lpr=494692 pi=[493179,494692)/1 crt=494691'30825984 lcod 494687'30825983 unknown NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:50.589 7f60f4ce2700  1 osd.352 pg_epoch: 494693 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494693/328335) [328,362,315]
/[328,315,352] r=2 lpr=494693 pi=[493179,494693)/1 crt=494691'30825984 lcod 494687'30825983 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,362,315] -> [328,315,352], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role -1 -> 2, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:50.589 7f60f4ce2700  1 osd.352 pg_epoch: 494693 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494693/328335) [328,362,315]
/[328,315,352] r=2 lpr=494693 pi=[493179,494693)/1 crt=494691'30825984 lcod 494687'30825983 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:50.593 7f60f5ce4700  1 osd.352 pg_epoch: 494693 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494693/485476) [328,362,315]
/[328,315,352] r=2 lpr=494693 pi=[493206,494693)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,362,315] -> [328,315,352], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role -1 -> 2, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:50.593 7f60f5ce4700  1 osd.352 pg_epoch: 494693 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494693/485476) [328,362,315]
/[328,315,352] r=2 lpr=494693 pi=[493206,494693)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:51.629 7f60f5ce4700  1 osd.352 pg_epoch: 494694 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494694/485476) [328,362,315]
/[328,315,352] r=2 lpr=494694 pi=[493206,494694)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315,352], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role 2 -> 2, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:51.629 7f60f4ce2700  1 osd.352 pg_epoch: 494694 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494694/328335) [328,362,315]
/[328,315,352] r=2 lpr=494694 pi=[493179,494694)/1 crt=494691'30825984 lcod 494687'30825983 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315,352], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role 2 -> 2, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 11:50:51.629 7f60f5ce4700  1 osd.352 pg_epoch: 494694 pg[4.9d9( v 494690'26564689 (494679'26561633,494690'26564689] local-lis/les=493206/493207 n=5699 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494694/485476) [328,362,315]
/[328,315,352] r=2 lpr=494694 pi=[493206,494694)/1 crt=494690'26564689 lcod 494690'26564688 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 11:50:51.629 7f60f4ce2700  1 osd.352 pg_epoch: 494694 pg[4.1d9( v 494691'30825984 (494673'30822955,494691'30825984] local-lis/les=493179/493180 n=5765 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494694/328335) [328,362,315]
/[328,315,352] r=2 lpr=494694 pi=[493179,494694)/1 crt=494691'30825984 lcod 494687'30825983 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray

osd.362.log

2021-09-05 11:50:52.685 7fb43cd78700  1 osd.362 pg_epoch: 494694 pg[4.9d9( empty local-lis/les=0/0 n=0 ec=2307/2302 lis/c 493206/493206 les/c/f 493207/493207/0 494692/494694/485476) [328,362,315]/[328,315,352] r=-1 lpr=494694 pi=[493206,494694)/1 crt=0'0
 unknown mbc={}] state<Start>: transitioning to Stray
2021-09-05 11:50:52.685 7fb43e57b700  1 osd.362 pg_epoch: 494694 pg[4.1d9( empty local-lis/les=0/0 n=0 ec=2302/2302 lis/c 493179/493179 les/c/f 493180/493180/0 494692/494694/328335) [328,362,315]/[328,315,352] r=-1 lpr=494694 pi=[493179,494694)/1 crt=0'0
 unknown mbc={}] state<Start>: transitioning to Stray

Then restart osd.352

2021-09-05 12:10:02.628 7fb43cd78700  1 osd.362 pg_epoch: 494696 pg[4.9d9( v 494690'26564689 (494679'26561689,494690'26564689] lb MIN (bitwise) local-lis/les=494694/494695 n=0 ec=2307/2302 lis/c 494694/493206 les/c/f 494695/493207/0 494692/494696/485476)
 [328,362,315]/[328,315] r=-1 lpr=494696 pi=[493206,494696)/1 luod=0'0 crt=494690'26564689 lcod 0'0 active+remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315,352] -> [328,315], acting_p
rimary 328 -> 328, up_primary 328 -> 328, role -1 -> -1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:02.628 7fb43cd78700  1 osd.362 pg_epoch: 494696 pg[4.9d9( v 494690'26564689 (494679'26561689,494690'26564689] lb MIN (bitwise) local-lis/les=494694/494695 n=0 ec=2307/2302 lis/c 494694/493206 les/c/f 494695/493207/0 494692/494696/485476)
 [328,362,315]/[328,315] r=-1 lpr=494696 pi=[493206,494696)/1 crt=494690'26564689 lcod 0'0 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:02.628 7fb43e57b700  1 osd.362 pg_epoch: 494696 pg[4.1d9( v 494691'30825984 (494673'30822984,494691'30825984] lb 4:9b8e3888:::rbd_data.2c48cf82c3faa3.0000000000000dcd:head (bitwise) local-lis/les=494694/494695 n=5101 ec=2302/2302 lis/c 4
94694/493179 les/c/f 494695/493180/0 494692/494696/328335) [328,362,315]/[328,315] r=-1 lpr=494696 pi=[493179,494696)/1 luod=0'0 crt=494691'30825984 active+remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,
315], acting [328,315,352] -> [328,315], acting_primary 328 -> 328, up_primary 328 -> 328, role -1 -> -1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:02.628 7fb43e57b700  1 osd.362 pg_epoch: 494696 pg[4.1d9( v 494691'30825984 (494673'30822984,494691'30825984] lb 4:9b8e3888:::rbd_data.2c48cf82c3faa3.0000000000000dcd:head (bitwise) local-lis/les=494694/494695 n=5101 ec=2302/2302 lis/c 4
94694/493179 les/c/f 494695/493180/0 494692/494696/328335) [328,362,315]/[328,315] r=-1 lpr=494696 pi=[493179,494696)/1 crt=494691'30825984 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:04.724 7fb43cd78700  1 osd.362 pg_epoch: 494698 pg[4.9d9( v 494690'26564689 (494679'26561689,494690'26564689] lb MIN (bitwise) local-lis/les=494694/494695 n=0 ec=2307/2302 lis/c 494696/493206 les/c/f 494697/493207/0 494692/494698/485476)
 [328,362,315]/[328,315] r=-1 lpr=494698 pi=[493206,494698)/1 luod=0'0 crt=494690'26564689 lcod 0'0 active+remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315] -> [328,315], acting_prima
ry 328 -> 328, up_primary 328 -> 328, role -1 -> -1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:04.724 7fb43cd78700  1 osd.362 pg_epoch: 494698 pg[4.9d9( v 494690'26564689 (494679'26561689,494690'26564689] lb MIN (bitwise) local-lis/les=494694/494695 n=0 ec=2307/2302 lis/c 494696/493206 les/c/f 494697/493207/0 494692/494698/485476)
 [328,362,315]/[328,315] r=-1 lpr=494698 pi=[493206,494698)/1 crt=494690'26564689 lcod 0'0 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:10:04.724 7fb43bd76700  1 osd.362 pg_epoch: 494698 pg[4.1d9( v 494691'30825984 (494673'30822984,494691'30825984] lb 4:9b8e3d6a:::rbd_data.c1ed8e3723dd3d.000000000001f461:head (bitwise) local-lis/les=494694/494695 n=5107 ec=2302/2302 lis/c 4
94696/493179 les/c/f 494697/493180/0 494692/494698/328335) [328,362,315]/[328,315] r=-1 lpr=494698 pi=[493179,494698)/1 luod=0'0 crt=494691'30825984 active+remapped mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,
315], acting [328,315] -> [328,315], acting_primary 328 -> 328, up_primary 328 -> 328, role -1 -> -1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:10:04.724 7fb43bd76700  1 osd.362 pg_epoch: 494698 pg[4.1d9( v 494691'30825984 (494673'30822984,494691'30825984] lb 4:9b8e3d6a:::rbd_data.c1ed8e3723dd3d.000000000001f461:head (bitwise) local-lis/les=494694/494695 n=5107 ec=2302/2302 lis/c 4
94696/493179 les/c/f 494697/493180/0 494692/494698/328335) [328,362,315]/[328,315] r=-1 lpr=494698 pi=[493179,494698)/1 crt=494691'30825984 remapped NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray
2021-09-05 12:13:51.923 7fb43e57b700  1 osd.362 pg_epoch: 494716 pg[4.1d9( v 494713'30826259 (494675'30823255,494713'30826259] local-lis/les=494694/494695 n=5766 ec=2302/2302 lis/c 494698/493179 les/c/f 494699/493180/0 494692/494716/328335) [328,362,315]
 r=1 lpr=494716 pi=[493179,494716)/1 luod=0'0 lua=494699'30826084 crt=494713'30826259 active mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] start_peering_interval up [328,362,315] -> [328,362,315], acting [328,315] -> [328,362,315], acting_primary 328 -> 328, up_p
rimary 328 -> 328, role -1 -> 1, features acting 4611087854035861503 upacting 4611087854035861503
2021-09-05 12:13:51.923 7fb43e57b700  1 osd.362 pg_epoch: 494716 pg[4.1d9( v 494713'30826259 (494675'30823255,494713'30826259] local-lis/les=494694/494695 n=5766 ec=2302/2302 lis/c 494698/493179 les/c/f 494699/493180/0 494692/494716/328335) [328,362,315]
 r=1 lpr=494716 pi=[493179,494716)/1 crt=494713'30826259 unknown NOTIFY mbc={} ps=[23f~1,408~1,4a2~1,4a4~1]] state<Start>: transitioning to Stray

This is some bug or operations issue (e.g. wrong mappigs)?

4.9d9.json (24.6 KB) Konstantin Shalygin, 09/05/2021 02:11 PM

4.1d9.json (126 KB) Konstantin Shalygin, 09/05/2021 02:11 PM

History

#1 Updated by Konstantin Shalygin over 2 years ago

  • Affected Versions v14.2.22 added

#2 Updated by Konstantin Shalygin over 2 years ago

  • Description updated (diff)

#3 Updated by Neha Ojha over 2 years ago

  • Status changed from New to Need More Info

This is similar to https://tracker.ceph.com/issues/44684, which has already been fixed. It seems like the pgs are in backfill* states and therefore, we are waiting to merge, hence (+premerge). What's not clear is whether backfilling started after the merge targets were chosen by the mgr or before. Do you happen have the mgr logs from the same time, we are interested in logging from adjust_pgs? It'd be helpful if you could upload all the relevant logs using ceph-post-file.

#4 Updated by Konstantin Shalygin over 2 years ago

Neha, sorry, logs already rotated by logrotate.
When I was removes all upmaps from this pool - all PG merges passed without problems. So now we just don't do upmaps + merges in one time to avoid problems like this

#5 Updated by Konstantin Shalygin over 2 years ago

We can plan and spent time to setup staging cluster for this and try to reproduce it, if this a bug. With debug_mgr "10", as I can see, to see what mgr do

#6 Updated by Neha Ojha over 2 years ago

Konstantin Shalygin wrote:

We can plan and spent time to setup staging cluster for this and try to reproduce it, if this a bug. With debug_mgr "10", as I can see, to see what mgr do

That would be great, thanks!

#7 Updated by Markus Wennrich over 2 years ago

We had a similar outage.

We did try to increase the number of PGs on a bucket-index-pool:

$ ceph osd pool set s3.rgw.buckets.index pg_num 32

Then two PGs stuck inactive:

HEALTH_WARN Reduced data availability: 2 pgs inactive
[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive
  pg 5.e is stuck inactive for 23m, current state stale+remapped+premerge+backfill_wait+peered, last acting [10,24,26]
  pg 5.1e is stuck inactive for 23m, current state stale+remapped+premerge+backfill_wait+peered, last acting [10,24,26]

Some buckets were then unavailable.

After speeding things up with increasing osd-max-backfills and going to remapped+premerge+backfilling and then finally to remapped+backfilling, they got unstuck and buckets were available again.

It looks like pg_autoscaler (though in scale-up profile) tried to decrease immediately again:

debug 2021-10-27T08:39:57.310+0000 7fc6dcc00700 0 [progress INFO root] update: starting ev 6b2c4cbc-5d15-4046-b0fc-bd4ceb518c13 (PG autoscaler decreasing pool 5 PGs from 32 to 8)

POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
s3.rgw.buckets.index    5   27  3.6 MiB    1.08M   11 MiB      0     95 TiB
# ceph osd pool autoscale-status
POOL                     SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  PROFILE   
s3.rgw.buckets.index    3661k                3.0        330.2T  0.0000                                  1.0       8              on         scale-up 

#8 Updated by Konstantin Shalygin about 2 years ago

@Markus just for the record, what is your Ceph version? And what is your hardware for OSD's? The actual issue was on AMD CPU's
Thanks

#9 Updated by Konstantin Shalygin about 2 years ago

Neha Ojha wrote:

Konstantin Shalygin wrote:

We can plan and spent time to setup staging cluster for this and try to reproduce it, if this a bug. With debug_mgr "10", as I can see, to see what mgr do

That would be great, thanks!

Today I was trying PG split, then PG merge couple of times on staging cluster and can't find actual reproduces for this issue (Intel CPU's)

#10 Updated by Konstantin Shalygin over 1 year ago

  • Status changed from Need More Info to Can't reproduce

We have never experienced this problem again

Also available in: Atom PDF