Bug #51282 (closed)

pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings

Added by Patrick Donnelly almost 3 years ago. Updated 20 days ago.

Status:
Resolved
Priority:
Normal
Category:
Testing
Target version:
v20.0.0
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
56566
Crash signature (v1):
Crash signature (v2):

Description

2021-06-16T22:22:43.040+0000 7f6e8e779700 20 mon.a@0(leader).mgrstat health checks:
{   
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded",
            "count": 1
        },
        "detail": [
            {   
                "message": "pg 1.0 is active+undersized+degraded, acting [7]" 
            }
        ]
    }
}

From: /ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-20210616.191804-distro-basic-smithi/6175605/remote/smithi120/log/ceph-mon.a.log.gz

and a few other tests:

Failure: "2021-06-16T22:22:43.881363+0000 mon.a (mon.0) 143 : cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)" in cluster log
7 jobs: ['6175605', '6175619', '6175556', '6175591', '6175671', '6175600', '6175639']
suites intersection: ['conf/{client', 'mds', 'mon', 'osd}', 'overrides/{frag_enable', 'whitelist_health', 'whitelist_wrongly_marked_down}']
suites union: ['clusters/1a11s-mds-1c-client-3node', 'clusters/1a3s-mds-1c-client', 'conf/{client', 'distro/{centos_8}', 'distro/{rhel_8}', 'distro/{ubuntu_latest}', 'fs/snaps/{begin', 'fs/workload/{begin', 'k-testing}', 'mds', 'mon', 'mount/fuse', 'mount/kclient/{mount', 'ms-die-on-skipped}}', 'objectstore-ec/bluestore-comp', 'objectstore-ec/bluestore-comp-ec-root', 'omap_limit/10', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{distro/testing/{flavor/centos_latest', 'overrides/{distro/testing/{flavor/ubuntu_latest', 'overrides/{frag_enable', 'ranks/1', 'ranks/3', 'ranks/5', 'scrub/no', 'scrub/yes', 'session_timeout', 'standby-replay', 'tasks/workunit/snaps}', 'tasks/{0-check-counter', 'whitelist_health', 'whitelist_wrongly_marked_down}', 'workunit/fs/misc}', 'workunit/fs/test_o_trunc}', 'workunit/suites/ffsb}', 'workunit/suites/fsstress}', 'workunit/suites/iogen}', 'workunit/suites/iozone}', 'wsync/{no}}', 'wsync/{yes}}']

I'm thinking this check is not quite right: https://github.com/ceph/ceph/blob/05d7f883a04d230cf17b40a1c7e8044d402c6a30/src/pybind/mgr/mgr_module.py#L972-L981

(I lifted that code from the devicehealth module.)
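
For context, a minimal standalone sketch of the kind of gate being referenced (a hypothetical helper, not the actual mgr code; the linked mgr_module.py lines and the devicehealth module are the authoritative source), assuming it counts OSDs that are both up and in and compares against something like osd_pool_default_size:

from typing import Any, Dict

def have_enough_osds(osd_map: Dict[str, Any], pool_size: int) -> bool:
    # osd_map is the decoded osdmap (e.g. what MgrModule.get("osd_map")
    # returns), whose "osds" entries carry integer "up"/"in" flags.
    ready = sum(1 for osd in osd_map.get("osds", [])
                if osd.get("up") and osd.get("in"))
    return ready >= pool_size

# State from the log above: 12 OSDs total, 4 up, 12 in.
osds = [{"up": int(i in (1, 4, 7, 10)), "in": 1} for i in range(12)]
print(have_enough_osds({"osds": osds}, pool_size=3))  # True

A count like this can be satisfied while CRUSH is still unable to map the configured number of replicas for the new pool's PG, which is what the notes below dig into.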


Related issues (2): 0 open, 2 closed

Has duplicate: rgw - Bug #51727: cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)" in cluster log (Closed)

Has duplicate: CephFS - Bug #55825: cluster [WRN] Health check failed: Degraded data redundancy: 1 pg degraded (PG_DEGRADED)" in cluster (Duplicate)

Actions #1

Updated by Brad Hubbard almost 3 years ago

  • Subject changed from pybind/mgr/mgr_util: .mgr pool may be created to early causing spurious PG_DEGRADED warnings to pybind/mgr/mgr_util: .mgr pool may be created too early causing spurious PG_DEGRADED warnings
Actions #2

Updated by Brad Hubbard almost 3 years ago

  • Project changed from Ceph to mgr
Actions #3

Updated by Neha Ojha almost 3 years ago

2021-06-16T22:22:36.116+0000 7f6e8bf74700  0 log_channel(cluster) log [DBG] : osdmap e18: 12 total, 4 up, 12 in
...
2021-06-16T22:22:37.144+0000 7f6e90f7e700 10 mon.a@0(leader).osd e18 update_pending_pgs  pg 1.0 just added,  up [7] p 7 acting [7] p 7 history ec=19/19 lis/c=0/0 les/c/f=0/0/0 sis=19 past_intervals ([0,0] all_participants= intervals=)
...
2021-06-16T22:22:43.040+0000 7f6e8e779700 20 mon.a@0(leader).mgrstat health checks:
{
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded",
            "count": 1
        },
        "detail": [
            {
                "message": "pg 1.0 is active+undersized+degraded, acting [7]" 
            }
        ]
    }
}

The 4 osds that were up and in:

2021-06-16T22:22:36.112+0000 7f6e90f7e700  2 mon.a@0(leader).osd e17  osd.1 UP [v2:172.21.15.199:6808/20782,v1:172.21.15.199:6809/20782]
2021-06-16T22:22:36.112+0000 7f6e90f7e700  2 mon.a@0(leader).osd e17  osd.4 UP [v2:172.21.15.199:6800/20784,v1:172.21.15.199:6801/20784]
2021-06-16T22:22:36.112+0000 7f6e90f7e700  2 mon.a@0(leader).osd e17  osd.7 UP [v2:172.21.15.199:6816/20783,v1:172.21.15.199:6817/20783]
2021-06-16T22:22:36.112+0000 7f6e90f7e700  2 mon.a@0(leader).osd e17  osd.10 UP [v2:172.21.15.199:6824/20785,v1:172.21.15.199:6825/20785]
2021-06-16T22:22:36.112+0000 7f6e90f7e700  2 mon.a@0(leader).osd e17  osd.1 IN
2021-06-16T22:22:36.112+0000 7f6e90f7e700  2 mon.a@0(leader).osd e17  osd.4 IN
2021-06-16T22:22:36.112+0000 7f6e90f7e700  2 mon.a@0(leader).osd e17  osd.7 IN
2021-06-16T22:22:36.112+0000 7f6e90f7e700  2 mon.a@0(leader).osd e17  osd.10 IN
2021-06-16T22:10:25.464 INFO:teuthology.task.internal:roles: ubuntu@smithi199.front.sepia.ceph.com - ['mon.b', 'mgr.y', 'mds.b', 'mds.e', 'mds.h', 'mds.k', 'osd.1', 'osd.4', 'osd.7', 'osd.10']

Looks like the have_enough_osds check would have passed because we had [4 up, 12 in] OSDs when the pool was created, but for some reason we created the PG with only 1 OSD. It could be something to do with how CRUSH is picking placement targets, since all four of the up OSDs are on the same host.
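
One quick way to confirm that observation on a live cluster is to group the up OSDs by their CRUSH host bucket. A rough standalone sketch (a hypothetical helper that shells out to ceph osd tree -f json; field names assume the layout emitted by recent releases, with host buckets carrying children ids and OSD entries carrying status/reweight):

import json
import subprocess
from collections import defaultdict
from typing import Dict, List

def up_in_osds_by_host() -> Dict[str, List[str]]:
    # Group OSDs that are up and weighted in by their CRUSH host bucket.
    tree = json.loads(subprocess.check_output(
        ["ceph", "osd", "tree", "-f", "json"]))
    nodes = {n["id"]: n for n in tree["nodes"]}
    by_host: Dict[str, List[str]] = defaultdict(list)
    for node in tree["nodes"]:
        if node.get("type") != "host":
            continue
        for child in node.get("children", []):
            osd = nodes.get(child, {})
            if (osd.get("type") == "osd" and osd.get("status") == "up"
                    and osd.get("reweight", 0) > 0):
                by_host[node["name"]].append(osd["name"])
    return dict(by_host)

# In the run above, all four up OSDs (osd.1, osd.4, osd.7, osd.10) would land
# under the single host smithi199.
print(up_in_osds_by_host())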

Actions #4

Updated by Neha Ojha almost 3 years ago

/a/sage-2021-06-29_21:27:07-rados-wip-sage3-testing-2021-06-28-1912-distro-basic-smithi/6244042
/a/rfriedma-2021-07-07_11:26:37-rados-wip-ronenf-list-object-distro-basic-smithi/6255195

These were seen in rados/objectstore on a single host, but with similar symptoms. I will try to see if I can reproduce this reliably.

2021-06-29T23:51:12.764+0000 7f5d530cf700  1 mon.a@0(leader).osd e8 e8: 3 total, 2 up, 3 in
2021-06-29T23:51:14.552+0000 7f5d558d4700  1 -- [v2:172.21.15.60:3300/0,v1:172.21.15.60:6789/0] <== mgr.4100 172.21.15.60:0/13831 63 ==== mon_command({"prefix": "osd pool create", "format": "json", "pool": ".mgr", "pg_num": 1, "pg_num_min": 1} v 0) v1 ==== 135+0+0 (secure 0 0 0) 0x5584574a7b00 con 0x5584574cc400
2021-06-29T23:51:15.536+0000 7f5d580d9700 10 mon.a@0(leader).osd e9 update_pending_pgs  pg 1.0 just added,  up [1] p 1 acting [1] p 1 history ec=10/10 lis/c=0/0 les/c/f=0/0/0 sis=10 past_intervals ([0,0] all_participants= intervals=)
Actions #5

Updated by Neha Ojha almost 3 years ago

  • Has duplicate Bug #51727: cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)" in cluster log added
Actions #6

Updated by Neha Ojha almost 3 years ago

Neha Ojha wrote:

/a/sage-2021-06-29_21:27:07-rados-wip-sage3-testing-2021-06-28-1912-distro-basic-smithi/6244042
/a/rfriedma-2021-07-07_11:26:37-rados-wip-ronenf-list-object-distro-basic-smithi/6255195

These were seen in rados/objectstore on a single host, but with similar symptoms. I will try to see if I can reproduce this reliably.

[...]

Fails 2/40 times: https://pulpito.ceph.com/nojha-2021-07-19_17:59:45-rados:objectstore-master-distro-basic-smithi/

Actions #8

Updated by Casey Bodley over 2 years ago

should the rgw suite be whitelisting these too?

Actions #9

Updated by Patrick Donnelly over 2 years ago

Casey Bodley wrote:

should the rgw suite be whitelisting these too?

I think ignorelisting these errors/warnings is a mistake, as they can catch genuine problems in QA testing. Once this issue is fixed, it's likely we'll forget to remove the ignorelist entry.

As annoying as it is, we should let this keep failing QA to give us the proper motivation to fix it.

Actions #10

Updated by Neha Ojha over 2 years ago

  • Assignee set to Neha Ojha
  • Priority changed from High to Urgent
Actions #11

Updated by Neha Ojha over 2 years ago

I reproduced this issue with additional logging using the same reproducer https://tracker.ceph.com/issues/51282#note-6. This time 1/40 jobs failed - https://pulpito.ceph.com/nojha-2021-08-20_22:23:09-rados:objectstore-wip-51282-distro-basic-smithi/

PASSED

2021-08-20T22:40:48.504+0000 7f2006311700  1 mon.a@0(leader).osd e9 update_pending_pgs osd.0 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) [v2:172.21.15.23:6804/28252,v1:172.21.15.23:6806/28252] [v2:172.21.15.23:6808/28252,v1:172.21.15.23:6810/28252] exists,up 83dfbd47-d084-43d9-bf49-fdd016340430
osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) [v2:172.21.15.23:6800/28253,v1:172.21.15.23:6801/28253] [v2:172.21.15.23:6802/28253,v1:172.21.15.23:6803/28253] exists,up ebba65bc-0d7b-4a81-98a6-3ac7376ac599
osd.2 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) [v2:172.21.15.23:6816/28254,v1:172.21.15.23:6817/28254] [v2:172.21.15.23:6818/28254,v1:172.21.15.23:6819/28254] exists,up 8bbb1841-c1c3-48a5-96eb-b697db0b6d29

2021-08-20T22:40:48.504+0000 7f2006311700 10 mon.a@0(leader).osd e9 update_pending_pgs  pg 1.0 just added,  up [1,0] p 1 acting [1,0] p 1 history ec=10/10 lis/c=0/0 les/c/f=0/0/0 sis=10 past_intervals ([0,0] all_participants= intervals=)

vs FAILED

2021-08-20T22:46:59.355+0000 7f6209b49700  1 mon.a@0(leader).osd e9 update_pending_pgs osd.0 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) [v2:172.21.15.149:6800/28223,v1:172.21.15.149:6801/28223] [v2:172.21.15.149:6802/28223,v1:172.21.15.149:6803/28223] exists,up 9b0f044d-f1bf-474d-b87e-0ebe954c3e7c
osd.1 down in  weight 1 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0)   exists,new a91c812b-d9b3-4b78-a493-d1309fe6ca62
osd.2 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) [v2:172.21.15.149:6816/28226,v1:172.21.15.149:6817/28226] [v2:172.21.15.149:6818/28226,v1:172.21.15.149:6819/28226] exists,up ecb81170-2d1e-4c5e-acd3-91d38d254299

2021-08-20T22:46:59.355+0000 7f6209b49700 10 mon.a@0(leader).osd e9 update_pending_pgs  pg 1.0 just added,  up [0] p 0 acting [0] p 0 history ec=10/10 lis/c=0/0 les/c/f=0/0/0 sis=10 past_intervals ([0,0] all_participants= intervals=)

Looks like when pg 1.0 was added, even though there were 2 candidate OSDs (osd.0 and osd.2) present in the osdmap, pg_to_up_acting_osds chose only 1 OSD, and then we start seeing PG_DEGRADED warnings once a couple of objects are written to the pg. Need to check whether anything has changed in this logic in master.

{
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded",
            "count": 1
        },
        "detail": [
            {
                "message": "pg 1.0 is active+undersized+degraded, acting [0]" 
            }
        ]
    }
}
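
For reference, a rough way to spot this state from the CLI (a standalone sketch, not part of any fix) is to look at the length of the pg's acting set, assuming the JSON output of ceph pg map carries an "acting" list:

import json
import subprocess

def acting_set_size(pgid: str = "1.0") -> int:
    out = subprocess.check_output(["ceph", "pg", "map", pgid, "-f", "json"])
    return len(json.loads(out)["acting"])

# In the failed run above this would report 1 even though two OSDs (osd.0 and
# osd.2) were up and in, which is why the pg ends up undersized and degraded
# as soon as objects are written.
print(acting_set_size("1.0"))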
Actions #12

Updated by Neha Ojha over 2 years ago

Couple of updates:

1. I am not able to reproduce the issue in pacific with the same reproducer, https://pulpito.ceph.com/nojha-2021-08-25_18:46:59-rados:objectstore-wip-yuri-testing-2021-08-24-0743-pacific-distro-basic-smithi/.

2. I looked at recent changes in the relevant code path and found https://github.com/ceph/ceph/pull/39631, so I decided to revert it and run it through my reproducer, and it passed! https://pulpito.ceph.com/nojha-2021-08-27_03:56:45-rados:objectstore-wip-51282-revert-distro-basic-smithi/
The interesting thing is that the patch was also backported to pacific, but did not cause this issue there.

In any case, Patrick/Casey/Deepika, can you please run wip-51282-revert through your corresponding suites and let me know if it fixes the issue for you as well?

Actions #13

Updated by Patrick Donnelly over 2 years ago

Neha Ojha wrote:

Couple of updates:

1. I am not able to reproduce the issue in pacific with the same reproducer, https://pulpito.ceph.com/nojha-2021-08-25_18:46:59-rados:objectstore-wip-yuri-testing-2021-08-24-0743-pacific-distro-basic-smithi/.

2. I looked at recent changes in the relevant code path and found https://github.com/ceph/ceph/pull/39631, so I decided to revert it and run it through my reproducer, and it passed! https://pulpito.ceph.com/nojha-2021-08-27_03:56:45-rados:objectstore-wip-51282-revert-distro-basic-smithi/
The interesting thing is that the patch was also backported to pacific, but did not cause this issue there.

Well, in Pacific the .mgr pool does not exist. The device_health_metrics pool should generate the same warning though. There may be a subtle difference?

In any case, Patrick/Casey/Deepika, can you please run wip-51282-revert through your corresponding suites and let me know if it fixes the issue for you as well?

https://pulpito.ceph.com/pdonnell-2021-08-27_17:21:13-fs-wip-51282-revert-distro-basic-smithi/

Actions #14

Updated by Neha Ojha over 2 years ago

Patrick Donnelly wrote:

Neha Ojha wrote:

Couple of updates:

1. I am not able to reproduce the issue in pacific with the same reproducer, https://pulpito.ceph.com/nojha-2021-08-25_18:46:59-rados:objectstore-wip-yuri-testing-2021-08-24-0743-pacific-distro-basic-smithi/.

2. I looked at recent changes in the relevant code path and found https://github.com/ceph/ceph/pull/39631, so I decided to revert it and run it through my reproducer, and it passed! https://pulpito.ceph.com/nojha-2021-08-27_03:56:45-rados:objectstore-wip-51282-revert-distro-basic-smithi/
The interesting thing is that the patch was also backported to pacific, but did not cause this issue there.

Well, in Pacific the .mgr pool does not exist. The device_health_metrics pool should generate the same warning though. There may be a subtle difference?

In any case, Patrick/Casey/Deepika, can you please run wip-51282-revert through your corresponding suites and let me know if it fixes the issue for you as well?

https://pulpito.ceph.com/pdonnell-2021-08-27_17:21:13-fs-wip-51282-revert-distro-basic-smithi/

Comparing with the earlier run https://pulpito.ceph.com/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-20210616.191804-distro-basic-smithi/, there are fewer instances of PG_DEGRADED (2, to be accurate), and they occur not during pool creation but during recovery, where these warnings are expected.

2021-08-27T19:36:02.842+0000 7f179e14c700 10 mon.a@0(leader).osd e19 update_pending_pgs  pg 1.0 just added,  up [7,10] p 7 acting [7,10] p 7 history ec=20/20 lis/c=0/0 les/c/f=0/0/0 sis=20 past_intervals ([0,0] all_participants= intervals=)
Actions #15

Updated by Deepika Upadhyay over 2 years ago

@Neha, hit this one in the RBD suite:

description: rbd/mirror/{base/install clients/{mirror-extra mirror} cluster/{2-node
  openstack} msgr-failures/few objectstore/filestore-xfs supported-random-distro$/{rhel_8}
  workloads/rbd-mirror-workunit-min-compat-client-octopus}
duration: 2079.4988112449646
failure_reason: '"2021-11-02T14:27:49.280378+0000 mon.a (mon.0) 114 : cluster [WRN]
  Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1
  pg degraded (PG_DEGRADED)" in cluster log'

/ceph/teuthology-archive/ideepika-2021-11-02_12:33:30-rbd-wip-ssd-cache-testing-distro-basic-smithi/6477554/teuthology.log

Actions #16

Updated by Casey Bodley over 2 years ago

Patrick Donnelly wrote:

Casey Bodley wrote:

should the rgw suite be whitelisting these too?

I think ignorelisting these errors/warnings is a mistake, as they can catch genuine problems in QA testing. Once this issue is fixed, it's likely we'll forget to remove the ignorelist entry.

As annoying as it is, we should let this keep failing QA to give us the proper motivation to fix it.

Waiting hasn't helped yet. I'm adding this to rgw's ignorelist so it doesn't show up in quincy backport testing.

Actions #17

Updated by Sridhar Seshasayee almost 2 years ago

/a/yuriw-2022-06-29_18:22:37-rados-wip-yuri2-testing-2022-06-29-0820-distro-default-smithi/6906054

Actions #18

Updated by Radoslaw Zarzynski almost 2 years ago

  • Priority changed from Urgent to Normal

Lowering the priority because: 1) the issue is basically just an irritation, and 2) the reproduction rate is pretty low.

Actions #19

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958108

Actions #21

Updated by Laura Flores over 1 year ago

  • Has duplicate Bug #55825: cluster [WRN] Health check failed: Degraded data redundancy: 1 pg degraded (PG_DEGRADED)" in cluster added
Actions #22

Updated by Laura Flores over 1 year ago

/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986324

Description: rados/monthrash/{ceph clusters/9-mons mon_election/connectivity msgr-failures/few msgr/async objectstore/bluestore-comp-zlib rados supported-random-distro$/{ubuntu_latest} thrashers/sync workloads/rados_5925}

2022-08-22T22:20:05.636 DEBUG:teuthology.orchestra.run.smithi050:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v ScrubResult | egrep -v 'scrub mismatch' | egrep -v 'overall HEALTH_' | egrep -v '\(MGR_DOWN\)' | egrep -v '\(PG_AVAILABILITY\)' | egrep -v '\(SLOW_OPS\)' | egrep -v 'slow request' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'overall HEALTH_' | egrep -v '\(MON_DOWN\)' | egrep -v 'overall HEALTH_' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | head -n 1
2022-08-22T22:20:05.659 INFO:teuthology.orchestra.run.smithi050.stdout:2022-08-22T22:17:00.832057+0000 mon.d (mon.6) 246 : cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)
2022-08-22T22:20:05.660 WARNING:tasks.ceph:Found errors (ERR|WRN|SEC) in cluster log
2022-08-22T22:20:05.660 DEBUG:teuthology.orchestra.run.smithi050:> sudo egrep '\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v ScrubResult | egrep -v 'scrub mismatch' | egrep -v 'overall HEALTH_' | egrep -v '\(MGR_DOWN\)' | egrep -v '\(PG_AVAILABILITY\)' | egrep -v '\(SLOW_OPS\)' | egrep -v 'slow request' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'overall HEALTH_' | egrep -v '\(MON_DOWN\)' | egrep -v 'overall HEALTH_' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | head -n 1
2022-08-22T22:20:05.722 DEBUG:teuthology.orchestra.run.smithi050:> sudo egrep '\[ERR\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v ScrubResult | egrep -v 'scrub mismatch' | egrep -v 'overall HEALTH_' | egrep -v '\(MGR_DOWN\)' | egrep -v '\(PG_AVAILABILITY\)' | egrep -v '\(SLOW_OPS\)' | egrep -v 'slow request' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'overall HEALTH_' | egrep -v '\(MON_DOWN\)' | egrep -v 'overall HEALTH_' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | head -n 1
2022-08-22T22:20:05.740 DEBUG:teuthology.orchestra.run.smithi050:> sudo egrep '\[WRN\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v ScrubResult | egrep -v 'scrub mismatch' | egrep -v 'overall HEALTH_' | egrep -v '\(MGR_DOWN\)' | egrep -v '\(PG_AVAILABILITY\)' | egrep -v '\(SLOW_OPS\)' | egrep -v 'slow request' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'overall HEALTH_' | egrep -v '\(MON_DOWN\)' | egrep -v 'overall HEALTH_' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | head -n 1
2022-08-22T22:20:05.815 INFO:teuthology.orchestra.run.smithi050.stdout:2022-08-22T22:17:00.832057+0000 mon.d (mon.6) 246 : cluster [WRN] Health check failed: Degraded data redundancy: 2/4 objects degraded (50.000%), 1 pg degraded (PG_DEGRADED)

Passed in a rerun.

Actions #23

Updated by Laura Flores over 1 year ago

  • Tags set to test-failure
Actions #24

Updated by Radoslaw Zarzynski over 1 year ago

  • Assignee changed from Neha Ojha to Nitzan Mordechai

We would need to judge whether it's expected. If so, we can extend the ignorelist.

Actions #25

Updated by Laura Flores 12 months ago

/a/yuriw-2023-04-25_03:31:51-smoke-pacific-release-distro-default-smithi/7250679

Actions #26

Updated by Laura Flores 12 months ago

/a/yuriw-2023-05-06_14:41:44-rados-pacific-release-distro-default-smithi/7264066

Actions #31

Updated by Nitzan Mordechai 11 months ago

Milind Changire wrote:

quincy:
http://pulpito.front.sepia.ceph.com/yuriw-2023-05-31_21:56:15-fs-wip-yuri6-testing-2023-05-31-0933-quincy-distro-default-smithi/7292811

That one doesn't look related to the .mgr pool being created too early. The pg 102.1e peering process remapped the pg, dropping osd.3 and adding osd.4, which caused recovery and the PG_DEGRADED warning.

Actions #35

Updated by Patrick Donnelly 25 days ago

  • Project changed from mgr to CephFS
  • Status changed from New to In Progress
  • Assignee changed from Nitzan Mordechai to Patrick Donnelly
  • Target version changed from v17.0.0 to v20.0.0
Actions #36

Updated by Patrick Donnelly 25 days ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56566

The backport will be tracked by #65018.

Actions #37

Updated by Patrick Donnelly 20 days ago

  • Category set to Testing
  • Status changed from Fix Under Review to Resolved