Bug #44400
Marking OSD out causes primary-affinity 0 to be ignored when up_set has no common OSDs with acting_set
Description
Process:
Set primary-affinity 0 on osd.0.
Watch 'ceph osd ls-by-primary osd.0' until it lists 0 PGs.
Mark osd.0 out.
Watch 'ceph osd ls-by-primary osd.0' again.
Result: a subset of PGs go back to osd.0 as primary.
What these PGs have in common is that their acting_set shares no OSDs with their up_set.
Expected behavior: another OSD in the acting_set with non-zero primary affinity would become the primary.
I feel this matters especially when dealing with a failing disk:
I noticed that after marking a failing OSD with primary-affinity = 0, and then OUT, I began getting slow ops with the implicated OSD being the one marked p-a 0 and out.
I mark my failing disks p-a 0 and out (but leave the daemon running) under the assumption that the replicas remain available in the cluster but stop serving client read requests (oftentimes "rados list-inconsistent-obj <pg>" shows an osd with a "read_error", so it seems prudent to prevent it from serving client reads). To me this makes more sense than stopping the OSD / marking it down and then out, which causes the PG to go degraded and increases the risk of dropping below min_size.
History
#1 Updated by Neha Ojha over 3 years ago
- Priority changed from Normal to High
This is worth investigating; currently nothing in the choose_acting() function looks at primary-affinity.
#2 Updated by Dan van der Ster 9 months ago
- Affected Versions v16.2.9 added
Just confirming this is still present in pacific:
# ceph pg ls-by-primary osd.1
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
39.1e 4549 0 0 0 18937718186 129 13 10085 active+clean 19h 448324'1471517 448324:4330160 [1,96,51]p1 [1,96,51]p1 2023-01-06T04:49:42.552907+0100 2023-01-04T21:58:42.062658+0100
72.0 0 0 0 0 0 0 0 0 active+clean 33h 67418'10 448323:130610 [1,39,97]p1 [1,39,97]p1 2023-01-05T14:16:32.171229+0100 2023-01-04T10:10:30.524757+0100
173.2c 119601 0 0 0 58484954500 0 0 10002 active+clean 3h 448324'66145367 448324:92454037 [1,34,61]p1 [1,34,61]p1 2023-01-06T20:56:45.623864+0100 2023-01-01T17:53:36.852451+0100
173.5b 119431 0 0 0 59203946240 0 0 10045 active+clean 21h 448324'67055460 448324:96440378 [1,167,96]p1 [1,167,96]p1 2023-01-06T02:09:09.299422+0100 2023-01-04T15:58:44.460735+0100
173.76 119067 0 0 0 58739275231 0 0 10043 active+clean 28h 448324'69228370 448324:96801688 [1,118,159]p1 [1,118,159]p1 2023-01-05T19:49:51.152731+0100 2022-12-31T16:02:16.973211+0100
173.fb 118414 0 0 0 57309395557 0 0 10100 active+clean 26h 448324'67576591 448324:92723006 [1,145,101]p1 [1,145,101]p1 2023-01-05T21:49:30.423406+0100 2023-01-04T19:24:08.830048+0100
174.4f 15565 0 0 0 4194304 33167364 65165 10078 active+clean 33h 448324'22725364 448324:25986782 [1,49,98]p1 [1,49,98]p1 2023-01-05T14:31:35.436061+0100 2022-12-31T13:46:26.645206+0100
179.c 3 0 0 0 1377 0 0 0 active+clean 2h 250428'311976 448323:684218 [1,62,119]p1 [1,62,119]p1 2023-01-06T21:51:09.431192+0100 2023-01-06T21:51:09.431192+0100

* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.

# ceph osd primary-affinity osd.1 0
set osd.1 primary-affinity to 0 (802)

# ceph pg ls-by-primary osd.1

# ceph osd out osd.1
marked out osd.1.

# ceph pg ls-by-primary osd.1
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
173.76 119067 0 118897 0 58739275231 0 0 10051 active+remapped+backfilling 14s 448333'69228378 448333:96802054 [25,118,159]p25 [1,118,159]p1 2023-01-05T19:49:51.152731+0100 2022-12-31T16:02:16.973211+0100
174.4f 15565 0 15333 0 4194304 33167364 65165 10078 active+remapped+backfilling 14s 448324'22725364 448333:25987264 [15,49,98]p15 [1,49,98]p1 2023-01-05T14:31:35.436061+0100 2022-12-31T13:46:26.645206+0100
#
#3 Updated by Nitzan Mordechai 9 months ago
- Assignee set to Nitzan Mordechai
#4 Updated by Nitzan Mordechai 9 months ago
- Affected Versions v18.0.0 added
- Affected Versions deleted (v14.2.6, v16.2.9)
#5 Updated by Nitzan Mordechai 9 months ago
- Affected Versions v16.2.9 added
#6 Updated by Nitzan Mordechai 9 months ago
- Status changed from New to In Progress
Our OSDMap::_apply_primary_affinity function will set an OSD as primary even if its primary affinity is 0: while scanning for a primary we record a fallback primary OSD, but never check whether that fallback itself has primary affinity 0.
So, if the original primary was set to primary affinity 0 and none of the other OSDs in the acting set was picked (even though they do have primary affinity > 0), we stay with the fallback OSD.
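For illustration, here is a minimal C++ sketch of that selection loop (paraphrased from OSDMap::_apply_primary_affinity, not the verbatim Ceph source; hash_reject and pick_primary are hypothetical names, hash_reject stands in for the crush_hash32_2-based test, and CRUSH_ITEM_NONE handling is omitted):

#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for (crush_hash32_2(CRUSH_HASH_RJENKINS1, seed, osd) >> 16) >= a
// in the real code: an OSD is rejected as primary when a pseudo-random
// 16-bit value derived from (seed, osd) is >= its affinity.  An affinity of
// 0 is therefore always rejected.
static bool hash_reject(uint32_t seed, int osd, uint32_t affinity) {
  uint32_t h = seed * 2654435761u ^ static_cast<uint32_t>(osd) * 2246822519u;
  return ((h >> 16) & 0xffffu) >= affinity;
}

// Simplified sketch of the selection loop.  'affinity' is indexed by OSD id
// and holds the per-OSD primary affinity scaled to 0..0x10000
// (CEPH_OSD_MAX_PRIMARY_AFFINITY).  Returns the chosen primary, or -1.
int pick_primary(const std::vector<int>& osds,
                 const std::vector<uint32_t>& affinity,
                 uint32_t seed) {
  const uint32_t MAX_AFFINITY = 0x10000;
  int pos = -1;
  for (std::size_t i = 0; i < osds.size(); ++i) {
    int o = osds[i];
    uint32_t a = affinity[o];
    if (a < MAX_AFFINITY && hash_reject(seed, o, a)) {
      // Rejected as primary, but remembered as a fallback in case no later
      // candidate is picked.  This is where affinity 0 leaks through: an
      // OSD with a == 0 is always rejected here, yet can still end up as
      // the fallback.
      if (pos < 0)
        pos = static_cast<int>(i);
    } else {
      pos = static_cast<int>(i);  // picked as primary
      break;
    }
  }
  return pos < 0 ? -1 : osds[pos];
}

With, say, osds = {1, 96, 51} and affinity[1] == 0, if both other candidates happen to be hash-rejected, pick_primary returns osd 1 despite its zero affinity, which is exactly the behavior described above.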
#7 Updated by Nitzan Mordechai 9 months ago
- Pull request ID set to 49777
#8 Updated by Nitzan Mordechai 9 months ago
- Status changed from In Progress to Fix Under Review
#9 Updated by Radoslaw Zarzynski 7 months ago
- Status changed from Fix Under Review to Won't Fix
The discussion's outcome is that the fix would likely do more harm (and would certainly bring more complexity) than the symptoms are really worth.