Bug #44400

Marking OSD out causes primary-affinity 0 to be ignored when up_set has no common OSDs with acting_set

Added by Wes Dillingham about 4 years ago. Updated about 1 year ago.

Status:
Won't Fix
Priority:
High
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Process:

Set primary-affinity 0 on osd.0.
Watch 'ceph osd ls-by-primary osd.0' until it lists 0 PGs.
Mark osd.0 out.
Watch 'ceph osd ls-by-primary osd.0'.
Result: a subset of PGs go back to osd.0 as primary.
The common trait of these PGs is that their acting_set shares no OSDs with the up_set.

Expected behavior: another OSD in the acting_set with non-zero primary affinity would become the primary.

I feel this matters especially when dealing with a failing disk:

I noticed that after marking a failing OSD with primary-affinity = 0 and then marking it OUT, I began getting slow ops, with the implicated OSD being the one marked primary-affinity 0 and out.

I mark my failing disks primary-affinity 0 and out (but leave the daemon running) on the assumption that the replicas remain available in the cluster while the failing OSD stops serving client read requests. Often "rados list-inconsistent-obj <pg>" shows an OSD with "read_error", so it seems prudent to prevent that OSD from serving client reads. To me this makes more sense than stopping the OSD and marking it down and then out, which causes the PG to go degraded and increases the risk of dropping below min_size.

History

#1 Updated by Neha Ojha about 4 years ago

  • Priority changed from Normal to High

This is worth investigating; currently nothing in the choose_acting() function looks at primary-affinity.

#2 Updated by Dan van der Ster about 1 year ago

  • Affected Versions v16.2.9 added

Just confirming this is still present in pacific:

# ceph pg ls-by-primary osd.1
PG      OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG    STATE         SINCE  VERSION          REPORTED         UP             ACTING         SCRUB_STAMP                      DEEP_SCRUB_STAMP
39.1e      4549         0          0        0  18937718186          129          13  10085  active+clean    19h   448324'1471517   448324:4330160    [1,96,51]p1    [1,96,51]p1  2023-01-06T04:49:42.552907+0100  2023-01-04T21:58:42.062658+0100
72.0          0         0          0        0            0            0           0      0  active+clean    33h         67418'10    448323:130610    [1,39,97]p1    [1,39,97]p1  2023-01-05T14:16:32.171229+0100  2023-01-04T10:10:30.524757+0100
173.2c   119601         0          0        0  58484954500            0           0  10002  active+clean     3h  448324'66145367  448324:92454037    [1,34,61]p1    [1,34,61]p1  2023-01-06T20:56:45.623864+0100  2023-01-01T17:53:36.852451+0100
173.5b   119431         0          0        0  59203946240            0           0  10045  active+clean    21h  448324'67055460  448324:96440378   [1,167,96]p1   [1,167,96]p1  2023-01-06T02:09:09.299422+0100  2023-01-04T15:58:44.460735+0100
173.76   119067         0          0        0  58739275231            0           0  10043  active+clean    28h  448324'69228370  448324:96801688  [1,118,159]p1  [1,118,159]p1  2023-01-05T19:49:51.152731+0100  2022-12-31T16:02:16.973211+0100
173.fb   118414         0          0        0  57309395557            0           0  10100  active+clean    26h  448324'67576591  448324:92723006  [1,145,101]p1  [1,145,101]p1  2023-01-05T21:49:30.423406+0100  2023-01-04T19:24:08.830048+0100
174.4f    15565         0          0        0      4194304     33167364       65165  10078  active+clean    33h  448324'22725364  448324:25986782    [1,49,98]p1    [1,49,98]p1  2023-01-05T14:31:35.436061+0100  2022-12-31T13:46:26.645206+0100
179.c         3         0          0        0         1377            0           0      0  active+clean     2h    250428'311976    448323:684218   [1,62,119]p1   [1,62,119]p1  2023-01-06T21:51:09.431192+0100  2023-01-06T21:51:09.431192+0100

* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
# ceph osd primary-affinity osd.1 0
set osd.1 primary-affinity to 0 (802)
# ceph pg ls-by-primary osd.1
# ceph osd out osd.1
marked out osd.1.
# ceph pg ls-by-primary osd.1
PG      OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG    STATE                        SINCE  VERSION          REPORTED         UP               ACTING         SCRUB_STAMP                      DEEP_SCRUB_STAMP
173.76   119067         0     118897        0  58739275231            0           0  10051  active+remapped+backfilling    14s  448333'69228378  448333:96802054  [25,118,159]p25  [1,118,159]p1  2023-01-05T19:49:51.152731+0100  2022-12-31T16:02:16.973211+0100
174.4f    15565         0      15333        0      4194304     33167364       65165  10078  active+remapped+backfilling    14s  448324'22725364  448333:25987264    [15,49,98]p15    [1,49,98]p1  2023-01-05T14:31:35.436061+0100  2022-12-31T13:46:26.645206+0100
#

#3 Updated by Nitzan Mordechai about 1 year ago

  • Assignee set to Nitzan Mordechai

#4 Updated by Nitzan Mordechai about 1 year ago

  • Affected Versions v18.0.0 added
  • Affected Versions deleted (v14.2.6, v16.2.9)

#5 Updated by Nitzan Mordechai about 1 year ago

  • Affected Versions v16.2.9 added

#6 Updated by Nitzan Mordechai about 1 year ago

  • Status changed from New to In Progress

Our function OSDMap::_apply_primary_affinity will set an OSD as primary even if its primary affinity is 0: we record a fallback primary OSD, but never check whether that fallback is set to primary affinity 0.
So, if the original primary was set to primary affinity 0 and none of the other OSDs in the acting set were picked (even though they have primary affinity > 0), we stay with the fallback OSD.
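For illustration only, here is a minimal, self-contained C++ sketch of the fallback behaviour described above. It is not the actual OSDMap code: the names (Candidate, pick_primary, MAX_AFFINITY), the hash, and the peers' affinity values are stand-ins. Each candidate is rejected as primary with probability proportional to 1 minus its affinity, but the first rejected candidate is remembered as a fallback, and the fallback wins whenever nobody is accepted, even if its affinity is exactly 0.

// Minimal sketch (not the actual Ceph implementation) of the fallback
// primary selection described above.  Affinity is a fixed-point value in
// [0, MAX_AFFINITY], where MAX_AFFINITY means "always eligible".
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

constexpr uint32_t MAX_AFFINITY = 0x10000;

struct Candidate {
  int osd;
  uint32_t affinity;  // 0 .. MAX_AFFINITY
};

// Pick a primary for one PG; 'seed' stands in for the per-PG hash input.
int pick_primary(const std::vector<Candidate>& acting, uint32_t seed) {
  int fallback = -1;
  for (const auto& c : acting) {
    // Pseudo-random value in [0, MAX_AFFINITY) derived from the PG seed
    // and the OSD id, so that a proportional fraction of a given OSD's
    // PGs reject it as primary.
    uint32_t r = std::hash<uint32_t>{}(seed ^ (uint32_t)(c.osd * 2654435761u)) % MAX_AFFINITY;
    if (c.affinity < MAX_AFFINITY && r >= c.affinity) {
      // Rejected as primary, but remembered as a fallback, even when
      // c.affinity == 0.  This is the behaviour the ticket describes.
      if (fallback < 0)
        fallback = c.osd;
    } else {
      return c.osd;  // accepted as primary
    }
  }
  return fallback;  // may be an affinity-0 OSD
}

int main() {
  // osd.1 has primary affinity 0; the peers have non-zero but low
  // affinity (illustrative values only), so for some PGs every candidate
  // is rejected and osd.1 comes back as the fallback primary.
  std::vector<Candidate> acting = {{1, 0},
                                   {96, MAX_AFFINITY / 8},
                                   {51, MAX_AFFINITY / 8}};
  for (uint32_t seed = 0; seed < 8; ++seed)
    std::cout << "pg " << seed << " -> primary osd."
              << pick_primary(acting, seed) << "\n";
  return 0;
}

In this toy model the problem only appears when every candidate can be rejected, which matches the "all the other OSDs in the acting set were not picked" condition above; a check that skips an affinity-0 fallback when another eligible candidate exists is the kind of change the pull request below proposes.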

#7 Updated by Nitzan Mordechai about 1 year ago

  • Pull request ID set to 49777

#8 Updated by Nitzan Mordechai about 1 year ago

  • Status changed from In Progress to Fix Under Review

#9 Updated by Radoslaw Zarzynski about 1 year ago

  • Status changed from Fix Under Review to Won't Fix

The outcome of the discussion is that the fix would likely do more harm (and certainly bring more complexity) than the symptom is really worth.
