Bug #44446 (closed): osd status reports old crush location after osd moves

Added by Kefu Chai about 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous, mimic, nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 33752
Crash signature (v1):
Crash signature (v2):

Description

Scenario:

Move an OSD disk from host=worker1 to a new node (host=worker0) and, on that new node, update the crush location accordingly (host=worker0). After the OSD starts up, `ceph osd status` reports the old location, while `ceph osd tree` reports the new one.

[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id |   host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | worker1 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
| 1  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.01758 root default                             
-5       0.01758     host worker0                         
 0   hdd 0.00879         osd.0        up  1.00000 1.00000 
 1   hdd 0.00879         osd.1        up  1.00000 1.00000 
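
The crush location update itself is not captured above. With the default `osd crush update on start` behaviour the OSD re-registers its location when it boots on the new node; doing the equivalent step by hand would look roughly like the sketch below (the weight 0.00879 is taken from the tree output above; the exact mechanism Rook uses is not shown in this report):

# sketch: move osd.0 to its new host in the crush map by hand
ceph osd crush create-or-move osd.0 0.00879 root=default host=worker0

# verify what the cluster map records for the OSD
ceph osd find 0    # prints the OSD's crush_location
ceph osd tree      # should list osd.0 under host worker0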

After restarting the manager, `ceph osd status` also starts reporting the correct location:

[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id |   host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
| 1  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
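
For completeness, "restarting the manager" in a Rook cluster can be done in a couple of roughly equivalent ways; this is a sketch, and the mgr name "a" and the pod label are assumptions about a default Rook deployment rather than details taken from this report:

# force a failover/respawn of the active mgr (substitute the active mgr name; "a" in a default Rook cluster)
ceph mgr fail a

# or let Kubernetes restart the mgr pod
kubectl -n rook-ceph delete pod -l app=rook-ceph-mgr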

related: http://tracker.ceph.com/issues/40011


Related issues 4 (0 open, 4 closed)

Copied from mgr - Bug #40871: osd status reports old crush location after osd moves (Resolved, Kefu Chai)
Copied to mgr - Backport #44522: luminous: osd status reports old crush location after osd moves (Rejected)
Copied to mgr - Backport #44523: mimic: osd status reports old crush location after osd moves (Rejected)
Copied to mgr - Backport #44524: nautilus: osd status reports old crush location after osd moves (Resolved, Kefu Chai)
#1

Updated by Kefu Chai about 4 years ago

  • Copied from Bug #40871: osd status reports old crush location after osd moves added
#2

Updated by Kefu Chai about 4 years ago

This issue is copied from #40871. The PR at https://github.com/ceph/ceph/pull/30448/files does not address this issue at all; the stale OSD hostname has a different root cause.

#3

Updated by Kefu Chai about 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 33752
#4

Updated by Sage Weil about 4 years ago

  • Status changed from Fix Under Review to Pending Backport
#5

Updated by Nathan Cutler about 4 years ago

  • Copied to Backport #44522: luminous: osd status reports old crush location after osd moves added
#6

Updated by Nathan Cutler about 4 years ago

  • Copied to Backport #44523: mimic: osd status reports old crush location after osd moves added
#7

Updated by Nathan Cutler about 4 years ago

  • Copied to Backport #44524: nautilus: osd status reports old crush location after osd moves added
#8

Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
