Bug #44446

osd status reports old crush location after osd moves

Added by Kefu Chai 4 months ago. Updated 4 months ago.

Status: Pending Backport
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous, mimic, nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Scenario:

Move an OSD disk from host=worker1 to a new node and update the CRUSH location on that node (e.g. host=worker0). After the OSD starts up, `ceph osd status` reports the old location, while `ceph osd tree` reports the new one.
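
For reference, the move can be reproduced from the Rook toolbox roughly as follows. This is a hedged sketch: the weight (0.00879) and host names are taken from the outputs below, and `ceph osd crush create-or-move` is only one way to update the location (an OSD with `osd_crush_update_on_start` enabled does the equivalent itself when it boots on the new node).

# Hypothetical reproduction sketch; run from the rook-ceph toolbox pod.
# After physically moving the osd.0 disk from worker1 to worker0:
ceph osd crush create-or-move osd.0 0.00879 root=default host=worker0

# Compare the two views: the CRUSH tree reflects the move immediately,
# but `ceph osd status` (served by the mgr) keeps showing the old host.
ceph osd tree
ceph osd status

With the location updated, the two views disagree: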

[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id |   host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | worker1 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
| 1  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.01758 root default                             
-5       0.01758     host worker0                         
 0   hdd 0.00879         osd.0        up  1.00000 1.00000 
 1   hdd 0.00879         osd.1        up  1.00000 1.00000 

After restarting the manager, `ceph osd status` also reports the correct location:

[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id |   host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
| 1  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
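
For reference, one way to restart/fail over the active mgr from the toolbox is sketched below. This is hedged: `a` is a placeholder for whatever `ceph mgr dump` reports as `active_name`; in a Rook cluster, deleting the mgr pod has the same effect.

# Find the active mgr, then fail it over so the osd status output is rebuilt from fresh state.
ceph mgr dump | grep active_name
ceph mgr fail a        # "a" is a placeholder for the active_name reported above

# After the new (or restarted) mgr comes up, `ceph osd status` matches the tree.
ceph osd status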

Related: http://tracker.ceph.com/issues/40011


Related issues

  • Copied from mgr - Bug #40871: osd status reports old crush location after osd moves (Pending Backport, 07/22/2019)
  • Copied to mgr - Backport #44522: luminous: osd status reports old crush location after osd moves (New)
  • Copied to mgr - Backport #44523: mimic: osd status reports old crush location after osd moves (New)
  • Copied to mgr - Backport #44524: nautilus: osd status reports old crush location after osd moves (Resolved)

History

#1 Updated by Kefu Chai 4 months ago

  • Copied from Bug #40871: osd status reports old crush location after osd moves added

#2 Updated by Kefu Chai 4 months ago

This issue is copied from #40871. The PR https://github.com/ceph/ceph/pull/30448/files does not address this issue at all; the stale OSD hostname has a different root cause.
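
A quick way to confirm that the stale hostname lives on the mgr side (hedged sketch; the mon-side views should already show the new host, which is consistent with a mgr restart fixing the output):

# Mon-side views of osd.0: both should already report host=worker0.
ceph osd find 0        # crush_location comes from the osdmap/CRUSH map
ceph osd metadata 0    # hostname is the metadata the OSD reported when it booted
# Mgr-side view: lags behind until the active mgr is restarted or failed over.
ceph osd status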

#3 Updated by Kefu Chai 4 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 33752

#4 Updated by Sage Weil 4 months ago

  • Status changed from Fix Under Review to Pending Backport

#5 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #44522: luminous: osd status reports old crush location after osd moves added

#6 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #44523: mimic: osd status reports old crush location after osd moves added

#7 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #44524: nautilus: osd status reports old crush location after osd moves added
