Feature #23616

osd: admin socket should help debug status at all times

Added by Greg Farnum over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: Administration/Usability
Target version: -
Start date: 04/09/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS): MonClient, OSD
Pull request ID:

Description

Last week I was looking at an LRC OSD which was having trouble, and it wasn't clear why.

The cause ended up being that the ceph.conf had wrong (old) monitor IPs in it, and so the OSD couldn't talk to a cluster at all. But debugging it was far more difficult than it should have been:
1) the admin socket didn't have all the usual commands available, because the OSD hadn't fully booted yet! So I couldn't run the "status" command (or anything else) to get any indication of the problem.
2) the logs did not have any complaints about lacking a monitor connection, even when I turned up the debug levels.
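
A quick way to see what a half-booted daemon actually exposes is to ask its admin socket for "help" and compare the result with the commands you expected. The sketch below is only an illustration: it assumes the "ceph --admin-daemon <path> help" output is a JSON object mapping command signatures to descriptions, and the function names and the socket path in the usage note are made up for the example.

```python
import json
import subprocess


def admin_socket_argv(asok_path, command):
    """Build the argv for sending one command to a daemon's admin socket."""
    return ["ceph", "--admin-daemon", asok_path] + command.split()


def missing_commands(help_json, wanted):
    """Return the wanted commands that do not appear in the `help` output.

    Assumption: `help` prints a JSON object whose keys are command
    signatures such as "status" or "config get <var>".
    """
    available = json.loads(help_json)
    return [
        name
        for name in wanted
        if not any(key == name or key.startswith(name + " ") for key in available)
    ]


def check_daemon(asok_path, wanted=("status",)):
    """Query a live admin socket and report which wanted commands are absent."""
    result = subprocess.run(
        admin_socket_argv(asok_path, "help"),
        capture_output=True,
        text=True,
        check=True,
    )
    return missing_commands(result.stdout, wanted)
```

For example, check_daemon("/var/run/ceph/ceph-osd.0.asok") (an example path, adjust to your deployment) would return ["status"] on an OSD in the state described above, where the command had not been registered yet.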

I just had to guess that the monitor connection was the problem, based on seeing the msgr connection faults and on the OSD having gotten through load_pgs, and even then I expected it to be a keyring problem. :o
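
Until the OSD can report this itself, the stale-address case can be ruled out by hand: pull the monitor addresses out of ceph.conf and see whether anything answers on them. The following is a rough sketch under several assumptions: the addresses live in a "mon host" (or "mon_host") option under [global], they use the plain host[:port] form (no IPv6 or bracketed v1/v2 lists), and 6789 is the default port. The helper names are hypothetical, and a successful TCP connect only shows that something is listening, not that it is a monitor.

```python
import configparser
import re
import socket


def mon_addresses(conf_text, default_port=6789):
    """Extract (host, port) pairs from the [global] mon host option."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    raw = ""
    for key in ("mon_host", "mon host"):
        if parser.has_option("global", key):
            raw = parser.get("global", key)
            break
    addrs = []
    # Entries may be separated by commas and/or whitespace.
    for item in re.split(r"[,\s]+", raw.strip()):
        if not item:
            continue
        host, _, port = item.partition(":")
        addrs.append((host, int(port) if port else default_port))
    return addrs


def unreachable(addrs, timeout=2.0):
    """Return the addresses that refuse or time out on a TCP connect."""
    bad = []
    for host, port in addrs:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            bad.append((host, port))
    return bad
```

Running unreachable(mon_addresses(open("/etc/ceph/ceph.conf").read())) on the affected host would list every configured monitor address that nothing answers on, which is the condition that took guesswork to find here.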

We should make sure that the OSD always has enough of an admin socket running to identify the general state of its start-up, of its connection to the monitor cluster, etc.

History

#1 Updated by Greg Farnum over 1 year ago

  • Description updated (diff)
