Feature #23616: osd: admin socket should help debug status at all times - RADOS - Ceph

Added by Greg Farnum about 6 years ago. Updated about 6 years ago.

Status:

New

Priority:

Normal

Assignee:

Category:

Administration/Usability

Target version:

% Done:

Source:

Tags:

Backport:

Reviewed:

Affected Versions:

Component(RADOS):

MonClient, OSD

Pull request ID:

Description

Last week I was looking at an LRC OSD which was having trouble, and it wasn't clear why.

The cause turned out to be that ceph.conf listed wrong (old) monitor IPs, so the OSD couldn't talk to the cluster at all. But debugging it was far more difficult than it should have been:
1) The admin socket didn't have all the usual commands available, because the OSD hadn't fully booted yet! So I couldn't run the "status" command (or anything else) to get any indication of the problem.
2) The logs did not contain any complaints about lacking a monitor connection, even when I turned the log levels up.

I just had to guess that the monitor connection was the problem, based on seeing the msgr connection faults and on the OSD having gotten through load_pgs, and even then I expected it to be a keyring problem. :o

We should make sure that the OSD always has enough of an admin socket running to identify the general state of its start-up, of its connection to the monitor cluster, etc.
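
For anyone hitting the same situation, here is a rough sketch of how one might check which commands a half-booted daemon has actually registered on its admin socket. It assumes the wire format used by the `ceph daemon` tool as I understand it (a JSON command terminated by a NUL byte, answered by a 4-byte big-endian length followed by the payload) and that the built-in "help" command returns a JSON object keyed by command name. The function names and the socket path in the usage note are illustrative only, not part of Ceph.

```python
import json
import socket
import struct


def _read_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("admin socket closed before full reply")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def admin_socket_command(asok_path, prefix, timeout=5.0):
    """Send one command to an admin socket and return the raw reply bytes.

    Assumed framing: JSON request + NUL byte; reply is a 4-byte big-endian
    length followed by that many bytes of payload.
    """
    request = json.dumps({"prefix": prefix}).encode("utf-8") + b"\0"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(asok_path)
        sock.sendall(request)
        (length,) = struct.unpack(">I", _read_exact(sock, 4))
        return _read_exact(sock, length)


def registered_commands(asok_path):
    """Return the sorted command names the daemon reports via "help".

    Assumes the "help" reply is a JSON object mapping command -> description.
    """
    reply = admin_socket_command(asok_path, "help")
    return sorted(json.loads(reply.decode("utf-8")))
```

Calling `registered_commands()` with the daemon's .asok path (wherever the admin socket lives on that host) and checking whether "status" is in the result would at least show quickly that the OSD has not finished registering its commands, which is the state described in 1) above.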

Updated by Greg Farnum about 6 years ago

Description updated (diff)
