Bug #45393

closed

Containerized osd config must be updated when adding/removing mons

Added by Tim Serong about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: cephadm
Target version:
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Try this:

- bootstrap a cluster (1 mon, 1 mgr)
- add a bunch of osds (ceph orch apply osd --all-available-devices)
- add some more mons and mgrs (ceph orch apply mon 3 ; ceph orch apply mgr 3)

At this point, ceph.conf as seen by the osds (/var/lib/ceph/$FSID/osd.$ID/config) still only lists the first mon. If you restart any osds while that first mon is down for some reason, the osds can't join the cluster, because they don't know the other mons exist. They'll just sit there logging "monclient(hunting): authenticate timed out after 300" every five minutes until that first mon comes back.

I can think of two potential ways to address this:

1) Have cephadm mon apply go out and update every single osd config file with the new list of mons. This would of course not work completely if some osd hosts were down at the time. Also it might take a while...
2) Have the osds update their own config file automatically based on current monmaps.
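
For what it's worth, the per-file part of option 1 could be as small as the sketch below. This is only an illustration, not what cephadm actually does: `set_mon_host` is a made-up helper name, and it assumes the config has a single `mon_host = ...` (or `mon host = ...`) line that can be rewritten in place.

```python
import re

# Matches a "mon_host = ..." or "mon host = ..." line and captures everything
# up to the value, so indentation and key spelling are preserved.
_MON_HOST_LINE = re.compile(r"^([ \t]*mon[ _]host[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_mon_host(config_text, mon_host):
    """Return config_text with the mon_host value replaced by mon_host.

    Hypothetical helper. Raises ValueError if there is no mon_host line, so a
    caller never silently writes back an unchanged (still stale) config.
    """
    new_text, count = _MON_HOST_LINE.subn(
        lambda match: match.group(1) + mon_host, config_text
    )
    if count == 0:
        raise ValueError("no mon_host line found")
    return new_text
```

The hard part of option 1 is not this rewrite but reaching every osd host, which is why hosts that are down at the time remain a problem.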

This probably also needs to go into the troubleshooting docs (check mon_host in each containerized osd's individual config file).
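
A rough sketch of that check, to be run on an osd host. The function names are made up for illustration, and it assumes each config holds a plain `mon_host = ...` line (it also accepts `mon host`) under the /var/lib/ceph/$FSID/osd.$ID/config path mentioned above.

```python
import glob
import os


def read_mon_host(config_path):
    """Return the mon_host value from one daemon config file, or None."""
    with open(config_path) as config_file:
        for line in config_file:
            key, sep, value = line.partition("=")
            # Treat "mon host" and "mon_host" as the same key.
            if sep and key.strip().replace(" ", "_") == "mon_host":
                return value.strip()
    return None


def osd_mon_hosts(fsid, base_dir="/var/lib/ceph"):
    """Map each osd.$ID config path under base_dir/fsid to its mon_host value."""
    pattern = os.path.join(base_dir, fsid, "osd.*", "config")
    return {path: read_mon_host(path) for path in sorted(glob.glob(pattern))}
```

Printing the result of `osd_mon_hosts("<your fsid>")` and comparing each value with the current mon list should show which osds still only know about the first mon.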


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Feature #45378: cephadm: manage /etc/ceph/ceph.conf (Resolved, Sebastian Wagner)

Actions #1

Updated by Tim Serong about 4 years ago

  • Related to Feature #45378: cephadm: manage /etc/ceph/ceph.conf added
Actions #2

Updated by Sebastian Wagner about 4 years ago

  • Priority changed from Normal to High
  • Regression changed from No to Yes

This was fixed in https://github.com/ceph/ceph/pull/33855 . Looks like we have to figure out what went wrong here.

Actions #3

Updated by Tim Serong about 4 years ago

  • Assignee set to Tim Serong

Thanks for the pointer, I'll try to figure out what's going on, seeing as I'm the one who hit this :-)

Actions #4

Updated by Tim Serong about 4 years ago

A quick grep of my logs shows it reconfiguring the mons and mgrs, but not the osds.

Actions #5

Updated by Tim Serong about 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 34922

It's always the little things...

Actions #6

Updated by Tim Serong almost 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to octopus
Actions #7

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from Pending Backport to Resolved
  • Target version set to v15.2.4