Bug #50230
mon: spawn loop after mon reinstalled
Status: Closed
Description
This is related to #44076. (cluster is running 14.2.19 which has that fix.)
Scenario:
- mon is reinstalled (OS upgraded from el7 to el8).
- mkfs writes the monmap to the store at ("mkfs", "monmap") [1]
- During initial boot, the mon connects to the cluster, gets the latest monmap and sees the addr is different, so it stashes the new map at ("mon_sync", "temp_newer_monmap") and respawns. [2]
- During the next boot, the code in `obtain_monmap` checks for `temp_newer_monmap` only if `store.exists("monmap", "last_committed")` is true, and that key is still empty at this point.
- So, finding only the mkfs monmap, the mon goes into a respawn loop.
I posted a log at ceph-post-file: 2ab3ff0f-87e9-47ea-8f31-a9d4ebc3e60c
debug_mon = 20 starts at 2021-04-08 11:06:52.928.
Here's a possible fix (not tested): we should check for `temp_newer_monmap` before falling back to the mkfs monmap:

```
diff --git a/src/ceph_mon.cc b/src/ceph_mon.cc
index 306d663d33a..f9712ef96e2 100644
--- a/src/ceph_mon.cc
+++ b/src/ceph_mon.cc
@@ -128,6 +128,24 @@ int obtain_monmap(MonitorDBStore &store, bufferlist &bl)
     int err = store.get("mkfs", "monmap", bl);
     ceph_assert(err == 0);
     ceph_assert(bl.length() > 0);
+
+    // see if there is a stashed newer map (see bootstrap())
+    if (store.exists("mon_sync", "temp_newer_monmap")) {
+      bufferlist bl2;
+      int err = store.get("mon_sync", "temp_newer_monmap", bl2);
+      ceph_assert(err == 0);
+      ceph_assert(bl2.length() > 0);
+      MonMap b;
+      b.decode(bl2);
+      if (b.get_epoch() > latest_ver) {
+        dout(10) << __func__ << " using stashed monmap " << b.get_epoch()
+                 << " instead" << dendl;
+        bl = std::move(bl2);
+      } else {
+        dout(10) << __func__ << " ignoring stashed monmap " << b.get_epoch()
+                 << dendl;
+      }
+    }
     return 0;
   }
```
[1] dump-keys after mkfs:
mkfs / keyring
mkfs / monmap
monitor / cluster_uuid
monitor / feature_set
monitor / magic
[2] dump-keys after respawn loop:
mkfs / keyring
mkfs / monmap
mon_sync / temp_newer_monmap
monitor / cluster_uuid
monitor / feature_set
monitor / magic
P.S. The only way we managed to bootstrap this mon was by adding --monmap <the latest monmap> at mkfs time.
Updated by Dan van der Ster about 3 years ago
Doh, ignore that fix, this is better:
```
diff --git a/src/ceph_mon.cc b/src/ceph_mon.cc
index 306d663d33a..f2d417ac465 100644
--- a/src/ceph_mon.cc
+++ b/src/ceph_mon.cc
@@ -123,6 +123,14 @@ int obtain_monmap(MonitorDBStore &store, bufferlist &bl)
     }
   }
 
+  if (store.exists("mon_sync", "temp_newer_monmap")) {
+    dout(10) << __func__ << " found temp_newer_monmap" << dendl;
+    int err = store.get("mon_sync", "temp_newer_monmap", bl);
+    ceph_assert(err == 0);
+    ceph_assert(bl.length() > 0);
+    return 0;
+  }
+
   if (store.exists("mkfs", "monmap")) {
     dout(10) << __func__ << " found mkfs monmap" << dendl;
     int err = store.get("mkfs", "monmap", bl);
```
Updated by Dan van der Ster about 3 years ago
- Status changed from New to Fix Under Review
- Assignee set to Dan van der Ster
We have tested the fix in PR 40660 and it solves our bootstrapping problem.
Updated by Kefu Chai about 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot about 3 years ago
- Copied to Backport #50795: nautilus: mon: spawn loop after mon reinstalled added
Updated by Backport Bot about 3 years ago
- Copied to Backport #50796: octopus: mon: spawn loop after mon reinstalled added
Updated by Backport Bot about 3 years ago
- Copied to Backport #50797: pacific: mon: spawn loop after mon reinstalled added
Updated by Loïc Dachary almost 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".