Bug #57977
osd:tick checking mon for new map
Description
ceph version: 15.2.7
My cluster has an OSD down, and it is unable to rejoin the OSD map.
# ceph osd dump
epoch 2614
fsid cf4db967-2c13-4921-9a1e-e83165bb2bbf
created 2022-03-24T07:59:06.550966+0000
modified 2022-11-04T08:02:42.559653+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 8
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release octopus
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'replicapool-ssd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 2591 lfor 0/0/823 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
max_osd 9
osd.0 up in weight 1 up_from 2176 up_thru 2281 down_at 2175 last_clean_interval [38,2170) [v2:10.173.211.24:6810/39,v1:10.173.211.24:6813/39] [v2:10.173.211.24:6815/39,v1:10.173.211.24:6818/39] exists,up 6654dff6-bcb8-4616-9ac2-9e024e230cee
osd.1 up in weight 1 up_from 2182 up_thru 2268 down_at 2181 last_clean_interval [30,2170) [v2:10.173.211.25:6808/39,v1:10.173.211.25:6809/39] [v2:10.173.211.25:6810/39,v1:10.173.211.25:6811/39] exists,up 0aed2e15-9ea8-45c7-88c9-4c190fb861a9
osd.2 down out weight 0 up_from 36 up_thru 2186 down_at 2189 last_clean_interval [16,33) [v2:10.173.211.24:6800/39,v1:10.173.211.24:6801/39] [v2:10.173.211.24:6802/39,v1:10.173.211.24:6803/39] autoout,exists 48816ed4-88ff-4284-a70e-041c8373bbe7
osd.3 up in weight 1 up_from 2179 up_thru 2275 down_at 2178 last_clean_interval [32,2170) [v2:10.173.211.25:6800/40,v1:10.173.211.25:6801/40] [v2:10.173.211.25:6802/40,v1:10.173.211.25:6803/40] exists,up c0e5263b-2e57-451b-a2af-1f108a29d868
osd.4 up in weight 1 up_from 2174 up_thru 2280 down_at 2173 last_clean_interval [36,2170) [v2:10.173.211.24:6807/39,v1:10.173.211.24:6809/39] [v2:10.173.211.24:6811/39,v1:10.173.211.24:6812/39] exists,up b699246d-f283-44af-8a68-0ef8852163be
osd.5 up in weight 1 up_from 2185 up_thru 2264 down_at 2184 last_clean_interval [42,2170) [v2:10.173.211.26:6808/40,v1:10.173.211.26:6810/40] [v2:10.173.211.26:6812/40,v1:10.173.211.26:6814/40] exists,up 00e70df6-412c-41eb-9eed-742c6ea3291f
osd.6 up in weight 1 up_from 2180 up_thru 2260 down_at 2179 last_clean_interval [32,2170) [v2:10.173.211.25:6814/41,v1:10.173.211.25:6816/41] [v2:10.173.211.25:6818/41,v1:10.173.211.25:6819/41] exists,up ef94c655-1c53-46a4-95fe-687ecddc1738
osd.7 up in weight 1 up_from 2176 up_thru 2266 down_at 2175 last_clean_interval [42,2170) [v2:10.173.211.26:6800/39,v1:10.173.211.26:6801/39] [v2:10.173.211.26:6802/39,v1:10.173.211.26:6803/39] exists,up 57108459-ce4c-4bb1-aea1-4c16aaaa6708
osd.8 up in weight 1 up_from 2186 up_thru 2248 down_at 2185 last_clean_interval [42,2170) [v2:10.173.211.26:6809/39,v1:10.173.211.26:6811/39] [v2:10.173.211.26:6813/39,v1:10.173.211.26:6815/39] exists,up 63f416e5-426b-4864-903d-b5c7b3da10bd
The osd.2 log reports the following:
2022-11-04T07:44:30.345+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:45:00.653+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:45:31.601+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:46:01.773+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:46:32.611+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:47:02.735+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:47:33.586+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:48:04.463+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
Files
Updated by Radoslaw Zarzynski over 1 year ago
- Status changed from New to Need More Info
Octopus is EOL. Does it happen on a supported release?
Regardless of that, could you please provide logs from this OSD with debug_osd=20 and debug_ms=5?
Updated by yite gu over 1 year ago
Radoslaw Zarzynski wrote:
Octopus is EOL. Does it happen on a supported release?
Regardless of that, could you please provide logs from this OSD with debug_osd=20 and debug_ms=5?
At that time, I had set debug_osd=20 but had not set debug_ms=5:
2022-11-06T03:13:54.194+0000 7f6adbd1b700 1 osd.2 2622 tick checking mon for new map
2022-11-06T03:14:24.205+0000 7f6adbd1b700 1 osd.2 2622 tick checking mon for new map
2022-11-06T03:14:54.242+0000 7f6adbd1b700 1 osd.2 2622 tick checking mon for new map
2022-11-06T03:15:24.260+0000 7f6adbd1b700 1 osd.2 2622 tick checking mon for new map
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:40.303+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:40.504+0000 7f6abac69700 5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:40.504+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none
2022-11-06T03:15:40.636+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:41.257+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:41.591+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:42.297+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:42.634+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:42.837+0000 7f6acc48c700 20 osd.2 2622 reports for 0 queries
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:43.324+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:43.607+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:44.339+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:44.583+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:45.204+0000 7f6abac69700 5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:45.204+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:45.376+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:45.554+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:46.381+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:46.572+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:47.336+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:47.525+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:47.837+0000 7f6acc48c700 20 osd.2 2622 reports for 0 queries
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:48.348+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:48.563+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:49.396+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:49.551+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:50.427+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:50.556+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:51.104+0000 7f6abac69700 5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:51.104+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none
2022-11-06T03:15:51.475+0000 7f6adbd1b700 10 osd.2 2622 tick
I could not find any useful information. I tried restarting osd.2, and the problem did not reproduce.
I suspect the problem may occur here:
bool OSDMonitor::preprocess_boot(MonOpRequestRef op)
{
  ...
  if (osdmap.exists(from) &&
      osdmap.get_info(from).up_from > m->version &&
      osdmap.get_most_recent_addrs(from).legacy_equals(
        m->get_orig_source_addrs())) {
    dout(7) << "prepare_boot msg from before last up_from, ignoring" << dendl;
    send_latest(op, m->sb.current_epoch+1);
    return true;
  }
  ...
}
Updated by Radoslaw Zarzynski over 1 year ago
Thanks for the update! Yeah, it might be stuck there. To confirm, we would need logs with increased debug levels (maybe debug_mon=10) from all monitors.
Keeping the ticket in the Need More Info state.
Updated by yite gu over 1 year ago
I already restarted the OSD daemon, but it has not reproduced. If it happens again, I will collect more logs.
Updated by Joshua Baergen over 1 year ago
We saw this happen to roughly a dozen OSDs (1-2 per host for some hosts) during a recent upgrade from Nautilus to Pacific (16.2.9). In each case, restarting the OSD allowed it to boot without issues.
We'll try to increase logging if we see this again during future upgrades or restarts. Is there any other information that would be useful?
Updated by Joshua Baergen over 1 year ago
Something that's probably worth mentioning - we had noup set in the cluster for each upgrade, and we wait until all OSDs are in preboot before we clear it. The issue in the example below is that osd.74 was in the "booting" state, which was unexpected by our automation.
This is what "ceph daemon osd.XX status" said at the time:
{
"cluster_fsid": "...",
"osd_fsid": "...",
"whoami": 74,
"state": "booting",
"oldest_map": 541874,
"newest_map": 544290,
"num_pgs": 171
}
Compared to one that had entered preboot:
{
"cluster_fsid": "...",
"osd_fsid": "...",
"whoami": 75,
"state": "preboot",
"oldest_map": 541984,
"newest_map": 544291,
"num_pgs": 172
}
The difference in newest_map is probably just due to time progressing between me collecting the two statuses. osd.74's status indicates that somehow it progressed past the NOUP guard in _preboot(), whereas osd.75 indicates that it was properly blocked waiting for NOUP to clear. (Right?) That seems bad, and I wonder if it provides a clue as to the sequence of events that could have led to this state?
Updated by Joshua Baergen over 1 year ago
The more I dig, the more I'm thinking that this might be some race to do with noup, and probably has nothing to do with the originally-reported issue. If that gets confirmed I can file a separate ticket.
Updated by Radoslaw Zarzynski over 1 year ago
- Assignee set to Prashant D
The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would you find some time to take a look?
Updated by Prashant D over 1 year ago
Radoslaw Zarzynski wrote:
The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would you find some time to take a look?
Thanks Radek.
@Joshua, is this a containerized environment or a Rook one?
I see 2 possibilities here:
1. The PID nonces are relatively small here, which indicates that this is a containerized environment.
e.g. [v2:10.173.211.24:6810/39,v1:10.173.211.24:6813/39] <--- the nonce 39, out of 64 bits, is too small.
This causes the OSDMonitor to not recognize that the OSD has been restarted; instead it considers the OSD as already booted. This happens because ceph-osd is not running as PID 1 and the env variable CEPH_USE_RANDOM_NONCE (introduced after the Octopus release) is not set to indicate that the OSD daemon is running in a containerized environment. Refer to https://github.com/ceph/ceph/pull/39739 for more details.
The code snippet from OSDMonitor::preprocess_boot() below points to this issue; you should see "preprocess_boot dup from" in the mon log (needs debug_mon >= 7 for this message) for the OSD that is failing to boot and is stuck in the booting state.
// already booted?
if (osdmap.is_up(from) &&
    osdmap.get_addrs(from).legacy_equals(m->get_orig_source_addrs()) &&
    osdmap.get_cluster_addrs(from).legacy_equals(m->cluster_addrs)) {
  // yup.
  dout(7) << "preprocess_boot dup from " << m->get_orig_source()
          << " " << m->get_orig_source_addrs()
          << " =~ " << osdmap.get_addrs(from) << dendl;
  _booted(op, false);
  return true;
}
2. The noup flag has a caveat if we unset it at a particular stage of the OSD boot sequence. The OSD will get stuck in the booting state if the noup flag is removed while the OSD is in the preboot state (before _send_boot). This is a rare corner case; the chance of hitting this issue is about 2 in 10 attempts. The OSD does not get stuck in the booting state if we remove the noup flag some time after the OSD restart phase.
We need OSD debug logs as requested by Radoslaw in comment #1, along with mon logs with debug_mon=20, to confirm the possible cause of this issue. Based on the initial information on this tracker, I suspect possibility #1 is the real cause of the OSD not booting (did we unset the noup flag at the time of booting osd.2?).
Updated by Joshua Baergen over 1 year ago
@Prashant - thanks! Yes, this is containerized, so that's certainly possible in our case.
Updated by Joshua Baergen over 1 year ago
@Prashant, I was thinking about this further. Although it is a containerized env, hostpid=true, so the PIDs should be fine. I think you were getting that OSD data from the original report.
In any case, as noted in previous updates, I don't think that what I was encountering has any relation to what the OP encountered, so feel free to ignore my updates in the context of this ticket.
Updated by Radoslaw Zarzynski over 1 year ago
Updated by Prashant D about 1 year ago
Updated by yite gu about 1 year ago
Prashant D wrote:
yite gu wrote:
Radoslaw Zarzynski wrote:
Per the comment #11 I'm redirecting Prashant's questions from comment #9 to the reporter.
@yite gu: is the deployment containerized?
Yes
Thanks. Can you try with ceph version v17.1.0 or later?

v17.1.0 is the latest version, and I'm worried about stability. But I am preparing to update to v16.2.10.
Updated by Radoslaw Zarzynski about 1 year ago
Per comment #9 there are 2 hypotheses at the moment:
1) the nonce issue (a small nonce should be visible in messenger log entries, e.g. with debug_ms=5),
2) the noup flag. IIUC Prashant wants to exclude it, so he asked: "did we unset noup flag at the time of booting osd.2 ?". yite gu, do we have an answer to that?
Updated by Radoslaw Zarzynski about 1 year ago
It's the nonce issue for sure. Per the ceph osd dump from the description:
osd.0 up ... [v2:10.173.211.24:6810/39,v1:10.173.211.24:6813/39] [v2:10.173.211.24:6815/39,v1:10.173.211.24:6818/39]
osd.1 up ... [v2:10.173.211.25:6808/39,v1:10.173.211.25:6809/39] [v2:10.173.211.25:6810/39,v1:10.173.211.25:6811/39]
osd.2 down ... [v2:10.173.211.24:6800/39,v1:10.173.211.24:6801/39] [v2:10.173.211.24:6802/39,v1:10.173.211.24:6803/39]
osd.3 up ... [v2:10.173.211.25:6800/40,v1:10.173.211.25:6801/40] [v2:10.173.211.25:6802/40,v1:10.173.211.25:6803/40]
osd.4 up ... [v2:10.173.211.24:6807/39,v1:10.173.211.24:6809/39] [v2:10.173.211.24:6811/39,v1:10.173.211.24:6812/39]
osd.5 up ... [v2:10.173.211.26:6808/40,v1:10.173.211.26:6810/40] [v2:10.173.211.26:6812/40,v1:10.173.211.26:6814/40]
osd.6 up ... [v2:10.173.211.25:6814/41,v1:10.173.211.25:6816/41] [v2:10.173.211.25:6818/41,v1:10.173.211.25:6819/41]
osd.7 up ... [v2:10.173.211.26:6800/39,v1:10.173.211.26:6801/39] [v2:10.173.211.26:6802/39,v1:10.173.211.26:6803/39]
osd.8 up ... [v2:10.173.211.26:6809/39,v1:10.173.211.26:6811/39] [v2:10.173.211.26:6813/39,v1:10.173.211.26:6815/39]
// src/msg/Messenger.cc
uint64_t Messenger::get_pid_nonce()
{
  uint64_t nonce = getpid();
  if (nonce == 1 || getenv("CEPH_USE_RANDOM_NONCE")) {
    // we're running in a container; use a random number instead!
    nonce = ceph::util::generate_random_number<uint64_t>();
  }
  return nonce;
}
yite gu, you should be able to work around the problem just by setting the CEPH_USE_RANDOM_NONCE env variable.
Updated by Radoslaw Zarzynski about 1 year ago
- Priority changed from Normal to High
Updated by yite gu about 1 year ago
Radoslaw Zarzynski wrote:
Per comment #9 there are 2 hypotheses at the moment:
1) the nonce issue (a small nonce should be visible in messenger log entries, e.g. with debug_ms=5),
2) the noup flag. IIUC Prashant wants to exclude it, so he asked: "did we unset noup flag at the time of booting osd.2 ?". yite gu, do we have an answer to that?

2. I have not set the noup flag.
Updated by Radoslaw Zarzynski about 1 year ago
- Status changed from Need More Info to In Progress
- Assignee changed from Prashant D to Radoslaw Zarzynski
Updated by yite gu about 1 year ago
Radoslaw Zarzynski wrote:
Thanks for the update! Yeah, it might be stuck there. To confirm, we would need logs with increased debug levels (maybe debug_mon=10) from all monitors. Keeping the ticket in the Need More Info state.

Radoslaw, this problem has occurred again. I have uploaded the mon log with debug_mon=10 and preserved the environment. What else do you need from me?
Updated by yite gu about 1 year ago
2023-02-28T13:00:24.060+0000 7f3baa2cc700 0 auth: could not find secret_id=7559
2023-02-28T13:00:24.060+0000 7f3baa2cc700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=7559
2023-02-28T13:00:24.060+0000 7f3ba9acb700 0 auth: could not find secret_id=7559
2023-02-28T13:00:24.060+0000 7f3ba9acb700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=7559
2023-02-28T13:00:24.060+0000 7f3ba9acb700 0 auth: could not find secret_id=7559
2023-02-28T13:00:24.060+0000 7f3ba9acb700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=7559
This looks abnormal.
Updated by Radoslaw Zarzynski about 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 50344
Thanks, yite gu!
The fix is: https://github.com/ceph/ceph/pull/50344/commits/fb868d4e71d3871cbd17cfbd4a536470e5c023f3.
If you don't want to wait for the backport and then upgrade the cluster, you may set the CEPH_USE_RANDOM_NONCE env variable for all daemons as a workaround (might be labor-intensive though).
Updated by yite gu about 1 year ago
Radoslaw Zarzynski wrote:
Thanks, yite gu!
The fix is: https://github.com/ceph/ceph/pull/50344/commits/fb868d4e71d3871cbd17cfbd4a536470e5c023f3.
If you don't want to wait for the backport and then upgrade the cluster, you may set the CEPH_USE_RANDOM_NONCE env variable for all daemons as a workaround (might be labor-intensive though).

Why does the nonce cause this issue?
Updated by Radoslaw Zarzynski about 1 year ago
The unwanted nonce match causes OSDMonitor::preprocess_boot() to return true, and thus prevents OSDMonitor::prepare_boot() from being called (see PaxosService::dispatch()). The net result is that this OSD can't be marked up.
/**
* Look at the query; if the query can be handled without changing state,
* do so.
*
* @param m A query message
* @returns 'true' if the query was handled (e.g., was a read that got
* answered, was a state change that has no effect); 'false'
* otherwise.
*/
virtual bool preprocess_query(MonOpRequestRef op) = 0;
bool PaxosService::dispatch(MonOpRequestRef op)
{
  // ...
  // preprocess
  if (preprocess_query(op))
    return true;  // easy!
  // ...
  // update
  if (!prepare_update(op)) {
    // no changes made.
    return true;
  }
  // ...
}
bool OSDMonitor::preprocess_query(MonOpRequestRef op)
{
  op->mark_osdmon_event(__func__);
  Message *m = op->get_req();
  dout(10) << "preprocess_query " << *m << " from " << m->get_orig_source_inst() << dendl;

  switch (m->get_type()) {
  // ...
  case MSG_OSD_BOOT:
    return preprocess_boot(op);
  // ...
  }
}
bool OSDMonitor::preprocess_boot(MonOpRequestRef op)
{
  op->mark_osdmon_event(__func__);
  auto m = op->get_req<MOSDBoot>();
  int from = m->get_orig_source_inst().name.num();
  // ...
  // already booted?
  if (osdmap.is_up(from) &&
      osdmap.get_addrs(from).legacy_equals(m->get_orig_source_addrs()) &&
      osdmap.get_cluster_addrs(from).legacy_equals(m->cluster_addrs)) {
    // yup.
    dout(7) << "preprocess_boot dup from " << m->get_orig_source()
            << " " << m->get_orig_source_addrs()
            << " =~ " << osdmap.get_addrs(from) << dendl;
    _booted(op, false);
    return true;
  }
  // ...
  dout(10) << "preprocess_boot from " << m->get_orig_source_inst() << dendl;
  return false;

 ignore:
  return true;
}
Updated by Radoslaw Zarzynski about 1 year ago
- Related to Bug #49534: osd connection loop added
Updated by yite gu about 1 year ago
Radoslaw Zarzynski wrote:
The unwanted nonce match causes OSDMonitor::preprocess_boot() to return true, and thus prevents OSDMonitor::prepare_boot() from being called (see PaxosService::dispatch()). The net result is that this OSD can't be marked up.
bool OSDMonitor::preprocess_boot(MonOpRequestRef op)
{
  op->mark_osdmon_event(__func__);
  auto m = op->get_req<MOSDBoot>();
  int from = m->get_orig_source_inst().name.num();
  // ...
  // already booted?
  if (osdmap.is_up(from) &&
      osdmap.get_addrs(from).legacy_equals(m->get_orig_source_addrs()) &&
      osdmap.get_cluster_addrs(from).legacy_equals(m->cluster_addrs)) {
    // yup.
    dout(7) << "preprocess_boot dup from " << m->get_orig_source()
            << " " << m->get_orig_source_addrs()
            << " =~ " << osdmap.get_addrs(from) << dendl;
    _booted(op, false);
    return true;
  }
  // ...
  dout(10) << "preprocess_boot from " << m->get_orig_source_inst() << dendl;
  return false;

 ignore:
  return true;
}
Thanks, Radek.
I got it, but osdmap.is_up(from) returns true in OSDMonitor::preprocess_boot. The target OSD is down now; shouldn't it return false?
Updated by Radoslaw Zarzynski about 1 year ago
That's a very good question. How about providing logs from both monitors and the problematic OSD with extra verbosity (debug_osd=20, debug_mon=20, debug_ms=1)?
Updated by yite gu about 1 year ago
Radoslaw Zarzynski wrote:
That's a very good question. How about providing logs from both monitors and the problematic OSD with extra verbosity (debug_osd=20, debug_mon=20, debug_ms=1)?

Radek, the problematic environment no longer exists. I think the OSD had already sent a boot request to the mon, but the mon did not reply, which left the OSD waiting for a new map. So should the OSD resend the boot request after a period of time?
Updated by yite gu about 1 year ago
yite gu wrote:
Radoslaw Zarzynski wrote:

That's a very good question. How about providing logs from both monitors and the problematic OSD with extra verbosity (debug_osd=20, debug_mon=20, debug_ms=1)?

Radek, the problematic environment no longer exists. I think the OSD had already sent a boot request to the mon, but the mon did not reply, which left the OSD waiting for a new map. So should the OSD resend the boot request after a period of time?

If you think this is a practicable plan, I can try to write a patch.
Updated by Radoslaw Zarzynski about 1 year ago
Sure, feel free to send anytime! Would love to review.
Updated by yite gu about 1 year ago
Radoslaw Zarzynski wrote:
That's a very good question. How about providing logs from both monitors and the problematic OSD with extra verbosity (debug_osd=20, debug_mon=20, debug_ms=1)?
The ceph cluster has one mon and one OSD.
1. osd down at e659 osdmap
2023-02-28T12:54:43.847+0000 7fc3c8ad7700 -1 received signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2023-02-28T12:54:43.847+0000 7fc3c8ad7700 -1 osd.0 659 *** Got signal Terminated ***
2023-02-28T12:54:43.847+0000 7fc3c8ad7700 -1 osd.0 659 *** Immediate shutdown (osd_fast_shutdown=true) ***
2. osd start
2023-02-28T13:00:17.526+0000 7f3bafd48f40 0 set uid:gid to 167:167 (ceph:ceph)
2023-02-28T13:00:17.526+0000 7f3bafd48f40 0 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable), process ceph-osd, pid 39
3. osd init completed, and get osdmap from mon
2023-02-28T13:00:24.062+0000 7f3bafd48f40 1 osd.0 659 start_boot
On the other hand:
1. mon down at e659 osdmap
2023-02-28T12:54:43.850+0000 7fcf7073b700 -1 received signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2023-02-28T12:54:43.850+0000 7fcf7073b700 -1 mon.a@0(leader) e1 *** Got Signal Terminated ***
2023-02-28T12:54:43.850+0000 7fcf7073b700 1 mon.a@0(leader) e1 shutdown
2. mon start
2023-02-28T13:00:17.365+0000 7f9e3d43c700 0 mon.a@-1(probing) e1 my rank is now 0 (was -1)
2023-02-28T13:00:17.366+0000 7f9e3d43c700 0 log_channel(cluster) log [INF] : mon.a is new leader, mons a in quorum (ranks 0)
3. osd is up at e659 osdmap
2023-02-28T13:00:17.366+0000 7f9e3d43c700 0 log_channel(cluster) log [DBG] : osdmap e659: 1 total, 1 up, 1 in
4. osd marked down by mon_osd_report_timeout
2023-02-28T13:30:17.625+0000 7f9e292d2700 0 log_channel(cluster) log [INF] : osd.0 marked down after no beacon for 900.126008 seconds
2023-02-28T13:30:17.625+0000 7f9e292d2700 -1 mon.a@0(leader).osd e663 no beacon from osd.0 since 2023-02-28T13:15:17.500651+0000, 900.126008 seconds ago. marking down
2023-02-28T13:30:17.625+0000 7f9e292d2700 0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
2023-02-28T13:30:17.625+0000 7f9e292d2700 0 log_channel(cluster) log [WRN] : Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2023-02-28T13:30:17.625+0000 7f9e292d2700 0 log_channel(cluster) log [WRN] : Health check failed: 1 root (1 osds) down (OSD_ROOT_DOWN)
2023-02-28T13:30:17.625+0000 7f9e252ca700 1 mon.a@0(leader).osd e664 e664: 1 total, 0 up, 1 in
So, osdmap.is_up(from) can possibly return true.
Updated by Radoslaw Zarzynski about 1 year ago
My understanding:
0. The OSD (as a process) got down BUT it was up in the OSDMap -- these are 2 different things.
1. The monitor receives the MOSDBoot message from a new process of osd.0.
2. As osd.0-in-osdmap is up AND the addresses are equal (likely because of the nonce conflict), the _booted() path is taken.
3. This means activation of heartbeats.
Updated by yite gu about 1 year ago
Radoslaw Zarzynski wrote:
My understanding:
0. The OSD (as a process) got down BUT it was up in the OSDMap -- these are 2 different things.
1. The monitor receives the MOSDBoot message from a new process of osd.0.
2. As osd.0-in-osdmap is up AND the addresses are equal (likely because of the nonce conflict), the _booted() path is taken.
3. This means activation of heartbeats.
So, we should use a random nonce on every restart.
Updated by yite gu about 1 year ago
Radoslaw Zarzynski wrote:
My understanding:
0. The OSD (as a process) got down BUT it was up in the OSDMap -- these are 2 different things.
1. The monitor receives the MOSDBoot message from a new process of osd.0.
2. As osd.0-in-osdmap is up AND the addresses are equal (likely because of the nonce conflict), the _booted() path is taken.
3. This means activation of heartbeats.
Hi Radek, I have reproduced this issue:
2023-04-19T11:42:15.822+0000 7f457e66e700 7 mon.a@0(leader).osd e809 preprocess_boot dup from osd.0 [v2:10.172.9.23:6802/39,v1:10.172.9.23:6803/39] =~ [v2:10.172.9.23:6802/39,v1:10.172.9.23:6803/39]
2023-04-19T11:42:15.822+0000 7f457e66e700 7 mon.a@0(leader).osd e809 _booted osd.0 v2:10.172.9.23:6802/39 w 0 from 809
2023-04-19T11:42:15.822+0000 7f457e66e700 5 mon.a@0(leader).osd e809 send_latest to osd.0 v2:10.172.9.23:6802/39 start 810
2023-04-19T11:42:15.822+0000 7f457e66e700 5 mon.a@0(leader).osd e809 send_incremental [810..809] to osd.0
Updated by yite gu about 1 year ago
There are two conditions that can cause this problem:
1. The OSDMap version held by the mon is the same as the OSD's OSDMap version, so the mon does not send an OSDMap to the OSD.
2. The mon does have a higher OSDMap version than the OSD and has sent the OSDMap to the OSD. However, the OSD was never marked up again, so its up_from in the map still dates from the previous boot. When the OSD validates up_from in the new OSDMap, it finds that up_from is still older than the epoch at which this incarnation bound its sockets, so the check below fails.
epoch_t _bind_epoch = service.get_bind_epoch();
if (osdmap->is_up(whoami) &&
    osdmap->get_addrs(whoami).legacy_equals(
      client_messenger->get_myaddrs()) &&
    _bind_epoch < osdmap->get_up_from(whoami)) {
  if (is_booting()) {
    dout(1) << "state: booting -> active" << dendl;
    set_state(STATE_ACTIVE);
    do_restart = false;
    // set incarnation so that osd_reqid_t's we generate for our
    // objecter requests are unique across restarts.
    service.objecter->set_client_incarnation(osdmap->get_epoch());
    cancel_pending_failures();
  }
}
Updated by Radoslaw Zarzynski almost 1 year ago
Yup, the patch does exactly that – it ensures that a random nonce is always used.
Updated by yite gu almost 1 year ago
Radoslaw Zarzynski wrote:
Yup, the patch does exactly that – it ensures that a random nonce is always used.
I hope this patch gets backported to Pacific.
Updated by Radoslaw Zarzynski 12 months ago
- Backport set to pacific,quincy,reef
Updated by Radoslaw Zarzynski 12 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 12 months ago
- Copied to Backport #59675: pacific: osd:tick checking mon for new map added
Updated by Backport Bot 12 months ago
- Copied to Backport #59676: reef: osd:tick checking mon for new map added
Updated by Backport Bot 12 months ago
- Copied to Backport #59677: quincy: osd:tick checking mon for new map added