Bug #57977
osd:tick checking mon for new map
Description
ceph version: 15.2.7
My cluster has an OSD down, and it is unable to rejoin the osdmap.
# ceph osd dump
epoch 2614
fsid cf4db967-2c13-4921-9a1e-e83165bb2bbf
created 2022-03-24T07:59:06.550966+0000
modified 2022-11-04T08:02:42.559653+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 8
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release octopus
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'replicapool-ssd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 2591 lfor 0/0/823 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
max_osd 9
osd.0 up in weight 1 up_from 2176 up_thru 2281 down_at 2175 last_clean_interval [38,2170) [v2:10.173.211.24:6810/39,v1:10.173.211.24:6813/39] [v2:10.173.211.24:6815/39,v1:10.173.211.24:6818/39] exists,up 6654dff6-bcb8-4616-9ac2-9e024e230cee
osd.1 up in weight 1 up_from 2182 up_thru 2268 down_at 2181 last_clean_interval [30,2170) [v2:10.173.211.25:6808/39,v1:10.173.211.25:6809/39] [v2:10.173.211.25:6810/39,v1:10.173.211.25:6811/39] exists,up 0aed2e15-9ea8-45c7-88c9-4c190fb861a9
osd.2 down out weight 0 up_from 36 up_thru 2186 down_at 2189 last_clean_interval [16,33) [v2:10.173.211.24:6800/39,v1:10.173.211.24:6801/39] [v2:10.173.211.24:6802/39,v1:10.173.211.24:6803/39] autoout,exists 48816ed4-88ff-4284-a70e-041c8373bbe7
osd.3 up in weight 1 up_from 2179 up_thru 2275 down_at 2178 last_clean_interval [32,2170) [v2:10.173.211.25:6800/40,v1:10.173.211.25:6801/40] [v2:10.173.211.25:6802/40,v1:10.173.211.25:6803/40] exists,up c0e5263b-2e57-451b-a2af-1f108a29d868
osd.4 up in weight 1 up_from 2174 up_thru 2280 down_at 2173 last_clean_interval [36,2170) [v2:10.173.211.24:6807/39,v1:10.173.211.24:6809/39] [v2:10.173.211.24:6811/39,v1:10.173.211.24:6812/39] exists,up b699246d-f283-44af-8a68-0ef8852163be
osd.5 up in weight 1 up_from 2185 up_thru 2264 down_at 2184 last_clean_interval [42,2170) [v2:10.173.211.26:6808/40,v1:10.173.211.26:6810/40] [v2:10.173.211.26:6812/40,v1:10.173.211.26:6814/40] exists,up 00e70df6-412c-41eb-9eed-742c6ea3291f
osd.6 up in weight 1 up_from 2180 up_thru 2260 down_at 2179 last_clean_interval [32,2170) [v2:10.173.211.25:6814/41,v1:10.173.211.25:6816/41] [v2:10.173.211.25:6818/41,v1:10.173.211.25:6819/41] exists,up ef94c655-1c53-46a4-95fe-687ecddc1738
osd.7 up in weight 1 up_from 2176 up_thru 2266 down_at 2175 last_clean_interval [42,2170) [v2:10.173.211.26:6800/39,v1:10.173.211.26:6801/39] [v2:10.173.211.26:6802/39,v1:10.173.211.26:6803/39] exists,up 57108459-ce4c-4bb1-aea1-4c16aaaa6708
osd.8 up in weight 1 up_from 2186 up_thru 2248 down_at 2185 last_clean_interval [42,2170) [v2:10.173.211.26:6809/39,v1:10.173.211.26:6811/39] [v2:10.173.211.26:6813/39,v1:10.173.211.26:6815/39] exists,up 63f416e5-426b-4864-903d-b5c7b3da10bd
The osd.2 log reported the following:
2022-11-04T07:44:30.345+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:45:00.653+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:45:31.601+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:46:01.773+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:46:32.611+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:47:02.735+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:47:33.586+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
2022-11-04T07:48:04.463+0000 7f6adbd1b700 1 osd.2 2607 tick checking mon for new map
History
#1 Updated by Radoslaw Zarzynski 3 months ago
- Status changed from New to Need More Info
Octopus is EOL. Does it happen on a supported release?
Regardless of that, could you please provide logs from this OSD with debug_osd=20 and debug_ms=5?
#2 Updated by yite gu 3 months ago
Radoslaw Zarzynski wrote:
Octopus is EOL. Does it happen on a supported release?
Regardless of that, could you please provide logs from this OSD with debug_osd=20 and debug_ms=5?
At that time, I had set debug_osd=20 but had not set debug_ms=5:
2022-11-06T03:13:54.194+0000 7f6adbd1b700 1 osd.2 2622 tick checking mon for new map
2022-11-06T03:14:24.205+0000 7f6adbd1b700 1 osd.2 2622 tick checking mon for new map
2022-11-06T03:14:54.242+0000 7f6adbd1b700 1 osd.2 2622 tick checking mon for new map
2022-11-06T03:15:24.260+0000 7f6adbd1b700 1 osd.2 2622 tick checking mon for new map
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:40.303+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:40.504+0000 7f6abac69700 5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:40.504+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none
2022-11-06T03:15:40.636+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:41.257+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:41.591+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:42.297+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:42.634+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:42.837+0000 7f6acc48c700 20 osd.2 2622 reports for 0 queries
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:43.324+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:43.607+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:44.339+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:44.583+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:45.204+0000 7f6abac69700 5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:45.204+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:45.376+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:45.554+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:46.381+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:46.572+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:47.336+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:47.525+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:47.837+0000 7f6acc48c700 20 osd.2 2622 reports for 0 queries
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:48.348+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:48.563+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:49.396+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:49.551+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:50.427+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:50.556+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:51.104+0000 7f6abac69700 5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:51.104+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none
2022-11-06T03:15:51.475+0000 7f6adbd1b700 10 osd.2 2622 tick
I could not find any useful information there. After I restarted osd.2, the problem did not reproduce.
I suspect the problem may occur here:
3367 bool OSDMonitor::preprocess_boot(MonOpRequestRef op)
3368 {
     ...
3467   if (osdmap.exists(from) &&
3468       osdmap.get_info(from).up_from > m->version &&
3469       osdmap.get_most_recent_addrs(from).legacy_equals(
3470         m->get_orig_source_addrs())) {
3471     dout(7) << "prepare_boot msg from before last up_from, ignoring" << dendl;
3472     send_latest(op, m->sb.current_epoch+1);
3473     return true;
3474   }
     ...
3488 }
#3 Updated by Radoslaw Zarzynski 3 months ago
Thanks for the update! Yeah, it might be stuck there. To confirm, we would need logs with increased debug levels (maybe debug_mon=10) from all monitors.
Keeping the ticket in the Need More Info state.
#5 Updated by Joshua Baergen 2 months ago
We saw this happen to roughly a dozen OSDs (1-2 per host for some hosts) during a recent upgrade from Nautilus to Pacific (16.2.9). In each case, restarting the OSD allowed it to boot without issues.
We'll try to increase logging if we see this again during future upgrades or restarts. Is there any other information that would be useful?
#6 Updated by Joshua Baergen 2 months ago
Something that's probably worth mentioning: we had noup set in the cluster for each upgrade, and we wait until all OSDs are in preboot before we clear it. The issue in the example below is that osd.74 was in the "booting" state, which our automation did not expect.
This is what "ceph daemon osd.XX status" said at the time:
{
"cluster_fsid": "...",
"osd_fsid": "...",
"whoami": 74,
"state": "booting",
"oldest_map": 541874,
"newest_map": 544290,
"num_pgs": 171
}
Compared to one that had entered preboot:
{
"cluster_fsid": "...",
"osd_fsid": "...",
"whoami": 75,
"state": "preboot",
"oldest_map": 541984,
"newest_map": 544291,
"num_pgs": 172
}
The difference in newest_map is probably just due to time progressing between me collecting the two statuses. osd.74's status indicates that somehow it progressed past the NOUP guard in _preboot(), whereas osd.75 indicates that it was properly blocked waiting for NOUP to clear. (Right?) That seems bad, and I wonder if it provides a clue as to the sequence of events that could have led to this state?
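To make the above concrete, here is a toy model (a sketch only; the state names mirror the real "preboot"/"booting" states, but the function and its logic are hypothetical simplifications, not the actual OSD.cc code) of the noup guard in _preboot() as described: while the osdmap carries the noup flag, an OSD should keep waiting in preboot, and only advance toward booting once the flag clears.

```python
# Toy model of the noup guard described above (illustrative only).
NOUP = "noup"

def preboot_step(osd_state: str, osdmap_flags: set) -> str:
    """Advance an OSD one tick from preboot toward booting, honoring noup."""
    if osd_state != "preboot":
        return osd_state  # only preboot is affected by this guard
    if NOUP in osdmap_flags:
        # Guard: with noup set, the OSD must keep waiting in preboot.
        return "preboot"
    # noup cleared: the OSD may proceed to send its boot message.
    return "booting"

# With noup set, repeated ticks never leave preboot:
state = "preboot"
for _ in range(3):
    state = preboot_step(state, {NOUP})
assert state == "preboot"

# Once noup is cleared, the OSD advances to booting:
state = preboot_step(state, set())
assert state == "booting"
```

Under this model, osd.74 reporting "booting" while noup was still set would mean it somehow slipped past the guard, which is exactly what makes the observation surprising.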
#7 Updated by Joshua Baergen 2 months ago
The more I dig, the more I'm thinking that this might be some race to do with noup, and probably has nothing to do with the originally-reported issue. If that gets confirmed I can file a separate ticket.
#8 Updated by Radoslaw Zarzynski 2 months ago
- Assignee set to Prashant D
The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would you find some time to take a look?
#9 Updated by Prashant D 26 days ago
Radoslaw Zarzynski wrote:
The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would you find some time to take a look?
Thanks Radek.
@Joshua, is this a containerized environment or a Rook one?
I see two possibilities here:
1. The pid nonces are relatively small here, which indicates that this is a containerized environment.
e.g. [v2:10.173.211.24:6810/39,v1:10.173.211.24:6813/39] <--- a nonce of 39 out of 64 bits is too small.
This causes the OSDMonitor to not recognize that the OSD has been restarted; it instead considers the OSD to have already booted. This happens because ceph-osd is not running as pid 1, or the env var CEPH_USE_RANDOM_NONCE (introduced after the Octopus release) is not set to indicate that the OSD daemon is running in a containerized environment. Refer to https://github.com/ceph/ceph/pull/39739 for more details.
The code snippet from OSDMonitor::preprocess_boot() points to this issue, and you should see "preprocess_boot dup from" in the mon log (debug_mon >= 7 is needed to see this log message) for the OSD that is failing to boot and is stuck in the booting state.
// already booted?
if (osdmap.is_up(from) &&
    osdmap.get_addrs(from).legacy_equals(m->get_orig_source_addrs()) &&
    osdmap.get_cluster_addrs(from).legacy_equals(m->cluster_addrs)) {
  // yup.
  dout(7) << "preprocess_boot dup from " << m->get_orig_source()
          << " " << m->get_orig_source_addrs()
          << " =~ " << osdmap.get_addrs(from) << dendl;
  _booted(op, false);
  return true;
}
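The small-nonce scenario can be illustrated with a toy model (a sketch only; Addr and legacy_equals here are hypothetical stand-ins for the real entity_addr_t comparison, not Ceph code): when a containerized OSD restarts as the same small pid, its address nonce repeats, so the monitor's address comparison cannot distinguish the new instance from the old one.

```python
from collections import namedtuple

# Toy stand-in for an OSD address: ip, port, and a nonce.  In the
# legacy scheme described above, the nonce is the daemon's pid.
Addr = namedtuple("Addr", ["ip", "port", "nonce"])

def legacy_equals(a: Addr, b: Addr) -> bool:
    # Simplified: addresses match when endpoint and nonce both match.
    return a == b

# In a container without host pids, the OSD comes back as the same
# small pid (39, as in the osd dump above) on every restart ...
old_instance = Addr("10.173.211.24", 6810, nonce=39)
new_instance = Addr("10.173.211.24", 6810, nonce=39)

# ... so the monitor cannot tell the new instance from the old one
# and treats its boot message as a duplicate ("dup from").
assert legacy_equals(old_instance, new_instance)

# A fresh nonce per process start (the CEPH_USE_RANDOM_NONCE idea)
# makes the restart visible; 0x5eed1234 stands in for a random value.
new_instance_rand = Addr("10.173.211.24", 6810, nonce=0x5eed1234)
assert not legacy_equals(old_instance, new_instance_rand)
```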
2. The noup flag has a caveat if we unset it at a particular stage of the OSD boot sequence. The OSD will get stuck in the booting state if we remove the noup flag while the OSD is in the preboot state (before _send_boot). This is a rare corner case; we hit it in about 2 out of 10 attempts. The OSD does not get stuck in the booting state if we remove the noup flag some time after the OSD restart phase.
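For clarity, the timing dependence just described can be captured in a toy model (illustrative only; boot_outcome and its state labels are hypothetical, encoding the observed behavior rather than the real _preboot()/_send_boot() code paths):

```python
# Toy model of possibility #2: the outcome depends on *when* the
# noup flag is cleared relative to the OSD's boot sequence.

def boot_outcome(noup_cleared_at: str) -> str:
    """Final OSD state for a given noup-clearing point.

    noup_cleared_at is one of:
      "before_preboot" - flag already clear when the OSD restarts
      "during_preboot" - cleared while the OSD waits in preboot,
                         before _send_boot (the rare window)
      "after_restart"  - cleared some time after the restart phase
    """
    if noup_cleared_at == "during_preboot":
        # Per the description above, the OSD can miss the map update
        # and remain stuck in "booting" until it is restarted.
        return "booting (stuck)"
    return "up"

assert boot_outcome("before_preboot") == "up"
assert boot_outcome("after_restart") == "up"
assert boot_outcome("during_preboot") == "booting (stuck)"
```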
We need OSD debug logs as requested by Radoslaw in comment #1, along with mon logs with debug_mon=20, to confirm the cause of this issue. Based on the initial information on this tracker, I suspect possibility #1 is the real cause of the OSD not booting (did we unset the noup flag at the time osd.2 was booting?).
#10 Updated by Joshua Baergen 25 days ago
@Prashant - thanks! Yes, this is containerized, so that's certainly possible in our case.
#11 Updated by Joshua Baergen 22 days ago
@Prashant, I was thinking about this further. Although it is a containerized environment, we run with hostpid=true, so the PIDs should be fine. I think you were getting that OSD data from the original report.
In any case, as noted in previous updates, I don't think that what I was encountering has any relation to what the OP encountered, so feel free to ignore my updates in the context of this ticket.