Bug #57977

osd:tick checking mon for new map

Added by yite gu 3 months ago. Updated 22 days ago.

Status:
Need More Info
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version: 15.2.7
My cluster has an OSD down, and it is unable to rejoin the osdmap.

# ceph osd dump
epoch 2614
fsid cf4db967-2c13-4921-9a1e-e83165bb2bbf
created 2022-03-24T07:59:06.550966+0000
modified 2022-11-04T08:02:42.559653+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 8
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release octopus
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'replicapool-ssd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 2591 lfor 0/0/823 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
max_osd 9
osd.0 up   in  weight 1 up_from 2176 up_thru 2281 down_at 2175 last_clean_interval [38,2170) [v2:10.173.211.24:6810/39,v1:10.173.211.24:6813/39] [v2:10.173.211.24:6815/39,v1:10.173.211.24:6818/39] exists,up 6654dff6-bcb8-4616-9ac2-9e024e230cee
osd.1 up   in  weight 1 up_from 2182 up_thru 2268 down_at 2181 last_clean_interval [30,2170) [v2:10.173.211.25:6808/39,v1:10.173.211.25:6809/39] [v2:10.173.211.25:6810/39,v1:10.173.211.25:6811/39] exists,up 0aed2e15-9ea8-45c7-88c9-4c190fb861a9
osd.2 down out weight 0 up_from 36 up_thru 2186 down_at 2189 last_clean_interval [16,33) [v2:10.173.211.24:6800/39,v1:10.173.211.24:6801/39] [v2:10.173.211.24:6802/39,v1:10.173.211.24:6803/39] autoout,exists 48816ed4-88ff-4284-a70e-041c8373bbe7
osd.3 up   in  weight 1 up_from 2179 up_thru 2275 down_at 2178 last_clean_interval [32,2170) [v2:10.173.211.25:6800/40,v1:10.173.211.25:6801/40] [v2:10.173.211.25:6802/40,v1:10.173.211.25:6803/40] exists,up c0e5263b-2e57-451b-a2af-1f108a29d868
osd.4 up   in  weight 1 up_from 2174 up_thru 2280 down_at 2173 last_clean_interval [36,2170) [v2:10.173.211.24:6807/39,v1:10.173.211.24:6809/39] [v2:10.173.211.24:6811/39,v1:10.173.211.24:6812/39] exists,up b699246d-f283-44af-8a68-0ef8852163be
osd.5 up   in  weight 1 up_from 2185 up_thru 2264 down_at 2184 last_clean_interval [42,2170) [v2:10.173.211.26:6808/40,v1:10.173.211.26:6810/40] [v2:10.173.211.26:6812/40,v1:10.173.211.26:6814/40] exists,up 00e70df6-412c-41eb-9eed-742c6ea3291f
osd.6 up   in  weight 1 up_from 2180 up_thru 2260 down_at 2179 last_clean_interval [32,2170) [v2:10.173.211.25:6814/41,v1:10.173.211.25:6816/41] [v2:10.173.211.25:6818/41,v1:10.173.211.25:6819/41] exists,up ef94c655-1c53-46a4-95fe-687ecddc1738
osd.7 up   in  weight 1 up_from 2176 up_thru 2266 down_at 2175 last_clean_interval [42,2170) [v2:10.173.211.26:6800/39,v1:10.173.211.26:6801/39] [v2:10.173.211.26:6802/39,v1:10.173.211.26:6803/39] exists,up 57108459-ce4c-4bb1-aea1-4c16aaaa6708
osd.8 up   in  weight 1 up_from 2186 up_thru 2248 down_at 2185 last_clean_interval [42,2170) [v2:10.173.211.26:6809/39,v1:10.173.211.26:6811/39] [v2:10.173.211.26:6813/39,v1:10.173.211.26:6815/39] exists,up 63f416e5-426b-4864-903d-b5c7b3da10bd

The osd.2 log reports the following:

2022-11-04T07:44:30.345+0000 7f6adbd1b700  1 osd.2 2607 tick checking mon for new map
2022-11-04T07:45:00.653+0000 7f6adbd1b700  1 osd.2 2607 tick checking mon for new map
2022-11-04T07:45:31.601+0000 7f6adbd1b700  1 osd.2 2607 tick checking mon for new map
2022-11-04T07:46:01.773+0000 7f6adbd1b700  1 osd.2 2607 tick checking mon for new map
2022-11-04T07:46:32.611+0000 7f6adbd1b700  1 osd.2 2607 tick checking mon for new map
2022-11-04T07:47:02.735+0000 7f6adbd1b700  1 osd.2 2607 tick checking mon for new map
2022-11-04T07:47:33.586+0000 7f6adbd1b700  1 osd.2 2607 tick checking mon for new map
2022-11-04T07:48:04.463+0000 7f6adbd1b700  1 osd.2 2607 tick checking mon for new map

History

#1 Updated by Radoslaw Zarzynski 3 months ago

  • Status changed from New to Need More Info

Octopus is EOL. Does it happen on a supported release?

Regardless of that, could you please provide logs from this OSD with debug_osd=20 debug_ms=5?
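
For reference, a minimal sketch of how those debug levels can be raised at runtime (osd.2 and a default admin-socket setup are assumptions based on this report; the values can be reverted the same way afterwards):

# Via the admin socket on the host running the stuck OSD (works even while
# the OSD is marked down in the map):
ceph daemon osd.2 config set debug_osd 20
ceph daemon osd.2 config set debug_ms 5

# Or persistently, via the monitor config database:
ceph config set osd.2 debug_osd 20
ceph config set osd.2 debug_ms 5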

#2 Updated by yite gu 3 months ago

Radoslaw Zarzynski wrote:

Octopus is EOL. Does it happen on a supported release?

Regardless of that, could you please provide logs from this OSD with debug_osd=20 debug_ms=5?

At that time I had set debug_osd=20, but had not set debug_ms=5:

2022-11-06T03:13:54.194+0000 7f6adbd1b700  1 osd.2 2622 tick checking mon for new map
2022-11-06T03:14:24.205+0000 7f6adbd1b700  1 osd.2 2622 tick checking mon for new map
2022-11-06T03:14:54.242+0000 7f6adbd1b700  1 osd.2 2622 tick checking mon for new map
2022-11-06T03:15:24.260+0000 7f6adbd1b700  1 osd.2 2622 tick checking mon for new map
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:40.303+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:40.303+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:40.504+0000 7f6abac69700  5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:40.504+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none 
2022-11-06T03:15:40.636+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:41.257+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:41.257+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:41.591+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:42.297+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:42.297+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:42.634+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:42.837+0000 7f6acc48c700 20 osd.2 2622 reports for 0 queries
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:43.324+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:43.324+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:43.607+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:44.339+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:44.339+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:44.583+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:45.204+0000 7f6abac69700  5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:45.204+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none 
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:45.376+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:45.376+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:45.554+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:46.381+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:46.381+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:46.572+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:47.336+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:47.336+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:47.525+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:47.837+0000 7f6acc48c700 20 osd.2 2622 reports for 0 queries
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:48.348+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:48.348+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:48.563+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:49.396+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:49.396+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:49.551+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 tick
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- start
2022-11-06T03:15:50.427+0000 7f6adbd1b700 10 osd.2 2622 do_waiters -- finish
2022-11-06T03:15:50.427+0000 7f6adbd1b700 20 osd.2 2622 tick last_purged_snaps_scrub 2022-11-05T08:02:23.718424+0000 next 2022-11-06T08:02:23.718424+0000
2022-11-06T03:15:50.556+0000 7f6ada4a8700 10 osd.2 2622 tick_without_osd_lock
2022-11-06T03:15:51.104+0000 7f6abac69700  5 osd.2 2622 heartbeat osd_stat(store_statfs(0x1be20b17000/0x13a000000/0x37e3ec00000, data 0x1bee3d79e98/0x1bee40e7000, compress 0x0/0x0/0x0, omap 0xb286e0, meta 0x1394d7920), peers [] op hist [])
2022-11-06T03:15:51.104+0000 7f6abac69700 20 osd.2 2622 check_full_status cur ratio 0.499741, physical ratio 0.499741, new state none 
2022-11-06T03:15:51.475+0000 7f6adbd1b700 10 osd.2 2622 tick 

I could not find any useful information in it. I restarted osd.2 and the problem did not reproduce.
I guess the problem may occur here:
3367 bool OSDMonitor::preprocess_boot(MonOpRequestRef op)
3368 {
...
3467   if (osdmap.exists(from) &&
3468       osdmap.get_info(from).up_from > m->version &&
3469       osdmap.get_most_recent_addrs(from).legacy_equals(
3470         m->get_orig_source_addrs())) {
3471     dout(7) << "prepare_boot msg from before last up_from, ignoring" << dendl;
3472     send_latest(op, m->sb.current_epoch+1);
3473     return true;
3474   }
...
3488 }

#3 Updated by Radoslaw Zarzynski 3 months ago

Thanks for the update! Yeah, it might be stuck there. To confirm, we would need logs with increased debug levels (maybe debug_mon = 10) from all monitors.
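
A minimal sketch of how the mon debug level could be raised (monitor id "a" is only an example; remember to lower it again afterwards, as debug_mon = 10 is fairly verbose):

# Persistently, for all monitors, via the config database:
ceph config set mon debug_mon 10

# Or at runtime, per monitor:
ceph tell mon.a config set debug_mon 10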

Keeping the ticket in Need More Info state.

#4 Updated by yite gu 2 months ago

I have already restarted the OSD daemon, but the problem has not reproduced. If it happens again, I will collect more logs.

#5 Updated by Joshua Baergen 2 months ago

We saw this happen to roughly a dozen OSDs (1-2 per host for some hosts) during a recent upgrade from Nautilus to Pacific (16.2.9). In each case, restarting the OSD allowed it to boot without issues.

We'll try to increase logging if we see this again during future upgrades or restarts. Is there any other information that would be useful?

#6 Updated by Joshua Baergen 2 months ago

Something that's probably worth mentioning - we had noup set in the cluster for each upgrade, and we wait until all OSDs are in preboot before we clear it. The issue in the example below is that OSD 74 was in state "booting", which was unexpected by our automation.

This is what "ceph daemon osd.XX status" said at the time:

{
    "cluster_fsid": "...",
    "osd_fsid": "...",
    "whoami": 74,
    "state": "booting",
    "oldest_map": 541874,
    "newest_map": 544290,
    "num_pgs": 171
}

Compared to one that had entered preboot:

{
    "cluster_fsid": "...",
    "osd_fsid": "...",
    "whoami": 75,
    "state": "preboot",
    "oldest_map": 541984,
    "newest_map": 544291,
    "num_pgs": 172
}

The difference in newest_map is probably just due to time progressing between me collecting the two statuses. osd.74's status indicates that it somehow progressed past the NOUP guard in _preboot(), whereas osd.75's status indicates that it was properly blocked waiting for NOUP to clear. (Right?) That seems bad, and I wonder if it provides a clue as to the sequence of events that could have led to this state.

#7 Updated by Joshua Baergen 2 months ago

The more I dig, the more I'm thinking that this might be some race to do with noup, and probably has nothing to do with the originally-reported issue. If that gets confirmed I can file a separate ticket.

#8 Updated by Radoslaw Zarzynski 2 months ago

  • Assignee set to Prashant D

The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would you find some time to take a look?

#9 Updated by Prashant D 26 days ago

Radoslaw Zarzynski wrote:

The issue during the upgrade looks awfully similar to a downstream issue Prashant has been working on.
Prashant, would you find some time to take a look?

Thanks Radek.

@Joshua, is this a containerized environment or a Rook one?

I see two possibilities here:

1. The PID nonces here are relatively small, which indicates that this is a containerized environment,
e.g. [v2:10.173.211.24:6810/39,v1:10.173.211.24:6813/39] <--- a nonce of 39 out of a 64-bit space is too small.

This causes the OSDMonitor not to recognize that the OSD has been restarted; instead it considers the OSD to have already booted. This happens because ceph-osd is neither running as PID 1 nor has the env variable CEPH_USE_RANDOM_NONCE (introduced after the Octopus release) been set to indicate that the OSD daemon is running in a containerized environment. Refer to https://github.com/ceph/ceph/pull/39739 for more details.
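
As a purely illustrative sketch of the workaround described in that PR (the entrypoint script and the OSD_ID variable are hypothetical; only the presence of the variable should matter, not its value):

# In the container entrypoint, export the variable before exec'ing ceph-osd
# so the messenger uses a random nonce instead of the small, reused
# in-container PID.
export CEPH_USE_RANDOM_NONCE=1
exec ceph-osd -f --id "${OSD_ID}"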

The code snippet from OSDMonitor::preprocess_boot() below points to this issue; you should see "preprocess_boot dup from" in the mon log (debug_mon >= 7 is needed to emit this message) for the OSD that is failing to boot and is stuck in the booting state.

// already booted?
if (osdmap.is_up(from) &&
    osdmap.get_addrs(from).legacy_equals(m->get_orig_source_addrs()) &&
    osdmap.get_cluster_addrs(from).legacy_equals(m->cluster_addrs)) {
  // yup.
  dout(7) << "preprocess_boot dup from " << m->get_orig_source()
          << " " << m->get_orig_source_addrs()
          << " =~ " << osdmap.get_addrs(from) << dendl;
  _booted(op, false);
  return true;
}
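
To check for this on a live cluster, a minimal sketch (the log path assumes a default, non-containerized layout; adjust for your deployment):

# debug_mon >= 7 is needed for the message to be logged:
ceph config set mon debug_mon 7

# Then look for the duplicate-boot message for the stuck OSD:
grep "preprocess_boot dup from" /var/log/ceph/ceph-mon.*.log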

2. The noup flag has a caveat if it is unset at a particular stage of the OSD boot sequence. The OSD will get stuck in the booting state if the noup flag is removed while the OSD is in the preboot state (before _send_boot). This is a rare corner case; the chance of hitting it is roughly 2 out of 10 attempts. The OSD does not get stuck in the booting state if the noup flag is removed some time after the OSD has restarted (a rough reproduction sketch follows below).
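
Assuming osd.2 and systemd unit management (both illustrative assumptions), something like the following exercises that window; because the window is small, several attempts may be needed:

ceph osd set noup
systemctl restart ceph-osd@2

# Poll the admin socket until the OSD reports "state": "preboot" ...
watch -n1 'ceph daemon osd.2 status'

# ... then clear the flag while the OSD is still in preboot (before _send_boot):
ceph osd unset noup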

We need OSD debug logs as requested by Radoslaw in comment #1, along with mon logs with debug_mon=20, to confirm the cause of this issue. Based on the initial information on this tracker, I suspect possibility #1 is the real cause of the OSD not booting (did we unset the noup flag at the time osd.2 was booting?).

#10 Updated by Joshua Baergen 25 days ago

@Prashant - thanks! Yes, this is containerized, so that's certainly possible in our case.

#11 Updated by Joshua Baergen 22 days ago

@Prashant, I was thinking about this further. Although it is a containerized env, hostpid=true, so the PIDs should be fine. I think you were getting that OSD data from the original report.

In any case, as noted in previous updates, I don't think that what I was encountering has any relation to what the OP encountered, so feel free to ignore my updates in the context of this ticket.

#12 Updated by Radoslaw Zarzynski 22 days ago

Per comment #11, I'm redirecting Prashant's questions from comment #9 to the reporter.

@yite gu: is the deployment containerized?

#13 Updated by yite gu 22 days ago

Radoslaw Zarzynski wrote:

Per comment #11, I'm redirecting Prashant's questions from comment #9 to the reporter.

@yite gu: is the deployment containerized?

Yes
