Bug #38786
autoscale down can lead to max_pg_per_osd limit
Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We adjust pgp_num all the way down to the target while pg_num is still high, which can make OSDs hit max_pgs_per_osd if the reduction goes too far.
Saw this on the lab cluster:
pool 4 'libvirt-pool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 3541 pgp_num 12 pg_num_target 4 pgp_num_target 4 autoscale_mode on last_change 1096029 lfor 0/1096029/1096025 flags hashpspool min_write_recency_for_promote 1 stripe_width 0 application libvirt
root@reesi006:~# ceph pg ls activating
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
4.1a4 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1340 [24,69,46]p24 [24,69,46]p24 2019-03-14 09:14:01.948061 2019-03-14 09:14:01.948061
4.1b4 1 3 0 0 4194304 0 0 2 activating+degraded 6m 405308'2 1097345:1336 [24,69,46]p24 [24,69,46]p24 2019-03-15 19:01:51.539002 2019-03-13 21:52:53.336251
4.1d4 1 3 0 0 4194304 0 0 2 activating+degraded 6m 405540'2 1097345:1331 [24,69,46]p24 [24,69,46]p24 2019-03-14 13:00:44.455889 2019-03-11 01:35:02.464301
4.1e4 1 3 0 0 4194304 0 0 2 activating+degraded 6m 405300'2 1097345:1329 [24,69,46]p24 [24,69,46]p24 2019-03-15 16:37:55.135432 2019-03-11 23:01:19.714108
4.1f4 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1327 [24,69,46]p24 [24,69,46]p24 2019-03-15 20:01:49.959489 2019-03-13 11:36:29.521438
4.224 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1329 [24,69,46]p24 [24,69,46]p24 2019-03-14 18:03:28.050687 2019-03-13 14:21:37.077207
4.234 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1326 [24,69,46]p24 [24,69,46]p24 2019-03-15 20:32:26.246396 2019-03-09 03:55:44.093963
4.244 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1353 [24,69,46]p24 [24,69,46]p24 2019-03-14 12:07:57.114855 2019-03-10 14:26:19.601347
4.274 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1328 [24,69,46]p24 [24,69,46]p24 2019-03-15 20:27:12.199661 2019-03-13 21:45:15.440068
4.284 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1368 [24,69,46]p24 [24,69,46]p24 2019-03-14 17:03:26.116879 2019-03-10 02:07:07.974526
4.294 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1327 [24,69,46]p24 [24,69,46]p24 2019-03-14 04:59:38.598132 2019-03-09 08:07:24.415034
4.2c4 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1353 [24,69,46]p24 [24,69,46]p24 2019-03-14 14:44:03.602640 2019-03-09 11:39:35.133023
4.2e4 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1327 [24,69,46]p24 [24,69,46]p24 2019-03-14 09:30:22.009430 2019-03-13 07:00:56.840736
4.314 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1329 [24,69,46]p24 [24,69,46]p24 2019-03-13 23:23:02.233771 2019-03-10 02:29:07.039327
4.324 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1327 [24,69,46]p24 [24,69,46]p24 2019-03-14 08:54:31.642665 2019-03-12 22:16:31.374791
4.334 1 3 0 0 4194304 0 0 2 activating+degraded 6m 405540'2 1097345:1329 [24,69,46]p24 [24,69,46]p24 2019-03-14 21:06:03.101677 2019-03-14 21:06:03.101677
4.34c 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1328 [24,69,46]p24 [24,69,46]p24 2019-03-14 11:19:44.965680 2019-03-13 00:53:39.903505
4.364 0 0 0 0 0 0 0 0 activating 6m 0'0 1097345:1328 [24,69,46]p24 [24,69,46]p24 2019-03-14 10:35:56.589251 2019-03-13 09:02:07.524679
...
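Every PG in the listing above reports the same up and acting set, [24,69,46]. A minimal sketch of why that is plausible with pgp_num 12: it assumes placement folds each PG seed onto one of pgp_num placement seeds with a stable modulo, as I understand Ceph's ceph_stable_mod to work. The helper names stable_mod and placement_seed are hypothetical, and the real mapping also mixes in the pool id before calling CRUSH, which this sketch leaves out.

```python
def stable_mod(x: int, b: int, bmask: int) -> int:
    """Stable modulo, modeled on Ceph's ceph_stable_mod (assumption)."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def placement_seed(ps: int, pgp_num: int) -> int:
    """Fold a PG seed onto one of pgp_num placement seeds (simplified)."""
    # bmask is the smallest (2^n - 1) that covers pgp_num - 1.
    bmask = (1 << (pgp_num - 1).bit_length()) - 1
    return stable_mod(ps, pgp_num, bmask)


# PG ids copied from the listing above (hex, pool prefix "4." dropped).
pg_ids = [
    "1a4", "1b4", "1d4", "1e4", "1f4", "224", "234", "244", "274",
    "284", "294", "2c4", "2e4", "314", "324", "334", "34c", "364",
]

# pgp_num 12 is the value shown in the pool line.
seeds = {placement_seed(int(p, 16), 12) for p in pg_ids}
print(seeds)  # prints {4}
```

Under these assumptions all listed PGs share placement seed 4, so they are handed to the same three OSDs.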
And on osd.69:
2019-03-16 18:27:06.104 7f0d5de25700 10 osd.69 1097345 handle_pg_create_info hit max pg, dropping
2019-03-16 18:27:06.104 7f0d5de25700 10 osd.69 1097345 handle_pg_create_info hit max pg, dropping
2019-03-16 18:27:06.112 7f0d5e626700 10 osd.69 1097345 handle_pg_create_info hit max pg, dropping
2019-03-16 18:27:06.112 7f0d5de25700 10 osd.69 1097345 handle_pg_create_info hit max pg, dropping
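A rough back-of-the-envelope sketch of the pressure this puts on the OSDs, using the values from the pool line (pg_num 3541, pgp_num 12, size 3). It assumes the pool's PGs can land on at most pgp_num distinct placements of size OSDs each, and it ignores every other pool on the cluster. The function name and ASSUMED_LIMIT are hypothetical; the limit is a placeholder, not a value taken from this cluster's configuration.

```python
def min_avg_pgs_per_osd(pg_num: int, pgp_num: int, size: int) -> float:
    """Lower bound on the average PG replicas per OSD, over the OSDs in use."""
    total_replicas = pg_num * size
    # At most `size` OSDs per placement, and at most pgp_num placements.
    max_osds_in_use = pgp_num * size
    return total_replicas / max_osds_in_use


# Values from the pool line in the description.
pg_num, pgp_num, size = 3541, 12, 3

# Placeholder per-OSD limit (assumption); check the cluster's own setting.
ASSUMED_LIMIT = 250

avg = min_avg_pgs_per_osd(pg_num, pgp_num, size)
print(f"at least {avg:.1f} PGs per OSD on average from this pool alone")
print("over assumed limit" if avg > ASSUMED_LIMIT else "within assumed limit")
```

The bound is pg_num / pgp_num, so it grows as pgp_num drops ahead of pg_num, which matches the "hit max pg, dropping" messages on osd.69.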