https://tracker.ceph.com/
2019-01-19T08:42:05Z
Ceph
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=127756
2019-01-19T08:42:05Z
xie xingguo
258156334@qq.com
<ul><li><strong>Assignee</strong> set to <i>xie xingguo</i></li></ul>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=127851
2019-01-21T16:52:49Z
Dan van der Ster
<ul></ul><p>Today I saw the opposite case: calc_pg_upmaps cannot find any upmaps to move PGs out of the most overfull osd.</p>
<p>I guess this is because, for each PG it tries to move, none of the up OSDs for that PG are in the underfull list <strong>and</strong> in a unique crush bucket.</p>
<pre>
2019-01-21 15:15:02.474148 7fc1ecc33700 10 total_deviation 1608.3 overfull 23,34,41,54,62,73,81,94,108,117,154,167,181,205,207,229,239,241,282,304,357,365,374,386,465,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,528,529,533,534,535,536,537,538,1137,1139,1142,1144,1151,1156,1159,1164,1168,1174,1179,1182,1186,1189,1204,1205,1210,1214,1215,1221,1222,1223,1224,1225,1226,1228,1231,1232,1235,1237,1239,1240,1241,1242,1244,1245,1246,1247,1248,1249,1250,1251,1253 underfull [1121,80,192,183,468,491,152,155,184,406,1043,1045,1057,1062,1067,1081,1083,1089,1094,1099,1102,78,232,246,280,347,400,482,464,1046,1054,1078,1087,1127,1298,10,13,17,48,87,121,136,180,273,371,383,500,8,409,1028,1035,1036,1050,1056,1068,1069,1077,1079,1092,1096,1097,1103,1271,1278,1281,1296,1297,1328,1334,4,46,63,93,116,135,140,144,168,228,237,276,288,294,300,327,337,353,392,128,452,463,1022,1024,1031,1034,1037,1041,1052,1059,1060,1064,1074,1076,1082,1255,1266,1269,1283,1286,1290,1310,1323,1335,16,26,35,43,82,85,153,161,172,173,218,224,242,259,266,284,295,303,308,318,326,334,335,366,369,419,474,1038,1044,1058,1070,1075,1080,1084,1091,1093,1107,1108,1111,1118,1128,1254,1280,1282,1306,1331,1,22,27,33,55,61,75,90,95,114,119,127,164,175,190,194,212,217,255,258,271,310,311,325,363,375,404,470,487,493,506,349,1021,1029,1032,1033,1039,1042,1047,1049,1051,1063,1071,1086,1098,1105,1109,1112,1113,1115,1120,1122,1124,1125,1160,1257,1258,1264,1267,1272,1274,1277,1284,1292,1301,1302,1314,1327]
2019-01-21 15:15:02.474204 7fc1ecc33700 10 osd.1239 move 25
2019-01-21 15:15:02.474342 7fc1ecc33700 10 trying 68.7e
2019-01-21 15:15:02.475060 7fc1ecc33700 10 trying 68.9d
2019-01-21 15:15:02.475707 7fc1ecc33700 10 trying 68.e2
2019-01-21 15:15:02.476336 7fc1ecc33700 10 trying 68.4fb
2019-01-21 15:15:02.476971 7fc1ecc33700 10 trying 68.b65
2019-01-21 15:15:02.477596 7fc1ecc33700 10 trying 68.df3
2019-01-21 15:15:02.478223 7fc1ecc33700 10 trying 68.e07
2019-01-21 15:15:02.478854 7fc1ecc33700 10 trying 68.e95
2019-01-21 15:15:02.479505 7fc1ecc33700 10 trying 68.1060
2019-01-21 15:15:02.480145 7fc1ecc33700 10 trying 68.1085
2019-01-21 15:15:02.480806 7fc1ecc33700 10 trying 68.1107
2019-01-21 15:15:02.481452 7fc1ecc33700 10 trying 68.130b
2019-01-21 15:15:02.482079 7fc1ecc33700 10 trying 68.13a9
2019-01-21 15:15:02.482727 7fc1ecc33700 10 trying 68.1581
2019-01-21 15:15:02.483352 7fc1ecc33700 10 trying 68.1590
2019-01-21 15:15:02.483974 7fc1ecc33700 10 trying 68.160f
2019-01-21 15:15:02.484612 7fc1ecc33700 10 trying 68.174f
2019-01-21 15:15:02.485276 7fc1ecc33700 10 trying 68.18c2
2019-01-21 15:15:02.485903 7fc1ecc33700 10 trying 68.1a3b
2019-01-21 15:15:02.486525 7fc1ecc33700 10 trying 68.1b4a
2019-01-21 15:15:02.487152 7fc1ecc33700 10 trying 68.1f19
2019-01-21 15:15:02.487797 7fc1ecc33700 10 trying 68.207d
2019-01-21 15:15:02.488428 7fc1ecc33700 10 trying 68.2123
2019-01-21 15:15:02.489047 7fc1ecc33700 10 trying 68.216e
2019-01-21 15:15:02.489675 7fc1ecc33700 10 trying 68.221c
2019-01-21 15:15:02.490300 7fc1ecc33700 10 trying 68.22cb
2019-01-21 15:15:02.490917 7fc1ecc33700 10 trying 68.23cf
2019-01-21 15:15:02.491588 7fc1ecc33700 10 trying 68.2483
2019-01-21 15:15:02.492225 7fc1ecc33700 10 trying 68.25c7
2019-01-21 15:15:02.492856 7fc1ecc33700 10 trying 68.268b
2019-01-21 15:15:02.493479 7fc1ecc33700 10 trying 68.26c3
2019-01-21 15:15:02.494100 7fc1ecc33700 10 trying 68.275d
2019-01-21 15:15:02.494739 7fc1ecc33700 10 trying 68.27c4
2019-01-21 15:15:02.495363 7fc1ecc33700 10 trying 68.28ff
2019-01-21 15:15:02.495988 7fc1ecc33700 10 trying 68.294a
2019-01-21 15:15:02.496611 7fc1ecc33700 10 trying 68.2b19
2019-01-21 15:15:02.497237 7fc1ecc33700 10 trying 68.2bcf
2019-01-21 15:15:02.497865 7fc1ecc33700 10 trying 68.2c00
2019-01-21 15:15:02.498494 7fc1ecc33700 10 trying 68.2c12
2019-01-21 15:15:02.499113 7fc1ecc33700 10 trying 68.2c16
2019-01-21 15:15:02.499751 7fc1ecc33700 10 trying 68.2e36
2019-01-21 15:15:02.500383 7fc1ecc33700 10 trying 68.2f2a
2019-01-21 15:15:02.501042 7fc1ecc33700 10 trying 68.3031
2019-01-21 15:15:02.501687 7fc1ecc33700 10 trying 68.30de
2019-01-21 15:15:02.502341 7fc1ecc33700 10 trying 68.320b
2019-01-21 15:15:02.502976 7fc1ecc33700 10 trying 68.3377
2019-01-21 15:15:02.503610 7fc1ecc33700 10 trying 68.33fa
2019-01-21 15:15:02.504224 7fc1ecc33700 10 trying 68.341b
2019-01-21 15:15:02.504867 7fc1ecc33700 10 trying 68.346b
2019-01-21 15:15:02.505520 7fc1ecc33700 10 trying 68.3692
2019-01-21 15:15:02.506162 7fc1ecc33700 10 trying 68.3708
2019-01-21 15:15:02.506794 7fc1ecc33700 10 trying 68.3873
2019-01-21 15:15:02.507427 7fc1ecc33700 10 trying 68.38d3
2019-01-21 15:15:02.508047 7fc1ecc33700 10 trying 68.3a8f
2019-01-21 15:15:02.508690 7fc1ecc33700 10 trying 68.3acf
2019-01-21 15:15:02.509317 7fc1ecc33700 10 trying 68.3c79
2019-01-21 15:15:02.509940 7fc1ecc33700 10 trying 68.3e24
2019-01-21 15:15:02.510571 7fc1ecc33700 10 osd.1226 move 23
...
</pre>
<p>So perhaps here it would also be better to fill the underfull osds list more aggressively.</p>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=128015
2019-01-24T00:59:33Z
xie xingguo
258156334@qq.com
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/26039">https://github.com/ceph/ceph/pull/26039</a></p>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=128016
2019-01-24T01:00:31Z
xie xingguo
258156334@qq.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>4</i></li></ul>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=128042
2019-01-24T14:37:51Z
Dan van der Ster
<ul></ul><p>Thanks, this looks great!<br />Do you already have a luminous branch with this backported? I can eventually build and try it here, but the cherry-pick is a non-trivial merge.</p>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=128046
2019-01-24T15:21:53Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>4</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> set to <i>mimic, luminous</i></li></ul>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=128047
2019-01-24T15:22:07Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/38036">Backport #38036</a>: mimic: upmap balancer won't refill underfull osds if zero overfull found</i> added</li></ul>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=128049
2019-01-24T15:22:15Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/38037">Backport #38037</a>: luminous: upmap balancer won't refill underfull osds if zero overfull found</i> added</li></ul>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=128112
2019-01-25T13:57:25Z
Dan van der Ster
<ul></ul><p>Thanks Xie! This is helping, but I think there are still some improvements needed.</p>
<p>1) With this code I see lots of improvement in the general balancing, but unfortunately our most overfull OSDs are unlucky and are not getting balanced, because every PG upmap the balancer could make is already remapped (or not remappable). For example:</p>
<pre>
2019-01-25 14:28:32.225243 7fc6be64e700 10 osd.1239 move 27
2019-01-25 14:28:32.225387 7fc6be64e700 20 already remapped 68.16
2019-01-25 14:28:32.225390 7fc6be64e700 20 already remapped 68.3a
2019-01-25 14:28:32.225391 7fc6be64e700 20 already remapped 68.5f
2019-01-25 14:28:32.225393 7fc6be64e700 20 already remapped 68.64
2019-01-25 14:28:32.225394 7fc6be64e700 10 trying 68.7e
2019-01-25 14:28:32.226151 7fc6be64e700 10 trying 68.9d
2019-01-25 14:28:32.226883 7fc6be64e700 10 trying 68.e2
2019-01-25 14:28:32.227656 7fc6be64e700 20 already remapped 68.20a
2019-01-25 14:28:32.227668 7fc6be64e700 20 already remapped 68.27b
2019-01-25 14:28:32.227669 7fc6be64e700 20 already remapped 68.28c
...
2019-01-25 14:28:32.261401 7fc6be64e700 10 trying 68.3e24
2019-01-25 14:28:32.262133 7fc6be64e700 20 already remapped 68.3e31
2019-01-25 14:28:32.262140 7fc6be64e700 20 already remapped 68.3f12
2019-01-25 14:28:32.262141 7fc6be64e700 20 already remapped 68.3f56
2019-01-25 14:28:32.262142 7fc6be64e700 20 already remapped 68.3f7b
2019-01-25 14:28:32.262143 7fc6be64e700 20 already remapped 68.3ffa
2019-01-25 14:28:32.262145 7fc6be64e700 10 osd.1226 move 23
...
</pre>
<p>Here are some of those existing upmaps which prevent PGs from being moved off osd.1239:</p>
<pre>
pg_upmap_items 68.16 [1318,1271] // now [1239,1036,1150,180,325,1271]
pg_upmap_items 68.3a [1030,80] // now [1111,1267,1054,1208,1239,80]
pg_upmap_items 68.5f [1158,1133] // now [1239,1063,1293,824,1133,1108]
</pre>
<p>and here are the OSDs involved in those upmaps:</p>
<pre>
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME
1318 hdd 5.45799 1.00000 5.46TiB 4.00TiB 1.46TiB 73.23 1.01 184 osd.1318
1271 hdd 5.45799 1.00000 5.46TiB 3.85TiB 1.60TiB 70.61 0.98 176 osd.1271
1030 hdd 5.45799 1.00000 5.46TiB 4.02TiB 1.43TiB 73.72 1.02 185 osd.1030
80 hdd 5.43700 1.00000 5.46TiB 3.98TiB 1.48TiB 72.91 1.01 183 osd.80
1158 hdd 5.45799 1.00000 5.46TiB 4.02TiB 1.44TiB 73.65 1.02 185 osd.1158
1133 hdd 5.45799 1.00000 5.46TiB 4.02TiB 1.44TiB 73.68 1.02 185 osd.1133
</pre>
<p>The code printing "already remapped" is as follows:<br /><pre>
if (tmp.have_pg_upmaps(pg)) {
ldout(cct, 20) << " already remapped " << pg << dendl;
continue;
}
</pre></p>
<p>Could we add some logic to append a 2nd, 3rd, etc. pair to existing upmaps, rather than just continuing to the next PG?</p>
<p>2) A second optimization: I noticed that we only call rm-pg-upmap-items to remove an existing upmap which remaps to an overfull OSD. We should also handle the underfull case: we can rm-pg-upmap-items for any upmaps which remap a PG away from an underfull OSD.</p>
<p>Thanks!</p>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=128152
2019-01-25T20:22:27Z
Dan van der Ster
<ul></ul><p>Regarding my point (1) above, I think I've figured out the real issue. In my cluster, we have a 4+2 EC pool and 6 racks of servers. Due to some disks being down, the racks are identical in crush weight, but one rack has slightly fewer up/in OSDs than the others.</p>
<p>So basically, all OSDs in this rack are overfull, and due to the constraint that pool size = num racks in my cluster, there are very few or no underfull OSDs to remap to.</p>
<p>Here's a toy example to demonstrate this case:</p>
<pre>
3.0 rack a
 1.0 OSD.0: 2 pgs
 1.0 OSD.1: 2 pgs
 1.0 OSD.2: 2 pgs
3.0 rack b
 1.0 OSD.3: 2 pgs
 1.0 OSD.4: 2 pgs
 1.0 OSD.5: 2 pgs
3.0 rack c
 1.0 OSD.6: 2 pgs
 1.0 OSD.7: 4 pgs
 1.0 OSD.8: 0 pgs (is down/out)
</pre>
<p>Pool has size 3, pg_num 6, replicated across racks.</p>
<p>In that toy example, I think we <strong>want</strong> an upmap to balance osds 6 and 7 with 3 pgs each. But it won't happen, because there are no underfull osds in rack c.</p>
mgr - Bug #37940: upmap balancer won't refill underfull osds if zero overfull found
https://tracker.ceph.com/issues/37940?journal_id=131666
2019-03-12T14:20:59Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>