Bug #23763
closed
upgrade: bad pg num and stale health status in mixed luminous/mimic cluster
Description
This happened in a luminous-x/point-to-point run. Logs in teuthology:/home/yuriw/logs/2387999/
Versions at this point:
# date
Tue Apr 17 00:12:19 UTC 2018
# ceph osd versions
{
    "ceph version 12.2.4-425-g2f700ad (2f700ada826ff722787c8381b255a709c8b7abf4) luminous (stable)": 6
}
# ceph mon versions
{
    "ceph version 12.2.4-425-g2f700ad (2f700ada826ff722787c8381b255a709c8b7abf4) luminous (stable)": 2,
    "ceph version 13.0.2-1040-g110015a (110015a538f553d860e25445ae014d40b7ac9922) mimic (dev)": 1
}
# ceph mgr versions
{
    "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)": 1
}
PG dump shows all active+clean, with 151 pgs:
# ceph pg dump dumped all version 4750 stamp 2018-04-17 00:00:33.963740 last_osdmap_epoch 0 last_pg_scan 0 full_ratio 0 nearfull_ratio 0 PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP 3.2 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:18:35.423230 0'0 1286:1224 [1,5] 1 [1,5] 1 0'0 2018-04-16 22:05:21.250213 0'0 2018-04-16 22:05:21.250213 2.3 7 0 0 0 0 592 18 18 active+clean 2018-04-16 22:18:35.357025 692'18 1286:1251 [5,2] 5 [5,2] 5 0'0 2018-04-16 22:05:20.236637 0'0 2018-04-16 22:05:20.236637 1.0 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:17:09.309674 0'0 1287:1219 [3,0] 3 [3,0] 3 0'0 2018-04-16 22:05:16.163309 0'0 2018-04-16 22:05:16.163309 3.3 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:17:51.710789 0'0 1286:1200 [4,0] 4 [4,0] 4 0'0 2018-04-16 22:05:21.250213 0'0 2018-04-16 22:05:21.250213 2.2 2 0 0 0 0 0 9 9 active+clean 2018-04-16 22:18:35.349997 692'9 1286:1227 [5,1] 5 [5,1] 5 0'0 2018-04-16 22:05:20.236637 0'0 2018-04-16 22:05:20.236637 1.1 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:18:35.353517 0'0 1286:1216 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:05:16.163309 0'0 2018-04-16 22:05:16.163309 3.0 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:16:25.880098 0'0 1286:1224 [1,2] 1 [1,2] 1 0'0 2018-04-16 22:05:21.250213 0'0 2018-04-16 22:05:21.250213 1.2 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:17:51.713145 0'0 1286:1204 [4,0] 4 [4,0] 4 0'0 2018-04-16 22:05:16.163309 0'0 2018-04-16 22:05:16.163309 2.1 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:16:25.875594 0'0 1286:1232 [2,1] 2 [2,1] 2 0'0 2018-04-16 22:05:20.236637 0'0 2018-04-16 22:05:20.236637 3.1 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:17:51.786315 0'0 1287:1267 [0,4] 0 [0,4] 0 0'0 2018-04-16 22:05:21.250213 0'0 2018-04-16 22:05:21.250213 1.3 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:18:35.427456 0'0 1286:1224 [1,5] 1 [1,5] 1 0'0 2018-04-16 22:05:16.163309 0'0 
2018-04-16 22:05:16.163309 2.0 4 0 0 0 0 0 8 8 active+clean 2018-04-16 22:17:09.311425 19'8 1287:1235 [3,1] 3 [3,1] 3 0'0 2018-04-16 22:05:20.236637 0'0 2018-04-16 22:05:20.236637 1.4 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:17:09.379082 0'0 1286:1224 [1,3] 1 [1,3] 1 0'0 2018-04-16 22:05:16.163309 0'0 2018-04-16 22:05:16.163309 1.5 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:17:51.719632 0'0 1286:1210 [4,2] 4 [4,2] 4 0'0 2018-04-16 22:05:16.163309 0'0 2018-04-16 22:05:16.163309 3.4 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:16:25.880240 0'0 1286:1224 [1,2] 1 [1,2] 1 0'0 2018-04-16 22:05:21.250213 0'0 2018-04-16 22:05:21.250213 2.5 4 0 0 0 0 136 6 6 active+clean 2018-04-16 22:17:09.309483 19'6 1287:1230 [3,0] 3 [3,0] 3 0'0 2018-04-16 22:05:20.236637 0'0 2018-04-16 22:05:20.236637 1.6 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:17:09.309300 0'0 1287:1219 [3,0] 3 [3,0] 3 0'0 2018-04-16 22:05:16.163309 0'0 2018-04-16 22:05:16.163309 3.5 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:18:35.351653 0'0 1286:1218 [5,2] 5 [5,2] 5 0'0 2018-04-16 22:05:21.250213 0'0 2018-04-16 22:05:21.250213 2.4 5 0 0 0 0 1518 10 10 active+clean 2018-04-16 22:15:43.345541 692'10 1286:1251 [1,0] 1 [1,0] 1 0'0 2018-04-16 22:05:20.236637 0'0 2018-04-16 22:05:20.236637 1.7 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:18:35.352788 0'0 1286:1222 [5,2] 5 [5,2] 5 0'0 2018-04-16 22:05:16.163309 0'0 2018-04-16 22:05:16.163309 266.a 2 0 0 0 0 192 21 21 active+clean 2018-04-16 22:45:55.559266 762'21 1286:593 [5,1] 5 [5,1] 5 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.0 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.410581 0'0 1286:594 [2,5] 2 [2,5] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 257.1 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:47.084436 0'0 1287:587 [0,4] 0 [0,4] 0 0'0 2018-04-16 22:35:45.980716 0'0 2018-04-16 22:35:45.980716 258.2 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:49.125072 0'0 1287:585 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:35:48.019766 0'0 
2018-04-16 22:35:48.019766 259.3 0 0 0 0 0 0 273 273 active+clean 2018-04-16 22:35:51.139211 691'273 1286:849 [1,2] 1 [1,2] 1 0'0 2018-04-16 22:35:50.052386 0'0 2018-04-16 22:35:50.052386 260.4 85 0 0 0 0 267407 703 703 active+clean 2018-04-16 22:35:53.402521 691'703 1286:1296 [4,1] 4 [4,1] 4 0'0 2018-04-16 22:35:52.074742 0'0 2018-04-16 22:35:52.074742 261.5 0 0 0 0 0 0 450 450 active+clean 2018-04-16 22:36:32.959905 691'450 1286:1317 [1,2] 1 [1,2] 1 0'0 2018-04-16 22:36:31.888429 0'0 2018-04-16 22:36:31.888429 262.6 34 0 0 0 0 82847745 398 398 active+clean 2018-04-16 22:36:34.950244 691'398 1286:1640 [5,2] 5 [5,2] 5 0'0 2018-04-16 22:36:33.910981 0'0 2018-04-16 22:36:33.910981 263.7 0 0 0 0 0 0 36 36 active+clean 2018-04-16 22:37:11.004961 691'36 1286:638 [4,2] 4 [4,2] 4 0'0 2018-04-16 22:37:09.938741 0'0 2018-04-16 22:37:09.938741 256.1 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.528559 0'0 1286:583 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 257.0 1 0 0 0 0 333 1 1 active+clean 2018-04-16 22:35:47.042361 675'1 1287:609 [3,5] 3 [3,5] 3 0'0 2018-04-16 22:35:45.980716 [198/1915] 2018-04-16 22:35:45.980716 258.3 1 0 0 0 0 0 21 21 active+clean 2018-04-16 22:35:49.088902 691'21 1287:687 [3,2] 3 [3,2] 3 0'0 2018-04-16 22:35:48.019766 0'0 2018-04-16 22:35:48.019766 259.2 0 0 0 0 0 0 194 194 active+clean 2018-04-16 22:35:51.094588 691'194 1287:774 [3,4] 3 [3,4] 3 0'0 2018-04-16 22:35:50.052386 0'0 2018-04-16 22:35:50.052386 260.5 77 0 0 0 0 166779 460 460 active+clean 2018-04-16 22:35:53.129042 691'460 1287:1059 [3,5] 3 [3,5] 3 0'0 2018-04-16 22:35:52.074742 0'0 2018-04-16 22:35:52.074742 261.4 0 0 0 0 0 0 394 394 active+clean 2018-04-16 22:36:32.973503 691'394 1286:1282 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:36:31.888429 0'0 2018-04-16 22:36:31.888429 262.7 42 0 0 0 0 118509568 386 386 active+clean 2018-04-16 22:36:34.999873 691'386 1286:1490 [1,5] 1 [1,5] 1 0'0 2018-04-16 22:36:33.910981 0'0 2018-04-16 22:36:33.910981 263.6 0 0 0 0 0 
0 28 28 active+clean 2018-04-16 22:37:10.988913 691'28 1287:629 [3,4] 3 [3,4] 3 0'0 2018-04-16 22:37:09.938741 0'0 2018-04-16 22:37:09.938741 266.8 1 0 0 0 0 64 9 9 active+clean 2018-04-16 22:45:55.526246 697'9 1287:582 [3,5] 3 [3,5] 3 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.2 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.384206 0'0 1287:589 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 257.3 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:47.028371 0'0 1287:582 [3,2] 3 [3,2] 3 0'0 2018-04-16 22:35:45.980716 0'0 2018-04-16 22:35:45.980716 258.0 1 0 0 0 0 0 21 21 active+clean 2018-04-16 22:35:49.102805 691'21 1286:680 [1,4] 1 [1,4] 1 0'0 2018-04-16 22:35:48.019766 0'0 2018-04-16 22:35:48.019766 259.1 0 0 0 0 0 0 1589 1589 active+clean 2018-04-16 22:35:51.144120 691'1589 1287:2576 [0,3] 0 [0,3] 0 0'0 2018-04-16 22:35:50.052386 0'0 2018-04-16 22:35:50.052386 260.6 56 0 0 0 0 132726 368 368 active+clean 2018-04-16 22:35:53.129795 691'368 1287:964 [3,4] 3 [3,4] 3 0'0 2018-04-16 22:35:52.074742 0'0 2018-04-16 22:35:52.074742 261.7 0 0 0 0 0 0 478 478 active+clean 2018-04-16 22:36:32.928792 691'478 1286:1343 [5,1] 5 [5,1] 5 0'0 2018-04-16 22:36:31.888429 0'0 2018-04-16 22:36:31.888429 262.4 40 0 0 0 0 99190785 241 241 active+clean 2018-04-16 22:36:34.985413 691'241 1287:1088 [3,4] 3 [3,4] 3 0'0 2018-04-16 22:36:33.910981 0'0 2018-04-16 22:36:33.910981 263.5 0 0 0 0 0 0 27 27 active+clean 2018-04-16 22:37:11.068952 691'27 1287:632 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:37:09.938741 0'0 2018-04-16 22:37:09.938741 266.9 1 0 0 0 0 128 12 12 active+clean 2018-04-16 22:45:55.534070 697'12 1286:584 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.3 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.482721 0'0 1286:575 [4,0] 4 [4,0] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 257.2 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:47.066472 0'0 1286:573 [4,2] 4 [4,2] 4 0'0 
2018-04-16 22:35:45.980716 0'0 2018-04-16 22:35:45.980716 258.1 1 0 0 0 0 0 21 21 active+clean 2018-04-16 22:35:49.087167 691'21 1287:861 [3,2] 3 [3,2] 3 0'0 2018-04-16 22:35:48.019766 0'0 2018-04-16 22:35:48.019766 259.0 0 0 0 0 0 0 203 203 active+clean 2018-04-16 22:35:51.142057 691'203 1286:772 [4,0] 4 [4,0] 4 0'0 2018-04-16 22:35:50.052386 0'0 2018-04-16 22:35:50.052386 260.7 82 0 0 0 0 633113 1497 1497 active+clean 2018-04-16 22:35:53.141919 691'1497 1286:2084 [4,0] 4 [4,0] 4 0'0 2018-04-16 22:35:52.074742 0'0 2018-04-16 22:35:52.074742 261.6 0 0 0 0 0 0 1431 1431 active+clean 2018-04-16 22:36:32.957469 691'1431 1286:2848 [2,5] 2 [2,5] 2 0'0 2018-04-16 22:36:31.888429 0'0 2018-04-16 22:36:31.888429 262.5 39 0 0 0 0 77604866 270 270 active+clean 2018-04-16 22:36:34.964867 691'270 1286:1200 [5,4] 5 [5,4] 5 0'0 2018-04-16 22:36:33.910981 0'0 2018-04-16 22:36:33.910981 263.4 0 0 0 0 0 0 31 31 active+clean 2018-04-16 22:37:11.007253 691'31 1286:626 [4,2] 4 [4,2] 4 0'0 2018-04-16 22:37:09.938741 0'0 2018-04-16 22:37:09.938741 256.4 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.256741 0'0 1286:575 [4,2] 4 [4,2] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 257.5 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:47.075484 0'0 1287:587 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:35:45.980716 0'0 2018-04-16 22:35:45.980716 258.6 1 0 0 0 0 0 21 21 active+clean 2018-04-16 22:35:49.694553 691'21 1286:670 [4,1] 4 [4,1] 4 0'0 2018-04-16 22:35:48.019766 0'0 2018-04-16 22:35:48.019766 259.7 0 0 0 0 0 0 290 290 active+clean 2018-04-16 22:35:51.120606 691'290 1286:861 [4,2] 4 [4,2] 4 0'0 2018-04-16 22:35:50.052386 0'0 2018-04-16 22:35:50.052386 260.0 74 0 0 0 0 375034 913 913 active+clean 2018-04-16 22:35:53.139273 691'913 1287:1510 [3,2] 3 [3,2] 3 0'0 2018-04-16 22:35:52.074742 0'0 2018-04-16 22:35:52.074742 261.1 0 0 0 0 0 0 718 718 active+clean 2018-04-16 22:36:32.962247 691'718 1287:1705 [0,3] 0 [0,3] 0 0'0 2018-04-16 22:36:31.888429 0'0 2018-04-16 22:36:31.888429 
262.2 36 0 0 0 0 100663297 407 407 active+clean 2018-04-16 22:36:34.989585 691'407 1286:1438 [4,5] 4 [4,5] 4 0'0 2018-04-16 22:36:33.910981 0'0 2018-04-16 22:36:33.910981 263.3 0 0 0 0 0 0 51 51 active+clean 2018-04-16 22:37:10.987196 691'51 1287:678 [3,0] 3 [3,0] 3 0'0 2018-04-16 22:37:09.938741 0'0 2018-04-16 22:37:09.938741 256.5 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.505720 0'0 1286:582 [1,5] 1 [1,5] 1 0'0 2018-04-16 22:35:43.957031 [132/1915] 2018-04-16 22:35:43.957031 257.4 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:47.101907 0'0 1286:580 [1,0] 1 [1,0] 1 0'0 2018-04-16 22:35:45.980716 0'0 2018-04-16 22:35:45.980716 258.7 1 0 0 0 0 0 21 21 active+clean 2018-04-16 22:35:49.104938 691'21 1286:695 [2,5] 2 [2,5] 2 0'0 2018-04-16 22:35:48.019766 0'0 2018-04-16 22:35:48.019766 259.6 0 0 0 0 0 0 184 184 active+clean 2018-04-16 22:35:51.124848 691'184 1286:774 [2,5] 2 [2,5] 2 0'0 2018-04-16 22:35:50.052386 0'0 2018-04-16 22:35:50.052386 260.1 75 0 0 0 0 176996 494 494 active+clean 2018-04-16 22:35:53.402934 691'494 1286:1093 [5,1] 5 [5,1] 5 0'0 2018-04-16 22:35:52.074742 0'0 2018-04-16 22:35:52.074742 261.0 0 0 0 0 0 0 314 314 active+clean 2018-04-16 22:36:33.003994 691'314 1286:1135 [1,0] 1 [1,0] 1 0'0 2018-04-16 22:36:31.888429 0'0 2018-04-16 22:36:31.888429 262.3 34 0 0 0 0 89333761 462 462 active+clean 2018-04-16 22:36:34.990223 691'462 1286:1551 [2,5] 2 [2,5] 2 0'0 2018-04-16 22:36:33.910981 0'0 2018-04-16 22:36:33.910981 263.2 0 0 0 0 0 0 31 31 active+clean 2018-04-16 22:37:11.002359 691'31 1286:632 [5,3] 5 [5,3] 5 0'0 2018-04-16 22:37:09.938741 0'0 2018-04-16 22:37:09.938741 256.6 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.168506 0'0 1286:582 [1,2] 1 [1,2] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 257.7 2 0 0 0 0 92 2 2 active+clean 2018-04-16 22:35:47.045295 675'2 1286:645 [5,2] 5 [5,2] 5 0'0 2018-04-16 22:35:45.980716 0'0 2018-04-16 22:35:45.980716 258.4 1 0 0 0 0 0 21 21 active+clean 2018-04-16 22:35:49.647120 691'21 
1286:2397 [4,1] 4 [4,1] 4 0'0 2018-04-16 22:35:48.019766 0'0 2018-04-16 22:35:48.019766 259.5 0 0 0 0 0 0 270 270 active+clean 2018-04-16 22:35:51.088929 691'270 1286:868 [5,3] 5 [5,3] 5 0'0 2018-04-16 22:35:50.052386 0'0 2018-04-16 22:35:50.052386 260.2 64 0 0 0 0 227219 575 575 active+clean 2018-04-16 22:35:53.140376 691'575 1286:1164 [4,2] 4 [4,2] 4 0'0 2018-04-16 22:35:52.074742 0'0 2018-04-16 22:35:52.074742 261.3 0 0 0 0 0 0 506 506 active+clean 2018-04-16 22:36:32.965651 691'506 1286:1485 [2,4] 2 [2,4] 2 0'0 2018-04-16 22:36:31.888429 0'0 2018-04-16 22:36:31.888429 262.0 39 0 0 0 0 90832898 193 193 active+clean 2018-04-16 22:36:34.998871 691'193 1287:1096 [0,1] 0 [0,1] 0 0'0 2018-04-16 22:36:33.910981 0'0 2018-04-16 22:36:33.910981 263.1 0 0 0 0 0 0 17 17 active+clean 2018-04-16 22:37:11.052510 691'17 1286:612 [2,4] 2 [2,4] 2 0'0 2018-04-16 22:37:09.938741 0'0 2018-04-16 22:37:09.938741 256.7 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.337793 0'0 1286:582 [1,5] 1 [1,5] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 257.6 1 0 0 0 0 688 1 1 active+clean 2018-04-16 22:35:47.084158 675'1 1287:616 [0,4] 0 [0,4] 0 0'0 2018-04-16 22:35:45.980716 0'0 2018-04-16 22:35:45.980716 258.5 2 0 0 0 0 0 42 42 active+clean 2018-04-16 22:35:49.678868 691'42 1286:811 [2,1] 2 [2,1] 2 0'0 2018-04-16 22:35:48.019766 0'0 2018-04-16 22:35:48.019766 259.4 0 0 0 0 0 0 213 213 active+clean 2018-04-16 22:35:51.146734 691'213 1287:798 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:35:50.052386 0'0 2018-04-16 22:35:50.052386 260.3 70 0 0 0 0 364556 889 889 active+clean 2018-04-16 22:35:53.143459 691'889 1287:1484 [3,0] 3 [3,0] 3 0'0 2018-04-16 22:35:52.074742 0'0 2018-04-16 22:35:52.074742 261.2 0 0 0 0 0 0 501 501 active+clean 2018-04-16 22:36:32.986235 691'501 1286:1471 [2,1] 2 [2,1] 2 0'0 2018-04-16 22:36:31.888429 0'0 2018-04-16 22:36:31.888429 262.1 38 0 0 0 0 97527808 418 418 active+clean 2018-04-16 22:36:34.965327 691'418 1286:1670 [5,4] 5 [5,4] 5 0'0 2018-04-16 
22:36:33.910981 0'0 2018-04-16 22:36:33.910981 263.0 0 0 0 0 0 0 47 47 active+clean 2018-04-16 22:37:11.067249 691'47 1287:675 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:37:09.938741 0'0 2018-04-16 22:37:09.938741 256.8 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.091931 0'0 1286:583 [5,3] 5 [5,3] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 266.2 1 0 0 0 0 192 15 15 active+clean 2018-04-16 22:45:55.552424 697'15 1286:588 [5,1] 5 [5,1] 5 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.9 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.481725 0'0 1286:583 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 266.3 4 0 0 0 0 512 34 34 active+clean 2018-04-16 22:45:01.771625 762'34 1286:588 [4,2] 4 [4,2] 4 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.a 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.122117 0'0 1286:583 [5,1] 5 [5,1] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 266.0 0 0 0 0 0 64 11 11 active+clean 2018-04-16 22:45:55.526427 762'11 1287:585 [3,5] 3 [3,5] 3 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.b 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.183198 0'0 1286:594 [2,3] 2 [2,3] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 266.1 1 0 0 0 0 128 16 16 active+clean 2018-04-16 22:45:55.528113 762'16 1286:589 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.c 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.512242 0'0 1286:582 [1,0] 1 [1,0] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 266.6 2 0 0 0 0 256 16 16 active+clean 2018-04-16 22:45:02.352268 762'16 1286:570 [4,0] 4 [4,0] 4 0'0 2018-04-16 22:45:00.222579 [66/1915] 2018-04-16 22:45:00.222579 256.d 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.349363 0'0 1286:583 [5,2] 5 [5,2] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 266.7 2 0 0 0 0 256 14 14 active+clean 2018-04-16 22:45:03.166886 
762'14 1286:575 [1,0] 1 [1,0] 1 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.e 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.340583 0'0 1286:575 [4,1] 4 [4,1] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 266.4 0 0 0 0 0 0 2 2 active+clean 2018-04-16 22:45:02.472402 696'2 1286:564 [5,1] 5 [5,1] 5 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.f 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.085116 0'0 1287:584 [3,4] 3 [3,4] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 266.5 2 0 0 0 0 256 16 16 active+clean 2018-04-16 22:45:03.312347 762'16 1287:584 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:45:00.222579 0'0 2018-04-16 22:45:00.222579 256.10 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.429590 0'0 1286:594 [2,4] 2 [2,4] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.11 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.164877 0'0 1287:589 [0,2] 0 [0,2] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.12 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.130822 0'0 1287:589 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.13 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.479455 0'0 1286:575 [4,0] 4 [4,0] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.14 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.480323 0'0 1287:584 [3,0] 3 [3,0] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.15 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.165253 0'0 1287:589 [0,2] 0 [0,2] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.16 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.182238 0'0 1286:594 [2,5] 2 [2,5] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.17 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.480768 0'0 1287:584 [3,0] 3 [3,0] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.18 0 0 0 0 0 0 0 0 active+clean 2018-04-16 
22:35:45.088768 0'0 1286:583 [5,3] 5 [5,3] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.19 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.473096 0'0 1287:584 [3,5] 3 [3,5] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.1a 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.253275 0'0 1287:584 [3,5] 3 [3,5] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.1b 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.400928 0'0 1286:583 [5,4] 5 [5,4] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.1c 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.030988 0'0 1287:584 [3,1] 3 [3,1] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.1d 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.246700 0'0 1287:589 [0,1] 0 [0,1] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.1e 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.287396 0'0 1286:575 [4,3] 4 [4,3] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.1f 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.082536 0'0 1287:584 [3,4] 3 [3,4] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.20 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.120240 0'0 1286:582 [1,3] 1 [1,3] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.21 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.116750 0'0 1287:589 [0,4] 0 [0,4] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.22 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.237535 0'0 1286:575 [4,3] 4 [4,3] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.23 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.343413 0'0 1286:575 [4,1] 4 [4,1] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.24 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.244678 0'0 1286:582 [1,4] 1 [1,4] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.25 0 0 0 0 0 0 0 0 active+clean 
2018-04-16 22:35:45.049663 0'0 1287:584 [3,1] 3 [3,1] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.26 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.250701 0'0 1286:594 [2,1] 2 [2,1] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.27 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.428994 0'0 1286:594 [2,4] 2 [2,4] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.28 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.429275 0'0 1286:594 [2,4] 2 [2,4] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.29 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.561851 0'0 1287:589 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.2a 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.123910 0'0 1287:584 [3,4] 3 [3,4] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.2b 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.169377 0'0 1286:582 [1,2] 1 [1,2] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.2c 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.237189 0'0 1286:575 [4,5] 4 [4,5] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.2d 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.072795 0'0 1287:589 [0,5] 0 [0,5] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.2e 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.111177 0'0 1287:589 [0,4] 0 [0,4] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.2f 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.131857 0'0 1287:584 [3,2] 3 [3,2] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.30 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.346134 0'0 1286:583 [5,2] 5 [5,2] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.31 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.123622 0'0 1286:583 [5,1] 5 [5,1] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.32 0 0 0 0 0 0 0 0 
active+clean 2018-04-16 22:35:45.479267 0'0 1286:575 [4,0] 4 [4,0] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.33 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.088495 0'0 1286:583 [5,3] 5 [5,3] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.34 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.244504 0'0 1286:594 [2,4] 2 [2,4] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.35 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.481555 0'0 1286:583 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.36 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.481399 0'0 1286:583 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.37 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.191458 0'0 1286:575 [4,2] 4 [4,2] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.38 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.163638 0'0 1286:582 [1,2] 1 [1,2] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.39 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.240803 0'0 1286:575 [4,3] 4 [4,3] 4 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.3a 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.477799 0'0 1286:583 [5,0] 5 [5,0] 5 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.3b 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.041612 0'0 1287:584 [3,5] 3 [3,5] 3 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.3c 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.511619 0'0 1286:582 [1,0] 1 [1,0] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.3d 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.130872 0'0 1286:582 [1,5] 1 [1,5] 1 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.3e 0 0 0 0 0 0 0 0 active+clean 2018-04-16 22:35:45.065925 0'0 1286:594 [2,0] 2 [2,0] 2 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 256.3f 0 0 0 0 0 0 0 
0 active+clean 2018-04-16 22:35:45.064064 0'0 1287:589 [0,1] 0 [0,1] 0 0'0 2018-04-16 22:35:43.957031 0'0 2018-04-16 22:35:43.957031 3 0 0 0 0 0 0 0 0 256 0 0 0 0 0 0 0 0 2 22 0 0 0 0 2246 51 51 1 0 0 0 0 0 0 0 0 257 4 0 0 0 0 1113 4 4 258 8 0 0 0 0 0 168 168 259 0 0 0 0 0 0 3216 3216 260 583 0 0 0 0 2343830 5899 5899 261 0 0 0 0 0 0 4792 4792 262 302 0 0 0 0 756510728 2775 2775 263 0 0 0 0 0 0 268 268 266 16 0 0 0 0 2048 166 166 sum 935 0 0 0 0 758859965 17339 17339 OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM 5 764M 29940M 30705M [0,1,2,3,4] 63 32 4 582M 30122M 30705M [0,1,2,3,5] 50 27 3 318M 30386M 30705M [0,1,2,4,5] 42 29 2 387M 30317M 30705M [0,1,3,4,5] 48 18 1 423M 30281M 30705M [0,2,3,4,5] 46 23 0 310M 30394M 30705M [1,2,3,4,5] 53 22 sum 2786M 177G 179G
while ceph -s shows stale info with PGs from deleted pools:
# ceph -s
  cluster:
    id:     a2375afb-2696-4d44-9d27-4f6648918d09
    health: HEALTH_WARN
            1 pools have pg_num > pgp_num
  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    mds: cephfs-1/1/1 up {0=a=up:active}
    osd: 6 osds: 6 up, 6 in
  data:
    pools:   12 pools, 151 pgs
    objects: 935 objects, 723 MB
    usage:   2786 MB used, 177 GB / 179 GB avail
    pgs:     151 active+clean
             5   unknown
             2   active+clean+snaptrim_wait
             2   creating+peering
             1   active+clean+snaptrim
             1   creating+activating
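The inconsistency in this status output can be checked mechanically: the header reports 151 PGs, but the per-state counts add up to more than that. The following is a minimal sketch, not part of the original report. The function name `pg_state_total` and the regex-based parsing of the text layout are assumptions for illustration; the sample text is copied from the output above.

```python
import re

def pg_state_total(status_text):
    """Return (reported_total, sum_of_state_counts) from `ceph -s` text."""
    # The "pools:" line carries the cluster-wide total, e.g. "151 pgs".
    total = int(re.search(r"(\d+) pgs", status_text).group(1))
    # Everything after the "pgs:" label is a run of "<count> <state>" pairs.
    states_part = status_text.split("pgs:", 1)[1]
    counts = re.findall(r"(\d+)\s+([a-z_+]+)", states_part)
    return total, sum(int(n) for n, _ in counts)

# Data section copied from the status output in this report.
sample = (
    "pools: 12 pools, 151 pgs objects: 935 objects, 723 MB "
    "usage: 2786 MB used, 177 GB / 179 GB avail "
    "pgs: 151 active+clean 5 unknown 2 active+clean+snaptrim_wait "
    "2 creating+peering 1 active+clean+snaptrim 1 creating+activating"
)

total, summed = pg_state_total(sample)
if summed != total:
    print(f"reported {total} pgs, but per-state counts sum to {summed}")
```

A sum larger than the reported total suggests that some of the per-state entries come from PG stats that no longer belong to any existing pool.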
Somehow a test rados pool has ended up with pg_num > pgp_num as well:
# ceph osd dump
epoch 1288
fsid a2375afb-2696-4d44-9d27-4f6648918d09
created 2018-04-16 22:04:53.847827
modified 2018-04-16 22:56:56.037035
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 200
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 1 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 13 flags hashpspool stripe_width 0 application rbd
pool 2 'cephfs_metadata' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 6 pgp_num 6 last_change 17 flags hashpspool stripe_width 0 application cephfs
pool 3 'cephfs_data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 6 pgp_num 6 last_change 17 flags hashpspool stripe_width 0 application cephfs
pool 256 '.rgw.buckets' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 673 flags hashpspool stripe_width 0 application rgw
pool 257 '.rgw.root' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 675 flags hashpspool stripe_width 0 application rgw
pool 258 'default.rgw.control' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 677 flags hashpspool stripe_width 0 application rgw
pool 259 'default.rgw.meta' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 679 flags hashpspool stripe_width 0 application rgw
pool 260 'default.rgw.log' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 681 flags hashpspool stripe_width 0 application rgw
pool 261 'default.rgw.buckets.index' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 684 flags hashpspool stripe_width 0 application rgw
pool 262 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 686 flags hashpspool stripe_width 0 application rgw
pool 263 'default.rgw.buckets.non-ec' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 689 flags hashpspool stripe_width 0 application rgw
pool 266 'test-rados-api-ovh086-65141-1' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 11 pgp_num 8 last_change 748 lfor 0/748 flags hashpspool stripe_width 0 application rados
max_osd 6
osd.0 up in weight 1 up_from 22 up_thru 1281 down_at 21 last_clean_interval [11,20) 158.69.83.194:6801/24342 158.69.83.194:6802/24342 158.69.83.194:6803/24342 158.69.83.194:6804/24342 exists,up 79e38aa3-8e27-4f8a-badb-fa704f782e68
osd.1 up in weight 1 up_from 25 up_thru 1246 down_at 24 last_clean_interval [11,23) 158.69.83.194:6805/25792 158.69.83.194:6806/25792 158.69.83.194:6807/25792 158.69.83.194:6808/25792 exists,up 3da16be5-d48b-451d-8a22-d3500f05c796
osd.2 up in weight 1 up_from 28 up_thru 1268 down_at 27 last_clean_interval [11,26) 158.69.83.194:6809/27216 158.69.83.194:6810/27216 158.69.83.194:6811/27216 158.69.83.194:6812/27216 exists,up 2148c576-9c9b-4eda-b743-dc2f8c9a4cd8
osd.3 up in weight 1 up_from 31 up_thru 1281 down_at 30 last_clean_interval [11,29) 158.69.83.218:6800/32320 158.69.83.218:6801/32320 158.69.83.218:6802/32320 158.69.83.218:6803/32320 exists,up f40d85e7-f4cf-4342-bddd-ba79851959f0
osd.4 up in weight 1 up_from 34 up_thru 1281 down_at 33 last_clean_interval [11,32) 158.69.83.218:6804/66952 158.69.83.218:6805/66952 158.69.83.218:6806/66952 158.69.83.218:6807/66952 exists,up 6023a2d3-606a-44e8-8e55-9ee24cf55a02
osd.5 up in weight 1 up_from 37 up_thru 1281 down_at 36 last_clean_interval [11,35) 158.69.83.218:6808/67064 158.69.83.218:6809/67064 158.69.83.218:6810/67064 158.69.83.218:6811/67064 exists,up ea3b9e85-0b01-4984-9cb6-197f3b6b1bcb
blacklist 158.69.83.194:6813/320171766 expires 2018-04-17 22:44:42.409268
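To spot pools like this one without reading the whole dump, the pool lines can be scanned for a pg_num that differs from pgp_num. This is a minimal sketch that assumes the plain-text `ceph osd dump` layout shown in this report. The names `POOL_RE`, `parse_pools` and `mismatched_pools` are hypothetical helpers, not Ceph APIs, and the two sample lines are copied from the dump above.

```python
import re

# Matches pool lines as they appear in the plain-text dump in this report.
# DOTALL lets the lazy gap span a line break inside a wrapped pool entry.
POOL_RE = re.compile(
    r"pool (\d+) '([^']+)' .*?pg_num (\d+) pgp_num (\d+)", re.DOTALL
)

def parse_pools(osd_dump_text):
    """Return (pool_id, name, pg_num, pgp_num) tuples found in the text."""
    return [
        (int(pid), name, int(pg_num), int(pgp_num))
        for pid, name, pg_num, pgp_num in POOL_RE.findall(osd_dump_text)
    ]

def mismatched_pools(pools):
    """Return the pools whose pg_num differs from pgp_num."""
    return [p for p in pools if p[2] != p[3]]

# Two pool lines copied from the dump above.
sample = (
    "pool 1 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash "
    "rjenkins pg_num 8 pgp_num 8 last_change 13 flags hashpspool "
    "stripe_width 0 application rbd\n"
    "pool 266 'test-rados-api-ovh086-65141-1' replicated size 2 min_size 1 "
    "crush_rule 0 object_hash rjenkins pg_num 11 pgp_num 8 last_change 748 "
    "lfor 0/748 flags hashpspool stripe_width 0 application rados\n"
)

pools = parse_pools(sample)
for pid, name, pg_num, pgp_num in mismatched_pools(pools):
    print(f"pool {pid} '{name}': pg_num {pg_num} != pgp_num {pgp_num}")
```

Summing `pg_num` over all twelve pools in the dump gives 151, which matches the PG total reported by `ceph pg dump` and `ceph -s`.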
The last 'pg dump' by teuthology shows PGs stuck in creating+peering and unknown, but there should be no thrashing going on here:
... {"pgid":"353.7","version":"0'0","reported_seq":"3","reported_epoch":"771","state":"creating+peering","last_fresh":"2018-04-16 22:46:17.895520","last_change":"2018-04-16 22:46:17.879857","last_active":"2018-04-16 22:46:17.826689","last_peered":"2018-04-16 22:46:17.826689","last_clean":"2018-04-16 22:46:17.826689","last_became_active":"0.000000","last_became_peered":"0.000000","last_unstale":"2018-04-16 22:46:17.895520","last_undegraded":"2018-04-16 22:46:17.895520","last_fullsized":"2018-04-16 22:46:17.895520","mapping_epoch":771,"log_start":"0'0","ondisk_log_start":"0'0","created":771,"last_epoch_clean":0,"parent":"0.0","parent_split_bits":0,"last_scrub":"0'0","last_scrub_stamp":"2018-04-16 22:46:17.826689","last_deep_scrub":"0'0","last_deep_scrub_stamp":"2018-04-16 22:46:17.826689","last_clean_scrub_stamp":"2018-04-16 22:46:17.826689","log_size":0,"ondisk_log_size":0,"stats_invalid":false,"dirty_stats_invalid":false,"omap_stats_invalid":false,"hitset_stats_invalid":false,"hitset_bytes_stats_invalid":false,"pin_stats_invalid":false,"stat_sum":{"num_bytes":0,"num_objects":0,"num_object_clones":0,"num_object_copies":0,"num_objects_missing_on_primary":0,"num_objects_missing":0,"num_objects_degraded":0,"num_objects_misplaced":0,"num_objects_unfound":0,"num_objects_dirty":0,"num_whiteouts":0,"num_read":0,"num_read_kb":0,"num_write":0,"num_write_kb":0,"num_scrub_errors":0,"num_shallow_scrub_errors":0,"num_deep_scrub_errors":0,"num_objects_recovered":0,"num_bytes_recovered":0,"num_keys_recovered":0,"num_objects_omap":0,"num_objects_hit_set_archive":0,"num_bytes_hit_set_archive":0,"num_flush":0,"num_flush_kb":0,"num_evict":0,"num_evict_kb":0,"num_promote":0,"num_flush_mode_high":0,"num_flush_mode_low":0,"num_evict_mode_some":0,"num_evict_mode_full":0,"num_objects_pinned":0,"num_legacy_snapsets":0},"up":[5,1],"acting":[5,1],"blocked_by":[],"up_primary":5,"acting_primary":5}
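One way to judge whether a record like this reflects current state is to compare its `last_fresh` stamp with the time of the check. The sketch below is illustrative only. The helper name `pg_stat_age_seconds` is hypothetical, the record is trimmed to three fields taken from the entry above, and the reference time is the `date` output from the start of this report, on the assumption that both stamps are in the same timezone.

```python
import json
from datetime import datetime

def pg_stat_age_seconds(pg_stat, now):
    """Seconds between a PG stat's last_fresh stamp and `now`."""
    last_fresh = datetime.strptime(
        pg_stat["last_fresh"], "%Y-%m-%d %H:%M:%S.%f"
    )
    return (now - last_fresh).total_seconds()

# Three fields taken from the pg dump entry above; the rest are omitted.
record = json.loads(
    '{"pgid":"353.7","state":"creating+peering",'
    '"last_fresh":"2018-04-16 22:46:17.895520"}'
)

# Assumption: time of the check, from the `date` output in this report.
now = datetime(2018, 4, 17, 0, 12, 19)

age = pg_stat_age_seconds(record, now)
print(f"{record['pgid']} {record['state']}: last fresh {age:.0f}s before the check")
```

A record that has not been refreshed for well over an hour is more likely a leftover stat than a PG that is still peering.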
Updated by Yuri Weinstein about 6 years ago
- ceph-qa-suite upgrade/luminous-x added
Updated by Kefu Chai about 6 years ago
The PGs that "pg dump" reported as creating or unknown were active+clean after 2018-04-16 22:47, so the output of the last "pg dump" was stale.
But the pg_num of test-rados-api-ovh086-65141-1 is kind of weird: 11. It should be 8 by default.
@Yuri, is this issue reproducible?
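One way to spot this kind of stale output is to compare each PG's last_fresh stamp against the time the cluster was inspected. The following is only a rough sketch: it assumes last_fresh is a reasonable proxy for how old a PG's stats are, and the helper name stale_pgs and the 60-second threshold are made up for illustration. The 353.7 values are taken from the dump above; the second sample entry is hypothetical.

```python
from datetime import datetime, timedelta

STAMP_FMT = "%Y-%m-%d %H:%M:%S.%f"

def stale_pgs(pg_stats, now, max_age=timedelta(seconds=60)):
    """Return (pgid, state, age) for PGs that are not active+clean and
    whose last_fresh stamp is older than max_age (hypothetical threshold)."""
    found = []
    for pg in pg_stats:
        if pg["state"] == "active+clean":
            continue
        age = now - datetime.strptime(pg["last_fresh"], STAMP_FMT)
        if age > max_age:
            found.append((pg["pgid"], pg["state"], age))
    return found

sample = [
    # Fields copied from the pg dump entry in the description.
    {"pgid": "353.7", "state": "creating+peering",
     "last_fresh": "2018-04-16 22:46:17.895520"},
    # Hypothetical healthy entry, for contrast only.
    {"pgid": "0.0", "state": "active+clean",
     "last_fresh": "2018-04-17 00:12:00.000000"},
]

# Time of the "ceph ... versions" output in the description.
now = datetime(2018, 4, 17, 0, 12, 19)
for pgid, state, age in stale_pgs(sample, now):
    print(pgid, state, "last fresh", age, "ago")
```

A PG that still reports a creating state long after its last_fresh stamp suggests old stats rather than a PG that is really stuck.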
Updated by Josh Durgin almost 6 years ago
Yuri reproduced the bad pg_num in 1 of 2 runs:
$ date
Fri Apr 20 00:29:29 UTC 2018
[ubuntu@smithi037 ~]$ sudo ceph osd dump
epoch 1203
fsid e7d2fd7c-dcf5-44bf-8f03-4605e82192b2
created 2018-04-19 23:43:04.465761
modified 2018-04-20 00:28:18.092987
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 198
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 1 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 13 flags hashpspool stripe_width 0 application rbd
pool 2 'cephfs_metadata' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 6 pgp_num 6 last_change 17 flags hashpspool stripe_width 0 application cephfs
pool 3 'cephfs_data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 6 pgp_num 6 last_change 17 flags hashpspool stripe_width 0 application cephfs
pool 256 '.rgw.buckets' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 658 flags hashpspool stripe_width 0 application rgw
pool 257 '.rgw.root' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 660 flags hashpspool stripe_width 0 application rgw
pool 258 'default.rgw.control' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 662 flags hashpspool stripe_width 0 application rgw
pool 259 'default.rgw.meta' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 664 flags hashpspool stripe_width 0 application rgw
pool 260 'default.rgw.log' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 666 flags hashpspool stripe_width 0 application rgw
pool 261 'default.rgw.buckets.index' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 669 flags hashpspool stripe_width 0 application rgw
pool 262 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 671 flags hashpspool stripe_width 0 application rgw
pool 263 'default.rgw.buckets.non-ec' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 674 flags hashpspool stripe_width 0 application rgw
pool 267 'test-rados-api-smithi106-855-1' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 11 pgp_num 8 last_change 730 lfor 0/730 flags hashpspool stripe_width 0 application rados
max_osd 6
osd.0 up in weight 1 up_from 21 up_thru 1196 down_at 20 last_clean_interval [11,19) 172.21.15.37:6805/2165 172.21.15.37:6806/2165 172.21.15.37:6807/2165 172.21.15.37:6808/2165 exists,up a567f23c-930e-4ea7-8eb9-cfd7c51e0717
osd.1 up in weight 1 up_from 24 up_thru 1161 down_at 23 last_clean_interval [11,22) 172.21.15.37:6801/3613 172.21.15.37:6802/3613 172.21.15.37:6803/3613 172.21.15.37:6804/3613 exists,up e23ed5b6-4d8b-4205-848c-312fc7f954d0
osd.2 up in weight 1 up_from 28 up_thru 1183 down_at 26 last_clean_interval [11,25) 172.21.15.37:6809/5166 172.21.15.37:6810/5166 172.21.15.37:6811/5166 172.21.15.37:6812/5166 exists,up 04ee2e4c-3946-4cf0-a041-91cc491756f7
osd.3 up in weight 1 up_from 32 up_thru 1196 down_at 30 last_clean_interval [11,29) 172.21.15.170:6800/1675 172.21.15.170:6801/1675 172.21.15.170:6802/1675 172.21.15.170:6803/1675 exists,up c21c5b0c-c592-4fad-b964-0a65a05882ee
osd.4 up in weight 1 up_from 35 up_thru 1196 down_at 34 last_clean_interval [11,33) 172.21.15.170:6808/1797 172.21.15.170:6809/1797 172.21.15.170:6810/1797 172.21.15.170:6811/1797 exists,up 6c0f9144-e18d-47f4-9cea-2639daa3c7d1
osd.5 up in weight 1 up_from 38 up_thru 1196 down_at 37 last_clean_interval [11,36) 172.21.15.170:6804/1919 172.21.15.170:6805/1919 172.21.15.170:6806/1919 172.21.15.170:6807/1919 exists,up 1f4f2110-f34c-49b4-9188-2941ad725c7c
blacklist 172.21.15.37:6813/850180070 expires 2018-04-21 00:18:42.823676
The cluster is still up, with logs in teuthology:~yuriw/logs/2413738 - and on machines smithi037 smithi106 smithi170
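For picking the odd pool out of a long 'ceph osd dump', a small filter over the pool lines may help. This is a minimal sketch that assumes the plain-text pool line layout shown above; the function name mismatched_pools is hypothetical, and the sample text is three pool lines copied from this dump.

```python
import re

# Matches the "pool <id> '<name>' ... pg_num <n> pgp_num <m>" layout seen
# in the plain-text "ceph osd dump" output quoted in this ticket.
POOL_RE = re.compile(r"pool (\d+) '([^']+)' .*?pg_num (\d+) pgp_num (\d+)")

def mismatched_pools(osd_dump_text):
    """Return (pool_id, name, pg_num, pgp_num) for pools where pg_num != pgp_num."""
    found = []
    for m in POOL_RE.finditer(osd_dump_text):
        pool_id, name = int(m.group(1)), m.group(2)
        pg_num, pgp_num = int(m.group(3)), int(m.group(4))
        if pg_num != pgp_num:
            found.append((pool_id, name, pg_num, pgp_num))
    return found

sample_dump = """\
pool 1 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 13 flags hashpspool stripe_width 0 application rbd
pool 2 'cephfs_metadata' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 6 pgp_num 6 last_change 17 flags hashpspool stripe_width 0 application cephfs
pool 267 'test-rados-api-smithi106-855-1' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 11 pgp_num 8 last_change 730 lfor 0/730 flags hashpspool stripe_width 0 application rados
"""

for pool in mismatched_pools(sample_dump):
    print(pool)
```

On the three sample lines only pool 267 is reported, since it is the only one whose pg_num (11) differs from its pgp_num (8).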
Updated by Kefu Chai almost 6 years ago
I think the pg_num = 11 is set by LibRadosList.EnumerateObjects:
// Ensure a non-power-of-two PG count to avoid only
// touching the easy path.
std::string err_str = set_pg_num(&s_cluster, pool_name, 11);
ASSERT_TRUE(err_str.empty());
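The thread does not say what the "easy path" is. One plausible reading is that it refers to how Ceph folds an object hash into a PG number: with a power-of-two pg_num a plain bit mask is enough, while other values need a fallback to a narrower mask. The sketch below is a Python transcription of that folding as I understand ceph_stable_mod and the pg_num mask calculation; treat it as an illustration under that assumption, not as the code the test exercises.

```python
from collections import Counter

def pg_mask(pg_num):
    """Smallest 2^n - 1 that covers pg_num - 1 (assumed mask calculation)."""
    return (1 << (pg_num - 1).bit_length()) - 1

def ceph_stable_mod(x, b, bmask):
    """Fold hash x into [0, b): use the full mask when the result is a
    valid PG, otherwise fall back to the next narrower mask."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)

def spread(pg_num, hashes):
    """Count how many of the given hash values land in each PG."""
    mask = pg_mask(pg_num)
    return Counter(ceph_stable_mod(h, pg_num, mask) for h in hashes)

# With pg_num = 8 the fallback branch is never taken; with pg_num = 11
# hashes whose low four bits are 11..15 are folded back onto PGs 3..7.
for pg_num in (8, 11):
    counts = spread(pg_num, range(16))
    print(pg_num, [counts[pg] for pg in range(pg_num)])
```

Under this reading, setting pg_num to 11 forces the enumeration test through the fallback branch, which a default of 8 would never reach.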
Updated by Kefu Chai almost 6 years ago
- Category changed from Correctness/Safety to Tests
- Status changed from New to Fix Under Review
- Assignee set to Kefu Chai
- Backport set to luminous
Updated by Kefu Chai almost 6 years ago
- Copied to Backport #23808: luminous: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster added
Updated by Kefu Chai almost 6 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Kefu Chai almost 6 years ago
- Status changed from Pending Backport to Resolved