Bug #21331
closed
pg recovery priority inversion
% Done:
0%
Backport:
luminous
Regression:
No
Severity:
3 - minor
Description
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] execute_ctx 0:6c035d59:::10033c72719.00000000:head [create,setxattr parent (36
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] do_osd_op 0:6c035d59:::10033c72719.00000000:head [create,setxattr parent (369)
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] do_osd_op create
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] do_osd_op setxattr parent (369)
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] do_osd_op setxattr layout (30)
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] using newer snapc 1=[]
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] set mtime to 2017-09-09 17:48:37.925036
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] final snapset 1=[]:{} in 0:6c035d59:::10033c72719.00000000:head
32387/832388 n=10607 ec=1/1 lis/c 832387/827296 les/c/f 832388/827299/819533 832378/832387/832379) [92,18,26]/[92,18,16] r=0 lpr=832387 pi=[827296,832387)/3 bft=26 crt=832403'3721335 mlcod 0'0 active+recovery_wait+degraded+remapped m=26] new_repop rep_tid 2041 on osd_op(mds.0.308850:3887948 0.36 0:6c035d59:::10033c
Notably, this PG does not trim in calc_trim_to. That's because mlcod = 0'0, and
<pre>
eversion_t limit = MIN(
    min_last_complete_ondisk,
    pg_log.get_can_rollback_to());
// With mlcod = 0'0, limit equals eversion_t(), so this branch is never taken.
if (limit != eversion_t() &&
    limit != pg_trim_to &&
    pg_log.get_log().approx_size() > target) {
</pre>
This PG has 54,000 log entries and is eating all the RAM on the box.
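To make the failure mode concrete, here is a minimal standalone sketch of the guard quoted above. `EVersion` and `trim_guard_passes` are hypothetical stand-ins written for this note, not Ceph's real types or functions, and the real calc_trim_to does more than this check.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for eversion_t: an (epoch, version) pair.
// EVersion{} is the zero value, which Ceph logs print as 0'0.
struct EVersion {
  std::uint64_t epoch;
  std::uint64_t version;
};

inline bool operator==(const EVersion& a, const EVersion& b) {
  return a.epoch == b.epoch && a.version == b.version;
}
inline bool operator!=(const EVersion& a, const EVersion& b) {
  return !(a == b);
}
inline bool operator<(const EVersion& a, const EVersion& b) {
  return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
}

// Mirrors only the three-part condition from the snippet above.
inline bool trim_guard_passes(const EVersion& min_last_complete_ondisk,
                              const EVersion& can_rollback_to,
                              const EVersion& pg_trim_to,
                              std::size_t approx_log_size,
                              std::size_t target) {
  const EVersion limit = std::min(min_last_complete_ondisk, can_rollback_to);
  return limit != EVersion{} &&   // false whenever mlcod is still 0'0
         limit != pg_trim_to &&
         approx_log_size > target;
}
```

With `min_last_complete_ondisk` at 0'0, the minimum is 0'0 no matter how far `can_rollback_to` has advanced (crt=832403'3721335 in the log above), so the guard fails even for a 54,000-entry log. The `target` value used to exercise it is an arbitrary illustration.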
Updated by Sage Weil over 6 years ago
It looks like peer_last_commit_ondisk for osd.26 isn't getting updated, since it is not in acting (it's a backfill target) and we aren't backfilling:
"up": [ 92, 18, 26 ],
"acting": [ 92, 18 ],
"backfill_targets": [ "26" ],
"actingbackfill": [ "18", "26", "92" ],
and
"state": "active+recovery_wait+undersized+degraded+remapped",
the peer info last_complete matches last_update,
{
"peer": "26",
"pgid": "0.36",
"last_update": "832429'3721336",
"last_complete": "832429'3721336",
Updated by Sage Weil over 6 years ago
Actually, this isn't quite right.
The real problem is that the primary has an ancient last_complete, because it is in recovery_wait, and it is prioritized behind backfills that take forever.
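The inversion can be illustrated with a toy model. This is only a sketch under stated assumptions: `Waiter`, `LowerPriority`, `grant_order`, and the priority numbers are all hypothetical, and Ceph's actual reservation logic is not reproduced here. It shows that if waiters are served strictly by priority and backfills rank higher, a PG waiting for log recovery is granted a slot last.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical work item: a PG waiting for a recovery/backfill slot.
struct Waiter {
  std::string pgid;
  int priority;  // assumption for this sketch: higher value is served first
};

// Comparator that makes std::priority_queue a max-heap on priority.
struct LowerPriority {
  bool operator()(const Waiter& a, const Waiter& b) const {
    return a.priority < b.priority;
  }
};

// Returns the pgids in the order they would be granted a slot.
inline std::vector<std::string> grant_order(const std::vector<Waiter>& waiters) {
  std::priority_queue<Waiter, std::vector<Waiter>, LowerPriority> queue(
      waiters.begin(), waiters.end());
  std::vector<std::string> order;
  while (!queue.empty()) {
    order.push_back(queue.top().pgid);
    queue.pop();
  }
  return order;
}
```

In this model, a recovery_wait PG such as 0.36 with a lower priority than every queued backfill never reaches the front while backfills keep arriving, so its last_complete stays old and its log keeps growing.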
Updated by Sage Weil over 6 years ago
Every 2.0s: ceph pg dump | sort -k 9 -r -n    Sun Sep 10 00:39:55 2017
dumped all
sum 45550500 20 9651113 9742985 0 65275054173865 15707932 15707932
0 43745950 20 9154984 9157687 0 65271957084386 15350175 15350175
1 1803795 0 496129 585298 0 728765188 354060 354060
0.70b 10593 0 245 10505 0 15725662580 95270 95270 active+recovery_wait+degraded+remapped 2017-09-09 19:59:22.472629 833302'3751117 833303:4957628 [80,0,69] 80 [80,69,61] 80 827049'3614209 2017-09-07 13:15:12.396269 827032'3547250 2017-09-04 19:29:05.528322
0.cc7 10770 0 130 10764 0 15884739011 42096 42096 active+recovery_wait+degraded+remapped 2017-09-09 19:59:22.199903 833302'3441196 833303:4857777 [3,81,86] 3 [81,86,39] 81 827049'3330105 2017-09-07 01:37:13.997576 827049'3330105 2017-09-07 01:37:13.997576
0.943 10677 0 177 10619 0 15862478339 38461 38461 active+recovery_wait+degraded+remapped 2017-09-09 19:59:27.757117 833302'3750408 833303:4761967 [74,2,12] 74 [74,12,39] 74 827049'3666300 2017-09-07 07:44:30.585682 827049'3666300 2017-09-07 07:44:30.585682
0.52e 10736 0 135 10728 0 15950525996 35868 35868 active+recovery_wait+degraded+remapped 2017-09-09 19:59:26.259678 833302'3668952 833303:4804051 [86,118,0] 86 [86,118,1] 86 827045'3468140 2017-09-06 07:21:58.018293 827032'3460139 2017-09-04 17:40:00.982576
0.aaa 10759 0 174 10715 0 16379572804 35086 35086 active+recovery_wait+degraded+remapped 2017-09-09 19:59:40.328664 833294'3643005 833303:5032293 [23,88,4] 23 [4,88,16] 4 827216'3555739 2017-09-07 17:36:34.743211 827044'3484483 2017-09-05 22:08:32.322834
0.92e 10680 0 230 10606 0 16137890605 33170 33170 active+recovery_wait+degraded+remapped 2017-09-09 19:59:26.604688 833302'3367288 833302:4469626 [55,14,0] 55 [55,14,58] 55 827048'3268719 2017-09-06 16:23:48.155945 827031'3258244 2017-09-03 23:40:56.965737
0.4ce 10738 0 243 10647 0 15928569742 31505 31505 active+recovery_wait+degraded+remapped 2017-09-09 19:59:38.363924 833302'3620237 833303:4680956 [37,12,23] 37 [37,12,21] 37 827049'3497915 2017-09-07 02:51:24.947311 827049'3497915 2017-09-07 02:51:24.947311
0.902 10616 0 232 10525 0 15873498590 29006 29006 active+recovery_wait+degraded+remapped 2017-09-09 19:59:32.642511 833304'3393873 833304:4619034 [3,65,126] 3 [65,126,21] 65 827212'3270492 2017-09-07 15:36:53.468499 827031'3123446 2017-09-03 22:32:27.540699
0.c36 10793 0 189 10748 0 16144758849 28721 28721 active+recovery_wait+degraded+remapped 2017-09-09 19:59:37.652769 833302'3333383 833303:4600623 [88,80,13] 88 [88,80,16] 88 827157'3231330 2017-09-07 15:11:29.356817 827044'3188787 2017-09-06 01:48:42.700501
0.a08 10609 0 197 10529 0 15880133106 27685 27685 active+recovery_wait+degraded+remapped 2017-09-09 19:59:41.272426 833304'3373497 833304:4741143 [73,7,80] 73 [7,80,21] 7 827046'3310444 2017-09-06 12:02:47.620633 826961'3254016 2017-09-02 18:30:11.520031
0.45d 10663 0 0 10185 0 16098153003 26356 26356 active+recovery_wait+degraded+remapped 2017-09-09 19:59:30.817526 833302'3317660 833302:4466927 [69,88,26] 69 [69,88,16] 69 827049'3176390 2017-09-06 21:39:57.724289 826974'3171907 2017-09-03 00:10:08.962798
0.9fd 10478 0 281 10314 0 15550854041 25556 25556 active+recovery_wait+degraded+remapped 2017-09-09 19:59:24.585437 833294'3776002 833303:4784853 [81,14,73] 81 [81,14,39] 81 827049'3649706 2017-09-07 10:22:20.157062 827044'3622946 2017-09-06 03:40:46.832134
0.444 10773 0 157 10746 0 16263547569 25341 25341 active+recovery_wait+degraded+remapped 2017-09-09 19:59:35.306642 833304'3915544 833304:4578979 [82,3,92] 82 [82,92,16] 82 827046'3735667 2017-09-06 09:51:40.545540 827032'3726546 2017-09-05 04:13:06.035978
0.b92 10527 0 252 10422 0 15506757288 24602 24602 active+recovery_wait+degraded+remapped 2017-09-09 19:59:35.635577 833304'3407720 833304:4753268 [64,26,72] 64 [64,72,16] 64 827049'3326503 2017-09-07 06:01:47.559311 827044'3300304 2017-09-05 23:53:55.231309
0.61f 10644 0 237 10561 0 16247319772 24293 24293 active+recovery_wait+degraded+remapped 2017-09-09 19:59:29.445424 833302'3824547 833302:4991671 [2,28,27] 2 [27,28,21] 27 827046'3728077 2017-09-06 14:47:42.234617 826949'3603169 2017-08-31 19:19:04.440645
0.d5a 10611 0 212 10530 0 15713082461 23340 23340 active+recovery_wait+degraded+remapped 2017-09-09 19:59:37.386322 833302'3544887 833303:4967467 [13,57,4] 13 [4,57,21] 4 827254'3408556 2017-09-07 19:00:12.034747 826974'3299325 2017-09-03 03:07:44.889349
0.11b 10624 0 152 10614 0 15868796961 22623 22623 active+recovery_wait+degraded+remapped 2017-09-09 19:59:34.847390 833302'3372294 833303:4458376 [18,72,73] 18 [18,72,63] 18 827049'3300020 2017-09-07 12:08:19.354375 827049'3300020 2017-09-07 12:08:19.354375
0.4f 10600 0 119 10592 0 16058693765 22323 22323 active+recovery_wait+degraded+remapped 2017-09-09 19:59:22.150608 833302'3294469 833303:4707045 [36,26,87] 36 [36,87,58] 36 827049'3242468 2017-09-06 16:30:46.361715 826949'3145239 2017-08-31 22:24:59.386017
0.df9 10730 0 181 10725 0 15905103992 22047 22047 active+recovery_wait+degraded+remapped 2017-09-09 19:59:20.828986 833302'3322425 833303:4716496 [0,68,65] 0 [68,65,39] 68 827049'3208000 2017-09-07 02:00:44.465320 827031'3154642 2017-09-04 10:06:02.669626
0.b5e 10585 0 157 10560 0 15694156896 21939 21939 active+recovery_wait+degraded+remapped 2017-09-09 19:59:41.919490 833302'3317632 833303:4516003 [73,12,92] 73 [92,12,16] 92 827214'3246835 2017-09-07 16:46:45.341690 827044'3186624 2017-09-06 00:44:47.482618
0.832 10597 0 226 10511 0 15810801929 21366 21366 active+recovery_wait+degraded+remapped 2017-09-09 19:59:26.262124 833302'3124730 833303:3844651 [37,74,13] 37 [37,74,61] 37 827049'3075918 2017-09-07 10:18:26.314923 827045'3053604 2017-09-06 05:06:08.366571
0.e5a 10708 0 177 10701 0 16112566856 20980 20980 active+recovery_wait+degraded+remapped 2017-09-09 19:59:25.815887 833304'3172251 833304:4548005 [86,49,13] 86 [86,49,63] 86 827044'3070590 2017-09-06 01:01:22.269798 826961'3053074 2017-09-01 20:44:11.002632
0.cf8 10545 0 221 10469 0 15677028730 20958 20958 active+recovery_wait+degraded+remapped 2017-09-09 19:59:36.912722 833302'3567639 833303:4985499 [87,13,14] 87 [87,14,16] 87 827044'3545021 2017-09-06 03:27:25.193698 827044'3545021 2017-09-06 03:27:25.193698
0.3b3 10465 0 164 10432 0 15716760395 20734 20734 active+recovery_wait+degraded+remapped 2017-09-09 19:59:25.985937 833302'3692942 833303:4939972 [2,76,92] 2 [76,92,16] 76 827046'3561082 2017-09-06 11:01:58.331164 827031'3557366 2017-09-04 03:56:11.424728
0.f42 10631 0 233 10544 0 15687177981 20235 20235 active+recovery_wait+degraded+remapped 2017-09-09 19:59:29.825357 833302'3323008 833303:4596713 [2,64,74] 2 [64,74,16] 64 827046'3210539 2017-09-06 11:42:08.829265 826961'3201394 2017-09-02 03:53:34.399678
0.ea3 10673 0 234 10603 0 16262438334 20076 20076 active+recovery_wait+degraded+remapped 2017-09-09 19:59:37.912744 833302'3501374 833303:4229681 [65,2,7] 65 [65,7,21] 65 827049'3388083 2017-09-07 01:24:42.989196 827049'3388083 2017-09-07 01:24:42.989196
0.112 10535 0 0 1372 0 15379251659 20024 20024 active+recovery_wait+degraded+remapped 2017-09-09 19:59:38.410230 833298'3256304 833303:4532353 [78,13,7] 78 [78,7,16] 78 827049'3179335 2017-09-07 12:39:59.424373 827031'3004530 2017-09-03 15:14:18.092065
0.a04 10634 0 236 10536 0 16013799825 19990 19990 active+recovery_wait+degraded+remapped 2017-09-09 19:59:26.467989 833302'3602740 833302:4995717 [26,12,72] 26 [72,12,39] 72 827046'3433525 2017-09-06 12:27:32.299217 826948'3306706 2017-08-31 11:37:21.026743
0.569 10880 0 255 10777 0 16630657275 19757 19757 active+recovery_wait+degraded+remapped 2017-09-09 19:59:27.125163 833302'3836654 833303:5015974 [0,127,38] 0 [38,127,16] 38 827049'3778516 2017-09-07 10:10:02.308094 827049'3778516 2017-09-07 10:10:02.308094
0.e02 10633 0 170 10632 0 15800551075 19681 19681 active+recovery_wait+degraded+remapped 2017-09-09 19:59:19.974019 833296'3887885 833303:5170175 [26,78,37] 26 [78,37,63] 78 827049'3808533 2017-09-06 19:48:34.608093 827049'3808533 2017-09-06 19:48:34.608093
0.55f 10551 0 204 10472 0 16005710365 19677 19677 active+recovery_wait+degraded+remapped 2017-09-09 19:59:34.492007 833302'3843720 833303:4920786 [10,23,72] 10 [10,72,21] 10 827049'3740823 2017-09-07 12:16:06.374705 827044'3700959 2017-09-06 02:18:07.146949
0.66e 10676 0 155 10650 0 16315958135 19650 19650 active+recovery_wait+degraded+remapped 2017-09-09 19:59:36.598138 833302'3345479 833303:4722157 [44,2,87] 44 [44,87,21] 44 827049'3217768 2017-09-06 21:41:50.211526 827032'3202921 2017-09-04 11:38:23.217519
0.516 10704 0 144 10691 0 15942599865 19527 19527 active+recovery_wait+degraded+remapped 2017-09-09 19:59:36.784746 833302'3890669 833303:5016703 [13,12,85] 13 [12,85,21] 12 827045'3731023 2017-09-06 07:37:13.591157 827032'3721082 2017-09-04 21:58:47.408326
0.a99 10475 0 192 10402 0 15482148590 18601 18601 active+recovery_wait+degraded+remapped 2017-09-09 19:59:39.394797 833298'3576867 833303:4644812 [75,73,22] 75 [75,22,21] 75 827049'3533656 2017-09-07 03:48:01.620665 826961'3489050 2017-09-02 04:54:50.212676
0.c7e 10630 0 140 10624 0 15863216162 18442 18442 active+recovery_wait+degraded+remapped 2017-09-09 19:59:27.229278 833298'3442566 833303:4615082 [67,8,73] 67 [67,8,63] 67 827049'3393659 2017-09-07 11:22:59.195776 827031'3278657 2017-09-03 10:27:19.049459
0.29d 10577 0 167 10572 0 15486023766 18407 18407 active+recovery_wait+degraded+remapped 2017-09-09 19:59:36.079278 833302'3018278 833303:4265821 [85,83,3] 85 [85,83,21] 85 827045'2913307 2017-09-06 07:22:59.680943 826949'2818578 2017-09-01 00:33:11.980807
0.bf 10814 0 245 10724 0 16134801877 18365 18365 active+recovery_wait+degraded+remapped 2017-09-09 19:59:27.780406 833302'3758150 833303:4810283 [54,13,12] 54 [54,12,58] 54 827046'3672809 2017-09-06 12:38:17.275954 827046'3672809 2017-09-06 12:38:17.275954
0.70a 10879 0 210 10802 0 16304665542 17734 17734 active+recovery_wait+degraded+remapped 2017-09-09 19:59:32.664614 833294'3585869 833303:4530756 [123,1,0] 123 [123,1,21] 123 827049'3480220 2017-09-07 09:54:24.770979 827049'3480220 2017-09-07 09:54:24.770979
0.179 10848 0 170 10825 0 16239330579 17675 17675 active+recovery_wait+degraded+remapped 2017-09-09 19:59:30.059594 833302'3200974 833302:4464954 [90,26,9] 90 [90,9,61] 90 827046'3065091 2017-09-06 13:10:14.612719 826947'2921834 2017-08-31 11:07:04.193592
0.1f5 10636 0 137 10627 0 16341173209 17501 17501 active+recovery_wait+degraded+remapped 2017-09-09 19:59:27.726357 833282'3917749 833303:5312050 [73,80,27] 73 [80,27,63] 80 827049'3805168 2017-09-06 20:48:53.216754 826948'3637330 2017-08-31 18:41:03.572123
0.922 10787 0 255 10684 0 16213696316 17301 17301 active+recovery_wait+degraded+remapped 2017-09-09 19:59:31.147799 833302'3709168 833303:4569542 [18,73,54] 18 [18,54,61] 18 827049'3665744 2017-09-06 19:07:37.225860 827049'3665744 2017-09-06 19:07:37.225860
0.790 10717 0 176 10695 0 16085501398 17222 17222 active+recovery_wait+degraded+remapped 2017-09-09 19:59:24.549833 833302'3636199 833302:4638291 [3,91,55] 3 [55,91,21] 55 827049'3551647 2017-09-07 06:30:50.551217 827049'3551647 2017-09-07 06:30:50.551217
0.f00 10654 0 219 10582 0 15852930963 17012 17012 active+recovery_wait+degraded+remapped 2017-09-09 19:59:28.213869 833300'3528802 833303:4856770 [74,2,7] 74 [74,7,39] 74 827049'3446844 2017-09-06 23:51:09.563708 827032'3441163 2017-09-04 11:47:53.146352
0.13a 10805 0 130 10795 0 16275357737 16994 16994 active+recovery_wait+degraded+remapped 2017-09-09 19:59:32.361925 833302'3553539 833303:4748318 [0,71,9] 0 [71,9,63] 71 827216'3521409 2017-09-07 17:42:42.457373 826974'3346887 2017-09-03 05:04:59.572873
0.5fc 10734 0 249 10642 0 16598177230 16904 16904 active+recovery_wait+degraded+remapped 2017-09-09 19:59:26.792549 833303'3613319 833303:4621328 [69,86,2] 69 [69,86,16] 69 827049'3554170 2017-09-07 10:54:43.536665 826961'3462137 2017-09-01 20:00:31.110685
0.38d 10652 0 179 10618 0 15883070617 16667 16667 active+recovery_wait+degraded+remapped 2017-09-09 19:59:34.956318 833302'3573286 833303:4730071 [86,66,2] 86 [86,66,21] 86 827044'3477117 2017-09-05 21:51:29.628451 827032'3467067 2017-09-04 16:55:14.044461
0.762 10751 0 217 10670 0 15833396341 15767 15767 active+recovery_wait+degraded+remapped 2017-09-09 19:59:29.363695 833302'3643403 833303:4439600 [44,26,18] 44 [44,18,58] 44 827046'3575758 2017-09-06 13:36:53.894145 827037'3573783 2017-09-05 06:23:46.659999
0.17f 10640 0 209 10552 0 15845344084 15739 15739 active+recovery_wait+degraded+remapped 2017-09-09 19:59:28.337027 833294'3520864 833303:4810800 [57,26,79] 57 [57,79,16] 57 827046'3451914 2017-09-06 11:00:05.305222 826905'3324791 2017-08-31 05:01:13.747840
0.722 10519 0 235 10425 0 15594727416 15289 15289 active+recovery_wait+degraded+remapped 2017-09-09 19:59:25.893809 833296'3366065 833303:4758038 [37,13,52] 37 [37,52,61] 37 827049'3282519 2017-09-07 02:39:56.577494 827044'3277451 2017-09-05 21:45:17.200006
0.9c0 10594 0 149 10584 0 15881663550 15249 15249 active+recovery_wait+degraded+remapped 2017-09-09 19:59:34.485062 833302'3569622 833303:4734351 [85,74,0] 85 [85,74,21] 85 827046'3463055 2017-09-06 10:16:38.577063 827046'3463055 2017-09-06 10:16:38.577063
0.a5e 10798 0 246 10678 0 16165122510 14816 14816 active+recovery_wait+degraded+remapped 2017-09-09 19:59:24.740378 833290'3535234 833303:4919012 [13,47,27] 13 [47,27,39] 47 827046'3484379 2017-09-06 12:01:16.783422 827046'3484379 2017-09-06 12:01:16.783422
0.eee 10596 0 235 10506 0 16044617744 14188 14188 active+recovery_wait+degraded+remapped 2017-09-09 19:59:28.984580 833302'3438271 833303:4393250 [24,27,73] 24 [24,27,16] 24 827046'3349187 2017-09-06 15:30:02.160142 827037'3324448 2017-09-05 12:37:58.271070
0.ea2 10806 0 131 10800 0 16255613624 13630 13630 active+recovery_wait+degraded+remapped 2017-09-09 19:59:34.979282 833298'3550497 833303:4643900 [10,18,73] 10 [10,18,63] 10 827049'3486677 2017-09-06 16:56:31.368640 826941'3346779 2017-08-31 09:07:35.080404
0.d4a 10969 0 221 10920 0 16310759119 13395 13395 active+recovery_wait+degraded+remapped 2017-09-09 19:59:37.260946 833302'3572684 833303:4697029 [85,2,37] 85 [85,37,16] 85 827046'3483009 2017-09-06 13:13:31.522250 827046'3483009 2017-09-06 13:13:31.522250
0.326 10734 0 162 10712 0 16241669113 12637 12637 active+recovery_wait+degraded+remapped 2017-09-09 19:59:29.688602 833302'3688707 833303:4927076 [86,24,0] 86 [86,24,61] 86 827046'3611428 2017-09-06 12:22:09.884754 826205'3422424 2017-08-30 15:31:11.760783
0.527 10727 0 246 10635 0 16079562562 12636 12636 active+recovery_wait+degraded+remapped 2017-09-09 19:59:39.418691 833302'3708274 833302:5335339 [55,10,13] 55 [55,10,21] 55 827046'3571290 2017-09-06 14:43:25.814125 826340'3440177 2017-08-30 17:55:12.476191
0.174 10894 0 234 10821 0 16427577906 12523 12523 active+recovery_wait+degraded+remapped 2017-09-09 19:59:20.151760 833302'3335191 833303:4422773 [26,80,49] 26 [80,49,58] 80 827049'3304622 2017-09-06 21:31:16.897195 826948'3224052 2017-08-31 15:51:10.061613
0.214 10754 0 344 10551 0 15848213511 12345 12345 active+recovery_wait+degraded+remapped 2017-09-09 19:59:27.059585 833302'3288352 833303:4535546 [18,126,73] 18 [18,126,39] 18 827049'3241852 2017-09-07 04:13:51.608020 827049'3241852 2017-09-07 04:13:51.608020
0.921 10727 0 225 10635 0 15970415790 10926 10926 active+recovery_wait+degraded+remapped 2017-09-09 19:59:30.648425 833302'4050011 833302:5329965 [34,126,23] 34 [34,126,16] 34 827049'3996172 2017-09-07 10:32:54.567461 827032'3942656 2017-09-04 13:47:18.040718
0.a65 10750 0 138 10745 0 16534988186 10908 10908 active+recovery_wait+degraded+remapped 2017-09-09 19:59:26.186671 833304'3668205 833304:4861656 [92,23,28] 92 [92,28,39] 92 827046'3536746 2017-09-06 14:00:55.992456 827032'3531584 2017-09-05 04:00:02.712845
0.240 10804 0 186 10791 0 16146428696 10203 10203 active+recovery_wait+degraded+remapped 2017-09-09 19:59:30.576625 833302'3681339 833302:4681889 [26,91,66] 26 [66,91,21] 66 827045'3488797 2017-09-06 04:32:28.044557 826923'3404384 2017-08-31 07:02:10.438613
0.354 10702 0 173 10675 0 15431796310 10182 10182 active+recovery_wait+degraded+remapped 2017-09-09 19:59:31.127433 833302'3237879 833302:4256393 [66,92,23] 66 [66,92,16] 66 827046'3174218 2017-09-06 15:48:06.913172 827031'3158225 2017-09-04 09:34:28.495865
0.512 10620 0 237 10517 0 16070207959 10154 10154 active+recovery_wait+degraded+remapped 2017-09-09 19:59:41.085735 833294'4189589 833303:5430286 [1,0,86] 1 [1,86,16] 1 827049'4027505 2017-09-06 17:15:16.166582 826441'3836069 2017-08-30 20:08:27.699655
0.b7b 10790 0 10786 10712 0 15832506785 10148 10148 active+recovery_wait+undersized+degraded+remapped 2017-09-09 19:59:36.949896 833304'3473183 833304:4634855 [1,0,19] 1 [1,21] 1 827049'3393728 2017-09-06 16:47:30.128239 827037'3391731 2017-09-05 11:49:00.446243
0.e10 10670 0 10721 0 0 15860765136 10100 10100 active+undersized+degraded+remapped+backfill_wait 2017-09-09 19:59:23.918310 833302'3399041 833303:4398319 [24,19,64] 24 [24,64] 24 827214'3314112 2017-09-07 16:45:25.530227 826949'3167695 2017-08-31 21:08:38.250720
0.9b2 10859 0 10929 0 0 15975049643 10100 10100 active+undersized+degraded+remapped+backfill_wait 2017-09-09 19:59:26.502221 833302'3656929 833303:4484313 [1,15,34] 1 [1,34] 1 827049'3591687 2017-09-06 23:45:13.672278 826974'3499516 2017-09-03 02:56:21.554506
0.550 10684 0 123 10684 0 15734743347 10100 10100 active+degraded+remapped+backfill_wait 2017-09-09 22:01:26.947805 833302'3523235 833303:4451255 [123,69,26] 123 [123,69,63] 123 827049'3437773 2017-09-07 13:08:27.619006 827049'3437773 2017-09-07 13:08:27.619006
0.124 10669 0 10749 0 0 16185101353 10100 10100 active+undersized+degraded+remapped+backfill_wait 2017-09-09 19:59:23.124036 833302'3703924 833303:4816640 [37,76,26] 37 [37,76] 37 827049'3606032 2017-09-07 08:08:32.135760 827032'3558477 2017-09-04 18:17:28.262969
0.e3a 10739 0 10793 10739 0 16253620047 10099 10099 active+undersized+degraded+remapped+backfill_wait 2017-09-09 19:59:28.194934 833302'3736309 833302:4890298 [2,15,72] 2 [72,63] 72 827049'3592574 2017-09-06 20:16:43.894841 826949'3478916 2017-08-31 20:37:52.811838
0.887 10414 0 2816 0 0 15614065866 10099 10099 active+undersized+degraded+remapped+backfill_wait 2017-09-09 19:59:26.192917 833302'3914305 833303:5296547 [92,22,6] 92 [92,22] 92 827045'3848572 2017-09-06 07:49:20.592645 827045'3848572 2017-09-06 07:49:20.592645
0.e47 10499 0 110 10499 0 15782413239 10098 10098 active+remapped+backfill_wait 2017-09-09 19:59:22.137933 833302'3449783 833303:4553697 [65,126,3] 65 [65,126,61] 65 827046'3309913 2017-09-06 12:53:04.498618 827031'3226276 2017-09-03 22:22:24.853794
0.e35 10706 0 10788 0 0 15780722548 10098 10098 active+undersized+degraded+remapped+backfill_wait 2017-09-09 19:59:20.629337 833302'3439415 833303:4670932 [17,80,127] 17 [80,127] 80 827049'3361408 2017-09-07 10:33:03.925504 827044'3335615 2017-09-06 02:02:56.409389
0.b65 10757 0 10822 0 0 15848308168 10098 10098 active+undersized+degraded+remapped+backfill_wait 2017-09-09 19:59:26.081532 833288'3528799 833303:4551462 [15,36,12] 15 [36,12] 36 827049'3443149 2017-09-07 08:43:15.840851 827044'3400136 2017-09-05 23:10:55.214032
Column 9 is the PG log length. Note that the recovery_wait PGs are the ones with big logs (up to 95,270 entries), while the backfill_wait PGs sit at about 10,100.
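For anyone who wants to post-process a dump like this, the following is a rough sketch of pulling the PG id, column 9, and the state out of one whitespace-separated row. `PgRow`, `all_digits`, and `parse_pg_row` are hypothetical helpers written for this note; the column positions are taken from the output above and may differ in other Ceph releases.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for the three fields of interest in one dump row.
struct PgRow {
  bool ok;                     // false for header, sum, and pool rows
  std::string pgid;            // column 1, e.g. "0.70b"
  unsigned long long log_len;  // column 9, the value sorted on above
  std::string state;           // column 10
};

inline bool all_digits(const std::string& s) {
  if (s.empty()) return false;
  for (unsigned char c : s) {
    if (!std::isdigit(c)) return false;
  }
  return true;
}

inline PgRow parse_pg_row(const std::string& line) {
  std::istringstream in(line);
  std::vector<std::string> fields;
  std::string token;
  while (in >> token) fields.push_back(token);

  PgRow row{false, "", 0, ""};
  // A PG row has at least 10 fields, a pgid of the form pool.seed,
  // and a numeric ninth field; anything else is skipped.
  if (fields.size() < 10) return row;
  if (fields[0].find('.') == std::string::npos) return row;
  if (!all_digits(fields[8])) return row;

  row.ok = true;
  row.pgid = fields[0];
  row.log_len = std::stoull(fields[8]);
  row.state = fields[9];
  return row;
}
```

Feeding each line through `parse_pg_row` and grouping by whether `state` contains `recovery_wait` or `backfill_wait` is one way to compare log lengths between the two groups.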
workaround to make them recover and trim is
ceph osd set nobackfill
ceph tell osd.* config set osd_max_backfills 500
although i think if you set osd_max_backfills back to 1 and unset nobackfill they will all backfill in parallel?
Updated by Sage Weil over 6 years ago
- Subject changed from pg log not getting trimmed while degraded to pg recovery priority inversion
- Priority changed from Immediate to Urgent
Updated by Sage Weil over 6 years ago
- Status changed from 12 to Resolved
- Backport set to luminous
https://github.com/ceph/ceph/pull/18025 is luminous backport
Updated by Sage Weil over 6 years ago
- Related to Bug #21761: ceph-osd consumes way too much memory during recovery added