Bug #42341
OSD PGs are not being purged
Description
Related ML thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037017.html

Apparently some PGs are not being removed from my OSDs, causing nearfull warnings and related issues. Here are some pastes.
ceph osd df (row for the affected OSD):

ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META   AVAIL   %USE  VAR  PGS STATUS
53  hdd  5.00000  0.83492 9.1 TiB 8.0 TiB 8.0 TiB 44 KiB 20 GiB 1.1 TiB 87.88 1.46  22 up
ceph pg ls-by-osd osd.53

PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
1.29 282817 0 0 0 220560679815 0 0 3077 active+clean 112m 17045'2067154 17045:5289544 [30,53,70]p30 [30,53,70]p30 2019-10-15 09:03:56.868722 2019-10-10 23:32:11.354197
1.45 282980 0 0 0 220411431493 0 0 3020 active+clean 111m 17045'2084399 17045:12137474 [94,74,53]p94 [94,74,53]p94 2019-10-16 00:16:47.701687 2019-10-14 14:41:00.249220
1.7c 284356 0 0 0 221808097131 0 0 3090 active+clean 111m 17045'2130086 17045:5791560 [53,13,43]p53 [53,13,43]p53 2019-10-16 02:26:04.973534 2019-10-12 00:32:06.858531
1.f2 283399 0 0 0 219650536860 0 0 3095 active+clean 112m 17045'2425457 17045:5992851 [79,53,51]p79 [79,53,51]p79 2019-10-15 14:31:11.591802 2019-10-12 12:07:52.577883
1.112 283553 0 0 0 220898787832 0 0 3065 active+clean 112m 17045'2435370 17045:7368634 [53,94,125]p53 [53,94,125]p53 2019-10-16 02:30:00.809350 2019-10-10 18:38:39.972958
1.18f 283314 0 0 0 221976057920 0 0 3038 active+clean 111m 17045'2071396 17045:4265103 [32,123,53]p32 [32,123,53]p32 2019-10-15 14:29:21.525562 2019-10-15 14:29:21.525562
1.194 283196 0 0 0 220699740332 0 0 3062 active+clean 111m 17045'2154418 17045:7995518 [53,129,46]p53 [53,129,46]p53 2019-10-16 12:02:04.997934 2019-10-16 12:02:04.997934
1.204 282956 0 0 0 219643753735 0 0 3016 active+clean 112m 17045'2124361 17045:5620074 [53,30,23]p53 [53,30,23]p53 2019-10-16 02:23:00.697768 2019-10-10 11:18:54.690067
1.218 282564 0 0 0 220388731811 0 0 3003 active+clean 111m 17045'2501910 17045:8164540 [93,53,66]p93 [93,53,66]p93 2019-10-14 22:10:43.125142 2019-10-10 12:34:01.551560
1.262 283197 0 0 0 220195503650 0 0 3005 active+clean 111m 17045'2432958 17045:7119552 [127,35,53]p127 [127,35,53]p127 2019-10-15 20:48:34.524771 2019-10-14 11:57:30.317359
1.275 284192 0 0 0 222111843461 0 0 3072 active+clean 111m 17045'2069248 17045:10663804 [53,48,30]p53 [53,48,30]p53 2019-10-15 05:30:54.762916 2019-10-13 10:40:49.406416
1.28d 283247 0 0 0 220349109540 0 0 3063 active+clean 112m 17045'2088232 17045:11133191 [53,70,92]p53 [53,70,92]p53 2019-10-16 09:33:37.863761 2019-10-15 00:24:04.291294
1.299 284035 0 0 0 221127170854 0 0 3009 active+clean 112m 17045'2092393 17045:5311336 [34,53,121]p34 [34,53,121]p34 2019-10-15 20:13:34.120776 2019-10-15 20:13:34.120776
1.2a9 284030 0 0 0 219894151583 0 0 3070 active+clean 43m 17045'2076905 17045:3839903 [22,39,53]p22 [22,39,53]p22 2019-10-16 13:49:03.938502 2019-10-12 03:11:02.269113
1.2e2 283409 0 0 0 221213474940 0 0 3041 active+clean 112m 17045'2420468 17045:6531480 [53,115,17]p53 [53,115,17]p53 2019-10-15 05:29:11.461121 2019-10-11 02:05:16.059308
1.307 282867 0 0 0 219942934336 0 0 3051 active+clean 112m 17045'2067797 17045:4819303 [53,48,133]p53 [53,48,133]p53 2019-10-16 07:45:40.928664 2019-10-14 17:46:41.089602
1.318 284208 0 0 0 220679998910 0 0 3025 active+clean 112m 17045'2514820 17045:8351967 [72,53,23]p72 [72,53,23]p72 2019-10-15 04:59:48.008063 2019-10-15 04:59:48.008063
1.31d 283905 0 0 0 221371232942 0 0 3016 active+clean 112m 17045'2072496 17045:12375371 [53,21,122]p53 [53,21,122]p53 2019-10-15 11:51:26.985937 2019-10-15 11:51:26.985937
1.344 283071 0 0 0 221573920986 0 0 3020 active+clean 112m 17045'2127774 17045:6435101 [49,53,121]p49 [49,53,121]p49 2019-10-15 08:58:44.089854 2019-10-11 17:20:16.609486
1.378 282313 0 0 0 218502615603 0 0 3067 active+clean 111m 17045'2505671 17045:9515492 [16,118,53]p16 [16,118,53]p16 2019-10-16 05:23:15.589724 2019-10-11 18:43:50.914987
1.3ca 282369 0 0 0 218998014008 0 0 3008 active+clean 112m 17045'2454453 17045:5985515 [31,53,79]p31 [31,53,79]p31 2019-10-15 14:36:18.379442 2019-10-14 03:08:34.334118
1.3e2 283037 0 0 0 220060081077 0 0 3009 active+clean 111m 17045'2440422 17045:6397983 [53,66,39]p53 [53,66,39]p53 2019-10-16 05:21:06.829565 2019-10-16 05:21:06.829565
Mounted OSD:
ls -lah /mnt
total 0
drwx------ 0 root root  0 Jan  1  1970 1.112_head
drwx------ 0 root root  0 Jan  1  1970 1.178_head
drwx------ 0 root root  0 Jan  1  1970 1.18f_head
drwx------ 0 root root  0 Jan  1  1970 1.194_head
drwx------ 0 root root  0 Jan  1  1970 1.1b5_head
drwx------ 0 root root  0 Jan  1  1970 1.1c6_head
drwx------ 0 root root  0 Jan  1  1970 1.204_head
drwx------ 0 root root  0 Jan  1  1970 1.210_head
drwx------ 0 root root  0 Jan  1  1970 1.218_head
drwx------ 0 root root  0 Jan  1  1970 1.252_head
drwx------ 0 root root  0 Jan  1  1970 1.261_head
drwx------ 0 root root  0 Jan  1  1970 1.262_head
drwx------ 0 root root  0 Jan  1  1970 1.275_head
drwx------ 0 root root  0 Jan  1  1970 1.28d_head
drwx------ 0 root root  0 Jan  1  1970 1.299_head
drwx------ 0 root root  0 Jan  1  1970 1.29_head
drwx------ 0 root root  0 Jan  1  1970 1.2a9_head
drwx------ 0 root root  0 Jan  1  1970 1.2e2_head
drwx------ 0 root root  0 Jan  1  1970 1.2e7_head
drwx------ 0 root root  0 Jan  1  1970 1.2e8_head
drwx------ 0 root root  0 Jan  1  1970 1.307_head
drwx------ 0 root root  0 Jan  1  1970 1.318_head
drwx------ 0 root root  0 Jan  1  1970 1.31d_head
drwx------ 0 root root  0 Jan  1  1970 1.344_head
drwx------ 0 root root  0 Jan  1  1970 1.36e_head
drwx------ 0 root root  0 Jan  1  1970 1.378_head
drwx------ 0 root root  0 Jan  1  1970 1.3ca_head
drwx------ 0 root root  0 Jan  1  1970 1.3e1_head
drwx------ 0 root root  0 Jan  1  1970 1.3e2_head
drwx------ 0 root root  0 Jan  1  1970 1.3ff_head
drwx------ 0 root root  0 Jan  1  1970 1.45_head
drwx------ 0 root root  0 Jan  1  1970 1.4e_head
drwx------ 0 root root  0 Jan  1  1970 1.7c_head
drwx------ 0 root root  0 Jan  1  1970 1.7d_head
drwx------ 0 root root  0 Jan  1  1970 1.9c_head
drwx------ 0 root root  0 Jan  1  1970 1.b0_head
drwx------ 0 root root  0 Jan  1  1970 1.d6_head
drwx------ 0 root root  0 Jan  1  1970 1.f2_head
drwx------ 0 root root  0 Jan  1  1970 meta
-rwx------ 0 root root 10 Jan  1  1970 type
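The stray PGs can be spotted by diffing what the cluster currently maps to the OSD against what the OSD actually stores. A minimal sketch, assuming the mounted OSD at /mnt as above and the default plain-text output of ceph pg ls-by-osd (the temp file names are arbitrary):

ceph pg ls-by-osd osd.53 | awk '$1 ~ /^[0-9]+\./ {print $1}' | sort > /tmp/pgs.mapped
ls /mnt | sed -n 's/_head$//p' | sort > /tmp/pgs.ondisk
# PGs present on disk but no longer mapped to this OSD, e.g. 1.178, 1.1b5, ...
comm -13 /tmp/pgs.mapped /tmp/pgs.ondisk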
df -h of the mounted OSD shows the same usage as ceph osd df:

Filesystem Size Used Avail Use% Mounted on
foo        9.1T 8.0T 1.1T  88% /mnt
Summing up the PGs mapped to this OSD, however, only accounts for about 4.5 TiB.
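That figure is easy to reproduce by summing the BYTES column of the pg ls-by-osd output above. A one-liner sketch, assuming BYTES is the sixth whitespace-separated column as in the default output:

ceph pg ls-by-osd osd.53 | awk '$1 ~ /^[0-9]+\./ {sum += $6} END {printf "%.2f TiB\n", sum / 2^40}'

With 22 PGs of roughly 220 GB each, this comes to about 4.4 TiB, consistent with the estimate.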
There are no removal/deletion entries in the OSD log; otherwise the OSD is quite healthy and fine.
Updated by Anonymous over 4 years ago
This happens during data copies or rebalancing.

It's a major issue because Ceph, for whatever reason, keeps copying data to the fullest OSD. Reweights are ignored.

The balancer module is off (see the check below).

Disk weights are all the same unless emergency-reweighted. Everything is at defaults; it's a fresh 14.2.4 cluster. But I see the same issue on a cluster upgraded from Luminous to Nautilus, so the issue lies in Nautilus or Mimic code changes.
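For reference, the balancer state can be double-checked via the mgr module:

ceph balancer status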
Updated by Dan van der Ster over 3 years ago
Does the workaround mentioned in #43948 help?
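As a generic fallback (not necessarily what #43948 describes), stray PGs can also be removed manually with ceph-objectstore-tool while the OSD is stopped. A sketch only; the OSD id, standard data path, and PG id below are illustrative, taken from the listings above:

systemctl stop ceph-osd@53
# repeat per stray PG
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-53 --pgid 1.178 --op remove --force
systemctl start ceph-osd@53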
Updated by Anonymous over 3 years ago
This has long been resolved; I don't even remember the details anymore.