Bug #42341

open

OSD PGs are not being purged

Added by Anonymous over 4 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

related ML thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037017.html

Apparently some PGs are not being removed from my OSDs, causing nearfull warnings and other issues. Here are some pastes.

 53   hdd  5.00000  0.83492 9.1 TiB 8.0 TiB 8.0 TiB  44 KiB  20 GiB 1.1 TiB 87.88 1.46  22     up
ceph pg ls-by-osd osd.53
PG    OBJECTS DEGRADED MISPLACED UNFOUND BYTES        OMAP_BYTES* OMAP_KEYS* LOG  STATE        SINCE VERSION       REPORTED       UP              ACTING          SCRUB_STAMP                DEEP_SCRUB_STAMP           
1.29   282817        0         0       0 220560679815           0          0 3077 active+clean  112m 17045'2067154  17045:5289544   [30,53,70]p30   [30,53,70]p30 2019-10-15 09:03:56.868722 2019-10-10 23:32:11.354197 
1.45   282980        0         0       0 220411431493           0          0 3020 active+clean  111m 17045'2084399 17045:12137474   [94,74,53]p94   [94,74,53]p94 2019-10-16 00:16:47.701687 2019-10-14 14:41:00.249220 
1.7c   284356        0         0       0 221808097131           0          0 3090 active+clean  111m 17045'2130086  17045:5791560   [53,13,43]p53   [53,13,43]p53 2019-10-16 02:26:04.973534 2019-10-12 00:32:06.858531 
1.f2   283399        0         0       0 219650536860           0          0 3095 active+clean  112m 17045'2425457  17045:5992851   [79,53,51]p79   [79,53,51]p79 2019-10-15 14:31:11.591802 2019-10-12 12:07:52.577883 
1.112  283553        0         0       0 220898787832           0          0 3065 active+clean  112m 17045'2435370  17045:7368634  [53,94,125]p53  [53,94,125]p53 2019-10-16 02:30:00.809350 2019-10-10 18:38:39.972958 
1.18f  283314        0         0       0 221976057920           0          0 3038 active+clean  111m 17045'2071396  17045:4265103  [32,123,53]p32  [32,123,53]p32 2019-10-15 14:29:21.525562 2019-10-15 14:29:21.525562 
1.194  283196        0         0       0 220699740332           0          0 3062 active+clean  111m 17045'2154418  17045:7995518  [53,129,46]p53  [53,129,46]p53 2019-10-16 12:02:04.997934 2019-10-16 12:02:04.997934 
1.204  282956        0         0       0 219643753735           0          0 3016 active+clean  112m 17045'2124361  17045:5620074   [53,30,23]p53   [53,30,23]p53 2019-10-16 02:23:00.697768 2019-10-10 11:18:54.690067 
1.218  282564        0         0       0 220388731811           0          0 3003 active+clean  111m 17045'2501910  17045:8164540   [93,53,66]p93   [93,53,66]p93 2019-10-14 22:10:43.125142 2019-10-10 12:34:01.551560 
1.262  283197        0         0       0 220195503650           0          0 3005 active+clean  111m 17045'2432958  17045:7119552 [127,35,53]p127 [127,35,53]p127 2019-10-15 20:48:34.524771 2019-10-14 11:57:30.317359 
1.275  284192        0         0       0 222111843461           0          0 3072 active+clean  111m 17045'2069248 17045:10663804   [53,48,30]p53   [53,48,30]p53 2019-10-15 05:30:54.762916 2019-10-13 10:40:49.406416 
1.28d  283247        0         0       0 220349109540           0          0 3063 active+clean  112m 17045'2088232 17045:11133191   [53,70,92]p53   [53,70,92]p53 2019-10-16 09:33:37.863761 2019-10-15 00:24:04.291294 
1.299  284035        0         0       0 221127170854           0          0 3009 active+clean  112m 17045'2092393  17045:5311336  [34,53,121]p34  [34,53,121]p34 2019-10-15 20:13:34.120776 2019-10-15 20:13:34.120776 
1.2a9  284030        0         0       0 219894151583           0          0 3070 active+clean   43m 17045'2076905  17045:3839903   [22,39,53]p22   [22,39,53]p22 2019-10-16 13:49:03.938502 2019-10-12 03:11:02.269113 
1.2e2  283409        0         0       0 221213474940           0          0 3041 active+clean  112m 17045'2420468  17045:6531480  [53,115,17]p53  [53,115,17]p53 2019-10-15 05:29:11.461121 2019-10-11 02:05:16.059308 
1.307  282867        0         0       0 219942934336           0          0 3051 active+clean  112m 17045'2067797  17045:4819303  [53,48,133]p53  [53,48,133]p53 2019-10-16 07:45:40.928664 2019-10-14 17:46:41.089602 
1.318  284208        0         0       0 220679998910           0          0 3025 active+clean  112m 17045'2514820  17045:8351967   [72,53,23]p72   [72,53,23]p72 2019-10-15 04:59:48.008063 2019-10-15 04:59:48.008063 
1.31d  283905        0         0       0 221371232942           0          0 3016 active+clean  112m 17045'2072496 17045:12375371  [53,21,122]p53  [53,21,122]p53 2019-10-15 11:51:26.985937 2019-10-15 11:51:26.985937 
1.344  283071        0         0       0 221573920986           0          0 3020 active+clean  112m 17045'2127774  17045:6435101  [49,53,121]p49  [49,53,121]p49 2019-10-15 08:58:44.089854 2019-10-11 17:20:16.609486 
1.378  282313        0         0       0 218502615603           0          0 3067 active+clean  111m 17045'2505671  17045:9515492  [16,118,53]p16  [16,118,53]p16 2019-10-16 05:23:15.589724 2019-10-11 18:43:50.914987 
1.3ca  282369        0         0       0 218998014008           0          0 3008 active+clean  112m 17045'2454453  17045:5985515   [31,53,79]p31   [31,53,79]p31 2019-10-15 14:36:18.379442 2019-10-14 03:08:34.334118 
1.3e2  283037        0         0       0 220060081077           0          0 3009 active+clean  111m 17045'2440422  17045:6397983   [53,66,39]p53   [53,66,39]p53 2019-10-16 05:21:06.829565 2019-10-16 05:21:06.829565 

Mounted OSD:

ls -lah /mnt
total 0
drwx------ 0 root root  0 Jan  1  1970 1.112_head
drwx------ 0 root root  0 Jan  1  1970 1.178_head
drwx------ 0 root root  0 Jan  1  1970 1.18f_head
drwx------ 0 root root  0 Jan  1  1970 1.194_head
drwx------ 0 root root  0 Jan  1  1970 1.1b5_head
drwx------ 0 root root  0 Jan  1  1970 1.1c6_head
drwx------ 0 root root  0 Jan  1  1970 1.204_head
drwx------ 0 root root  0 Jan  1  1970 1.210_head
drwx------ 0 root root  0 Jan  1  1970 1.218_head
drwx------ 0 root root  0 Jan  1  1970 1.252_head
drwx------ 0 root root  0 Jan  1  1970 1.261_head
drwx------ 0 root root  0 Jan  1  1970 1.262_head
drwx------ 0 root root  0 Jan  1  1970 1.275_head
drwx------ 0 root root  0 Jan  1  1970 1.28d_head
drwx------ 0 root root  0 Jan  1  1970 1.299_head
drwx------ 0 root root  0 Jan  1  1970 1.29_head
drwx------ 0 root root  0 Jan  1  1970 1.2a9_head
drwx------ 0 root root  0 Jan  1  1970 1.2e2_head
drwx------ 0 root root  0 Jan  1  1970 1.2e7_head
drwx------ 0 root root  0 Jan  1  1970 1.2e8_head
drwx------ 0 root root  0 Jan  1  1970 1.307_head
drwx------ 0 root root  0 Jan  1  1970 1.318_head
drwx------ 0 root root  0 Jan  1  1970 1.31d_head
drwx------ 0 root root  0 Jan  1  1970 1.344_head
drwx------ 0 root root  0 Jan  1  1970 1.36e_head
drwx------ 0 root root  0 Jan  1  1970 1.378_head
drwx------ 0 root root  0 Jan  1  1970 1.3ca_head
drwx------ 0 root root  0 Jan  1  1970 1.3e1_head
drwx------ 0 root root  0 Jan  1  1970 1.3e2_head
drwx------ 0 root root  0 Jan  1  1970 1.3ff_head
drwx------ 0 root root  0 Jan  1  1970 1.45_head
drwx------ 0 root root  0 Jan  1  1970 1.4e_head
drwx------ 0 root root  0 Jan  1  1970 1.7c_head
drwx------ 0 root root  0 Jan  1  1970 1.7d_head
drwx------ 0 root root  0 Jan  1  1970 1.9c_head
drwx------ 0 root root  0 Jan  1  1970 1.b0_head
drwx------ 0 root root  0 Jan  1  1970 1.d6_head
drwx------ 0 root root  0 Jan  1  1970 1.f2_head
drwx------ 0 root root  0 Jan  1  1970 meta
-rwx------ 0 root root 10 Jan  1  1970 type
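
A quick way to spot the leftover PGs is to diff the cluster's view of osd.53 against the PG directories actually on disk. A rough sketch, assuming the objectstore is mounted at /mnt as above (the osd id, mount point, and _head suffix are taken from the pastes; adjust for your setup):

ceph pg ls-by-osd osd.53 | awk '$1 ~ /^[0-9]+\./ {print $1}' | sort > /tmp/pgs-mapped
ls /mnt | sed -n 's/_head$//p' | sort > /tmp/pgs-on-disk
comm -13 /tmp/pgs-mapped /tmp/pgs-on-disk   # directories on disk but no longer mapped to osd.53

For instance, 1.178 and 1.3ff appear in the listing above but not in the ls-by-osd output, so they would show up here as leftovers.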

osd df and df -h agree on roughly 8 T used:

foo 9.1T 8.0T 1.1T 88% /mnt

but summing the PGs mapped to this OSD only accounts for about 4.5 T.
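
That ~4.5 T can be sanity-checked by summing the BYTES column of the ls-by-osd output above; a minimal sketch (the column position is taken from the paste's header):

ceph pg ls-by-osd osd.53 | awk '$1 ~ /^[0-9]+\./ {sum += $6} END {printf "%.2f TiB\n", sum / 2^40}'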

There are no removal/deletion entries in the OSD log; otherwise the OSD is quite healthy.

Actions #1

Updated by Anonymous over 4 years ago

This happens during data copy or rebalance.
It's a major issue because Ceph, for whatever reason, only copies data to the FULLEST OSD; reweights are ignored.

Balancer Module is off.

Weights for the disks are all the same unless emergency-reweighted. Everything is default; it's a fresh 14.2.4 cluster. But I see the same issue on a cluster upgraded from Luminous to Nautilus, so the issue is in the Nautilus or Mimic code changes.
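
For reference, the balancer and weight state can be double-checked with standard commands:

ceph balancer status   # shows whether the balancer module is active
ceph osd df tree       # shows CRUSH weight and reweight per OSD alongside usage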

Actions #2

Updated by Dan van der Ster over 3 years ago

Does the workaround mentioned in #43948 help?
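
(The #43948 workaround isn't quoted here. Purely as a generic illustration, and not necessarily that workaround: a leftover PG such as 1.178 from the listing above can usually be removed offline with ceph-objectstore-tool, with the OSD stopped first and assuming the default data path:)

systemctl stop ceph-osd@53
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-53 --pgid 1.178 --op remove --force
systemctl start ceph-osd@53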

Actions #3

Updated by Anonymous over 3 years ago

This has long been resolved; I don't even remember the details anymore.
