Bug #58460
LRC cluster: cluster is in HEALTH_ERR
Status: New
Priority: Normal
Assignee: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
- cluster status:

  cluster:
    id:     28f7427e-5558-4ffd-ae1a-51ec3042759a
    health: HEALTH_ERR
            full ratio(s) out of order
            Low space hindering backfill (add storage if this doesn't resolve itself): 21 pgs backfill_toofull
            Degraded data redundancy: 137452/498931805 objects degraded (0.028%), 21 pgs degraded, 21 pgs undersized

  services:
    mon: 5 daemons, quorum reesi003,reesi002,reesi001,ivan02,ivan01 (age 9d)
    mgr: reesi006.erytot(active, since 7d), standbys: reesi005.xxyjcw, reesi004.tplfrt
    mds: 4/4 daemons up, 5 standby, 1 hot standby
    osd: 166 osds: 166 up (since 41h), 165 in (since 15h); 67 remapped pgs
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 4/4 healthy
    pools:   24 pools, 2965 pgs
    objects: 104.01M objects, 118 TiB
    usage:   207 TiB used, 850 TiB / 1.0 PiB avail
    pgs:     137452/498931805 objects degraded (0.028%)
             2768577/498931805 objects misplaced (0.555%)
             2898 active+clean
             46   active+remapped+backfilling
             21   active+undersized+degraded+remapped+backfill_toofull

  io:
    client:   4.6 KiB/s rd, 84 B/s wr, 5 op/s rd, 0 op/s wr
    recovery: 0 B/s, 4 objects/s

  progress:
    Global Recovery Event (22h)
      [===========================.] (remaining: 31m)
- ceph health detail
HEALTH_ERR full ratio(s) out of order; Low space hindering backfill (add storage if this doesn't resolve itself): 21 pgs backfill_toofull; Degraded data redundancy: 137452/498931805 objects degraded (0.028%), 21 pgs degraded, 21 pgs undersized
[ERR] OSD_OUT_OF_ORDER_FULL: full ratio(s) out of order
    osd_failsafe_full_ratio (0.97) < full_ratio (0.99), increased
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 21 pgs backfill_toofull
    pg 124.4 is active+undersized+degraded+remapped+backfill_toofull, acting [41,52]
    pg 124.9 is active+undersized+degraded+remapped+backfill_toofull, acting [41,52]
    ...
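The OSD_OUT_OF_ORDER_FULL error fires because the full ratios no longer satisfy their expected ordering: here full_ratio (0.99) had been raised above osd_failsafe_full_ratio (0.97). A minimal sketch of that ordering invariant (illustrative only; the function name and structure are not Ceph's actual implementation):

```python
# Sketch of the invariant behind OSD_OUT_OF_ORDER_FULL (illustrative,
# not Ceph's code): the ratios must be ordered
# nearfull < backfillfull < full < failsafe_full.
def ratios_in_order(nearfull, backfillfull, full, failsafe_full):
    """Return True if the full ratios satisfy the expected ordering."""
    return nearfull < backfillfull < full < failsafe_full

# The broken state from this ticket: full_ratio (0.99) exceeds
# osd_failsafe_full_ratio (0.97), so the check fails.
print(ratios_in_order(0.85, 0.90, 0.99, 0.97))  # False

# With the default ratios restored the ordering holds again.
print(ratios_in_order(0.85, 0.90, 0.95, 0.97))  # True
```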
Updated by Prashant D over 1 year ago
osd.137 utilization was over 97%:
$ cat ceph_osd_df_tree.2023-01-04_10-20-28 | awk -F' ' '$17>50'
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP   META     AVAIL   %USE   VAR   PGS  STATUS  TYPE NAME
137  hdd    3.66899  1.00000   3.6 TiB  3.6 TiB  3.6 TiB  7 KiB  6.6 GiB  75 GiB  97.98  5.00  151  up     osd.137
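The awk filter keys on field 17 because the SIZE/RAW USE/DATA/OMAP/META/AVAIL columns each split into a value token and a unit token, which lands %USE at whitespace-separated field 17. The same filter as a small Python sketch (the column layout is assumed from the header above):

```python
# Flag OSDs whose %USE exceeds a threshold, given `ceph osd df tree`-style
# rows. Field 17 (1-based) is %USE once size columns split into value/unit
# token pairs, matching the awk filter `$17>50` above.
def over_utilized(lines, threshold=50.0, use_field=17):
    flagged = []
    for line in lines:
        fields = line.split()
        try:
            use = float(fields[use_field - 1])  # awk fields are 1-based
        except (IndexError, ValueError):
            continue  # header or malformed row
        if use > threshold:
            flagged.append((fields[0], use))  # (OSD id, %USE)
    return flagged

row = ("137 hdd 3.66899 1.00000 3.6 TiB 3.6 TiB 3.6 TiB "
       "7 KiB 6.6 GiB 75 GiB 97.98 5.00 151 up osd.137")
print(over_utilized([row]))  # [('137', 97.98)]
```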
Updated by Prashant D over 1 year ago
Adam and I discussed this issue over g-chat last week. The LRC cluster is now in a healthy state.
Documenting the steps followed to get the LRC cluster back to a healthy state:
- Reset the ratios to their default values:
ceph osd set-backfillfull-ratio 0.9
ceph osd set-full-ratio 0.95
ceph osd set-nearfull-ratio 0.85
- Reweight osd.137 to offload PGs to other OSDs:
ceph osd reweight-by-utilization 102 0.05 1 --no-increasing
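The arguments to `reweight-by-utilization` are: threshold as a percent of mean utilization (102), maximum reweight change per OSD (0.05), maximum number of OSDs to adjust (1), and `--no-increasing` to only ever lower reweights. A rough sketch of that selection logic (illustrative; this is not the mgr's actual algorithm, and the utilization numbers besides osd.137's are made up):

```python
# Sketch of `ceph osd reweight-by-utilization 102 0.05 1 --no-increasing`
# (illustrative, not Ceph's actual algorithm): lower the reweight of at
# most `max_osds` OSDs whose utilization exceeds `threshold`% of the
# cluster mean, by at most `max_change` each, never raising any weight.
def reweight_by_utilization(utils, reweights, threshold=102,
                            max_change=0.05, max_osds=1):
    mean = sum(utils.values()) / len(utils)
    cutoff = mean * threshold / 100.0
    # Handle the most over-utilized OSDs first.
    overfull = sorted((osd for osd, u in utils.items() if u > cutoff),
                      key=lambda osd: utils[osd], reverse=True)
    new = dict(reweights)
    for osd in overfull[:max_osds]:
        # Scale the reweight toward the cutoff, capped at max_change.
        target = reweights[osd] * cutoff / utils[osd]
        new[osd] = round(max(target, reweights[osd] - max_change), 5)
    return new

# osd.137's 97.98% is from the ticket; the other values are hypothetical.
utils = {"osd.137": 97.98, "osd.41": 60.0, "osd.52": 55.0}
reweights = {"osd.137": 1.0, "osd.41": 1.0, "osd.52": 1.0}
print(reweight_by_utilization(utils, reweights))
# → {'osd.137': 0.95, 'osd.41': 1.0, 'osd.52': 1.0}
```

Only osd.137 sits above 102% of the mean, and the 0.05 cap binds, so a single reweight drops from 1.0 to 0.95; offloading fully would take repeated runs.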