Bug #53138

cluster [WRN] Health check failed: Degraded data redundancy: 3/1164 objects degraded (0.258%) seen in rbd

Added by Deepika Upadhyay 3 months ago. Updated about 2 months ago.

Status: Triaged
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-11-02T14:46:34.713 INFO:tasks.ceph:Scrubbing osd.0
2021-11-02T14:46:34.714 DEBUG:teuthology.orchestra.run.smithi008:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.0 config set osd_debug_deep_scrub_sleep 0
2021-11-02T14:46:34.872 INFO:teuthology.orchestra.run.smithi008.stdout:{
2021-11-02T14:46:34.873 INFO:teuthology.orchestra.run.smithi008.stdout:    "success": "osd_debug_deep_scrub_sleep = '0.000000' (not observed, change may require restart) " 
2021-11-02T14:46:34.873 INFO:teuthology.orchestra.run.smithi008.stdout:}
2021-11-02T14:46:34.885 DEBUG:teuthology.orchestra.run.smithi008:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd deep-scrub 0
2021-11-02T14:46:35.188 INFO:teuthology.orchestra.run.smithi008.stderr:instructed osd(s) 0 to deep-scrub
2021-11-02T14:46:35.199 INFO:tasks.ceph:Scrubbing osd.1
2021-11-02T14:46:35.199 DEBUG:teuthology.orchestra.run.smithi008:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell osd.1 config set osd_debug_deep_scrub_sleep 0
2021-11-02T14:46:35.343 INFO:teuthology.orchestra.run.smithi008.stdout:{
2021-11-02T14:46:35.343 INFO:teuthology.orchestra.run.smithi008.stdout:    "success": "osd_debug_deep_scrub_sleep = '0.000000' (not observed, change may require restart) " 
2021-11-02T14:46:35.343 INFO:teuthology.orchestra.run.smithi008.stdout:}
2021-11-02T14:46:35.354 DEBUG:teuthology.orchestra.run.smithi008:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd deep-scrub 1
2021-11-02T14:46:35.654 INFO:teuthology.orchestra.run.smithi008.stderr:instructed osd(s) 1 to deep-scrub
2021-11-02T14:46:35.664 INFO:tasks.ceph:Scrubbing osd.2
failure_reason: '"2021-11-02T14:37:56.579360+0000 mon.a (mon.0) 879 : cluster [WRN]
  Health check failed: Degraded data redundancy: 3/1164 objects degraded (0.258%),
  3 pgs degraded (PG_DEGRADED)" in cluster log'
flavor: default

Although the RBD tests finish fine, this warning is seen while the tests are wrapping up.

/ceph/teuthology-archive/ideepika-2021-11-02_12:33:30-rbd-wip-ssd-cache-testing-distro-basic-smithi/6477559/teuthology.log
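
For triage, the numbers in the failure_reason above can be pulled out and sanity-checked with a small script. This is only a sketch: the regular expression and the `parse_degraded` helper are hypothetical, written against the message text quoted in this report, and are not part of teuthology or Ceph.

```python
import re

# Hypothetical pattern, written against the health message quoted in this report.
DEGRADED_RE = re.compile(
    r"Degraded data redundancy: (?P<degraded>\d+)/(?P<total>\d+) objects degraded "
    r"\((?P<pct>[\d.]+)%\)(?:, (?P<pgs>\d+) pgs? degraded)?"
)


def parse_degraded(message):
    """Return the degraded-object counts from a health message, or None if absent."""
    # Collapse whitespace so a message wrapped across lines still matches.
    flat = " ".join(message.split())
    m = DEGRADED_RE.search(flat)
    if m is None:
        return None
    degraded = int(m.group("degraded"))
    total = int(m.group("total"))
    return {
        "degraded": degraded,
        "total": total,
        "reported_pct": float(m.group("pct")),
        # Recompute the percentage to cross-check the value in the log.
        "computed_pct": round(100.0 * degraded / total, 3),
        "pgs_degraded": int(m.group("pgs")) if m.group("pgs") else None,
    }


failure_reason = """2021-11-02T14:37:56.579360+0000 mon.a (mon.0) 879 : cluster [WRN]
  Health check failed: Degraded data redundancy: 3/1164 objects degraded (0.258%),
  3 pgs degraded (PG_DEGRADED)"""

info = parse_degraded(failure_reason)
print(info)
```

Here 3 of 1164 objects works out to roughly 0.258%, which agrees with the percentage reported in the cluster log.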

History

#1 Updated by Neha Ojha 2 months ago

  • Status changed from New to Triaged

This warning comes up because there are PGs recovering, probably because the test is injecting failures, so we can ignore such warnings.

2021-11-02T14:37:55.923243+0000 mgr.x (mgr.14099) 334 : cluster [DBG] pgmap v865: 41 pgs: 1 active+recovering+undersized+remapped, 3 active+recovering+undersized+degraded+remapped, 37 active+clean; 454 MiB data, 1.4 GiB used, 719 GiB / 720 GiB avail; 22 MiB/s rd, 35 MiB/s wr, 1.48k op/s; 3/1164 objects degraded (0.258%); 22/1164 objects misplaced (1.890%); 2.6 MiB/s, 4 keys/s, 6 objects/s recovering
2021-11-02T14:37:57.569993+0000 mon.a (mon.0) 880 : cluster [DBG] osdmap e542: 8 total, 8 up, 8 in
2021-11-02T14:37:57.923775+0000 mgr.x (mgr.14099) 335 : cluster [DBG] pgmap v867: 41 pgs: 1 active+recovering+undersized+remapped, 3 active+recovering+undersized+degraded+remapped, 37 active+clean; 454 MiB data, 1.4 GiB used, 719 GiB / 720 GiB avail; 734 KiB/s rd, 6.7 MiB/s wr, 557 op/s; 3/1164 objects degraded (0.258%); 22/1164 objects misplaced (1.890%); 2.2 MiB/s, 4 keys/s, 5 objects/s recovering
2021-11-02T14:37:58.570932+0000 mon.a (mon.0) 881 : cluster [DBG] osdmap e543: 8 total, 8 up, 8 in
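
A rough sketch of what "ignoring such warnings" could look like when scanning a cluster log: skip any [WRN]/[ERR] line that matches a list of regular expressions. The `IGNORELIST` patterns and the `first_unignored_warning` helper below are hypothetical and only illustrate the idea. This is not teuthology's actual implementation, and it assumes the suite would express the same thing through a `log-ignorelist`-style override.

```python
import re

# Hypothetical ignorelist: regular expressions matched against each cluster log line.
IGNORELIST = [
    r"\(PG_DEGRADED\)",
    r"Degraded data redundancy",
]


def first_unignored_warning(log_lines, ignorelist):
    """Return the first [WRN]/[ERR] line not matched by any ignore pattern, else None."""
    compiled = [re.compile(p) for p in ignorelist]
    for line in log_lines:
        # Only warnings and errors can fail the run in this sketch.
        if "[WRN]" not in line and "[ERR]" not in line:
            continue
        # Skip lines covered by the ignorelist.
        if any(p.search(line) for p in compiled):
            continue
        return line
    return None


cluster_log = [
    "2021-11-02T14:37:56.579360+0000 mon.a (mon.0) 879 : cluster [WRN] "
    "Health check failed: Degraded data redundancy: 3/1164 objects degraded (0.258%), "
    "3 pgs degraded (PG_DEGRADED)",
    "2021-11-02T14:37:57.569993+0000 mon.a (mon.0) 880 : cluster [DBG] "
    "osdmap e542: 8 total, 8 up, 8 in",
]

# With the ignorelist, the PG_DEGRADED warning is skipped and nothing is reported.
print(first_unignored_warning(cluster_log, IGNORELIST))
# Without it, the same warning is returned and would be reported as a failure.
print(first_unignored_warning(cluster_log, []))
```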

#2 Updated by Deepika Upadhyay about 2 months ago

@Neha I am seeing these failures more often than usual. We may have a performance regression; if not, can we increase the timeout?

#3 Updated by Deepika Upadhyay about 2 months ago

  • Priority changed from Normal to High
