Bug #44510

open

osd/osd-recovery-space.sh TEST_recovery_test_simple failure

Added by Sage Weil about 4 years ago. Updated 1 day ago.

Status:
Fix Under Review
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-03-08T23:19:15.259 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:144: TEST_recovery_test_simple:  ceph status --format=json-pretty
2020-03-08T23:19:15.703 INFO:tasks.workunit.client.0.smithi192.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:146: TEST_recovery_test_simple:  jq .health.checks.PG_RECOVERY_FULL.severity td/osd-recovery-space/stat.json
2020-03-08T23:19:15.705 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:146: TEST_recovery_test_simple:  eval SEV=null
2020-03-08T23:19:15.705 INFO:tasks.workunit.client.0.smithi192.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:146: TEST_recovery_test_simple:  SEV=null
2020-03-08T23:19:15.706 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:147: TEST_recovery_test_simple:  '[' null '!=' HEALTH_ERR ']'
2020-03-08T23:19:15.706 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:148: TEST_recovery_test_simple:  echo 'PG_RECOVERY_FULL severity null not HEALTH_ERR'
2020-03-08T23:19:15.706 INFO:tasks.workunit.client.0.smithi192.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:149: TEST_recovery_test_simple:  expr 1 + 1
2020-03-08T23:19:15.706 INFO:tasks.workunit.client.0.smithi192.stdout:PG_RECOVERY_FULL severity null not HEALTH_ERR
2020-03-08T23:19:15.707 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:149: TEST_recovery_test_simple:  ERRORS=2
2020-03-08T23:19:15.707 INFO:tasks.workunit.client.0.smithi192.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:151: TEST_recovery_test_simple:  jq .health.checks.PG_RECOVERY_FULL.summary.message td/osd-recovery-space/stat.json
2020-03-08T23:19:15.709 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:151: TEST_recovery_test_simple:  eval MSG=null
2020-03-08T23:19:15.709 INFO:tasks.workunit.client.0.smithi192.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:151: TEST_recovery_test_simple:  MSG=null
2020-03-08T23:19:15.709 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:152: TEST_recovery_test_simple:  '[' null '!=' 'Full OSDs blocking recovery: 1 pg recovery_toofull' ']'
2020-03-08T23:19:15.709 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:153: TEST_recovery_test_simple:  echo 'PG_RECOVERY_FULL message '\''null'\'' mismatched'
2020-03-08T23:19:15.710 INFO:tasks.workunit.client.0.smithi192.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:154: TEST_recovery_test_simple:  expr 2 + 1
2020-03-08T23:19:15.710 INFO:tasks.workunit.client.0.smithi192.stdout:PG_RECOVERY_FULL message 'null' mismatched
2020-03-08T23:19:15.711 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:154: TEST_recovery_test_simple:  ERRORS=3
2020-03-08T23:19:15.712 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:156: TEST_recovery_test_simple:  rm -f td/osd-recovery-space/stat.json
2020-03-08T23:19:15.712 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:158: TEST_recovery_test_simple:  '[' 3 '!=' 0 ']'
2020-03-08T23:19:15.712 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:160: TEST_recovery_test_simple:  return 1
2020-03-08T23:19:15.712 INFO:tasks.workunit.client.0.smithi192.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:35: run:  return 1

/a/sage-2020-03-08_21:18:15-rados:standalone-wip-sage2-testing-2020-03-08-1456-distro-basic-smithi/4838360
Actions #2

Updated by Neha Ojha almost 4 years ago

  • Priority changed from Urgent to High
Actions #3

Updated by Neha Ojha almost 4 years ago

  • Priority changed from High to Normal
Actions #4

Updated by Laura Flores 6 months ago

  • Tags set to test-failure

/a/yuriw-2023-11-01_21:37:41-rados-wip-yuri6-testing-2023-11-01-0745-reef-distro-default-smithi/7443892

Actions #5

Updated by Radoslaw Zarzynski 6 months ago

The test essentially queries ceph status for error flags, so the symptom is fairly generic and many different paths could lead to it. This occurrence could have a new cause.
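
To illustrate why the symptom is generic, the comparisons visible in the log from the description can be sketched roughly as below. This is a simplified sketch, not the script's actual code: the function name check_recovery_full and its arguments are hypothetical, and the severity and message are passed in directly instead of being extracted from the ceph status JSON with jq. When the PG_RECOVERY_FULL health check is absent, both values arrive as null, so both comparisons fail no matter why the check never appeared.

```shell
# Hypothetical helper (not part of osd-recovery-space.sh): compares a
# PG_RECOVERY_FULL severity and summary message against the expected values
# seen in the log. Prints one line per mismatch; the exit status is the
# number of mismatches (0, 1 or 2).
check_recovery_full() {
    local sev="$1"
    local msg="$2"
    local expected_msg="Full OSDs blocking recovery: 1 pg recovery_toofull"
    local errors=0

    if [ "$sev" != "HEALTH_ERR" ]; then
        echo "PG_RECOVERY_FULL severity $sev not HEALTH_ERR"
        errors=$((errors + 1))
    fi
    if [ "$msg" != "$expected_msg" ]; then
        echo "PG_RECOVERY_FULL message '$msg' mismatched"
        errors=$((errors + 1))
    fi
    return "$errors"
}
```

Calling check_recovery_full null null prints the same two stdout lines as the failing run and exits with status 2, which says nothing about the underlying cause.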

Actions #6

Updated by Matan Breizman 15 days ago

/a/yuriw-2024-04-16_23:25:35-rados-wip-yuriw-testing-20240416.150233-distro-default-smithi/7659542

Actions #7

Updated by Radoslaw Zarzynski 10 days ago

  • Assignee set to Nitzan Mordechai

Hi Nitzan, would you mind taking a look?

Actions #8

Updated by Nitzan Mordechai 3 days ago

  • Status changed from New to In Progress

From /a/yuriw-2024-04-16_23:25:35-rados-wip-yuriw-testing-20240416.150233-distro-default-smithi/7659542
we can see that the recovery_toofull flag is not set (yet?):

2024-04-17T03:56:53.852 INFO:tasks.workunit.client.0.smithi138.stdout:PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES    OMAP_BYTES*  OMAP_KEYS*  LOG  LOG_DUPS  DISK_LOG  STATE         STATE_STAMP                      VERSION  REPORTED  UP     UP_PRIMARY  ACTING  ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP                      LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN  LAST_SCRUB_DURATION  SCRUB_SCHEDULING                                            OBJECTS_SCRUBBED  OBJECTS_TRIMMED
2024-04-17T03:56:53.853 INFO:tasks.workunit.client.0.smithi138.stdout:1.0          600                   0         0          0        0  3072000            0           0  600         0       600  active+clean  2024-04-17T03:56:49.179149+0000   23'600   32:1842  [1,0]           1   [1,0]               1         0'0  2024-04-17T03:55:16.961244+0000              0'0  2024-04-17T03:55:16.961244+0000              0                    0  periodic scrub scheduled @ 2024-04-18T11:04:26.442350+0000                 0                0
2024-04-17T03:56:53.853 INFO:tasks.workunit.client.0.smithi138.stdout:
2024-04-17T03:56:53.853 INFO:tasks.workunit.client.0.smithi138.stdout:* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
2024-04-17T03:56:53.853 INFO:tasks.workunit.client.0.smithi138.stderr:dumped pgs
2024-04-17T03:56:53.863 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:67: wait_for_state:  return 1
2024-04-17T03:56:53.863 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:136: TEST_recovery_test_simple:  ERRORS=0
2024-04-17T03:56:53.864 INFO:tasks.workunit.client.0.smithi138.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:137: TEST_recovery_test_simple:  ceph pg dump pgs
2024-04-17T03:56:53.864 INFO:tasks.workunit.client.0.smithi138.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:137: TEST_recovery_test_simple:  grep +recovery_toofull
2024-04-17T03:56:53.865 INFO:tasks.workunit.client.0.smithi138.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-recovery-space.sh:137: TEST_recovery_test_simple:  wc -l

osd-recovery-space.sh waits for the too-full state:
# If this times out, we'll detected errors below
wait_for_recovery_toofull 30

But the recovery_toofull flag never appeared. The 600 objects had not been completely written to the OSDs, which is why the flag was never set.
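
The wait-then-check pattern discussed above can be sketched as follows. This is a minimal sketch, not the helper from osd-recovery-space.sh: count_in_state and wait_for_pg_state are hypothetical names, the PG dump comes from a command passed in as arguments rather than from a direct ceph pg dump pgs call, and the one-second polling interval is an assumption.

```shell
# Hypothetical helper: count the lines on stdin that contain the given PG
# state as a fixed string.
count_in_state() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c -F -e "$1" || true
}

# Hypothetical helper: poll a dump command until at least one PG reports the
# state, or give up after roughly TIMEOUT_SECONDS.
# Usage: wait_for_pg_state STATE TIMEOUT_SECONDS DUMP_COMMAND [ARGS...]
wait_for_pg_state() {
    local state="$1"
    local timeout="$2"
    shift 2
    local waited=0

    while true; do
        # "$@" is the dump command; its output is scanned for the state.
        if [ "$("$@" | count_in_state "$state")" -gt 0 ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

If the only PG stays active+clean, as in the dump from this run, a wait of this shape returns non-zero once the timeout expires, and the checks that follow it then report the errors.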

Actions #9

Updated by Nitzan Mordechai 1 day ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 57193