Bug #46405

osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1

Added by Neha Ojha over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
nautilus,octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-07-07T23:54:01.124 INFO:tasks.workunit.client.0.smithi022.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:227: TEST_rados_repair_warning:  ceph pg 2.0 query
2020-07-07T23:54:01.124 INFO:tasks.workunit.client.0.smithi022.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:227: TEST_rados_repair_warning:  jq .info.stats.stat_sum.num_objects_repaired
2020-07-07T23:54:01.260 INFO:tasks.workunit.client.0.smithi022.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:227: TEST_rados_repair_warning:  COUNT=21
2020-07-07T23:54:01.260 INFO:tasks.workunit.client.0.smithi022.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:228: TEST_rados_repair_warning:  expr 11 '*' 2
2020-07-07T23:54:01.262 INFO:tasks.workunit.client.0.smithi022.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:228: TEST_rados_repair_warning:  test 21 = 22
2020-07-07T23:54:01.262 INFO:tasks.workunit.client.0.smithi022.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:228: TEST_rados_repair_warning:  return 1
2020-07-07T23:54:01.262 INFO:tasks.workunit.client.0.smithi022.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:42: run:  return 1

/a/nojha-2020-07-07_21:05:58-rados:standalone-master-distro-basic-smithi/5207209
/a/nojha-2020-07-07_21:05:58-rados:standalone-master-distro-basic-smithi/5207210


Related issues

Related to RADOS - Feature #41564: Issue health status warning if num_shards_repaired exceeds some threshold Resolved
Copied to RADOS - Backport #47825: nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1 Resolved
Copied to RADOS - Backport #47826: octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1 Resolved

History

#1 Updated by Neha Ojha over 3 years ago

  • Related to Feature #41564: Issue health status warning if num_shards_repaired exceeds some threshold added

#3 Updated by Neha Ojha over 3 years ago

  • Backport set to nautilus,octopus

Since the original feature is being backported to nautilus and octopus.

/a/yuriw-2020-07-06_17:23:10-rados-wip-yuri8-testing-2020-07-01-2358-octopus-distro-basic-smithi/5203825

#4 Updated by Neha Ojha over 3 years ago

  • Priority changed from High to Urgent

/a/yuriw-2020-07-13_23:06:23-rados-wip-yuri5-testing-2020-07-13-1944-octopus-distro-basic-smithi/5224649

#5 Updated by David Zafman over 3 years ago

I'm not seeing this on my build machine using run-standalone.sh

#6 Updated by Kefu Chai over 3 years ago

/a/kchai-2020-07-27_15:50:48-rados-wip-kefu-testing-2020-07-27-2127-distro-basic-smithi/5261869

#7 Updated by Brad Hubbard over 3 years ago

/a/yuriw-2020-08-06_00:31:28-rados-wip-yuri8-testing-octopus-distro-basic-smithi/5291111

#8 Updated by Brad Hubbard over 3 years ago

/a/yuriw-2020-08-27_00:49:53-rados-wip-yuri8-testing-2020-08-26-2329-octopus-distro-basic-smithi/5379176/

#9 Updated by Brad Hubbard over 3 years ago

  • Assignee set to Brad Hubbard

Here's the actual problem I think. Working on a fix.

2020-08-27T07:07:31.848 INFO:tasks.workunit.client.0.smithi098.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:188: TEST_rados_repair_warning:  local obj-base=obj-warn-
2020-08-27T07:07:31.849 INFO:tasks.workunit.client.0.smithi098.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh: line 188: local: `obj-base=obj-warn-': not a valid identifier

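For reference, a minimal sketch of why line 188 fails: bash variable names may contain only letters, digits and underscores, so `local obj-base=...` is rejected at run time and the variable is never set. The function names below are hypothetical and exist only for illustration; they are not part of osd-rep-recov-eio.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration, not taken from the Ceph test script.

bad_local() {
    # A hyphen is not allowed in a bash identifier, so `local` reports
    # "not a valid identifier" and returns a non-zero status.
    local obj-base=obj-warn- 2>/dev/null
}

good_local() {
    # Same assignment with a valid identifier.
    local objbase=obj-warn-
    echo "${objbase}1"
}

if bad_local; then
    echo "bad_local: ok"
else
    echo "bad_local: failed"
fi
good_local
```

Because the failed `local` only returns a non-zero status, a script that does not check that status keeps running with the variable unset, which would explain the odd object names.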
#10 Updated by Kefu Chai over 3 years ago

2020-09-10T21:46:03.940 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:172: TEST_rados_get_with_eio:  rados_get_data eio td/osd-rep-recov-eio.sh
...
2020-09-10T21:46:16.890 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:143: rados_get_data:  rados_get td/osd-rep-recov-eio.sh pool-rep obj-eio-859439
2020-09-10T21:46:16.891 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:80: rados_get:  local dir=td/osd-rep-recov-eio.sh
2020-09-10T21:46:16.891 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:81: rados_get:  local poolname=pool-rep
2020-09-10T21:46:16.891 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:82: rados_get:  local objname=obj-eio-859439
2020-09-10T21:46:16.891 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:83: rados_get:  local expect=ok
2020-09-10T21:46:16.891 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:88: rados_get:  '[' ok = fail ']'
2020-09-10T21:46:16.891 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:96: rados_get:  '[' ok = hang ']'
2020-09-10T21:46:16.892 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:105: rados_get:  rados --pool pool-rep get obj-eio-859439 td/osd-rep-recov-eio.sh/COPY
2020-09-10T21:46:17.017 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:106: rados_get:  diff td/osd-rep-recov-eio.sh/ORIGINAL td/osd-rep-recov-eio.sh/COPY
2020-09-10T21:46:17.017 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:107: rados_get:  rm td/osd-rep-recov-eio.sh/COPY
2020-09-10T21:46:17.019 INFO:tasks.workunit.client.0.smithi007.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:145: rados_get_data:  ceph pg 2.0 query
2020-09-10T21:46:17.019 INFO:tasks.workunit.client.0.smithi007.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:145: rados_get_data:  jq .info.stats.stat_sum.num_objects_repaired
2020-09-10T21:46:17.154 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:145: rados_get_data:  COUNT=2
2020-09-10T21:46:17.155 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:146: rados_get_data:  test 2 = 3
2020-09-10T21:46:17.155 INFO:tasks.workunit.client.0.smithi007.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:146: rados_get_data:  return 1

/a/kchai-2020-09-10_16:44:13-rados-wip-kefu-testing-2020-09-10-1633-distro-basic-smithi/5421813/teuthology.log

#11 Updated by Brad Hubbard over 3 years ago

Kefu,

/a/kchai-2020-09-10_16:44:13-rados-wip-kefu-testing-2020-09-10-1633-distro-basic-smithi/5421813/teuthology.log may be a different problem since it's happening in TEST_rados_get_with_eio (earlier than TEST_rados_repair_warning) and not showing the 'not a valid identifier' message.

#12 Updated by Kefu Chai over 3 years ago

Brad, thanks. Will create a separate ticket.

#13 Updated by Neha Ojha over 3 years ago

2020-09-22T19:40:43.835 INFO:tasks.workunit.client.0.smithi134.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:228: TEST_rados_repair_warning:  expr 11 '*' 2
2020-09-22T19:40:43.837 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:228: TEST_rados_repair_warning:  test 21 = 22
2020-09-22T19:40:43.837 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:228: TEST_rados_repair_warning:  return 1
2020-09-22T19:40:43.837 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-rep-recov-eio.sh:42: run:  return 1

/a/teuthology-2020-09-22_07:01:02-rados-master-distro-basic-smithi/5458830

#14 Updated by Neha Ojha over 3 years ago

/a/teuthology-2020-09-25_07:01:01-rados-master-distro-basic-smithi/5466817

#15 Updated by David Zafman over 3 years ago

This change fixes the odd object names in the subtest, but it shouldn't help fix this problem. On my build machine, using run-standalone.sh, the subtest passes both with and without the change below. Could we need a short sleep before the query to let the stats update, for all test cases?

$ git diff
diff --git a/qa/standalone/osd/osd-rep-recov-eio.sh b/qa/standalone/osd/osd-rep-recov-eio.sh
index 613bfc316f7..6929e580d9f 100755
--- a/qa/standalone/osd/osd-rep-recov-eio.sh
+++ b/qa/standalone/osd/osd-rep-recov-eio.sh
@@ -185,7 +185,7 @@ function TEST_rados_repair_warning() {
     wait_for_clean || return 1

     local poolname=pool-rep
-    local obj-base=obj-warn-
+    local objbase=obj-warn
     local inject=eio

     for i in $(seq 1 $OBJS)
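
The short sleep suggested in this comment could also be written as a bounded poll, so the test waits only as long as it needs to. The sketch below is a rough illustration, not the change that was merged: `wait_for_value` and `repaired_count` are hypothetical names, and the 30 second timeout is an arbitrary assumption. The `ceph pg 2.0 query` and `jq` invocation is copied from the failing log.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Ceph test suite: run a command
# repeatedly until it prints the expected value or the timeout (seconds)
# is reached.
wait_for_value() {
    local expected=$1
    local timeout=$2
    shift 2
    local actual
    local i
    for ((i = 0; ; i++)); do
        actual=$("$@")
        if [ "$actual" = "$expected" ]; then
            return 0
        fi
        # Stop without sleeping once the timeout is used up.
        if [ "$i" -ge "$timeout" ]; then
            break
        fi
        sleep 1
    done
    echo "timed out: expected $expected, last value $actual" >&2
    return 1
}

# Hypothetical wrapper around the query seen in the log (pg 2.0).
repaired_count() {
    ceph pg 2.0 query | jq .info.stats.stat_sum.num_objects_repaired
}

# Possible use in the subtest, with 22 being the value the log expects
# (expr 11 '*' 2) and 30 an assumed timeout:
#   wait_for_value 22 30 repaired_count || return 1
```

Polling keeps the passing case fast and still fails with a clear message if the counter never reaches the expected value.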

#16 Updated by Neha Ojha over 3 years ago

/a/teuthology-2020-09-29_07:01:02-rados-master-distro-basic-smithi/5480928

#17 Updated by David Zafman over 3 years ago

  • Status changed from New to In Progress
  • Assignee changed from Brad Hubbard to David Zafman
  • Pull request ID set to 37483

#18 Updated by Neha Ojha over 3 years ago

/a/teuthology-2020-09-30_07:01:02-rados-master-distro-basic-smithi/5483631

#19 Updated by Neha Ojha over 3 years ago

  • Status changed from In Progress to Fix Under Review

#20 Updated by Neha Ojha over 3 years ago

  • Status changed from Fix Under Review to Pending Backport

#21 Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47825: nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1 added

#22 Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47826: octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1 added

#23 Updated by David Zafman over 3 years ago

  • Status changed from Pending Backport to Resolved
