Bug #57883

test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5\>'"

Added by Laura Flores over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-10-13_17:24:48-rados-main-distro-default-smithi/7065580

2022-10-14T10:49:22.699 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-code.sh:91: rados_put_get:  get_osds ecpool SOMETHING
2022-10-14T10:49:22.700 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1039: get_osds:  local poolname=ecpool
2022-10-14T10:49:22.700 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1040: get_osds:  local objectname=SOMETHING
2022-10-14T10:49:22.701 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-code.sh:91: rados_put_get:  grep '\<5\>'
2022-10-14T10:49:22.701 INFO:tasks.workunit.client.0.smithi187.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1043: get_osds:  ceph --format json osd map ecpool SOMETHING
2022-10-14T10:49:22.702 INFO:tasks.workunit.client.0.smithi187.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1043: get_osds:  jq '.acting | .[]'
2022-10-14T10:49:23.456 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1043: get_osds:  local 'osds=6
2022-10-14T10:49:23.457 INFO:tasks.workunit.client.0.smithi187.stderr:0
2022-10-14T10:49:23.457 INFO:tasks.workunit.client.0.smithi187.stderr:7
2022-10-14T10:49:23.457 INFO:tasks.workunit.client.0.smithi187.stderr:5'
2022-10-14T10:49:23.458 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1045: get_osds:  echo 6 0 7 5
2022-10-14T10:49:23.458 INFO:tasks.workunit.client.0.smithi187.stdout:6 0 7 5
2022-10-14T10:49:23.460 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-code.sh:91: rados_put_get:  return 1
2022-10-14T10:49:23.461 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-code.sh:210: TEST_rados_put_get_jerasure:  return 1

This test marks an OSD out and then checks that it is no longer in the acting set. In this case, the OSD that had been marked out (osd.5) was still present in the acting set, so the grep '\<5\>' check matched and rados_put_get returned 1.
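The membership check itself is a word-boundary grep over the acting set returned by get_osds. A simplified sketch of the failing check, using the acting set observed in the log above (the surrounding script logic is trimmed):

```shell
# Sketch of the check at test-erasure-code.sh:91; the acting set is the
# one reported by get_osds in this failure.
acting="6 0 7 5"

# '\<5\>' matches 5 only as a whole word, so an osd.15 or osd.51 in the
# acting set would not cause a false positive.
if echo "$acting" | grep -q '\<5\>'; then
    echo "osd.5 still in acting set"      # the test then returns 1, as seen here
else
    echo "osd.5 no longer in acting set"  # the expected outcome
fi
```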

osd.5 was marked out here in the mon.a log:

2022-10-14T10:49:22.045+0000 7fad73819700  5 --2- v2:127.0.0.1:7101/0 >> 127.0.0.1:0/3515117041 conn(0x560849b97800 0x5608490d4c00 crc :-1 s=READ_MESSAGE_COMPLETE pgs=1 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_message received message m=0x5608498f5680 seq=6 from=client.? type=50 mon_command({"prefix": "osd out", "ids": ["5"]} v 0) v1
2022-10-14T10:49:22.045+0000 7fad73819700 20 -- v2:127.0.0.1:7101/0 queue 0x5608498f5680 prio 127
2022-10-14T10:49:22.045+0000 7fad73819700 20 --2- v2:127.0.0.1:7101/0 >> 127.0.0.1:0/3515117041 conn(0x560849b97800 0x5608490d4c00 crc :-1 s=READY pgs=1 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).read_frame
2022-10-14T10:49:22.045+0000 7fad73819700 20 -- v2:127.0.0.1:7101/0 >> 127.0.0.1:0/3515117041 conn(0x560849b97800 msgr2=0x5608490d4c00 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read start len=32
2022-10-14T10:49:22.045+0000 7fad7581d700  1 -- v2:127.0.0.1:7101/0 <== client.? 127.0.0.1:0/3515117041 6 ==== mon_command({"prefix": "osd out", "ids": ["5"]} v 0) v1 ==== 77+0+0 (crc 0 0 0) 0x5608498f5680 con 0x560849b97800
2022-10-14T10:49:22.045+0000 7fad7581d700 20 mon.a@0(leader) e1 _ms_dispatch existing session 0x56084a17fb00 for client.?
2022-10-14T10:49:22.045+0000 7fad7581d700 20 mon.a@0(leader) e1  entity_name client.admin global_id 5021 (none) caps allow *
2022-10-14T10:49:22.045+0000 7fad7581d700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "osd out", "ids": ["5"]} v 0) v1
2022-10-14T10:49:22.045+0000 7fad7581d700 20 is_capable service=osd command=osd out read write addr 127.0.0.1:0/3515117041 on cap allow *
2022-10-14T10:49:22.045+0000 7fad7581d700 20  allow so far , doing grant allow *
2022-10-14T10:49:22.045+0000 7fad7581d700 20  allow all
2022-10-14T10:49:22.045+0000 7fad7581d700 10 mon.a@0(leader) e1 _allowed_command capable
2022-10-14T10:49:22.045+0000 7fad7581d700  0 log_channel(audit) log [INF] : from='client.? 127.0.0.1:0/3515117041' entity='client.admin' cmd=[{"prefix": "osd out", "ids": ["5"]}]: dispatch
2022-10-14T10:49:22.045+0000 7fad7581d700  1 -- v2:127.0.0.1:7101/0 --> v2:127.0.0.1:7101/0 -- log(1 entries from seq 1295 at 2022-10-14T10:49:22.050967+0000) v1 -- 0x560849a548c0 con 0x560847ca5800
2022-10-14T10:49:22.045+0000 7fad7581d700 20 -- v2:127.0.0.1:7101/0 >> v2:127.0.0.1:7101/0 conn(0x560847ca5800 msgr2=0x560847f7c800 unknown :-1 s=STATE_NONE l=0).send_message log(1 entries from seq 1295 at 2022-10-14T10:49:22.050967+0000) v1 local
2022-10-14T10:49:22.045+0000 7fad7501c700 20 -- v2:127.0.0.1:7101/0 queue 0x560849a548c0 prio 196
2022-10-14T10:49:22.045+0000 7fad7581d700 10 mon.a@0(leader).paxosservice(osdmap 1..126) dispatch 0x5608498f5680 mon_command({"prefix": "osd out", "ids": ["5"]} v 0) v1 from client.? 127.0.0.1:0/3515117041 con 0x560849b97800
2022-10-14T10:49:22.045+0000 7fad7581d700  5 mon.a@0(leader).paxos(paxos active c 754..1417) is_readable = 1 - now=2022-10-14T10:49:22.051040+0000 lease_expire=1970-01-01T00:00:00.000000+0000 has v0 lc 1417
2022-10-14T10:49:22.045+0000 7fad7581d700 10 mon.a@0(leader).osd e126 preprocess_query mon_command({"prefix": "osd out", "ids": ["5"]} v 0) v1 from client.? 127.0.0.1:0/3515117041
2022-10-14T10:49:22.045+0000 7fad7581d700  7 mon.a@0(leader).osd e126 prepare_update mon_command({"prefix": "osd out", "ids": ["5"]} v 0) v1 from client.? 127.0.0.1:0/3515117041
2022-10-14T10:49:22.045+0000 7fad7581d700 -1 mon.a@0(leader).osd e126 definitely_dead 0
2022-10-14T10:49:22.045+0000 7fad7581d700  0 log_channel(cluster) log [INF] : Client client.admin marked osd.5 out, while it was still marked up

When the script marked osd.5 out (test-erasure-code.sh:90), the command reported:

2022-10-14T10:49:21.756 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-code.sh:90: rados_put_get:  ceph osd out 5
2022-10-14T10:49:22.685 INFO:tasks.workunit.client.0.smithi187.stderr:osd.5 is already out.
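Marking an OSD out does not remove it from the acting set until the PG has repeered, so a single-shot check can race with the osdmap and peering updates. A hypothetical polling helper in the style of the wait_for_* functions in qa/standalone/ceph-helpers.sh (the name, timeout, and usage below are illustrative, not the fix that was merged for this tracker):

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts
# run out. Modeled on the wait_for_* helpers in ceph-helpers.sh.
wait_until() {
    local tries=$1; shift
    local i
    for ((i = 0; i < tries; i++)); do
        "$@" && return 0
        sleep 1
    done
    return 1
}

# Against a live cluster, the check could then poll until osd.5 leaves
# the acting set (illustration only; requires a running cluster):
#   wait_until 30 sh -c \
#     "! ceph --format json osd map ecpool SOMETHING | jq '.acting | .[]' | grep -q '\\<5\\>'"
```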

The acting set (which is later used to check whether the out OSD is still present) was retrieved here:

2022-10-14T10:49:23.458 INFO:tasks.workunit.client.0.smithi187.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1045: get_osds:  echo 6 0 7 5
2022-10-14T10:49:23.458 INFO:tasks.workunit.client.0.smithi187.stdout:6 0 7 5
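For reference, get_osds (ceph-helpers.sh:1043) obtains that list by asking the mon for the object's PG mapping in JSON and extracting the acting set with jq. A self-contained sketch against a trimmed, hypothetical sample of the `ceph --format json osd map` output (field names beyond "acting" are illustrative):

```shell
# Trimmed, hypothetical sample of `ceph --format json osd map ecpool SOMETHING`;
# the real output carries many more fields.
json='{"epoch":126,"pool":"ecpool","pgid":"2.0","up":[6,0,7,5],"acting":[6,0,7,5]}'

# The real helper pipes through `jq '.acting | .[]'`; the sed expression
# below is an equivalent for this sketch that avoids the jq dependency.
acting=$(echo "$json" | sed -n 's/.*"acting":\[\([0-9,]*\)\].*/\1/p' | tr ',' ' ')
echo "$acting"    # -> 6 0 7 5, matching the echo at ceph-helpers.sh:1045
```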


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #15211: tests: test-erasure-code.sh failing sometimes during 'make check' on master (Resolved, 03/18/2016)
