Bug #41236

cosbench failures in rados/perf

Added by Neha Ojha 11 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

The failure starts here:

2019-08-14T01:50:08.678 INFO:teuthology.orchestra.run.smithi163.stderr:01:50:08 - DEBUG    - cbt      - get_fqdn_local()=smithi163.front.sepia.ceph.com
2019-08-14T01:50:08.679 INFO:teuthology.orchestra.run.smithi163.stderr:01:50:08 - DEBUG    - cbt      - CheckedPopen continue_if_error=True, shell=True args=radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-14T01:50:08.729 INFO:teuthology.orchestra.run.smithi163.stderr:01:50:08 - WARNING  - cbt      - radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-14T01:50:08.730 INFO:teuthology.orchestra.run.smithi163.stderr:01:50:08 - WARNING  - cbt      - error 242 seen, continuing anyway...

/a/nojha-2019-08-13_23:51:37-rados-wip-bluefs-shared-alloc-2018-08-13-distro-basic-smithi/4213208/

History

#1 Updated by Kefu Chai 11 months ago

rerunning the tests with more verbose logging.

http://pulpito.ceph.com/kchai-2019-08-14_03:56:19-perf-basic-master-distro-basic-mira/

2019-08-14T04:11:56.630 INFO:teuthology.orchestra.run.mira117.stderr:04:11:56 - WARNING  - cbt      - radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-14T04:11:56.630 INFO:teuthology.orchestra.run.mira117.stderr:04:11:56 - WARNING  - cbt      -
2019-08-14T04:11:56.631 INFO:teuthology.orchestra.run.mira117.stderr:04:11:56 - WARNING  - cbt      - could not create key: unable to add access key, empty secret key
2019-08-14T04:11:56.631 INFO:teuthology.orchestra.run.mira117.stderr:
2019-08-14T04:11:56.631 INFO:teuthology.orchestra.run.mira117.stderr:04:11:56 - WARNING  - cbt      - error 242 seen, continuing anyway...
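The "error 242 seen, continuing anyway..." pattern in the log above can be illustrated with a minimal sketch. This is not cbt's actual CheckedPopen code: `run_checked` is a hypothetical helper, and a small Python child process stands in for the failing radosgw-admin call so the example runs anywhere.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("cbt")


def run_checked(args, continue_if_error=True):
    """Run args and return (returncode, stdout, stderr).

    Hypothetical stand-in for a checked-popen wrapper: on a non-zero
    exit it logs the command and its captured output, then either
    raises or continues, depending on continue_if_error.
    """
    proc = subprocess.run(args, capture_output=True, text=True)
    if proc.returncode != 0:
        logger.warning("%s", " ".join(args))
        logger.warning("%s", proc.stdout)
        logger.warning("%s", proc.stderr)
        if not continue_if_error:
            raise subprocess.CalledProcessError(
                proc.returncode, args, proc.stdout, proc.stderr
            )
        logger.warning("error %d seen, continuing anyway...", proc.returncode)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for the failing command: prints the error text and exits 242.
failing = [
    sys.executable,
    "-c",
    "import sys; print('could not create key', file=sys.stderr); sys.exit(242)",
]
rc, out, err = run_checked(failing)
```

With continue_if_error=True the caller still gets the non-zero return code back, so whether the run is marked as failed depends on what the caller does with it.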

#2 Updated by Kefu Chai 11 months ago

Neha, I reverted cbt to ff779d212f5fb9bae6947952ac40e32308ceead5 and reran the cosbench tests. The tests were still green, and all of them passed without error or warning messages.

But when I print out stderr and stdout, I get:

['pdsh', '-f', '1', '-R', 'ssh', '-w', 'ubuntu@ubuntu@mira059.front.sepia.ceph.com', 'radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift']
06:31:22 - DEBUG    - cbt      - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@ubuntu@mira059.front.sepia.ceph.com radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
========== OK ==============

ubuntu@mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
ubuntu@mira059: could not create key: unable to add access key, empty secret key
pdsh@mira059: ubuntu@mira059: ssh exited with exit code 242

Per the pdsh(1) man page (https://linux.die.net/man/1/pdsh):

-S
Return the largest of the remote command return values.

So I reran the pdsh command manually, without -S:

[ubuntu@mira059 cbt]$ pdsh -f 1 -R ssh -w ubuntu@ubuntu@ubuntu@mira059.front.sepia.ceph.com 'radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift'
ubuntu@ubuntu@mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
ubuntu@ubuntu@mira059: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
pdsh@mira059: ubuntu@ubuntu@mira059: ssh exited with exit code 255
[ubuntu@mira059 cbt]$ echo $?
0

And then with the -S option:

[ubuntu@mira059 cbt]$ pdsh -S -f 1 -R ssh -w ubuntu@ubuntu@ubuntu@mira059.front.sepia.ceph.com 'radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift'
ubuntu@ubuntu@mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
ubuntu@ubuntu@mira059: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
pdsh@mira059: ubuntu@ubuntu@mira059: ssh exited with exit code 255
[ubuntu@mira059 cbt]$ echo $?
255

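So I think this is not a regression: without -S, pdsh exits with 0 even when the remote command fails, so these failures were already happening but went unnoticed.

Our change just makes the failure more visible.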
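The two manual runs above can be summarised as a small model. This is only a sketch of the behaviour observed in this comment (exit status 0 without -S, 255 with -S), not pdsh's implementation; `pdsh_exit_status` and its parameters are hypothetical names.

```python
def pdsh_exit_status(remote_codes, largest_remote=False):
    """Model the exit status of pdsh as observed in the runs above.

    remote_codes: exit codes of the remote commands (one per node).
    largest_remote: True models the -S option, which returns the
    largest of the remote command return values.
    """
    if not largest_remote:
        # Without -S the remote failure is not propagated.
        return 0
    return max(remote_codes, default=0)


without_s = pdsh_exit_status([255])
with_s = pdsh_exit_status([255], largest_remote=True)
```

A wrapper that only checks the exit status of pdsh therefore sees success in the first case, even though ssh exited with 255.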

Also, I found two more commands that fail while still being reported as OK:

06:50:17 - DEBUG    - cbt      - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@ubuntu@mira059.front.sepia.ceph.com swift -A http://172.21.6.132:80/auth/v1.0 -U cosbench:operator -K intel2012 list
======= OK ========
stdout
stderr ubuntu@mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
ubuntu@mira059: bash: swift: command not found
pdsh@mira059: ubuntu@mira059: ssh exited with exit code 127

06:50:18 - DEBUG    - cbt      - Pausing for 60s for idle monitoring.
06:50:18 - DEBUG    - cbt      - Nodes : ubuntu@mira059.front.sepia.ceph.com
06:50:18 - DEBUG    - cbt      - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@mira059.front.sepia.ceph.com mkdir -p -m0755 -- /tmp/cbt/00000000/Cosbench/osd_ra-00004096/op_size-64KB/concurrent_procs-001/containers-00100/objects-00100/write/idle_monitoring/collectl
06:50:18 - DEBUG    - cbt      - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@mira059.front.sepia.ceph.com collectl -s+mYZ -i 1:10 --rawdskfilt "+cciss/c\d+d\d+ |hd[ab] | sd[a-z]+ |dm-\d+ |xvd[a-z] |fio[a-z]+ | vd[a-z]+ |emcpower[a-z]+ |psv\d+ |nvme[0-9]n[0-9]+p[0-9]+ " -F0 -f /tmp/cbt/00000000/Cosbench/osd_ra-00004096/op_size-64KB/concurrent_procs-001/containers-00100/objects-00100/write/idle_monitoring/collectl
06:51:18 - DEBUG    - cbt      - Nodes : ubuntu@mira059.front.sepia.ceph.com
06:51:18 - DEBUG    - cbt      - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@mira059.front.sepia.ceph.com killall -SIGINT -f collectl
======= OK ========
stdout
stderr mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
mira059: Usage: killall [-Z CONTEXT] [-u USER] [ -eIgiqrvw ] [ -SIGNAL ] NAME...
mira059:        killall -l, --list
mira059:        killall -V, --version
mira059:
mira059:   -e,--exact          require exact match for very long names
mira059:   -I,--ignore-case    case insensitive process name match
mira059:   -g,--process-group  kill process group instead of process
mira059:   -y,--younger-than   kill processes younger than TIME
mira059:   -o,--older-than     kill processes older than TIME
mira059:   -i,--interactive    ask for confirmation before killing
mira059:   -l,--list           list all known signal names
mira059:   -q,--quiet          don't print complaints
mira059:   -r,--regexp         interpret NAME as an extended regular expression
mira059:   -s,--signal SIGNAL  send this signal instead of SIGTERM
mira059:   -u,--user USER      kill only process(es) running as USER
mira059:   -v,--verbose        report if the signal was successfully sent
mira059:   -V,--version        display version information
mira059:   -w,--wait           wait for processes to die
mira059:   -Z,--context REGEXP kill only process(es) having context
mira059:                       (must precede other arguments)
mira059:
pdsh@mira059: mira059: ssh exited with exit code 1

#3 Updated by Neha Ojha 11 months ago

The issue of cosbench not gathering test results is fixed by reverting https://github.com/ceph/cbt/pull/152.
Revert branch: https://github.com/neha-ojha/cbt/tree/wip-fix-cosbench-sync
Test runs: http://pulpito.ceph.com/nojha-2019-08-15_17:58:44-perf-basic-master-distro-basic-smithi/

Another test run (http://pulpito.ceph.com/nojha-2019-08-15_19:12:22-rados:perf-master-distro-basic-smithi/) results in this failure: http://pulpito.ceph.com/nojha-2019-08-15_19:12:22-rados:perf-master-distro-basic-smithi/4217364/

Comparing logs

FAILED:

2019-08-15T19:39:17.187 INFO:teuthology.orchestra.run.smithi175.stderr:19:39:17 - DEBUG    - cbt      - get_fqdn_local()=smithi175.front.sepia.ceph.com
2019-08-15T19:39:17.187 INFO:teuthology.orchestra.run.smithi175.stderr:19:39:17 - DEBUG    - cbt      - CheckedPopen continue_if_error=True, shell=True args=radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-15T19:39:17.236 INFO:teuthology.orchestra.run.smithi175.stderr:19:39:17 - WARNING  - cbt      - radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-15T19:39:17.236 INFO:teuthology.orchestra.run.smithi175.stderr:19:39:17 - WARNING  - cbt      - error 242 seen, continuing anyway...

PASSED:

2019-08-15T19:36:37.506 INFO:teuthology.orchestra.run.smithi142.stderr:19:36:37 - DEBUG    - cbt      - get_fqdn_local()=smithi142
2019-08-15T19:36:37.507 INFO:teuthology.orchestra.run.smithi142.stderr:19:36:37 - DEBUG    - cbt      - CheckedPopen continue_if_error=True, shell=False args=pdsh -f 1 -R ssh -w ubuntu@smithi142.front.sepia.ceph.com radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift

This difference in args and in the value of shell seems to be the cause. It could happen if get_localnode() returns true for some reason:

https://github.com/ceph/cbt/pull/189/files#diff-cf5a98d3b8ded947cd1a470ce7e3769eL100-L105
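The suspected mechanism can be sketched as follows. This is a guess at the shape of the logic, not the code from the pull request: `build_invocation` is a hypothetical helper, and the node name used for the failed run is an assumption, since the log does not show it.

```python
def build_invocation(node, command, local_fqdn):
    """Return (args, shell) for running command on node.

    Hypothetical illustration: if the node's host part equals the
    local FQDN, the command is run directly through a shell;
    otherwise it is wrapped in pdsh over ssh.
    """
    host = node.split("@")[-1]
    if host == local_fqdn:
        return command, True
    return ["pdsh", "-f", "1", "-R", "ssh", "-w", node, command], False


cmd = ("radosgw-admin key create --uid=cosbench "
       "--subuser=cosbench:operator --key-type=swift")

# FAILED run: get_fqdn_local() returned the full name, so the node
# (assumed here to be ubuntu@smithi175.front.sepia.ceph.com) is local.
failed_args, failed_shell = build_invocation(
    "ubuntu@smithi175.front.sepia.ceph.com", cmd,
    "smithi175.front.sepia.ceph.com")

# PASSED run: get_fqdn_local() returned the short name, which does not
# match the node's FQDN, so the command goes through pdsh.
passed_args, passed_shell = build_invocation(
    "ubuntu@smithi142.front.sepia.ceph.com", cmd, "smithi142")
```

Under these assumptions the failed run gets shell=True with the bare command and the passed run gets shell=False with the pdsh wrapper, matching the two log excerpts.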

#4 Updated by Kefu Chai 11 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Kefu Chai
  • Pull request ID set to 191

#5 Updated by Kefu Chai 11 months ago

  • Pull request ID deleted (191)

#6 Updated by Neha Ojha about 1 month ago

  • Status changed from Fix Under Review to Resolved
