Bug #41236
closed
cosbench failures in rados/perf
Description
Starts here:
2019-08-14T01:50:08.678 INFO:teuthology.orchestra.run.smithi163.stderr:01:50:08 - DEBUG - cbt - get_fqdn_local()=smithi163.front.sepia.ceph.com
2019-08-14T01:50:08.679 INFO:teuthology.orchestra.run.smithi163.stderr:01:50:08 - DEBUG - cbt - CheckedPopen continue_if_error=True, shell=True args=radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-14T01:50:08.729 INFO:teuthology.orchestra.run.smithi163.stderr:01:50:08 - WARNING - cbt - radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-14T01:50:08.730 INFO:teuthology.orchestra.run.smithi163.stderr:01:50:08 - WARNING - cbt - error 242 seen, continuing anyway...
/a/nojha-2019-08-13_23:51:37-rados-wip-bluefs-shared-alloc-2018-08-13-distro-basic-smithi/4213208/
Updated by Kefu Chai over 4 years ago
rerunning the tests with more verbose logging.
http://pulpito.ceph.com/kchai-2019-08-14_03:56:19-perf-basic-master-distro-basic-mira/
2019-08-14T04:11:56.630 INFO:teuthology.orchestra.run.mira117.stderr:04:11:56 - WARNING - cbt - radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-14T04:11:56.630 INFO:teuthology.orchestra.run.mira117.stderr:04:11:56 - WARNING - cbt -
2019-08-14T04:11:56.631 INFO:teuthology.orchestra.run.mira117.stderr:04:11:56 - WARNING - cbt - could not create key: unable to add access key, empty secret key
2019-08-14T04:11:56.631 INFO:teuthology.orchestra.run.mira117.stderr:
2019-08-14T04:11:56.631 INFO:teuthology.orchestra.run.mira117.stderr:04:11:56 - WARNING - cbt - error 242 seen, continuing anyway...
Updated by Kefu Chai over 4 years ago
Neha, i reverted the cbt to ff779d212f5fb9bae6947952ac40e32308ceead5 and reran the cosbench tests. tests were still green and all of them passed without error/warning messages.
but if i print out the stderr and stdout, i have:
['pdsh', '-f', '1', '-R', 'ssh', '-w', 'ubuntu@ubuntu@mira059.front.sepia.ceph.com', 'radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift']
06:31:22 - DEBUG - cbt - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@ubuntu@mira059.front.sepia.ceph.com radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
========== OK ==============
ubuntu@mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
ubuntu@mira059: could not create key: unable to add access key, empty secret key
pdsh@mira059: ubuntu@mira059: ssh exited with exit code 242
per the man page of pdsh(1), see https://linux.die.net/man/1/pdsh

    -S    Return the largest of the remote command return values.

so i reran the pdsh command manually:
[ubuntu@mira059 cbt]$ pdsh -f 1 -R ssh -w ubuntu@ubuntu@ubuntu@mira059.front.sepia.ceph.com 'radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift'
ubuntu@ubuntu@mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
ubuntu@ubuntu@mira059: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
pdsh@mira059: ubuntu@ubuntu@mira059: ssh exited with exit code 255
[ubuntu@mira059 cbt]$ echo $?
0
and then with the -S option:
[ubuntu@mira059 cbt]$ pdsh -S -f 1 -R ssh -w ubuntu@ubuntu@ubuntu@mira059.front.sepia.ceph.com 'radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift'
ubuntu@ubuntu@mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
ubuntu@ubuntu@mira059: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
pdsh@mira059: ubuntu@ubuntu@mira059: ssh exited with exit code 255
[ubuntu@mira059 cbt]$ echo $?
255
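so, i think this is not a regression: without -S, pdsh itself exits with 0 even when the remote command fails, so the failure was already there and was simply never reported.
our change just makes the failure visible.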
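To illustrate the point about pdsh exit codes, here is a minimal Python sketch (not cbt's actual code). It assumes pdsh keeps printing its "ssh exited with exit code N" report line on stderr, as in the output pasted above. The helper names `build_pdsh_args` and `remote_exit_codes` are hypothetical: the first adds -S so pdsh returns the largest remote exit code, the second recovers per-host exit codes from the captured stderr for callers that cannot change the pdsh invocation.

```python
import re
from typing import Dict, List

# Matches pdsh's own report line, for example:
#   pdsh@mira059: ubuntu@mira059: ssh exited with exit code 242
_EXIT_RE = re.compile(
    r"^pdsh@[^:]+: (?P<host>\S+): ssh exited with exit code (?P<code>\d+)$"
)


def build_pdsh_args(node: str, command: str, propagate_exit: bool = True) -> List[str]:
    """Build a pdsh argv list (hypothetical helper, not part of cbt).

    With propagate_exit=True, -S is added so that pdsh returns the
    largest of the remote command return values instead of 0.
    """
    args = ["pdsh"]
    if propagate_exit:
        args.append("-S")
    args += ["-f", "1", "-R", "ssh", "-w", node, command]
    return args


def remote_exit_codes(stderr_text: str) -> Dict[str, int]:
    """Extract {host: exit_code} from pdsh stderr output."""
    codes: Dict[str, int] = {}
    for line in stderr_text.splitlines():
        match = _EXIT_RE.match(line.strip())
        if match:
            codes[match.group("host")] = int(match.group("code"))
    return codes
```

Feeding the stderr from the `radosgw-admin key create` run above into `remote_exit_codes` would report 242 for `ubuntu@mira059`, even though pdsh without -S exited with 0.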
also, i found
06:50:17 - DEBUG - cbt - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@ubuntu@mira059.front.sepia.ceph.com swift -A http://172.21.6.132:80/auth/v1.0 -U cosbench:operator -K intel2012 list
======= OK ========
stdout
stderr
ubuntu@mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
ubuntu@mira059: bash: swift: command not found
pdsh@mira059: ubuntu@mira059: ssh exited with exit code 127
06:50:18 - DEBUG - cbt - Pausing for 60s for idle monitoring.
06:50:18 - DEBUG - cbt - Nodes : ubuntu@mira059.front.sepia.ceph.com
06:50:18 - DEBUG - cbt - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@mira059.front.sepia.ceph.com mkdir -p -m0755 -- /tmp/cbt/00000000/Cosbench/osd_ra-00004096/op_size-64KB/concurrent_procs-001/containers-00100/objects-00100/write/idle_monitoring/collectl
06:50:18 - DEBUG - cbt - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@mira059.front.sepia.ceph.com collectl -s+mYZ -i 1:10 --rawdskfilt "+cciss/c\d+d\d+ |hd[ab] | sd[a-z]+ |dm-\d+ |xvd[a-z] |fio[a-z]+ | vd[a-z]+ |emcpower[a-z]+ |psv\d+ |nvme[0-9]n[0-9]+p[0-9]+ " -F0 -f /tmp/cbt/00000000/Cosbench/osd_ra-00004096/op_size-64KB/concurrent_procs-001/containers-00100/objects-00100/write/idle_monitoring/collectl
06:51:18 - DEBUG - cbt - Nodes : ubuntu@mira059.front.sepia.ceph.com
06:51:18 - DEBUG - cbt - CheckedPopen continue_if_error=True args=pdsh -f 1 -R ssh -w ubuntu@mira059.front.sepia.ceph.com killall -SIGINT -f collectl
======= OK ========
stdout
stderr
mira059: Warning: Permanently added 'mira059.front.sepia.ceph.com,172.21.6.132' (ECDSA) to the list of known hosts.
mira059: Usage: killall [-Z CONTEXT] [-u USER] [ -eIgiqrvw ] [ -SIGNAL ] NAME...
mira059: killall -l, --list
mira059: killall -V, --version
mira059:
mira059: -e,--exact require exact match for very long names
mira059: -I,--ignore-case case insensitive process name match
mira059: -g,--process-group kill process group instead of process
mira059: -y,--younger-than kill processes younger than TIME
mira059: -o,--older-than kill processes older than TIME
mira059: -i,--interactive ask for confirmation before killing
mira059: -l,--list list all known signal names
mira059: -q,--quiet don't print complaints
mira059: -r,--regexp interpret NAME as an extended regular expression
mira059: -s,--signal SIGNAL send this signal instead of SIGTERM
mira059: -u,--user USER kill only process(es) running as USER
mira059: -v,--verbose report if the signal was successfully sent
mira059: -V,--version display version information
mira059: -w,--wait wait for processes to die
mira059: -Z,--context REGEXP kill only process(es) having context
mira059: (must precede other arguments)
mira059:
pdsh@mira059: mira059: ssh exited with exit code 1
Updated by Neha Ojha over 4 years ago
The issue of cosbench not gathering test results is fixed by reverting https://github.com/ceph/cbt/pull/152.
Revert branch: https://github.com/neha-ojha/cbt/tree/wip-fix-cosbench-sync
Test runs: http://pulpito.ceph.com/nojha-2019-08-15_17:58:44-perf-basic-master-distro-basic-smithi/
Another test run http://pulpito.ceph.com/nojha-2019-08-15_19:12:22-rados:perf-master-distro-basic-smithi/ results in this failure: http://pulpito.ceph.com/nojha-2019-08-15_19:12:22-rados:perf-master-distro-basic-smithi/4217364/
Comparing logs
FAILED:
2019-08-15T19:39:17.187 INFO:teuthology.orchestra.run.smithi175.stderr:19:39:17 - DEBUG - cbt - get_fqdn_local()=smithi175.front.sepia.ceph.com
2019-08-15T19:39:17.187 INFO:teuthology.orchestra.run.smithi175.stderr:19:39:17 - DEBUG - cbt - CheckedPopen continue_if_error=True, shell=True args=radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-15T19:39:17.236 INFO:teuthology.orchestra.run.smithi175.stderr:19:39:17 - WARNING - cbt - radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
2019-08-15T19:39:17.236 INFO:teuthology.orchestra.run.smithi175.stderr:19:39:17 - WARNING - cbt - error 242 seen, continuing anyway...
PASSED:
2019-08-15T19:36:37.506 INFO:teuthology.orchestra.run.smithi142.stderr:19:36:37 - DEBUG - cbt - get_fqdn_local()=smithi142
2019-08-15T19:36:37.507 INFO:teuthology.orchestra.run.smithi142.stderr:19:36:37 - DEBUG - cbt - CheckedPopen continue_if_error=True, shell=False args=pdsh -f 1 -R ssh -w ubuntu@smithi142.front.sepia.ceph.com radosgw-admin key create --uid=cosbench --subuser=cosbench:operator --key-type=swift
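The difference in args and in the value of shell seems to be the cause: the failed run executes the command locally (shell=True), while the passed run goes through pdsh (shell=False). Note that get_fqdn_local() returns the FQDN in the failed run but only the short hostname in the passed run. This could happen if get_localnode() returns true for some reason: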
https://github.com/ceph/cbt/pull/189/files#diff-cf5a98d3b8ded947cd1a470ce7e3769eL100-L105
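A rough sketch of how such a local-vs-remote decision could produce the two log lines above. This is an assumption for illustration only, not the code from the cbt pull request: `strip_user`, `is_local` and `plan_command` are hypothetical names, and the exact string comparison against the value of get_fqdn_local() is a guess at the mechanism.

```python
from typing import Tuple


def strip_user(node: str) -> str:
    """Drop any 'user@' prefix: 'ubuntu@host' -> 'host'."""
    return node.rsplit("@", 1)[-1]


def is_local(node: str, local_fqdn: str) -> bool:
    # Assumed check: the target host must match the local name exactly.
    return strip_user(node) == local_fqdn


def plan_command(node: str, command: str, local_fqdn: str) -> Tuple[str, bool]:
    """Return (args, shell) in the shape printed by the CheckedPopen log line."""
    if is_local(node, local_fqdn):
        # Local execution: run the command directly through the shell.
        return command, True
    # Remote execution: wrap the command in pdsh.
    return "pdsh -f 1 -R ssh -w {} {}".format(node, command), False
```

Under this assumption, a node named `ubuntu@smithi175.front.sepia.ceph.com` with a local name of `smithi175.front.sepia.ceph.com` takes the local path (shell=True), while `ubuntu@smithi142.front.sepia.ceph.com` with a local name of `smithi142` does not match and falls back to pdsh (shell=False).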
Updated by Kefu Chai over 4 years ago
- Status changed from New to Fix Under Review
- Assignee set to Kefu Chai
- Pull request ID set to 191
Updated by Neha Ojha almost 4 years ago
- Status changed from Fix Under Review to Resolved