Bug #23184

rbd workunit returns exit code 0 on failure

Added by Valerii Shevhcenko about 1 year ago. Updated 12 months ago.

Status: Can't reproduce
Priority: Normal
Target version: -
Start date: 02/28/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Expected: the rbd workunit test returns a non-zero exit code on failure.

Actual: the rbd workunit test returns exit code 0 on failure, which breaks CI integration:

======================================================================
ERROR: test_rbd.TestImage.test_metadata
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/cephuser/cephtest/ceph/src/test/pybind/test_rbd.py", line 831, in test_metadata
    assert_raises(KeyError, self.image.metadata_get, "key1")
  File "/usr/lib64/python2.7/unittest/case.py", line 513, in assertRaises
    callableObj(*args, **kwargs)
  File "rbd.pyx", line 2687, in rbd.Image.metadata_get (/builddir/build/BUILD/ceph-12.2.1/build/src/pybind/rbd/pyrex/rbd.c:25130)
AttributeError: 'rbd.Image' object has no attribute 'key'

----------------------------------------------------------------------
Ran 101 tests in 659.568s

FAILED (SKIP=8, errors=1)

2018-02-19 14:45:43,783 - test_workunit - INFO - Workunit completed successfully
2018-02-19 14:45:43,783 - __main__ - INFO - Test <module 'test_workunit' from '/home/jenkins-build/workspace/ceph-ansible-sanity-3.x/tests/test_workunit.py'> passed

Affected:
We propagate the test status via the return code in our CI scripts:

ec = client.exit_status
if ec == 0:
    log.info("Workunit completed successfully")
else:
    log.info("Error during workunit")
return ec

Thus failures can be missed without a manual log review.
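The propagation pattern described above can be sketched as follows. This is a minimal illustration, not the actual CI code: it assumes a local `subprocess` call in place of the remote `client` object from the snippet, and `run_workunit` is a hypothetical helper name.

```python
import logging
import subprocess

log = logging.getLogger("test_workunit")


def run_workunit(cmd):
    """Run cmd and return its exit status unchanged.

    Assumption: a local subprocess stands in for the remote `client`
    used in the original snippet.
    """
    ec = subprocess.run(cmd).returncode
    if ec == 0:
        log.info("Workunit completed successfully")
    else:
        # Log at error level so a failure stands out in the CI log.
        log.error("Error during workunit (exit status %d)", ec)
    return ec
```

A caller would typically end with `sys.exit(run_workunit([...]))` so that the CI job inherits the status. The wrapper can only be as accurate as the status the workunit itself reports.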

History

#1 Updated by Jason Dillaman about 1 year ago

  • Status changed from New to Need More Info

What workunit is this in reference to? The logs indicate it has something to do with ceph-ansible, so if that's the source of this workunit, this tracker ticket is probably best redirected to that project.

#2 Updated by Vasu Kulkarni about 1 year ago

Jason,

We are trying to run some of the workunits in CI with a Jenkins pipeline. The workunits don't return non-zero for a failed unit test, as shown in that example: ERROR: test_rbd.TestImage.test_metadata. Shouldn't such a failure cause the librbd workunit to exit with non-zero instead of zero?

#3 Updated by Vasu Kulkarni about 1 year ago

  • Status changed from Need More Info to New

#4 Updated by Jason Dillaman about 1 year ago

  • Status changed from New to Need More Info

The question is where did this CI test come from? It's not an RBD test. If it's part of ceph-ansible repo, this ticket should be re-assigned to the ceph-ansible project.

#5 Updated by Vasu Kulkarni about 1 year ago

Jason,

I think the original description is a bit confusing. The CI test just invokes the librbd workunit after ceph-ansible sets up the cluster. What we are asking here is why the librbd workunit does not return non-zero for the failed unit test. I guess this is how it's written in the C++ unit test; we are asking for it to return -1 or another non-zero exit status when the librbd test fails.

But as I was looking further into why it doesn't return non-zero for asserts in unit tests, I found a bug in the workunit itself: regardless of asserts in librbd.py, it ends up returning 0 (link provided below).

https://github.com/ceph/ceph/blob/master/qa/workunits/rbd/test_librbd_python.sh#L12

I think it should just return the exit status from test_rbd.py there.

#6 Updated by Vasu Kulkarni about 1 year ago

This is the assert that does not produce a non-zero exit status on failure. The workunit is being run on an existing cluster.

test_rbd.TestClone.test_flatten_drops_cache ... ok
test_rbd.TestClone.test_flatten_errors ... ok
test_rbd.TestClone.test_flatten_larger_order ... ok
test_rbd.TestClone.test_flatten_multi_level ... ok
test_rbd.TestClone.test_flatten_smaller_order ... ok
test_rbd.TestClone.test_list_children ... ok
test_rbd.TestClone.test_read ... ok
test_rbd.TestClone.test_resize_flatten_multi_level ... ok
test_rbd.TestClone.test_resize_io ... ok
test_rbd.TestClone.test_resize_stat ... ok
test_rbd.TestClone.test_stat ... ok
test_rbd.TestClone.test_unprotect_with_children ... ok
test_rbd.TestClone.test_unprotected ... ok
test_rbd.TestClone.test_with_params ... ok
test_rbd.TestClone.test_with_params2 ... ok
test_rbd.TestClone.test_with_params3 ... SKIP
test_rbd.TestClone.test_write ... ok
test_rbd.TestExclusiveLock.test_acquire_release_lock ... ok
test_rbd.TestExclusiveLock.test_break_lock ... ok
test_rbd.TestExclusiveLock.test_follower_discard ... ok
test_rbd.TestExclusiveLock.test_follower_flatten ... ok
test_rbd.TestExclusiveLock.test_follower_resize ... ok
test_rbd.TestExclusiveLock.test_follower_snap_create ... ok
test_rbd.TestExclusiveLock.test_follower_snap_rollback ... ok
test_rbd.TestExclusiveLock.test_follower_write ... ok
test_rbd.TestExclusiveLock.test_ownership ... ok
test_rbd.TestExclusiveLock.test_read_only_leadership ... ok
test_rbd.TestExclusiveLock.test_snapshot_leadership ... ok
test_rbd.TestImage.test_aio_discard ... ok
test_rbd.TestImage.test_aio_flush ... ok
test_rbd.TestImage.test_aio_read ... ok
test_rbd.TestImage.test_aio_write ... ok
test_rbd.TestImage.test_block_name_prefix ... ok
test_rbd.TestImage.test_copy ... ok
test_rbd.TestImage.test_copy2 ... ok
test_rbd.TestImage.test_copy3 ... SKIP
test_rbd.TestImage.test_create_snap ... ok
test_rbd.TestImage.test_create_timestamp ... ok
test_rbd.TestImage.test_create_with_params ... SKIP
test_rbd.TestImage.test_diff_iterate ... ok
test_rbd.TestImage.test_flags ... ok
test_rbd.TestImage.test_id ... ok
test_rbd.TestImage.test_image_auto_close ... ok
test_rbd.TestImage.test_invalidate_cache ... ok
test_rbd.TestImage.test_large_read ... ok
test_rbd.TestImage.test_large_write ... ok
test_rbd.TestImage.test_limit_snaps ... ok
test_rbd.TestImage.test_list_lockers ... ok
test_rbd.TestImage.test_list_snaps ... ok
test_rbd.TestImage.test_list_snaps_iterator_auto_close ... ok
test_rbd.TestImage.test_lock_unlock ... ok
test_rbd.TestImage.test_many_snaps ... ok
test_rbd.TestImage.test_metadata ... ERROR
test_rbd.TestImage.test_protect_snap ... ok
test_rbd.TestImage.test_read ... ok
test_rbd.TestImage.test_read_bad_offset ... ok
test_rbd.TestImage.test_read_with_fadvise_flags ... ok
test_rbd.TestImage.test_remove_snap ... ok
test_rbd.TestImage.test_remove_with_exclusive_lock ... ok
test_rbd.TestImage.test_remove_with_snap ... SKIP
test_rbd.TestImage.test_remove_with_watcher ... SKIP
test_rbd.TestImage.test_rename_snap ... ok
test_rbd.TestImage.test_resize ... ok
test_rbd.TestImage.test_resize_bytes ... ok
test_rbd.TestImage.test_resize_down ... ok
test_rbd.TestImage.test_rollback_to_snap ... ok
test_rbd.TestImage.test_rollback_to_snap_sparse ... ok
test_rbd.TestImage.test_rollback_with_resize ... ok
test_rbd.TestImage.test_set_no_snap ... ok
test_rbd.TestImage.test_set_snap ... ok
test_rbd.TestImage.test_set_snap_deleted ... ok
test_rbd.TestImage.test_set_snap_recreated ... ok
test_rbd.TestImage.test_set_snap_sparse ... ok
test_rbd.TestImage.test_size ... ok
test_rbd.TestImage.test_snap_timestamp ... ok
test_rbd.TestImage.test_stat ... ok
test_rbd.TestImage.test_update_features ... SKIP
test_rbd.TestImage.test_write ... ok
test_rbd.TestImage.test_write_read ... ok
test_rbd.TestImage.test_write_with_fadvise_flags ... ok
test_rbd.TestMirroring.test_mirror_image ... SKIP
test_rbd.TestMirroring.test_mirror_image_status ... SKIP
test_rbd.TestMirroring.test_mirror_peer ... ok
test_rbd.TestTrash.test_get ... ok
test_rbd.TestTrash.test_list ... ok
test_rbd.TestTrash.test_move ... ok
test_rbd.TestTrash.test_remove ... ok
test_rbd.TestTrash.test_remove_denied ... ok
test_rbd.TestTrash.test_restore ... ok
test_rbd.test_version ... ok
test_rbd.test_create ... ok
test_rbd.test_create_defaults ... ok
test_rbd.test_context_manager ... ok
test_rbd.test_open_read_only ... ok
test_rbd.test_open_dne ... ok
test_rbd.test_open_readonly_dne ... ok
test_rbd.test_remove_dne ... ok
test_rbd.test_list_empty ... ok
test_rbd.test_list ... ok
test_rbd.test_rename ... ok

======================================================================
ERROR: test_rbd.TestImage.test_metadata
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/cephuser/cephtest/ceph/src/test/pybind/test_rbd.py", line 831, in test_metadata
    assert_raises(KeyError, self.image.metadata_get, "key1")
  File "/usr/lib64/python2.7/unittest/case.py", line 513, in assertRaises
    callableObj(*args, **kwargs)
  File "rbd.pyx", line 2687, in rbd.Image.metadata_get (/builddir/build/BUILD/ceph-12.2.1/build/src/pybind/rbd/pyrex/rbd.c:25130)
AttributeError: 'rbd.Image' object has no attribute 'key'

----------------------------------------------------------------------
Ran 101 tests in 705.580s

FAILED (SKIP=8, errors=1)

#7 Updated by Jason Dillaman about 1 year ago

... still don't get why this is an RBD issue. If you look here [1], you can see that the script should immediately exit with the appropriate failure code when it hits a failed test. It doesn't matter that that script has an "exit 0" at the end of the script since it won't run. In fact, our teuthology test cases that invoke the Python tests rely on that behavior and they appropriately fail [2].

[1] https://github.com/ceph/ceph/blob/master/qa/workunits/rbd/test_librbd_python.sh#L1
[2] http://pulpito.ceph.com/jdillaman-2018-02-26_12:04:27-rbd-wip-jd-testing-distro-basic-smithi/2229895/
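To illustrate the point about the script's first line: under `sh -e` the shell stops at the first failing command, so a trailing `exit 0` is never reached. The sketch below assumes a POSIX `sh` on the PATH; the two-line script body is a stand-in for the workunit, not the real script.

```python
import subprocess

# Stand-in for the workunit: a failing command followed by "exit 0".
SCRIPT_BODY = "false\nexit 0\n"


def exit_status(shell_flags):
    """Run SCRIPT_BODY through sh with the given flags; return the exit status."""
    return subprocess.run(["sh", *shell_flags, "-c", SCRIPT_BODY]).returncode


with_errexit = exit_status(["-e"])   # stops at "false"
without_errexit = exit_status([])    # runs on to "exit 0"
```

With `-e` the status is that of the failing command; without it, the final `exit 0` masks the failure.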

#8 Updated by Vasu Kulkarni about 1 year ago

I'm going to run the nosetests command manually and check the exit status ($?) to see what's wrong. You are right, the script should fail due to L1.

#9 Updated by Jason Dillaman about 1 year ago

@Vasu: any update?

#10 Updated by Vasu Kulkarni about 1 year ago

Based on manual testing, I think the exit status 0 is coming from the C++ unit test itself.

I used an existing cluster and ran the workunit manually:


$ git clone -b luminous git://git.ceph.com/ceph.git

# run the workunit
$ sh ceph/qa/workunits/rbd/test_librbd_python.sh

======================================================================
ERROR: test_rbd.TestImage.test_metadata
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/cephuser/test/cephtest/cephtest/ceph/src/test/pybind/test_rbd.py", line 831, in test_metadata
    assert_raises(KeyError, self.image.metadata_get, "key1")
  File "/usr/lib64/python2.7/unittest/case.py", line 513, in assertRaises
    callableObj(*args, **kwargs)
  File "rbd.pyx", line 2687, in rbd.Image.metadata_get (/builddir/build/BUILD/ceph-12.2.1/build/src/pybind/rbd/pyrex/rbd.c:25130)
AttributeError: 'rbd.Image' object has no attribute 'key'

----------------------------------------------------------------------
Ran 101 tests in 781.606s

FAILED (SKIP=8, errors=1)

bash-4.2$ echo $?
0

bash-4.2$ cat ceph/qa/workunits/rbd/test_librbd_python.sh
#!/bin/sh -ex

relpath=$(dirname $0)/../../../src/test/pybind

if [ -n "${VALGRIND}" ]; then
  valgrind ${VALGRIND} --suppressions=${TESTDIR}/valgrind.supp \
    --errors-for-leak-kinds=definite --error-exitcode=1 \
    nosetests -v $relpath/test_rbd.py
else
  nosetests -v $relpath/test_rbd.py
fi
exit 0
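One thing worth checking in the manual run above is how the script was invoked. The options on a `#!/bin/sh -ex` line only take effect when the file is executed directly; when the file is passed as an argument to `sh`, the first line is just a comment and `-e` is not set. The thread does not confirm that this explains the observed status, so treat the sketch below as a possibility to rule out. It assumes `/bin/sh` exists and that the temporary directory permits execution; `workunit_stub.sh` is a hypothetical file name.

```python
import os
import stat
import subprocess
import tempfile

# Same shape as the workunit: -ex on the shebang line, a failing command, then "exit 0".
SCRIPT = "#!/bin/sh -ex\nfalse\nexit 0\n"


def compare_invocations():
    """Return (direct_status, via_sh_status) for the same script file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "workunit_stub.sh")
        with open(path, "w") as f:
            f.write(SCRIPT)
        os.chmod(path, stat.S_IRWXU)
        # Direct execution: the kernel reads the shebang and runs /bin/sh -ex <path>.
        direct = subprocess.run([path], stderr=subprocess.DEVNULL).returncode
        # "sh <path>": the shebang is ignored, so -e is not in effect.
        via_sh = subprocess.run(["sh", path], stderr=subprocess.DEVNULL).returncode
    return direct, via_sh
```

If the two statuses differ, running the workunit as `sh -ex <script>` or executing it directly would restore the fail-fast behaviour.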

#12 Updated by Jason Dillaman about 1 year ago

Works for me (and teuthology):

# nosetests -v test_rbd:TestImage.test_metadata
test_rbd.TestImage.test_metadata ... ERROR

======================================================================
ERROR: test_rbd.TestImage.test_metadata
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/jdillaman/ceph_wip/src/test/pybind/test_rbd.py", line 909, in test_metadata
    assert_raises(KeyError, self.image.metadata_get, "key1")
  File "/usr/lib64/python2.7/unittest/case.py", line 511, in assertRaises
    callableObj(*args, **kwargs)
  File "rbd.pyx", line 3185, in rbd.Image.metadata_get
AttributeError: 'rbd.Image' object has no attribute 'key'

----------------------------------------------------------------------
Ran 1 test in 4.192s

FAILED (errors=1)
# echo $?
1
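The traceback reports an ERROR rather than a failure because the test expects `KeyError` but gets `AttributeError`. A runner should still treat the run as unsuccessful. The sketch below uses the standard library `unittest` as a stand-in for nose, and `FakeImage` is a hypothetical class that reproduces only the exception type, not the real `rbd.Image` code.

```python
import io
import unittest


class FakeImage:
    """Hypothetical stand-in for rbd.Image; reproduces only the exception type."""

    def metadata_get(self, key):
        # Reading a missing attribute raises AttributeError, not KeyError.
        return self.key


def run_and_get_exit_code():
    """Run one erroring test; return (errors, failures, exit_code)."""

    class MetadataCase(unittest.TestCase):
        def test_metadata(self):
            # KeyError is expected; any other exception is recorded as an ERROR.
            self.assertRaises(KeyError, FakeImage().metadata_get, "key1")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MetadataCase)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    # Map the overall result to a process exit status.
    exit_code = 0 if result.wasSuccessful() else 1
    return len(result.errors), len(result.failures), exit_code
```

Here the unexpected exception is counted under errors, not failures, and the run as a whole is unsuccessful, which matches the non-zero status shown above.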

#13 Updated by Jason Dillaman about 1 year ago

@Vasu: what's the status here?

#14 Updated by Jason Dillaman 12 months ago

  • Status changed from Need More Info to Can't reproduce
