Bug #23184
rbd workunit return 0 response code for fail
Description
Expected: the rbd workunit test returns a non-zero exit code on failure.
Actual: the rbd workunit test returns exit code 0 on failure, which breaks CI integration:
======================================================================
ERROR: test_rbd.TestImage.test_metadata
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/cephuser/cephtest/ceph/src/test/pybind/test_rbd.py", line 831, in test_metadata
    assert_raises(KeyError, self.image.metadata_get, "key1")
  File "/usr/lib64/python2.7/unittest/case.py", line 513, in assertRaises
    callableObj(*args, **kwargs)
  File "rbd.pyx", line 2687, in rbd.Image.metadata_get (/builddir/build/BUILD/ceph-12.2.1/build/src/pybind/rbd/pyrex/rbd.c:25130)
AttributeError: 'rbd.Image' object has no attribute 'key'
----------------------------------------------------------------------
Ran 101 tests in 659.568s
FAILED (SKIP=8, errors=1)
2018-02-19 14:45:43,783 - test_workunit - INFO - Workunit completed successfully
2018-02-19 14:45:43,783 - __main__ - INFO - Test <module 'test_workunit' from '/home/jenkins-build/workspace/ceph-ansible-sanity-3.x/tests/test_workunit.py'> passed
Affected:
We propagate test status by return code in our CI scripts:
ec = client.exit_status
if ec == 0:
    log.info("Workunit completed successfully")
else:
    log.info("Error during workunit")
return ec
Thus failures can be missed without manual log review.
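To make the dependency on the exit status concrete, here is a minimal sketch of the same propagation pattern. It is not the actual CI script: `run_workunit` is a hypothetical helper, and a local `subprocess` call is swapped in for whatever `client` is in the snippet above. The two commands are stand-ins for a passing and a failing workunit.

```python
import logging
import subprocess
import sys

log = logging.getLogger("test_workunit")


def run_workunit(cmd):
    """Run cmd locally and return its exit status (hypothetical helper)."""
    ec = subprocess.run(cmd).returncode
    if ec == 0:
        log.info("Workunit completed successfully")
    else:
        log.error("Error during workunit (exit status %d)", ec)
    return ec


# Stand-ins for a passing and a failing workunit (assumed for illustration).
ok = run_workunit([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_workunit([sys.executable, "-c", "raise SystemExit(3)"])
```

With this pattern the caller sees only the exit status, so a workunit that prints FAILED but exits 0 is indistinguishable from a passing one.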
Updated by Jason Dillaman about 6 years ago
- Status changed from New to Need More Info
What workunit is this in reference to? The logs indicate it has something to do with ceph-ansible, so if that's the source of this workunit, this tracker ticket is probably best redirected to that project.
Updated by Vasu Kulkarni about 6 years ago
Jason,
We are trying to run some of the workunits in CI with a Jenkins pipeline. The workunits don't return non-zero for a failed unit test, as shown in that example: ERROR: test_rbd.TestImage.test_metadata. Shouldn't such a failure cause the librbd workunit to exit with non-zero instead of zero?
Updated by Vasu Kulkarni about 6 years ago
- Status changed from Need More Info to New
Updated by Jason Dillaman about 6 years ago
- Status changed from New to Need More Info
The question is where did this CI test come from? It's not an RBD test. If it's part of ceph-ansible repo, this ticket should be re-assigned to the ceph-ansible project.
Updated by Vasu Kulkarni about 6 years ago
Jason,
I think the original description is a bit confusing. The CI test just invokes the librbd workunit after ceph-ansible sets up the cluster. What we are asking here is why the librbd workunit is not returning non-zero for the failed unit test. I guess this is how it's written in the C++ unit test; we are asking for it to return -1 or a non-zero exit status when the librbd test fails.
But as I was looking more into why it doesn't return non-zero for asserts in unit tests, I found a bug in the workunit itself: regardless of asserts in librbd.py, it ends up returning 0 (link provided below).
https://github.com/ceph/ceph/blob/master/qa/workunits/rbd/test_librbd_python.sh#L12
I think it should just return the exit status from test_rbd.py there.
Updated by Vasu Kulkarni about 6 years ago
This is the assert that is not returning non-zero in case of failure. The workunit is being run on an existing cluster.
test_rbd.TestClone.test_flatten_drops_cache ... ok
test_rbd.TestClone.test_flatten_errors ... ok
test_rbd.TestClone.test_flatten_larger_order ... ok
test_rbd.TestClone.test_flatten_multi_level ... ok
test_rbd.TestClone.test_flatten_smaller_order ... ok
test_rbd.TestClone.test_list_children ... ok
test_rbd.TestClone.test_read ... ok
test_rbd.TestClone.test_resize_flatten_multi_level ... ok
test_rbd.TestClone.test_resize_io ... ok
test_rbd.TestClone.test_resize_stat ... ok
test_rbd.TestClone.test_stat ... ok
test_rbd.TestClone.test_unprotect_with_children ... ok
test_rbd.TestClone.test_unprotected ... ok
test_rbd.TestClone.test_with_params ... ok
test_rbd.TestClone.test_with_params2 ... ok
test_rbd.TestClone.test_with_params3 ... SKIP
test_rbd.TestClone.test_write ... ok
test_rbd.TestExclusiveLock.test_acquire_release_lock ... ok
test_rbd.TestExclusiveLock.test_break_lock ... ok
test_rbd.TestExclusiveLock.test_follower_discard ... ok
test_rbd.TestExclusiveLock.test_follower_flatten ... ok
test_rbd.TestExclusiveLock.test_follower_resize ... ok
test_rbd.TestExclusiveLock.test_follower_snap_create ... ok
test_rbd.TestExclusiveLock.test_follower_snap_rollback ... ok
test_rbd.TestExclusiveLock.test_follower_write ... ok
test_rbd.TestExclusiveLock.test_ownership ... ok
test_rbd.TestExclusiveLock.test_read_only_leadership ... ok
test_rbd.TestExclusiveLock.test_snapshot_leadership ... ok
test_rbd.TestImage.test_aio_discard ... ok
test_rbd.TestImage.test_aio_flush ... ok
test_rbd.TestImage.test_aio_read ... ok
test_rbd.TestImage.test_aio_write ... ok
test_rbd.TestImage.test_block_name_prefix ... ok
test_rbd.TestImage.test_copy ... ok
test_rbd.TestImage.test_copy2 ... ok
test_rbd.TestImage.test_copy3 ... SKIP
test_rbd.TestImage.test_create_snap ... ok
test_rbd.TestImage.test_create_timestamp ... ok
test_rbd.TestImage.test_create_with_params ... SKIP
test_rbd.TestImage.test_diff_iterate ... ok
test_rbd.TestImage.test_flags ... ok
test_rbd.TestImage.test_id ... ok
test_rbd.TestImage.test_image_auto_close ... ok
test_rbd.TestImage.test_invalidate_cache ... ok
test_rbd.TestImage.test_large_read ... ok
test_rbd.TestImage.test_large_write ... ok
test_rbd.TestImage.test_limit_snaps ... ok
test_rbd.TestImage.test_list_lockers ... ok
test_rbd.TestImage.test_list_snaps ... ok
test_rbd.TestImage.test_list_snaps_iterator_auto_close ... ok
test_rbd.TestImage.test_lock_unlock ... ok
test_rbd.TestImage.test_many_snaps ... ok
test_rbd.TestImage.test_metadata ... ERROR
test_rbd.TestImage.test_protect_snap ... ok
test_rbd.TestImage.test_read ... ok
test_rbd.TestImage.test_read_bad_offset ... ok
test_rbd.TestImage.test_read_with_fadvise_flags ... ok
test_rbd.TestImage.test_remove_snap ... ok
test_rbd.TestImage.test_remove_with_exclusive_lock ... ok
test_rbd.TestImage.test_remove_with_snap ... SKIP
test_rbd.TestImage.test_remove_with_watcher ... SKIP
test_rbd.TestImage.test_rename_snap ... ok
test_rbd.TestImage.test_resize ... ok
test_rbd.TestImage.test_resize_bytes ... ok
test_rbd.TestImage.test_resize_down ... ok
test_rbd.TestImage.test_rollback_to_snap ... ok
test_rbd.TestImage.test_rollback_to_snap_sparse ... ok
test_rbd.TestImage.test_rollback_with_resize ... ok
test_rbd.TestImage.test_set_no_snap ... ok
test_rbd.TestImage.test_set_snap ... ok
test_rbd.TestImage.test_set_snap_deleted ... ok
test_rbd.TestImage.test_set_snap_recreated ... ok
test_rbd.TestImage.test_set_snap_sparse ... ok
test_rbd.TestImage.test_size ... ok
test_rbd.TestImage.test_snap_timestamp ... ok
test_rbd.TestImage.test_stat ... ok
test_rbd.TestImage.test_update_features ... SKIP
test_rbd.TestImage.test_write ... ok
test_rbd.TestImage.test_write_read ... ok
test_rbd.TestImage.test_write_with_fadvise_flags ... ok
test_rbd.TestMirroring.test_mirror_image ... SKIP
test_rbd.TestMirroring.test_mirror_image_status ... SKIP
test_rbd.TestMirroring.test_mirror_peer ... ok
test_rbd.TestTrash.test_get ... ok
test_rbd.TestTrash.test_list ... ok
test_rbd.TestTrash.test_move ... ok
test_rbd.TestTrash.test_remove ... ok
test_rbd.TestTrash.test_remove_denied ... ok
test_rbd.TestTrash.test_restore ... ok
test_rbd.test_version ... ok
test_rbd.test_create ... ok
test_rbd.test_create_defaults ... ok
test_rbd.test_context_manager ... ok
test_rbd.test_open_read_only ... ok
test_rbd.test_open_dne ... ok
test_rbd.test_open_readonly_dne ... ok
test_rbd.test_remove_dne ... ok
test_rbd.test_list_empty ... ok
test_rbd.test_list ... ok
test_rbd.test_rename ... ok
======================================================================
ERROR: test_rbd.TestImage.test_metadata
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/cephuser/cephtest/ceph/src/test/pybind/test_rbd.py", line 831, in test_metadata
    assert_raises(KeyError, self.image.metadata_get, "key1")
  File "/usr/lib64/python2.7/unittest/case.py", line 513, in assertRaises
    callableObj(*args, **kwargs)
  File "rbd.pyx", line 2687, in rbd.Image.metadata_get (/builddir/build/BUILD/ceph-12.2.1/build/src/pybind/rbd/pyrex/rbd.c:25130)
AttributeError: 'rbd.Image' object has no attribute 'key'
----------------------------------------------------------------------
Ran 101 tests in 705.580s
FAILED (SKIP=8, errors=1)
Updated by Jason Dillaman about 6 years ago
... still don't get why this is an RBD issue. If you look here [1], you can see that the script should immediately exit with the appropriate failure code when it hits a failed test. It doesn't matter that that script has an "exit 0" at the end of the script since it won't run. In fact, our teuthology test cases that invoke the Python tests rely on that behavior and they appropriately fail [2].
[1] https://github.com/ceph/ceph/blob/master/qa/workunits/rbd/test_librbd_python.sh#L1
[2] http://pulpito.ceph.com/jdillaman-2018-02-26_12:04:27-rbd-wip-jd-testing-distro-basic-smithi/2229895/
Updated by Vasu Kulkarni about 6 years ago
Going to try manually with the nosetests command and check the exit status ($?) to see what's wrong. You are right, the script would fail due to L1.
Updated by Vasu Kulkarni about 6 years ago
I think the exit status 0 is coming from the C++ unit test itself, based on manual testing.
I used an existing cluster and ran the workunit manually:
$ git clone -b luminous git://git.ceph.com/ceph.git
#run workunit:
sh ceph/qa/workunits/rbd/test_librbd_python.sh
======================================================================
ERROR: test_rbd.TestImage.test_metadata
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/cephuser/test/cephtest/cephtest/ceph/src/test/pybind/test_rbd.py", line 831, in test_metadata
    assert_raises(KeyError, self.image.metadata_get, "key1")
  File "/usr/lib64/python2.7/unittest/case.py", line 513, in assertRaises
    callableObj(*args, **kwargs)
  File "rbd.pyx", line 2687, in rbd.Image.metadata_get (/builddir/build/BUILD/ceph-12.2.1/build/src/pybind/rbd/pyrex/rbd.c:25130)
AttributeError: 'rbd.Image' object has no attribute 'key'
----------------------------------------------------------------------
Ran 101 tests in 781.606s
FAILED (SKIP=8, errors=1)
bash-4.2$ echo $?
0
bash-4.2$ cat ceph/qa/workunits/rbd/test_librbd_python.sh
#!/bin/sh -ex
relpath=$(dirname $0)/../../../src/test/pybind
if [ -n "${VALGRIND}" ]; then
  valgrind ${VALGRIND} --suppressions=${TESTDIR}/valgrind.supp \
    --errors-for-leak-kinds=definite --error-exitcode=1 \
    nosetests -v $relpath/test_rbd.py
else
  nosetests -v $relpath/test_rbd.py
fi
exit 0
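One thing worth ruling out here, offered as an assumption rather than a cause confirmed in this thread: the workunit above was started as `sh <script>`. Options on a `#!/bin/sh -ex` line take effect only when the script file is executed directly; when the file is passed to `sh` as an argument, that first line is just a comment, so `-e` is not set and the trailing `exit 0` is reached. The sketch below uses a hypothetical three-line script in which `false` stands in for the failing nosetests run, and passes `-e` explicitly to mimic direct execution (only `-e` is used, to keep the `-x` trace out of the output).

```python
import os
import subprocess
import tempfile

# Hypothetical stand-in for the workunit: `false` plays the failing test run.
SCRIPT = """#!/bin/sh -e
false
exit 0
"""


def exit_codes():
    """Return (status of `sh script`, status of `sh -e script`)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "workunit.sh")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # Shebang is only a comment here, so errexit is off and `exit 0` runs.
        plain = subprocess.run(["sh", path]).returncode
        # With -e the shell stops at the first failing command.
        strict = subprocess.run(["sh", "-e", path]).returncode
    return plain, strict


plain, strict = exit_codes()
print(plain, strict)  # prints "0 1"
```

If this is what happened, running the workunit as an executable file (or as `sh -ex <script>`) should give the non-zero status that the CI scripts expect.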
Updated by Vasu Kulkarni about 6 years ago
Not sure why nose returns 0 for assert_raises: https://github.com/ceph/ceph/blob/luminous/src/test/pybind/test_rbd.py#L831
Updated by Jason Dillaman about 6 years ago
Works for me (and teuthology):
# nosetests -v test_rbd:TestImage.test_metadata
test_rbd.TestImage.test_metadata ... ERROR
======================================================================
ERROR: test_rbd.TestImage.test_metadata
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/jdillaman/ceph_wip/src/test/pybind/test_rbd.py", line 909, in test_metadata
    assert_raises(KeyError, self.image.metadata_get, "key1")
  File "/usr/lib64/python2.7/unittest/case.py", line 511, in assertRaises
    callableObj(*args, **kwargs)
  File "rbd.pyx", line 3185, in rbd.Image.metadata_get
AttributeError: 'rbd.Image' object has no attribute 'key'
----------------------------------------------------------------------
Ran 1 test in 4.192s
FAILED (errors=1)
# echo $?
1
Updated by Jason Dillaman about 6 years ago
@Vasu Kulkarni: what's the status here?
Updated by Jason Dillaman almost 6 years ago
- Status changed from Need More Info to Can't reproduce