Bug #18139
closedLibRadosList.EnumerateObjects failure
0%
Description
2016-12-03T20:44:59.112 INFO:tasks.workunit.client.0.smithi074.stdout: api_list: [ RUN ] LibRadosList.EnumerateObjects 2016-12-03T20:44:59.112 INFO:tasks.workunit.client.0.smithi074.stdout: api_list: /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.2-2202-g0ba4bc1/src/test/librados/list.cc:718: Failure 2016-12-03T20:44:59.113 INFO:tasks.workunit.client.0.smithi074.stdout: api_list: Expected: n_objects 2016-12-03T20:44:59.113 INFO:tasks.workunit.client.0.smithi074.stdout: api_list: Which is: 16 2016-12-03T20:44:59.113 INFO:tasks.workunit.client.0.smithi074.stdout: api_list: To be equal to: saw_obj.size() 2016-12-03T20:44:59.113 INFO:tasks.workunit.client.0.smithi074.stdout: api_list: Which is: 17 2016-12-03T20:44:59.113 INFO:tasks.workunit.client.0.smithi074.stdout: api_list: [ FAILED ] LibRadosList.EnumerateObjects (20621 ms)/a/sage-2016-12-03_19:34:03-rados-wip-sage-testing---basic-smithi/599201 description: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml
clusters/{fixed-2.yaml openstack.yaml} fs/btrfs.yaml hobj-sort.yaml msgr-failures/few.yaml
msgr/async.yaml objectstore/filestore.yaml rados.yaml rocksdb.yaml thrashers/mapgap.yaml
workloads/rados_api_tests.yaml}
Updated by David Zafman over 7 years ago
This test verifies 3 things. First, as objects are returned it makes sure there are no duplicates AND inserts the name into an std::set. Second, after processing the elements it makes sure that nothing is missing. Finally, it verifies that there aren't any extra elements. This last check is what failed. Given that the saw_obj is a std::set with 17 elements and the test only created 16 objects, there is 1 extra unique string. We know that all 16 expected elements are present based on the first 2 checks.
Need to examine further how this could happen. The obvious pg split seems like it could get the same element more that one once, but that didn't happen. Where could an extraneous object name come from?
Updated by David Zafman over 7 years ago
Still need to investigate rollforward which I think caused an object "foo" to be created then deleted, but not before it was found by the filestore servicing the pgnls from the test. 4 other curiosities:
The writes to "foo" are 4 bytes (current test writes 128 bytes). Where did that come from?
The snapset is changed, so was this a clone?
The test that is running in a recent sub-test wrote an 18'1 version of "foo" just before this, but this is 18'9.
Why a 19 seconds delay between the 17 writes and the pgnls?
TEST WROTE 16 objects just before this point 2016-12-03 20:44:19.895607 7faf70127700 10 osd.5 pg_epoch: 18 pg[14.6( v 18'8 (0'0,18'8] local-les=18 n=2 ec=17 les/c/f 18/18/0 17/17/17) [5,2,1] r=0 lpr=17 crt=18'8 lcod 18'7 mlcod 18'7 active+clean] do_op 14:602f83fe:::foo:head [write 0~4] ov 0'0 av 18'9 snapc 0=[] snapset 0=[]:[] 2016-12-03 20:44:20.895064 7faf70127700 10 osd.5 pg_epoch: 19 pg[14.6( v 18'9 (0'0,18'9] local-les=18 n=3 ec=17 les/c/f 18/18/0 17/17/17) [5,2,1] r=0 lpr=17 crt=18'9 lcod 18'8 mlcod 18'8 active+clean] do_op 14:602f83fe:::foo:head [write 0~4] ov 18'9 av 19'10 snapc 0=[] snapset 0=[]:[]+head 2016-12-03 20:44:39.685193 7faf6d922700 20 filestore(/var/lib/ceph/osd/ceph-5) objects: [#14:60000000::::head#,#14:602f83fe:::foo:head#,#14:62a1935d:::14:head#,#14:6cac518f:::0:head#] 2016-12-03 20:44:39.707414 7faf6d922700 10 osd.5 pg_epoch: 37 pg[14.6( v 19'10 (0'0,19'10] local-les=18 n=3 ec=17 les/c/f 18/18/0 17/17/17) [5,2,1] r=0 lpr=17 crt=19'10 lcod 18'9 mlcod 18'9 active+clean] do_op 14:602f83fe:::foo:head [delete] ov 19'10 av 37'11 snapc 0=[] snapset 0=[]:[]+head
Updated by David Zafman over 7 years ago
ceph_test_rados_api_misc with pid 15487 wrote 4 bytes to "foo" to pool 14 which belongs to the other test!
Updated by David Zafman over 7 years ago
Caused by new PoolCreationRace test in ceph_test_rados_api_misc added by c0fb48331213e6934d19b7dbeeb83afeed5d7f7a
It creates a pool then uses that pool id + 1 to create an ioctx. If another concurrently running test has added a pool it could have this pool id.
We could either disable this test or find another way to do it.
Updated by David Zafman over 7 years ago
- Status changed from 12 to 17