Bug #11356
closedfailed LibCephFS.MulticlientSimple unit test
0%
Description
http://pulpito.ceph.com/teuthology-2015-04-07_23:04:03-fs-hammer-testing-basic-multi/840725/
2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:test/libcephfs/multiclient.cc:55: Failure 2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:Value of: memcmp(bufa, bufb, sizeof(bufa)) 2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout: Actual: -98 2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:Expected: 0 2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:[ FAILED ] LibCephFS.MulticlientSimple (318 ms)
which means that bufa was incorrectly less than bufb, when they should have been equal.
In other words, something happened to let client A read stale data. :/
I haven't tried to figure out how that happened, or if it was something else — but by the memcmp man page that return code shouldn't be an error code or anything (98 is EADDRINUSE, if that matters?).
Updated by Zheng Yan about 9 years ago
client log shows client did send read/write request to OSD.2.
The strange thing in OSD log is that 'uninline' operations (client.2219.0:2, client.2222.0.3) from both clients were successful.
2015-04-08 09:09:11.814824 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_op osd_op(client.4222.0:3 1000000066f.00000000 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] 2.3b11ab71 snapc 1=[] ondisk+write+known_if_redirected e6) v5 may_write may_read -> write-ordered flags ondisk+write+known_if_redirected 2015-04-08 09:09:11.814851 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] get_object_context: found obc in cache: 0x56a2940 2015-04-08 09:09:11.814862 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] get_object_context: 0x56a2940 3b11ab71/1000000066f.00000000/head//2 rwstate(none n=0 w=0) oi: 3b11ab71/1000000066f.00000000/head//2(6'1 client.4222.0:2 wrlock_by=unknown.0.0:0 dirty|data_digest|omap_digest s 0 uv 1 dd ffffffff od ffffffff) ssc: 0x55f9770 snapset: 1=[]:[]+head 2015-04-08 09:09:11.814879 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] find_object_context 3b11ab71/1000000066f.00000000/head//2 @head oi=3b11ab71/1000000066f.00000000/head//2(6'1 client.4222.0:2 wrlock_by=unknown.0.0:0 dirty|data_digest|omap_digest s 0 uv 1 dd ffffffff od ffffffff) 2015-04-08 09:09:11.814910 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] execute_ctx 0x5b24800 2015-04-08 09:09:11.814922 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_op 3b11ab71/1000000066f.00000000/head//2 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] ov 6'1 av 6'2 snapc 1=[] snapset 1=[]:[]+head 2015-04-08 09:09:11.814938 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] taking ondisk_read_lock 2015-04-08 09:09:11.814948 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_osd_op 3b11ab71/1000000066f.00000000/head//2 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] 2015-04-08 09:09:11.814961 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_osd_op cmpxattr inline_version (8) op 3 mode 2 2015-04-08 09:09:11.814974 7f24c50d8700 15 filestore(/var/lib/ceph/osd/ceph-2) getattr 2.1_head/3b11ab71/1000000066f.00000000/head//2 '_inline_version' 2015-04-08 09:09:11.815028 7f24c50d8700 10 filestore(/var/lib/ceph/osd/ceph-2) getattr got.size() is 0 2015-04-08 09:09:11.815032 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] CEPH_OSD_OP_CMPXATTR name=_inline_version val=2 op=3 mode=2 2015-04-08 09:09:11.815045 7f24c50d8700 20 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_xattr_cmp_u64 '2' vs '0' op 3 2015-04-08 09:09:11.815055 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] comparison returned true
2015-04-08 09:09:11.886483 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_op osd_op(client.4219.0:2 1000000066f.00000000 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] 2.3b11ab71 snapc 1=[] ondisk+write+known_if_redirected e6) v5 may_write may_read -> write-ordered flags ondisk+write+known_if_redirected 2015-04-08 09:09:11.886508 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] get_object_context: found obc in cache: 0x56a2940 2015-04-08 09:09:11.886519 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] get_object_context: 0x56a2940 3b11ab71/1000000066f.00000000/head//2 rwstate(none n=0 w=0) oi: 3b11ab71/1000000066f.00000000/head//2(6'4 client.4219.0:1 wrlock_by=unknown.0.0:0 dirty|omap_digest s 7 uv 4 od ffffffff) ssc: 0x55f9770 snapset: 1=[]:[]+head 2015-04-08 09:09:11.886535 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] find_object_context 3b11ab71/1000000066f.00000000/head//2 @head oi=3b11ab71/1000000066f.00000000/head//2(6'4 client.4219.0:1 wrlock_by=unknown.0.0:0 dirty|omap_digest s 7 uv 4 od ffffffff) 2015-04-08 09:09:11.886564 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] execute_ctx 0x5b24200 2015-04-08 09:09:11.886577 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_op 3b11ab71/1000000066f.00000000/head//2 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] ov 6'4 av 6'5 snapc 1=[] snapset 1=[]:[]+head 2015-04-08 09:09:11.886592 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] taking ondisk_read_lock 2015-04-08 09:09:11.886603 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_osd_op 3b11ab71/1000000066f.00000000/head//2 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] 2015-04-08 09:09:11.886616 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_osd_op cmpxattr inline_version (8) op 3 mode 2 2015-04-08 09:09:11.886627 7f24c28d3700 15 filestore(/var/lib/ceph/osd/ceph-2) getattr 2.1_head/3b11ab71/1000000066f.00000000/head//2 '_inline_version' 2015-04-08 09:09:11.886656 7f24c28d3700 10 filestore(/var/lib/ceph/osd/ceph-2) getattr 2.1_head/3b11ab71/1000000066f.00000000/head//2 '_inline_version' = 8 2015-04-08 09:09:11.886662 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] CEPH_OSD_OP_CMPXATTR name=_inline_version val=2 op=3 mode=2 2015-04-08 09:09:11.886683 7f24c28d3700 20 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_xattr_cmp_u64 '2' vs '0' op 3 2015-04-08 09:09:11.886694 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] comparison returned true
Updated by Zheng Yan about 9 years ago
The issue is in ReplicatedPG::do_xattr_cmp_u64(). For CEPH_OSD_CMPXATTR_MODE_U64, OSD expects the u64 in osd_op be encoded in raw format, but expects the u64 in object's xattr be encoded as string. This is really weird. should we fix the issue on client side or osd side?
Updated by Zheng Yan about 9 years ago
- Status changed from New to Fix Under Review
Updated by Greg Farnum about 9 years ago
- Status changed from Fix Under Review to Resolved
Merged to master in commit:a84ae7b14e1aaa1c6f65d09a78230b54d0c25110