Project

General

Profile

Actions

Bug #11356

closed

failed LibCephFS.MulticlientSimple unit test

Added by Greg Farnum about 9 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/teuthology-2015-04-07_23:04:03-fs-hammer-testing-basic-multi/840725/

2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:test/libcephfs/multiclient.cc:55: Failure
2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:Value of: memcmp(bufa, bufb, sizeof(bufa))
2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:  Actual: -98
2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:Expected: 0
2015-04-08T09:09:11.918 INFO:tasks.workunit.client.0.plana12.stdout:[  FAILED  ] LibCephFS.MulticlientSimple (318 ms)

which means that bufa was incorrectly less than bufb, when they should have been equal.

In other words, something happened to let client A read stale data. :/

I haven't tried to figure out how that happened, or if it was something else — but by the memcmp man page that return code shouldn't be an error code or anything (98 is EADDRINUSE, if that matters?).

Actions #1

Updated by Zheng Yan about 9 years ago

client log shows client did send read/write request to OSD.2.

The strange thing in OSD log is that 'uninline' operations (client.2219.0:2, client.2222.0.3) from both clients were successful.

2015-04-08 09:09:11.814824 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_op osd_op(client.4222.0:3 1000000066f.00000000 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] 2.3b11ab71 snapc 1=[] ondisk+write+known_if_redirected e6) v5 may_write may_read -> write-ordered flags ondisk+write+known_if_redirected
2015-04-08 09:09:11.814851 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] get_object_context: found obc in cache: 0x56a2940
2015-04-08 09:09:11.814862 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] get_object_context: 0x56a2940 3b11ab71/1000000066f.00000000/head//2 rwstate(none n=0 w=0) oi: 3b11ab71/1000000066f.00000000/head//2(6'1 client.4222.0:2 wrlock_by=unknown.0.0:0 dirty|data_digest|omap_digest s 0 uv 1 dd ffffffff od ffffffff) ssc: 0x55f9770 snapset: 1=[]:[]+head
2015-04-08 09:09:11.814879 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] find_object_context 3b11ab71/1000000066f.00000000/head//2 @head oi=3b11ab71/1000000066f.00000000/head//2(6'1 client.4222.0:2 wrlock_by=unknown.0.0:0 dirty|data_digest|omap_digest s 0 uv 1 dd ffffffff od ffffffff)
2015-04-08 09:09:11.814910 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] execute_ctx 0x5b24800
2015-04-08 09:09:11.814922 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_op 3b11ab71/1000000066f.00000000/head//2 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] ov 6'1 av 6'2 snapc 1=[] snapset 1=[]:[]+head
2015-04-08 09:09:11.814938 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean]  taking ondisk_read_lock
2015-04-08 09:09:11.814948 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_osd_op 3b11ab71/1000000066f.00000000/head//2 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)]
2015-04-08 09:09:11.814961 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_osd_op  cmpxattr inline_version (8) op 3 mode 2
2015-04-08 09:09:11.814974 7f24c50d8700 15 filestore(/var/lib/ceph/osd/ceph-2) getattr 2.1_head/3b11ab71/1000000066f.00000000/head//2 '_inline_version'
2015-04-08 09:09:11.815028 7f24c50d8700 10 filestore(/var/lib/ceph/osd/ceph-2) getattr got.size() is 0
2015-04-08 09:09:11.815032 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] CEPH_OSD_OP_CMPXATTR name=_inline_version val=2 op=3 mode=2
2015-04-08 09:09:11.815045 7f24c50d8700 20 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] do_xattr_cmp_u64 '2' vs '0' op 3
2015-04-08 09:09:11.815055 7f24c50d8700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'1 (0'0,6'1] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=0'0 lcod 0'0 mlcod 0'0 active+clean] comparison returned true
2015-04-08 09:09:11.886483 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_op osd_op(client.4219.0:2 1000000066f.00000000 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] 2.3b11ab71 snapc 1=[] ondisk+write+known_if_redirected e6) v5 may_write may_read -> write-ordered flags ondisk+write+known_if_redirected
2015-04-08 09:09:11.886508 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] get_object_context: found obc in cache: 0x56a2940
2015-04-08 09:09:11.886519 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] get_object_context: 0x56a2940 3b11ab71/1000000066f.00000000/head//2 rwstate(none n=0 w=0) oi: 3b11ab71/1000000066f.00000000/head//2(6'4 client.4219.0:1 wrlock_by=unknown.0.0:0 dirty|omap_digest s 7 uv 4 od ffffffff) ssc: 0x55f9770 snapset: 1=[]:[]+head
2015-04-08 09:09:11.886535 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] find_object_context 3b11ab71/1000000066f.00000000/head//2 @head oi=3b11ab71/1000000066f.00000000/head//2(6'4 client.4219.0:1 wrlock_by=unknown.0.0:0 dirty|omap_digest s 7 uv 4 od ffffffff)
2015-04-08 09:09:11.886564 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] execute_ctx 0x5b24200
2015-04-08 09:09:11.886577 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_op 3b11ab71/1000000066f.00000000/head//2 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)] ov 6'4 av 6'5 snapc 1=[] snapset 1=[]:[]+head
2015-04-08 09:09:11.886592 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean]  taking ondisk_read_lock
2015-04-08 09:09:11.886603 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_osd_op 3b11ab71/1000000066f.00000000/head//2 [cmpxattr inline_version (8) op 3 mode 2,write 0~4 [1@-1],setxattr inline_version (8)]
2015-04-08 09:09:11.886616 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_osd_op  cmpxattr inline_version (8) op 3 mode 2
2015-04-08 09:09:11.886627 7f24c28d3700 15 filestore(/var/lib/ceph/osd/ceph-2) getattr 2.1_head/3b11ab71/1000000066f.00000000/head//2 '_inline_version'
2015-04-08 09:09:11.886656 7f24c28d3700 10 filestore(/var/lib/ceph/osd/ceph-2) getattr 2.1_head/3b11ab71/1000000066f.00000000/head//2 '_inline_version' = 8
2015-04-08 09:09:11.886662 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] CEPH_OSD_OP_CMPXATTR name=_inline_version val=2 op=3 mode=2
2015-04-08 09:09:11.886683 7f24c28d3700 20 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] do_xattr_cmp_u64 '2' vs '0' op 3
2015-04-08 09:09:11.886694 7f24c28d3700 10 osd.2 pg_epoch: 6 pg[2.1( v 6'4 (0'0,6'4] local-les=6 n=1 ec=5 les/c 6/6 5/5/5) [2,3] r=0 lpr=5 crt=6'1 lcod 6'3 mlcod 6'3 active+clean] comparison returned true
Actions #2

Updated by Zheng Yan about 9 years ago

The issue is in ReplicatedPG::do_xattr_cmp_u64(). For CEPH_OSD_CMPXATTR_MODE_U64, OSD expects the u64 in osd_op be encoded in raw format, but expects the u64 in object's xattr be encoded as string. This is really weird. should we fix the issue on client side or osd side?

Actions #3

Updated by Zheng Yan about 9 years ago

  • Status changed from New to Fix Under Review
Actions #4

Updated by Greg Farnum about 9 years ago

  • Status changed from Fix Under Review to Resolved

Merged to master in commit:a84ae7b14e1aaa1c6f65d09a78230b54d0c25110

Actions

Also available in: Atom PDF