https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2015-02-02T13:44:29Z
Ceph
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=47301
2015-02-02T13:44:29Z
Zheng Yan
ukernel@gmail.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>12</i></li></ul><p>looks like the ftruncate failure is caused by out-of-order cap message (mds processes setattr request and cap message in wrong order). but For some unknown reason, the corresponding MDS log is missing (the first request in MDS log is 4139:23)</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=47351
2015-02-02T22:37:46Z
Greg Farnum
gfarnum@redhat.com
<ul></ul><p>Are we supposed to provide any sort of ordering on MClientCap versus MClientRequest messages? Truncate and length issues should be covered with the truncate_seq etc values.</p>
<p>And what exactly did you see that's causing the break?</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=47361
2015-02-03T02:44:13Z
Zheng Yan
ukernel@gmail.com
<ul></ul><pre>
2015-01-29 12:25:19.350635 7fb060fc1700 10 client.4139 _setattr mask 32 issued pAsxLsXsxFsxcrwb
2015-01-29 12:25:19.350647 7fb060fc1700 10 client.4139 choose_target_mds from caps on inode 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0)
2015-01-29 12:25:19.350658 7fb060fc1700 10 client.4139 send_request rebuilding request 5 for mds.0
2015-01-29 12:25:19.350666 7fb060fc1700 10 client.4139 send_request client_request(unknown.0:5 setattr size=333333 #100000003eb 2015-01-29 12:25:19.350645) v2 to mds.0
2015-01-29 12:25:19.350671 7fb060fc1700 1 -- 10.214.137.128:0/1758282524 --> 10.214.131.14:6808/31633 -- client_request(client.4139:5 setattr size=333333 #100000003eb 2015-01-29 12:25:19.350645) v2 -- ?+0 0x7fb0587741e0 con 0x7fb05814a860
2015-01-29 12:25:19.352365 7fafdeffd700 1 -- 10.214.137.128:0/1758282524 <== mds.0 10.214.131.14:6808/31633 9 ==== client_caps(revoke ino 100000003eb 14 seq 3 caps=pLsXsFsxcwb dirty=- wanted=- follows 0 size 0/4194304 ts 1 mtime 2015-01-29 12:25:17.989149) v5 ==== 196+0+0 (2924960165 0 0) 0x7fafcc0025a0 con 0x7fb05814a860
2015-01-29 12:25:19.352421 7fafdeffd700 10 client.4139 mds.0 seq now 2
2015-01-29 12:25:19.352432 7fafdeffd700 5 client.4139 handle_cap_grant on in 100000003eb mds.0 seq 3 caps now pLsXsFsxcwb was pAxLsXsxFsxcwb
2015-01-29 12:25:19.352445 7fafdeffd700 10 client.4139 update_inode_file_bits 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pAxLsXsxFsxcwb(0=pAxLsXsxFsxcwb) dirty_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) pAxLsXsxFsxcwb mtime 2015-01-29 12:25:17.989149
2015-01-29 12:25:19.352476 7fafdeffd700 10 client.4139 revocation of AxXx
2015-01-29 12:25:19.352481 7fafdeffd700 10 client.4139 check_caps on 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFsxcwb(0=pLsXsFsxcwb) dirty_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) wanted pAsxXsxFsxcrwb used - is_delayed=0
2015-01-29 12:25:19.352504 7fafdeffd700 10 client.4139 cap_delay_requeue on 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFsxcwb(0=pLsXsFsxcwb) dirty_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0)
2015-01-29 12:25:19.352526 7fafdeffd700 10 client.4139 cap mds.0 issued pLsXsFsxcwb implemented pAxLsXsxFsxcwb revoking AxXx
2015-01-29 12:25:19.352538 7fafdeffd700 10 client.4139 completed revocation of AxXx
2015-01-29 12:25:19.352545 7fafdeffd700 10 client.4139 mark_caps_flushing Fw 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFsxcwb(0=pLsXsFsxcwb) dirty_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0)
2015-01-29 12:25:19.352565 7fafdeffd700 10 client.4139 send_cap 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFsxcwb(0=pLsXsFsxcwb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) mds.0 seq 3 used - want pAsxXsxFsxcrwb flush Fw retain pAsLsxXsFsxcrwbl held pAxLsXsxFsxcwb revoking AxXx dropping -
2015-01-29 12:25:19.364141 7fafdeffd700 1 -- 10.214.137.128:0/1758282524 --> 10.214.131.14:6808/31633 -- client_caps(update ino 100000003eb 14 seq 3 tid 1 caps=pLsXsFsxcwb dirty=Fw wanted=pAsxXsxFsxcrwb follows 1 size 1398331/4095 ts 1 mtime 2015-01-29 12:25:19.342951) v5 -- ?+0 0x7fafd8010d90 con 0x7fb05814a860
2015-01-29 12:25:19.364191 7fafdeffd700 1 -- 10.214.137.128:0/1758282524 <== mds.0 10.214.131.14:6808/31633 10 ==== client_caps(revoke ino 100000003eb 14 seq 4 caps=pLsXsFcb dirty=- wanted=- follows 0 size 0/4194304 ts 1 mtime 2015-01-29 12:25:17.989149) v5 ==== 196+0+0 (2741223658 0 0) 0x7fafcc0008c0 con 0x7fb05814a860
2015-01-29 12:25:19.364205 7fafdeffd700 10 client.4139 mds.0 seq now 3
2015-01-29 12:25:19.364210 7fafdeffd700 5 client.4139 handle_cap_grant on in 100000003eb mds.0 seq 4 caps now pLsXsFcb was pLsXsFsxcwb
2015-01-29 12:25:19.364219 7fafdeffd700 10 client.4139 update_inode_file_bits 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFsxcwb(0=pLsXsFsxcwb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) pLsXsFsxcwb mtime 2015-01-29 12:25:17.989149
2015-01-29 12:25:19.364241 7fafdeffd700 10 client.4139 revocation of Fsxw
2015-01-29 12:25:19.364242 7fafdeffd700 10 client.4139 check_caps on 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFcb(0=pLsXsFcb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) wanted pAsxXsxFsxcrwb used - is_delayed=0
2015-01-29 12:25:19.364253 7fafdeffd700 10 client.4139 cap_delay_requeue on 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFcb(0=pLsXsFcb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0)
2015-01-29 12:25:19.364260 7fafdeffd700 10 client.4139 cap mds.0 issued pLsXsFcb implemented pLsXsFsxcwb revoking Fsxw
2015-01-29 12:25:19.364265 7fafdeffd700 10 client.4139 completed revocation of Fsxw
2015-01-29 12:25:19.364266 7fafdeffd700 10 client.4139 send_cap 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFcb(0=pLsXsFcb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) mds.0 seq 4 used - want pAsxXsxFsxcrwb flush - retain pAsxLsxXsxFcrbl held pLsXsFsxcwb revoking Fsxw dropping -
2015-01-29 12:25:19.364287 7fafdeffd700 1 -- 10.214.137.128:0/1758282524 --> 10.214.131.14:6808/31633 -- client_caps(update ino 100000003eb 14 seq 4 caps=pLsXsFcb dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 1398331/4095 ts 1 mtime 2015-01-29 12:25:19.342951) v5 -- ?+0 0x7fafd82c1030 con 0x7fb05814a860
2015-01-29 12:25:19.394667 7fafdeffd700 1 -- 10.214.137.128:0/1758282524 <== mds.0 10.214.131.14:6808/31633 11 ==== client_caps(grant ino 100000003eb 14 seq 5 caps=pLsXsFcb dirty=- wanted=- follows 0 size 0/4194304 ts 1 mtime 2015-01-29 12:25:17.989149) v5 ==== 196+0+0 (88382502 0 0) 0x7fafcc002cf0 con 0x7fb05814a860
2015-01-29 12:25:19.394745 7fafdeffd700 10 client.4139 mds.0 seq now 4
2015-01-29 12:25:19.394758 7fafdeffd700 5 client.4139 handle_cap_grant on in 100000003eb mds.0 seq 5 caps now pLsXsFcb was pLsXsFcb
2015-01-29 12:25:19.394769 7fafdeffd700 10 client.4139 update_inode_file_bits 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFcb(0=pLsXsFcb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) pLsXsFcwb mtime 2015-01-29 12:25:17.989149
2015-01-29 12:25:19.394796 7fafdeffd700 10 client.4139 caps unchanged at pLsXsFcb
2015-01-29 12:25:19.394809 7fafdeffd700 1 -- 10.214.137.128:0/1758282524 <== mds.0 10.214.131.14:6808/31633 12 ==== client_reply(???:3 = 0 (0) Success safe) v1 ==== 27+0+0 (3219600132 0 0) 0x7fafcc002720 con 0x7fb05814a860
2015-01-29 12:25:19.394828 7fafdeffd700 10 client.4139 insert_trace from 2015-01-29 12:25:17.989177 mds.0 is_target=0 is_dentry=0
2015-01-29 12:25:19.394831 7fafdeffd700 10 client.4139 insert_trace -- already got unsafe; ignoring
2015-01-29 12:25:19.394834 7fafdeffd700 10 client.4139 put_inode on 100000003ea.head(ref=3 ll_ref=0 cap_refs={} open={} mode=40777 size=0/0 mtime=2015-01-29 12:25:17.989149 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x7fb058139ed0 0x7fafd8011200)
2015-01-29 12:25:23.928498 7fb01ed8d700 10 client.4139 check_caps on 100000003ea.head(ref=2 ll_ref=0 cap_refs={} open={} mode=40777 size=0/0 mtime=2015-01-29 12:25:17.989149 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x7fb058139ed0 0x7fafd8011200) wanted - used - is_delayed=1
2015-01-29 12:25:23.928548 7fb01ed8d700 10 client.4139 cap mds.0 issued pAsLsXsFsx implemented pAsLsXsFsx revoking -
2015-01-29 12:25:23.928559 7fb01ed8d700 10 client.4139 send_cap 100000003ea.head(ref=2 ll_ref=0 cap_refs={} open={} mode=40777 size=0/0 mtime=2015-01-29 12:25:17.989149 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x7fb058139ed0 0x7fafd8011200) mds.0 seq 4 used - want - flush - retain pAsLsXsFs held pAsLsXsFsx revoking - dropping Fx
2015-01-29 12:25:23.928588 7fb01ed8d700 1 -- 10.214.137.128:0/1758282524 --> 10.214.131.14:6808/31633 -- client_caps(update ino 100000003ea 13 seq 4 caps=pAsLsXsFs dirty=- wanted=- follows 0 size 0/0 ts 1 mtime 2015-01-29 12:25:17.989149) v5 -- ?+0 0x7fafd4000a30 con 0x7fb05814a860
2015-01-29 12:25:24.928678 7fb01ed8d700 10 client.4139 check_caps on 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFcb(0=pLsXsFcb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) wanted pAsxXsxFsxcrwb used - is_delayed=1
2015-01-29 12:25:24.928730 7fb01ed8d700 10 client.4139 cap mds.0 issued pLsXsFcb implemented pLsXsFcb revoking -
2015-01-29 12:25:38.931378 7fb01ed8d700 10 client.4139 renew_caps()
2015-01-29 12:25:38.931403 7fb01ed8d700 10 client.4139 renew_caps mds.0
2015-01-29 12:25:38.931408 7fb01ed8d700 1 -- 10.214.137.128:0/1758282524 --> 10.214.131.14:6808/31633 -- client_session(request_renewcaps seq 3) v2 -- ?+0 0x7fafd40012c0 con 0x7fb05814a860
2015-01-29 12:25:39.765062 7fafdeffd700 1 -- 10.214.137.128:0/1758282524 <== mds.0 10.214.131.14:6808/31633 13 ==== client_reply(???:5 = 0 (0) Success unsafe) v1 ==== 346+0+0 (4075263461 0 0) 0x7fafcc002cf0 con 0x7fb05814a860
2015-01-29 12:25:39.765132 7fafdeffd700 10 client.4139 insert_trace from 2015-01-29 12:25:19.350666 mds.0 is_target=1 is_dentry=0
2015-01-29 12:25:39.765145 7fafdeffd700 10 client.4139 features 0xffffffffffff
2015-01-29 12:25:39.765147 7fafdeffd700 10 client.4139 update_snap_trace len 48
2015-01-29 12:25:39.765153 7fafdeffd700 10 client.4139 update_snap_trace snaprealm(1 nref=4 c=0 seq=1 parent=0 my_snaps=[] cached_snapc=1=[]) seq 1 <= 1 and same parent, SKIPPING
2015-01-29 12:25:39.765160 7fafdeffd700 10 client.4139 hrm is_target=1 is_dentry=0
2015-01-29 12:25:39.765170 7fafdeffd700 10 client.4139 update_inode_file_bits 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.342951 caps=pLsXsFcb(0=pLsXsFcb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0) pLsXsFcwb mtime 2015-01-29 12:25:19.350645
2015-01-29 12:25:39.766313 7fafdeffd700 10 client.4139 add_update_cap issued pLsXsFcb -> pAsxLsXsxFscb from mds.0 on 100000003eb.head(ref=4 ll_ref=0 cap_refs={4096=0,8192=0} open={3=1} mode=100600 size=1398331/4194304 mtime=2015-01-29 12:25:19.350645 caps=pAsxLsXsxFscb(0=pAsxLsXsxFscb) flushing_caps=Fw objectset[100000003eb ts 0/0 objects 0 dirty_or_tx 0] parents=0x7fb0582136b0 0x7fafd8013ed0)
2015-01-29 12:25:39.766391 7fb060fc1700 10 client.4139 _setattr result=0
</pre>
<p>you can see that the mds reply did not update inode's size. one likely reason is that truncate_seq was not increased. client flushed inode's Fw cap while waiting mds reply, which make me suspect the bug is caused by some kind of message order issue. but unfortunately, the corresponding part of mds log is missing</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=47653
2015-02-09T19:23:03Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Backport</strong> set to <i>giant</i></li></ul><p>This is buggy on giant as well (and given what Zheng found out, probably Firefly?).</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=47986
2015-02-12T05:42:13Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Priority</strong> changed from <i>High</i> to <i>Urgent</i></li></ul><p>Those who have access can check out <a class="external" href="http://pulpito.ceph.redhat.com/teuthology-2015-02-06_23:04:01-fs-giant-distro-basic-magna/42028/">http://pulpito.ceph.redhat.com/teuthology-2015-02-06_23:04:01-fs-giant-distro-basic-magna/42028/</a>, which if we're lucky will include the necessary MDS logging.</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=47988
2015-02-12T06:29:56Z
Zheng Yan
ukernel@gmail.com
<ul></ul><p>It's not the same as previous one. All failure are caused by "java.lang.UnsatisfiedLinkError: cephfs_jni (Not found in java.library.path)"</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=47989
2015-02-12T06:36:49Z
Greg Farnum
gfarnum@redhat.com
<ul></ul><p>Argh, sorry for the bad pattern matching. I wonder if that's another thing not working properly on RHEL; I'll check it out in the morning.</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=48463
2015-02-24T23:18:07Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>High</i></li></ul><p>Maybe we'll start seeing this again once we stop seeing the LinkErrors in <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: java.lang.UnsatisfiedLinkError: cephfs_jni (Not found in java.library.path) on RHEL7 (Resolved)" href="https://tracker.ceph.com/issues/10863">#10863</a>, but for now downgrade priority.</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=50013
2015-03-26T16:43:49Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Backport</strong> deleted (<del><i>giant</i></del>)</li></ul><p>giant is end of life</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=50194
2015-03-30T22:49:20Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Duplicate</i></li></ul><p><a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: cephfs-java ftruncate unit test failure (Resolved)" href="https://tracker.ceph.com/issues/11258">#11258</a>.</p>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=74427
2016-07-13T00:32:29Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Component(FS)</strong> <i>Hadoop/Java</i> added</li></ul>
CephFS - Bug #10703: failures in libcephfs-java tests
https://tracker.ceph.com/issues/10703?journal_id=131311
2019-03-09T00:29:23Z
Patrick Donnelly
pdonnell@redhat.com
<ul><li><strong>Category</strong> deleted (<del><i>48</i></del>)</li><li><strong>Labels (FS)</strong> <i>Java/Hadoop</i> added</li></ul>