Activity

From 04/30/2015 to 05/29/2015

05/29/2015

06:15 PM Bug #11792: mds: recursive statistics are either inaccurate or too "chunky"
Well, they're delayed, yes. But the few times I've bothered to measure it, the delay has been very short (a few seconds at most... Greg Farnum
07:13 AM Bug #11792: mds: recursive statistics are either inaccurate or too "chunky"
Thanks to delayed propagation, recursive statistics are never accurate Zheng Yan
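The model below shows why delayed propagation makes recursive statistics look inaccurate or "chunky". It is a deliberately simplified toy. The `Node` class and its fields are hypothetical and do not reflect how the MDS actually stores or propagates rstats. In the model, a change is visible at once on the directory where it happens. It reaches that directory's ancestors only when an explicit propagate step runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy directory node (hypothetical; not the MDS's real inode/dirfrag types)."""
    name: str
    parent: Optional["Node"] = None
    rbytes: int = 0   # recursive byte count as this node currently sees it
    pending: int = 0  # delta not yet pushed to the parent

    def add_bytes(self, n: int) -> None:
        # The change is visible on this node immediately...
        self.rbytes += n
        # ...but is only queued for its ancestors.
        self.pending += n

    def propagate(self) -> None:
        """Push queued deltas up one level at a time until the root."""
        node = self
        while node.parent is not None:
            delta, node.pending = node.pending, 0
            node.parent.rbytes += delta
            node.parent.pending += delta
            node = node.parent
        node.pending = 0  # the root has nowhere further to push


if __name__ == "__main__":
    root = Node("/")
    a = Node("a", parent=root)
    a.add_bytes(4096)
    print(root.rbytes)  # still 0: the update has not propagated yet
    a.propagate()
    print(root.rbytes)  # 4096 once the delayed update is applied
```

Between the write and the propagate step, an ancestor reports a stale total. After the step, the whole delta lands at once. Both symptoms named in the issue title come from that one behaviour.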

05/28/2015

06:16 PM Bug #11781: multiple_rsync failure with fuse client
#11807. I'm not actually sure what good options there are for replacement, but it won't get lost. Greg Farnum
12:24 PM Bug #11781: multiple_rsync failure with fuse client
Hmm, maybe using /usr/ as a source of workload files is something we just need to stop doing, there's no particular r... John Spray
12:12 PM Bug #11781 (Rejected): multiple_rsync failure with fuse client
... Zheng Yan
06:15 PM Bug #11807 (Resolved): workunits: stop rsyncing with /usr as the source material
We've had various issues with /usr. The permissions on it are fiddly, its size can vary, and apparently sometimes th... Greg Farnum
11:47 AM Bug #11789: knfs mount fails with "getfh failed: Function not implemented"
I don't think this is likely to be a regression from the previous hammer release, so it's not a blocker IMHO John Spray
06:17 AM Bug #11783 (In Progress): protocol: flushing caps on MDS restart can go bad
This is a message ordering issue during MDS failover:
chown marks Ax dirty
client flushes and releases Ax cap
cho...
Zheng Yan
01:02 AM Bug #11790 (Resolved): "cannot utime" errors in from tar in knfs workloads
It's an NFS bug in the 4.1-rc2 kernel. Now that the testing branch has been rebased to 4.1-rc5, this issue should disappear Zheng Yan

05/27/2015

10:07 PM Bug #11779 (Resolved): Intermittent failure in test_full
Greg Farnum
10:02 PM Bug #11779: Intermittent failure in test_full
The breaker was... John Spray
09:18 PM Bug #11779 (Pending Backport): Intermittent failure in test_full
Not sure when the change happened that broke this — do we need to backport to older qa branches? Greg Farnum
12:53 PM Bug #11779 (Fix Under Review): Intermittent failure in test_full
https://github.com/ceph/ceph-qa-suite/pull/447 John Spray
11:16 AM Bug #11779 (Resolved): Intermittent failure in test_full

It isn't waiting long enough before writing the last chunk of data. This used to work because the osd report inter...
John Spray
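A test breaks this way when it waits a fixed time that a changed report interval can outgrow. A common fix is to poll for the condition itself, up to an upper bound, instead of sleeping for a guessed duration. Below is a minimal sketch of that approach. `wait_until` is a hypothetical helper, not the code used in ceph-qa-suite PR 447. The predicate is left to the caller.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float,
               interval: float = 1.0) -> None:
    """Poll predicate() until it returns True; raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

In a test such as test_full, the predicate would be something like "the cluster now reports the expected full state". How to query that state is deliberately not shown here.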
09:54 PM Bug #11781: multiple_rsync failure with fuse client
#9884 was the case where it timed out b/c there were just so many files, whereas in this instance it's actually faili... John Spray
09:22 PM Bug #11781: multiple_rsync failure with fuse client
Possibly a dup of #9884? :/ I'm not sure how accurate that "too many files" is as the actual bug cause, though... Greg Farnum
12:46 PM Bug #11781: multiple_rsync failure with fuse client
Wrong project, sorry. John Spray
12:28 PM Bug #11781 (Rejected): multiple_rsync failure with fuse client

The second rsync run is transferring some data, so either timestamps are going wrong, dentries are going missing, o...
John Spray
09:50 PM Bug #11790: "cannot utime" errors in from tar in knfs workloads
It's the failure to set mtimes etc on an extracted file via utimensat.
I have a slightly fuzzy memory that we were...
John Spray
09:25 PM Bug #11790: "cannot utime" errors in from tar in knfs workloads
... Greg Farnum
08:14 PM Bug #11790: "cannot utime" errors in from tar in knfs workloads
hammer v0.94.2 release
Do you think this is a blocker?
Yuri Weinstein
07:21 PM Bug #11790 (Resolved): "cannot utime" errors in from tar in knfs workloads
http://pulpito.ceph.com/teuthology-2015-05-18_13:43:17-knfs-hammer-testing-basic-multi/897933/
http://pulpito.ceph.c...
John Spray
09:29 PM Bug #11301 (Resolved): mds-full test occasionally fails due to blacklist expiry
Greg Farnum
09:23 PM Bug #11783: protocol: flushing caps on MDS restart can go bad
Yep, this one looks unfamiliar to me. :( Do we have client logs from when it happened that we can reference? Greg Farnum
01:04 PM Bug #11783 (Resolved): protocol: flushing caps on MDS restart can go bad

Not consistent, not happening on master.
http://pulpito.ceph.com/teuthology-2015-05-16_23:04:02-fs-next-testing-...
John Spray
08:44 PM Bug #11792 (New): mds: recursive statistics are either inaccurate or too "chunky"
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg20005.html... Greg Farnum
08:26 PM Bug #11784: ceph-fuse hang on unmount (stuck dentry refs)
John, is this likely to have been a dup of #11294? We can tell by checking out the ceph-fuse log if it's still availa... Greg Farnum
01:28 PM Bug #11784 (Can't reproduce): ceph-fuse hang on unmount (stuck dentry refs)
http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-05-19_23:14:02-samba-master-testing-basic-typica/21446/
Th...
John Spray
08:14 PM Bug #11789: knfs mount fails with "getfh failed: Function not implemented"
hammer v0.94.2 release
Do you think this is a blocker?
Yuri Weinstein
07:08 PM Bug #11789 (Can't reproduce): knfs mount fails with "getfh failed: Function not implemented"

See plana37 syslog on /a/teuthology-2015-05-18_13:43:17-knfs-hammer-testing-basic-multi/897936...
John Spray
01:29 PM Bug #11756 (Resolved): 'ls /sys/fs/fuse/connections' causes fuse mount fails
John Spray

05/26/2015

10:36 AM Bug #11758 (Resolved): kernel_untar_build fails on EL7

Perhaps the more recent GCC is too strict to compile the old kernel tarball we use in the test?
http://magna002....
John Spray
03:12 AM Bug #11756: 'ls /sys/fs/fuse/connections' causes fuse mount fails
http://pulpito.ceph.com/teuthology-2015-05-22_23:04:02-fs-master-testing-basic-multi/906267/ Zheng Yan
03:01 AM Bug #11756 (Fix Under Review): 'ls /sys/fs/fuse/connections' causes fuse mount fails
https://github.com/ceph/ceph-qa-suite/pull/444 Zheng Yan
03:00 AM Bug #11756 (Resolved): 'ls /sys/fs/fuse/connections' causes fuse mount fails
Before listing fuse connections, we should make sure fusectl is mounted on /sys/fs/fuse/connections Zheng Yan
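The fix needs a check for whether fusectl is already mounted. Below is a minimal sketch of that check, assuming the standard `/proc/mounts` line format (`device mountpoint fstype options dump pass`). `fusectl_mounted` is a hypothetical helper, not the code from ceph-qa-suite PR 444.

```python
from typing import Iterable

FUSECTL_PATH = "/sys/fs/fuse/connections"


def fusectl_mounted(mount_lines: Iterable[str]) -> bool:
    """Return True if a fusectl filesystem is mounted at FUSECTL_PATH.

    Each line follows /proc/mounts: "<device> <mountpoint> <fstype> <options> <dump> <pass>".
    """
    for line in mount_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[1] == FUSECTL_PATH and fields[2] == "fusectl":
            return True
    return False
```

On Linux you would pass the opened `/proc/mounts` file to this function. If it returns False, the usual remedy (as root) is `mount -t fusectl none /sys/fs/fuse/connections`, run before listing the directory.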
02:22 AM Bug #11752 (Resolved): valgrind InvalidRead
Zheng Yan

05/25/2015

03:56 AM Bug #11752 (Fix Under Review): valgrind InvalidRead
https://github.com/ceph/ceph/pull/4755 Zheng Yan
03:38 AM Bug #11752 (Resolved): valgrind InvalidRead
http://qa-proxy.ceph.com/teuthology/teuthology-2015-05-20_23:04:01-fs-master-testing-basic-multi/902897/teuthology.log Zheng Yan

05/22/2015

07:52 PM Bug #11746 (Resolved): cephfs Dumper tries to load whole journal into memory at once
It should be streaming instead.
(From http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001622.html)
John Spray
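"Streaming" here means copying the journal in bounded chunks, so that peak memory is fixed by the chunk size rather than the journal size. The real Dumper is C++ and reads journal objects from RADOS. In this sketch, `src` and `dst` are ordinary Python binary file-like objects, and both `dump_streaming` and `CHUNK_SIZE` are hypothetical.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical read size; tune as needed


def dump_streaming(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst chunk by chunk; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # EOF
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"journal-bytes" * 1000)
    dst = io.BytesIO()
    print(dump_streaming(src, dst, chunk_size=4096))
```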
01:14 PM Bug #11301 (Fix Under Review): mds-full test occasionally fails due to blacklist expiry
https://github.com/ceph/ceph/pull/4746
https://github.com/ceph/ceph-qa-suite/pull/443
John Spray
08:03 AM Backport #11737 (Resolved): MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
https://github.com/ceph/ceph/pull/4886 Loïc Dachary

05/19/2015

08:31 AM Bug #11481 (Resolved): "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay trans...
Zheng Yan
08:05 AM Bug #10624 (Resolved): hung samba tests after lsof failure
Zheng Yan
08:04 AM Bug #11641 (Resolved): test_journal_repair failing on mds_verify_scatter
Zheng Yan

05/18/2015

04:08 PM Bug #11641 (Fix Under Review): test_journal_repair failing on mds_verify_scatter
https://github.com/ceph/ceph/pull/4715 John Spray
12:53 PM Bug #11641 (In Progress): test_journal_repair failing on mds_verify_scatter
John Spray
12:52 PM Bug #11641: test_journal_repair failing on mds_verify_scatter
OK, turns out to be nothing to do with journaltool, this is a straight up bug in the change I made to populate_mydir.... John Spray

05/15/2015

05:35 PM Bug #11635 (Rejected): hung fsx run
s/microseconds/milliseconds :)
Rejecting this since it's an infrastructure thing we can't do much about for now an...
Greg Farnum
07:12 AM Bug #11635: hung fsx run
I compared osd.log with a successful run's log. It seems the osd handled [write,startsync] slowly. It used several hundred ... Zheng Yan
04:47 AM Bug #11635: hung fsx run
Was it apparent why things were so slow? Just the OSDs being sluggish, or something else? :/ Greg Farnum
04:16 AM Bug #11635 (Need More Info): hung fsx run
I checked the OSD log. The test was not hung, just very slow. In the test, fsx took about 20 minutes to reach "10000 ... Zheng Yan
05:13 PM Bug #11641: test_journal_repair failing on mds_verify_scatter
Note that this run has the dirfrag enable PR included. I'm not sure if this test would be hitting that or if it's fai... Greg Farnum
01:44 PM Bug #11641: test_journal_repair failing on mds_verify_scatter
... John Spray
01:43 PM Bug #11641 (Resolved): test_journal_repair failing on mds_verify_scatter

Possibly related to recent changes to the mydir population stuff?
http://pulpito.front.sepia.ceph.com/...
John Spray
04:59 AM Bug #9995 (Resolved): failing test_filelock
Finally merged in ceph-qa-suite commit:8eb3255a4c5c62720cf5c749032cbeccc4f9b264 Greg Farnum
03:56 AM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
Generally speaking no, but unless it's difficult to backport we might as well do so for things people have actually h... Greg Farnum
03:53 AM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
This bug happens only in the multimds case. Do we need to backport multimds fixes? Zheng Yan

05/14/2015

11:37 PM Feature #11276 (Resolved): fuse: Handle per-pool FULL flags (such as set by quota limit)
Merged to master by commit:e63c44b253027b9fa5f6566a089ec18b39e16b45 Greg Farnum
09:58 PM Bug #11635 (Rejected): hung fsx run
http://typica002.front.sepia.ceph.com/teuthology-2015-05-11_23:08:01-kcephfs-master-testing-basic-typica/15179/
...
Greg Farnum
01:42 AM Bug #11540 (Resolved): Test failure in LibCephFS.BadArgument
Zheng Yan

05/13/2015

08:58 PM Feature #11588: teuthology: set up log rotate for MDS logs
Yes, we could also stop the rotation after the tests execute and before archiving — but that doesn't actually solve o... Greg Farnum
11:03 AM Feature #11588: teuthology: set up log rotate for MDS logs
Perhaps disabling rotation for #5451 was just a quick fix; would it work to make teuthology allow rotation, but stop all... John Spray
01:26 PM Bug #11301 (In Progress): mds-full test occasionally fails due to blacklist expiry
John Spray
02:18 AM Fix #11187: ceph-qa-suite: reduce required machine numbers
Greg Farnum
12:10 AM Fix #11187: ceph-qa-suite: reduce required machine numbers
We can consolidate kernel mounts on the same machine as long as we use the -o noshare option when mounting. Greg Farnum

05/12/2015

11:52 PM Fix #11187 (In Progress): ceph-qa-suite: reduce required machine numbers
Okay, there are some specialized ones in suites/fs that are pretty easy (done locally).
kcephfs has 2-client tests...
Greg Farnum
11:25 PM Feature #11588: teuthology: set up log rotate for MDS logs
We explicitly disable logrotate in ceph-qa-suite/tasks/ceph.py, because of #5451.
I think maybe the answer is to k...
Greg Farnum
08:48 AM Bug #11541 (Pending Backport): MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
Needs backport to hammer as that's where the issue appeared. John Spray
05:07 AM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
Put it in a PR, please? Greg Farnum
05:02 AM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
John, can you do something so the MDS resets itself if it gets moved out of the active state? I think there might be ... Greg Farnum
04:48 AM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
For some reason the leader is not responding to all of the MDSBeacon messages it receives (but it does respond to oth... Greg Farnum

05/11/2015

08:34 PM Feature #11588 (Resolved): teuthology: set up log rotate for MDS logs
There might be some stuff around log rotation already in teuthology. If not, it shouldn't be hard to add handling.
...
Greg Farnum

05/09/2015

12:04 AM Bug #11510 (Resolved): ceph-fuse crashes
I backported the pool permissions checking (https://github.com/ceph/ceph/pull/4629); it's waiting for the backports g... Greg Farnum

05/08/2015

08:35 AM Bug #11568 (Resolved): check_pool_perm blocks forever on nonexistent pool
Zheng Yan
06:28 AM Bug #11236 (Resolved): lock inversions in test/libcephfs flock test
It's http://tracker.ceph.com/issues/11540 Zheng Yan
04:16 AM Bug #11236 (Need More Info): lock inversions in test/libcephfs flock test
Did we create a ticket for the failed test case(s)? We should perhaps fix the tests so that we don't get semi-fake wa... Greg Farnum
03:14 AM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
I think the fix should be:... Zheng Yan

05/07/2015

11:53 PM Bug #11510: ceph-fuse crashes
Wasn't planning on backporting those, since EPERM is caught before the object cacher is used for rbd, but feel free t... Josh Durgin
11:32 PM Bug #11510: ceph-fuse crashes
Looks like this is something that will cause trouble for RBD as well. Josh, Jason, are you guys planning to backport ... Greg Farnum
11:36 PM Bug #11300: client-limits and mds-full failures on clog warning/errors
I think this is done now? Was probably me who merged the backports without going through Github... Greg Farnum
05:47 PM Bug #11568 (Fix Under Review): check_pool_perm blocks forever on nonexistent pool
https://github.com/ceph/ceph/pull/4604 John Spray
05:46 PM Bug #11568 (Resolved): check_pool_perm blocks forever on nonexistent pool

Reproduce: a file with a layout pointing to a non-existent pool, try reading it from ceph-fuse, see that the read b...
John Spray
01:26 PM Bug #11504 (Fix Under Review): CephFS restriction on removing cache tiers is overly strict
https://github.com/ceph/ceph/pull/4602 John Spray
10:01 AM Bug #11236 (Resolved): lock inversions in test/libcephfs flock test
Cool, that makes sense to me. John Spray
01:55 AM Bug #11236: lock inversions in test/libcephfs flock test
John Spray wrote:
> This appears to have resurfaced here:
>
> http://pulpito.front.sepia.ceph.com/teuthology-2015...
Zheng Yan
02:01 AM Bug #11522 (Resolved): FLUSHSNAP_ACKs are not always sent back on unexpected FLUSHSNAP
by commit 585bc2bc6a7aeddf32e9c1d43d0aa01568e01d63 Zheng Yan

05/06/2015

09:07 PM Bug #11316 (Resolved): "ceph-fuse not mounted, got fs type 'ext2/ext3" looping in ceph-deploy-ham...
ceph-deploy suite was optimized and scheduled to run on hammer and next Yuri Weinstein
06:08 PM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
Hi, I restarted the mds daemons many times, and it started normally. Andrey Matyashov
03:25 PM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
Created wip-11541-hammer-workaround branch. Andrey: once it's built you should be able to find some packages on http:... John Spray
02:50 PM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
The contradiction here seems to be that our code wants any unlinked directories (i.e. in a stray directory) to have n... John Spray
02:36 PM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
This may be due to the following change:... John Spray
10:28 AM Bug #11541 (Resolved): MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
After updating from 0.82 to 0.94.1, the MDS crashed:... Andrey Matyashov
01:52 PM Bug #11540: Test failure in LibCephFS.BadArgument
Merged the fix; let's see that it's fixed in the next nightly before resolving this John Spray
10:39 AM Bug #11540: Test failure in LibCephFS.BadArgument
https://github.com/ceph/ceph/pull/4572 Haomai Wang
10:18 AM Bug #11540 (Resolved): Test failure in LibCephFS.BadArgument
This is the new test for the 0 length write case, it's failing in the nightly run:
http://pulpito.front.sepia.ceph...
John Spray
10:10 AM Bug #11236: lock inversions in test/libcephfs flock test
This appears to have resurfaced here:
http://pulpito.front.sepia.ceph.com/teuthology-2015-05-04_23:04:01-fs-master...
John Spray
09:18 AM Bug #10624: hung samba tests after lsof failure
* hammer backport https://github.com/ceph/ceph-qa-suite/pull/418 Loïc Dachary
09:13 AM Bug #10712 (Resolved): TestFlush intermittent failure on scatter_writebehind event
Loïc Dachary
08:02 AM Bug #11482 (Resolved): kclient: intermittent log warnings "client.XXXX isn't responding to mclien...
Zheng Yan
07:58 AM Bug #11510: ceph-fuse crashes
File creation goes through the MDS; it does not require the client to have permission to write to the OSD. Zheng Yan
03:31 AM Bug #11510: ceph-fuse crashes
Actually, I'm wondering how the file creation (or maybe copying files) works without enough permissions? Could an... Jifeng Yin
02:17 AM Bug #11510: ceph-fuse crashes
It crashes because ObjectCacher does not check the error code of the osd read request. The bug is already fixed in the master branc... Zheng Yan
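The diagnosis above is that a read result gets used without checking its error code, which here is a permission failure. The fix commit's title ("In _readx() only no error can tidy read result") suggests a pattern: check every piece of the read for errors before assembling anything. Below is a minimal sketch of that pattern. `ExtentResult` and `assemble_read` are hypothetical names. The real ObjectCacher is C++ with very different data structures.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ExtentResult:
    """Outcome of one (hypothetical) OSD read covering part of a request."""
    offset: int
    rc: int           # 0 on success, negative errno on failure
    data: bytes = b""


def assemble_read(results: List[ExtentResult]) -> Union[int, bytes]:
    """Return the first negative rc if any extent failed, else the stitched data."""
    for r in results:
        if r.rc < 0:
            # Never stitch together buffers from a failed read; propagate the error.
            return r.rc
    ordered = sorted(results, key=lambda r: r.offset)
    return b"".join(r.data for r in ordered)
```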

05/05/2015

04:48 PM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
The mons appear to be rather unhappy in this test run, but there are no thrashers or messenger failure settings turne... John Spray
04:16 PM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition

This looks like a mon bug of some kind. After a leader election, a bunch of routed messages are being resent to th...
John Spray
03:16 PM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
Weird! This is an MDS going from state active to state standby, then back again. In the final stage it notices that... John Spray
02:41 PM Bug #11462: kernel: crash (when MDS died?)
I didn't do a character-by-character diff, but I checked most of the function names and they were the same, yes. :) Greg Farnum
07:35 AM Bug #11462: kernel: crash (when MDS died?)
Greg Farnum wrote:
> FYI this popped up again in teuthology-2015-04-30_23:04:01-fs-next-testing-basic-multi/870815
...
Zheng Yan
01:48 PM Bug #11510 (Need More Info): ceph-fuse crashes
It still shouldn't have crashed and I'm not sure why that would have happened unless the kernel gave us ENOMEM or som... Greg Farnum
07:13 AM Bug #11510: ceph-fuse crashes

It works when I use the user with caps ...
Jifeng Yin
07:03 AM Bug #11510: ceph-fuse crashes
This bug is fixed by commit 540346d4 (osdc: In _readx() only no error can tidy read result). Zheng Yan
12:38 PM Bug #11482: kclient: intermittent log warnings "client.XXXX isn't responding to mclientcaps(revoke)"
* hammer backport https://github.com/ceph/ceph/pull/4481 Loïc Dachary
09:50 AM Bug #10712: TestFlush intermittent failure on scatter_writebehind event
http://pulpito.ceph.com/loic-2015-05-04_11:27:13-fs-hammer-backports---basic-multi/874954 Loïc Dachary
09:48 AM Bug #10712 (Pending Backport): TestFlush intermittent failure on scatter_writebehind event
* hammer backport https://github.com/ceph/ceph-qa-suite/pull/429 Loïc Dachary

05/04/2015

01:57 PM Bug #11462: kernel: crash (when MDS died?)
FYI this popped up again in teuthology-2015-04-30_23:04:01-fs-next-testing-basic-multi/870815 Greg Farnum
10:09 AM Bug #11522: FLUSHSNAP_ACKs are not always sent back on unexpected FLUSHSNAP
The client creates a cap_snap and sends a FLUSHSNAP message to the MDS even when the snap notification contains no new snap (snap rem... Zheng Yan

05/02/2015

06:14 AM Bug #11510: ceph-fuse crashes
The admin account works, and it turns out to be a permission problem, although I'm surprised that it works at the fir... Jifeng Yin
12:10 AM Bug #10863 (Resolved): java.lang.UnsatisfiedLinkError: cephfs_jni (Not found in java.library.path...
commit:aef0272d72afaef849b5d4acbf55626033369ee8 Greg Farnum

05/01/2015

11:14 PM Bug #10863: java.lang.UnsatisfiedLinkError: cephfs_jni (Not found in java.library.path) on RHEL7
That last failure in ceph-qa-suite that you emailed about has a fix here. I submitted it to firefly-backports. That r... Noah Watkins
06:29 PM Bug #11522 (Resolved): FLUSHSNAP_ACKs are not always sent back on unexpected FLUSHSNAP
There is a PR to make the MDS always send back a FLUSHSNAP_ACK: https://github.com/ceph/ceph/pull/4523
But I guess...
Greg Farnum
04:23 AM Bug #11517 (Resolved): Libcephfs: Doesn't check file's open mode when do read/write
Something like this:
fd = cephfs_open(path, O_WRONLY|O_CREAT, 0)
cephfs.read(fd, buf, 0, 10) // ok, but actually ...
Haomai Wang
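For comparison, a local POSIX filesystem rejects the read in the example above with EBADF. That is presumably the behaviour libcephfs should match. The sketch below demonstrates this reference behaviour using Python's `os` module on a temporary local file, and it does not call libcephfs. It uses mode 0o600 rather than the 0 in the example.

```python
import errno
import os
import tempfile


def read_on_write_only_fd() -> int:
    """Open a new local file O_WRONLY|O_CREAT, try to read it, return the errno (0 if none)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "f")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.read(fd, 10)  # reading from a write-only descriptor
        except OSError as e:
            return e.errno
        finally:
            os.close(fd)
    return 0  # the read unexpectedly succeeded


if __name__ == "__main__":
    print(errno.errorcode.get(read_on_write_only_fd(), "no error"))
```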
03:45 AM Bug #9995 (Fix Under Review): failing test_filelock
https://github.com/ceph/ceph-qa-suite/pull/425 Greg Farnum

04/30/2015

10:34 PM Bug #11316: "ceph-fuse not mounted, got fs type 'ext2/ext3" looping in ceph-deploy-hammer-distro-...
See update here: http://tracker.ceph.com/issues/11495#change-51225 Yuri Weinstein
06:04 PM Bug #11255: nfs: mount failures on ceph-backed NFS share
http://pulpito.ceph.com/teuthology-2015-04-29_23:10:02-knfs-next-testing-basic-multi/869428/
I also realized recen...
Greg Farnum
03:07 PM Bug #11510: ceph-fuse crashes
John, do you know how to export the data from the underlying pool? I'm very curious.
The ceph-fuse just goes cr...
Jifeng Yin
02:47 PM Bug #11510: ceph-fuse crashes
Nothing special. I just use "ceph fs new cephfs".
I've noticed the ticket #11120, and trying 'getfattr -d' shows n...
Jifeng Yin
01:49 PM Bug #11510: ceph-fuse crashes
I've definitely seen this backtrace before, trying to remember when... could it have been when we were dealing with t... John Spray
11:04 AM Bug #11510 (Resolved): ceph-fuse crashes
When I read the file through ceph-fuse, it crashes, while listing is okay.
The kernel driver doesn't crash, but shows 'read ...
Jifeng Yin
 
