Activity
From 05/03/2015 to 06/01/2015
06/01/2015
- 09:55 PM Bug #11835 (Resolved): FuseMount.umount_wait can hang
Currently code in FuseMount.umount assumes that the write to /sys/fuse/connections/X/abort causes the process to te...
- 09:49 PM Bug #11784: ceph-fuse hang on unmount (stuck dentry refs)
- We saw this again today, and it's definitely inode refs this time:
http://pulpito-rdu.front.sepia.ceph.com/gregf-201...
05/31/2015
- 10:14 PM Bug #10248: messenger: failed Pipe;:connect::assert(m) in Hadoop client
- Yes, the related issue chain has been seen a few times recently.
- 06:05 PM Bug #10248: messenger: failed Pipe;:connect::assert(m) in Hadoop client
- Is it still valid?
05/29/2015
- 06:15 PM Bug #11792: mds: recursive statistics are either inaccurate or too "chunky"
- Well, they're delayed, yes. But the few times I've bothered to measure, the delay has been very short (a few seconds at most...
- 07:13 AM Bug #11792: mds: recursive statistics are either inaccurate or too "chunky"
- Due to delayed propagation, recursive statistics are never accurate.
05/28/2015
- 06:16 PM Bug #11781: multiple_rsync failure with fuse client
- #11807. I'm not actually sure what good options there are for replacement, but it won't get lost.
- 12:24 PM Bug #11781: multiple_rsync failure with fuse client
- Hmm, maybe using /usr/ as a source of workload files is something we just need to stop doing, there's no particular r...
- 12:12 PM Bug #11781 (Rejected): multiple_rsync failure with fuse client
- ...
- 06:15 PM Bug #11807 (Resolved): workunits: stop rsyncing with /usr as the source material
- We've had various issues with /usr. The permissions on it are fiddly, its size can vary, and apparently sometimes th...
- 11:47 AM Bug #11789: knfs mount fails with "getfh failed: Function not implemented"
- I don't think this is likely to be a regression from the previous hammer release, so it's not a blocker, IMHO.
- 06:17 AM Bug #11783 (In Progress): protocol: flushing caps on MDS restart can go bad
- This is a message-ordering issue during MDS failover:
chown marks Ax dirty
client flushes and releases Ax cap
cho...
- 01:02 AM Bug #11790 (Resolved): "cannot utime" errors in from tar in knfs workloads
- It's an NFS bug in the 4.1-rc2 kernel. Now that the testing branch has been rebased to 4.1-rc5, this issue should disappear.
05/27/2015
- 10:07 PM Bug #11779 (Resolved): Intermittent failure in test_full
- 10:02 PM Bug #11779: Intermittent failure in test_full
- The breaker was...
- 09:18 PM Bug #11779 (Pending Backport): Intermittent failure in test_full
- Not sure when the change that broke this happened; do we need to backport to older qa branches?
- 12:53 PM Bug #11779 (Fix Under Review): Intermittent failure in test_full
- https://github.com/ceph/ceph-qa-suite/pull/447
- 11:16 AM Bug #11779 (Resolved): Intermittent failure in test_full
It isn't waiting long enough before writing the last chunk of data. This used to work because the osd report inter...
- 09:54 PM Bug #11781: multiple_rsync failure with fuse client
- #9884 was the case where it timed out b/c there were just so many files, whereas in this instance it's actually faili...
- 09:22 PM Bug #11781: multiple_rsync failure with fuse client
- Possibly a dup of #9884? :/ I'm not sure how accurate that "too many files" is as the actual bug cause, though...
- 12:46 PM Bug #11781: multiple_rsync failure with fuse client
- Wrong project, sorry.
- 12:28 PM Bug #11781 (Rejected): multiple_rsync failure with fuse client
The second rsync run is transferring some data, so either timestamps are going wrong, dentries are going missing, o...
- 09:50 PM Bug #11790: "cannot utime" errors in from tar in knfs workloads
- It's the failure to set mtimes etc on an extracted file via utimensat.
I have a slightly fuzzy memory that we were...
- 09:25 PM Bug #11790: "cannot utime" errors in from tar in knfs workloads
- ...
- 08:14 PM Bug #11790: "cannot utime" errors in from tar in knfs workloads
- hammer v0.94.2 release
Do you think this is a blocker?
- 07:21 PM Bug #11790 (Resolved): "cannot utime" errors in from tar in knfs workloads
- http://pulpito.ceph.com/teuthology-2015-05-18_13:43:17-knfs-hammer-testing-basic-multi/897933/
http://pulpito.ceph.c...
- 09:29 PM Bug #11301 (Resolved): mds-full test occasionally fails due to blacklist expiry
- 09:23 PM Bug #11783: protocol: flushing caps on MDS restart can go bad
- Yep, this one looks unfamiliar to me. :( Do we have client logs from when it happened that we can reference?
- 01:04 PM Bug #11783 (Resolved): protocol: flushing caps on MDS restart can go bad
Not consistent, not happening on master.
http://pulpito.ceph.com/teuthology-2015-05-16_23:04:02-fs-next-testing-...
- 08:44 PM Bug #11792 (New): mds: recursive statistics are either inaccurate or too "chunky"
- https://www.mail-archive.com/ceph-users@lists.ceph.com/msg20005.html...
- 08:26 PM Bug #11784: ceph-fuse hang on unmount (stuck dentry refs)
- John, is this likely to have been a dup of #11294? We can tell by checking out the ceph-fuse log if it's still availa...
- 01:28 PM Bug #11784 (Can't reproduce): ceph-fuse hang on unmount (stuck dentry refs)
- http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-05-19_23:14:02-samba-master-testing-basic-typica/21446/
Th...
- 08:14 PM Bug #11789: knfs mount fails with "getfh failed: Function not implemented"
- hammer v0.94.2 release
Do you think this is a blocker?
- 07:08 PM Bug #11789 (Can't reproduce): knfs mount fails with "getfh failed: Function not implemented"
See plana37 syslog on /a/teuthology-2015-05-18_13:43:17-knfs-hammer-testing-basic-multi/897936...
- 01:29 PM Bug #11756 (Resolved): 'ls /sys/fs/fuse/connections' causes fuse mount fails
05/26/2015
- 10:36 AM Bug #11758 (Resolved): kernel_untar_build fails on EL7
Perhaps the more recent GCC is too strict to compile the old kernel tarball we use in the test?
http://magna002....
- 03:12 AM Bug #11756: 'ls /sys/fs/fuse/connections' causes fuse mount fails
- http://pulpito.ceph.com/teuthology-2015-05-22_23:04:02-fs-master-testing-basic-multi/906267/
- 03:01 AM Bug #11756 (Fix Under Review): 'ls /sys/fs/fuse/connections' causes fuse mount fails
- https://github.com/ceph/ceph-qa-suite/pull/444
- 03:00 AM Bug #11756 (Resolved): 'ls /sys/fs/fuse/connections' causes fuse mount fails
- Before listing fuse connections, we should make sure fusectl is mounted on /sys/fs/fuse/connections.
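The actual fix went in through ceph-qa-suite pull/444 and is not reproduced here. What follows is only a minimal Python sketch of the check described in this comment. It assumes two things: the mount table is read from /proc/mounts, and fusectl is mounted with `mount -t fusectl none /sys/fs/fuse/connections`, the form given in the kernel FUSE documentation. The function names are hypothetical and do not come from ceph-qa-suite.

```python
import os

# Hypothetical helpers; not the ceph-qa-suite implementation.
FUSECTL_MOUNTPOINT = "/sys/fs/fuse/connections"


def fusectl_mounted(mounts_text, mountpoint=FUSECTL_MOUNTPOINT):
    """Return True if /proc/mounts-style text shows fusectl at mountpoint."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == mountpoint and fields[2] == "fusectl":
            return True
    return False


def mount_fusectl_cmd(mountpoint=FUSECTL_MOUNTPOINT):
    """Build (but do not run) the command that mounts fusectl; needs root."""
    return ["mount", "-t", "fusectl", "none", mountpoint]


def list_fuse_connections(mounts_text, mountpoint=FUSECTL_MOUNTPOINT):
    """List connection IDs, refusing to list an unmounted (empty) directory."""
    if not fusectl_mounted(mounts_text, mountpoint):
        raise RuntimeError(
            "fusectl not mounted; run: " + " ".join(mount_fusectl_cmd(mountpoint))
        )
    return sorted(os.listdir(mountpoint))
```

On a live node the caller would pass `open("/proc/mounts").read()` as `mounts_text`. Raising an error, rather than returning an empty list, is meant to keep a missing fusectl mount from being read as "no FUSE connections".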
- 02:22 AM Bug #11752 (Resolved): valgrind InvalidRead
05/25/2015
- 03:56 AM Bug #11752 (Fix Under Review): valgrind InvalidRead
- https://github.com/ceph/ceph/pull/4755
- 03:38 AM Bug #11752 (Resolved): valgrind InvalidRead
- http://qa-proxy.ceph.com/teuthology/teuthology-2015-05-20_23:04:01-fs-master-testing-basic-multi/902897/teuthology.log
05/22/2015
- 07:52 PM Bug #11746 (Resolved): cephfs Dumper tries to load whole journal into memory at once
- It should be streaming instead.
(From http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001622.html)
- 01:14 PM Bug #11301 (Fix Under Review): mds-full test occasionally fails due to blacklist expiry
- https://github.com/ceph/ceph/pull/4746
https://github.com/ceph/ceph-qa-suite/pull/443
- 08:03 AM Backport #11737 (Resolved): MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- https://github.com/ceph/ceph/pull/4886
05/19/2015
- 08:31 AM Bug #11481 (Resolved): "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay trans...
- 08:05 AM Bug #10624 (Resolved): hung samba tests after lsof failure
- 08:04 AM Bug #11641 (Resolved): test_journal_repair failing on mds_verify_scatter
05/18/2015
- 04:08 PM Bug #11641 (Fix Under Review): test_journal_repair failing on mds_verify_scatter
- https://github.com/ceph/ceph/pull/4715
- 12:53 PM Bug #11641 (In Progress): test_journal_repair failing on mds_verify_scatter
- 12:52 PM Bug #11641: test_journal_repair failing on mds_verify_scatter
- OK, turns out to be nothing to do with journaltool, this is a straight up bug in the change I made to populate_mydir....
05/15/2015
- 05:35 PM Bug #11635 (Rejected): hung fsx run
- s/microseconds/milliseconds :)
Rejecting this since it's an infrastructure thing we can't do much about for now an...
- 07:12 AM Bug #11635: hung fsx run
- I compared osd.log with a successful run's log. It seems the OSD handled [write,startsync] slowly; it took several hundred ...
- 04:47 AM Bug #11635: hung fsx run
- Was it apparent why things were so slow? Just the OSDs being sluggish, or something else? :/
- 04:16 AM Bug #11635 (Need More Info): hung fsx run
- I checked the OSD log. The test was not hung, just very slow: fsx took about 20 minutes to reach "10000 ...
- 05:13 PM Bug #11641: test_journal_repair failing on mds_verify_scatter
- Note that this run has the dirfrag enable PR included. I'm not sure if this test would be hitting that or if it's fai...
- 01:44 PM Bug #11641: test_journal_repair failing on mds_verify_scatter
- ...
- 01:43 PM Bug #11641 (Resolved): test_journal_repair failing on mds_verify_scatter
Possibly related to recent changes to the mydir population stuff?
http://pulpito.front.sepia.ceph.com/...
- 04:59 AM Bug #9995 (Resolved): failing test_filelock
- Finally merged in ceph-qa-suite commit:8eb3255a4c5c62720cf5c749032cbeccc4f9b264
- 03:56 AM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- Generally speaking no, but unless it's difficult to backport we might as well do so for things people have actually h...
- 03:53 AM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- This bug happens only in the multi-MDS case. Do we need to backport multi-MDS fixes?
05/14/2015
- 11:37 PM Feature #11276 (Resolved): fuse: Handle per-pool FULL flags (such as set by quota limit)
- Merged to master by commit:e63c44b253027b9fa5f6566a089ec18b39e16b45
- 09:58 PM Bug #11635 (Rejected): hung fsx run
- http://typica002.front.sepia.ceph.com/teuthology-2015-05-11_23:08:01-kcephfs-master-testing-basic-typica/15179/
- 01:42 AM Bug #11540 (Resolved): Test failure in LibCephFS.BadArgument
05/13/2015
- 08:58 PM Feature #11588: teuthology: set up log rotate for MDS logs
- Yes, we could also stop the rotation after the tests execute and before archiving, but that doesn't actually solve o...
- 11:03 AM Feature #11588: teuthology: set up log rotate for MDS logs
- Perhaps disabling rotation for #5451 was just a quick fix. Would it work to make teuthology allow rotation, but stop all...
- 01:26 PM Bug #11301 (In Progress): mds-full test occasionally fails due to blacklist expiry
- 02:18 AM Fix #11187: ceph-qa-suite: reduce required machine numbers
- 12:10 AM Fix #11187: ceph-qa-suite: reduce required machine numbers
- We can consolidate kernel mounts on the same machine as long as we use the -o noshare option when mounting.
05/12/2015
- 11:52 PM Fix #11187 (In Progress): ceph-qa-suite: reduce required machine numbers
- Okay, there are some specialized ones in suites/fs that are pretty easy (done locally).
kcephfs has 2-client tests...
- 11:25 PM Feature #11588: teuthology: set up log rotate for MDS logs
- We explicitly disable logrotate in ceph-qa-suite/tasks/ceph.py, because of #5451.
I think maybe the answer is to k...
- 08:48 AM Bug #11541 (Pending Backport): MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- Needs backport to hammer as that's where the issue appeared.
- 05:07 AM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- Put it in a PR, please?
- 05:02 AM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
- John, can you do something so the MDS resets itself if it gets moved out of the active state? I think there might be ...
- 04:48 AM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
- For some reason the leader is not responding to all of the MDSBeacon messages it receives (but it does respond to oth...
05/11/2015
- 08:34 PM Feature #11588 (Resolved): teuthology: set up log rotate for MDS logs
- There might be some stuff around log rotation already in teuthology. If not, it shouldn't be hard to add handling.
...
05/09/2015
- 12:04 AM Bug #11510 (Resolved): ceph-fuse crashes
- I backported the pool permissions checking (https://github.com/ceph/ceph/pull/4629); it's waiting for the backports g...
05/08/2015
- 08:35 AM Bug #11568 (Resolved): check_pool_perm blocks forever on nonexistent pool
- 06:28 AM Bug #11236 (Resolved): lock inversions in test/libcephfs flock test
- It's http://tracker.ceph.com/issues/11540
- 04:16 AM Bug #11236 (Need More Info): lock inversions in test/libcephfs flock test
- Did we create a ticket for the failed test case(s)? We should perhaps fix the tests so that we don't get semi-fake wa...
- 03:14 AM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- I think the fix should be:...
05/07/2015
- 11:53 PM Bug #11510: ceph-fuse crashes
- Wasn't planning on backporting those, since EPERM is caught before the object cacher is used for rbd, but feel free t...
- 11:32 PM Bug #11510: ceph-fuse crashes
- Looks like this is something that will cause trouble for RBD as well. Josh, Jason, are you guys planning to backport ...
- 11:36 PM Bug #11300: client-limits and mds-full failures on clog warning/errors
- I think this is done now? Was probably me who merged the backports without going through Github...
- 05:47 PM Bug #11568 (Fix Under Review): check_pool_perm blocks forever on nonexistent pool
- https://github.com/ceph/ceph/pull/4604
- 05:46 PM Bug #11568 (Resolved): check_pool_perm blocks forever on nonexistent pool
Reproduce: a file with a layout pointing to a non-existent pool, try reading it from ceph-fuse, see that the read b...
- 01:26 PM Bug #11504 (Fix Under Review): CephFS restriction on removing cache tiers is overly strict
- https://github.com/ceph/ceph/pull/4602
- 10:01 AM Bug #11236 (Resolved): lock inversions in test/libcephfs flock test
- Cool, that makes sense to me.
- 01:55 AM Bug #11236: lock inversions in test/libcephfs flock test
- John Spray wrote:
> This appears to have resurfaced here:
>
> http://pulpito.front.sepia.ceph.com/teuthology-2015...
- 02:01 AM Bug #11522 (Resolved): FLUSHSNAP_ACKs are not always sent back on unexpected FLUSHSNAP
- by commit 585bc2bc6a7aeddf32e9c1d43d0aa01568e01d63
05/06/2015
- 09:07 PM Bug #11316 (Resolved): "ceph-fuse not mounted, got fs type 'ext2/ext3" looping in ceph-deploy-ham...
- The ceph-deploy suite was optimized and scheduled to run on hammer and next.
- 06:08 PM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- Hi, I restarted the MDS daemons many times, and it started normally.
- 03:25 PM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- Created wip-11541-hammer-workaround branch. Andrey: once it's built you should be able to find some packages on http:...
- 02:50 PM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- The contradiction here seems to be that our code wants any unlinked directories (i.e. in a stray directory) to have n...
- 02:36 PM Bug #11541: MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- This may be due to the following change:...
- 10:28 AM Bug #11541 (Resolved): MDS is crashed (mds/CDir.cc: 1391: FAILED assert(!is_complete()))
- After the update from 0.82 to 0.94.1, the MDS crashed:...
- 01:52 PM Bug #11540: Test failure in LibCephFS.BadArgument
- Merged the fix; let's see that it's fixed in the next nightly before resolving this.
- 10:39 AM Bug #11540: Test failure in LibCephFS.BadArgument
- https://github.com/ceph/ceph/pull/4572
- 10:18 AM Bug #11540 (Resolved): Test failure in LibCephFS.BadArgument
- This is the new test for the zero-length write case; it's failing in the nightly run:
http://pulpito.front.sepia.ceph...
- 10:10 AM Bug #11236: lock inversions in test/libcephfs flock test
- This appears to have resurfaced here:
http://pulpito.front.sepia.ceph.com/teuthology-2015-05-04_23:04:01-fs-master...
- 09:18 AM Bug #10624: hung samba tests after lsof failure
- * hammer backport https://github.com/ceph/ceph-qa-suite/pull/418
- 09:13 AM Bug #10712 (Resolved): TestFlush intermittent failure on scatter_writebehind event
- 08:02 AM Bug #11482 (Resolved): kclient: intermittent log warnings "client.XXXX isn't responding to mclien...
- 07:58 AM Bug #11510: ceph-fuse crashes
- File creation goes through the MDS; it does not require the client to have permission to write to the OSD.
- 03:31 AM Bug #11510: ceph-fuse crashes
- Actually, I'm wondering how file creation (or maybe copying files) works without sufficient permissions. Could an...
- 02:17 AM Bug #11510: ceph-fuse crashes
- It crashes because ObjectCacher does not check the error code of the OSD read request. The bug is already fixed in the master branc...
05/05/2015
- 04:48 PM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
- The mons appear to be rather unhappy in this test run, but there are no thrashers or messenger failure settings turne...
- 04:16 PM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
This looks like a mon bug of some kind. After a leader election, a bunch of routed messages are being resent to th...
- 03:16 PM Bug #11481: "mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition
- Weird! This is an MDS going from state active to state standby, then back again. In the final stage it notices that...
- 02:41 PM Bug #11462: kernel: crash (when MDS died?)
- I didn't do a character-by-character diff, but I checked most of the function names and they were the same, yes. :)
- 07:35 AM Bug #11462: kernel: crash (when MDS died?)
- Greg Farnum wrote:
> FYI this popped up again in teuthology-2015-04-30_23:04:01-fs-next-testing-basic-multi/870815
...
- 01:48 PM Bug #11510 (Need More Info): ceph-fuse crashes
- It still shouldn't have crashed and I'm not sure why that would have happened unless the kernel gave us ENOMEM or som...
- 07:13 AM Bug #11510: ceph-fuse crashes
It works when I use the user with caps ...
- 07:03 AM Bug #11510: ceph-fuse crashes
- this bug is fixed by commit 540346d4 (osdc: In _readx() only no error can tidy read result).
- 12:38 PM Bug #11482: kclient: intermittent log warnings "client.XXXX isn't responding to mclientcaps(revoke)"
- * hammer backport https://github.com/ceph/ceph/pull/4481
- 09:50 AM Bug #10712: TestFlush intermittent failure on scatter_writebehind event
- http://pulpito.ceph.com/loic-2015-05-04_11:27:13-fs-hammer-backports---basic-multi/874954
- 09:48 AM Bug #10712 (Pending Backport): TestFlush intermittent failure on scatter_writebehind event
- * hammer backport https://github.com/ceph/ceph-qa-suite/pull/429
05/04/2015
- 01:57 PM Bug #11462: kernel: crash (when MDS died?)
- FYI this popped up again in teuthology-2015-04-30_23:04:01-fs-next-testing-basic-multi/870815
- 10:09 AM Bug #11522: FLUSHSNAP_ACKs are not always sent back on unexpected FLUSHSNAP
- The client creates a cap_snap and sends a FLUSHSNAP message to the MDS even when the snap notification contains no new snap (snap rem...