Activity

From 10/05/2014 to 11/03/2014

11/03/2014

07:55 PM Feature #1398: qa: multiclient file io test
A first pass of this is in origin/wip-multiclientio-wusui Anonymous
12:10 PM Bug #9997 (Resolved): test_client_pin case is failing
http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-02_23:04:01-fs-next-testing-basic-multi/583588/
RuntimeErro...
Greg Farnum
12:05 PM Bug #9995 (Resolved): failing test_filelock
http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-02_23:04:01-fs-next-testing-basic-multi/583589/
It's gettin...
Greg Farnum
11:43 AM Bug #9994: ceph-qa-suite: nfs mount timeouts
http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-31_23:10:01-knfs-giant-testing-basic-multi/582459/
http://q...
Greg Farnum
11:34 AM Bug #9994 (Resolved): ceph-qa-suite: nfs mount timeouts
... Greg Farnum
11:27 AM Bug #9977: cephfs-journal-tool falsely reports invalid start_ptr
https://github.com/ceph/ceph/pull/2853 John Spray
11:27 AM Bug #9977 (Fix Under Review): cephfs-journal-tool falsely reports invalid start_ptr
PR up for next, probably also worth backporting to giant as without it journal-tool is pretty useless on filesystems ... John Spray

10/31/2014

05:10 PM Tasks #3680 (Rejected): deduplication in ceph
we should discuss this on the email list Sage Weil
10:48 AM Bug #9977 (Resolved): cephfs-journal-tool falsely reports invalid start_ptr

This is happening when the journal expire_pos isn't at an object boundary. The expected start_ptr counter is being...
John Spray
10:03 AM Feature #1398: qa: multiclient file io test
... Anonymous

10/30/2014

10:11 AM Feature #1398: qa: multiclient file io test
A task that implements this could be useful for testing calamari as well (I manually did some of the things needed he... Anonymous
10:08 AM Feature #1398 (In Progress): qa: multiclient file io test
Anonymous
09:37 AM Feature #9881 (In Progress): mds: admin command to flush the mds journal
John Spray

10/29/2014

09:34 PM Feature #9940: uclient: be more robust when dealing with outstanding RADOS IO and stale caps
While in the general case it is necessary to fence clients that have become unresponsive to the MDS, this type of "so... John Spray
09:23 PM Feature #9940 (New): uclient: be more robust when dealing with outstanding RADOS IO and stale caps
If we've given IO to the Objecter and our caps go stale, we need to do something to handle it. Greg Farnum
09:06 PM Bug #1666 (Resolved): hadoop: time-related meta-data problems
We now take client timestamps for almost everything, so this should no longer be a problem and I'm closing it unless ... Greg Farnum
11:04 AM Bug #9935: client: segfault on ceph_rmdir path "/"
Yes, EBUSY is what a local filesystem gives you, so that sounds right to me. John Spray
10:48 AM Bug #9935 (Resolved): client: segfault on ceph_rmdir path "/"
A segfault occurs when removing the root directory. What is the expected behavior? I think -EBUSY is what makes sense. Noah Watkins

10/28/2014

12:43 PM Bug #9900 (Duplicate): Failure in multiple_rsync (directories wrongly appear changed)
I imagine this is a dup of #9894? Greg Farnum
12:18 PM Bug #9800 (Pending Backport): client-limits test is not passing
I don't know that we need/want to try and push this in before release (although since it's all guarded inside of a br... Greg Farnum
05:29 AM Bug #9800 (Resolved): client-limits test is not passing
... John Spray
11:12 AM Bug #8255 (Fix Under Review): mds: directory with missing object cannot be removed
https://github.com/ceph/ceph/pull/2821 Zheng Yan

10/27/2014

06:17 PM Feature #4138 (Fix Under Review): MDS: forward scrub: add functionality to verify disk data is co...
This bit at least has been isolated and put into a PR:
https://github.com/ceph/ceph/pull/2814
Greg Farnum
04:23 PM Bug #9870 (Resolved): kernel: not handling cap_flush_ack messages properly
Zheng Yan
10:28 AM Bug #9904 (Resolved): Don't crash MDS on clients sending messages with bad seq
Currently in Server::handle_client_session, we do this:... John Spray
10:14 AM Feature #9903 (Resolved): Recover lost dirfrag via data pool

[While the MDS cluster is offline and journal has been flushed if necessary]
Given that a particular dirfrag obj...
John Spray
09:36 AM Bug #9900 (Duplicate): Failure in multiple_rsync (directories wrongly appear changed)

http://pulpito.ceph.com/teuthology-2014-10-24_23:08:01-kcephfs-giant-testing-basic-multi/570840/
http://pulpito.ce...
John Spray
06:05 AM Bug #9800: client-limits test is not passing
https://github.com/ceph/ceph/pull/2809
http://pulpito.front.sepia.ceph.com/john-2014-10-27_13:05:29-fs:recovery-wip-...
John Spray

10/24/2014

11:14 AM Bug #9884: too many files in /usr for multiple_rsync.sh
Yeah, just cutting it down to a more predictable/smaller directory sounds good to me. Greg Farnum
10:50 AM Bug #9884: too many files in /usr for multiple_rsync.sh
one failure http://pulpito.ceph.com/teuthology-2014-10-20_23:04:01-fs-giant-distro-basic-multi/562537/ Zheng Yan
10:49 AM Bug #9884 (Closed): too many files in /usr for multiple_rsync.sh
for example, plana81 has 60k files in /usr, but plana90 has 90k files in /usr. perhaps multiple_rsync should /usr/src... Zheng Yan
09:53 AM Feature #3882 (Rejected): Hide snapshot directory name in mount/mtab
we can now restrict snap access by uid... Sage Weil
09:49 AM Feature #9883 (Resolved): journal-tool: smarter scavenge (conditionally update dir objects)
Sage Weil
09:42 AM Feature #9881 (Resolved): mds: admin command to flush the mds journal
Sage Weil
09:41 AM Feature #9880 (Resolved): mds: more gracefully handle EIO on missing dir object
Sage Weil

10/23/2014

01:47 PM Bug #9869 (Pending Backport): Client: not handling cap_flush_ack messages properly
I tested this manually with a patch that sets the starting tid value to 65535 and looked at the logs. That causes im... Greg Farnum
12:47 PM Bug #9870: kernel: not handling cap_flush_ack messages properly
Zheng Yan

10/22/2014

05:34 PM Bug #9870 (Resolved): kernel: not handling cap_flush_ack messages properly
This is the analogue to #9869, which Zheng tells me is also a problem in the kernel. We need to downcast the message ... Greg Farnum
05:30 PM Bug #9869: Client: not handling cap_flush_ack messages properly
Waiting for this to build so it can be tested. Greg Farnum
05:28 PM Bug #9869 (Resolved): Client: not handling cap_flush_ack messages properly
We saw a log segment that contained this:... Greg Farnum

10/21/2014

03:22 PM Feature #9557 (Fix Under Review): mds: verify backtrace on fetch_dir
Zheng Yan
10:44 AM Feature #9557 (In Progress): mds: verify backtrace on fetch_dir
Greg Farnum
11:43 AM Bug #8809 (Can't reproduce): uclient: memory leak
maybe fixed by 2313ce1d024361fd7f4d2cbca789010f0fe0faad Zheng Yan
10:55 AM Bug #9674: nightly failed multiple_rsync.sh
commit:477073aba1da880dfd0b8c82f4792788579f28b9 in master and commit:44ce33c12443909b02c7ee451ad45400f55d53c9 in giant Greg Farnum

10/20/2014

01:23 PM Feature #414 (Resolved): ceph-fuse: implement file locking
Zheng Yan
01:22 PM Bug #8576: teuthology: nfs tests failing on umount
teuthology commit:4f2957c42d0f76a399cb26c660ede9243c095779 runs those commands as well as the previous ones. Greg Farnum
01:02 PM Bug #9679 (Closed): Ceph hadoop terasort job failure
Fixed in cephfs-hadoop repo. Noah Watkins
11:15 AM Bug #9800: client-limits test is not passing

Same failure:
http://pulpito.front.sepia.ceph.com/teuthology-2014-10-17_23:04:02-fs-giant-distro-basic-multi/555...
John Spray

10/19/2014

07:20 PM Bug #9341 (Pending Backport): MDS: very slow rejoin
Hmm, we didn't put this in Giant initially because we were trying not to perturb it. Master hasn't been run through t... Greg Farnum
06:45 PM Bug #9341 (Fix Under Review): MDS: very slow rejoin
Please include this fix in 0.87, which is affected just as badly as 0.80.x.
On 0.87 MDS stuck in "rejoin" for hours a...
Dmitry Smirnov

10/16/2014

01:54 PM Bug #9800 (Resolved): client-limits test is not passing
/a/teuthology-2014-10-13_23:04:01-fs-giant-distro-basic-multi/547170
The client isn't dropping its caps:...
Greg Farnum
10:50 AM Feature #4137: MDS: Implement a forward-scrubbing mechanism.
I realized today that we probably want to optionally scrub directories that were renamed into place following a scrub... Greg Farnum

10/15/2014

12:39 AM Bug #8576: teuthology: nfs tests failing on umount
I notice that if I execute 'service nfs stop' first, umounting cephfs always succeeds. 'service nfs stop' runs two c... Zheng Yan

10/14/2014

06:15 PM Bug #9674: nightly failed multiple_rsync.sh
rsync asks us to see previous errors ;) Yes, I think sudo should work. Zheng Yan
02:36 PM Bug #9674: nightly failed multiple_rsync.sh
Well, that would make sense. How did you find those in the log?
We should probably just run this as sudo or someth...
Greg Farnum
06:30 AM Bug #9674: nightly failed multiple_rsync.sh
... Zheng Yan
06:20 AM Feature #9755: Fence late clients during reconnect timeout
There can be certain cases where a client can reconnect after being evicted, e.g. if:
* the client didn't hold an...
John Spray

10/13/2014

04:50 PM Feature #414 (Fix Under Review): ceph-fuse: implement file locking
Zheng Yan
12:52 PM Feature #9755: Fence late clients during reconnect timeout
Hmm, I like the basic thrust of this, but I'm a little concerned as well — we have other tickets to let clients recon... Greg Farnum
03:39 AM Feature #9755 (Resolved): Fence late clients during reconnect timeout

During reconnect, MDSs terminate the sessions of any clients which fail to reconnect within the window. Because wh...
John Spray
03:16 AM Feature #9754 (Resolved): A 'fence and evict' client eviction command

Currently the "session evict" operation on the MDS admin socket will terminate the session, and release any capabil...
John Spray

10/10/2014

04:11 PM Bug #9679: Ceph hadoop terasort job failure
I do believe that Hadoop kills the clients after they reach a point that the run-time believes everything has been fl... Noah Watkins
02:02 PM Bug #9679: Ceph hadoop terasort job failure
Looking at the bad client (11139), the first thing I notice is that the messaging is way backed up. What's the networ... Greg Farnum
09:13 AM Bug #9679: Ceph hadoop terasort job failure
Here is the directory listing. All of the files should be the same size.... Noah Watkins
07:18 AM Bug #9692 (Resolved): ACL workunit syntax error
Zheng Yan

10/09/2014

12:07 PM Bug #9679: Ceph hadoop terasort job failure
empty fs:... Noah Watkins
08:21 AM Bug #9679: Ceph hadoop terasort job failure
Thanks, Huamin. Yeah, it looks like some writes are being lost, probably due to an unclean shutdown. I'll get some trac... Noah Watkins
08:06 AM Bug #9679: Ceph hadoop terasort job failure
For comparison, teragen files on CephFS
./hadoop/bin/hadoop fs -ls /in-dir-3
14/10/09 08:05:05 WARN util.NativeC...
Huamin Chen
07:04 AM Bug #9679: Ceph hadoop terasort job failure
Ran the same tests on HDFS 2.4.1, though on a different setup. Terasort finished without any problem.
Cmd:
./hado...
Huamin Chen

10/08/2014

11:08 PM Bug #9679: Ceph hadoop terasort job failure
missing one of these?... Noah Watkins
10:46 PM Bug #9679: Ceph hadoop terasort job failure
My bet at this point is on the generation of the input data set. Teragen creates a file with X 100-byte entries. When ... Noah Watkins
07:28 AM Feature #9437 (Resolved): make 'ceph tell mds.* ...' work, deprecate 'ceph mds tell * ...'
... John Spray

10/07/2014

07:28 PM Bug #9692 (Resolved): ACL workunit syntax error
http://pulpito.ceph.com/gregf-2014-10-06_19:59:42-kcephfs-wip-9628-testing-basic-multi/531900... Greg Farnum
07:26 PM Bug #9628 (Resolved): mds: race between ms_handle_accept() and ms_handle_reset()
Merged to master in commit:1b7fae7b2953649564a9e226b4abedad0ce652cc Greg Farnum
09:54 AM Bug #9679: Ceph hadoop terasort job failure
https://issues.apache.org/jira/browse/MAPREDUCE-2018 Noah Watkins
09:53 AM Bug #9679: Ceph hadoop terasort job failure
https://svn.apache.org/repos/asf/hadoop/common/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/terasor... Noah Watkins
09:39 AM Bug #9679: Ceph hadoop terasort job failure
Teragen command:
./hadoop/bin/hadoop jar ./hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar t...
Huamin Chen
09:22 AM Bug #9679: Ceph hadoop terasort job failure
Thanks for adding this. What command did you use to generate the input? Noah Watkins
09:04 AM Bug #9679 (Closed): Ceph hadoop terasort job failure
Hadoop version: 2.4.1
Ceph version:
ceph --version
ceph version 0.85-986-g031ef05 (031ef0551ebc98d824075558e884...
Huamin Chen
07:03 AM Bug #9636 (Duplicate): segfault in CInode::get_caps_allowed_for_client
Greg Farnum
07:02 AM Bug #9562 (Resolved): Lockdep assertion in Filer purge
Backported to giant:... John Spray
07:02 AM Bug #8576 (Need More Info): teuthology: nfs tests failing on umount
Greg Farnum

10/06/2014

06:27 PM Bug #9674: nightly failed multiple_rsync.sh
rsync return codes aren't standard error codes. The man page says that 23 means... Greg Farnum
05:59 PM Bug #9674: nightly failed multiple_rsync.sh
#define ENFILE 23 /* File table overflow */
maybe we should adjust ulimit
Zheng Yan
02:23 PM Bug #9674 (Resolved): nightly failed multiple_rsync.sh
http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-03_23:04:01-fs-giant-distro-basic-multi/527949/... Greg Farnum
 
