Activity

From 05/18/2010 to 06/16/2010

06/16/2010

11:56 AM Feature #206 (New): make a 'soft' mode
On Wed, 16 Jun 2010, Peter Niemayer wrote:
> Hi,
>
> trying to "umount" a formerly mounted ceph filesystem that h...
Sage Weil
09:57 AM Bug #204 (Resolved): crush update crash
> mkcephfs -c /etc/ceph/ceph.conf --mkbtrfs (1mds, 1mon, 2osd (2 physical disks$
> start ceph
> mount ceph fs
> wr...
Sage Weil

06/14/2010

04:07 PM Bug #200: umount hangs with clustered mds
Seems to be waiting forever here:
wait_event(mdsc->cap_flushing_wq, check_cap_flush(mdsc, want_flush));
Yehuda Sadeh
03:21 PM Bug #200 (Resolved): umount hangs with clustered mds
Happens on both the current master and the unstable branches (6/14). The umount in the following scenario never exits:
...
Yehuda Sadeh

06/13/2010

10:45 AM Bug #194: MOSDMap memory leak?
fixed by commit:ae32be31341a5fecfa16c5b3eb78095207182cce Sage Weil
10:43 AM Bug #194 (Resolved): MOSDMap memory leak?
Sage Weil

06/10/2010

10:13 PM Bug #193: protocol error after control-c
Yehuda Sadeh wrote:
> This was on the rbd branch, does it also happen on the unstable branch? The wait_for_completio...
Sage Weil
05:18 PM Bug #193: protocol error after control-c
This was on the rbd branch, does it also happen on the unstable branch? The wait_for_completion_killable() might have... Yehuda Sadeh
04:42 PM Bug #193 (Resolved): protocol error after control-c
Saw this on wido's machine:... Sage Weil
10:11 PM Bug #194 (Resolved): MOSDMap memory leak?
Code audit looks ok. Could this be a false alarm somehow?... Sage Weil
12:28 PM Feature #190: krbd: DISCARD support
Yeah, it's called 'discard'. In order to get a block device to support it we need to do something like:
queue_flag_...
Yehuda Sadeh

06/09/2010

02:40 PM Feature #190 (Resolved): krbd: DISCARD support
TRIM does exist, somewhere in Linux. RBD should support it so if the client system is using a supporting filesystem, ... Greg Farnum

06/08/2010

10:59 PM Bug #189 (Resolved): leaked dentry
Running unstable, commit:e041c5f
I think this is triggered by bonnie.sh. The bonnie.sh below is the dir where the...
Sage Weil

06/07/2010

12:04 PM Bug #182 (Resolved): VFS: Busy inodes after unmount of ceph.
This was actually an mds bug. It wasn't responding to a client_caps flushsnap. Fixed in ceph.git commit:4ecd8facd91... Sage Weil
09:43 AM Bug #182 (Resolved): VFS: Busy inodes after unmount of ceph.
On Sun, 06 Jun 2010 21:10:28 -0700, Sage Weil wrote:
> On Sat, 5 Jun 2010, Thomas Mueller wrote:
>> hi
>>
>> ...
Sage Weil

06/03/2010

01:54 PM Bug #111 (Resolved): handle EAGAIN from osd
Looks to me like this can't actually happen. The function ReplicatedPG::find_object_context can return EAGAIN, and an... Greg Farnum

06/02/2010

11:09 PM Bug #111: handle EAGAIN from osd
I agree. Though we should differentiate between two cases. One is that we initiate the EAGAIN (e.g., when reached a l... Yehuda Sadeh
11:00 PM Bug #111: handle EAGAIN from osd
Yehuda Sadeh wrote:
> We should make the client handle it, but we should also try to make sure that the osd doesn't ...
Sage Weil
10:51 PM Bug #111: handle EAGAIN from osd
We should make the client handle it, but we should also try to make sure that the osd doesn't ever return it (at leas... Yehuda Sadeh
10:40 PM Bug #38 (Resolved): rm -r failure
I'm going to chalk this one up to commit:13a4214cd9ec14d7b77e98bd3ee51f60f868a6e5 (the d_subdirs ordering problem) an... Sage Weil
10:37 PM Bug #69: ceph: ffff88001976ba50 auth cap (null) not mds0 ???
For a multi-mds system, this can be caused if we are between an export and import on a cap.
But when I saw this th...
Sage Weil

06/01/2010

03:08 PM Cleanup #168 (Closed): new truncate sequence
The new truncate sequence was merged for 2.6.35-rc1. (->truncate is deprecated?)
We need to see what updates (i...
Sage Weil
12:48 PM Bug #166 (Can't reproduce): Failing some pjd tests?
Best guess is an unsynchronized client/server clock. Greg Farnum
11:55 AM Feature #42: Resize of rbd image
There is a refresh /sys/class/.. interface, however, resizing of an image should be lock protected, and probably shou... Yehuda Sadeh
10:28 AM Bug #164 (Resolved): memory leak in statfs
Fixed.
commit: 5d97634a3b824ed746ba0d5441bf3d1d65f490a0
Yehuda Sadeh

05/31/2010

03:41 AM Bug #166 (Can't reproduce): Failing some pjd tests?
Failed Test Stat Wstat Total Fail Failed List of Failed
----------------------------------------...
Greg Farnum

05/29/2010

10:07 PM Bug #144: GPF at con_close_socket+0x40/0x9f
Yeah, I think this is related to #163, but I still don't know how that would cause this problem. The basic issue is ... Sage Weil
09:58 PM Bug #163 (Resolved): put_osd on umount can use client after free
fixed by commit:a922d38fd10d55d5033f10df15baf966e8f5b18c Sage Weil
04:40 PM Bug #163: put_osd on umount can use client after free
That would explain bug #144:
[12836.065773] Last user: [<ffffffffa01106b9>](put_osd+0x3f/0x82 [ceph])
Yehuda Sadeh
09:25 AM Bug #163 (Resolved): put_osd on umount can use client after free
the connection can be put after ceph_client is freed, at which point this will dereference a bad pointer... Sage Weil
09:57 PM Bug #164 (Resolved): memory leak in statfs
workload dbench
master branch...
Sage Weil

05/28/2010

01:44 PM Bug #162 (Can't reproduce): list bug during shrink_dcache_for_umount
ceph3, rsync workload.
unstable circa 5/25...
Sage Weil
01:12 PM Bug #141 (Resolved): ERESTARTSYS on mds update operations cause bad results
Sage Weil
10:49 AM Bug #141: ERESTARTSYS on mds update operations cause bad results
I assume that switching to wait_for_completion_killable() fixed this one?
related commit: 0ec773c7f9ecbff4b75c3c68...
Yehuda Sadeh
12:07 PM Bug #148 (Resolved): iozone failure
yeah, this has survived 24 hours, whereas before it was failing after an hour or two. Sage Weil
12:00 PM Bug #144: GPF at con_close_socket+0x40/0x9f
What was the specific scenario? Can it be reproduced? Yehuda Sadeh
11:17 AM Bug #150: order:1 page allocation failure
Too many dirty pages? Too many pending osd requests?
We should probably try to get how many osd requests were in-fl...
Yehuda Sadeh
11:07 AM Bug #147: lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and con->mutex
nfs uses the rpc code, which, if I understand it correctly, initializes a work queue for socket allocation and connect... Yehuda Sadeh

05/27/2010

09:26 PM Bug #148: iozone failure
I think this may have been caused by the mds request signal handling? It isn't happening on the latest unstable. Sage Weil
09:23 PM Bug #147: lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and con->mutex
We could have a pool of preallocated sockets.. but that could be exhausted.
Or duplicate a bunch of socket creation ...
Sage Weil
02:57 PM Bug #157 (Resolved): fix auth_x memory leak
fixed by 'ceph: fix leak of osd authorizer'. the osd_client put_osd() didn't clean up the ceph_authorizer. Sage Weil
01:14 PM Bug #157 (Resolved): fix auth_x memory leak
this is on ceph1, qa loopall.sh workload, unstable branch.... Sage Weil

05/25/2010

03:42 PM Bug #143 (Resolved): avoid resending requests on mon ticket renewal
fixed by 'ceph: do not resend mon requests on auth ticket renewal' and 'ceph: renew auth tickets before they expire' Sage Weil
02:37 PM Bug #147: lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and con->mutex
What it actually means is that sock_alloc_inode is being called under the kswapd context and it does an allocation wi... Yehuda Sadeh
10:34 AM Bug #147 (Resolved): lockdep: possible irq lock inversion dependency w/ osdc->request_mutex and c...
... Sage Weil
02:01 PM Bug #150 (Can't reproduce): order:1 page allocation failure
workload was rsync to a ceph mount.
ceph3 mounting cosd0:/
not sure which version. probably unstable from last wee...
Sage Weil
10:35 AM Bug #148 (Resolved): iozone failure
on ceph4, running
* rbd 3a6e756 ceph-rbd: snapshots support...
Sage Weil
10:32 AM Bug #106 (Resolved): msgpool depletion?
Sage Weil
10:28 AM Bug #106: msgpool depletion?
On what version did it happen? Do we have any reproducible scenario? Yehuda Sadeh

05/21/2010

01:39 PM Bug #141: ERESTARTSYS on mds update operations cause bad results
It seems pretty important to me that users be able to abort MDS requests -- if for some reason part of the filesystem... Greg Farnum
08:53 AM Bug #141 (Resolved): ERESTARTSYS on mds update operations cause bad results
- process does a create
- gets signal and returns ERESTARTSYS before reply comes back
- kernel retries the operatio...
Sage Weil
12:56 PM Cleanup #142 (Resolved): reuse message for mon subscribe
Sage Weil
12:20 PM Cleanup #142 (Resolved): reuse message for mon subscribe
no need to allocate a fresh message each time around Sage Weil
12:50 PM Bug #144 (Can't reproduce): GPF at con_close_socket+0x40/0x9f
... Sage Weil
12:31 PM Bug #143 (Resolved): avoid resending requests on mon ticket renewal
Sage Weil
12:22 PM Bug #66 (Resolved): BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
Sage Weil
12:22 PM Bug #66 (In Progress): BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!
Sage Weil
08:58 AM Bug #139: BUG ceph_dentry_info: Objects remaining on kmem_cache_close()
Looks like this isn't fixed after all (see #63). Maybe a dentry is allocated but never added to the dcache? Sage Weil
02:08 AM Bug #139 (Resolved): BUG ceph_dentry_info: Objects remaining on kmem_cache_close()
After unmounting my Ceph filesystem and removing my kernel module I got the following message:... Wido den Hollander

05/18/2010

11:49 AM Feature #23: fcntl/flock advisory lock support
Ahah, file_lock's fl_nspid pointer isn't filled in before calling the filesystem's lock handlers. I've fixed that so ... Greg Farnum
10:03 AM Feature #23 (In Progress): fcntl/flock advisory lock support
Found some issues with recovery after all; working on them now. Greg Farnum
08:37 AM Feature #19 (Resolved): rbd
Sage Weil
 
