Activity

From 03/27/2013 to 04/25/2013

04/25/2013

06:34 PM Bug #4742: mds: stuck clientreplay request
Yeah, we've discussed this some on github around wip-4742 and on irc. :) Greg Farnum
06:31 PM Bug #4742: mds: stuck clientreplay request
Looks like a client bug, it may add cap releases to the replay requests. (encode_cap_releases() should be called when... Zheng Yan
10:38 AM Bug #4742: mds: stuck clientreplay request
Logs for two runs, one is stuck in replay from a setattr, the other is stuck in replay from a rename.
Sam Lang

04/23/2013

03:53 PM Feature #4799 (Resolved): Client Security for CephFS
As discussed on the #ceph IRC channel with gregaf and others, I would find some added level of client security in Cep... Mike Kelly
01:34 PM Bug #4721 (Resolved): libcephfs tests fail when using ceph-deploy
strange that it works fine on the latest next branch [0.60-624-g426e3be-1precise] ... Tamilarasi muthamizhan
10:29 AM Bug #4742: mds: stuck clientreplay request
Attaching mds log from mds stuck on clientreplay. Looks like setattr gets put on the inode waiting list by the lo... Sam Lang

04/21/2013

06:12 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
Additional: I resolve it runtime, changing assert(0) to some lock (IMHO first in this case) on one node and found for... Denis kaganovich

04/19/2013

10:17 AM Bug #4105: mds: fix up the Dumper
This has annoyed me a couple more times and I think it's now at the top of the queue, so here we go again. Greg Farnum
10:08 AM Bug #4746: client: invalidate callback can deadlock
pushed wip-fuse to ceph-client.git Sage Weil
09:42 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
You mean file_eval should just short-circuit if it's scanning? That seems like the most sensible place for it, but I'... Greg Farnum
09:31 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
yeah, that transition doesn't make sense. i think it should do nothing in the scan state.. Sage Weil
09:05 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
file_eval is trying to move ifile from "scan" to "mixed" in order to serve up the client caps, and scatter_mix doesn'... Greg Farnum
02:13 AM Bug #4601: symlink with size zero
I was looking at the <inode>.<frag>_head* file in the osd that held the directory where the link was stored. As it t... Alexandre Oliva

04/18/2013

05:22 PM Bug #4753 (Resolved): mds/Locker.cc: 4167: FAILED assert(0)
Every mds crashed after some startup checks: "mds/Locker.cc: 4167: FAILED assert(0)":
mds/Locker.cc: 4167: FAILED ...
Denis kaganovich
05:12 PM Bug #4746: client: invalidate callback can deadlock
The suggestion from Maxim is to modify fuse to serialize reads and invalidate via a mutex. That ought to do the tric... Sage Weil
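A minimal sketch of the "serialize reads and invalidate via a mutex" idea, assuming a hypothetical `CachedFile` class that stands in for the cached inode data (it is not a FUSE or Ceph type, and the real fix would live inside fuse itself):

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for per-inode cached data; not a FUSE or Ceph type.
class CachedFile {
public:
    explicit CachedFile(std::string data) : data_(std::move(data)) {}

    // Readers take the same mutex as invalidate(), so the two never overlap.
    std::string read() {
        std::lock_guard<std::mutex> guard(io_mutex_);
        return data_;
    }

    // Drops the cached contents; blocks until any in-flight read() returns.
    void invalidate() {
        std::lock_guard<std::mutex> guard(io_mutex_);
        data_.clear();
    }

private:
    std::mutex io_mutex_;
    std::string data_;
};
```

With a single mutex covering both paths, an invalidate can only run between reads, which is the ordering the suggestion relies on. Whether this avoids the VFS inode mutex interaction described in this bug is not something the sketch demonstrates.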
09:37 AM Bug #4746: client: invalidate callback can deadlock
It's not any of our internal locks that are getting stuck; it's the VFS inode mutexes in combination with us. If I ... Greg Farnum
07:31 AM Bug #4746: client: invalidate callback can deadlock
The invalidate is queued in a separate thread, and when we call the invalidate, we don't have the client lock held. ... Sam Lang
05:06 PM Bug #4601: symlink with size zero
>I looked a bit in the ceph-osd file holding the directory that contains the symlink, and I can see ^Q in the yes_hea... Greg Farnum
04:57 PM Bug #1945 (Can't reproduce): blogbench hang on caps
We haven't seen this in a long time (at least, that's marked here), and there's been a ton of work here over the last... Greg Farnum
04:39 PM Bug #4732: uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
This was in the async invalidate thread, so I'm turning this down. It should probably be investigated alongside/after... Greg Farnum
04:34 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Okay, pushed the update for more debugging, and am downgrading this to "High" since it only appears under so many fai... Greg Farnum
04:17 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Also, both of these are the same job as the first incident was: fsstress workunit on ceph-fuse, messenger failure inj... Greg Farnum
04:15 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Those machines are cleared out again, of course (d'oh!). Next time we see this we need to gather up everything we can... Greg Farnum
04:03 PM Bug #4741: MDS: stuck in clientreplay
Interesting; on #4742 it was clearly waiting on a request because it kept saying "still have 1 active replay requests... Greg Farnum
03:57 PM Bug #4741 (Duplicate): MDS: stuck in clientreplay
This is a duplicate of #4742. It looks like setattr is the culprit. I was able to generate a core file of the mds w... Sam Lang
11:13 AM Bug #4741: MDS: stuck in clientreplay
Also /a/teuthology-2013-04-18_01:01:07-fs-next-testing-basic/15101 Greg Farnum
03:58 PM Bug #4721 (Need More Info): libcephfs tests fail when using ceph-deploy
(Trying to track the responsibility flow more clearly.) Greg Farnum
03:19 PM Bug #4721: libcephfs tests fail when using ceph-deploy
Have you reproduced this, Tamil? Since all the tests are failing I'm pretty sure this is some kind of authentication ... Greg Farnum
03:57 PM Bug #4742 (In Progress): mds: stuck clientreplay request
Sam Lang
03:57 PM Bug #4742: mds: stuck clientreplay request
Marked #4741 as a duplicate of this bug. It looks like setattr is the culprit. I was able to generate a core file o... Sam Lang
01:57 PM Bug #4722: kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
I did a checkout of v3.5, and caps.c:1006 is... Greg Farnum
01:37 PM Bug #4738: libceph: unlink vs. readdir (and other dir orders)
I don't believe locking is implemented yet via the Samba VFS bindings, since we don't have a userspace implementation... Greg Farnum
01:27 PM Bug #4738: libceph: unlink vs. readdir (and other dir orders)
On top only:
vfs objects = scannedonly ceph
And if I switch to:
vfs objects = scannedonly
or:
vfs objects = c...
Denis kaganovich
11:03 AM Bug #3637 (Resolved): client: not issuing caps for with clients doing shared writes
Merged into next in commit:efbe2e8b55ba735673a3fdb925a6304915f333d8 Greg Farnum

04/17/2013

07:42 PM Bug #4713 (Resolved): mds: hang related to access from two clients
The following have been committed to the "testing" branch
of the ceph-client git repository. With them in place
I ...
Alex Elder
07:39 PM Bug #4706 (Resolved): kclient: Oops when two clients concurrently write a file
The following have been committed to the ceph-client
"testing" branch:
8f68229 libceph: change how "safe" callbac...
Alex Elder
07:38 PM Bug #4679 (Resolved): ceph: hang while running blogbench on mira nodes
Sorry Greg, I should have been in better communication
with you. I have been testing these all afternoon and
Sage ...
Alex Elder
03:48 PM Bug #4679: ceph: hang while running blogbench on mira nodes
I believe Sage has been over all these now. I'm trying to go over the newest versions off the mailing list as well, n... Greg Farnum
07:20 PM Bug #4726 (Can't reproduce): mds: segv during blogbench in remove_pending_backtraces
I wasn't able to reproduce this after more than 200 runs, so I'm marking it as Can't reproduce for now. Sam Lang
05:37 PM Bug #3597 (Resolved): ceph-fuse: denying root access
Oh, this was a bug that got fixed in commit:d87035c0c4ff, included in v0.60. Greg Farnum
05:05 PM Bug #4746: client: invalidate callback can deadlock
Hmm, you're right, this is a more fundamental problem. Sage Weil
04:50 PM Bug #4746: client: invalidate callback can deadlock
Maybe; we didn't think this through much beyond going "yep, that's broken".
However, I think we can queue up the i...
Greg Farnum
04:44 PM Bug #4746: client: invalidate callback can deadlock
"We may need to introduce a second locking layer to deal with this, that covers draining out all VFS requests before ... Sam Lang
03:04 PM Bug #4746 (Resolved): client: invalidate callback can deadlock
I saw this when testing the fix for #3637. We appear to be (correctly) safe against deadlocks on our own locks, but w... Greg Farnum
04:12 PM Feature #4326: qa: add samba + (kclient|ceph-fuse) to suite
I think you might have mentioned you were trying to do this while you were working on the samba vfs-based ones? If no... Greg Farnum
04:09 PM Bug #1878 (Resolved): ceph.ko doesn't setattr (lchown, utimes) on symlinks
I've pushed this to our testing branch. It's presently commit:baf0169b77f6a0c384a15fb425e5700fb0239e89, although that... Greg Farnum
03:59 PM Bug #3637: client: not issuing caps for with clients doing shared writes
And he gave me a reviewed-by tag. Will merge this tomorrow morning after some more testing. Greg Farnum
03:53 PM Bug #3637: client: not issuing caps for with clients doing shared writes
This now appears to be passing (I've got it continuing to loop in the background), but it needs review and merging. S... Greg Farnum
03:05 PM Bug #3637: client: not issuing caps for with clients doing shared writes
That latest issue was #4746. Turning off the callback and testing again... Greg Farnum
05:42 AM Bug #3637: client: not issuing caps for with clients doing shared writes
Zheng Yan wrote:
> there are only 4 states that allow Fw caps, they are MIX, MIX_EXCL, EXCL and EXCL_MIX. they all a...
Zheng Yan
05:39 AM Bug #3637: client: not issuing caps for with clients doing shared writes
Greg Farnum wrote:
> I don't remember how all the locking works when you have multiple writers, but I don't believe ...
Zheng Yan
10:17 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
And also /a/teuthology-2013-04-16_01:00:52-fs-next-testing-basic/13665 Greg Farnum
09:26 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
This just happened again at /a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14248 (it's still running, for ... Greg Farnum
10:12 AM Bug #4742: mds: stuck clientreplay request
Looks like a setattr and a create:
ubuntu@plana72:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-client.0.19374.as...
Sam Lang
09:36 AM Bug #4742 (Resolved): mds: stuck clientreplay request
/a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14246
It has a single request which isn't completing; wh...
Greg Farnum
10:06 AM Cleanup #4744 (In Progress): mds: pass around LogSegments via std::shared_ptr
These really ought to be ref-counted in some way to prevent early expiry. Greg Farnum
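A rough illustration of how ref-counting prevents early expiry, assuming simplified, hypothetical `LogSegment`, `Request`, and `Log` types (the real MDS classes carry far more state, and these names and members are only for the example):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>

// Hypothetical, stripped-down segment; the real LogSegment has many more fields.
struct LogSegment {
    uint64_t offset;
    explicit LogSegment(uint64_t o) : offset(o) {}
};

// A request keeps a shared reference to the segment it was journaled in.
struct Request {
    std::shared_ptr<LogSegment> ls;
};

class Log {
public:
    // Creates a segment and hands out a shared reference to it.
    std::shared_ptr<LogSegment> start_segment(uint64_t offset) {
        segments_.push_back(std::make_shared<LogSegment>(offset));
        return segments_.back();
    }

    // Drops the log's own reference; the segment is freed only when
    // no request holds it any longer.
    void expire_oldest() {
        if (!segments_.empty()) segments_.pop_front();
    }

    std::size_t size() const { return segments_.size(); }

private:
    std::deque<std::shared_ptr<LogSegment>> segments_;
};
```

Here a `Request` created from `start_segment()` can still safely read `ls->offset` after `expire_oldest()` runs, because the segment stays alive until the last holder lets go.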
09:34 AM Bug #4741 (Duplicate): MDS: stuck in clientreplay
/a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14249
I can't find any hints, except that it is in fact ...
Greg Farnum
09:00 AM Feature #3243 (In Progress): qa: test samba reexport via libcephfs vfs plugin in teuthology
Sam Lang
08:58 AM Feature #3242 (Resolved): samba: push plugin upstream
Posted patches to mailing list:
https://lists.samba.org/archive/samba-technical/2013-April/091651.html
Sam Lang
08:01 AM Bug #4738 (Need More Info): libceph: unlink vs. readdir (and other dir orders)
Denis,
I've seen similar behavior with the smbtorture dir1 test, but it happens without the vfs_ceph module. Does...
Sam Lang
04:54 AM Bug #4738 (Closed): libceph: unlink vs. readdir (and other dir orders)
Stacking vfs_scannedonly with vfs_ceph in Samba, I experienced some bugs; looks like libceph readdir prob... Denis kaganovich

04/16/2013

06:41 PM Bug #3637: client: not issuing caps for with clients doing shared writes
Greg Farnum wrote:
> I don't remember how all the locking works when you have multiple writers, but I don't believe ...
Zheng Yan
03:43 PM Bug #3637: client: not issuing caps for with clients doing shared writes
Okay, it's not quite that simple. This (all following the data writeout; I think this is the data check — anyway, thi... Greg Farnum
02:58 PM Bug #3637: client: not issuing caps for with clients doing shared writes
Reproduced at last. There continues to be a problem with the fix branch too :( but it's not a max_size issue; one of ... Greg Farnum
01:47 PM Bug #3637: client: not issuing caps for with clients doing shared writes
And that wasn't working because teuthology was creating working dirs like /tmp/cephtest/gregf@kai-2013-04-16_12-59-21... Greg Farnum
10:48 AM Bug #3637 (Fix Under Review): client: not issuing caps for with clients doing shared writes
Regarding the testing (which I'm doing now), what those warnings turned out to mean is that each instance had their o... Greg Farnum
10:37 AM Bug #3637: client: not issuing caps for with clients doing shared writes
I don't remember how all the locking works when you have multiple writers, but I don't believe either of those suppos... Greg Farnum
01:11 PM Feature #4734: libcephfs: async interfaces
If/when we do this, whoever does so should please be careful to refactor our synchronous interfaces in terms of the a... Greg Farnum
12:48 PM Feature #4734 (New): libcephfs: async interfaces

Implement async interfaces to libcephfs, at the least for the write and read calls.
This is motivated by the cep...
Sam Lang
12:53 PM Bug #4732: uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
You might want to grab the ceph-fuse binary too so that the core dump is useful. Sam Lang
12:37 PM Bug #4732 (Closed): uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
... Greg Farnum
09:59 AM Bug #4729 (Can't reproduce): mds: stuck in clientreplay
Unfortunately by the time I got in one of the machines had been allocated for another job, and now it looks like the ... Greg Farnum
07:52 AM Bug #4729 (Can't reproduce): mds: stuck in clientreplay
job was... Sage Weil
09:31 AM Bug #4694 (Resolved): client: put_snap_realm assert failure
Looks good to me; I merged it into next. This was an impressively narrow race so we couldn't get a good reproducer go... Greg Farnum

04/15/2013

04:38 PM Documentation #4727 (Resolved): upgrade doc has to be modified to include upgrading ceph-mds as well
Changed package to ceph-mds: http://ceph.com/docs/master/install/upgrading-ceph/#upgrading-a-metadata-server John Wilkins
04:26 PM Documentation #4727 (In Progress): upgrade doc has to be modified to include upgrading ceph-mds a...
John Wilkins
11:42 AM Documentation #4727 (Resolved): upgrade doc has to be modified to include upgrading ceph-mds as well
http://ceph.com/docs/master/install/upgrading-ceph/
In the above mentioned doc, in section "upgrading a metadata s...
Tamilarasi muthamizhan
12:47 PM Bug #4713 (Fix Under Review): mds: hang related to access from two clients
I have tested the commands listed above on a system with the
patches described here:
http://tracker.ceph.com/is...
Alex Elder
11:03 AM Bug #4679: ceph: hang while running blogbench on mira nodes
I ran the blogbench test with all of the above-mentioned
patches applied on a mira cluster and I never saw it hang.
...
Alex Elder
09:35 AM Bug #4679: ceph: hang while running blogbench on mira nodes
FYI, these kernel patches (Zheng's and mine) are available on
the ceph-client git repository branch "review/wip-4706...
Alex Elder
09:27 AM Bug #4679 (Fix Under Review): ceph: hang while running blogbench on mira nodes
> Found 5 bugs, fixed 4.
I reviewed the four kernel patches (they were posted on the mailing
list). I also provi...
Alex Elder
09:15 AM Bug #4679: ceph: hang while running blogbench on mira nodes
> The fix for writepages race is easier than I thought, patch is attached.
This is interesting. When I was workin...
Alex Elder
10:59 AM Bug #4660: mds: segfault in queue_backtrace_update
*blink*
Of course it's not; sorry about that.
Greg Farnum
10:57 AM Bug #4660 (Resolved): mds: segfault in queue_backtrace_update
That isn't the same bug. Opening #4726 for that issue. Sam Lang
10:52 AM Bug #4660 (In Progress): mds: segfault in queue_backtrace_update
ubuntu@teuthology:/a/teuthology-2013-04-13_01:00:48-fs-next-testing-basic/12134 Greg Farnum
10:57 AM Bug #4726 (Can't reproduce): mds: segv during blogbench in remove_pending_backtraces

ubuntu@teuthology:/a/teuthology-2013-04-13_01:00:48-fs-next-testing-basic/12134
2013-04-13T18:52:50.199 INFO:t...
Sam Lang
09:33 AM Bug #4706 (Fix Under Review): kclient: Oops when two clients concurrently write a file
I have posted two patches, one which resolves the
crash due to an interrupt while waiting and one
that resolves Zhe...
Alex Elder
08:46 AM Bug #3579: kclient: Use less secure random number generator so we don't consume entropy
commit 442318d09506d33e811d9d6a7bd2514287df729d
Ian Colle

04/13/2013

09:46 AM Bug #4722 (Can't reproduce): kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
Top of Call trace:... Matthew Roy

04/12/2013

11:07 PM Bug #4721: libcephfs tests fail when using ceph-deploy
I'm able to reproduce this failure.
I'm much less familiar with libceph than I am the libcephfs-java code, so I'm g...
Anonymous
05:42 PM Bug #4721: libcephfs tests fail when using ceph-deploy
and the logs are placed in burnupi06.front.sepia.ceph.com:/home/ubuntu/apr12_cdep_libcephfs/ Tamilarasi muthamizhan
05:41 PM Bug #4721 (Resolved): libcephfs tests fail when using ceph-deploy
ceph version : 0.60-467-g6b98162-1precise
config.yaml used to reproduce
tamil@ubuntu:~/test_logs_cuttlefish/apr...
Tamilarasi muthamizhan
08:36 PM Bug #3637: client: not issuing caps for with clients doing shared writes
If Locker::_do_cap_update can't get wrlock for a given client, the client should have no Fw cap. I think we can make ... Zheng Yan
04:47 PM Bug #3637: client: not issuing caps for with clients doing shared writes
I'm having difficulty reproducing this at all on current next, but am leaving it churning in the background... :/
...
Greg Farnum
01:36 PM Feature #3242 (In Progress): samba: push plugin upstream
Sam has been working on this for the last couple days. Greg Farnum
11:06 AM Bug #3579 (Resolved): kclient: Use less secure random number generator so we don't consume entropy
Sam Lang
10:13 AM Bug #4660 (Resolved): mds: segfault in queue_backtrace_update
The commit that hit this segv above looks like it was off of master, whereas the fix went into next. I was able to r... Sam Lang
09:30 AM Bug #4694 (Fix Under Review): client: put_snap_realm assert failure
Pushed wip-4694. Still trying to reproduce this reliably so that I can test the proposed fix. Sam Lang
09:26 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Zheng Yan wrote:
> The Oops is caused by uninitialized req->r_inode
Already tracked down the Oops. time to sleep,...
Zheng Yan
09:07 AM Bug #4706: kclient: Oops when two clients concurrently write a file
FYI I just reproduced the problem without interrupt
and it matches what I saw before. (So I don't believe
the inte...
Alex Elder
07:39 AM Bug #4706: kclient: Oops when two clients concurrently write a file
I also proposed a fix: [PATCH 1/4] ceph: add osd request to inode unsafe list in advance Zheng Yan
07:22 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Zheng I think I have a fix. I'm going to test it first,
but then I'd like to supply it to you to see if it resolves...
Alex Elder
05:23 AM Bug #4706 (New): kclient: Oops when two clients concurrently write a file
> Found a potential cause. the request may complete before adding it
> to the unsafe list.
I think that not being...
Alex Elder
12:09 AM Bug #4706: kclient: Oops when two clients concurrently write a file
The Oops is caused by uninitialized req->r_inode Zheng Yan
07:35 AM Bug #4679: ceph: hang while running blogbench on mira nodes
The fix for writepages race is easier than I thought, patch is attached. Zheng Yan
01:08 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Found 5 bugs, fixed 4. The remaining one is a race between truncate and writepages. Truncate message from MDS can cha... Zheng Yan

04/11/2013

08:26 PM Bug #4714 (Duplicate): kclient: ceph_sync_{read,write} only accept single buffer.
So readv and writev are broken for SYNC IO Zheng Yan
07:28 PM Bug #4713: mds: hang related to access from two clients
I discovered this while trying to reproduce the issue
in http://tracker.ceph.com/issues/4706.
I documented it the...
Alex Elder
07:24 PM Bug #4713 (Resolved): mds: hang related to access from two clients
Alex Elder
06:31 PM Bug #4706: kclient: Oops when two clients concurrently write a file
This crash looks a little bit familiar to me, and I think
I created a bug for it, but at the moment I can't find it....
Alex Elder
05:52 PM Bug #4706: kclient: Oops when two clients concurrently write a file
OK, well I believe I have reproduced the problem.
I did this on two nodes simultaneously:
dd if=/dev/zero of=...
Alex Elder
09:23 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Yes, test branch of ceph-client. The hint to trigger the Oops is that multiple clients write data to a file at the same ti... Zheng Yan
08:52 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Well, I unfortunately got the same problem using
the "bobtail" branch.
Specifically what I'm doing:...
Alex Elder
08:15 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Well that's interesting.
I haven't been working with the ceph file system much so
I'm not sure what to expect. B...
Alex Elder
07:43 AM Bug #4706: kclient: Oops when two clients concurrently write a file
> the request may complete before adding it to the unsafe list.
That looks like a reasonable explanation to me. A...
Alex Elder
06:28 AM Bug #4706: kclient: Oops when two clients concurrently write a file
... Zheng Yan
05:56 AM Bug #4706: kclient: Oops when two clients concurrently write a file
It is a new issue in the sync write path, nothing to do with cap revoke. Alex has made quite a lot of changes in that... Zheng Yan
05:01 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Them doing a sync write is probably correct as their concurrency is being managed by the MDS now, and they aren't goi... Greg Farnum
06:06 PM Bug #3637 (In Progress): client: not issuing caps for with clients doing shared writes
Since I apparently forgot to mention it here, this has nothing to do with #4489; I just pattern-matched a little too ... Greg Farnum
09:09 AM Bug #4644 (Resolved): mds crashing after upgrade from 0.58 to 0.60
Merged into next as of commit:d777b8e66b2e950266e52589c129b00f77b8afc0 (Thanks Sam!). Greg Farnum
02:25 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
so patch tested, mds is running fine now. thx ! norbert schmidt
02:18 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
The last patch seems to work. At least the mds doesn't crash anymore. Also, df reports non-bogus values.
I'll add this patch to gen...
Alexey Shvetsov
12:14 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
let me know if i can test patches for you ! :) norbert schmidt
09:06 AM Bug #4451 (Resolved): client: Ceph client not releasing cap
Merged into next via commit:e32849c4eef2f5d911288aabeac0a6967b1e6ae4
I'm electing not to backport this despite its...
Greg Farnum
08:16 AM Fix #4708 (Rejected): MDS: journaler pre-zeroing is dangerous
See http://pastebin.com/NJd0UCfF
At first glance it looks like there's a short and a missing log object, and then ...
Greg Farnum
08:15 AM Bug #4105: mds: fix up the Dumper
Promoting this to high as it can be so useful for gathering important debug data; it would be nice to have done befor... Greg Farnum

04/10/2013

11:52 PM Bug #4706 (Resolved): kclient: Oops when two clients concurrently write a file
... Zheng Yan
08:31 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
The code looks good. Zheng Yan
01:10 PM Bug #4644 (Fix Under Review): mds crashing after upgrade from 0.58 to 0.60
Hurray, I did manage to reproduce so I guess I just missed before, and indeed it works with that patch and fails with... Greg Farnum
12:38 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
I'm having trouble reproducing this bug, but I'm probably not going through the right steps. A patch that I think sho... Greg Farnum
12:20 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
if you have some patch that we can test, i'd be glad =) Alexey Shvetsov
10:27 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
Ah, this looks to be less bad than I thought — the (struct_v == 2) check should be (struct_v <= 2) is all, from the s... Greg Farnum
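To illustrate why an `==` test on the encoding version can break decoding of older data, here is a sketch under stated assumptions: the `SessionInfo` struct, its fields, the buffer layout, and the idea that a field was added in version 3 are all hypothetical and do not reflect the actual session_info_t encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical record, NOT the real session_info_t layout.
struct SessionInfo {
    uint32_t client_id = 0;
    uint32_t extra = 0;  // assumed to exist only in encoding version 3 and later
};

// Decodes a flat buffer laid out as [struct_v, client_id, (extra if v >= 3)].
// use_equality_check selects the buggy (== 2) or corrected (<= 2) predicate.
inline SessionInfo decode_with(const std::vector<uint32_t>& buf,
                               bool use_equality_check) {
    SessionInfo s;
    const uint32_t struct_v = buf.at(0);
    const bool legacy = use_equality_check ? (struct_v == 2) : (struct_v <= 2);
    s.client_id = buf.at(1);
    if (!legacy) {
        // A version 1 buffer has no such field, so at() throws std::out_of_range.
        s.extra = buf.at(2);
    }
    return s;
}
```

With the equality check, a version 1 buffer is treated as if it were the newest format and the decoder reads past the end of the data; with `<=`, every older version takes the legacy path.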
09:03 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
update directly from IRC, as alexxy is still having registration issues:
<alexxy> joao: upgrade was from version 0...
Joao Eduardo Luis
09:11 AM Bug #3579 (Fix Under Review): kclient: Use less secure random number generator so we don't consum...
Patches sent to the mailing list and pushed to wip-3579. Sam Lang
09:07 AM Bug #4569: ceph-mds: segfault
It looks like this fix didn't make it into 0.60. See #4696. Sam Lang
09:06 AM Bug #4696: MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
Oh you're using 0.60. Looks like that commit didn't make it into the 0.60 release. It will be fixed in the next one! Sam Lang
09:04 AM Bug #4696 (Duplicate): MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
This is a duplicate of #4569. It's fixed in 0.60 if you're willing to upgrade. Sam Lang
06:37 AM Bug #4696 (Duplicate): MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
Limited logs at http://goo.gl/VAIFh... Matthew Roy
05:23 AM Bug #4679 (In Progress): ceph: hang while running blogbench on mira nodes
I reproduced a hang, it is an 'i_mutex + cap revoking' deadlock.... Zheng Yan
12:58 AM Bug #1878: ceph.ko doesn't setattr (lchown, utimes) on symlinks
For xattrs, there is no difference between symbolic links and regular files. For setattr, I think the only difference is... Zheng Yan

04/09/2013

07:49 PM Bug #4451: client: Ceph client not releasing cap
Please review again based on the latest changes pushed to wip-4451. Sam Lang
04:27 PM Bug #4451: client: Ceph client not releasing cap
Does this need more review or just testing? (I ask because I notice you've got two reviewed-by tags on it, although I... Greg Farnum
08:48 AM Bug #4451: client: Ceph client not releasing cap
Thanks Yan for fixing up that patch and testing it out. The inode check was just cruft from the previous changes, an... Sam Lang
06:00 AM Bug #4451: client: Ceph client not releasing cap
After removing the path_is_mine check, MDCache::parallel_fetch_traverse_dir() needs to skip non-auth dirfrags. The modif... Zheng Yan
06:34 PM Bug #4644 (In Progress): mds crashing after upgrade from 0.58 to 0.60
That shouldn't be a problem for v0.58; it included version 2 session_info_t. You sure that's the version you upgraded... Greg Farnum
06:18 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
The 26th byte of Norbert's sessionmap is 1. If I'm not wrong, it's struct_v for session_info_t. But the oldest versio... Zheng Yan
10:58 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
alexxy's sessionmap doesn't look anything like a sessionmap should; this won't fix his issue. Norbert's is at least s... Greg Farnum
06:20 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
alexxy on IRC is reporting that the patch doesn't work. He would have provided his report himself, but it appears th... Joao Eduardo Luis
04:13 PM Bug #4618 (Resolved): Journaler: _is_readable() and _prefetch() don't communicate correctly
Merged into next in commit:8eb5465c10840d047a894d1a4f079ff8b8d608b5. This would apply to bobtail as well if we decide... Greg Farnum
03:12 PM Bug #4679: ceph: hang while running blogbench on mira nodes
Not off-hand, but I haven't spent any time thinking about it yet. This one could be differences between how aggressiv... Greg Farnum
03:03 PM Bug #4679: ceph: hang while running blogbench on mira nodes
We've only seen a certain set of errors at the mds with the kernel client (this one and #4660 - although they may be ... Sam Lang
02:57 PM Bug #4679: ceph: hang while running blogbench on mira nodes
*sigh* Yep...
I've marked this as an MDS issue for now, but it could be a broader protocol change or something as ...
Greg Farnum
02:45 PM Bug #4679 (Rejected): ceph: hang while running blogbench on mira nodes
I re-ran the blogbench test 10 times using the "bobtail"
branch of ceph and never saw a hang.
I'm going to call t...
Alex Elder
12:13 PM Bug #4679: ceph: hang while running blogbench on mira nodes
I got another hang without any debug info being dumped
from the MDS. This time I just abandoned it. I'm about
to ...
Alex Elder
02:50 PM Bug #4694 (Resolved): client: put_snap_realm assert failure
... Greg Farnum
11:04 AM Bug #1878: ceph.ko doesn't setattr (lchown, utimes) on symlinks
I'm actually not sure how the symlink stuff is represented in our kernel client or the VFS — do these functions handl... Greg Farnum
08:31 AM Bug #4660 (In Progress): mds: segfault in queue_backtrace_update
Sam Lang
08:30 AM Bug #4660: mds: segfault in queue_backtrace_update
Alex hit the same segfault with the next branch yesterday, looks like the commit 3cdc61ec doesn't fix this bug. The ... Sam Lang

04/08/2013

08:32 PM Bug #4680 (Closed): mds: log possibly not trimming
2013-03-28 10:27:35.154461 7f1fc96b8700 10 mds.0.log trim 2 / 30 segments, 10 / -1 events, 0 (0) expiring, 0 (0) expi... Zheng Yan
10:32 AM Bug #4680: mds: log possibly not trimming
Yeah, it's not a generic never trimming; just not certain about this one. It could also be fine and just that there's... Greg Farnum
10:27 AM Bug #4680: mds: log possibly not trimming
I've seen it trim logs in the tests I've been running, but that's with mds_log_segment_size=16K and mds_log_max_segme... Sam Lang
10:04 AM Bug #4680 (Closed): mds: log possibly not trimming
Apparently there are a lot of old files showing up in the log replay, and I noticed previously on a different issue t... Greg Farnum
08:20 PM Bug #4644 (Fix Under Review): mds crashing after upgrade from 0.58 to 0.60
there is a typo in session_info_t::decode Zheng Yan
08:04 PM Bug #4451: client: Ceph client not releasing cap
Greg Farnum wrote:
> Although I think the MDS would need to have the inode in cache for that to happen — it would ha...
Zheng Yan
10:59 AM Bug #4451: client: Ceph client not releasing cap
Zheng Yan wrote:
> "Regarding the cap export, is it possible that the client has a cap that it thinks belongs to the...
Greg Farnum
09:43 AM Bug #4451: client: Ceph client not releasing cap
"Regarding the cap export, is it possible that the client has a cap that it thinks belongs to the mds, but the mds do... Zheng Yan
09:13 AM Bug #4451: client: Ceph client not releasing cap
"After removing the path_is_mine check in Server::handle_client_reconnect(), I think we should also call mdcache->rej... Sam Lang
04:41 PM Bug #4685 (Can't reproduce): BUG: unable to handle kernel NULL pointer dereference at
0.56.4 ceph, 3.8 kernel... Andras Elso
02:22 PM Bug #4679: ceph: hang while running blogbench on mira nodes
It looked very promising. 4 successful passes, but the
last one hung again. This time there were two blogbench
ta...
Alex Elder
12:26 PM Bug #4679: ceph: hang while running blogbench on mira nodes
One pass succeeded, so it's looking good.
I'll let it run 5 times and if all are successful, I'll just
close this...
Alex Elder
11:56 AM Bug #4679: ceph: hang while running blogbench on mira nodes
I talked with Sam Lang who said I should try again with
mds debugging on. That led to more info getting dumped
on ...
Alex Elder
11:01 AM Bug #4679: ceph: hang while running blogbench on mira nodes
... Alex Elder
10:49 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Actually, the other common theme (maybe more important)
is the involvement of an in-progress ceph_setattr() call.
...
Alex Elder
10:40 AM Bug #4679 (In Progress): ceph: hang while running blogbench on mira nodes
Unfortunately it looks like I've reproduced the problem
with my patches. The common theme is ceph_aio_write(), so
...
Alex Elder
10:04 AM Bug #4679: ceph: hang while running blogbench on mira nodes
I ran those tests a few times with the testing branch and
the problem did not show up. I reduced the test to just
...
Alex Elder
05:49 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Here is an excerpt of the yaml file driving the
tests, leading up to the blogbench run:...
Alex Elder
05:29 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Here are the versions of ceph and teuthology I'm using
while running these tests:
ceph
f5ba0fb mon: make 'osd cr...
Alex Elder
05:26 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Here is a log of the commits in place during these
tests. (I know, quite a few...) The last one is
the current te...
Alex Elder
05:24 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Here is an excerpt of the stack trace generated using:
echo t > /proc/sysrq-trigger
[31482.585095] blogbench....
Alex Elder
05:21 AM Bug #4679 (Resolved): ceph: hang while running blogbench on mira nodes
I have seen this only on mira nodes, now twice on two
consecutive attempts. I've run the same set of tests
with th...
Alex Elder
11:02 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Said he could look at this for me today. Greg Farnum
09:29 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Heh, no; that was supposed to be a 10. Re-pushed; thanks! Greg Farnum
09:34 AM Bug #3579 (In Progress): kclient: Use less secure random number generator so we don't consume ent...
Sam Lang
07:16 AM Bug #4660 (Fix Under Review): mds: segfault in queue_backtrace_update
Pushed a fix to wip-4660. The mdr was getting deleted before we queued the backtrace for update, so mdr->ls was inva... Sam Lang

04/07/2013

01:46 AM Bug #1878 (Fix Under Review): ceph.ko doesn't setattr (lchown, utimes) on symlinks
ceph_symlink_iops does not have getattr/setattr and xattr-related methods Zheng Yan
01:25 AM Bug #4241 (Duplicate): SELinux fails because it can't set xattrs
This is the same problem as #1878 (ceph_symlink_iops doesn't have a setattr method) Zheng Yan

04/06/2013

11:30 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Confirmed: I tested on my system, and the journal-check can load the journal.
But, there is a line in commit:
<...
Andras Elso

04/05/2013

04:02 PM Bug #4618 (Fix Under Review): Journaler: _is_readable() and _prefetch() don't communicate correctly
There were a couple related bugs which prevented this from working right. I don't guarantee it's bug-free now, but th... Greg Farnum
04:32 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Can I continue testing cephfs, or will you make a quick fix for this bug so that I can verify it on my system? Andras Elso
03:37 PM Bug #4451: client: Ceph client not releasing cap
After removing the path_is_mine check in Server::handle_client_reconnect(), I think we should also call mdcache->rejo... Zheng Yan
10:25 AM Bug #4451 (Fix Under Review): client: Ceph client not releasing cap
Pushed a proposed fix to wip-4451. The fix is to not adjust the conditional for checking if an inode is auth or not.... Sam Lang
10:26 AM Bug #4660 (In Progress): mds: segfault in queue_backtrace_update
Sam Lang
09:37 AM Bug #4660: mds: segfault in queue_backtrace_update
No wonder this wasn't showing up in my bug queue! Greg Farnum
08:20 AM Bug #4660 (Resolved): mds: segfault in queue_backtrace_update
... Sage Weil
09:36 AM Bug #4565 (Can't reproduce): MDS/client: issue decoding MClientReconnect on MDS
I've had this running for more than 24 hours and it still hasn't reproduced. I'll let it keep going, but I don't beli... Greg Farnum

04/04/2013

11:15 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
sessionmap, command is rados --pool=metadata get mds0_sessionmap /tmp/sessionmap (without -o) :) norbert schmidt
11:07 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
logfile with debug mds = 20... norbert schmidt
05:16 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
I guess this bug was introduced by commit 0bcf2ac081b8386fe00387b654aa5676a7902c80... Zheng Yan
11:29 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
I got a SessionMap from alexxy and it somehow has a bad version number attached to it. More importantly when I hexdum... Greg Farnum
10:36 AM Bug #4644 (Need More Info): mds crashing after upgrade from 0.58 to 0.60
It failed to decode the SessionMap properly here, but I can't tell why and the code hasn't changed at all between tho... Greg Farnum
03:34 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
alexxy @ IRC also hit this issue. Attaching log. Joao Eduardo Luis
02:37 AM Bug #4644 (Resolved): mds crashing after upgrade from 0.58 to 0.60
after upgrade from 0.58 to 0.60, one mds is crashed and still crashing directly after start... norbert schmidt
03:03 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Greg Farnum
02:18 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Okay, so the next entry is >40MB and we have 38MB in our read buffer. I'm not certain, but I think our use of "temp_f... Greg Farnum
12:54 PM Bug #4618 (In Progress): Journaler: _is_readable() and _prefetch() don't communicate correctly
Greg Farnum
12:53 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Okay, there's not a lot there so apparently it doesn't have as much data as it thinks it needs in order to read the n... Greg Farnum

04/03/2013

06:09 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Greg Farnum wrote:
> Are those logs posted somewhere? That indicates it's waiting to be allowed to read the stuff pa...
Andras Elso
05:41 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Are those logs posted somewhere? That indicates it's waiting to be allowed to read the stuff past where it stopped, b... Greg Farnum
04:50 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
just a guess: with journaler debug, there is a line:... Andras Elso
03:08 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
you said "My off-hand guess is that something isn't getting cleaned up properly with the slave requests, which leads ... Zheng Yan
03:07 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
I think of it every time I hear "stuck in replay", that's all. I haven't looked at the logs or anything. Sage Weil
02:59 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Sorry, but I'm a bit lost about why that might apply here. Are you just speculating or did something in the logs look... Greg Farnum
02:57 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
see commit 7e04504d3ed119bb43a4eb99ca524b39dc3696bc. But the bug should just make replay slow. Zheng Yan
02:38 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
here is a logcut with "debug journaler = 20": http://pastebin.com/nrzJg87E Andras Elso
01:59 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Yeah, that all looks good too. My off-hand guess is that something isn't getting cleaned up properly with the slave r... Greg Farnum
01:52 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Don't forget #3351.. if the osd returns a short read on an object before the end of the journal, the Journaler replay... Sage Weil
01:35 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
if you tell me (here or irc) where to add new debug/assert lines, we can hunt down this bug. Andras Elso
01:15 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Thanks. (For future onlookers, the summary of those links is that everything is perfectly normal and as it should be,... Greg Farnum
01:02 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Here is the status: http://pastebin.com/x1XEvuWc
Here is the config dump: http://pastebin.com/YTFbY5jW
Andras Elso
10:09 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
The MDS maintains a journal that it writes metadata into before committing the aggregated updates into the actual ino... Greg Farnum
02:01 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Greg Farnum wrote:
> Sorry, I mean the mds journal, not the debug logs, when referring to the size.
So the mds jo...
Andras Elso
03:43 PM Bug #3266 (Resolved): "ceph mds tell 0 dumpcache /etc/passwd" is not cool
Merged in with commit:32aac00c7043aa1564272697879b1c626814b143 Greg Farnum
03:33 PM Bug #3266 (Fix Under Review): "ceph mds tell 0 dumpcache /etc/passwd" is not cool
wip-3266 Sage Weil
03:02 PM Bug #4582 (Resolved): mds: Client hang on fsstress with mds_thrasher
Sam Lang
09:41 AM Bug #4582 (Fix Under Review): mds: Client hang on fsstress with mds_thrasher
With the latest changes to the mds merged to master, and the fix from #4637, I was able to get a successful run of fs... Sam Lang
01:35 PM Bug #4489 (New): ceph fs hangs on file stat
Never mind, forgot the other one involved max size changes. Greg Farnum
01:05 PM Bug #4489 (Duplicate): ceph fs hangs on file stat
All right; that should be more stable for you. :)
Thanks for the steps to reproduce. I'm going to tentatively mark...
Greg Farnum
01:27 PM Bug #3637: client: not issuing caps for with clients doing shared writes
Starting to look at this now. Greg Farnum
01:04 PM Bug #3637: client: not issuing caps for with clients doing shared writes
#4489 is probably a duplicate of this and has steps to reproduce, if we need alternate angles of attack. (And we shou... Greg Farnum
12:56 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
[Meant to post this yesterday but I guess I forgot to hit submit.]
Sadly, this test didn't slurp up any logs, so all...
Greg Farnum
12:53 PM Bug #4637 (Resolved): mds: standby takeover stuck in rejoin
Thanks. Don't you ever sleep? :)
Merged into master in commit:0d6ddd926432821842a7e40fdb78d793ab0737bb
Greg Farnum
12:37 PM Bug #4637: mds: standby takeover stuck in rejoin
Greg's fix looks good, sorry for the bug. Zheng Yan
10:45 AM Bug #4637: mds: standby takeover stuck in rejoin
Pushed that to wip-no-fail-whoami-4637. Sage, Yan, care to check it out? :) Greg Farnum
10:33 AM Bug #4637: mds: standby takeover stuck in rejoin
Can you try this patch instead, and see if that works? (If it does I'll want a review from Sage or Yan; it looks okay... Greg Farnum
08:43 AM Bug #4637 (Fix Under Review): mds: standby takeover stuck in rejoin
Pushed a fix to wip-4637. Sam Lang
08:40 AM Bug #4637 (Resolved): mds: standby takeover stuck in rejoin
With current master, with one active mds and one standby, if the active fails, the standby gets stuck in rejoin while... Sam Lang
12:44 PM Bug #4638 (Duplicate): client: fsstress and mds_thrasher hangs client on unmount
This is the same problem as #4451 (client inodes getting disconnected on unmount). Sam Lang
09:42 AM Bug #4638 (Duplicate): client: fsstress and mds_thrasher hangs client on unmount

After a successful run of fsstress and mds_thrasher, the client hangs on unmount and eventually returns EBUSY.
Sam Lang

04/02/2013

11:24 PM Bug #1535 (Resolved): concurrent creating and removing directories crashes cmds
I think this has been fixed by commit 00025462 Zheng Yan
10:48 PM Bug #1945: blogbench hang on caps
Sorry for the delay, I didn't notice the notification. I fixed several bugs that may cause hangs of this type, but I... Zheng Yan
07:24 PM Bug #4489: ceph fs hangs on file stat
Hm, snapdirname is something obfuscated (but has no use, actually).
I've got the same error one more time, so I bel...
Ivan Kudryavtsev
06:14 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Sorry, I mean the mds journal, not the debug logs, when referring to the size. Greg Farnum
05:12 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Greg Farnum wrote:
> Strange, it looks like you have an MDS log of about 1236MB, which is...large. What config optio...
Andras Elso
04:28 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Strange, it looks like you have an MDS log of about 1236MB, which is...large. What config options are you setting?
...
Greg Farnum
12:36 PM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
I changed back to max_mds 1. same result:... Andras Elso
09:42 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
I'll check my assumptions today (already downloaded the logs), but with multiple active MDSes this doesn't warrant a ... Greg Farnum
07:14 AM Bug #4618 (Resolved): Journaler: _is_readable() and _prefetch() don't communicate correctly
The Journaler has mechanisms to try and read extra data if an event is large enough that it exceeds the current prefe... Andras Elso
02:48 PM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
Merged and pushed to master in commit:3842ff7d677bae98462f7d050f5fda9d85f6273d Greg Farnum
02:20 PM Bug #4619: mds: anchortable hangs on new cluster
Code looks good. Sorry for the bug! Zheng Yan
01:06 PM Bug #4619 (Fix Under Review): mds: anchortable hangs on new cluster
recovery_done() breaks on a fresh machine because of the populate_mydir() ordering. The problem is that both recover... Sage Weil
09:52 AM Bug #4619 (In Progress): mds: anchortable hangs on new cluster
Sage said he'd look at the double-send as well. Greg Farnum
09:27 AM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
commit:968c6c0c9408b33904041e5ddbd9ea738e831713 Sage Weil
09:13 AM Bug #4619: mds: anchortable hangs on new cluster
I think this isn't correct. If we restart the table server MDS, it will send two ready messages to the table client. ... Zheng Yan
09:02 AM Bug #4619: mds: anchortable hangs on new cluster
Code looks good, assuming the tests run.
Sorry about that! :(
Greg Farnum
08:15 AM Bug #4619 (Fix Under Review): mds: anchortable hangs on new cluster
wip-4619 Sage Weil
08:14 AM Bug #4619 (Resolved): mds: anchortable hangs on new cluster
Sage Weil
02:30 PM Bug #4621 (Rejected): failed pjd chown/00.t 124
Okay, all symlink attempts that made it to the MDS were successes, and I can't find any failed ceph-fuse symlink/ll_s... Greg Farnum
01:59 PM Bug #4621: failed pjd chown/00.t 124
Sorry, not an lchown, just a symlink create. Greg Farnum
01:29 PM Bug #4621: failed pjd chown/00.t 124
Well, it's always an adventure to figure out which one is busted, but it looks to be an lchown on a symlink failing. ... Greg Farnum
09:30 AM Bug #4621 (Rejected): failed pjd chown/00.t 124
2013-04-02T09:04:34.029 INFO:teuthology.task.workunit.client.0.out:../pjd-fstest-20090130-RC-open24/tests/chown/00.t ... Sage Weil
02:27 PM Feature #4630 (New): make lchown work in ceph-fuse for pjd
pjd doesn't believe that ceph-fuse supports lchown. Maybe this is pjd's fault; maybe it's ours. Figure out why so tha... Greg Farnum
11:49 AM Documentation #2206: Need a control command to gracefully shutdown an active MDS prior to planned...
This is partially documented by 0c16b31db7a5ed72a9c306ae91b191c326d0776a on github. Matthew Roy

04/01/2013

03:18 PM Bug #3266: "ceph mds tell 0 dumpcache /etc/passwd" is not cool
Before anybody embarks on solving this, I assume there's a standard way to handle this by outlawing certain kinds of ... Greg Farnum
01:23 PM Bug #2657: kclient: direct io write larger than 8MiB fails
in testing, there is now a test workunit Sage Weil
01:23 PM Bug #2657 (Resolved): kclient: direct io write larger than 8MiB fails
Sage Weil
01:22 PM Bug #4434 (Resolved): looping waiting for quorum after upgrade
Whoops! Greg Farnum
01:14 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
I'll look into the code around this today. Greg Farnum
11:03 AM Bug #4489: ceph fs hangs on file stat
Why are you specifying the snapdirname to that weird value when mounting this? Greg Farnum
11:00 AM Bug #4405: MDCache::populate_mydir can loop forever
This dump has 1063591 inodes in the cache, of which only 122104 are non-stray. That doesn't seem quite right.
I do...
Greg Farnum
09:37 AM Bug #4590 (Resolved): ceph-fuse: fsx fails with 'client oc = false'
commit:c01e2e42f368ca003e03debe9a7bd5f12eb79d2c Sage Weil

03/31/2013

10:33 AM Bug #4601 (Can't reproduce): symlink with size zero
Somehow I got into a situation in which a number of symlinks, all of them created and later modified at about the sam... Alexandre Oliva

03/29/2013

09:05 PM Bug #4590 (Resolved): ceph-fuse: fsx fails with 'client oc = false'
... Sage Weil
03:22 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
Oh, yeah, we can do the same in the userspace client. I'll do that and re-push. Thanks Yan! Sam Lang
03:12 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
FYI:
The kclient deals with this case by calling wake_up_session_caps(). It just clears i_wanted_max_size/i_requested...
Zheng Yan
01:04 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
I believe those are okay as truncate size changes should end up actually journaled (as setattrs) so they'll be replay... Greg Farnum
12:58 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
I spent most of this morning figuring out if it made sense to send the full cap (ceph_mds_caps -- and get rid of the ... Sam Lang
12:31 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
I'm not sure this is wrong, but it's confusing me a bit. I thought that the Client sent all capabilities it holds bac... Greg Farnum
12:14 PM Bug #4582: mds: Client hang on fsstress with mds_thrasher
I just pushed wip-4582. Testing it on the fsstress test with mds_thrasher now. I'm not positive this is the right a... Sam Lang
11:53 AM Bug #4582 (In Progress): mds: Client hang on fsstress with mds_thrasher
Sam Lang
11:53 AM Bug #4582 (Resolved): mds: Client hang on fsstress with mds_thrasher

While trying to reproduce #4565, fsstress eventually hangs where the client is waiting for a max size update that t...
Sam Lang
01:55 PM Feature #4583 (Resolved): libcephfs: add test that kills a client and verifies mds cleans it up
Sage Weil
01:28 PM Feature #4022 (In Progress): client: qa: test non-cached operation (force sync mode)
Sage Weil
01:24 PM Fix #4191 (Resolved): qa: multiple mds in nightly (non-failure case)
Sage Weil
11:31 AM Bug #4578 (Resolved): client: hangs on unlink
Noah Watkins
11:16 AM Bug #4578: client: hangs on unlink
This patch solves the problem :) Noah Watkins
12:51 AM Bug #4578: client: hangs on unlink
yes, patch is also attached Zheng Yan
11:11 AM Feature #4442 (Resolved): java: add topology API support
Err, forgot to close. Thanks. ebc3abaf6dc62678f5ef5914862e9d8f216fffbf Noah Watkins
11:05 AM Feature #4442: java: add topology API support
I think this already got reviewed and merged, right? Or is there something else we need? Greg Farnum
11:02 AM Bug #4569 (Resolved): ceph-mds: segfault
commit:4f8ba0e7756a1b0647867db0e9b5549b3e82f6b1 in master. This wasn't a bug in any released versions, so no backports. Greg Farnum
10:50 AM Bug #4569: ceph-mds: segfault
In case it matters at all, the segfault was happening when I was furiously sigterm'n my hung-on-unlink client. Noah Watkins
10:33 AM Bug #4569: ceph-mds: segfault
Yep, the problem here is that the Session was created during replay and it never had a Connection associated with it ... Greg Farnum
10:20 AM Bug #4569: ceph-mds: segfault
In the logs the session in question is one that failed to reconnect. Was there a different event that caused the MDS ... Greg Farnum

03/28/2013

08:47 PM Bug #4578 (Resolved): client: hangs on unlink
Looks like somebody accidentally deleted #4570 (and there's no undelete in Redmine best I can tell), so this ticket w... Greg Farnum
06:58 PM Feature #4576 (Rejected): java: support ByteBuffer interface for NIO and NIO.2 high-perf I/O
ByteBuffer interface in NIO avoids needless copying, and is used by NIO.2 and the new VFS infrastructure in Java 7. T... Noah Watkins
10:21 AM Bug #4569: ceph-mds: segfault
It looks like the session is getting closed because it's stale, and then killed, but the session->connection field pas... Sam Lang
10:00 AM Feature #4354 (In Progress): mds: add an equivalent to the OSD OpTracker
Greg Farnum
07:31 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Update on trying to track this down...running this test in teuthology, I don't hit the same assertion, but I do see t... Sam Lang

03/27/2013

09:23 PM Bug #4308 (Won't Fix): ceph-fuse crashed during blogbench test (argonaut)
this is most likely memory corruption in argonaut's ceph-fuse. Sage Weil
09:21 PM Bug #4564 (Resolved): client: Close session doesn't wait for outstanding requests
Sage Weil
09:09 AM Bug #4564 (Fix Under Review): client: Close session doesn't wait for outstanding requests
Pushed a fix to wip-4564. Sam Lang
07:13 AM Bug #4564 (Resolved): client: Close session doesn't wait for outstanding requests

Ran into another failure related to testing #4451 on the client where the following occurs:
client sends create/...
Sam Lang
11:45 AM Bug #4569 (Resolved): ceph-mds: segfault
I started receiving this segfault in ceph-mds with the latest master today.... Noah Watkins
09:35 AM Bug #4565 (Resolved): MDS/client: issue decoding MClientReconnect on MDS
... Sage Weil
08:26 AM Bug #4539 (Resolved): include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::standby_trim_...
commit:295c92c Sage Weil
07:47 AM Bug #4539 (Fix Under Review): include/elist.h: 92: FAILED assert(_head.empty()) from MDLog::stand...
Yep. There's no state bit, and the cache is unchanged by the backtrace updates list. The standby mds is free to cle... Sam Lang
08:04 AM Bug #4555 (Resolved): The CephFileSystem class is missing the createNonRecursive method
0a5175722a8444579715c1871c09c246969e7890 Noah Watkins
 
