Activity

From 04/07/2013 to 05/06/2013

05/06/2013

02:57 PM Bug #4920 (Resolved): client: does not respect O_NOFOLLOW
It looks like doing an open() always implicitly follows symlinks, because we call path_walk() with followsym set to t... Greg Farnum
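
For reference, a minimal sketch of the behavior O_NOFOLLOW is expected to give on a local POSIX filesystem (this does not go through the Ceph client): when the final path component is a symlink, open() should fail instead of following it. The temporary paths and the helper name `open_nofollow_errno` are made up for illustration, and ELOOP is the errno Linux reports for this case.

```python
import errno
import os
import tempfile


def open_nofollow_errno(path):
    """Open path with O_NOFOLLOW; return None on success, else the errno."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target")
    link = os.path.join(tmp, "link")
    with open(target, "w") as f:
        f.write("data")
    os.symlink(target, link)

    # A regular file opens normally even with the flag set.
    regular_result = open_nofollow_errno(target)
    # A symlink as the final component must be refused, not followed.
    symlink_result = open_nofollow_errno(link)
```

A client that always follows symlinks would return success for both calls, which is the symptom described in this bug.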

05/05/2013

06:36 AM Bug #4909: mds: stalled/stuck directory (standby)
The directory became accessible only after rebooting one of the nodes (with stalled mounts), not after merely restarting the ceph daemons. Denis kaganovich

05/04/2013

03:56 AM Bug #4909: mds: stalled/stuck directory (standby)
Sorry, comment 1 is about ctdbd (IMHO); disregard it. Only the main issue stands. Denis kaganovich
03:51 AM Bug #4909: mds: stalled/stuck directory (standby)
& (without debug 10) now log flooding on other node (mds.4):
2013-05-04 13:47:27.648019 7fe8c59ca700 0 mds.0.serv...
Denis kaganovich

05/03/2013

07:18 PM Bug #4909 (Can't reproduce): mds: stalled/stuck directory (standby)
I many times break actions (debug mysql replication script, just multiple dump redirections) directly to directory, m... Denis kaganovich
02:53 PM Bug #4894: mds: standby shut itself down due to not having any data
MDS::boot_create() first starts a new log segment (its ESubtreemap is empty), then uses MDCache::create_empty_hierarch... Zheng Yan
10:40 AM Bug #4894: mds: standby shut itself down due to not having any data
You must be racing ahead of me here, Yan — what's your theory? Just that the first active MDS failed to write any log... Greg Farnum
12:24 PM Feature #4906 (Resolved): ceph-fuse: use the Preforker class
Sage wrote a Preforker class for the Monitor. We should switch to using that instead of our own band-aided daemonizat... Greg Farnum

05/02/2013

07:29 PM Bug #4894: mds: standby shut itself down due to not having any data
I think MDS::boot_create() should start a new log segment after creating the fs hierarchy. Zheng Yan
10:56 AM Bug #4894 (Resolved): mds: standby shut itself down due to not having any data
... Greg Farnum
03:05 PM Feature #4326 (Fix Under Review): qa: add samba + (kclient|ceph-fuse) to suite
These changes were part of the samba.py task changes in wip-samba-tasks. An example of use is in ceph-qa-suite:suite... Sam Lang
08:28 AM Bug #4832: mds: failed auth_unpin assert
hit this again:... Sage Weil

05/01/2013

03:06 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
This is happening with the argonaut ceph-fuse daemon, not a cuttlefish one. Going to turn this down to High again and... Greg Farnum
02:19 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
steps to reproduce:
bring up a cluster of 2 nodes running argonaut, run blogbench workload on it from client.
upg...
Tamilarasi muthamizhan
12:46 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Are you still running old clients when you hit this? Greg Farnum
12:04 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
hitting this pretty consistently when upgrading mds from argonaut to cuttlefish.
have this reproduced on burnupi39...
Tamilarasi muthamizhan
02:27 PM Bug #4105 (Resolved): mds: fix up the Dumper
Merged into next in commit:dfacd1bd805ebb730b5206c9830b28f47cc7f9cf. Hurray! Greg Farnum
02:20 PM Bug #4105 (Fix Under Review): mds: fix up the Dumper
wip-4105-mds-dumper
Wasn't actually that complicated; it's just the locking expectations around the Objecter chang...
Greg Farnum
02:23 PM Feature #4886 (Resolved): teuthology: add tests that use the MDS dumper
We want to prevent the Dumper from bitrotting like it has been. Figure out a simple and effective way to test the dum... Greg Farnum
02:20 PM Feature #4885 (Resolved): dumper: do an incremental log dump
Right now we read it all into memory and then dump it out into a file. So far that's been okay, but we probably want ... Greg Farnum
11:28 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
Hmm, I thought we handled renames properly since they involve changing the caps state. But maybe we don't propagate t... Greg Farnum
11:13 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-05-01_01:00:37-fs-next-testing-basic/4534 Sage Weil
04:41 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
I think this is a general issue. When handling MClientReconnect, if an inode is not in the cache, the MDS tries fetch... Zheng Yan

04/30/2013

01:26 PM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
The attached files include the complete client log, along with the mds logs that include 10000000004 (one of the indo... Sam Lang
09:39 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
We can't revoke on unlink because the file might still be held open with something accessing it. :) Greg Farnum
09:38 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
This looks like the client creates a file, then unlinks it, but it never removes it from its cache, because it still ... Sam Lang
05:41 AM Bug #4850 (In Progress): ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
Sam Lang

04/29/2013

04:58 PM Bug #4853 (Resolved): ceph-fuse hang on mount getattr
commit:ee553ac279664b7f1b527a0b1b56768134cf5157 Sage Weil
12:43 PM Bug #4853: ceph-fuse hang on mount getattr
this is not a new race, and is only triggered when a mds session open and request race with an mds restart. not a cu... Sage Weil
10:47 AM Bug #4853 (Fix Under Review): ceph-fuse hang on mount getattr
fix in wip-up
here is the client-side log that shows we send the getattr twice. we only process the first reply, ...
Sage Weil
09:21 AM Bug #4853: ceph-fuse hang on mount getattr
Ignore that, wrong bug — sorry. Greg Farnum
09:20 AM Bug #4853: ceph-fuse hang on mount getattr
/a/teuthology-2013-04-28_21:32:40-fs-next-testing-basic/2662
That's an fsstress run that got hung, I copied the cl...
Greg Farnum
09:02 AM Bug #4853 (In Progress): ceph-fuse hang on mount getattr
Sage Weil
08:38 AM Bug #4853 (Resolved): ceph-fuse hang on mount getattr
100% reproducible with this job file... Sage Weil
02:26 PM Bug #4861 (Rejected): Alter Java components to build against Java 1.6 (or 1.7)
The Java packages use -source 1.5 to specify that they should use that version of the API. This is being done for com... Anonymous
09:21 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
/a/teuthology-2013-04-28_21:32:40-fs-next-testing-basic/2662
That's an fsstress run that got hung, I copied the cl...
Greg Farnum

04/28/2013

08:51 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
have full log.. put a copy in the run dir Sage Weil
08:50 AM Bug #4850 (Resolved): ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
... Sage Weil

04/26/2013

11:19 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
/a/teuthology-2013-04-26_02:29:14-fs-next-testing-basic/1450 Greg Farnum
11:17 AM Bug #4832 (Resolved): mds: failed auth_unpin assert
... Greg Farnum
10:44 AM Bug #4829 (Closed): client: handling part of MClientForward incorrectly?
(In reference to a backwards check for is_replay when doing encode_cap_releases())... Greg Farnum
09:52 AM Bug #4742 (Resolved): mds: stuck clientreplay request
commit:5121e56c255c079569f02e0ee852e469f38f470e Sage Weil

04/25/2013

06:34 PM Bug #4742: mds: stuck clientreplay request
Yeah, we've discussed this some on github around wip-4742 and on irc. :) Greg Farnum
06:31 PM Bug #4742: mds: stuck clientreplay request
Looks like a client bug, it may add cap releases to the replay requests. (encode_cap_releases() should be called when... Zheng Yan
10:38 AM Bug #4742: mds: stuck clientreplay request
Logs for two runs, one is stuck in replay from a setattr, the other is stuck in replay from a rename.
Sam Lang

04/23/2013

03:53 PM Feature #4799 (Resolved): Client Security for CephFS
As discussed on the #ceph IRC channel with gregaf and others, I would find some added level of client security in Cep... Mike Kelly
01:34 PM Bug #4721 (Resolved): libcephfs tests fail when using ceph-deploy
strange that it works fine on the latest next branch [0.60-624-g426e3be-1precise] ... Tamilarasi muthamizhan
10:29 AM Bug #4742: mds: stuck clientreplay request
Attaching mds log from mds stuck on clientreplay. Looks like setattr gets put on the inode waiting list by the lo... Sam Lang

04/21/2013

06:12 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
Additional: I resolve it runtime, changing assert(0) to some lock (IMHO first in this case) on one node and found for... Denis kaganovich

04/19/2013

10:17 AM Bug #4105: mds: fix up the Dumper
This has annoyed me a couple more times and I think it's now at the top of the queue, so here we go again. Greg Farnum
10:08 AM Bug #4746: client: invalidate callback can deadlock
pushed wip-fuse to ceph-client.git Sage Weil
09:42 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
You mean file_eval should just short-circuit if it's scanning? That seems like the most sensible place for it, but I'... Greg Farnum
09:31 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
yeah, that transition doesn't make sense. i think it should do nothing in the scan state.. Sage Weil
09:05 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
file_eval is trying to move ifile from "scan" to "mixed" in order to serve up the client caps, and scatter_mix doesn'... Greg Farnum
02:13 AM Bug #4601: symlink with size zero
I was looking at the <inode>.<frag>_head* file in the osd that held the directory where the link was stored. As it t... Alexandre Oliva

04/18/2013

05:22 PM Bug #4753 (Resolved): mds/Locker.cc: 4167: FAILED assert(0)
Every mds crashed after some startup checks: "mds/Locker.cc: 4167: FAILED assert(0)":
mds/Locker.cc: 4167: FAILED ...
Denis kaganovich
05:12 PM Bug #4746: client: invalidate callback can deadlock
The suggestion from Maxim is to modify fuse to serialize reads and invalidate via a mutex. That ought to do the tric... Sage Weil
09:37 AM Bug #4746: client: invalidate callback can deadlock
It's not any of our internal locks that are getting stuck; it's the VFS inode mutexes in combination with us. If I ... Greg Farnum
07:31 AM Bug #4746: client: invalidate callback can deadlock
The invalidate is queued in a separate thread, and when we call the invalidate, we don't have the client lock held. ... Sam Lang
05:06 PM Bug #4601: symlink with size zero
>I looked a bit in the ceph-osd file holding the directory that contains the symlink, and I can see ^Q in the yes_hea... Greg Farnum
04:57 PM Bug #1945 (Can't reproduce): blogbench hang on caps
We haven't seen this in a long time (at least, that's marked here), and there's been a ton of work here over the last... Greg Farnum
04:39 PM Bug #4732: uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
This was in the async invalidate thread, so I'm turning this down. It should probably be investigated alongside/after... Greg Farnum
04:34 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Okay, pushed the update for more debugging, and am downgrading this to "High" since it only appears under so many fai... Greg Farnum
04:17 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Also, both of these are the same job as the first incident was: fsstress workunit on ceph-fuse, messenger failure inj... Greg Farnum
04:15 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
Those machines are cleared out again, of course (d'oh!). Next time we see this we need to gather up everything we can... Greg Farnum
04:03 PM Bug #4741: MDS: stuck in clientreplay
Interesting; on #4742 it was clearly waiting on a request because it kept saying "still have 1 active replay requests... Greg Farnum
03:57 PM Bug #4741 (Duplicate): MDS: stuck in clientreplay
This is a duplicate of #4742. It looks like setattr is the culprit. I was able to generate a core file of the mds w... Sam Lang
11:13 AM Bug #4741: MDS: stuck in clientreplay
Also /a/teuthology-2013-04-18_01:01:07-fs-next-testing-basic/15101 Greg Farnum
03:58 PM Bug #4721 (Need More Info): libcephfs tests fail when using ceph-deploy
(Trying to track the responsibility flow more clearly.) Greg Farnum
03:19 PM Bug #4721: libcephfs tests fail when using ceph-deploy
Have you reproduced this, Tamil? Since all the tests are failing I'm pretty sure this is some kind of authentication ... Greg Farnum
03:57 PM Bug #4742 (In Progress): mds: stuck clientreplay request
Sam Lang
03:57 PM Bug #4742: mds: stuck clientreplay request
Marked #4741 as a duplicate of this bug. It looks like setattr is the culprit. I was able to generate a core file o... Sam Lang
01:57 PM Bug #4722: kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
I did a checkout of v3.5, and caps.c:1006 is... Greg Farnum
01:37 PM Bug #4738: libceph: unlink vs. readdir (and other dir orders)
I don't believe locking is implemented yet via the Samba VFS bindings, since we don't have a userspace implementation... Greg Farnum
01:27 PM Bug #4738: libceph: unlink vs. readdir (and other dir orders)
On top only:
vfs objects = scannedonly ceph
And if I switch to:
vfs objects = scannedonly
or:
vfs objects = c...
Denis kaganovich
11:03 AM Bug #3637 (Resolved): client: not issuing caps for with clients doing shared writes
Merged into next in commit:efbe2e8b55ba735673a3fdb925a6304915f333d8 Greg Farnum

04/17/2013

07:42 PM Bug #4713 (Resolved): mds: hang related to access from two clients
The following have been committed to the "testing" branch
of the ceph-client git repository. With them in place
I ...
Alex Elder
07:39 PM Bug #4706 (Resolved): kclient: Oops when two clients concurrently write a file
The following have been committed to the ceph-client
"testing" branch:
8f68229 libceph: change how "safe" callbac...
Alex Elder
07:38 PM Bug #4679 (Resolved): ceph: hang while running blogbench on mira nodes
Sorry Greg, I should have been in better communication
with you. I have been testing these all afternoon and
Sage ...
Alex Elder
03:48 PM Bug #4679: ceph: hang while running blogbench on mira nodes
I believe Sage has been over all these now. I'm trying to go over the newest versions off the mailing list as well, n... Greg Farnum
07:20 PM Bug #4726 (Can't reproduce): mds: segv during blogbench in remove_pending_backtraces
I wasn't able to reproduce this after more than 200 runs, so I'm marking it as Can't reproduce for now. Sam Lang
05:37 PM Bug #3597 (Resolved): ceph-fuse: denying root access
Oh, this was a bug that got fixed in commit:d87035c0c4ff, included in v0.60. Greg Farnum
05:05 PM Bug #4746: client: invalidate callback can deadlock
Hmm, you're right, this is a more fundamental problem. Sage Weil
04:50 PM Bug #4746: client: invalidate callback can deadlock
Maybe; we didn't think this through much beyond going "yep, that's broken".
However, I think we can queue up the i...
Greg Farnum
04:44 PM Bug #4746: client: invalidate callback can deadlock
"We may need to introduce a second locking layer to deal with this, that covers draining out all VFS requests before ... Sam Lang
03:04 PM Bug #4746 (Resolved): client: invalidate callback can deadlock
I saw this when testing the fix for #3637. We appear to be (correctly) safe against deadlocks on our own locks, but w... Greg Farnum
04:12 PM Feature #4326: qa: add samba + (kclient|ceph-fuse) to suite
I think you might have mentioned you were trying to do this while you were working on the samba vfs-based ones? If no... Greg Farnum
04:09 PM Bug #1878 (Resolved): ceph.ko doesn't setattr (lchown, utimes) on symlinks
I've pushed this to our testing branch. It's presently commit:baf0169b77f6a0c384a15fb425e5700fb0239e89, although that... Greg Farnum
03:59 PM Bug #3637: client: not issuing caps for with clients doing shared writes
And he gave me a reviewed-by tag. Will merge this tomorrow morning after some more testing. Greg Farnum
03:53 PM Bug #3637: client: not issuing caps for with clients doing shared writes
This now appears to be passing (I've got it continuing to loop in the background), but it needs review and merging. S... Greg Farnum
03:05 PM Bug #3637: client: not issuing caps for with clients doing shared writes
That latest issue was #4746. Turning off the callback and testing again... Greg Farnum
05:42 AM Bug #3637: client: not issuing caps for with clients doing shared writes
Zheng Yan wrote:
> there are only 4 states that allow Fw caps, they are MIX, MIX_EXCL, EXCL and EXCL_MIX. they all a...
Zheng Yan
05:39 AM Bug #3637: client: not issuing caps for with clients doing shared writes
Greg Farnum wrote:
> I don't remember how all the locking works when you have multiple writers, but I don't believe ...
Zheng Yan
10:17 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
And also /a/teuthology-2013-04-16_01:00:52-fs-next-testing-basic/13665 Greg Farnum
09:26 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
This just happened again at /a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14248 (it's still running, for ... Greg Farnum
10:12 AM Bug #4742: mds: stuck clientreplay request
Looks like a setattr and a create:
ubuntu@plana72:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-client.0.19374.as...
Sam Lang
09:36 AM Bug #4742 (Resolved): mds: stuck clientreplay request
/a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14246
It has a single request which isn't completing; wh...
Greg Farnum
10:06 AM Cleanup #4744 (In Progress): mds: pass around LogSegments via std::shared_ptr
These really ought to be ref-counted in some way to prevent early expiry. Greg Farnum
09:34 AM Bug #4741 (Duplicate): MDS: stuck in clientreplay
/a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14249
I can't find any hints, except that it is in fact ...
Greg Farnum
09:00 AM Feature #3243 (In Progress): qa: test samba reexport via libcephfs vfs plugin in teuthology
Sam Lang
08:58 AM Feature #3242 (Resolved): samba: push plugin upstream
Posted patches to mailing list:
https://lists.samba.org/archive/samba-technical/2013-April/091651.html
Sam Lang
08:01 AM Bug #4738 (Need More Info): libceph: unlink vs. readdir (and other dir orders)
Denis,
I've seen similar behavior with the smbtorture dir1 test, but it happens without the vfs_ceph module. Does...
Sam Lang
04:54 AM Bug #4738 (Closed): libceph: unlink vs. readdir (and other dir orders)
Combining (stacking) in samba vfs_scannedonly with vfs_ceph, I experienced some bugs, looks like libceph readdir prob... Denis kaganovich

04/16/2013

06:41 PM Bug #3637: client: not issuing caps for with clients doing shared writes
Greg Farnum wrote:
> I don't remember how all the locking works when you have multiple writers, but I don't believe ...
Zheng Yan
03:43 PM Bug #3637: client: not issuing caps for with clients doing shared writes
Okay, it's not quite that simple. This (all following the data writeout; I think this is the data check — anyway, thi... Greg Farnum
02:58 PM Bug #3637: client: not issuing caps for with clients doing shared writes
Reproduced at last. There continues to be a problem with the fix branch too :( but it's not a max_size issue; one of ... Greg Farnum
01:47 PM Bug #3637: client: not issuing caps for with clients doing shared writes
And that wasn't working because teuthology was creating working dirs like /tmp/cephtest/gregf@kai-2013-04-16_12-59-21... Greg Farnum
10:48 AM Bug #3637 (Fix Under Review): client: not issuing caps for with clients doing shared writes
Regarding the testing (which I'm doing now), what those warnings turned out to mean is that each instance had their o... Greg Farnum
10:37 AM Bug #3637: client: not issuing caps for with clients doing shared writes
I don't remember how all the locking works when you have multiple writers, but I don't believe either of those suppos... Greg Farnum
01:11 PM Feature #4734: libcephfs: async interfaces
If/when we do this, whoever does so should please be careful to refactor our synchronous interfaces in terms of the a... Greg Farnum
12:48 PM Feature #4734 (New): libcephfs: async interfaces

Implement async interfaces to libcephfs, at the least for the write and read calls.
This is motivated by the cep...
Sam Lang
12:53 PM Bug #4732: uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
You might want to grab the ceph-fuse binary too so that the core dump is useful. Sam Lang
12:37 PM Bug #4732 (Closed): uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
... Greg Farnum
09:59 AM Bug #4729 (Can't reproduce): mds: stuck in clientreplay
Unfortunately by the time I got in one of the machines had been allocated for another job, and now it looks like the ... Greg Farnum
07:52 AM Bug #4729 (Can't reproduce): mds: stuck in clientreplay
job was... Sage Weil
09:31 AM Bug #4694 (Resolved): client: put_snap_realm assert failure
Looks good to me; I merged it into next. This was an impressively narrow race so we couldn't get a good reproducer go... Greg Farnum

04/15/2013

04:38 PM Documentation #4727 (Resolved): upgrade doc has to be modified to include upgrading ceph-mds as well
Changed package to ceph-mds: http://ceph.com/docs/master/install/upgrading-ceph/#upgrading-a-metadata-server John Wilkins
04:26 PM Documentation #4727 (In Progress): upgrade doc has to be modified to include upgrading ceph-mds a...
John Wilkins
11:42 AM Documentation #4727 (Resolved): upgrade doc has to be modified to include upgrading ceph-mds as well
http://ceph.com/docs/master/install/upgrading-ceph/
In the above mentioned doc, in section "upgrading a metadata s...
Tamilarasi muthamizhan
12:47 PM Bug #4713 (Fix Under Review): mds: hang related to access from two clients
I have tested the commands listed above on a system with the
patches described here:
http://tracker.ceph.com/is...
Alex Elder
11:03 AM Bug #4679: ceph: hang while running blogbench on mira nodes
I ran the blogbench test with all of the above-mentioned
patches applied on a mira cluster and I never saw it hang.
...
Alex Elder
09:35 AM Bug #4679: ceph: hang while running blogbench on mira nodes
FYI, these kernel patches (Zheng's and mine) are available on
the ceph-client git repository branch "review/wip-4706...
Alex Elder
09:27 AM Bug #4679 (Fix Under Review): ceph: hang while running blogbench on mira nodes
> Found 5 bugs, fixed 4.
I reviewed the four kernel patches (they were posted on the mailing
list). I also provi...
Alex Elder
09:15 AM Bug #4679: ceph: hang while running blogbench on mira nodes
> The fix for writepages race is easier than I thought, patch is attached.
This is interesting. When I was workin...
Alex Elder
10:59 AM Bug #4660: mds: segfault in queue_backtrace_update
*blink*
Of course it's not; sorry about that.
Greg Farnum
10:57 AM Bug #4660 (Resolved): mds: segfault in queue_backtrace_update
That isn't the same bug. Opening #4726 for that issue. Sam Lang
10:52 AM Bug #4660 (In Progress): mds: segfault in queue_backtrace_update
ubuntu@teuthology:/a/teuthology-2013-04-13_01:00:48-fs-next-testing-basic/12134 Greg Farnum
10:57 AM Bug #4726 (Can't reproduce): mds: segv during blogbench in remove_pending_backtraces

ubuntu@teuthology:/a/teuthology-2013-04-13_01:00:48-fs-next-testing-basic/12134
2013-04-13T18:52:50.199 INFO:t...
Sam Lang
09:33 AM Bug #4706 (Fix Under Review): kclient: Oops when two clients concurrently write a file
I have posted two patches, one which resolves the
crash due to an interrupt while waiting and one
that resolves Zhe...
Alex Elder
08:46 AM Bug #3579: kclient: Use less secure random number generator so we don't consume entropy
commit 442318d09506d33e811d9d6a7bd2514287df729d
Ian Colle

04/13/2013

09:46 AM Bug #4722 (Can't reproduce): kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
Top of Call trace:... Matthew Roy

04/12/2013

11:07 PM Bug #4721: libcephfs tests fail when using ceph-deploy
I'm able to reproduce this failure.
I'm much less familiar with libceph than I am the libcephfs-java code, so I'm g...
Anonymous
05:42 PM Bug #4721: libcephfs tests fail when using ceph-deploy
and the logs are placed in burnupi06.front.sepia.ceph.com:/home/ubuntu/apr12_cdep_libcephfs/ Tamilarasi muthamizhan
05:41 PM Bug #4721 (Resolved): libcephfs tests fail when using ceph-deploy
ceph version : 0.60-467-g6b98162-1precise
config.yaml used to reproduce
tamil@ubuntu:~/test_logs_cuttlefish/apr...
Tamilarasi muthamizhan
08:36 PM Bug #3637: client: not issuing caps for with clients doing shared writes
If Locker::_do_cap_update can't get wrlock for a given client, the client should have no Fw cap. I think we can make ... Zheng Yan
04:47 PM Bug #3637: client: not issuing caps for with clients doing shared writes
I'm having difficulty reproducing this at all on current next, but am leaving it churning in the background... :/
...
Greg Farnum
01:36 PM Feature #3242 (In Progress): samba: push plugin upstream
Sam has been working on this for the last couple days. Greg Farnum
11:06 AM Bug #3579 (Resolved): kclient: Use less secure random number generator so we don't consume entropy
Sam Lang
10:13 AM Bug #4660 (Resolved): mds: segfault in queue_backtrace_update
The commit that hit this segv above looks like it was off of master, whereas the fix went into next. I was able to r... Sam Lang
09:30 AM Bug #4694 (Fix Under Review): client: put_snap_realm assert failure
Pushed wip-4694. Still trying to reproduce this reliably so that I can test the proposed fix. Sam Lang
09:26 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Zheng Yan wrote:
> The Oops is caused by uninitialized req->r_inode
Already tracked down the Oops. time to sleep,...
Zheng Yan
09:07 AM Bug #4706: kclient: Oops when two clients concurrently write a file
FYI I just reproduced the problem without interrupt
and it matches what I saw before. (So I don't believe
the inte...
Alex Elder
07:39 AM Bug #4706: kclient: Oops when two clients concurrently write a file
I also proposed a fix: [PATCH 1/4] ceph: add osd request to inode unsafe list in advance Zheng Yan
07:22 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Zheng I think I have a fix. I'm going to test it first,
but then I'd like to supply it to you to see if it resolves...
Alex Elder
05:23 AM Bug #4706 (New): kclient: Oops when two clients concurrently write a file
> Found a potential cause. the request may complete before adding it
> to the unsafe list.
I think that not being...
Alex Elder
12:09 AM Bug #4706: kclient: Oops when two clients concurrently write a file
The Oops is caused by uninitialized req->r_inode Zheng Yan
07:35 AM Bug #4679: ceph: hang while running blogbench on mira nodes
The fix for writepages race is easier than I thought, patch is attached. Zheng Yan
01:08 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Found 5 bugs, fixed 4. The remaining one is a race between truncate and writepages. Truncate message from MDS can cha... Zheng Yan

04/11/2013

08:26 PM Bug #4714 (Duplicate): kclient: ceph_sync_{read,write} only accept single buffer.
So readv and writev are broken for SYNC IO Zheng Yan
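
As a reminder of what vectored I/O is supposed to do, here is a small sketch using Python's `os.writev` and `os.readv` against a local temporary file (not a CephFS mount, and not the kernel client's sync path). One call gathers several buffers into the file and one call scatters the file back into several buffers; the file name and buffer contents are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectored.bin")

    # Gather write: three separate buffers land in the file back to back.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        written = os.writev(fd, [b"hello ", b"vectored ", b"io"])
    finally:
        os.close(fd)

    # Scatter read: the file contents fill the buffers in order.
    fd = os.open(path, os.O_RDONLY)
    try:
        head, tail = bytearray(6), bytearray(11)
        read = os.readv(fd, [head, tail])
    finally:
        os.close(fd)
```

An implementation that only honors the first buffer would write 6 bytes here and leave `tail` untouched on the read side.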
07:28 PM Bug #4713: mds: hang related to access from two clients
I discovered this while trying to reproduce the issue
in http://tracker.ceph.com/issues/4706.
I documented it the...
Alex Elder
07:24 PM Bug #4713 (Resolved): mds: hang related to access from two clients
Alex Elder
06:31 PM Bug #4706: kclient: Oops when two clients concurrently write a file
This crash looks a little bit familiar to me, and I think
I created a bug for it, but at the moment I can't find it....
Alex Elder
05:52 PM Bug #4706: kclient: Oops when two clients concurrently write a file
OK, well I believe I have reproduced the problem.
I did this on two nodes simultaneously:
dd if=/dev/zero of=...
Alex Elder
09:23 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Yes, test branch of ceph-client. The hint to trigger the Oops is multiple clients write data to a file at the same ti... Zheng Yan
08:52 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Well, I unfortunately got the same problem using
the "bobtail" branch.
Specifically what I'm doing:...
Alex Elder
08:15 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Well that's interesting.
I haven't been working with the ceph file system much so
I'm not sure what to expect. B...
Alex Elder
07:43 AM Bug #4706: kclient: Oops when two clients concurrently write a file
> the request may complete before adding it to the unsafe list.
That looks like a reasonable explanation to me. A...
Alex Elder
06:28 AM Bug #4706: kclient: Oops when two clients concurrently write a file
... Zheng Yan
05:56 AM Bug #4706: kclient: Oops when two clients concurrently write a file
It is a new issue in the sync write path, nothing to do with cap revoke. Alex has made quite a lot of changes in that... Zheng Yan
05:01 AM Bug #4706: kclient: Oops when two clients concurrently write a file
Them doing a sync write is probably correct as their concurrency is being managed by the MDS now, and they aren't goi... Greg Farnum
06:06 PM Bug #3637 (In Progress): client: not issuing caps for with clients doing shared writes
Since I apparently forgot to mention it here, this has nothing to do with #4489; I just pattern-matched a little too ... Greg Farnum
09:09 AM Bug #4644 (Resolved): mds crashing after upgrade from 0.58 to 0.60
Merged into next as of commit:d777b8e66b2e950266e52589c129b00f77b8afc0 (Thanks Sam!). Greg Farnum
02:25 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
so patch tested, mds is running fine now. thx ! norbert schmidt
02:18 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
The last patch seems to work. At least the mds doesn't crash anymore. Also, df reports non-bogus values.
I'll add this patch to gen...
Alexey Shvetsov
12:14 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
let me know if i can test patches for you ! :) norbert schmidt
09:06 AM Bug #4451 (Resolved): client: Ceph client not releasing cap
Merged into next via commit:e32849c4eef2f5d911288aabeac0a6967b1e6ae4
I'm electing not to backport this despite its...
Greg Farnum
08:16 AM Fix #4708 (Rejected): MDS: journaler pre-zeroing is dangerous
See http://pastebin.com/NJd0UCfF
At first glance it looks like there's a short and a missing log object, and then ...
Greg Farnum
08:15 AM Bug #4105: mds: fix up the Dumper
Promoting this to high as it can be so useful for gathering important debug data; it would be nice to have done befor... Greg Farnum

04/10/2013

11:52 PM Bug #4706 (Resolved): kclient: Oops when two clients concurrently write a file
... Zheng Yan
08:31 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
The code looks good. Zheng Yan
01:10 PM Bug #4644 (Fix Under Review): mds crashing after upgrade from 0.58 to 0.60
Hurray, I did manage to reproduce so I guess I just missed before, and indeed it works with that patch and fails with... Greg Farnum
12:38 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
I'm having trouble reproducing this bug, but I'm probably not going through the right steps. A patch that I think sho... Greg Farnum
12:20 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
if you have some patch that we can test, i'd be glad =) Alexey Shvetsov
10:27 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
Ah, this looks to be less bad than I thought — the (struct_v == 2) check should be (struct_v <= 2) is all, from the s... Greg Farnum
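
To illustrate why an equality test on an encoding version is fragile, here is a small sketch of a versioned decoder. The field layout, the function names, and the meaning assigned to each version are hypothetical and do not reflect Ceph's actual session_info_t encoding; only the `== 2` versus `<= 2` distinction comes from the comment above.

```python
import struct


def encode_session(struct_v, value):
    """Hypothetical layout: version byte, a pad byte for versions <= 2, then a u32."""
    pad = b"\x00" if struct_v <= 2 else b""
    return bytes([struct_v]) + pad + struct.pack("<I", value)


def decode_session(buf, is_legacy):
    """Decode (struct_v, value); is_legacy decides whether to skip the pad byte."""
    struct_v = buf[0]
    offset = 1
    if is_legacy(struct_v):
        offset += 1  # skip the legacy pad byte
    (value,) = struct.unpack_from("<I", buf, offset)
    return struct_v, value


def buggy_check(v):
    return v == 2  # misses version 1 encodings


def fixed_check(v):
    return v <= 2  # covers every legacy version


v1 = encode_session(1, 42)
buggy_v1 = decode_session(v1, buggy_check)  # reads the u32 from the wrong offset
fixed_v1 = decode_session(v1, fixed_check)
fixed_v3 = decode_session(encode_session(3, 42), fixed_check)
```

With the equality check, data written by the oldest version is decoded one byte off and yields a wrong value, while the `<=` form handles all legacy versions and still leaves newer encodings alone.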
09:03 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
update directly from IRC, as alexxy is still having registration issues:
<alexxy> joao: upgrade was from version 0...
Joao Eduardo Luis
09:11 AM Bug #3579 (Fix Under Review): kclient: Use less secure random number generator so we don't consum...
Patches sent to the mailing list and pushed to wip-3579. Sam Lang
09:07 AM Bug #4569: ceph-mds: segfault
It looks like this fix didn't make it into 0.60. See #4696. Sam Lang
09:06 AM Bug #4696: MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
Oh you're using 0.60. Looks like that commit didn't make it into the 0.60 release. It will be fixed in the next one! Sam Lang
09:04 AM Bug #4696 (Duplicate): MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
This is a duplicate of #4569. It's fixed in 0.60 if you're willing to upgrade. Sam Lang
06:37 AM Bug #4696 (Duplicate): MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
Limited logs at http://goo.gl/VAIFh... Matthew Roy
05:23 AM Bug #4679 (In Progress): ceph: hang while running blogbench on mira nodes
I reproduced a hang, it is an 'i_mutex + cap revoking' deadlock.... Zheng Yan
12:58 AM Bug #1878: ceph.ko doesn't setattr (lchown, utimes) on symlinks
For xattrs, there is no difference between symbolic links and regular files. For setattr, I think the only difference is... Zheng Yan

04/09/2013

07:49 PM Bug #4451: client: Ceph client not releasing cap
Please review again based on the latest changes pushed to wip-4451. Sam Lang
04:27 PM Bug #4451: client: Ceph client not releasing cap
Does this need more review or just testing? (I ask because I notice you've got two reviewed-by tags on it, although I... Greg Farnum
08:48 AM Bug #4451: client: Ceph client not releasing cap
Thanks Yan for fixing up that patch and testing it out. The inode check was just cruft from the previous changes, an... Sam Lang
06:00 AM Bug #4451: client: Ceph client not releasing cap
After removing the path_is_mine check, MDCache::parallel_fetch_traverse_dir() needs to skip non-auth dirfrags. The modif... Zheng Yan
06:34 PM Bug #4644 (In Progress): mds crashing after upgrade from 0.58 to 0.60
That shouldn't be a problem for v0.58; it included version 2 session_info_t. You sure that's the version you upgraded... Greg Farnum
06:18 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
The 26th byte of Norbert's sessionmap is 1. If I'm not wrong, it's struct_v for session_info_t. But the oldest versio... Zheng Yan
10:58 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
alexxy's sessionmap doesn't look anything like a sessionmap should; this won't fix his issue. Norbert's is at least s... Greg Farnum
06:20 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
alexxy on IRC is reporting that the patch doesn't work. He would have provided his report himself, but it appears th... Joao Eduardo Luis
04:13 PM Bug #4618 (Resolved): Journaler: _is_readable() and _prefetch() don't communicate correctly
Merged into next in commit:8eb5465c10840d047a894d1a4f079ff8b8d608b5. This would apply to bobtail as well if we decide... Greg Farnum
03:12 PM Bug #4679: ceph: hang while running blogbench on mira nodes
Not off-hand, but I haven't spent any time thinking about it yet. This one could be differences between how aggressiv... Greg Farnum
03:03 PM Bug #4679: ceph: hang while running blogbench on mira nodes
We've only seen a certain set of errors at the mds with the kernel client (this one and #4660 - although they may be ... Sam Lang
02:57 PM Bug #4679: ceph: hang while running blogbench on mira nodes
*sigh* Yep...
I've marked this as an MDS issue for now, but it could be a broader protocol change or something as ...
Greg Farnum
02:45 PM Bug #4679 (Rejected): ceph: hang while running blogbench on mira nodes
I re-ran the blogbench test 10 times using the "bobtail"
branch of ceph and never saw a hang.
I'm going to call t...
Alex Elder
12:13 PM Bug #4679: ceph: hang while running blogbench on mira nodes
I got another hang without any debug info being dumped
from the MDS. This time I just abandoned it. I'm about
to ...
Alex Elder
02:50 PM Bug #4694 (Resolved): client: put_snap_realm assert failure
... Greg Farnum
11:04 AM Bug #1878: ceph.ko doesn't setattr (lchown, utimes) on symlinks
I'm actually not sure how the symlink stuff is represented in our kernel client or the VFS — do these functions handl... Greg Farnum
08:31 AM Bug #4660 (In Progress): mds: segfault in queue_backtrace_update
Sam Lang
08:30 AM Bug #4660: mds: segfault in queue_backtrace_update
Alex hit the same segfault with the next branch yesterday, so it looks like commit 3cdc61ec doesn't fix this bug. The ... Sam Lang

04/08/2013

08:32 PM Bug #4680 (Closed): mds: log possibly not trimming
2013-03-28 10:27:35.154461 7f1fc96b8700 10 mds.0.log trim 2 / 30 segments, 10 / -1 events, 0 (0) expiring, 0 (0) expi... Zheng Yan
10:32 AM Bug #4680: mds: log possibly not trimming
Yeah, it's not a generic never trimming; just not certain about this one. It could also be fine and just that there's... Greg Farnum
10:27 AM Bug #4680: mds: log possibly not trimming
I've seen it trim logs in the tests I've been running, but that's with mds_log_segment_size=16K and mds_log_max_segme... Sam Lang
10:04 AM Bug #4680 (Closed): mds: log possibly not trimming
Apparently there are a lot of old files showing up in the log replay, and I noticed previously on a different issue t... Greg Farnum
08:20 PM Bug #4644 (Fix Under Review): mds crashing after upgrade from 0.58 to 0.60
There is a typo in session_info_t::decode. Zheng Yan
08:04 PM Bug #4451: client: Ceph client not releasing cap
Greg Farnum wrote:
> Although I think the MDS would need to have the inode in cache for that to happen — it would ha...
Zheng Yan
10:59 AM Bug #4451: client: Ceph client not releasing cap
Zheng Yan wrote:
> "Regarding the cap export, is it possible that the client has a cap that it thinks belongs to the...
Greg Farnum
09:43 AM Bug #4451: client: Ceph client not releasing cap
"Regarding the cap export, is it possible that the client has a cap that it thinks belongs to the mds, but the mds do... Zheng Yan
09:13 AM Bug #4451: client: Ceph client not releasing cap
"After removing the path_is_mine check in Server::handle_client_reconnect(), I think we should also call mdcache->rej... Sam Lang
04:41 PM Bug #4685 (Can't reproduce): BUG: unable to handle kernel NULL pointer dereference at
0.56.4 ceph, 3.8 kernel... Andras Elso
02:22 PM Bug #4679: ceph: hang while running blogbench on mira nodes
It looked very promising. 4 successful passes, but the
last one hung again. This time there were two blogbench
ta...
Alex Elder
12:26 PM Bug #4679: ceph: hang while running blogbench on mira nodes
One pass succeeded, so it's looking good.
I'll let it run 5 times and if all are successful, I'll just
close this...
Alex Elder
11:56 AM Bug #4679: ceph: hang while running blogbench on mira nodes
I talked with Sam Lang who said I should try again with
mds debugging on. That led to more info getting dumped
on ...
Alex Elder
11:01 AM Bug #4679: ceph: hang while running blogbench on mira nodes
... Alex Elder
10:49 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Actually, the other common theme (maybe more important)
is the involvement of an in-progress ceph_setattr() call.
...
Alex Elder
10:40 AM Bug #4679 (In Progress): ceph: hang while running blogbench on mira nodes
Unfortunately it looks like I've reproduced the problem
with my patches. The common theme is ceph_aio_write(), so
...
Alex Elder
10:04 AM Bug #4679: ceph: hang while running blogbench on mira nodes
I ran those tests a few times with the testing branch and
the problem did not show up. I reduced the test to just
...
Alex Elder
05:49 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Here is an excerpt of the yaml file driving the
tests, leading up to the blogbench run:...
Alex Elder
05:29 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Here are the versions of ceph and teuthology I'm using
while running these tests:
ceph
f5ba0fb mon: make 'osd cr...
Alex Elder
05:26 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Here is a log of the commits in place during these
tests. (I know, quite a few...) The last one is
the current te...
Alex Elder
05:24 AM Bug #4679: ceph: hang while running blogbench on mira nodes
Here is an excerpt of the stack trace generated using:
echo t > /proc/sysrq-trigger
[31482.585095] blogbench....
Alex Elder
05:21 AM Bug #4679 (Resolved): ceph: hang while running blogbench on mira nodes
I have seen this only on mira nodes, now twice on two
consecutive attempts. I've run the same set of tests
with th...
Alex Elder
11:02 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Said he could look at this for me today. Greg Farnum
09:29 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
Heh, no; that was supposed to be a 10. Re-pushed; thanks! Greg Farnum
09:34 AM Bug #3579 (In Progress): kclient: Use less secure random number generator so we don't consume ent...
Sam Lang
07:16 AM Bug #4660 (Fix Under Review): mds: segfault in queue_backtrace_update
Pushed a fix to wip-4660. The mdr was getting deleted before we queued the backtrace for update, so mdr->ls was inva... Sam Lang

04/07/2013

01:46 AM Bug #1878 (Fix Under Review): ceph.ko doesn't setattr (lchown, utimes) on symlinks
ceph_symlink_iops does not have getattr/setattr and xattr-related methods. Zheng Yan
01:25 AM Bug #4241 (Duplicate): SELinux fails because it can't set xattrs
This is the same problem as #1878 (ceph_symlink_iops doesn't have setattr method) Zheng Yan
 
