Activity
From 04/07/2013 to 05/06/2013
05/06/2013
- 02:57 PM Bug #4920 (Resolved): client: does not respect O_NOFOLLOW
- It looks like doing an open() always implicitly follows symlinks, because we call path_walk() with followsym set to t...
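For context, a minimal sketch (in Python, on a local Linux filesystem) of the POSIX behavior the ticket says the client violates: open() with O_NOFOLLOW must refuse to follow a trailing symlink and fail with ELOOP.

```python
import errno
import os
import tempfile

# Minimal illustration of the O_NOFOLLOW semantics at issue: opening a
# symlink with O_NOFOLLOW must fail with ELOOP instead of following it.
d = tempfile.mkdtemp()
target = os.path.join(d, "target")
link = os.path.join(d, "link")
with open(target, "w") as f:
    f.write("data")
os.symlink(target, link)

try:
    fd = os.open(link, os.O_RDONLY | os.O_NOFOLLOW)
    os.close(fd)
    err = None
except OSError as e:
    err = e.errno

# On a compliant local filesystem the open is refused with ELOOP.
print(err == errno.ELOOP)
```

This is exactly the check a symlink-following path_walk() would miss.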
05/05/2013
- 06:36 AM Bug #4909: mds: stalled/stuck directory (standby)
- The directory became accessible only after rebooting one of the nodes (the one with the stalled mounts), not after merely restarting the ceph daemons.
05/04/2013
- 03:56 AM Bug #4909: mds: stalled/stuck directory (standby)
- Sorry, comment 1 is about ctdbd (IMHO); please disregard it. Only the main issue remains.
- 03:51 AM Bug #4909: mds: stalled/stuck directory (standby)
- Also (without debug 10), the log is now flooding on the other node (mds.4):
2013-05-04 13:47:27.648019 7fe8c59ca700 0 mds.0.serv...
05/03/2013
- 07:18 PM Bug #4909 (Can't reproduce): mds: stalled/stuck directory (standby)
- I broke off actions many times (a debug mysql replication script, just multiple dump redirections) directly in the directory, m...
- 02:53 PM Bug #4894: mds: standby shut itself down due to not having any data
- MDS::boot_create() first starts a new log segment (its ESubtreeMap is empty), then uses MDCache::create_empty_hierarch...
- 10:40 AM Bug #4894: mds: standby shut itself down due to not having any data
- You must be racing ahead of me here, Yan — what's your theory? Just that the first active MDS failed to write any log...
- 12:24 PM Feature #4906 (Resolved): ceph-fuse: use the Preforker class
- Sage wrote a Preforker class for the Monitor. We should switch to using that instead of our own band-aided daemonizat...
05/02/2013
- 07:29 PM Bug #4894: mds: standby shut itself down due to not having any data
- I think MDS::boot_create() should start a new log segment after creating the fs hierarchy.
- 10:56 AM Bug #4894 (Resolved): mds: standby shut itself down due to not having any data
- ...
- 03:05 PM Feature #4326 (Fix Under Review): qa: add samba + (kclient|ceph-fuse) to suite
- These changes were part of the samba.py task changes in wip-samba-tasks. An example of use is in ceph-qa-suite:suite...
- 08:28 AM Bug #4832: mds: failed auth_unpin assert
- hit this again:...
05/01/2013
- 03:06 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- This is happening with the argonaut ceph-fuse daemon, not a cuttlefish one. Going to turn this down to High again and...
- 02:19 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- steps to reproduce:
bring up a cluster of 2 nodes running argonaut, run blogbench workload on it from client.
upg...
- 12:46 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- Are you still running old clients when you hit this?
- 12:04 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- hitting this pretty consistently when upgrading mds from argonaut to cuttlefish.
have this reproduced on burnupi39...
- 02:27 PM Bug #4105 (Resolved): mds: fix up the Dumper
- Merged into next in commit:dfacd1bd805ebb730b5206c9830b28f47cc7f9cf. Hurray!
- 02:20 PM Bug #4105 (Fix Under Review): mds: fix up the Dumper
- wip-4105-mds-dumper
Wasn't actually that complicated; it's just the locking expectations around the Objecter chang...
- 02:23 PM Feature #4886 (Resolved): teuthology: add tests that use the MDS dumper
- We want to prevent the Dumper from bitrotting like it has been. Figure out a simple and effective way to test the dum...
- 02:20 PM Feature #4885 (Resolved): dumper: do an incremental log dump
- Right now we read it all into memory and then dump it out into a file. So far that's been okay, but we probably want ...
- 11:28 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- Hmm, I thought we handled renames properly since they involve changing the caps state. But maybe we don't propagate t...
- 11:13 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-05-01_01:00:37-fs-next-testing-basic/4534
- 04:41 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- I think this is a general issue. When handling MClientReconnect, if an inode is not in the cache, the MDS tries fetch...
04/30/2013
- 01:26 PM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- The attached files include the complete client log, along with the mds logs that include 10000000004 (one of the indo...
- 09:39 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- We can't revoke on unlink because the file might still be held open with something accessing it. :)
- 09:38 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- This looks like the client creates a file, then unlinks it, but it never removes it from its cache, because it still ...
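A quick sketch of the POSIX rule behind the two comments above: an unlinked file remains fully usable through any already-open file descriptor, so caps cannot simply be revoked at unlink time.

```python
import os
import tempfile

# An unlinked file stays readable/writable through an open fd; only the
# directory entry disappears. The inode lives until the last fd closes.
d = tempfile.mkdtemp()
path = os.path.join(d, "scratch")
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
os.write(fd, b"still here")
os.unlink(path)                 # the name is gone...
assert not os.path.exists(path)
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 16)          # ...but the open fd still works
os.close(fd)
print(data)
```

The client therefore has to keep such an inode in its cache until the last reference goes away, which is what makes shutdown-time cleanup tricky here.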
- 05:41 AM Bug #4850 (In Progress): ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
04/29/2013
- 04:58 PM Bug #4853 (Resolved): ceph-fuse hang on mount getattr
- commit:ee553ac279664b7f1b527a0b1b56768134cf5157
- 12:43 PM Bug #4853: ceph-fuse hang on mount getattr
- This is not a new race; it is only triggered when an mds session open and a request race with an mds restart. Not a cu...
- 10:47 AM Bug #4853 (Fix Under Review): ceph-fuse hang on mount getattr
- fix in wip-up
here is the client-side log that shows we send the getattr twice. we only process the first reply, ...
- 09:21 AM Bug #4853: ceph-fuse hang on mount getattr
- Ignore that, wrong bug — sorry.
- 09:20 AM Bug #4853: ceph-fuse hang on mount getattr
- /a/teuthology-2013-04-28_21:32:40-fs-next-testing-basic/2662
That's an fsstress run that got hung, I copied the cl...
- 09:02 AM Bug #4853 (In Progress): ceph-fuse hang on mount getattr
- 08:38 AM Bug #4853 (Resolved): ceph-fuse hang on mount getattr
- 100% reproducible with this job file...
- 02:26 PM Bug #4861 (Rejected): Alter Java components to build against Java 1.6 (or 1.7)
- The Java packages use -source 1.5 to specify that they should use that version of the API. This is being done for com...
- 09:21 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- /a/teuthology-2013-04-28_21:32:40-fs-next-testing-basic/2662
That's an fsstress run that got hung, I copied the cl...
04/28/2013
- 08:51 AM Bug #4850: ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- Have the full log; put a copy in the run dir.
- 08:50 AM Bug #4850 (Resolved): ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
- ...
04/26/2013
- 11:19 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- /a/teuthology-2013-04-26_02:29:14-fs-next-testing-basic/1450
- 11:17 AM Bug #4832 (Resolved): mds: failed auth_unpin assert
- ...
- 10:44 AM Bug #4829 (Closed): client: handling part of MClientForward incorrectly?
- (In reference to a backwards check for is_replay when doing encode_cap_releases())...
- 09:52 AM Bug #4742 (Resolved): mds: stuck clientreplay request
- commit:5121e56c255c079569f02e0ee852e469f38f470e
04/25/2013
- 06:34 PM Bug #4742: mds: stuck clientreplay request
- Yeah, we've discussed this some on github around wip-4742 and on irc. :)
- 06:31 PM Bug #4742: mds: stuck clientreplay request
- Looks like a client bug, it may add cap releases to the replay requests. (encode_cap_releases() should be called when...
- 10:38 AM Bug #4742: mds: stuck clientreplay request
- Logs for two runs, one is stuck in replay from a setattr, the other is stuck in replay from a rename.
04/23/2013
- 03:53 PM Feature #4799 (Resolved): Client Security for CephFS
- As discussed on the #ceph IRC channel with gregaf and others, I would find some added level of client security in Cep...
- 01:34 PM Bug #4721 (Resolved): libcephfs tests fail when using ceph-deploy
- strange that it works fine on the latest next branch [0.60-624-g426e3be-1precise] ...
- 10:29 AM Bug #4742: mds: stuck clientreplay request
- Attaching mds log from mds stuck on clientreplay. Looks like the setattr gets put on the inode waiting list by the lo...
04/21/2013
- 06:12 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
- Additionally: I worked around it at runtime by changing assert(0) to some lock (IMHO the first one in this case) on one node, and found for...
04/19/2013
- 10:17 AM Bug #4105: mds: fix up the Dumper
- This has annoyed me a couple more times and I think it's now at the top of the queue, so here we go again.
- 10:08 AM Bug #4746: client: invalidate callback can deadlock
- pushed wip-fuse to ceph-client.git
- 09:42 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
- You mean file_eval should just short-circuit if it's scanning? That seems like the most sensible place for it, but I'...
- 09:31 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
- Yeah, that transition doesn't make sense. I think it should do nothing in the scan state...
- 09:05 AM Bug #4753: mds/Locker.cc: 4167: FAILED assert(0)
- file_eval is trying to move ifile from "scan" to "mixed" in order to serve up the client caps, and scatter_mix doesn'...
- 02:13 AM Bug #4601: symlink with size zero
- I was looking at the <inode>.<frag>_head* file in the osd that held the directory where the link was stored. As it t...
04/18/2013
- 05:22 PM Bug #4753 (Resolved): mds/Locker.cc: 4167: FAILED assert(0)
- Every mds crashed after some startup checks: "mds/Locker.cc: 4167: FAILED assert(0)":
mds/Locker.cc: 4167: FAILED ...
- 05:12 PM Bug #4746: client: invalidate callback can deadlock
- The suggestion from Maxim is to modify fuse to serialize reads and invalidate via a mutex. That ought to do the tric...
- 09:37 AM Bug #4746: client: invalidate callback can deadlock
- It's not any of our internal locking that is getting stuck; it's the VFS inode mutexes in combination with us. If I ...
- 07:31 AM Bug #4746: client: invalidate callback can deadlock
- The invalidate is queued in a separate thread, and when we call the invalidate, we don't have the client lock held. ...
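A toy sketch of the serialization idea discussed in this ticket (the names here are illustrative, not the real fuse or ceph-fuse API): route both page reads and cache invalidations through one mutex, so neither can end up blocked mid-operation on a lock the other holds.

```python
import threading

# Toy model of serializing reads and invalidations through a single
# mutex; function and variable names are hypothetical, not the actual
# fuse API.
serialize = threading.Lock()
page_cache = {1: "old contents"}
completed = []

def read_page(ino):
    with serialize:                 # never overlaps an invalidate
        completed.append(("read", page_cache.get(ino)))

def invalidate_page(ino):
    with serialize:                 # never overlaps a read
        page_cache.pop(ino, None)
        completed.append(("invalidate", ino))

threads = [threading.Thread(target=read_page, args=(1,)),
           threading.Thread(target=invalidate_page, args=(1,))]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both operations finish; neither can deadlock against the other.
print(len(completed))
```

With a single lock there is only one lock-acquisition order, which is what removes the deadlock possibility.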
- 05:06 PM Bug #4601: symlink with size zero
- >I looked a bit in the ceph-osd file holding the directory that contains the symlink, and I can see ^Q in the yes_hea...
- 04:57 PM Bug #1945 (Can't reproduce): blogbench hang on caps
- We haven't seen this in a long time (at least, that's marked here), and there's been a ton of work here over the last...
- 04:39 PM Bug #4732: uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
- This was in the async invalidate thread, so I'm turning this down. It should probably be investigated alongside/after...
- 04:34 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- Okay, pushed the update for more debugging, and am downgrading this to "High" since it only appears under so many fai...
- 04:17 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- Also, both of these are the same job as the first incident was: fsstress workunit on ceph-fuse, messenger failure inj...
- 04:15 PM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- Those machines are cleared out again, of course (d'oh!). Next time we see this we need to gather up everything we can...
- 04:03 PM Bug #4741: MDS: stuck in clientreplay
- Interesting; on #4742 it was clearly waiting on a request because it kept saying "still have 1 active replay requests...
- 03:57 PM Bug #4741 (Duplicate): MDS: stuck in clientreplay
- This is a duplicate of #4742. It looks like setattr is the culprit. I was able to generate a core file of the mds w...
- 11:13 AM Bug #4741: MDS: stuck in clientreplay
- Also /a/teuthology-2013-04-18_01:01:07-fs-next-testing-basic/15101
- 03:58 PM Bug #4721 (Need More Info): libcephfs tests fail when using ceph-deploy
- (Trying to track the responsibility flow more clearly.)
- 03:19 PM Bug #4721: libcephfs tests fail when using ceph-deploy
- Have you reproduced this, Tamil? Since all the tests are failing I'm pretty sure this is some kind of authentication ...
- 03:57 PM Bug #4742 (In Progress): mds: stuck clientreplay request
- 03:57 PM Bug #4742: mds: stuck clientreplay request
- Marked #4741 as a duplicate of this bug. It looks like setattr is the culprit. I was able to generate a core file o...
- 01:57 PM Bug #4722: kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
- I did a checkout of v3.5, and caps.c:1006 is...
- 01:37 PM Bug #4738: libceph: unlink vs. readdir (and other dir orders)
- I don't believe locking is implemented yet via the Samba VFS bindings, since we don't have a userspace implementation...
- 01:27 PM Bug #4738: libceph: unlink vs. readdir (and other dir orders)
- On top only:
vfs objects = scannedonly ceph
And if I switch to:
vfs objects = scannedonly
or:
vfs objects = c...
- 11:03 AM Bug #3637 (Resolved): client: not issuing caps for clients doing shared writes
- Merged into next in commit:efbe2e8b55ba735673a3fdb925a6304915f333d8
04/17/2013
- 07:42 PM Bug #4713 (Resolved): mds: hang related to access from two clients
- The following have been committed to the "testing" branch
of the ceph-client git repository. With them in place
I ...
- 07:39 PM Bug #4706 (Resolved): kclient: Oops when two clients concurrently write a file
- The following have been committed to the ceph-client
"testing" branch:
8f68229 libceph: change how "safe" callbac...
- 07:38 PM Bug #4679 (Resolved): ceph: hang while running blogbench on mira nodes
- Sorry Greg, I should have been in better communication
with you. I have been testing these all afternoon and
Sage ...
- 03:48 PM Bug #4679: ceph: hang while running blogbench on mira nodes
- I believe Sage has been over all these now. I'm trying to go over the newest versions off the mailing list as well, n...
- 07:20 PM Bug #4726 (Can't reproduce): mds: segv during blogbench in remove_pending_backtraces
- I wasn't able to reproduce this after more than 200 runs, so I'm marking it as Can't reproduce for now.
- 05:37 PM Bug #3597 (Resolved): ceph-fuse: denying root access
- Oh, this was a bug that got fixed in commit:d87035c0c4ff, included in v0.60.
- 05:05 PM Bug #4746: client: invalidate callback can deadlock
- Hmm, you're right, this is a more fundamental problem.
- 04:50 PM Bug #4746: client: invalidate callback can deadlock
- Maybe; we didn't think this through much beyond going "yep, that's broken".
However, I think we can queue up the i...
- 04:44 PM Bug #4746: client: invalidate callback can deadlock
- "We may need to introduce a second locking layer to deal with this, that covers draining out all VFS requests before ...
- 03:04 PM Bug #4746 (Resolved): client: invalidate callback can deadlock
- I saw this when testing the fix for #3637. We appear to be (correctly) safe against deadlocks on our own locks, but w...
- 04:12 PM Feature #4326: qa: add samba + (kclient|ceph-fuse) to suite
- I think you might have mentioned you were trying to do this while you were working on the samba vfs-based ones? If no...
- 04:09 PM Bug #1878 (Resolved): ceph.ko doesn't setattr (lchown, utimes) on symlinks
- I've pushed this to our testing branch. It's presently commit:baf0169b77f6a0c384a15fb425e5700fb0239e89, although that...
- 03:59 PM Bug #3637: client: not issuing caps for clients doing shared writes
- And he gave me a reviewed-by tag. Will merge this tomorrow morning after some more testing.
- 03:53 PM Bug #3637: client: not issuing caps for clients doing shared writes
- This now appears to be passing (I've got it continuing to loop in the background), but it needs review and merging. S...
- 03:05 PM Bug #3637: client: not issuing caps for clients doing shared writes
- That latest issue was #4746. Turning off the callback and testing again...
- 05:42 AM Bug #3637: client: not issuing caps for clients doing shared writes
- Zheng Yan wrote:
> there are only 4 states that allow Fw caps, they are MIX, MIX_EXCL, EXCL and EXCL_MIX. they all a...
- 05:39 AM Bug #3637: client: not issuing caps for clients doing shared writes
- Greg Farnum wrote:
> I don't remember how all the locking works when you have multiple writers, but I don't believe ...
- 10:17 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- And also /a/teuthology-2013-04-16_01:00:52-fs-next-testing-basic/13665
- 09:26 AM Bug #4565: MDS/client: issue decoding MClientReconnect on MDS
- This just happened again at /a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14248 (it's still running, for ...
- 10:12 AM Bug #4742: mds: stuck clientreplay request
- Looks like a setattr and a create:
ubuntu@plana72:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-client.0.19374.as...
- 09:36 AM Bug #4742 (Resolved): mds: stuck clientreplay request
- /a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14246
It has a single request which isn't completing; wh...
- 10:06 AM Cleanup #4744 (In Progress): mds: pass around LogSegments via std::shared_ptr
- These really ought to be ref-counted in some way to prevent early expiry.
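As a loose analogue of the std::shared_ptr proposal (Python's reference counting standing in for shared_ptr, and LogSegment here being a hypothetical stand-in rather than the real class): a segment should stay alive as long as any holder still references it, and expire only when the last reference is dropped.

```python
import weakref

# Hypothetical stand-in for the real LogSegment class; the point is
# only the shared-ownership lifetime rule.
class LogSegment:
    pass

seg = LogSegment()
holder = seg                 # a second strong reference, e.g. a pending op
probe = weakref.ref(seg)

del seg                      # the original owner drops its reference...
alive_while_held = probe() is not None   # ...the holder keeps it alive

del holder                   # last reference gone: the segment expires
expired_after_release = probe() is None

print(alive_while_held, expired_after_release)
```

In C++ terms, every path that still needs the segment would hold a std::shared_ptr<LogSegment>, so expiry can never happen out from under an in-flight operation.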
- 09:34 AM Bug #4741 (Duplicate): MDS: stuck in clientreplay
- /a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14249
I can't find any hints, except that it is in fact ...
- 09:00 AM Feature #3243 (In Progress): qa: test samba reexport via libcephfs vfs plugin in teuthology
- 08:58 AM Feature #3242 (Resolved): samba: push plugin upstream
- Posted patches to mailing list:
https://lists.samba.org/archive/samba-technical/2013-April/091651.html
- 08:01 AM Bug #4738 (Need More Info): libceph: unlink vs. readdir (and other dir orders)
- Denis,
I've seen similar behavior with the smbtorture dir1 test, but it happens without the vfs_ceph module. Does...
- 04:54 AM Bug #4738 (Closed): libceph: unlink vs. readdir (and other dir orders)
- Combining (stacking) in samba vfs_scannedonly with vfs_ceph, I experienced some bugs, looks like libceph readdir prob...
04/16/2013
- 06:41 PM Bug #3637: client: not issuing caps for with clients doing shared writes
- Greg Farnum wrote:
> I don't remember how all the locking works when you have multiple writers, but I don't believe ...
- 03:43 PM Bug #3637: client: not issuing caps for clients doing shared writes
- Okay, it's not quite that simple. This (all following the data writeout; I think this is the data check — anyway, thi...
- 02:58 PM Bug #3637: client: not issuing caps for clients doing shared writes
- Reproduced at last. There continues to be a problem with the fix branch too :( but it's not a max_size issue; one of ...
- 01:47 PM Bug #3637: client: not issuing caps for clients doing shared writes
- And that wasn't working because teuthology was creating working dirs like /tmp/cephtest/gregf@kai-2013-04-16_12-59-21...
- 10:48 AM Bug #3637 (Fix Under Review): client: not issuing caps for clients doing shared writes
- Regarding the testing (which I'm doing now), what those warnings turned out to mean is that each instance had their o...
- 10:37 AM Bug #3637: client: not issuing caps for clients doing shared writes
- I don't remember how all the locking works when you have multiple writers, but I don't believe either of those suppos...
- 01:11 PM Feature #4734: libcephfs: async interfaces
- If when we do this, whoever does so should please be careful to refactor our synchronous interfaces in terms of the a...
- 12:48 PM Feature #4734 (New): libcephfs: async interfaces
Implement async interfaces to libcephfs, at the least for the write and read calls.
This is motivated by the cep...
- 12:53 PM Bug #4732: uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
- You might want to grab the ceph-fuse binary too so that the core dump is useful.
- 12:37 PM Bug #4732 (Closed): uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
- ...
- 09:59 AM Bug #4729 (Can't reproduce): mds: stuck in clientreplay
- Unfortunately by the time I got in one of the machines had been allocated for another job, and now it looks like the ...
- 07:52 AM Bug #4729 (Can't reproduce): mds: stuck in clientreplay
- job was...
- 09:31 AM Bug #4694 (Resolved): client: put_snap_realm assert failure
- Looks good to me; I merged it into next. This was an impressively narrow race so we couldn't get a good reproducer go...
04/15/2013
- 04:38 PM Documentation #4727 (Resolved): upgrade doc has to be modified to include upgrading ceph-mds as well
- Changed package to ceph-mds: http://ceph.com/docs/master/install/upgrading-ceph/#upgrading-a-metadata-server
- 04:26 PM Documentation #4727 (In Progress): upgrade doc has to be modified to include upgrading ceph-mds a...
- 11:42 AM Documentation #4727 (Resolved): upgrade doc has to be modified to include upgrading ceph-mds as well
- http://ceph.com/docs/master/install/upgrading-ceph/
In the above mentioned doc, in section "upgrading a metadata s... - 12:47 PM Bug #4713 (Fix Under Review): mds: hang related to access from two clients
- I have tested the commands listed above on a system with the
patches described here:
http://tracker.ceph.com/is...
- 11:03 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- I ran the blogbench test with all of the above-mentioned
patches applied on a mira cluster and I never saw it hang.
...
- 09:35 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- FYI, these kernel patches (Zheng's and mine) are available on
the ceph-client git repository branch "review/wip-4706...
- 09:27 AM Bug #4679 (Fix Under Review): ceph: hang while running blogbench on mira nodes
- > Found 5 bugs, fixed 4.
I reviewed the four kernel patches (they were posted on the mailing
list). I also provi...
- 09:15 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- > The fix for writepages race is easier than I thought, patch is attached.
This is interesting. When I was workin...
- 10:59 AM Bug #4660: mds: segfault in queue_backtrace_update
- *blink*
Of course it's not; sorry about that.
- 10:57 AM Bug #4660 (Resolved): mds: segfault in queue_backtrace_update
- That isn't the same bug. Opening #4726 for that issue.
- 10:52 AM Bug #4660 (In Progress): mds: segfault in queue_backtrace_update
- ubuntu@teuthology:/a/teuthology-2013-04-13_01:00:48-fs-next-testing-basic/12134
- 10:57 AM Bug #4726 (Can't reproduce): mds: segv during blogbench in remove_pending_backtraces
ubuntu@teuthology:/a/teuthology-2013-04-13_01:00:48-fs-next-testing-basic/12134
2013-04-13T18:52:50.199 INFO:t...
- 09:33 AM Bug #4706 (Fix Under Review): kclient: Oops when two clients concurrently write a file
- I have posted two patches, one which resolves the
crash due to an interrupt while waiting and one
that resolves Zhe...
- 08:46 AM Bug #3579: kclient: Use less secure random number generator so we don't consume entropy
- commit 442318d09506d33e811d9d6a7bd2514287df729d
04/13/2013
- 09:46 AM Bug #4722 (Can't reproduce): kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
- Top of Call trace:...
04/12/2013
- 11:07 PM Bug #4721: libcephfs tests fail when using ceph-deploy
- I'm able to reproduce this failure.
I'm much less familiar with libceph than I am the libcephfs-java code, so I'm g... - 05:42 PM Bug #4721: libcephfs tests fail when using ceph-deploy
- and the logs are placed in burnupi06.front.sepia.ceph.com:/home/ubuntu/apr12_cdep_libcephfs/
- 05:41 PM Bug #4721 (Resolved): libcephfs tests fail when using ceph-deploy
- ceph version : 0.60-467-g6b98162-1precise
config.yaml used to reproduce
tamil@ubuntu:~/test_logs_cuttlefish/apr...
- 08:36 PM Bug #3637: client: not issuing caps for clients doing shared writes
- If Locker::_do_cap_update can't get wrlock for a given client, the client should have no Fw cap. I think we can make ...
- 04:47 PM Bug #3637: client: not issuing caps for clients doing shared writes
- I'm having difficulty reproducing this at all on current next, but am leaving it churning in the background... :/
...
- 01:36 PM Feature #3242 (In Progress): samba: push plugin upstream
- Sam has been working on this for the last couple days.
- 11:06 AM Bug #3579 (Resolved): kclient: Use less secure random number generator so we don't consume entropy
- 10:13 AM Bug #4660 (Resolved): mds: segfault in queue_backtrace_update
- The commit that hit this segv above looks like it was off of master, whereas the fix went into next. I was able to r...
- 09:30 AM Bug #4694 (Fix Under Review): client: put_snap_realm assert failure
- Pushed wip-4694. Still trying to reproduce this reliably so that I can test the proposed fix.
- 09:26 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- Zheng Yan wrote:
> The Oops is caused by uninitialized req->r_inode
Already tracked down the Oops. time to sleep,...
- 09:07 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- FYI I just reproduced the problem without interrupt
and it matches what I saw before. (So I don't believe
the inte...
- 07:39 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- I also proposed a fix: [PATCH 1/4] ceph: add osd request to inode unsafe list in advance
- 07:22 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- Zheng I think I have a fix. I'm going to test it first,
but then I'd like to supply it to you to see if it resolves...
- 05:23 AM Bug #4706 (New): kclient: Oops when two clients concurrently write a file
- > Found a potential cause. the request may complete before adding it
> to the unsafe list.
I think that not being...
- 12:09 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- The Oops is caused by uninitialized req->r_inode
- 07:35 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- The fix for writepages race is easier than I thought, patch is attached.
- 01:08 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- Found 5 bugs, fixed 4. The remaining one is a race between truncate and writepages. Truncate message from MDS can cha...
04/11/2013
- 08:26 PM Bug #4714 (Duplicate): kclient: ceph_sync_{read,write} only accept single buffer.
- So readv and writev are broken for sync I/O.
- 07:28 PM Bug #4713: mds: hang related to access from two clients
- I discovered this while trying to reproduce the issue
in http://tracker.ceph.com/issues/4706.
I documented it the...
- 07:24 PM Bug #4713 (Resolved): mds: hang related to access from two clients
- 06:31 PM Bug #4706: kclient: Oops when two clients concurrently write a file
- This crash looks a little bit familiar to me, and I think
I created a bug for it, but at the moment I can't find it.... - 05:52 PM Bug #4706: kclient: Oops when two clients concurrently write a file
- OK, well I believe I have reproduced the problem.
I did this on two nodes simultaneously:
dd if=/dev/zero of=...
- 09:23 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- Yes, the testing branch of ceph-client. The hint to trigger the Oops is multiple clients writing data to a file at the same ti...
- 08:52 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- Well, I unfortunately got the same problem using
the "bobtail" branch.
Specifically what I'm doing:...
- 08:15 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- Well that's interesting.
I haven't been working with the ceph file system much so
I'm not sure what to expect. B...
- 07:43 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- > the request may complete before adding it to the unsafe list.
That looks like a reasonable explanation to me. A...
- 06:28 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- ...
- 05:56 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- It is a new issue in the sync write path, nothing to do with cap revoke. Alex has made quite a lot of changes in that...
- 05:01 AM Bug #4706: kclient: Oops when two clients concurrently write a file
- Them doing a sync write is probably correct as their concurrency is being managed by the MDS now, and they aren't goi...
- 06:06 PM Bug #3637 (In Progress): client: not issuing caps for clients doing shared writes
- Since I apparently forgot to mention it here, this has nothing to do with #4489; I just pattern-matched a little too ...
- 09:09 AM Bug #4644 (Resolved): mds crashing after upgrade from 0.58 to 0.60
- Merged into next as of commit:d777b8e66b2e950266e52589c129b00f77b8afc0 (Thanks Sam!).
- 02:25 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- so patch tested, mds is running fine now. thx !
- 02:18 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- The last patch seems to work. At least the mds doesn't crash anymore, and df reports non-bogus values.
I'll add this patch to gen...
- 12:14 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- let me know if i can test patches for you ! :)
- 09:06 AM Bug #4451 (Resolved): client: Ceph client not releasing cap
- Merged into next via commit:e32849c4eef2f5d911288aabeac0a6967b1e6ae4
I'm electing not to backport this despite its...
- 08:16 AM Fix #4708 (Rejected): MDS: journaler pre-zeroing is dangerous
- See http://pastebin.com/NJd0UCfF
At first glance it looks like there's a short and a missing log object, and then ...
- 08:15 AM Bug #4105: mds: fix up the Dumper
- Promoting this to high as it can be so useful for gathering important debug data; it would be nice to have done befor...
04/10/2013
- 11:52 PM Bug #4706 (Resolved): kclient: Oops when two clients concurrently write a file
- ...
- 08:31 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- The code looks good.
- 01:10 PM Bug #4644 (Fix Under Review): mds crashing after upgrade from 0.58 to 0.60
- Hurray, I did manage to reproduce so I guess I just missed before, and indeed it works with that patch and fails with...
- 12:38 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- I'm having trouble reproducing this bug, but I'm probably not going through the right steps. A patch that I think sho...
- 12:20 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- if you have some patch that we can test, i'd be glad =)
- 10:27 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- Ah, this looks to be less bad than I thought — the (struct_v == 2) check should be (struct_v <= 2) is all, from the s...
- 09:03 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- update directly from IRC, as alexxy is still having registration issues:
<alexxy> joao: upgrade was from version 0...
- 09:11 AM Bug #3579 (Fix Under Review): kclient: Use less secure random number generator so we don't consume entropy
- Patches sent to the mailing list and pushed to wip-3579.
- 09:07 AM Bug #4569: ceph-mds: segfault
- It looks like this fix didn't make it into 0.60. See #4696.
- 09:06 AM Bug #4696: MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
- Oh you're using 0.60. Looks like that commit didn't make it into the 0.60 release. It will be fixed in the next one!
- 09:04 AM Bug #4696 (Duplicate): MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
- This is a duplicate of #4569. It's fixed in 0.60 if you're willing to upgrade.
- 06:37 AM Bug #4696 (Duplicate): MDS Crashes with Segmentation fault near Objecter::handle_osd_op_reply
- Limited logs at http://goo.gl/VAIFh...
- 05:23 AM Bug #4679 (In Progress): ceph: hang while running blogbench on mira nodes
- I reproduced a hang, it is an 'i_mutex + cap revoking' deadlock....
- 12:58 AM Bug #1878: ceph.ko doesn't setattr (lchown, utimes) on symlinks
- For xattrs, there is no difference between symbolic links and regular files. For setattr, I think the only difference is...
04/09/2013
- 07:49 PM Bug #4451: client: Ceph client not releasing cap
- Please review again based on the latest changed pushed to wip-4451.
- 04:27 PM Bug #4451: client: Ceph client not releasing cap
- Does this need more review or just testing? (I ask because I notice you've got two reviewed-by tags on it, although I...
- 08:48 AM Bug #4451: client: Ceph client not releasing cap
- Thanks Yan for fixing up that patch and testing it out. The inode check was just cruft from the previous changes, an...
- 06:00 AM Bug #4451: client: Ceph client not releasing cap
- After removing the path_is_mine check, MDCache::parallel_fetch_traverse_dir() needs to skip non-auth dirfrags. The modif...
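A minimal sketch of the described traversal change, using hypothetical types (the real MDCache code differs): during the parallel fetch, fragments this MDS is not authoritative for are skipped rather than fetched.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory fragment; not the real CDir.
struct DirFragSketch {
  std::string name;
  bool auth;  // does this MDS hold authority for the fragment?
};

std::vector<std::string>
fetch_auth_frags(const std::vector<DirFragSketch>& frags) {
  std::vector<std::string> fetched;
  for (const auto& f : frags) {
    if (!f.auth)
      continue;                 // non-auth dirfrag: another MDS owns it
    fetched.push_back(f.name);  // stand-in for the real fetch
  }
  return fetched;
}
```

The point of the skip is that without the path_is_mine pre-filter, the traversal can now encounter fragments it must not fetch, so the authority check moves into the loop itself.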
- 06:34 PM Bug #4644 (In Progress): mds crashing after upgrade from 0.58 to 0.60
- That shouldn't be a problem for v0.58; it included version 2 session_info_t. Are you sure that's the version you upgraded...
- 06:18 PM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- The 26th byte of Norbert's sessionmap is 1. If I'm not mistaken, it's struct_v for session_info_t. But the oldest versio...
- 10:58 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- alexxy's sessionmap doesn't look anything like a sessionmap should; this won't fix his issue. Norbert's is at least s...
- 06:20 AM Bug #4644: mds crashing after upgrade from 0.58 to 0.60
- alexxy on IRC is reporting that the patch doesn't work. He would have provided his report himself, but it appears th...
- 04:13 PM Bug #4618 (Resolved): Journaler: _is_readable() and _prefetch() don't communicate correctly
- Merged into next in commit:8eb5465c10840d047a894d1a4f079ff8b8d608b5. This would apply to bobtail as well if we decide...
- 03:12 PM Bug #4679: ceph: hang while running blogbench on mira nodes
- Not off-hand, but I haven't spent any time thinking about it yet. This one could be differences between how aggressiv...
- 03:03 PM Bug #4679: ceph: hang while running blogbench on mira nodes
- We've only seen a certain set of errors at the mds with the kernel client (this one and #4660 - although they may be ...
- 02:57 PM Bug #4679: ceph: hang while running blogbench on mira nodes
- *sigh* Yep...
I've marked this as an MDS issue for now, but it could be a broader protocol change or something as ...
- 02:45 PM Bug #4679 (Rejected): ceph: hang while running blogbench on mira nodes
- I re-ran the blogbench test 10 times using the "bobtail" branch of ceph and never saw a hang. I'm going to call t...
- 12:13 PM Bug #4679: ceph: hang while running blogbench on mira nodes
- I got another hang without any debug info being dumped from the MDS. This time I just abandoned it. I'm about to ...
- 02:50 PM Bug #4694 (Resolved): client: put_snap_realm assert failure
- ...
- 11:04 AM Bug #1878: ceph.ko doesn't setattr (lchown, utimes) on symlinks
- I'm actually not sure how the symlink stuff is represented in our kernel client or the VFS — do these functions handl...
- 08:31 AM Bug #4660 (In Progress): mds: segfault in queue_backtrace_update
- 08:30 AM Bug #4660: mds: segfault in queue_backtrace_update
- Alex hit the same segfault with the next branch yesterday; it looks like commit 3cdc61ec doesn't fix this bug. The ...
04/08/2013
- 08:32 PM Bug #4680 (Closed): mds: log possibly not trimming
- 2013-03-28 10:27:35.154461 7f1fc96b8700 10 mds.0.log trim 2 / 30 segments, 10 / -1 events, 0 (0) expiring, 0 (0) expi...
- 10:32 AM Bug #4680: mds: log possibly not trimming
- Yeah, it's not a generic never-trimming problem; I'm just not certain about this one. It could also be fine and just that there's...
- 10:27 AM Bug #4680: mds: log possibly not trimming
- I've seen it trim logs in the tests I've been running, but that's with mds_log_segment_size=16K and mds_log_max_segme...
- 10:04 AM Bug #4680 (Closed): mds: log possibly not trimming
- Apparently there are a lot of old files showing up in the log replay, and I noticed previously on a different issue t...
- 08:20 PM Bug #4644 (Fix Under Review): mds crashing after upgrade from 0.58 to 0.60
- There is a typo in session_info_t::decode.
- 08:04 PM Bug #4451: client: Ceph client not releasing cap
- Greg Farnum wrote:
> Although I think the MDS would need to have the inode in cache for that to happen — it would ha...
- 10:59 AM Bug #4451: client: Ceph client not releasing cap
- Zheng Yan wrote:
> "Regarding the cap export, is it possible that the client has a cap that it thinks belongs to the...
- 09:43 AM Bug #4451: client: Ceph client not releasing cap
- "Regarding the cap export, is it possible that the client has a cap that it thinks belongs to the mds, but the mds do...
- 09:13 AM Bug #4451: client: Ceph client not releasing cap
- "After removing the path_is_mine check in Server::handle_client_reconnect(), I think we should also call mdcache->rej...
- 04:41 PM Bug #4685 (Can't reproduce): BUG: unable to handle kernel NULL pointer dereference at
- 0.56.4 ceph, 3.8 kernel...
- 02:22 PM Bug #4679: ceph: hang while running blogbench on mira nodes
- It looked very promising. 4 successful passes, but the last one hung again. This time there were two blogbench ta...
- 12:26 PM Bug #4679: ceph: hang while running blogbench on mira nodes
- One pass succeeded, so it's looking good. I'll let it run 5 times and if all are successful, I'll just close this...
- 11:56 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- I talked with Sam Lang who said I should try again with mds debugging on. That led to more info getting dumped on ...
- 11:01 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- ...
- 10:49 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- Actually, the other common theme (maybe more important) is the involvement of an in-progress ceph_setattr() call. ...
- 10:40 AM Bug #4679 (In Progress): ceph: hang while running blogbench on mira nodes
- Unfortunately it looks like I've reproduced the problem with my patches. The common theme is ceph_aio_write(), so ...
- 10:04 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- I ran those tests a few times with the testing branch and the problem did not show up. I reduced the test to just ...
- 05:49 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- Here is an excerpt of the yaml file driving the tests, leading up to the blogbench run:...
- 05:29 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- Here are the versions of ceph and teuthology I'm using while running these tests:
ceph
f5ba0fb mon: make 'osd cr...
- 05:26 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- Here is a log of the commits in place during these tests. (I know, quite a few...) The last one is the current te...
- 05:24 AM Bug #4679: ceph: hang while running blogbench on mira nodes
- Here is an excerpt of the stack trace generated using:
echo t > /proc/sysrq-trigger
[31482.585095] blogbench....
- 05:21 AM Bug #4679 (Resolved): ceph: hang while running blogbench on mira nodes
- I have seen this only on mira nodes, now twice on two consecutive attempts. I've run the same set of tests with th...
- 11:02 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
- Said he could look at this for me today.
- 09:29 AM Bug #4618: Journaler: _is_readable() and _prefetch() don't communicate correctly
- Heh, no; that was supposed to be a 10. Re-pushed; thanks!
- 09:34 AM Bug #3579 (In Progress): kclient: Use less secure random number generator so we don't consume ent...
- 07:16 AM Bug #4660 (Fix Under Review): mds: segfault in queue_backtrace_update
- Pushed a fix to wip-4660. The mdr was getting deleted before we queued the backtrace for update, so mdr->ls was inva...
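The lifetime bug described — a queued update holding a pointer (mdr->ls) into an object that is freed before the update runs — can be sketched generically (hypothetical types, not Ceph's actual request/log-segment classes): copy or reference-count the needed data before queueing, so the closure no longer depends on the request's lifetime.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>

// Hypothetical stand-ins for a log segment and a client request.
struct LogSegment {
  std::string name;
};

struct Request {
  std::shared_ptr<LogSegment> ls = std::make_shared<LogSegment>();
};

std::queue<std::function<std::string()>> backtrace_queue;

void queue_backtrace_update(const Request& mdr) {
  // Copy the shared_ptr into the closure: the segment now stays alive
  // until the queued update runs, even if the request is deleted first.
  // Capturing a raw `&mdr` or `mdr.ls.get()` here would reproduce the bug.
  auto ls = mdr.ls;
  backtrace_queue.push([ls] { return ls->name; });
}
```

The same effect can be had by copying just the fields the update needs; the key point is that nothing queued may dereference the request after it is destroyed.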
04/07/2013
- 01:46 AM Bug #1878 (Fix Under Review): ceph.ko doesn't setattr (lchown, utimes) on symlinks
- ceph_symlink_iops does not have getattr/setattr or xattr-related methods.
- 01:25 AM Bug #4241 (Duplicate): SELinux fails because it can't set xattrs
- This is the same problem as #1878 (ceph_symlink_iops doesn't have a setattr method)