Activity

From 06/02/2013 to 07/01/2013

07/01/2013

09:44 PM Bug #5453: kclient: multiple_rsync tee output partially zeroed
patch "ceph: fix pending vmtruncate race" should fix the issue. Zheng Yan
12:36 PM Feature #5486 (Resolved): kclient: make it work with selinux
see #5477 for the latest failed attempt Sage Weil
12:34 PM Bug #5477 (Resolved): Unable to create files on CephFS on Fedora 18 using kernel module
Sage Weil
12:19 PM Bug #5477: Unable to create files on CephFS on Fedora 18 using kernel module
Many thanks for the responses, Sage and Greg. You were right: once I disabled SELinux this worked.
Chris
Chris Howarth
10:51 AM Bug #5485 (Can't reproduce): failed cifs mount
In teuthology, logs at /a/teuthology-2013-07-01_01:00:46-fs-master-testing-basic/51619... Greg Farnum

06/29/2013

05:35 PM Bug #5453: kclient: multiple_rsync tee output partially zeroed
please check if the attached patch solves this issue Zheng Yan

06/28/2013

06:31 PM Bug #5453: kclient: multiple_rsync tee output partially zeroed
i hit it after just a couple iterations of the teuthology test. i'll capture the osd log... Sage Weil
06:08 PM Bug #5453: kclient: multiple_rsync tee output partially zeroed
I can't reproduce this locally. How difficult is it to reproduce? What's the backend fs for the osd? Zheng Yan
05:32 PM Bug #5411: teuthology: bad object dereference
IME that's what this kind of error from gevent/eventlet etc. means - once the thread exits in a certain abnormal way,... Josh Durgin
03:28 PM Bug #5411: teuthology: bad object dereference
Yeah, I am/somebody will need to spend some time digging into this when we have some time free. There's another issue... Greg Farnum
03:24 PM Bug #5411: teuthology: bad object dereference
I think this is just a symptom of the mds_thrasher crashing, but not logging the exception since this join happens bef... Josh Durgin
02:25 PM Bug #5381 (Pending Backport): ceph-fuse: stuck with disconnected inodes on shutdown
commit:946a838cffa0927d1237489e8c2c143e87d66892 Sage Weil
09:31 AM Bug #5250: ceph-mds 0.61.2 aborts on start
Wow, that is a much simpler test case than I would expect to be required. I can reproduce with a single file and this... Greg Farnum
02:24 AM Bug #5250: ceph-mds 0.61.2 aborts on start

This is all in the lab at present.
We have been doing some additional testing, and have now confirmed that this...
Chris Clayton
09:23 AM Bug #5477: Unable to create files on CephFS on Fedora 18 using kernel module
And you don't need any kernel support to run the Ceph daemons. You should also check the permissions — it's possible ... Greg Farnum
06:51 AM Bug #5477: Unable to create files on CephFS on Fedora 18 using kernel module
I suspect this is SELinux or something similar getting in the way... Sage Weil
04:47 AM Bug #5477 (Resolved): Unable to create files on CephFS on Fedora 18 using kernel module
I have mounted a CephFS filesystem on a Fedora 18 system, which succeeds as follows:
[root@e8c4-dl360g7-03 ceph]# ...
Chris Howarth

06/27/2013

09:39 PM Bug #5381 (Fix Under Review): ceph-fuse: stuck with disconnected inodes on shutdown
Sage Weil
09:22 AM Bug #5381: ceph-fuse: stuck with disconnected inodes on shutdown
Greg Farnum

06/26/2013

11:15 PM Bug #5381: ceph-fuse: stuck with disconnected inodes on shutdown
this is sufficient to reproduce. i think this is a problem with unlinked inodes in the client cache not getting clea... Sage Weil
10:08 PM Bug #5453: kclient: multiple_rsync tee output partially zeroed
putting the tee'd file in /tmp fixes the problem, implying this is a kclient/cephfs bug of some sort. moving this in... Sage Weil

06/25/2013

07:48 PM Bug #5450 (Resolved): mds: failed CDir::_fetched() assert
nice! cherry-picked to commit:ccb3dd5ad5533ca4e9b656b4e3df31025a5f2017 Sage Weil
07:08 PM Bug #5450: mds: failed CDir::_fetched() assert
0.61.4-5-gd572cf6 ? probably already fixed by commit:81d073fecb (mds: fix underwater dentry cleanup) Zheng Yan
10:20 AM Bug #5450 (Resolved): mds: failed CDir::_fetched() assert
... Greg Farnum
07:19 PM Bug #5418: kceph: crash in remove_session_caps
Zheng Yan wrote:
> I still haven't figured out the root cause of the crash, infinite loop in iterate_session_caps(), B...
Sage Weil
07:01 PM Bug #5418: kceph: crash in remove_session_caps
I still haven't figured out the cause of the crash, infinite loop in iterate_session_caps(), BUG_ON(session->s_nr_caps >... Zheng Yan
01:00 PM Bug #5418: kceph: crash in remove_session_caps
ubuntu@teuthology:/a/teuthology-2013-06-25_01:00:47-kernel-next-testing-basic/45603 Sage Weil
06:01 PM Bug #5458 (Duplicate): mds: standby-replay -> replay takeover does not handle racing expire/trim
not sure this is the right diagnosis since i only looked at this briefly, but:... Sage Weil
12:13 PM Bug #5453 (In Progress): kclient: multiple_rsync tee output partially zeroed
Sage Weil
12:09 PM Bug #5453 (Resolved): kclient: multiple_rsync tee output partially zeroed
latest run:... Sage Weil

06/24/2013

05:54 PM Bug #5381 (Need More Info): ceph-fuse: stuck with disconnected inodes on shutdown
Sage Weil
01:03 PM Bug #5381: ceph-fuse: stuck with disconnected inodes on shutdown
next time we see this (or any other ceph-fuse shutdown hang), grab the logs manually via scp before nuking, and note ... Sage Weil
10:58 AM Bug #5333 (Resolved): mds: segfault in MDLog::standby_trim_segments
done, commit:f046dab88fcfeda23391bcd694abc65ff1ed8cd8 Sage Weil
10:12 AM Bug #5333 (Pending Backport): mds: segfault in MDLog::standby_trim_segments
I saw this crash under teuthology in the next branch as well; can we put it there? Greg Farnum
10:44 AM Bug #5411: teuthology: bad object dereference
#5333 is what I was referring to. There's a whole string of failures which are hitting both that and this. Greg Farnum
10:08 AM Bug #5411: teuthology: bad object dereference
Josh, I went back and looked at the first instance (/a/teuthology-2013-06-18_01\:00\:37-fs-next-testing-basic/38877/)... Greg Farnum
10:05 AM Bug #5411: teuthology: bad object dereference
Happened again... Greg Farnum
09:45 AM Bug #5411: teuthology: bad object dereference
If you look at the message from the first exception, it says the mds failed:... Josh Durgin
09:36 AM Bug #5382: mds: failed objecter assert on shutdown
/a/teuthology-2013-06-23_20:00:47-fs-cuttlefish-testing-basic/43843/teuthology.log Greg Farnum
09:10 AM Bug #5250: ceph-mds 0.61.2 aborts on start
Unfortunately this is an area where CephFS needs some hardening and some recovery tools — part of why we don't recomm... Greg Farnum
05:49 AM Bug #5250: ceph-mds 0.61.2 aborts on start
We have hit a very similar problem with V0.61.2. We are unable to start any MDS daemons following testing that involv... Chris Clayton

06/23/2013

10:12 PM Bug #5021: ceph-fuse: crash on traceless reply
btw wip-5021 still hasn't merged because it failed the smbtorture test. i'll rebase on master and retest to see wher... Sage Weil
10:09 PM Bug #5105 (Duplicate): mds/CInode.cc: 1996: FAILED assert(auth_pins >= 0)
#4832 Sage Weil
10:06 PM Bug #5333 (Resolved): mds: segfault in MDLog::standby_trim_segments
commit:abd0ff64e108b7670a062b3fa39baaf3d3e48fb3 Sage Weil
04:30 PM Bug #5430 (Duplicate): newfs makes ceph-mds segfault in suicide
#5432 Sage Weil
10:57 AM Bug #5430: newfs makes ceph-mds segfault in suicide
... Sage Weil
10:52 AM Bug #5430 (Duplicate): newfs makes ceph-mds segfault in suicide
... Sage Weil

06/21/2013

12:02 PM Bug #5418: kceph: crash in remove_session_caps
kdb dumpall attached Sage Weil
12:02 PM Bug #5418 (Resolved): kceph: crash in remove_session_caps
... Sage Weil

06/20/2013

09:33 PM Fix #5399: timestamp changes on replayed mds request (pjd link 71)
probably need to extend the replayed request message to include the timestamps we got for the inode and dir so that t... Sage Weil
09:33 PM Fix #5399: timestamp changes on replayed mds request (pjd link 71)
- we send a create to mds
- get an ack, but it isn't journaled
- pjd stats the mtime/ctime/etc.
- mds restarts
- w...
Sage Weil
09:12 PM Bug #5290: mds: crash whilst trying to reconnect
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-06-20_13:32:57-fs-master-testing-basic/41231
logs in ...
Sage Weil
06:45 PM Bug #5333 (Fix Under Review): mds: segfault in MDLog::standby_trim_segments
wip-5333
this looks like a simple matter of not crashing if the segment list is empty. that at least covers this ...
Sage Weil
12:53 PM Bug #5333: mds: segfault in MDLog::standby_trim_segments
Just a note: maybe we missed a spot, but I remember doing a re-read head object, retry journal read whenever we get a... Greg Farnum
12:47 PM Bug #5333: mds: segfault in MDLog::standby_trim_segments
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-06-20_01:00:49-fs-next-testing-basic/40965
with ful...
Sage Weil
06:15 PM Bug #5380 (Resolved): osdc/Filer.cc: 163: FAILED assert(probe->known_size[p->oid] <= shouldbe)
Sage Weil
12:30 PM Bug #5380: osdc/Filer.cc: 163: FAILED assert(probe->known_size[p->oid] <= shouldbe)
Sage Weil
02:42 PM Bug #5411 (Resolved): teuthology: bad object dereference
... Greg Farnum
01:30 PM Fix #5268: mds: fix/clean up file size/mtime recovery code
See also #4485. Greg Farnum
01:30 PM Feature #4485: Improve "needsrecover" handling
See also #5268. Greg Farnum
01:24 PM Feature #1693 (In Progress): libcephfs: Support TRIM (hole punching)
See "[PATCH] Ceph-fuse: Punch hole support" from Li Wang. Greg Farnum
01:17 PM Feature #3541 (In Progress): mds: robust ino lookup using file backpointers
A bunch of this got done, but Sage isn't sure if the client -> LOOKUPINO messages are wired up to that infrastructure... Greg Farnum

06/19/2013

10:48 PM Bug #5289: mds closing stale session
Sage Weil wrote:
> this is caused when the client is not talking to the mds. can you verify the network is working, ...
chen atrmat
08:08 PM Bug #5380: osdc/Filer.cc: 163: FAILED assert(probe->known_size[p->oid] <= shouldbe)
The patch only fixes the root cause. It doesn't help if objects already have wrong size. Zheng Yan
04:02 PM Fix #5399 (New): timestamp changes on replayed mds request (pjd link 71)
Hmm, Sage points out this might be something else; reopening. Greg Farnum
03:56 PM Fix #5399 (Rejected): timestamp changes on replayed mds request (pjd link 71)
It's a time stamp check for things going backwards, and is failing due to out-of-sync clocks (over a network) being h... Greg Farnum
03:44 PM Fix #5399 (Resolved): timestamp changes on replayed mds request (pjd link 71)
teuthology-2013-06-19_10:46:59-fs-cuttlefish-master-basic 40138 40141 Sage Weil
11:43 AM Bug #5250: ceph-mds 0.61.2 aborts on start
I'm still using the cluster with the modified ceph-mds program, it still works. I caused another power outage (this i... Jérôme Poulin

06/18/2013

12:51 PM Bug #5289 (Can't reproduce): mds closing stale session
this is caused when the client is not talking to the mds. can you verify the network is working, and ceph-fuse is hea... Sage Weil
09:34 AM Bug #5379 (Resolved): mds/ceph-fuse hang on mount
Sage Weil

06/17/2013

09:15 PM Bug #5381: ceph-fuse: stuck with disconnected inodes on shutdown
This is different from #4850. In issue #4850, disconnected inodes have no cap. In this issue, all disconnected inodes... Zheng Yan
01:32 PM Bug #5381: ceph-fuse: stuck with disconnected inodes on shutdown
Good chance this is a duplicate of #4850 (though that's fsstress, so maybe not). Greg Farnum
01:22 PM Bug #5381 (Resolved): ceph-fuse: stuck with disconnected inodes on shutdown
Seen this at least 2x in the last few days:... Sage Weil
05:43 PM Bug #5380: osdc/Filer.cc: 163: FAILED assert(probe->known_size[p->oid] <= shouldbe)
see commit a41bad1a9b(ceph: re-calculate truncate_size for strip object) Zheng Yan
01:18 PM Bug #5380 (Resolved): osdc/Filer.cc: 163: FAILED assert(probe->known_size[p->oid] <= shouldbe)
on mds shutdown... Sage Weil
04:44 PM Bug #5379: mds/ceph-fuse hang on mount
Sage Weil
12:52 PM Bug #5379 (Resolved): mds/ceph-fuse hang on mount
have observed several times ceph-fuse hanging on getattr(#1). latest job was... Sage Weil
02:09 PM Bug #5382: mds: failed objecter assert on shutdown
Sorry, logs at /a/teuthology-2013-06-15_01:00:44-fs-next-testing-basic/36375 Greg Farnum
02:07 PM Bug #5382 (Can't reproduce): mds: failed objecter assert on shutdown
I haven't been through this completely, but it looks like the mds went laggy, and then it received a SIGTERM (the tes... Greg Farnum
12:24 PM Bug #5368 (Resolved): ceph-fuse: fsx-mpi hangs in _sync_read
commit:ee40c217e373b538e227f7218b09c1c794b4124a Sage Weil

06/16/2013

05:50 AM Bug #5367: multiclient tests: kernel mount gets EPERM
kclient and MDS never return -EACCES. was ior executed with root privilege? Zheng Yan

06/15/2013

07:46 PM Bug #5367: multiclient tests: kernel mount gets EPERM
mpi-fsx also gets EPERM. Sage Weil
07:15 PM Bug #5367 (Resolved): multiclient tests: kernel mount gets EPERM
... Sage Weil
07:45 PM Bug #5368 (Resolved): ceph-fuse: fsx-mpi hangs in _sync_read
infinite loop in _sync_read() due to a short read. see wip-client-sync. Sage Weil
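
For readers unfamiliar with this failure mode, here is a minimal sketch of a read loop that tolerates short reads without spinning forever. It is not the ceph-fuse `_sync_read()` code or the wip-client-sync fix; `read_fully` and `ShortReader` are hypothetical names invented for this illustration, and the 3-byte chunk limit is an arbitrary assumption.

```python
import io


class ShortReader:
    """Hypothetical stream that returns at most `max_chunk` bytes per read()."""

    def __init__(self, data, max_chunk=3):
        self._buf = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, n):
        # Deliberately return fewer bytes than requested (a "short read").
        return self._buf.read(min(n, self._max_chunk))


def read_fully(stream, length):
    """Read up to `length` bytes, retrying after short reads."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # A zero-byte read means end of data: stop rather than
            # retrying forever with the same request.
            break
        chunks.append(chunk)
        # Advance by what was actually returned, not by what was asked for.
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    print(read_fully(ShortReader(b"abcdefghij"), 8))
    print(read_fully(ShortReader(b"abc"), 8))
```

The two points the sketch highlights are that progress is tracked by the number of bytes actually returned, and that an empty read ends the loop.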

06/14/2013

12:50 PM Bug #5360 (Rejected): ceph-fuse: failing smbtorture tests
We're failing the maxfid test when samba is backed by a ceph-fuse mount. It seems to be an inconsistent (this is the ... Greg Farnum

06/13/2013

07:43 PM Bug #5333: mds: segfault in MDLog::standby_trim_segments
I think it's an old race. The standby MDS gets the pos of journal head, then reads the corresponding journal object. ... Zheng Yan
02:02 PM Bug #5333: mds: segfault in MDLog::standby_trim_segments
I see that Yan changed one line in this function recently (which shouldn't have had any impact), but other than that ... Greg Farnum

06/12/2013

01:23 PM Bug #5333 (Resolved): mds: segfault in MDLog::standby_trim_segments
... Sage Weil
06:10 AM Bug #5290: mds: crash whilst trying to reconnect
Hi Zheng,
Is this what you mean?
Damien Churchill

06/11/2013

08:55 AM Bug #5303 (Resolved): OSD segfaults on SIGINT
This was a missed backport for an old fix. I pushed it to the cuttlefish branch and it will be included in .4. Thanks! Sage Weil
08:41 AM Bug #5303: OSD segfaults on SIGINT
Without debugger:... Jérôme Poulin
08:38 AM Bug #5303 (Resolved): OSD segfaults on SIGINT
This is not the first time but interrupting the OSD with SIGINT (CTRL+C) causes a segmentation fault.
Cuttlefish 0...
Jérôme Poulin
07:19 AM Bug #5250: ceph-mds 0.61.2 aborts on start
Removing the assert worked around the problem:... Jérôme Poulin
06:32 AM Bug #5250: ceph-mds 0.61.2 aborts on start
I noticed that resetting the MDS journal using ceph-mds -i 1 --reset-journal 0 -d hangs there.... Jérôme Poulin

06/10/2013

10:28 PM Bug #5290: mds: crash whilst trying to reconnect
looks like session map corruption.
Damien, please upload the session map. you can find where it is by "ceph osd ma...
Zheng Yan
02:16 AM Bug #5290 (Can't reproduce): mds: crash whilst trying to reconnect
Hi,
Recently I experienced an issue with the mds servers in my cluster, the cluster storage would be absolutely fi...
Damien Churchill
09:42 AM Bug #5287 (Resolved): the permission of file in CephFS
Ian Colle

06/09/2013

01:54 AM Bug #5289 (Can't reproduce): mds closing stale session
Hi all,
I found a stale session in MDS.
$ceph -w
\ health HEALTH_OK
..................
.....................
chen atrmat

06/08/2013

11:00 PM Support #5285 (Closed): cephfs give permission to write files
dup #5287 Zheng Yan
10:37 PM Bug #5287: the permission of file in CephFS
so far the only solution is chmod Zheng Yan
07:55 PM Bug #5287: the permission of file in CephFS
Zheng Yan wrote:
> The short answer is that there is no better solution so far. If a given node can mount the FS, it can access to...
chen atrmat
06:24 PM Bug #5287: the permission of file in CephFS
The short answer is that there is no better solution so far. If a given node can mount the FS, it can access the data pool direc... Zheng Yan
01:43 AM Bug #5287 (Resolved): the permission of file in CephFS
Hi all,
I used CephFS v0.56.3 to store VMs. There are 8 nodes in my cluster, and I mount the CephFS on every node...
chen atrmat
10:23 PM Bug #4832 (Resolved): mds: failed auth_unpin assert
Sage Weil

06/07/2013

10:04 PM Bug #4832: mds: failed auth_unpin assert
aie.. thanks Sage Weil
09:36 PM Bug #4832: mds: failed auth_unpin assert
that commit breaks filelock eval gather Zheng Yan
05:23 PM Bug #4832 (Resolved): mds: failed auth_unpin assert
commit:a08d62045657713bf0a5372bf14136082ec3b17e Sage Weil
07:39 PM Support #5285 (Closed): cephfs give permission to write files
Hi all,
I used CephFS v0.56.3 to store VMs. There are 8 nodes in my cluster, and I mount the CephFS on every n...
chen atrmat
05:34 PM Bug #5236 (Resolved): mds assert when starting file scan
no more failures, yay! Sage Weil
10:52 AM Bug #5250: ceph-mds 0.61.2 aborts on start
I'll try commenting out the assert, and yes, we tried the snapshots feature of the MDS hours before the shutdown. Jérôme Poulin
09:44 AM Bug #5250: ceph-mds 0.61.2 aborts on start
were you using the mds snapshots? Sage Weil
09:42 AM Bug #5250: ceph-mds 0.61.2 aborts on start
probably the workaround is to comment out that assert.. Sage Weil
07:56 AM Bug #5250: ceph-mds 0.61.2 aborts on start
Is it useful for me to keep the FS in this state much longer? Right now the FS is unusable. Is it possible to clear ... Jérôme Poulin

06/06/2013

09:53 PM Bug #4832: mds: failed auth_unpin assert
full log attached for posterity. see wip-4832 Sage Weil
06:27 PM Bug #4832: mds: failed auth_unpin assert
... Sage Weil
07:23 AM Bug #4832: mds: failed auth_unpin assert
... Sage Weil
09:38 PM Fix #5268 (Closed): mds: fix/clean up file size/mtime recovery code
from diagnosing #4832 (see the attached log) it looks like this code needs an overhaul:
* i don't think we should ...
Sage Weil

06/05/2013

09:21 PM Bug #4832: mds: failed auth_unpin assert
log is here flab:/home/sage/tmp/4832
Sage Weil
09:21 PM Bug #4832: mds: failed auth_unpin assert
it's getting recovered twice:... Sage Weil

06/04/2013

07:40 PM Bug #3681: kclient fsx fails nightly
I think this has already been fixed (a cap revoke bug in the MDS code). When handling truncate request, current MDS ... Zheng Yan
12:34 PM Bug #5250: ceph-mds 0.61.2 aborts on start
I'm running a single MDS on the same server as a MON and an OSD. We're not using the FS very much, just testing, this ... Jérôme Poulin
12:16 PM Bug #5250: ceph-mds 0.61.2 aborts on start
Can you provide the output of "ceph -s" as well, please. And start up an MDS daemon after setting "debug mds = 20" an... Greg Farnum
11:19 AM Bug #5250: ceph-mds 0.61.2 aborts on start
Full log at pastebin.com : http://pastebin.com/9YPMjw0t Jérôme Poulin
11:18 AM Bug #5250 (Can't reproduce): ceph-mds 0.61.2 aborts on start
After rebooting the whole cluster using the "shut the breaker off" method, I had some BTRFS corruption which was fixed... Jérôme Poulin
09:37 AM Bug #5236: mds assert when starting file scan
Sage Weil

06/03/2013

09:50 PM Bug #5236: mds assert when starting file scan
commit:2d655bde8de9ad255d63718768558399cacd7068
thanks!
Sage Weil
05:53 PM Bug #5236: mds assert when starting file scan
looks like I forgot to initialize MDCache::rejoins_pending Zheng Yan
02:17 PM Bug #5236: mds assert when starting file scan
Yan, I got as far as identifying that the problem is that rejoin_gather_finish->identify_files_to_recovery is getting... Sage Weil
10:00 AM Bug #5236: mds assert when starting file scan
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-06-03_01:00:48-fs-master-testing-basic/30161 Sage Weil
07:52 AM Bug #5236 (Resolved): mds assert when starting file scan
... Sage Weil
03:50 PM Fix #5241: MDS: not valgrind (leak) clean
teuthology-2013-06-03_01:00:48-fs-master-testing-basic:
30170, 30172, 30174
Greg Farnum
03:43 PM Fix #5241 (New): MDS: not valgrind (leak) clean
Valgrind info at /a/teuthology-2013-06-01_01:00:43-fs-next-testing-basic/28691/remote/ubuntu@plana85.front.sepia.ceph... Greg Farnum
 
