Activity

From 02/10/2014 to 03/11/2014

03/11/2014

01:57 PM Bug #7685 (Can't reproduce): hung/failed teuthology test: cfuse_workunit_misc
http://qa-proxy.ceph.com/teuthology/teuthology-2014-03-07_23:00:50-fs-firefly-testing-basic-plana/122094
http://qa-p...
Greg Farnum
01:46 PM Bug #7684 (Resolved): failed cfuse_workunit_kernel_untar_build.yaml test
http://qa-proxy.ceph.com/teuthology/teuthology-2014-03-09_23:00:25-fs-firefly-testing-basic-plana/124157/
The teut...
Greg Farnum
07:18 AM Bug #7474 (Won't Fix): Kernel oops with cephfs [ceph_write_begin -> *x8 -> wait_on_page_read]
This looks like it's the writeback deadlock when trying to flush from the client to the OSD on a single memory-constr... Greg Farnum
07:17 AM Bug #6599 (Resolved): client: invalid iterator dereference in Client::trim_caps
Sage Weil
07:12 AM Feature #5486: kclient: make it work with selinux
Hmm, Sage notes that maybe it'll work now we support ACLs. Or maybe we can use a special mount option? Greg Farnum
07:00 AM Bug #2187 (Can't reproduce): pjd chown/00.t failed test 97
Sage Weil
06:59 AM Bug #2740 (Resolved): mds: crash in Objecter when shutting down too early
Sage Weil

03/10/2014

12:58 PM Bug #1318: directories disappear across multiple rsyncs
By “this” I meant files with different timestamps from what they were last set to, as in the first paragraph of comme... Alexandre Oliva
12:51 PM Bug #1318: directories disappear across multiple rsyncs
I'm afraid this still occurs quite often with ceph 0.77 and ceph.ko 3.13.6-gnu. I have a slightly better understandi... Alexandre Oliva

03/06/2014

02:22 PM Feature #7324 (Resolved): qa: kcephfs + ACLs (new pjd tests?)
Sage Weil
07:30 AM Bug #7613: mds/MDCache.cc: 216: FAILED assert(inode_map.count(in->vino()) == 0)
Can you upload the core dump and ceph-mds binary somewhere? Zheng Yan

03/05/2014

09:59 AM Bug #7613: mds/MDCache.cc: 216: FAILED assert(inode_map.count(in->vino()) == 0)
I shouldn't have said anything -- minutes later the problem is now happening again. Matthew Via
09:47 AM Bug #7613: mds/MDCache.cc: 216: FAILED assert(inode_map.count(in->vino()) == 0)
Stopping the clients that were attempting to mount cephfs (by restarting them) appeared to let the mds start with no ... Matthew Via
08:48 AM Bug #7613 (Can't reproduce): mds/MDCache.cc: 216: FAILED assert(inode_map.count(in->vino()) == 0)
... Sage Weil
03:36 AM Bug #4722: kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
__queue_cap_release has code which limits the size of the cap release message. Zheng Yan

03/04/2014

11:27 PM Bug #4722: kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
You think the msg pointer is invalid, and so it's overflowing?
I'm a little concerned at just closing this unless we...
Greg Farnum
06:52 PM Bug #4722 (Can't reproduce): kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
It's more likely there is no pre-allocated message. The variable 'msg' points to the pre-allocated message list. Zheng Yan

03/03/2014

10:48 PM Bug #2679 (Can't reproduce): POSIX file lock not released on process termination
Zheng Yan
09:20 PM Bug #2288: libcephfs: setxattr returns EEXIST following removexattr
commit:48e55d9 is in master and contains the simpler handle_client_setxattr() fix discussed above. Thanks Zheng! Greg Farnum
06:31 PM Bug #2288 (Resolved): libcephfs: setxattr returns EEXIST following removexattr
Zheng Yan
06:17 PM Bug #2445 (Can't reproduce): crash when removing a non-empty directory
Zheng Yan
06:16 PM Bug #1877 (Can't reproduce): ceph.ko (3.1.6) oopses upon cephfs set_layout of a symlink to a dir
Zheng Yan
06:55 AM Bug #1874 (Can't reproduce): Running `git gc` on a bare git repository hosted by ceph results in ...
Greg Farnum
06:31 AM Bug #1874: Running `git gc` on a bare git repository hosted by ceph results in a bus error.
Hi,
I have since moved on, but (as it happens) am currently investigating the production use of Ceph at the Univer...
David McBride

02/28/2014

03:08 PM Bug #5382: mds: failed objecter assert on shutdown
Okay, leaving this alone for the moment — that patch is a good start, and I think the MDLog stuff is actually okay (b... Greg Farnum
02:02 PM Bug #7565: Failed assert in check_rstats
What's the bug with check_rstats? Is num_head_items just not expected to be valid at this stage of replay?
Either ...
Greg Farnum
01:44 PM Bug #4722: kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
Unless this part has been fixed by a newer kernel, we still need to deal with it. In particular we were concerned tha... Greg Farnum

02/27/2014

09:29 PM Cleanup #3742 (Resolved): Remove old Hadoop wrappers and configuration options
Noah Watkins
09:27 PM Bug #3318: java: lock access to CephStat, CephStatVFS from native
Actually, yeah, I'll look at this. Noah Watkins
03:30 PM Bug #3318: java: lock access to CephStat, CephStatVFS from native
Is this still an issue, Noah? Greg Farnum
09:23 PM Bug #4861 (Rejected): Alter Java components to build against Java 1.6 (or 1.7)
Noah Watkins
09:23 PM Bug #4861: Alter Java components to build against Java 1.6 (or 1.7)
Closing. I'm not sure what the problem is here.. it looks like I am saying that the code builds for a super old versi... Noah Watkins
03:26 PM Bug #4861: Alter Java components to build against Java 1.6 (or 1.7)
Do you know the state of the Java code right now, Noah? I wonder if this got done already or is still a bug requiring... Greg Farnum
07:29 PM Bug #4023: kclient: d_revalidate is abusing d_parent
The race still exists, but I don't think it's a big problem, because even if ceph_get_dentry_parent_inode() returns a w... Zheng Yan
04:03 PM Bug #4023: kclient: d_revalidate is abusing d_parent
Is this still a problem? Greg Farnum
05:14 PM Bug #4722: kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
Who cares about the 3.5 kernel? Zheng Yan
03:28 PM Bug #4722: kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
Sounds like this might require some protocol work and it's in the kernel client — high! Greg Farnum
05:13 PM Bug #7565: Failed assert in check_rstats
It's a CDir::check_rstats() bug, not rstat corruption. Zheng Yan
04:46 PM Bug #7565 (Resolved): Failed assert in check_rstats

This is odd, because it's happening very reproducibly, is not unique to the tip of master, but apparently isn't hap...
John Spray
04:51 PM Bug #1181: mds: old_inodes crash
Snapshots
See also #4248, which may or may not have anything to do with this.
Greg Farnum
04:50 PM Bug #926: mds: fix rename between snaprealms
Snapshots Greg Farnum
04:50 PM Bug #1552 (Duplicate): qa: file locking test fails
#7326 Greg Farnum
04:48 PM Bug #2740: mds: crash in Objecter when shutting down too early
I'm pretty sure this is fixed, but let's check it out and make sure. Greg Farnum
04:45 PM Bug #3596: ceph-fuse: crash in mds rejoin
Snapshots Greg Farnum
04:44 PM Bug #2187: pjd chown/00.t failed test 97
This is either an MDS or protocol bug since we've seen it across clients. Greg Farnum
04:43 PM Bug #2863: client: does not tolerate traceless replies from mds
uclient failure-case: low priority.
I believe we've established that the kclient does not suffer from this issue, ...
Greg Farnum
04:42 PM Bug #2288: libcephfs: setxattr returns EEXIST following removexattr
Confirmed MDS bug! Greg Farnum
04:40 PM Bug #2679: POSIX file lock not released on process termination
Let's see if we can reproduce this as it's some combination of kclient, MDS, or protocol bug. Greg Farnum
04:38 PM Bug #1666: hadoop: time-related meta-data problems
Also see #7564
But low priority, for this is Hadoop
Greg Farnum
04:36 PM Bug #4212: mds: open_snap_parents isn't called all the times it needs to be
Snapshots Greg Farnum
04:36 PM Bug #4213: mds: old_parents is never cleaned up
Snapshots Greg Farnum
04:35 PM Fix #7564: synchronize MDS and client times in a way that makes pjd happy even under clock skew
See also #1666 Greg Farnum
04:28 PM Fix #7564 (Duplicate): synchronize MDS and client times in a way that makes pjd happy even under ...
See #854. We have ops happen on both the client and the MDS, and so sometimes one time wins and sometimes the other d... Greg Farnum
04:29 PM Bug #854 (Duplicate): unsynchronized clocks between kernel-client/cmds cause PJD fstest failures
I'm closing this in favor of fix ticket #7564. Greg Farnum
04:13 PM Bug #1874: Running `git gc` on a bare git repository hosted by ceph results in a bus error.
So basically two things could have gone wrong here:
1) The OSD replied with a bad tid (unlikely)
2) the client forg...
Greg Farnum
04:02 PM Bug #4370: mds: high-cpu utilization in memorymodel:_sample
Figure out if the current MemoryModel is actually useful for anything — I think it might not be. All the lovely ticke... Greg Farnum
04:01 PM Bug #3935: kclient: Big directory access bugs (multiple), mixed 32- and 64-bit clients
The hangs sound like generic cap and request waitlisting issues to me. The empty directory is tickling something i... Greg Farnum
03:57 PM Bug #4248: mds: replay does not correctly update CInode::first and ::last members
I'm going to leave this at normal even though it's a snapshotting issue — the problem's diagnosed and it's a bug in t... Greg Farnum
03:53 PM Bug #4134: mds: request locking hang under snaptests
snapshots = low Greg Farnum
03:52 PM Bug #3719 (Can't reproduce): pjd test 145 failed in the nightly runs
These logs are gone. Greg Farnum
03:45 PM Bug #4280: mds: crash on lookupsnap
Snapshots = low priority Greg Farnum
03:38 PM Bug #2445: crash when removing a non-empty directory
Let's validate behavior here — there's a good chance Zheng or somebody fixed whatever bug caused this, and we want to... Greg Farnum
03:32 PM Bug #1877: ceph.ko (3.1.6) oopses upon cephfs set_layout of a symlink to a dir
Kernel client layout crash = high. Identify if this is still a problem, and if we can trigger it using the vxattrs as... Greg Farnum
03:30 PM Bug #4738: libceph: unlink vs. readdir (and other dir orders)
Need more info, samba, uclient, etc. Greg Farnum
03:27 PM Bug #4732: uclient: client/Inode.cc: 126: FAILED assert(cap_refs[c] > 0)
The blocker bug is low, so this one can't have a higher priority. Greg Farnum
03:25 PM Bug #4920: client: does not respect O_NOFOLLOW
uclient = low priority, for now. Greg Farnum
03:25 PM Bug #4188: mds crashes when cow-ing entries in formerly snapshotted dir
Snapshots = low priority. *sigh* Greg Farnum
03:21 PM Bug #5360: ceph-fuse: failing smbtorture tests
Samba against ceph-fuse (not even using libcephfs) = low priority. Greg Farnum
03:20 PM Feature #5486: kclient: make it work with selinux
I don't know anything about SELinux, nor its users. What needs to work for us to support SELinux, and how big of a st... Greg Farnum
03:19 PM Bug #5762: teuthology: Failed MPI runs lead to a hung test instead of a failure
It's a test which we can't use properly. High priority! Greg Farnum
03:18 PM Bug #6458 (Need More Info): journaler: journal too short during replay
I've bumped up #4708, so if that's the cause of this it'll be fixed when that is. If not, we need more info. Greg Farnum
03:17 PM Fix #4708: MDS: journaler pre-zeroing is dangerous
#6458 could be a result of this issue, so I'm bumping up the priority. Greg Farnum
03:14 PM Bug #5950: kcephfs: cephfs set_layout -p 4 gets EINVAL
We want to use the virtual xattrs moving forward, so downgrading a bug in the cephfs tool. Greg Farnum
01:38 PM Bug #7474: Kernel oops with cephfs [ceph_write_begin -> *x8 -> wait_on_page_read]
I wasn't on 3.8, it was 3.11. Unfortunately I can't use the machines I was experimenting with for this purpose anymor... Peter Waller
01:19 PM Bug #7474: Kernel oops with cephfs [ceph_write_begin -> *x8 -> wait_on_page_read]
Zheng, do you have a specific bug you think this is so we can close it out? Greg Farnum
01:24 PM Bug #6741: failed snaptest-2.sh; got ENOTEMPTY on should-be empty dir
Downgrading: ceph-fuse and snapshots. Greg Farnum
01:23 PM Bug #6609: teuthology rsync workunit failure
I haven't noticed this in a while, but upgrading as it was a failure across both clients. Greg Farnum
01:22 PM Bug #5864: cfuse_workunit_suites_ffsb suite on Centos hangs with *** Got Signal Interrupt ***
This is passing in the nightlies, so if there is a bug it has to do with not only ceph-fuse, but ceph-fuse specifical... Greg Farnum
01:20 PM Bug #7206 (Need More Info): Ceph MDS Hang on hadoop workloads
Greg Farnum
01:18 PM Bug #7485 (Resolved): Killing MDS during 'creating' breaks subsequent startup (no snaptable)
We merged this to master in commit:9a040bfd46d141712c32aaa0fa8fc5de93336306, but I guess we missed closing out the ti... Greg Farnum
06:50 AM Feature #7325: mds: tool to examine (later, manipulate) dirfrag objects
Is this intended to be an online thing (modifying live MDS state), or something that operates on the RADOS objects (i... John Spray
06:35 AM Bug #5382: mds: failed objecter assert on shutdown
There was an earlier patch that introduced an "I'm in dispatch" flag, and a more recent one (https://github.com/ceph/... John Spray

02/26/2014

05:01 PM Bug #4746: client: invalidate callback can deadlock
Demoted due to ceph-fuse and FUSE interface work. Greg Farnum
05:00 PM Bug #4829: client: handling part of MClientForward incorrectly?
Demoting due to uclient and multi-mds. Greg Farnum
04:58 PM Bug #5787: client/Client.cc: 2081: FAILED assert(!unclean) in put_inode
Demoting due to uclient and Need More Info. Greg Farnum
04:57 PM Bug #6473: multimds + ceph-fuse: fsstress gets ENOTEMPTY on final rm -r
Demoting due to multi-mds. Greg Farnum
04:57 PM Bug #5765: kclient: High CPU due to raw_spin_lock in ceph_cap_string
Demoting due to performance, not correctness. Greg Farnum
04:56 PM Bug #5021: ceph-fuse: crash on traceless reply
Demoting due to uclient. Greg Farnum
04:55 PM Bug #5382: mds: failed objecter assert on shutdown
I'm pretty sure we had a discussion about your patch, but I can't find the comments and I don't remember the outcome.... Greg Farnum
04:48 PM Bug #6608: samba teuthology dbench failure
Demoting priority on samba. Greg Farnum
04:47 PM Bug #7011: ENOTEMPTY on ceph-fuse + snaptest-? test
Demoting priority on ceph-fuse and snapshots. Greg Farnum
04:47 PM Bug #6613: samba is crashing in teuthology
Demoting priority on samba. Greg Farnum
04:37 PM Feature #7326: qa: fix flock tests
I don't remember which tests these are; the locktest ones that are racy, or something else? Greg Farnum
04:35 PM Feature #7352: mds: make classes encode/decode-able
We've already merged in the MDSTable and Journaler header dumping stuff; I think that's all the stuff that you were t... Greg Farnum
04:29 PM Feature #4001 (Resolved): Implement the migration path from using the AnchorTable to using lookup...
Again, Zheng got this done. Greg Farnum
04:26 PM Cleanup #3742: Remove old Hadoop wrappers and configuration options
This is already done, isn't it Noah? At least, the old stuff isn't where it used to be and I didn't see it with the n... Greg Farnum
04:18 PM Feature #118: kclient: clean pages when throwing out dirty metadata on session teardown
I can't find the referenced ticket anywhere. Anybody know what this is supposed to be and if it still applies? (I thi... Greg Farnum
07:15 AM Bug #7530: mds: failed anchor assert on replay
commit:7ba3200f1e91d803cdf84f96777641f7d18d3c01 Greg Farnum
01:08 AM Feature #7531 (Closed): MDS: support required feature sets like the OSD and monitor
MDS map contains CompatSet::FeatureSet Zheng Yan

02/25/2014

06:27 PM Bug #7530 (Resolved): mds: failed anchor assert on replay
Zheng Yan
09:14 AM Bug #7530: mds: failed anchor assert on replay
config used was (suites/fs/thrash/): ceph/base.yaml ceph-thrash/default.yaml clusters/mds-1active-1standby.yaml debug... John Spray
09:09 AM Bug #7530: mds: failed anchor assert on replay
Crashed on first try, log at debug-mds=10 attached John Spray
07:04 AM Bug #7530 (In Progress): mds: failed anchor assert on replay
John Spray
07:25 AM Bug #7503: mds start and oops after access to cephfs
Fine, OK for ticket #7531. This one should be closed.
Yann Dupont
07:02 AM Feature #3863 (In Progress): implement a tool to lookup inode numbers without holding their path
John Spray

02/24/2014

10:31 PM Bug #7503 (Won't Fix): mds start and oops after access to cephfs
Ah, it sounds like this is happening because the MDS doesn't currently have a good versioning system to prevent too-o... Greg Farnum
10:30 PM Feature #7531 (Closed): MDS: support required feature sets like the OSD and monitor
This'll be a little interesting because the MDS doesn't have local storage. Evaluate if feature sets are best stored ... Greg Farnum
10:11 PM Bug #7530 (Resolved): mds: failed anchor assert on replay
... Greg Farnum
05:10 PM Feature #4000 (Resolved): Design a migration path from using the AnchorTable to using lookup-by-ino
Sage Weil
03:16 PM Feature #4000: Design a migration path from using the AnchorTable to using lookup-by-ino
Did this already get done with Zheng's work to remove the AnchorTable? Greg Farnum
05:10 PM Feature #7323 (Resolved): mds: fix and merge pending libcephfs changes
Sage Weil
05:09 PM Feature #3999 (Resolved): update CDir encoding
This was revved as part of Zheng's omap stuff. Sage Weil
12:58 AM Feature #7315 (Closed): review and merge zheng's dirfrag series
Zheng Yan

02/21/2014

03:15 PM Bug #7503: mds start and oops after access to cephfs
OK for the explanation, and as already said, all that data was test data, so I can lose it without problems. I also full... Yann Dupont
08:45 AM Bug #7503: mds start and oops after access to cephfs
MDS is getting an ENFILE (object lost) from the OSD while trying to read the OMAP from one of its stray directory obj... John Spray
07:18 AM Bug #7503 (Won't Fix): mds start and oops after access to cephfs
This is a follow-up to http://tracker.ceph.com/issues/7367, which explains the scenario.
I now attach the mds.log
Yann Dupont
08:30 AM Bug #7485 (Fix Under Review): Killing MDS during 'creating' breaks subsequent startup (no snaptable)
John Spray
08:29 AM Bug #7485: Killing MDS during 'creating' breaks subsequent startup (no snaptable)
https://github.com/ceph/ceph/pull/1283 John Spray
08:14 AM Bug #7485: Killing MDS during 'creating' breaks subsequent startup (no snaptable)

MDS -1 gid 1 starts in BOOTING, sends a beacon
MON prepare_beacon records its existence and puts it into state STA...
John Spray

02/20/2014

06:23 AM Bug #7485 (Resolved): Killing MDS during 'creating' breaks subsequent startup (no snaptable)

Pretty easy to reproduce: start MDS for first time on fresh cluster (I'm using vstart here), ctrl-c it promptly, tr...
John Spray

02/19/2014

10:14 PM Bug #6608: samba teuthology dbench failure
We're still seeing occasional samba test failures, but I haven't diagnosed them carefully enough to know if they're this fai... Greg Farnum
07:23 PM Bug #7474: Kernel oops with cephfs [ceph_write_begin -> *x8 -> wait_on_page_read]
Are you using a 3.8 kernel? If you are, please try 3.12 or 3.13. Zheng Yan
01:07 AM Bug #7474 (Won't Fix): Kernel oops with cephfs [ceph_write_begin -> *x8 -> wait_on_page_read]
I'm on Ubuntu 13.10 and I've installed the packages distributed with it (ceph-deploy 1.2.3-0ubuntu1 and `ceph` 0.67.4... Peter Waller

02/18/2014

11:07 PM Bug #6608: samba teuthology dbench failure
Do you still see the issue? Zheng Yan

02/17/2014

01:37 PM Bug #7422 (In Progress): client/barrier.h uses boost's interval set library, which is not availab...
The barrier code has been disabled to fix the build. Matt said he will follow up. http://marc.info/?l=ceph-devel&m=... Sage Weil
01:34 PM Bug #7373 (Resolved): kcephfs nfs file create failes with EOPNOTSUPP
Sage Weil
08:44 AM Bug #7424 (Rejected): Cannot read from zero-length file
Pavel Veretennikov wrote:
> * Strange that it worked without permission. Where had it stored the data?
It was onl...
Sage Weil
07:32 AM Bug #7424: Cannot read from zero-length file
* Strange that it worked without permission. Where had it stored the data? Pavel Veretennikov
07:31 AM Bug #7424: Cannot read from zero-length file
Yes, the problem was resolved after I gave the client access to the default data pool
rwx pool=data
Strange that it work...
Pavel Veretennikov
06:52 AM Bug #7424: Cannot read from zero-length file
Does the client have permission to access the data pool? Try using the admin's keyring to mount the fs. Zheng Yan
01:19 AM Bug #7424: Cannot read from zero-length file
Ubuntu doesn't use SELinux as far as I know. /selinux lib is empty, only one related selinux package is present - libselinux... Pavel Veretennikov

02/16/2014

10:43 PM Bug #7372 (Closed): kcephfs: pjd tests fail
Zheng Yan
09:57 PM Bug #7372: kcephfs: pjd tests fail
Well, as far as I can tell, the pjd tests also fail on ext4 in the same way they do on ceph:... Sage Weil

02/15/2014

06:07 PM Bug #6791 (Won't Fix): mds assert after startup - CDir::commit error (want > commited version)
Zheng Yan

02/14/2014

06:25 AM Bug #7424: Cannot read from zero-length file
Do you have SELinux enabled? Zheng Yan
02:16 AM Bug #7424 (Rejected): Cannot read from zero-length file
Ubuntu 12.04 LTS 3.8.0-35-generic x64
Ceph 0.72.2-1precise from http://ceph.com/debian-emperor/
cluster b8...
Pavel Veretennikov
12:45 AM Bug #7422 (Resolved): client/barrier.h uses boost's interval set library, which is not available ...
http://gitbuilder.sepia.ceph.com/gitbuilder-centos6-amd64/log.cgi?log=9cbbc883e225b08b3e31cd2cf6e766688795886b
Thi...
Josh Durgin

02/12/2014

12:10 PM Bug #5382: mds: failed objecter assert on shutdown
I haven't looked at any of the code involved for real, but that sounds like a good plan to me. *thumbs up* Greg Farnum
11:22 AM Bug #5382: mds: failed objecter assert on shutdown
What's happening is that suicide() is getting called from another thread while the dispatch thread is inside _dispatc... John Spray
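For readers unfamiliar with this kind of race, here is a minimal Python sketch of the general pattern described in the comment above: a shutdown request arriving from another thread while a dispatch is in progress, handled by deferring the shutdown until the dispatch returns. This is not the MDS code and not the patch discussed in the ticket. Apart from the name suicide(), which comes from the comment, every class, method, and field name here is hypothetical.

```python
import threading


class Dispatcher:
    """Toy stand-in for a daemon whose shutdown may be requested from any thread.

    All names are illustrative assumptions, not Ceph identifiers.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._in_dispatch = False       # True while a message handler is running
        self._shutdown_pending = False  # shutdown was requested during a dispatch
        self.stopped = False

    def dispatch(self, msg, handler):
        """Run handler(msg); return False if the dispatcher is already stopped."""
        with self._lock:
            if self.stopped:
                return False
            self._in_dispatch = True
        try:
            handler(msg)  # runs without the lock held
        finally:
            with self._lock:
                self._in_dispatch = False
                if self._shutdown_pending:
                    # Perform the shutdown that was deferred during dispatch.
                    self._shutdown_pending = False
                    self._do_shutdown()
        return True

    def suicide(self):
        """Request shutdown; return True if done now, False if deferred."""
        with self._lock:
            if self._in_dispatch:
                self._shutdown_pending = True
                return False
            self._do_shutdown()
            return True

    def _do_shutdown(self):
        # Caller must hold self._lock. Real teardown would go here.
        self.stopped = True
```

The design choice in this sketch is that suicide() never tears anything down while a handler is running; it only records the request, and the dispatch path completes the shutdown once the handler returns. Whether that matches the plan agreed on in the ticket cannot be determined from the truncated comments.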
 
