Activity

From 09/23/2016 to 10/22/2016

10/22/2016

10:52 PM Bug #17670 (Fix Under Review): multimds: mds entering up:replay and processing down mds aborts
https://github.com/ceph/ceph/pull/11611 Patrick Donnelly
10:46 PM Bug #17670 (Resolved): multimds: mds entering up:replay and processing down mds aborts
... Patrick Donnelly

10/21/2016

09:48 PM Feature #17249 (Resolved): cephfs tool for finding files that use named PGs
Greg Farnum
01:58 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
I suspect the zeros are from stale page cache data. If you encounter the issue again, please drop the kernel page cac... Zheng Yan
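(For reference, the standard way to drop the kernel page cache on a Linux client is a generic kernel facility, not Ceph-specific; as root:

    sync
    echo 3 > /proc/sys/vm/drop_caches

where "3" frees the page cache plus dentries and inodes.)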
01:39 PM Bug #17275 (Fix Under Review): MDS long-time blocked ops. ceph-fuse locks up with getattr of file
https://github.com/ceph/ceph/pull/11593 Zheng Yan
01:21 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
created http://tracker.ceph.com/issues/17660 Henrik Korkuc
01:03 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
Got a getattr long lock, but this time the client has long-running objecter requests. I will be filing a ticket for that... Henrik Korkuc
05:33 AM Bug #17656: cephfs: high concurrent causing slow request
William, could you describe the issue from Ceph's perspective in detail? Kefu Chai
01:46 AM Bug #17656 (Need More Info): cephfs: high concurrent causing slow request
background:
we use cephfs as a CDN backend. When the CDN vendor prefetches video files in cephfs, it will cause high concu...
william sheng

10/20/2016

08:44 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
On further testing, it seems I can only make this happen when doing multi-threaded downloads from multiple hosts. I h... Aaron Bassett
04:39 PM Bug #17591 (Resolved): Samba failures: smbtorture, dbench.sh, fsstress.sh
Greg Farnum
02:53 PM Bug #17636 (Resolved): MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding
Kefu Chai
10:31 AM Bug #17636 (Fix Under Review): MDS crash on creating: interval_set<inodeno_t> segfaults with new ...
https://github.com/ceph/ceph/pull/11577 John Spray
10:23 AM Bug #17636: MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding
Should also note that the backtraces are like:... John Spray
09:35 AM Bug #17636 (Resolved): MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding

The map bound_encode method does this:...
John Spray
02:43 PM Feature #17643 (New): Verify and repair layout xattr during forward scrub
We should do exactly the same thing we do for backtrace, but for the layout that we write into 'layout' xattr. Repai... John Spray
02:36 PM Feature #17639 (Fix Under Review): Repair file backtraces during forward scrub
https://github.com/ceph/ceph/pull/11578
https://github.com/ceph/ceph-qa-suite/pull/1218
John Spray
02:31 PM Feature #17639 (Resolved): Repair file backtraces during forward scrub

Instead of just recording that a backtrace was missing or invalid, repair it in the same way we do in verify_diri_b...
John Spray
02:22 PM Feature #15619 (Resolved): Repair InoTable during forward scrub
... John Spray
10:38 AM Bug #17216 (Resolved): ceph_volume_client: recovery of partial auth update is broken
Loïc Dachary
08:32 AM Bug #17216 (Pending Backport): ceph_volume_client: recovery of partial auth update is broken
John Spray
09:35 AM Bug #17606 (Need More Info): multimds: assertion failure during directory migration
The debug level of the log is too low. This is a dup of a long-standing bug http://tracker.ceph.com/issues/8405. I can't fi... Zheng Yan
08:43 AM Bug #17629 (Duplicate): ceph_volume_client: TypeError with wrong argument count
The fix for this just merged in master yesterday.
Python says "2 args, 3 given" because the 3 is counting the "sel...
John Spray
12:33 AM Bug #17629 (Duplicate): ceph_volume_client: TypeError with wrong argument count
From: http://pulpito.ceph.com/pdonnell-2016-10-19_23:26:58-fs:recovery-master---basic-mira/486359/... Patrick Donnelly
12:43 AM Bug #14195: test_full_fclose fails (tasks.cephfs.test_full.TestClusterFull)
I just ran across this in http://pulpito.ceph.com/pdonnell-2016-10-19_23:26:58-fs:recovery-master---basic-mira/486355... Patrick Donnelly

10/19/2016

05:39 PM Bug #17621 (Rejected): Hadoop does a bad job closing files and we end up holding too many caps
(Just scribbling this in the tracker for a record.)
We've had several reports on the mailing list of Hadoop client...
Greg Farnum
05:32 PM Feature #12141: cephfs-data-scan: File size correction from backward scan
Unassigning as I believe this was dropped? Greg Farnum
04:48 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
No I only see:
Oct 19 15:34:52 phx-r2-r3-comp5 kernel: [683656.055247] libceph: client259809 fsid <redacted>
Oc...
Aaron Bassett
04:28 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
Were there any ceph-related kernel messages on the client hosts? (such as "ceph: mds0 caps stale") Zheng Yan
03:49 PM Bug #17620 (Resolved): Data Integrity Issue with kernel client vs fuse client
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
kernel: 4.4.0-42-generic #62~14.04.1-Ubuntu SMP
I ...
Aaron Bassett
03:07 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
The cephx user, not the *nix user. I'd be fine with the other stuff — we already run (all our?) nightlies with our ow... Greg Farnum
03:04 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
I don't quite understand. The reproducer that does the ceph_ll_mkdir does it as an unprivileged user, so why is it al... Jeff Layton
02:46 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
There we go, that case would concern me.
Contrary to our discussion in standup today, I think MDSAuthCaps::is_capa...
Greg Farnum
12:25 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
No, the order has already been established, because each operation _really_ starts with the permission check. But ok,... Jeff Layton
04:08 AM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
No, I mean: I don't think this counts as a bug. To get in this situation we have two outstanding requests:
1) chmod
...
Greg Farnum
02:54 PM Backport #17617 (Resolved): jewel: [cephfs] fuse client crash when adding a new osd
https://github.com/ceph/ceph/pull/11860 Loïc Dachary
02:53 PM Backport #17615 (Resolved): jewel: mds: false "failing to respond to cache pressure" warning
https://github.com/ceph/ceph/pull/11861 Loïc Dachary
01:14 PM Bug #17606: multimds: assertion failure during directory migration
Fixed! Patrick Donnelly
03:49 AM Bug #17606: multimds: assertion failure during directory migration
I can't access /ceph/cephfs-perf/tmp/2016-10-18/ceph-mds.ceph-mds0.log. Please change the permissions.
Zheng Yan
11:56 AM Feature #16016 (Resolved): Populate DamageTable from forward scrub
John Spray
10:24 AM Bug #17270 (Pending Backport): [cephfs] fuse client crash when adding a new osd
John Spray
10:21 AM Bug #17611 (Resolved): mds: false "failing to respond to cache pressure" warning

Creating this ticket retroactively for a ticketless PR that needs backporting: https://github.com/ceph/ceph/pull/11373
John Spray
07:04 AM Bug #17172 (Resolved): Failure in snaptest-git-ceph.sh
Loïc Dachary

10/18/2016

11:35 PM Feature #17604: MDSMonitor: raise health warning when there are no standbys but there should be
I'll take this one. Patrick Donnelly
11:44 AM Feature #17604 (Resolved): MDSMonitor: raise health warning when there are no standbys but there ...

We could have a per-filesystem setting for how many standbys it wants to be available. Then if the available set o...
John Spray
09:22 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
FWIW, "little holes" like this are worse in that they'll often strike when things are really redlined. If the race wi... Jeff Layton
09:02 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
The client has to send out both a cap drop message and the user's request.
Those will be ordered, and if the cap dro...
Greg Farnum
08:34 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Will they be ordered after the reply comes in, or just ordered wrt how they go out onto the wire?
IOW, is there so...
Jeff Layton
08:13 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Luckily the messages are ordered so once a client sends off a request to the MDS, any caps it drops will be ordered a... Greg Farnum
07:22 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
I don't think that's sufficient.
My understanding is that once you drop the client_lock mutex (as the client does ...
Jeff Layton
06:54 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
The MDS does enforce cephx-based permissions, but unless you've deliberately constructed a limited cephx key it's goi... Greg Farnum
04:27 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Ok, and the reason setting fuse_default_permissions didn't work for me yesterday was due to a typo. When I set that o... Jeff Layton
03:26 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Ok, here's a "backported" test program. Both mkdirs still succeed, but when I set fuse_default_permissions to false o... Jeff Layton
01:02 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
No, I don't think so. Here is ceph_userperm_new in my tree:... Jeff Layton
03:22 PM Bug #17606 (Resolved): multimds: assertion failure during directory migration
This is from an experiment on Linode with 9 active MDS, 32 OSD, and 128 clients building the kernel.... Patrick Donnelly
09:47 AM Bug #17591 (Fix Under Review): Samba failures: smbtorture, dbench.sh, fsstress.sh
https://github.com/ceph/ceph/pull/11526 Zheng Yan
09:23 AM Bug #17591 (In Progress): Samba failures: smbtorture, dbench.sh, fsstress.sh
The samba daemon changes its own euid/egid before calling libcephfs functions, but libcephfs sets up a default_perm on init... Zheng Yan
02:44 AM Bug #17563: extremely slow ceph_fsync calls
No method so far; we need to extend the MClientCaps message. Zheng Yan

10/17/2016

09:45 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
And at a guess, you might be using the uid,gid I presume you have there (0,0) to create folders with the ownership sp... Greg Farnum
09:39 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Quick sanity check: is fuse_default_permissions set to false? Otherwise the assumption is that fuse (or nfs, samba) ... John Spray
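(For reference, fuse_default_permissions is an ordinary client option; a minimal sketch of how it would be set in ceph.conf, with section placement following standard Ceph config conventions:

    [client]
        fuse_default_permissions = false

With false, permission checks are no longer delegated to the kernel's fuse layer and must be done by the ceph client/MDS code, which is the behaviour under test here.)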
09:02 PM Bug #17594 (In Progress): cephfs: permission checking not working (MDS should enforce POSIX permi...
Frank Filz noticed that cephfs doesn't seem to be enforcing permissions properly, particularly on mkdir.
This test...
Jeff Layton
05:06 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
Yep, still failing on master. John Spray
02:23 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
Rerun of failures on master (a1fd258)
http://pulpito.ceph.com/jspray-2016-10-17_14:22:43-samba-master---basic-mira/
John Spray
02:19 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
Failures are on a4ce1f56c1e22d4d3c190403339d5082d66fc89c; last passes were on cfc0a16e048a5868b137e2b7b89c7ae105218bc... John Spray
02:16 PM Bug #17591 (Resolved): Samba failures: smbtorture, dbench.sh, fsstress.sh

Samba tests were some of the most stable; they are suddenly failing on master.
http://pulpito.ceph.com/teutholog...
John Spray
03:14 PM Bug #17286 (Resolved): Failure in dirfrag.sh
Loïc Dachary
03:14 PM Bug #17271 (Resolved): Failure in snaptest-git-ceph.sh
Loïc Dachary
03:14 PM Bug #17253 (Resolved): Crash in Client::_invalidate_kernel_dcache when reconnecting during unmount
Loïc Dachary
03:14 PM Bug #17173 (Resolved): Duplicate damage table entries
Loïc Dachary
03:14 PM Feature #16973 (Resolved): Log path as well as ino when detecting metadata damage
Loïc Dachary
03:14 PM Bug #16668 (Resolved): client: nlink count is not maintained correctly
Loïc Dachary
02:38 PM Bug #17547 (Resolved): ceph-fuse 10.2.3 segfault
John Spray
02:32 PM Bug #17547: ceph-fuse 10.2.3 segfault
looks good so far. I guess we can close this. Thanks! Henrik Korkuc
01:42 PM Bug #17547: ceph-fuse 10.2.3 segfault
Henrik: did that build fix the issue for you? John Spray
12:31 PM Backport #16946 (Resolved): jewel: client: nlink count is not maintained correctly
Loïc Dachary
12:30 PM Backport #17244 (Resolved): jewel: Failure in snaptest-git-ceph.sh
Loïc Dachary
12:30 PM Backport #17246 (Resolved): jewel: Log path as well as ino when detecting metadata damage
Loïc Dachary
12:30 PM Backport #17474 (Resolved): jewel: Failure in dirfrag.sh
Loïc Dachary
12:30 PM Backport #17476 (Resolved): jewel: Failure in snaptest-git-ceph.sh
Loïc Dachary
12:30 PM Backport #17477 (Resolved): jewel: Crash in Client::_invalidate_kernel_dcache when reconnecting d...
Loïc Dachary
12:30 PM Backport #17479 (Resolved): jewel: Duplicate damage table entries
Loïc Dachary
09:06 AM Bug #17562 (Fix Under Review): backtrace check fails when scrubbing directory created by fsstress
https://github.com/ceph/ceph/pull/11517/ Zheng Yan

10/16/2016

10:45 AM Bug #17563: extremely slow ceph_fsync calls
Yep, exactly. I noticed this while running the cthon04 testsuite against it. It copies some source files into place a... Jeff Layton

10/15/2016

10:06 PM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
Good point. Yeah, a retry loop may be the best we can do in that case. I'll have to read up on rados asserts to make ... Jeff Layton
01:16 AM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
Are we *allowed* to fail writes like that? :/
It doesn't actually close the race in all cases, but for sane use it...
Greg Farnum
09:39 AM Backport #17582 (Resolved): jewel: monitor assertion failure when deactivating mds in (invalid) f...
https://github.com/ceph/ceph/pull/11862 Nathan Cutler
09:34 AM Bug #16610 (Resolved): Jewel: segfault in ObjectCacher::FlusherThread
Nathan Cutler
08:28 AM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
Further analysis points to a correlation between the MDS config parameter mds_log_max_segments and the I/O drop. Using d... Vishal Kanaujia
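(For experimenting with this correlation, the admin socket can change the value at runtime; a sketch, with the value 200 purely illustrative:

    ceph daemon mds.<id> config set mds_log_max_segments 200
)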

10/14/2016

11:39 PM Bug #17563: extremely slow ceph_fsync calls
Oh right, this doesn't even have a way to ask the MDS for immediate service — notice how it just puts already-sent re... Greg Farnum
11:27 PM Bug #17562: backtrace check fails when scrubbing directory created by fsstress
It looks to me like compare() is doing the right thing, but that the scrubbing code is declaring an error if the on-d... Greg Farnum
11:19 PM Bug #17562: backtrace check fails when scrubbing directory created by fsstress
Well, in inode_backtrace_t::compare() we're trying to determine and return three different things, as described in th... Greg Farnum
08:06 PM Backport #17131 (Resolved): jewel: Jewel: segfault in ObjectCacher::FlusherThread
Loïc Dachary
05:06 PM Bug #16255 (Resolved): ceph-create-keys: sometimes blocks forever if mds "allow" is set
Nathan Cutler
02:26 PM Backport #17347 (Resolved): jewel: ceph-create-keys: sometimes blocks forever if mds "allow" is set
Sage Weil
10:04 AM Bug #17518 (Pending Backport): monitor assertion failure when deactivating mds in (invalid) fscid 0
John Spray
07:10 AM Bug #17525 (Duplicate): "[ FAILED ] LibCephFS.ThreesomeInterProcessRecordLocking" in smoke
Loïc Dachary
07:03 AM Bug #17525: "[ FAILED ] LibCephFS.ThreesomeInterProcessRecordLocking" in smoke
In jewel http://pulpito.ceph.com/loic-2016-10-13_17:04:37-fs-jewel-backports-distro-basic-smithi/471638/... Loïc Dachary

10/13/2016

07:00 PM Bug #17563: extremely slow ceph_fsync calls
This program is a reproducer. You can build it with something like ... Jeff Layton
01:10 PM Bug #17563: extremely slow ceph_fsync calls
Is that something we can change? Slow fsync() performance is particularly awful for applications.
In any case, the...
Jeff Layton
11:46 AM Bug #17563: extremely slow ceph_fsync calls
The reason is that client fsync does not force the MDS to flush its journal. fsync may wait up to an MDS tick if there is no ... Zheng Yan
10:22 AM Bug #17563 (Resolved): extremely slow ceph_fsync calls
I've been seeing problems with very slow fsyncs vs. a vstart cluster when I run ganesha on top of it. This is the gan... Jeff Layton
03:51 PM Bug #17466 (Resolved): MDSMonitor: non-existent standby_for_fscid not caught
Loïc Dachary
03:51 PM Bug #17197 (Resolved): ceph-fuse crash in Client::get_root_ino
Loïc Dachary
03:51 PM Bug #17105 (Resolved): multimds: allow_multimds not required when max_mds is set in ceph.conf at ...
Loïc Dachary
03:51 PM Bug #16764 (Resolved): ceph-fuse crash on force unmount with file open
Loïc Dachary
03:51 PM Bug #16066 (Resolved): client: FAILED assert(root_ancestor->qtree == __null)
Loïc Dachary
01:27 PM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
For the record -- I really don't care for O_APPEND semantics, but...
_write() does this currently:...
Jeff Layton
11:41 AM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
Not that easy to do. The MDS does not always issue the caps the client wants. In this case, the client wants Fsw, but the MDS may on... Zheng Yan
10:59 AM Bug #17564 (New): close race window when handling writes on a file descriptor opened with O_APPEND
This comment is in _write() in the userland client code:... Jeff Layton
01:02 PM Backport #16313 (Resolved): jewel: client: FAILED assert(root_ancestor->qtree == __null)
Loïc Dachary
12:45 PM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
John, there are two test cases.
1. Load files into the file system. We created 900k files of 1k size.
2. Run I/O tests s...
Vishal Kanaujia
09:15 AM Bug #17562 (Resolved): backtrace check fails when scrubbing directory created by fsstress
When scrubbing directories created by fsstress, there are lots of false backtrace errors. The bug is in inode_backtrace_t::compar... Zheng Yan
06:54 AM Backport #17207 (Resolved): jewel: ceph-fuse crash on force unmount with file open
Loïc Dachary
06:44 AM Backport #17206 (Resolved): jewel: ceph-fuse crash in Client::get_root_ino
Loïc Dachary
06:44 AM Backport #17264 (Resolved): jewel: multimds: allow_multimds not required when max_mds is set in c...
Loïc Dachary
06:43 AM Backport #17557 (Resolved): jewel: MDSMonitor: non-existent standby_for_fscid not caught
Loïc Dachary

10/12/2016

04:25 PM Feature #17276 (Resolved): stick client PID in client_metadata
Patrick Donnelly
04:22 PM Bug #17531 (Resolved): mds fails to respawn if executable has changed
Patrick Donnelly

10/11/2016

08:57 AM Backport #17244 (In Progress): jewel: Failure in snaptest-git-ceph.sh
Loïc Dachary
08:56 AM Backport #17246 (In Progress): jewel: Log path as well as ino when detecting metadata damage
Loïc Dachary
08:52 AM Backport #17347 (In Progress): jewel: ceph-create-keys: sometimes blocks forever if mds "allow" i...
Loïc Dachary
08:51 AM Backport #17474 (In Progress): jewel: Failure in dirfrag.sh
Loïc Dachary
08:50 AM Backport #17476 (In Progress): jewel: Failure in snaptest-git-ceph.sh
Loïc Dachary
08:45 AM Backport #17477 (In Progress): jewel: Crash in Client::_invalidate_kernel_dcache when reconnectin...
Loïc Dachary
08:40 AM Backport #17478 (In Progress): jewel: MDS goes damaged on blacklist (failed to read JournalPointe...
Loïc Dachary
08:34 AM Backport #17479 (In Progress): jewel: Duplicate damage table entries
Loïc Dachary
08:32 AM Backport #17557 (In Progress): jewel: MDSMonitor: non-existent standby_for_fscid not caught
Loïc Dachary
08:08 AM Backport #17557 (Resolved): jewel: MDSMonitor: non-existent standby_for_fscid not caught
https://github.com/ceph/ceph/pull/11389 Loïc Dachary

10/10/2016

09:15 PM Bug #17548: should userland ceph_llseek do permission checking?
... Jeff Layton
08:47 PM Bug #17548: should userland ceph_llseek do permission checking?
Actually, now I'm confused. What's the failing test, if this isn't a case users should run into anyway? Greg Farnum
08:42 PM Bug #17548: should userland ceph_llseek do permission checking?
There are several ways, but yeah...it comes down to being careful to close out old file descriptors (or use O_CLOEXEC... Jeff Layton
08:40 PM Bug #17548: should userland ceph_llseek do permission checking?
Yeah, sorry. I know it's possible for a process to drop permissions somehow or other. Greg Farnum
08:36 PM Bug #17548: should userland ceph_llseek do permission checking?
That's just the way UNIX (and hence POSIX) works. UNIX pipelines require file descriptor inheritance.
I'm not sure...
Jeff Layton
08:12 PM Bug #17548: should userland ceph_llseek do permission checking?
But...isn't that insecure by design? Why should we assume an open FD is still valid on a setuid'ed fork(), for instance? Greg Farnum
08:07 PM Bug #17548: should userland ceph_llseek do permission checking?
So does fstat, and we don't do any special permission checking there either.
I guess my view is that if you're iss...
Jeff Layton
07:56 PM Bug #17548: should userland ceph_llseek do permission checking?
We probably want to follow the kernel. But when I read the man pages and think about security, it seems like it shoul... Greg Farnum
03:53 PM Bug #17548: should userland ceph_llseek do permission checking?
Seems reasonable to remove. Patrick Donnelly
11:48 AM Bug #17548 (Resolved): should userland ceph_llseek do permission checking?
One of the test failures here:
http://qa-proxy.ceph.com/teuthology/jlayton-2016-10-08_00:13:32-fs-wip-jlayton-...
Jeff Layton
01:51 PM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
Vishal: are you still investigating this or do you need some input from others? John Spray
01:46 PM Bug #17522 (In Progress): ceph_readdirplus_r does not acquire caps before sending back attributes
John Spray
12:35 PM Bug #17547: ceph-fuse 10.2.3 segfault
I am preparing new build with a patch from https://github.com/ceph/ceph/pull/10921 Henrik Korkuc
12:12 PM Bug #17547: ceph-fuse 10.2.3 segfault
It is happening on a 10.2.3 cluster too (just checked another cluster). It's just that I took this segfault from client... Henrik Korkuc
12:11 PM Bug #17547: ceph-fuse 10.2.3 segfault
Seems likely to be the same issue fixed by https://github.com/ceph/ceph/pull/10921 (i.e. duplicate of http://tracker.ceph... John Spray
12:10 PM Bug #17547: ceph-fuse 10.2.3 segfault
> ceph-fuse 10.2.3 segfaults on 10.2.2 Ceph cluster.
Are you asserting that it *doesn't* segfault on a 10.2.3 ceph...
John Spray
11:38 AM Bug #17547 (Resolved): ceph-fuse 10.2.3 segfault
ceph-fuse 10.2.3 segfaults on 10.2.2 Ceph cluster.
ceph-fuse has #17275 and https://github.com/ceph/ceph/pull/1086...
Henrik Korkuc
11:24 AM Bug #17466 (Pending Backport): MDSMonitor: non-existent standby_for_fscid not caught
Jewel backport PR here: https://github.com/ceph/ceph/pull/11389 John Spray

10/07/2016

11:16 PM Feature #17276: stick client PID in client_metadata
Testing an email filter. Please ignore... Patrick Donnelly
11:44 AM Feature #17532: qa: repeated "rsync --link-dest" workload
Also: ensure that the test is running with an "mds cache size" setting smaller than the directory being backed up. John Spray
11:41 AM Feature #17532 (New): qa: repeated "rsync --link-dest" workload
Related but distinct: http://tracker.ceph.com/issues/17434
We should have a test that uses the --link-dest feature...
John Spray
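(For reference, --link-dest makes rsync hardlink files that are unchanged relative to a previous backup tree instead of copying them, so repeated runs produce large numbers of hardlinks; a typical invocation, with paths purely illustrative:

    rsync -a --link-dest=/backups/2016-10-06 /mnt/cephfs/data/ /backups/2016-10-07/
)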

10/06/2016

11:09 PM Bug #17531 (Fix Under Review): mds fails to respawn if executable has changed
PR: https://github.com/ceph/ceph/pull/11362 Patrick Donnelly
10:48 PM Bug #17531 (Resolved): mds fails to respawn if executable has changed
If the mds is failed via `ceph mds fail` and the executable file has changed, the mds will fail to respawn using exec... Patrick Donnelly
09:19 PM Feature #17276 (Fix Under Review): stick client PID in client_metadata
PR: https://github.com/ceph/ceph/pull/11359 Patrick Donnelly
08:04 PM Bug #17518 (Fix Under Review): monitor assertion failure when deactivating mds in (invalid) fscid 0
PR: https://github.com/ceph/ceph/pull/11357 Patrick Donnelly
04:17 PM Bug #17525 (Duplicate): "[ FAILED ] LibCephFS.ThreesomeInterProcessRecordLocking" in smoke
Run: http://pulpito.ceph.com/teuthology-2016-10-06_05:00:03-smoke-master-testing-basic-vps/
Job: 457132
Logs: http:...
Yuri Weinstein
11:13 AM Bug #17522 (Resolved): ceph_readdirplus_r does not acquire caps before sending back attributes
In order to send a reliable set of attributes back to the caller, cephfs should ensure that the client has shared cap... Jeff Layton
10:38 AM Backport #15999 (Resolved): jewel: CephFSVolumeClient: read-only authorization for volumes
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/9558
> https://github.com/ceph/ceph-qa-suite/pull/1041
...
Ramana Raja
10:34 AM Feature #15614 (Resolved): CephFSVolumeClient: read-only authorization for volumes
Ramana Raja

10/05/2016

09:27 PM Bug #17518 (Resolved): monitor assertion failure when deactivating mds in (invalid) fscid 0
Made a simple vstart cluster and tried this:... Patrick Donnelly
09:12 PM Bug #17517 (Resolved): ceph-fuse: does not handle ENOMEM during remount
When running 8 ceph-fuse clients on a single Linode VM with 2GB memory, I observed the clients would die under load w... Patrick Donnelly
07:43 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
I ended up going with a journal flush and a ceph fs reset. The mds is now up and clients can mount. Thank you both fo... Adam Tygart
02:56 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
Digging through that patch, I'm not sure it is set up to work with the mdsmap->fsmap conversion that happened at some ... Adam Tygart
01:24 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
Hmm, I guess the --yes-i-really-mean-it part isn't helping! (http://tracker.ceph.com/issues/14379)
I've attached Z...
John Spray
01:06 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
Well, crap. I don't remember running that, and it wasn't in the bash_history on the monitor I've been using, but:
...
Adam Tygart
12:24 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
Any rank that is in the 'in' set should also be in one of failed, stopped, damaged, or up.
I notice that in the ...
John Spray
04:25 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
http://people.cs.ksu.edu/~mozes/ceph-mon.hobbit01-debug-branch.log
Unfortunately, it would seem we're bailing out ...
Adam Tygart
12:26 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
mon level 20 should do it. Greg Farnum

10/04/2016

11:34 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
I'm building it now. What logs and levels would you like me to collect? Adam Tygart
10:57 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
I built a debug commit and merged in wip-17466-jewel; named the result wip-1746-jewel-debug. Greg Farnum
10:05 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
There was definitely an encoding issue. I went back through the historical mdsmaps and it wasn't always insisting it ... Adam Tygart
09:57 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
I think that if you use the repair tools to flush the MDS journal and then do an fs reset you should get everything b... Greg Farnum
09:43 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
No, you're correct. The part you're missing is MDSMonitor::maybe_expand_cluster(), invoked from MDSMonitor::tick(). I... Greg Farnum
07:37 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
https://github.com/ceph/ceph/blob/master/src/mon/MDSMonitor.cc#L2865
If I'm reading the code paths correctly, and ...
Adam Tygart
11:02 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
Gah, I thought I saw it (created new FS and set standby_for_fscid, then saw it stuck in standby), but a few daemon re... John Spray
10:44 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
There is some other bug here with standby_for_rank and standby_for_fscid set correctly; I can reproduce it locally. John Spray
12:38 PM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
1. MDS runs with the default configuration.
2. On the MDS node, I ran the following commands:
# ceph --admin-daemon ceph-md...
Vishal Kanaujia
11:32 AM Bug #17216: ceph_volume_client: recovery of partial auth update is broken
Proposed fix: https://github.com/ceph/ceph/pull/11304 Ramana Raja
05:32 AM Bug #17368 (Resolved): mds: allowed GIDs must be ordered in the cephx string
Greg Farnum
05:32 AM Bug #16367 (Resolved): libcephfs: UID parsing breaks root squash (Ganesha FSAL)
Greg Farnum

10/03/2016

11:57 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
ceph-post-file: 731781d4-c78c-47f9-9de3-e151b5c1a5f9
Above is the Monitor log with a fresh boot of the mds.
Adam Tygart
10:06 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
That FSMap contains a standby with '"standby_for_fscid": 0', but the existing filesystem is '"id": 2'.
Had him set i...
Greg Farnum
03:18 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
In fact, the mdsmap epoch is quite a lot larger than I remember it. I think it was in the low 50's before this issue... Adam Tygart
03:08 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
I've cherry-picked the mds and mon fixes from your pull request onto jewel and built a new version. This allowed me t... Adam Tygart
06:46 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
wip-17466-jewel is building with a minimal fix rebased on v10.2.3
https://github.com/ceph/ceph/pull/11281 has the ...
John Spray
06:39 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
I think this is happening when older MDSs send a beacon to newer mons, because MMDSBeacon isn't initialising POD fiel... John Spray
06:30 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
... John Spray
01:42 PM Bug #17468 (In Progress): CephFs: IO Pauses for more than a 40 seconds, while running write inten...
Vishal Kanaujia
01:27 PM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
I collected stack unwind traces in the MDS process while no I/O was observed during write ops on CephFS. Stack trace freq... Vishal Kanaujia
01:28 PM Backport #17479 (Resolved): jewel: Duplicate damage table entries
https://github.com/ceph/ceph/pull/11412 Loïc Dachary
01:28 PM Backport #17478 (Resolved): jewel: MDS goes damaged on blacklist (failed to read JournalPointer: ...
https://github.com/ceph/ceph/pull/11413 Loïc Dachary
01:28 PM Backport #17477 (Resolved): jewel: Crash in Client::_invalidate_kernel_dcache when reconnecting d...
https://github.com/ceph/ceph/pull/11414 Loïc Dachary
01:28 PM Backport #17476 (Resolved): jewel: Failure in snaptest-git-ceph.sh
https://github.com/ceph/ceph/pull/11415 Loïc Dachary
01:27 PM Backport #17474 (Resolved): jewel: Failure in dirfrag.sh
https://github.com/ceph/ceph/pull/11416 Loïc Dachary

10/02/2016

08:34 PM Bug #10944: Deadlock, MDS logs "slow request", getattr pAsLsXsFs failed to rdlock
Oliver: use "ceph daemon mds.<id> objecter_requests" to see if the MDS is stuck waiting for operations from the OSDs. John Spray

10/01/2016

06:02 AM Bug #17468 (Closed): CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
CephFs Environment:
1 MDS node
1 Ceph Kernel Filesystem, Ubuntu 14.04 LTS, kernel version 4.4.0-36-generic
Test ...
Parikshith B

09/30/2016

09:50 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
So now it seems like although the order of members was whacky, calls to = {fscid, rank} were actually assigning the v... John Spray
08:59 PM Bug #17466 (Fix Under Review): MDSMonitor: non-existent standby_for_fscid not caught
https://github.com/ceph/ceph/pull/11281 John Spray
08:48 PM Bug #17466 (Resolved): MDSMonitor: non-existent standby_for_fscid not caught
We've got it using aggregate initialization (that is, the explicit {a, b} syntax), but we're ordering the rank and f... Greg Farnum
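(Purely as an illustration of the pitfall under investigation — a hypothetical struct, not the actual Ceph type: positional aggregate initialization binds values in member declaration order, so reordering members without updating every {a, b} initializer silently swaps the values.

    /* agg_init.c -- hypothetical sketch of the member-order pitfall */
    #include <stdio.h>

    struct standby_info {
        long fscid;   /* filesystem id -- declared first */
        long rank;    /* mds rank      -- declared second */
    };

    int main(void)
    {
        /* The author means rank=0 in fscid=2 but writes {rank, fscid}: */
        struct standby_info info = {0, 2};   /* actually sets fscid=0, rank=2 */
        printf("fscid=%ld rank=%ld\n", info.fscid, info.rank);
        return 0;
    }
)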
09:19 PM Feature #17434: qa: background rsync task for FS workunits
When we build this, we should include some performance metrics gathering. Many of our bugs around cap handling here a... Greg Farnum
04:27 PM Bug #10944: Deadlock, MDS logs "slow request", getattr pAsLsXsFs failed to rdlock
Hi,
I fully understand that these things might not be trivial.
But we have that issue again. Due to a network p...
Oliver Dzombc
03:34 AM Bug #17069: multimds: slave rmdir assertion failure
Snapshot bug; lowering priority. Zheng Yan
02:45 AM Bug #17069: multimds: slave rmdir assertion failure
Please don't run snapshot tests on multimds; they are known to be broken. Zheng Yan
03:33 AM Bug #16768 (Need More Info): multimds: check_rstat assertion failure
Zheng Yan
03:26 AM Bug #16768: multimds: check_rstat assertion failure
Patrick Donnelly wrote:
> Zheng, I think we already have "debug mds = 20", right? From the config for this run: http...
Zheng Yan
03:28 AM Bug #16886 (Need More Info): multimds: kclient hang (?) in tests
No debug log found in http://pulpito.ceph.com/pdonnell-2016-07-29_08:28:00-multimds-master---basic-mira/339886/ Zheng Yan
03:22 AM Bug #17392 (Resolved): ceph-fuse sometimes fails to terminate (failures with "reached maximum tri...
The buggy code was newly introduced and does not exist in jewel.
The jewel hang is a different bug...
Zheng Yan
02:39 AM Bug #16926: multimds: kclient fails to mount
I don't see any mount failure when using the testing branch. Besides, I see lots of snapshot-related failures. snapshot i... Zheng Yan

09/29/2016

03:45 PM Support #17171 (Closed): Ceph-fuse client hangs on unmount
John Spray
12:56 PM Support #17171: Ceph-fuse client hangs on unmount
We can probably close this issue; we'll reopen it or create a new one when we're able to reliably reproduce the issue. Arturas Moskvinas
03:38 PM Bug #17173 (Pending Backport): Duplicate damage table entries
John Spray
12:01 PM Bug #17270 (Fix Under Review): [cephfs] fuse client crash when adding a new osd
Opened https://github.com/ceph/ceph/pull/11262 for the fix for the first crash.
Opened http://tracker.ceph.com/iss...
John Spray
12:01 PM Bug #17435 (New): Crash in ceph-fuse in ObjectCacher::trim while adding an OSD

From: http://tracker.ceph.com/issues/17270, in which there was initially a crash during writes, and this appears to...
John Spray
11:48 AM Feature #17434 (Fix Under Review): qa: background rsync task for FS workunits
A client that just sits there trying to rsync the contents of the filesystem. Running this at the same time as any o... John Spray

09/28/2016

02:25 PM Feature #11171 (Resolved): Path filtering on "dump cache" asok
John Spray
02:22 PM Bug #17392 (Pending Backport): ceph-fuse sometimes fails to terminate (failures with "reached max...
John Spray
02:22 PM Bug #17253 (Pending Backport): Crash in Client::_invalidate_kernel_dcache when reconnecting durin...
John Spray
02:22 PM Bug #17286 (Pending Backport): Failure in dirfrag.sh
John Spray
02:21 PM Bug #17236 (Pending Backport): MDS goes damaged on blacklist (failed to read JournalPointer: -108...
John Spray
02:21 PM Bug #17271 (Pending Backport): Failure in snaptest-git-ceph.sh
John Spray

09/27/2016

07:10 PM Feature #17276: stick client PID in client_metadata
There's already a GID (just an increasing integer) that the monitor uniquely associates with each client session — I ... Greg Farnum
07:00 PM Feature #17276: stick client PID in client_metadata
I will take this, as I've been wanting it for my performance work.
I'm also wondering if you think it'd be useful t...
Patrick Donnelly
06:21 PM Bug #17294: mds client didn't update directory content after short network break
That sounds good to me.
It'll actually be a little finicky though — if we generate a new MDSMap that cuts the time...
Greg Farnum
04:02 PM Bug #17294: mds client didn't update directory content after short network break
Oh dear, the MDS takes mds_session_timeout from its local config, but the client uses mdsmap->get_session_timeout().... John Spray
01:21 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
I created #17413 for that. Still waiting for the issue from this ticket to pop up again. Henrik Korkuc
03:03 AM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
Henrik Korkuc wrote:
> it looks like this time I got a read request stuck for 2 days from OSD. Client is 0.94.9
>
...
Zheng Yan
03:10 AM Bug #17368 (Fix Under Review): mds: allowed GIDs must be ordered in the cephx string
A new patch on https://github.com/ceph/ceph/pull/11218 Greg Farnum

09/26/2016

07:11 PM Bug #17408 (Can't reproduce): Possible un-needed wait on rstats when listing dir?

When a client does "ls -l" on a dir where other clients are doing lots of writes, it can't see an up-to-date size w...
John Spray
04:54 PM Bug #17294: mds client didn't update directory content after short network break
Yes, there's session renewal, and the MDS shouldn't unilaterally revoke caps until that time has passed.
I think w...
Greg Farnum
02:07 PM Bug #17294: mds client didn't update directory content after short network break
Ok, so to summarize (based on speculation here -- I haven't reproduced this).
After the first ls -l, client A has ...
Jeff Layton
01:55 PM Bug #17294: mds client didn't update directory content after short network break
So we should change this to use the same value everywhere (from the mdsmap!) John Spray
03:02 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
it looks like this time I got a read request stuck for 2 days from OSD. Client is 0.94.9
client cache: ceph-post-f...
Henrik Korkuc
01:58 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
Check the cluster log and find the lines "client.xxx failing to respond to capability release". client.xxx is the guy that c... Zheng Yan
01:44 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
Hey Zheng,
can you provide instructions on how I can trace the client that blocks ops? client.2816210 in this case.
...
Henrik Korkuc
01:50 PM Bug #17308 (Fix Under Review): MDSMonitor should tolerate paxos delays without failing daemons (W...
John Spray
01:22 PM Bug #17392 (Fix Under Review): ceph-fuse sometimes fails to terminate (failures with "reached max...
https://github.com/ceph/ceph/pull/11225 Zheng Yan

09/24/2016

03:14 AM Bug #16367 (Fix Under Review): libcephfs: UID parsing breaks root squash (Ganesha FSAL)
https://github.com/ceph/ceph/pull/11218 Greg Farnum
01:48 AM Feature #3314: client: client interfaces should take a set of group ids
Passing this off since he's been doing the statx interfaces and has more experience as a library consumer. Greg Farnum

09/23/2016

05:00 PM Bug #17392 (Resolved): ceph-fuse sometimes fails to terminate (failures with "reached maximum tri...
Seen here:
http://pulpito.ceph.com/jspray-2016-09-21_18:37:24-fs-wip-jcsp-greenish-distro-basic-smithi/429512/
http...
John Spray
01:45 PM Bug #17370: knfs ffsb hang on master
There is no log and I can't reproduce this locally. Lowering the priority. Zheng Yan
10:11 AM Bug #16640: libcephfs: Java bindings failing to load on CentOS
Temporarily pinning tests to Ubuntu: https://github.com/ceph/ceph-qa-suite/pull/1186 John Spray
10:11 AM Bug #16556: LibCephFS.InterProcessLocking failing on master and jewel
Temporarily disabling: https://github.com/ceph/ceph/pull/11211 John Spray
 
