Activity
From 09/23/2016 to 10/22/2016
10/22/2016
- 10:52 PM Bug #17670 (Fix Under Review): multimds: mds entering up:replay and processing down mds aborts
- https://github.com/ceph/ceph/pull/11611
- 10:46 PM Bug #17670 (Resolved): multimds: mds entering up:replay and processing down mds aborts
- ...
10/21/2016
- 09:48 PM Feature #17249 (Resolved): cephfs tool for finding files that use named PGs
- 01:58 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
- I suspect the zeros are from stale page cache data. If you encounter the issue again, please drop the kernel page cac...
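For concreteness, the standard way to drop the page cache is "echo 3 > /proc/sys/vm/drop_caches" as root; the same operation as a small C++ program, purely as a sketch:

    #include <fstream>   // std::ofstream
    #include <unistd.h>  // sync()

    int main() {
        sync();  // flush dirty pages first so the drop is meaningful
        // 3 = free the page cache plus dentries and inodes
        std::ofstream("/proc/sys/vm/drop_caches") << "3";
        return 0;
    }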
- 01:39 PM Bug #17275 (Fix Under Review): MDS long-time blocked ops. ceph-fuse locks up with getattr of file
- https://github.com/ceph/ceph/pull/11593
- 01:21 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
- created http://tracker.ceph.com/issues/17660
- 01:03 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
- got a getattr long lock, but this time the client has long-running objecter requests. I will be filing a ticket for that...
- 05:33 AM Bug #17656: cephfs: high concurrent causing slow request
- William, could you describe the issue from Ceph's perspective in detail?
- 01:46 AM Bug #17656 (Need More Info): cephfs: high concurrent causing slow request
- background:
we use cephfs as a CDN backend. When the CDN vendor prefetches video files in cephfs, it causes high concu...
10/20/2016
- 08:44 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
- On further testing, it seems I can only make this happen when doing multi-threaded downloads from multiple hosts. I h...
- 04:39 PM Bug #17591 (Resolved): Samba failures: smbtorture, dbench.sh, fsstress.sh
- 02:53 PM Bug #17636 (Resolved): MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding
- 10:31 AM Bug #17636 (Fix Under Review): MDS crash on creating: interval_set<inodeno_t> segfaults with new ...
- https://github.com/ceph/ceph/pull/11577
- 10:23 AM Bug #17636: MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding
- Should also note that the backtraces are like:...
- 09:35 AM Bug #17636 (Resolved): MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding
The map bound_encode method does this:...
- 02:43 PM Feature #17643 (New): Verify and repair layout xattr during forward scrub
- We should do exactly the same thing we do for backtrace, but for the layout that we write into 'layout' xattr. Repai...
- 02:36 PM Feature #17639 (Fix Under Review): Repair file backtraces during forward scrub
- https://github.com/ceph/ceph/pull/11578
https://github.com/ceph/ceph-qa-suite/pull/1218
- 02:31 PM Feature #17639 (Resolved): Repair file backtraces during forward scrub
Instead of just recording that a backtrace was missing or invalid, repair it in the same way we do in verify_diri_b...
- 02:22 PM Feature #15619 (Resolved): Repair InoTable during forward scrub
- ...
- 10:38 AM Bug #17216 (Resolved): ceph_volume_client: recovery of partial auth update is broken
- 08:32 AM Bug #17216 (Pending Backport): ceph_volume_client: recovery of partial auth update is broken
- 09:35 AM Bug #17606 (Need More Info): multimds: assertion failure during directory migration
- The debug level of the log is too low. This is a dup of the long-standing bug http://tracker.ceph.com/issues/8405. I can't fi...
- 08:43 AM Bug #17629 (Duplicate): ceph_volume_client: TypeError with wrong argument count
- The fix for this just merged in master yesterday.
Python says "2 args, 3 given" because the 3 is counting the "sel... - 12:33 AM Bug #17629 (Duplicate): ceph_volume_client: TypeError with wrong argument count
- From: http://pulpito.ceph.com/pdonnell-2016-10-19_23:26:58-fs:recovery-master---basic-mira/486359/...
- 12:43 AM Bug #14195: test_full_fclose fails (tasks.cephfs.test_full.TestClusterFull)
- I just ran across this in http://pulpito.ceph.com/pdonnell-2016-10-19_23:26:58-fs:recovery-master---basic-mira/486355...
10/19/2016
- 05:39 PM Bug #17621 (Rejected): Hadoop does a bad job closing files and we end up holding too many caps
- (Just scribbling this in the tracker for a record.)
We've had several reports on the mailing list of Hadoop client...
- 05:32 PM Feature #12141: cephfs-data-scan: File size correction from backward scan
- Unassigning as I believe this was dropped?
- 04:48 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
- No I only see:
Oct 19 15:34:52 phx-r2-r3-comp5 kernel: [683656.055247] libceph: client259809 fsid <redacted>
Oc...
- 04:28 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
- Were there any ceph related kernel message on the client hosts? (such as "ceph: mds0 caps stale")
- 03:49 PM Bug #17620 (Resolved): Data Integrity Issue with kernel client vs fuse client
- ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
kernel: 4.4.0-42-generic #62~14.04.1-Ubuntu SMP
I ...
- 03:07 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- The cephx user, not the *nix user. I'd be fine with the other stuff — we already run (all our?) nightlies with our ow...
- 03:04 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- I don't quite understand. The reproducer that does the ceph_ll_mkdir does it as an unprivileged user, so why is it al...
- 02:46 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- There we go, that case would concern me.
Contrary to our discussion in standup today, I think MDSAuthCaps::is_capa...
- 12:25 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- No, the order has already been established, because each operation _really_ starts with the permission check. But ok,...
- 04:08 AM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- No, I mean: I don't think this counts as a bug. To get in this situation we have two outstanding requests:
1) chmod
...
- 02:54 PM Backport #17617 (Resolved): jewel: [cephfs] fuse client crash when adding a new osd
- https://github.com/ceph/ceph/pull/11860
- 02:53 PM Backport #17615 (Resolved): jewel: mds: false "failing to respond to cache pressure" warning
- https://github.com/ceph/ceph/pull/11861
- 01:14 PM Bug #17606: multimds: assertion failure during directory migration
- Fixed!
- 03:49 AM Bug #17606: multimds: assertion failure during directory migration
- I can't access /ceph/cephfs-perf/tmp/2016-10-18/ceph-mds.ceph-mds0.log. Please change the permissions.
- 11:56 AM Feature #16016 (Resolved): Populate DamageTable from forward scrub
- 10:24 AM Bug #17270 (Pending Backport): [cephfs] fuse client crash when adding a new osd
- 10:21 AM Bug #17611 (Resolved): mds: false "failing to respond to cache pressure" warning
Creating this ticket retroactively for a ticketless PR that needs backporting: https://github.com/ceph/ceph/pull/11373
- 07:04 AM Bug #17172 (Resolved): Failure in snaptest-git-ceph.sh
10/18/2016
- 11:35 PM Feature #17604: MDSMonitor: raise health warning when there are no standbys but there should be
- I'll take this one.
- 11:44 AM Feature #17604 (Resolved): MDSMonitor: raise health warning when there are no standbys but there ...
We could have a per-filesystem setting for how many standbys it wants to be available. Then if the available set o...
- 09:22 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- FWIW, "little holes" like this are worse in that they'll often strike when things are really redlined. If the race wi...
- 09:02 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- The client has to send out both a cap drop message and the user's request.
Those will be ordered, and if the cap dro...
- 08:34 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- Will they be ordered after the reply comes in, or just ordered wrt how they go out onto the wire?
IOW, is there so...
- 08:13 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- Luckily the messages are ordered so once a client sends off a request to the MDS, any caps it drops will be ordered a...
- 07:22 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- I don't think that's sufficient.
My understanding is that once you drop the client_lock mutex (as the client does ...
- 06:54 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- The MDS does enforce cephx-based permissions, but unless you've deliberately constructed a limited cephx key it's goi...
- 04:27 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- Ok, and the reason setting fuse_default_permissions didn't work for me yesterday was due to a typo. When I set that o...
- 03:26 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- Ok, here's a "backported" test program. Both mkdirs still succeed, but when I set fuse_default_permissions to false o...
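The test program itself isn't reproduced here; a hypothetical condensed version of the same idea, using the UserPerm-flavored API that was landing around this time (treat the exact ceph_ll_mkdir signature as approximate), might look like:

    #include <cephfs/libcephfs.h>
    #include <cstdio>

    // Hypothetical sketch, not the attached reproducer: with
    // fuse_default_permissions = false, a mkdir in a root-owned 0755
    // directory issued as an unprivileged uid should fail with -EACCES.
    int main() {
        struct ceph_mount_info *cmount;
        struct Inode *root, *out;
        struct ceph_statx stx;

        ceph_create(&cmount, NULL);
        ceph_conf_read_file(cmount, NULL);
        ceph_conf_set(cmount, "fuse_default_permissions", "false");
        ceph_mount(cmount, "/");

        ceph_ll_lookup_root(cmount, &root);
        // Unprivileged identity: uid/gid 4711, no supplementary groups.
        UserPerm *unpriv = ceph_userperm_new(4711, 4711, 0, NULL);

        int rc = ceph_ll_mkdir(cmount, root, "permtest", 0755, &out,
                               &stx, 0, 0, unpriv);
        printf("mkdir as uid 4711 -> %d (expect -EACCES)\n", rc);

        ceph_userperm_destroy(unpriv);
        ceph_unmount(cmount);
        ceph_release(cmount);
        return rc == 0;  // success here would demonstrate the bug
    }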
- 01:02 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- No, I don't think so. Here is ceph_userperm_new in my tree:...
- 03:22 PM Bug #17606 (Resolved): multimds: assertion failure during directory migration
- This is from an experiment on Linode with 9 active MDS, 32 OSD, and 128 clients building the kernel....
- 09:47 AM Bug #17591 (Fix Under Review): Samba failures: smbtorture, dbench.sh, fsstress.sh
- https://github.com/ceph/ceph/pull/11526
- 09:23 AM Bug #17591 (In Progress): Samba failures: smbtorture, dbench.sh, fsstress.sh
- the samba daemon changes its own euid/egid before calling libcephfs functions. But libcephfs sets up a default_perm on init...
- 02:44 AM Bug #17563: extremely slow ceph_fsync calls
- No method so far, need to extend the MClientCaps message
10/17/2016
- 09:45 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- And at a guess, you might be using the uid,gid I presume you have there (0,0) to create folders with the ownership sp...
- 09:39 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
- Quick sanity check: is fuse_default_permissions set to false? Otherwise the assumption is that fuse (or nfs, samba) ...
- 09:02 PM Bug #17594 (In Progress): cephfs: permission checking not working (MDS should enforce POSIX permi...
- Frank Filz noticed that cephfs doesn't seem to be enforcing permissions properly, particularly on mkdir.
This test...
- 05:06 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
- Yep, still failing on master.
- 02:23 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
- Rerun of failures on master (a1fd258)
http://pulpito.ceph.com/jspray-2016-10-17_14:22:43-samba-master---basic-mira/
- 02:19 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
- Failures are on a4ce1f56c1e22d4d3c190403339d5082d66fc89c, last passes were on cfc0a16e048a5868b137e2b7b89c7ae105218bc...
- 02:16 PM Bug #17591 (Resolved): Samba failures: smbtorture, dbench.sh, fsstress.sh
Samba tests were some of the most stable; they are suddenly failing on master.
http://pulpito.ceph.com/teutholog...
- 03:14 PM Bug #17286 (Resolved): Failure in dirfrag.sh
- 03:14 PM Bug #17271 (Resolved): Failure in snaptest-git-ceph.sh
- 03:14 PM Bug #17253 (Resolved): Crash in Client::_invalidate_kernel_dcache when reconnecting during unmount
- 03:14 PM Bug #17173 (Resolved): Duplicate damage table entries
- 03:14 PM Feature #16973 (Resolved): Log path as well as ino when detecting metadata damage
- 03:14 PM Bug #16668 (Resolved): client: nlink count is not maintained correctly
- 02:38 PM Bug #17547 (Resolved): ceph-fuse 10.2.3 segfault
- 02:32 PM Bug #17547: ceph-fuse 10.2.3 segfault
- looks good so far. I guess we can close this. Thanks!
- 01:42 PM Bug #17547: ceph-fuse 10.2.3 segfault
- Henrik: did that build fix the issue for you?
- 12:31 PM Backport #16946 (Resolved): jewel: client: nlink count is not maintained correctly
- 12:30 PM Backport #17244 (Resolved): jewel: Failure in snaptest-git-ceph.sh
- 12:30 PM Backport #17246 (Resolved): jewel: Log path as well as ino when detecting metadata damage
- 12:30 PM Backport #17474 (Resolved): jewel: Failure in dirfrag.sh
- 12:30 PM Backport #17476 (Resolved): jewel: Failure in snaptest-git-ceph.sh
- 12:30 PM Backport #17477 (Resolved): jewel: Crash in Client::_invalidate_kernel_dcache when reconnecting d...
- 12:30 PM Backport #17479 (Resolved): jewel: Duplicate damage table entries
- 09:06 AM Bug #17562 (Fix Under Review): backtrace check fails when scrubbing directory created by fsstress
- https://github.com/ceph/ceph/pull/11517/
10/16/2016
- 10:45 AM Bug #17563: extremely slow ceph_fsync calls
- Yep, exactly. I noticed this while running the cthon04 testsuite against it. It copies some source files into place a...
10/15/2016
- 10:06 PM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
- Good point. Yeah, a retry loop may be the best we can do in that case. I'll have to read up on rados asserts to make ...
- 01:16 AM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
- Are we *allowed* to fail writes like that? :/
It doesn't actually close the race in all cases, but for sane use it...
- 09:39 AM Backport #17582 (Resolved): jewel: monitor assertion failure when deactivating mds in (invalid) f...
- https://github.com/ceph/ceph/pull/11862
- 09:34 AM Bug #16610 (Resolved): Jewel: segfault in ObjectCacher::FlusherThread
- 08:28 AM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
- Further analysis points to a correlation between the MDS config parameter mds_log_max_segments and the I/O drop. Using d...
10/14/2016
- 11:39 PM Bug #17563: extremely slow ceph_fsync calls
- Oh right, this doesn't even have a way to ask the MDS for immediate service — notice how it just puts already-sent re...
- 11:27 PM Bug #17562: backtrace check fails when scrubbing directory created by fsstress
- It looks to me like compare() is doing the right thing, but that the scrubbing code is declaring an error if the on-d...
- 11:19 PM Bug #17562: backtrace check fails when scrubbing directory created by fsstress
- Well, in inode_backtrace_t::compare() we're trying to determine and return three different things, as described in th...
- 08:06 PM Backport #17131 (Resolved): jewel: Jewel: segfault in ObjectCacher::FlusherThread
- 05:06 PM Bug #16255 (Resolved): ceph-create-keys: sometimes blocks forever if mds "allow" is set
- 02:26 PM Backport #17347 (Resolved): jewel: ceph-create-keys: sometimes blocks forever if mds "allow" is set
- 10:04 AM Bug #17518 (Pending Backport): monitor assertion failure when deactivating mds in (invalid) fscid 0
- 07:10 AM Bug #17525 (Duplicate): "[ FAILED ] LibCephFS.ThreesomeInterProcessRecordLocking" in smoke
- 07:03 AM Bug #17525: "[ FAILED ] LibCephFS.ThreesomeInterProcessRecordLocking" in smoke
- In jewel http://pulpito.ceph.com/loic-2016-10-13_17:04:37-fs-jewel-backports-distro-basic-smithi/471638/...
10/13/2016
- 07:00 PM Bug #17563: extremely slow ceph_fsync calls
- This program is a reproducer. You can build it with something like ...
- 01:10 PM Bug #17563: extremely slow ceph_fsync calls
- Is that something we can change? Slow fsync() performance is particularly awful for applications.
In any case, the...
- 11:46 AM Bug #17563: extremely slow ceph_fsync calls
- The reason is that client fsync does not force MDS flush its journal. fsync may wait up to a MDS tick if there is no ...
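To make the tick-length stall visible from userspace, a hypothetical timing harness against libcephfs (not the reproducer attached to this ticket) might look like:

    #include <cephfs/libcephfs.h>
    #include <fcntl.h>
    #include <time.h>
    #include <cstdio>

    // Hypothetical sketch: write one byte, then time ceph_fsync(). If
    // the MDS only flushes its journal on its periodic tick, the call
    // should take a sizable fraction of a second, not milliseconds.
    int main() {
        struct ceph_mount_info *cmount;
        ceph_create(&cmount, NULL);
        ceph_conf_read_file(cmount, NULL);
        ceph_mount(cmount, "/");

        int fd = ceph_open(cmount, "/fsync-test", O_CREAT | O_WRONLY, 0644);
        ceph_write(cmount, fd, "x", 1, 0);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        ceph_fsync(cmount, fd, 0);  // 0: sync metadata as well as data
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("fsync: %.3f s\n", (t1.tv_sec - t0.tv_sec) +
                                  (t1.tv_nsec - t0.tv_nsec) / 1e9);

        ceph_close(cmount, fd);
        ceph_unmount(cmount);
        ceph_release(cmount);
        return 0;
    }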
- 10:22 AM Bug #17563 (Resolved): extremely slow ceph_fsync calls
- I've been seeing problems with very slow fsyncs vs. a vstart cluster when I run ganesha on top of it. This is the gan...
- 03:51 PM Bug #17466 (Resolved): MDSMonitor: non-existent standby_for_fscid not caught
- 03:51 PM Bug #17197 (Resolved): ceph-fuse crash in Client::get_root_ino
- 03:51 PM Bug #17105 (Resolved): multimds: allow_multimds not required when max_mds is set in ceph.conf at ...
- 03:51 PM Bug #16764 (Resolved): ceph-fuse crash on force unmount with file open
- 03:51 PM Bug #16066 (Resolved): client: FAILED assert(root_ancestor->qtree == __null)
- 01:27 PM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
- For the record -- I really don't care for O_APPEND semantics, but...
_write() does this currently:...
- 11:41 AM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
- Not that easy to do. The MDS does not always issue the caps the client wants. In this case, the client wants Fsw, but the MDS may on...
- 10:59 AM Bug #17564 (New): close race window when handling writes on a file descriptor opened with O_APPEND
- This comment is in _write() in the userland client code:...
- 01:02 PM Backport #16313 (Resolved): jewel: client: FAILED assert(root_ancestor->qtree == __null)
- 12:45 PM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
- John, there are two test cases.
1. Load files into file system. We created 900k files of 1k size
2. Run I/O tests s...
- 09:15 AM Bug #17562 (Resolved): backtrace check fails when scrubbing directory created by fsstress
- when scrubbing directories created by fsstress, there are lots of false backtrace errors. The bug is in inode_backtrace_t::compar...
- 06:54 AM Backport #17207 (Resolved): jewel: ceph-fuse crash on force unmount with file open
- 06:44 AM Backport #17206 (Resolved): jewel: ceph-fuse crash in Client::get_root_ino
- 06:44 AM Backport #17264 (Resolved): jewel: multimds: allow_multimds not required when max_mds is set in c...
- 06:43 AM Backport #17557 (Resolved): jewel: MDSMonitor: non-existent standby_for_fscid not caught
10/12/2016
- 04:25 PM Feature #17276 (Resolved): stick client PID in client_metadata
- 04:22 PM Bug #17531 (Resolved): mds fails to respawn if executable has changed
10/11/2016
- 08:57 AM Backport #17244 (In Progress): jewel: Failure in snaptest-git-ceph.sh
- 08:56 AM Backport #17246 (In Progress): jewel: Log path as well as ino when detecting metadata damage
- 08:52 AM Backport #17347 (In Progress): jewel: ceph-create-keys: sometimes blocks forever if mds "allow" i...
- 08:51 AM Backport #17474 (In Progress): jewel: Failure in dirfrag.sh
- 08:50 AM Backport #17476 (In Progress): jewel: Failure in snaptest-git-ceph.sh
- 08:45 AM Backport #17477 (In Progress): jewel: Crash in Client::_invalidate_kernel_dcache when reconnectin...
- 08:40 AM Backport #17478 (In Progress): jewel: MDS goes damaged on blacklist (failed to read JournalPointe...
- 08:34 AM Backport #17479 (In Progress): jewel: Duplicate damage table entries
- 08:32 AM Backport #17557 (In Progress): jewel: MDSMonitor: non-existent standby_for_fscid not caught
- 08:08 AM Backport #17557 (Resolved): jewel: MDSMonitor: non-existent standby_for_fscid not caught
- https://github.com/ceph/ceph/pull/11389
10/10/2016
- 09:15 PM Bug #17548: should userland ceph_llseek do permission checking?
- ...
- 08:47 PM Bug #17548: should userland ceph_llseek do permission checking?
- Actually, now I'm confused. What's the failing test, if this isn't a case users should run into anyway?
- 08:42 PM Bug #17548: should userland ceph_llseek do permission checking?
- There are several ways, but yeah...it comes down to being careful to close out old file descriptors (or use O_CLOEXEC...
- 08:40 PM Bug #17548: should userland ceph_llseek do permission checking?
- Yeah, sorry. I know it's possible for a process to drop permissions somehow or other.
- 08:36 PM Bug #17548: should userland ceph_llseek do permission checking?
- That's just the way UNIX (and hence POSIX) works. UNIX pipelines require file descriptor inheritance.
I'm not sure...
- 08:12 PM Bug #17548: should userland ceph_llseek do permission checking?
- But...isn't that insecure by design? Why should we assume an open FD is still valid on a setuid'ed fork(), for instance?
- 08:07 PM Bug #17548: should userland ceph_llseek do permission checking?
- So does fstat, and we don't do any special permission checking there either.
I guess my view is that if you're iss...
- 07:56 PM Bug #17548: should userland ceph_llseek do permission checking?
- We probably want to follow the kernel. But when I read the man pages and think about security, it seems like it shoul...
- 03:53 PM Bug #17548: should userland ceph_llseek do permission checking?
- Seems reasonable to remove.
- 11:48 AM Bug #17548 (Resolved): should userland ceph_llseek do permission checking?
- One of the test failures here:
http://qa-proxy.ceph.com/teuthology/jlayton-2016-10-08_00:13:32-fs-wip-jlayton-...
- 01:51 PM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
- Vishal: are you still investigating this or do you need some input from others?
- 01:46 PM Bug #17522 (In Progress): ceph_readdirplus_r does not acquire caps before sending back attributes
- 12:35 PM Bug #17547: ceph-fuse 10.2.3 segfault
- I am preparing new build with a patch from https://github.com/ceph/ceph/pull/10921
- 12:12 PM Bug #17547: ceph-fuse 10.2.3 segfault
- It is happening on 10.2.3 cluster too (just checked another cluster). It's just that I took this segfault from client...
- 12:11 PM Bug #17547: ceph-fuse 10.2.3 segfault
- Seems likely to be same issue fixed by https://github.com/ceph/ceph/pull/10921 (i.e. duplicate of http://tracker.ceph...
- 12:10 PM Bug #17547: ceph-fuse 10.2.3 segfault
- > ceph-fuse 10.2.3 segfaults on 10.2.2 Ceph cluster.
Are you asserting that it *doesn't* segfault on a 10.2.3 ceph...
- 11:38 AM Bug #17547 (Resolved): ceph-fuse 10.2.3 segfault
- ceph-fuse 10.2.3 segfaults on 10.2.2 Ceph cluster.
ceph-fuse has #17275 and https://github.com/ceph/ceph/pull/1086...
- 11:24 AM Bug #17466 (Pending Backport): MDSMonitor: non-existent standby_for_fscid not caught
- Jewel backport PR here: https://github.com/ceph/ceph/pull/11389
10/07/2016
- 11:16 PM Feature #17276: stick client PID in client_metadata
- Testing an email filter. Please ignore...
- 11:44 AM Feature #17532: qa: repeated "rsync --link-dest" workload
- Also: ensure that the test is running with an "mds cache size" setting smaller than the directory being backed up.
- 11:41 AM Feature #17532 (New): qa: repeated "rsync --link-dest" workload
- Related but distinct: http://tracker.ceph.com/issues/17434
We should have a test that uses the --link-dest feature...
10/06/2016
- 11:09 PM Bug #17531 (Fix Under Review): mds fails to respawn if executable has changed
- PR: https://github.com/ceph/ceph/pull/11362
- 10:48 PM Bug #17531 (Resolved): mds fails to respawn if executable has changed
- If the mds is failed via `ceph mds fail` and the executable file has changed, the mds will fail to respawn using exec...
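If I understand the failure mode right, respawn exec()s the saved path, which no longer matches the running image; an illustrative (not the actual MDS code) shape of the fix:

    #include <unistd.h>

    // Illustrative only: if the on-disk binary was replaced, exec'ing
    // the saved path runs a different executable (or fails outright);
    // /proc/self/exe re-executes the image that is already running.
    void respawn(char *const argv[]) {
        execv("/proc/self/exe", argv);  // prefer the running image
        execv(argv[0], argv);           // fall back to the original path
        _exit(1);                       // both execs failed
    }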
- 09:19 PM Feature #17276 (Fix Under Review): stick client PID in client_metadata
- PR: https://github.com/ceph/ceph/pull/11359
- 08:04 PM Bug #17518 (Fix Under Review): monitor assertion failure when deactivating mds in (invalid) fscid 0
- PR: https://github.com/ceph/ceph/pull/11357
- 04:17 PM Bug #17525 (Duplicate): "[ FAILED ] LibCephFS.ThreesomeInterProcessRecordLocking" in smoke
- Run: http://pulpito.ceph.com/teuthology-2016-10-06_05:00:03-smoke-master-testing-basic-vps/
Job: 457132
Logs: http:...
- 11:13 AM Bug #17522 (Resolved): ceph_readdirplus_r does not acquire caps before sending back attributes
- In order to send a reliable set of attributes back to the caller, cephfs should ensure that the client has shared cap...
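For context, the call is typically used along these lines (older stat-based signature; treat details as approximate):

    #include <cephfs/libcephfs.h>
    #include <dirent.h>
    #include <sys/stat.h>
    #include <cstdio>

    // Hypothetical usage sketch: walk a directory with readdirplus and
    // print each entry's size. The bug is that the attributes returned
    // here are not necessarily backed by caps, so they can be stale.
    void list_with_attrs(struct ceph_mount_info *cmount, const char *path) {
        struct ceph_dir_result *dirp;
        if (ceph_opendir(cmount, path, &dirp) < 0)
            return;
        struct dirent de;
        struct stat st;
        int stmask;
        while (ceph_readdirplus_r(cmount, dirp, &de, &st, &stmask) > 0)
            printf("%s: %lld bytes\n", de.d_name, (long long)st.st_size);
        ceph_closedir(cmount, dirp);
    }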
- 10:38 AM Backport #15999 (Resolved): jewel: CephFSVolumeClient: read-only authorization for volumes
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/9558
> https://github.com/ceph/ceph-qa-suite/pull/1041
...
- 10:34 AM Feature #15614 (Resolved): CephFSVolumeClient: read-only authorization for volumes
10/05/2016
- 09:27 PM Bug #17518 (Resolved): monitor assertion failure when deactivating mds in (invalid) fscid 0
- Made a simple vstart cluster and tried this:...
- 09:12 PM Bug #17517 (Resolved): ceph-fuse: does not handle ENOMEM during remount
- When running 8 ceph-fuse clients on a single Linode VM with 2GB memory, I observed the clients would die under load w...
- 07:43 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- I ended up going with a journal flush and a ceph fs reset. The mds is now up and clients can mount. Thank you both fo...
- 02:56 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- Digging through that patch, I'm not sure it is setup to work with the mdsmap->fsmap conversion that happened at some ...
- 01:24 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- Hmm, I guess the --yes-i-really-mean-it part isn't helping! (http://tracker.ceph.com/issues/14379)
I've attached Z...
- 01:06 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- Well, crap. I don't remember running that, and it wasn't in the bash_history on the monitor I've been using, but:
...
- 12:24 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- Any rank that is in the 'in' set should also be in one of failed, stopped, damaged, or up.
I notice that in the ...
- 04:25 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- http://people.cs.ksu.edu/~mozes/ceph-mon.hobbit01-debug-branch.log
Unfortunately, it would seem we're bailing out ...
- 12:26 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- mon level 20 should do it.
10/04/2016
- 11:34 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- I'm building it now. What logs and levels would you like me to collect?
- 10:57 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- I built a debug commit and merged in wip-17466-jewel; named the result wip-1746-jewel-debug.
- 10:05 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- there was definitely an encoding issue. I went back through the historical mdsmaps and it wasn't always insisting it ...
- 09:57 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- I think that if you use the repair tools to flush the MDS journal and then do an fs reset you should get everything b...
- 09:43 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- No, you're correct. The part you're missing is MDSMonitor::maybe_expand_cluster(), invoked from MDSMonitor::tick(). I...
- 07:37 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- https://github.com/ceph/ceph/blob/master/src/mon/MDSMonitor.cc#L2865
If I'm reading the code paths correctly, and ... - 11:02 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- Gah, I thought I saw it (created new FS and set standby_for_fscid, then saw it stuck in standby), but a few daemon re...
- 10:44 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- There is some other bug here with standby_for_rank and standby_for_fscid set correctly, I can reproduce it locally. ...
- 12:38 PM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
- 1. MDS runs with default configuration.
2. On MDS node, I ran following commands:
# ceph --admin-daemon ceph-md... - 11:32 AM Bug #17216: ceph_volume_client: recovery of partial auth update is broken
- Proposed fix: https://github.com/ceph/ceph/pull/11304
- 05:32 AM Bug #17368 (Resolved): mds: allowed GIDs must be ordered in the cephx string
- 05:32 AM Bug #16367 (Resolved): libcephfs: UID parsing breaks root squash (Ganesha FSAL)
10/03/2016
- 11:57 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- ceph-post-file: 731781d4-c78c-47f9-9de3-e151b5c1a5f9
Above is the Monitor log with a fresh boot of the mds.
- 10:06 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- That FSMap contains a standby with '"standby_for_fscid": 0', but the existing filesystem is '"id": 2'.
Had him set i...
- 03:18 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- In fact, the mdsmap epoch is quite a lot larger than I remember it. I think it was in the low 50's before this issue...
- 03:08 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- I've cherry-picked the mds and mon fixes from your pull-request onto jewel and built a new version. This allowed me t...
- 06:46 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- wip-17466-jewel is building with a minimal fix rebased on v10.2.3
https://github.com/ceph/ceph/pull/11281 has the ...
- 06:39 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- I think this is happening when older MDSs send a beacon to newer mons, because MMDSBeacon isn't initialising POD fiel...
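Illustratively (not the real MMDSBeacon code), the class of bug being described:

    #include <cstdint>

    // Illustrative only: a POD member left out of the constructor's
    // init list holds garbage, and that garbage goes over the wire when
    // an older sender never sets the field explicitly.
    struct BeaconLike {
        int32_t standby_for_fscid;               // POD: no implicit zeroing
        BeaconLike() : standby_for_fscid(-1) {}  // the fix: default it
    };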
- 06:30 AM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- ...
- 01:42 PM Bug #17468 (In Progress): CephFs: IO Pauses for more than a 40 seconds, while running write inten...
- 01:27 PM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
- I collected stack unwind traces in MDS process while no I/O was observed during write ops on cephFS. Stack trace freq...
- 01:28 PM Backport #17479 (Resolved): jewel: Duplicate damage table entries
- https://github.com/ceph/ceph/pull/11412
- 01:28 PM Backport #17478 (Resolved): jewel: MDS goes damaged on blacklist (failed to read JournalPointer: ...
- https://github.com/ceph/ceph/pull/11413
- 01:28 PM Backport #17477 (Resolved): jewel: Crash in Client::_invalidate_kernel_dcache when reconnecting d...
- https://github.com/ceph/ceph/pull/11414
- 01:28 PM Backport #17476 (Resolved): jewel: Failure in snaptest-git-ceph.sh
- https://github.com/ceph/ceph/pull/11415
- 01:27 PM Backport #17474 (Resolved): jewel: Failure in dirfrag.sh
- https://github.com/ceph/ceph/pull/11416
10/02/2016
- 08:34 PM Bug #10944: Deadlock, MDS logs "slow request", getattr pAsLsXsFs failed to rdlock
- Oliver: use "ceph daemon mds.<id> objecter_requests" to see if the MDS is stuck waiting for operations from the OSDs.
10/01/2016
- 06:02 AM Bug #17468 (Closed): CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
- CephFs Environment:
1 MDS node
1 Ceph Kernel Filesystem, Ubuntu 14.04 LTS, kernel version 4.4.0-36-generic
Test ...
09/30/2016
- 09:50 PM Bug #17466: MDSMonitor: non-existent standby_for_fscid not caught
- So now it seems like although the order of members was whacky, calls to = {fscid, rank} were actually assigning the v...
- 08:59 PM Bug #17466 (Fix Under Review): MDSMonitor: non-existent standby_for_fscid not caught
- https://github.com/ceph/ceph/pull/11281
- 08:48 PM Bug #17466 (Resolved): MDSMonitor: non-existent standby_for_fscid not caught
- We've got it using aggregate initialization (that is, the explicit {a, b} syntax), but we're ordering the rank and f...
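The pitfall is that aggregate initialization assigns strictly in declaration order; a toy example (names echo mds_role_t, but this is illustrative):

    // Illustrative: {a, b} fills members in declaration order, so writing
    // {rank, fscid} against a struct declared as {fscid, rank} silently
    // swaps the two values -- it compiles and runs without complaint.
    struct role_t { int fscid; int rank; };

    role_t ok   = {2, 0};  // fscid = 2, rank = 0: matches declaration order
    role_t oops = {0, 2};  // meant rank = 0, fscid = 2; actually
                           // fscid = 0, rank = 2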
- 09:19 PM Feature #17434: qa: background rsync task for FS workunits
- When we build this, we should include some performance metrics gathering. Many of our bugs around cap handling here a...
- 04:27 PM Bug #10944: Deadlock, MDS logs "slow request", getattr pAsLsXsFs failed to rdlock
- Hi,
I fully understand that these things might not be trivial.
But we have again that issue. Due to a network p... - 03:34 AM Bug #17069: multimds: slave rmdir assertion failure
- snapshot bug, lowering the priority
- 02:45 AM Bug #17069: multimds: slave rmdir assertion failure
- please don't run snapshot tests on multimds, they are known to be broken.
- 03:33 AM Bug #16768 (Need More Info): multimds: check_rstat assertion failure
- 03:26 AM Bug #16768: multimds: check_rstat assertion failure
- Patrick Donnelly wrote:
> Zheng, I think we already have "debug mds = 20", right? From the config for this run: http... - 03:28 AM Bug #16886 (Need More Info): multimds: kclient hang (?) in tests
- no debug log found in http://pulpito.ceph.com/pdonnell-2016-07-29_08:28:00-multimds-master---basic-mira/339886/
- 03:22 AM Bug #17392 (Resolved): ceph-fuse sometimes fails to terminate (failures with "reached maximum tri...
- the buggy code was newly introduced and does not exist in jewel.
The jewel hang is a different bug...
- I don't see any mount failure when using testing branch. Besides, I see lots of snapshot related failures. snapshot i...
09/29/2016
- 03:45 PM Support #17171 (Closed): Ceph-fuse client hangs on unmount
- 12:56 PM Support #17171: Ceph-fuse client hangs on unmount
- We can probably close this issue; we'll reopen it or create a new one when we're able to reliably reproduce it.
- 03:38 PM Bug #17173 (Pending Backport): Duplicate damage table entries
- 12:01 PM Bug #17270 (Fix Under Review): [cephfs] fuse client crash when adding a new osd
- Opened https://github.com/ceph/ceph/pull/11262 for the fix for the first crash.
Opened http://tracker.ceph.com/iss...
- 12:01 PM Bug #17435 (New): Crash in ceph-fuse in ObjectCacher::trim while adding an OSD
From: http://tracker.ceph.com/issues/17270, in which there was initially a crash during writes, and this appears to...
- 11:48 AM Feature #17434 (Fix Under Review): qa: background rsync task for FS workunits
- A client that just sits there trying to rsync the contents of the filesystem. Running this at the same time as any o...
09/28/2016
- 02:25 PM Feature #11171 (Resolved): Path filtering on "dump cache" asok
- 02:22 PM Bug #17392 (Pending Backport): ceph-fuse sometimes fails to terminate (failures with "reached max...
- 02:22 PM Bug #17253 (Pending Backport): Crash in Client::_invalidate_kernel_dcache when reconnecting durin...
- 02:22 PM Bug #17286 (Pending Backport): Failure in dirfrag.sh
- 02:21 PM Bug #17236 (Pending Backport): MDS goes damaged on blacklist (failed to read JournalPointer: -108...
- 02:21 PM Bug #17271 (Pending Backport): Failure in snaptest-git-ceph.sh
09/27/2016
- 07:10 PM Feature #17276: stick client PID in client_metadata
- There's already a GID (just an increasing integer) that the monitor uniquely associates with each client session — I ...
- 07:00 PM Feature #17276: stick client PID in client_metadata
- I will take this as I've been wanting it for my performance work.
I'm also wondering if you think it'd be useful t...
- 06:21 PM Bug #17294: mds client didn't update directory content after short network break
- That sounds good to me.
It'll actually be a little finicky though — if we generate a new MDSMap that cuts the time... - 04:02 PM Bug #17294: mds client didn't update directory content after short network break
- Oh dear, the MDS takes mds_session_timeout from its local config, but the client uses mdsmap->get_session_timeout()....
- 01:21 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
- I created #17413 for that. Still waiting for the problem in this issue to pop up again.
- 03:03 AM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
- Henrik Korkuc wrote:
> it looks like this time I got a read request stuck for 2 days from OSD. Client is 0.94.9
>
...
- 03:10 AM Bug #17368 (Fix Under Review): mds: allowed GIDs must be ordered in the cephx string
- A new patch on https://github.com/ceph/ceph/pull/11218
09/26/2016
- 07:11 PM Bug #17408 (Can't reproduce): Possible un-needed wait on rstats when listing dir?
When a client does "ls -l" on a dir where other clients are doing lots of writes, it can't see an up to date size w...
- 04:54 PM Bug #17294: mds client didn't update directory content after short network break
- Yes, there's session renewal, and the MDS shouldn't unilaterally revoke caps until that time has passed.
I think w...
- 02:07 PM Bug #17294: mds client didn't update directory content after short network break
- Ok, so to summarize (based on speculation here -- I haven't reproduced this).
After the first ls -l, client A has ...
- 01:55 PM Bug #17294: mds client didn't update directory content after short network break
- So we should change this to use the same value everywhere (from the mdsmap!)
- 03:02 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
- it looks like this time I got a read request stuck for 2 days from OSD. Client is 0.94.9
client cache: ceph-post-f...
- 01:58 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
- check the cluster log, find lines "client.xxx failing to respond to capability release". client.xxx is the guy that c...
- 01:44 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
- hey Zheng,
can you provide me with instructions on how to trace the client which blocks ops? client.2816210 in this case.
...
- 01:50 PM Bug #17308 (Fix Under Review): MDSMonitor should tolerate paxos delays without failing daemons (W...
- 01:22 PM Bug #17392 (Fix Under Review): ceph-fuse sometimes fails to terminate (failures with "reached max...
- https://github.com/ceph/ceph/pull/11225
09/24/2016
- 03:14 AM Bug #16367 (Fix Under Review): libcephfs: UID parsing breaks root squash (Ganesha FSAL)
- https://github.com/ceph/ceph/pull/11218
- 01:48 AM Feature #3314: client: client interfaces should take a set of group ids
- Passing this off since he's been doing the statx interfaces and has more experience as a library consumer.
09/23/2016
- 05:00 PM Bug #17392 (Resolved): ceph-fuse sometimes fails to terminate (failures with "reached maximum tri...
- Seen here:
http://pulpito.ceph.com/jspray-2016-09-21_18:37:24-fs-wip-jcsp-greenish-distro-basic-smithi/429512/
http...
- 01:45 PM Bug #17370: knfs ffsb hang on master
- there is no log and I can't reproduce this locally. Lowering the priority.
- 10:11 AM Bug #16640: libcephfs: Java bindings failing to load on CentOS
- Temporarily pinning tests to Ubuntu: https://github.com/ceph/ceph-qa-suite/pull/1186
- 10:11 AM Bug #16556: LibCephFS.InterProcessLocking failing on master and jewel
- Temporarily disabling: https://github.com/ceph/ceph/pull/11211