Activity

From 10/14/2016 to 11/12/2016

11/12/2016

12:23 PM Backport #13927 (In Progress): hammer: cephfs-java ftruncate unit test failure
Nathan Cutler
12:22 PM Bug #11258: cephfs-java ftruncate unit test failure
*master PR*: https://github.com/ceph/ceph/pull/4215 Nathan Cutler

11/11/2016

02:21 PM Bug #17800 (Fix Under Review): ceph_volume_client.py : Error: Can't handle arrays of non-strings
PR for fix: https://github.com/ceph/ceph/pull/11917 Ramana Raja
11:13 AM Bug #17800 (In Progress): ceph_volume_client.py : Error: Can't handle arrays of non-strings
Ramana Raja
11:24 AM Bug #17747 (Fix Under Review): ceph-mds: remove "--journal-check" help text
Oops, forgot to clean up the manpage as well. Follow-up PR: https://github.com/ceph/ceph/pull/11912 Nathan Cutler
11:19 AM Bug #17747 (Resolved): ceph-mds: remove "--journal-check" help text
Nathan Cutler
11:05 AM Feature #17853 (In Progress): More deterministic timing for directory fragmentation
John Spray
06:46 AM Bug #17837: ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3
Yes, I've stopped my update and my cluster is working now with two mon servers.
Perhaps it is helpful: I have a test clu...
alexander walker
06:46 AM Bug #16592: Jewel: monitor asserts on "mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMa...
I've created the ticket #17837.
Here is the output from "ceph mds dump --format=json-pretty"...
alexander walker

11/10/2016

03:38 PM Bug #17858 (Resolved): Cannot create deep directories when caps contain "path=/somepath"
A ceph-fuse client whose caps contain "path=/something" cannot create multiple directories with mkdir (e.g. mkdir -p 1/2/3/4/5/6/7/8/... Henrik Korkuc
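
For illustration only, a minimal reproducer sketch (the mount point and caps setup are assumptions, not taken from the report):

    import os

    # Assumed setup: a ceph-fuse mount at MOUNT, made with a client whose MDS
    # caps are restricted with something like "allow rw path=/something".
    MOUNT = "/mnt/cephfs"

    # Equivalent of "mkdir -p": per the report, deep creation fails under the
    # path-restricted caps even though creating a single directory works.
    os.makedirs(os.path.join(MOUNT, "1/2/3/4/5/6/7/8"))
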
02:27 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
The jobs failed by incorrectly writing the object to disk. I can recreate this pretty easily by having two clients do... Aaron Bassett
02:13 PM Bug #17620 (Need More Info): Data Integrity Issue with kernel client vs fuse client
Brett Niver
01:49 PM Bug #17193 (In Progress): truncate can cause unflushed snapshot data lose
Zheng Yan
01:18 PM Bug #17847: "Fuse mount failed to populate /sys/ after 31 seconds" in jewel 10.2.4
... Zheng Yan
01:16 PM Feature #17856 (Resolved): qa: background cephfs forward scrub teuthology task
Create a teuthology task that can be run in the background of a workunit, which simply starts a forward scrub on the ... John Spray
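
A rough sketch of what such a background task might do, assuming the MDS admin socket command "scrub_path" and a daemon id of mds.a (both assumptions), without any real teuthology plumbing:

    import subprocess
    import threading

    def background_scrub(stop, interval=60):
        # Periodically start a recursive forward scrub from the root while the
        # workunit runs in the foreground.
        while not stop.is_set():
            subprocess.run(["ceph", "daemon", "mds.a", "scrub_path", "/", "recursive"],
                           check=False)
            stop.wait(interval)

    stop = threading.Event()
    t = threading.Thread(target=background_scrub, args=(stop,), daemon=True)
    t.start()
    # ... run the workunit ...
    stop.set()
    t.join()
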
12:03 PM Bug #12895 (Can't reproduce): Failure in TestClusterFull.test_barrier
John Spray
12:02 PM Bug #2277: qa: flock test broken
For reference: locktest.py still exists in ceph-qa-suite, with a single use from suites/marginal/fs-misc/tasks/lockte... John Spray
11:51 AM Feature #17855 (Resolved): Don't evict a slow client if it's the only client

There is nothing gained by evicting a client session if there are no other clients who might be held up by it.
T...
John Spray
11:50 AM Feature #17854 (Resolved): mds: only evict an unresponsive client when another client wants its caps

Instead of immediately evicting a client when it has not responded within the timeout, set a flag to mark the clien...
John Spray
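
A schematic sketch of the combined policy in #17854/#17855 (plain Python, not MDS code; the session attributes are illustrative only):

    def clients_to_evict(sessions, timeout, now):
        # #17854: mark unresponsive clients as laggy instead of evicting
        # them as soon as the timeout expires.
        laggy = [s for s in sessions if now - s.last_seen > timeout]

        # #17855: if there are no other clients, evicting gains nothing.
        if len(laggy) == len(sessions):
            return []

        # Evict a laggy client only when another client wants caps it holds.
        return [s for s in laggy if s.holds_caps_wanted_by_others()]
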
10:45 AM Feature #17853 (Resolved): More deterministic timing for directory fragmentation

This ticket is to track the work to replace the tick()-based consumption of the split queue with some timers, and t...
John Spray
10:26 AM Bug #17837 (Need More Info): ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3
Alexander: so hopefully you stopped the upgrade at that point and you still have a working cluster of two hammer mons... John Spray
03:49 AM Bug #17837 (Duplicate): ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3
This looks like a duplicate of 16592 but in a new code path: interestingly in a slave monitor. Patrick Donnelly

11/09/2016

11:50 PM Bug #4829: client: handling part of MClientForward incorrectly?
Targeting for Luminous to investigate and either fix or close this. John Spray
11:50 PM Bug #3267 (Closed): Multiple active MDSes stall when listing freshly created files
This ticket is old and the use case seems like something we will pick up on from the multimds suite if it's still bro... John Spray
11:48 PM Feature #3426 (Closed): ceph-fuse: build/run on os x
John Spray
11:47 PM Feature #3426: ceph-fuse: build/run on os x
I don't think it's likely we will ever "finish" this in the sense of maintaining/testing functionality on OSX, so I'm... John Spray
11:45 PM Cleanup #660 (Closed): mds: use helpers in mknod, mkdir, openc paths
John Spray
11:31 PM Bug #1511 (Closed): fsstress failure with 3 active mds
We have fresh tickets for multimds failures as they are reproduced now. John Spray
11:24 PM Feature #17852 (Resolved): mds: when starting forward scrub, return handle or stamp/version which...
Enable caller to kick off a scrub and later check completion (may just mean caller has to compare scrub stamps with a... John Spray
11:21 PM Feature #12274: mds: start forward scrubs from all subtree roots, skip non-auth metadata
Using this ticket to track the task of implementing cross-MDS forward scrub (i.e. handing off at subtree bounds) John Spray
11:12 PM Feature #11950: Strays enqueued for purge cause MDCache to exceed size limit
Targeting for Luminous and assigning to me: we will use a single Journaler() instance per MDS to track a persistent p... John Spray
11:00 PM Feature #17770: qa: test kernel client against "full" pools/filesystems
This is largely http://tracker.ceph.com/issues/17204 , but I'm leaving this ticket here as a convenient way of tracki... John Spray
05:36 PM Bug #17847: "Fuse mount failed to populate /sys/ after 31 seconds" in jewel 10.2.4
For those not familiar with the test, can you explain what versions are in play here, and what version of the client/... John Spray
04:39 PM Bug #17847 (New): "Fuse mount failed to populate /sys/ after 31 seconds" in jewel 10.2.4
Run: http://pulpito.front.sepia.ceph.com/yuriw-2016-11-09_15:03:11-upgrade:hammer-x-jewel-distro-basic-vps/
Job: 534...
Yuri Weinstein
03:10 PM Backport #17841 (In Progress): jewel: mds fails to respawn if executable has changed
Loïc Dachary
02:47 PM Backport #17841 (Resolved): jewel: mds fails to respawn if executable has changed
https://github.com/ceph/ceph/pull/11873 Loïc Dachary
02:15 PM Backport #17582 (In Progress): jewel: monitor assertion failure when deactivating mds in (invalid...
Loïc Dachary
02:14 PM Backport #17615 (In Progress): jewel: mds: false "failing to respond to cache pressure" warning
Loïc Dachary
02:12 PM Backport #17617 (In Progress): jewel: [cephfs] fuse client crash when adding a new osd
Loïc Dachary
02:05 PM Bug #17837 (Resolved): ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3
I've a cluster of three nodes:... alexander walker
01:59 PM Feature #17836 (New): qa: run multi-MDS tests with cache smaller than workload
Maybe work from my wip-smallcache branch https://github.com/ceph/ceph-qa-suite/tree/wip-smallcache
The goal is to ...
John Spray
01:48 PM Bug #16926 (Rejected): multimds: kclient fails to mount
Looks like this was initially failing when running against an old kernel, and was okay on a recent kernel, so closing. John Spray
01:43 PM Feature #17835 (Fix Under Review): mds: enable killpoint tests for MDS-MDS subtree export
... John Spray
01:00 PM Backport #17697 (In Progress): jewel: MDS long-time blocked ops. ceph-fuse locks up with getattr ...
Loïc Dachary
12:58 PM Backport #17706 (In Progress): jewel: multimds: mds entering up:replay and processing down mds ab...
Loïc Dachary
12:57 PM Backport #17720 (In Progress): jewel: MDS: false "failing to respond to cache pressure" warning
Loïc Dachary
11:17 AM Feature #17834 (Resolved): MDS Balancer overrides

(Discussion from November 2016 meeting https://docs.google.com/a/redhat.com/document/d/11-O8uHWmOCqyc2_xGIukL0myjnq...
John Spray
12:55 AM Bug #17832 (Resolved): "[ FAILED ] LibCephFS.InterProcessLocking" in jewel v10.2.4
Run: http://pulpito.front.sepia.ceph.com/yuriw-2016-11-08_16:23:29-fs-jewel-distro-basic-smithi/
Jobs: 532365, 53236...
Yuri Weinstein

11/08/2016

10:50 PM Bug #16640: libcephfs: Java bindings failing to load on CentOS
And https://github.com/ceph/ceph-qa-suite/pull/1224 Greg Farnum
09:25 PM Bug #17828 (Need More Info): libceph setxattr returns 0 without setting the attr
Using the jewel libcephfs python bindings I ran the following code snippet:... Chris Holcombe
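
The original snippet is not shown above; a minimal sketch of the kind of call involved (the exact binding signatures are assumptions):

    import cephfs

    fs = cephfs.LibCephFS(conffile="/etc/ceph/ceph.conf")
    fs.mount()
    # Reported behaviour: the call returns success...
    fs.setxattr("/somefile", "user.test", b"somevalue", 0)
    # ...yet the attribute is not actually present afterwards.
    try:
        print(fs.getxattr("/somefile", "user.test"))
    except cephfs.Error as e:
        print("xattr missing:", e)
    fs.shutdown()
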
04:40 PM Bug #17193: truncate can cause unflushed snapshot data lose
This appears to be intermittent: after ~10 runs without a failure, it's back.
http://qa-proxy.ceph.com/teuthology/...
John Spray
02:06 PM Bug #17819: MDS crashed while performing snapshot creation and deletion in a loop
2016-11-08T06:04:23.498 INFO:tasks.ceph.mds.a.host.stdout:starting mds.a at :/0
2016-11-08T06:04:23.498 INFO:tasks.ce...
Ramakrishnan Periyasamy
01:22 PM Bug #17819: MDS crashed while performing snapshot creation and deletion in a loop
The log ought to display exactly what assert in check_cache failed -- can you post that too? Greg Farnum
11:17 AM Bug #17819 (Can't reproduce): MDS crashed while performing snapshot creation and deletion in a loop
--- begin dump of recent events ---
-1> 2016-11-08 11:04:27.309191 7f4a57c38700 1 -- 10.8.128.73:6812/8588 <== ...
Ramakrishnan Periyasamy
01:41 PM Bug #17522 (Resolved): ceph_readdirplus_r does not acquire caps before sending back attributes
Fixed as of commit f7028e48936cb70f623fe7ba408708a403e60270. This requires moving to libcephfs2, however. Jeff Layton
01:37 PM Feature #16419 (Resolved): add statx-like interface to libcephfs
This should now be done for the most part. New ceph_statx interfaces have been merged into kraken. This change is not... Jeff Layton
01:36 PM Feature #3314 (Resolved): client: client interfaces should take a set of group ids
Now merged as of commit c078dc0daa9c50621f7252559a97bfb191244ca1. libcephfs has been revved to version 2, which is no... Jeff Layton

11/07/2016

02:29 PM Bug #17799: cephfs-data-scan: doesn't know how to handle files with pool_namespace layouts
The backtrace objects are always in the default namespace. I think the data scan tool can't calculate the correct size for files ... Zheng Yan
02:25 PM Bug #17801 (Fix Under Review): Cleanly reject "session evict" command when in replay
https://github.com/ceph/ceph/pull/11813 Zheng Yan
11:43 AM Bug #17193 (Resolved): truncate can cause unflushed snapshot data lose
This is no longer failing when running against the testing kernel. John Spray

11/04/2016

02:32 PM Bug #17531 (Pending Backport): mds fails to respawn if executable has changed
Patrick Donnelly
02:13 PM Bug #17801 (Resolved): Cleanly reject "session evict" command when in replay

Currently we crash like this (from ceph-users):...
John Spray
02:07 PM Fix #15134 (Fix Under Review): multifs: test case exercising mds_thrash for multiple filesystems
PR adding support to mds_thrash.py: https://github.com/ceph/ceph-qa-suite/pull/1175
Need to check if we have a tes...
Patrick Donnelly
02:06 PM Feature #10792 (In Progress): qa: enable thrasher for MDS cluster size (vary max_mds)
Pre-requisite PR: https://github.com/ceph/ceph-qa-suite/pull/1175
multimds testing with the thrasher will be added...
Patrick Donnelly
01:41 PM Bug #17800 (Resolved): ceph_volume_client.py : Error: Can't handle arrays of non-strings
When using Ceph (python-cephfs-10.2.3+git.1475228057.755cf99 , SLE12SP2) together with OpenStack Manila and trying to... Thomas Bechtold
01:18 PM Bug #17799 (New): cephfs-data-scan: doesn't know how to handle files with pool_namespace layouts

Not actually sure how we currently behave.
Do we see the data objects and inject files with incorrect layouts? ...
John Spray
01:06 PM Bug #17798 (Resolved): Clients without pool-changing caps shouldn't be allowed to change pool_nam...

The purpose of the 'p' flag in MDS client auth caps is to enable creating clients that cannot set the pool part of ...
John Spray
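
For context, a sketch of the distinction the 'p' flag makes (client names and cap strings here are illustrative, not from the ticket):

    import subprocess

    # Without 'p' in the MDS cap string, the client should not be able to
    # change the pool (and, per this ticket, pool_namespace) layout fields.
    subprocess.run(["ceph", "auth", "get-or-create", "client.nolayoutchange",
                    "mon", "allow r",
                    "mds", "allow rw",
                    "osd", "allow rw pool=cephfs_data"], check=True)

    # With 'p', layout pool changes are permitted.
    subprocess.run(["ceph", "auth", "get-or-create", "client.layoutchange",
                    "mon", "allow r",
                    "mds", "allow rwp",
                    "osd", "allow rw pool=cephfs_data"], check=True)
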
01:01 PM Bug #17797 (Fix Under Review): rmxattr on ceph.[dir|file].layout.pool_namespace doesn't work
https://github.com/ceph/ceph/pull/11783 John Spray
11:05 AM Bug #17797 (Resolved): rmxattr on ceph.[dir|file].layout.pool_namespace doesn't work

Currently it's obvious how to set the namespace but much less so how to clear it (i.e. revert to default namespace)...
John Spray
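
For reference, the set/clear operations in question, sketched with plain xattr syscalls on a mounted path (the path is an assumption):

    import os

    path = "/mnt/cephfs/somefile"

    # Setting the namespace is the obvious part:
    os.setxattr(path, "ceph.file.layout.pool_namespace", b"myns")

    # Reverting to the default namespace should work by removing the xattr
    # (the rmxattr case this ticket is about) or by setting it to empty:
    os.removexattr(path, "ceph.file.layout.pool_namespace")
    # os.setxattr(path, "ceph.file.layout.pool_namespace", b"")
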

11/02/2016

08:22 AM Bug #17747 (Fix Under Review): ceph-mds: remove "--journal-check" help text
*master PR*: https://github.com/ceph/ceph/pull/11739 Nathan Cutler

11/01/2016

05:16 PM Bug #17563: extremely slow ceph_fsync calls
PR to fix the userland side of things is here:
https://github.com/ceph/ceph/pull/11710
Jeff Layton
01:10 PM Bug #17563: extremely slow ceph_fsync calls
It looks like ceph-fuse has the same problem with fsync. Here's a POSIX API reproducer that shows similar improvement... Jeff Layton
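
Not the reproducer referenced above, just a minimal sketch of how the latency can be observed (the path is an assumption):

    import os
    import time

    path = "/mnt/cephfs/fsync-test"

    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    os.write(fd, b"x" * 4096)

    start = time.time()
    os.fsync(fd)   # the call reported as extremely slow
    print("fsync took %.3fs" % (time.time() - start))
    os.close(fd)
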
03:53 PM Bug #17115 (Resolved): kernel panic when running IO with cephfs and resource pool becomes full
http://tracker.ceph.com/issues/17770 Greg Farnum
03:53 PM Feature #17770 (New): qa: test kernel client against "full" pools/filesystems
We test the uclient against full pools to validate behavior. We discovered in #17115 that we don't for the kernel cli... Greg Farnum
03:49 PM Bug #17240 (Closed): inode_permission error with kclient when running client IO with recovery ope...
When the RADOS cluster has blocked IO, the kernel client is going to have blocked IO. That's just life. :( Greg Farnum
03:37 PM Bug #17656 (Need More Info): cephfs: high concurrent causing slow request
Just from the description it sounds like we're backing up while the MDS purges deleted files from RADOS. You can adju... Greg Farnum
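
The comment above is cut off; purely as a hypothetical illustration of runtime tuning of MDS purge throttles (the option names and values are guesses for illustration, not Greg's actual suggestion):

    import subprocess

    # Hypothetical: raise the purge throttles on a running MDS via the admin socket.
    for opt, val in [("mds_max_purge_files", "256"),
                     ("mds_max_purge_ops", "16384")]:
        subprocess.run(["ceph", "daemon", "mds.a", "config", "set", opt, val],
                       check=True)
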
03:08 PM Bug #7750: Attempting to mount a kNFS export of a sub-directory of a CephFS filesystem fails with...
NFS-Utils-1.2.8
NFS server on Ubuntu 16.04
Kostya Velychkovsky
03:06 PM Bug #7750: Attempting to mount a kNFS export of a sub-directory of a CephFS filesystem fails with...
I still can't reproduce it. This does not seem like a kernel issue. Which version of nfs-utils do you use? Zheng Yan
12:13 PM Bug #7750: Attempting to mount a kNFS export of a sub-directory of a CephFS filesystem fails with...
I've reproduced this bug in CephFS jewel. A non-root CephFS dir can't be mounted via NFS.
The client gets the error 'Stale NFS fi...
Kostya Velychkovsky
09:24 AM Bug #17747: ceph-mds: remove "--journal-check" help text
You should use the cephfs-journal-tool for dealing with this stuff now. The journal-check oneshot-replay mode got rem... Greg Farnum
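
A quick sketch of the replacement workflow with cephfs-journal-tool (the inspect/export subcommands shown are the usual ones; details may vary by version):

    import subprocess

    # Check journal integrity (replaces the old --journal-check oneshot mode).
    subprocess.run(["cephfs-journal-tool", "journal", "inspect"], check=True)

    # Optionally take a backup before attempting any repair.
    subprocess.run(["cephfs-journal-tool", "journal", "export", "backup.bin"],
                   check=True)
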
08:39 AM Bug #17620: Data Integrity Issue with kernel client vs fuse client
I'm not familiar with docker. How did the jobs fail? (What's the symptom?) Zheng Yan

10/31/2016

06:29 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
Thanks for that. I'm working on getting to the point where I can test that.
In the meantime further testing has i...
Aaron Bassett
05:05 PM Bug #4212: mds: open_snap_parents isn't called all the times it needs to be
Having all past_parents open is hard because of dir renames. Say you do
/a/b/c
and snapshot /a/b, then rename...
Sage Weil
02:06 PM Bug #17747: ceph-mds: remove "--journal-check" help text
running ceph-mds -d -i ceph --journal-check 0 gives me the following output:
--conf/-c FILE read configuration fr...
François Blondel
09:51 AM Bug #17747 (Resolved): ceph-mds: remove "--journal-check" help text
Hi,
running ceph-mds -d -i ceph --journal-check 0 gives me the following output:
--conf/-c FILE read configurat...
François Blondel

10/28/2016

05:31 PM Bug #17563: extremely slow ceph_fsync calls
Ok, thanks. That makes sense. I've got a patchset that works as a PoC, but it's pretty ugly and could use some cleanu... Jeff Layton
11:23 AM Bug #17548 (Resolved): should userland ceph_llseek do permission checking?
Fixed in commit db2e7e0811679b4c284e105536ebf3327cc02ffc. Jeff Layton
10:35 AM Bug #17731 (Can't reproduce): MDS stuck in stopping with other rank's strays

Kraken v11.0.2
Seen on a max_mds=2 MDS cluster with a fuse client doing an rsync -av --delete on a dir that incl...
John Spray

10/27/2016

07:50 AM Backport #17720 (Resolved): jewel: MDS: false "failing to respond to cache pressure" warning
https://github.com/ceph/ceph/pull/11856 Loïc Dachary

10/26/2016

01:37 PM Bug #17562 (Resolved): backtrace check fails when scrubbing directory created by fsstress
Zheng Yan
12:44 PM Bug #17716 (Resolved): MDS: false "failing to respond to cache pressure" warning
Creating this ticket for a PR that went in without a ticket on it so that we can backport.
https://github.com/ceph...
John Spray
07:56 AM Backport #17705: jewel: ceph_volume_client: recovery of partial auth update is broken
h3. previous description
I run into the following traceback when the volume_client tries
to recover from partia...
Loïc Dachary
07:18 AM Backport #17705: jewel: ceph_volume_client: recovery of partial auth update is broken
https://github.com/ceph/ceph/pull/11656
https://github.com/ceph/ceph-qa-suite/pull/1221
Ramana Raja
05:21 AM Backport #17705 (In Progress): jewel: ceph_volume_client: recovery of partial auth update is broken
Ramana Raja
05:20 AM Backport #17705 (Resolved): jewel: ceph_volume_client: recovery of partial auth update is broken
https://github.com/ceph/ceph/pull/11656 Ramana Raja
07:07 AM Backport #17706 (Resolved): jewel: multimds: mds entering up:replay and processing down mds aborts
https://github.com/ceph/ceph/pull/11857 Loïc Dachary
05:16 AM Bug #17216 (Pending Backport): ceph_volume_client: recovery of partial auth update is broken
Ramana Raja

10/25/2016

07:53 PM Bug #17670 (Pending Backport): multimds: mds entering up:replay and processing down mds aborts
Patrick Donnelly
01:46 PM Backport #17697 (Resolved): jewel: MDS long-time blocked ops. ceph-fuse locks up with getattr of ...
https://github.com/ceph/ceph/pull/11858 Loïc Dachary
01:42 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
I have fixed a bug that may cause this issue. could you have a try https://github.com/ceph/ceph-client/commits/testing Zheng Yan
11:16 AM Bug #17275 (Pending Backport): MDS long-time blocked ops. ceph-fuse locks up with getattr of file
John Spray
11:12 AM Bug #17691 (Resolved): bad backtrace on inode
Merged https://github.com/ceph/ceph-qa-suite/pull/1218 John Spray
10:46 AM Bug #17691 (In Progress): bad backtrace on inode
Sorry, that's happening because I merged my backtrace repair PR before the ceph-qa-suite piece, so the log message is... John Spray
03:08 AM Bug #17691 (Resolved): bad backtrace on inode
Seen this in testing:
http://pulpito.ceph.com/pdonnell-2016-10-25_02:25:11-fs:recovery-master---basic-mira/493889/...
Patrick Donnelly

10/24/2016

05:49 PM Bug #17563: extremely slow ceph_fsync calls
The client is waiting for an ack to a cap *flush*, not to get caps granted. Usually flushes happen asynchronously (ju... Greg Farnum
05:36 PM Bug #17563: extremely slow ceph_fsync calls
OTOH...do we even need a flag at all here? Under what circumstances is it beneficial to delay granting and recalling ... Jeff Layton
01:45 PM Feature #17639 (Resolved): Repair file backtraces during forward scrub
Greg Farnum

10/22/2016

10:52 PM Bug #17670 (Fix Under Review): multimds: mds entering up:replay and processing down mds aborts
https://github.com/ceph/ceph/pull/11611 Patrick Donnelly
10:46 PM Bug #17670 (Resolved): multimds: mds entering up:replay and processing down mds aborts
... Patrick Donnelly

10/21/2016

09:48 PM Feature #17249 (Resolved): cephfs tool for finding files that use named PGs
Greg Farnum
01:58 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
I suspect the zeros are from stale page cache data. If you encounter the issue again, please drop the kernel page cac... Zheng Yan
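
For reference, one way to drop the kernel page cache on the client (requires root):

    import subprocess

    # Flush dirty data first, then drop the clean page cache, dentries and inodes.
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")
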
01:39 PM Bug #17275 (Fix Under Review): MDS long-time blocked ops. ceph-fuse locks up with getattr of file
https://github.com/ceph/ceph/pull/11593 Zheng Yan
01:21 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
created http://tracker.ceph.com/issues/17660 Henrik Korkuc
01:03 PM Bug #17275: MDS long-time blocked ops. ceph-fuse locks up with getattr of file
Got a long getattr lock, but this time the client has long-running objecter requests. I will be filing a ticket for that... Henrik Korkuc
05:33 AM Bug #17656: cephfs: high concurrent causing slow request
William, could you describe the issue from Ceph's perspective in detail? Kefu Chai
01:46 AM Bug #17656 (Need More Info): cephfs: high concurrent causing slow request
Background:
we use cephfs as a CDN backend. When the CDN vendor prefetches video files from cephfs, it causes high concu...
william sheng

10/20/2016

08:44 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
On further testing, it seems I can only make this happen when doing multi-threaded downloads from multiple hosts. I h... Aaron Bassett
04:39 PM Bug #17591 (Resolved): Samba failures: smbtorture, dbench.sh, fsstress.sh
Greg Farnum
02:53 PM Bug #17636 (Resolved): MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding
Kefu Chai
10:31 AM Bug #17636 (Fix Under Review): MDS crash on creating: interval_set<inodeno_t> segfaults with new ...
https://github.com/ceph/ceph/pull/11577 John Spray
10:23 AM Bug #17636: MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding
Should also note that the backtraces are like:... John Spray
09:35 AM Bug #17636 (Resolved): MDS crash on creating: interval_set<inodeno_t> segfaults with new encoding

The map bound_encode method does this:...
John Spray
02:43 PM Feature #17643 (New): Verify and repair layout xattr during forward scrub
We should do exactly the same thing we do for backtrace, but for the layout that we write into 'layout' xattr. Repai... John Spray
02:36 PM Feature #17639 (Fix Under Review): Repair file backtraces during forward scrub
https://github.com/ceph/ceph/pull/11578
https://github.com/ceph/ceph-qa-suite/pull/1218
John Spray
02:31 PM Feature #17639 (Resolved): Repair file backtraces during forward scrub

Instead of just recording that a backtrace was missing or invalid, repair it in the same way we do in verify_diri_b...
John Spray
02:22 PM Feature #15619 (Resolved): Repair InoTable during forward scrub
... John Spray
10:38 AM Bug #17216 (Resolved): ceph_volume_client: recovery of partial auth update is broken
Loïc Dachary
08:32 AM Bug #17216 (Pending Backport): ceph_volume_client: recovery of partial auth update is broken
John Spray
09:35 AM Bug #17606 (Need More Info): multimds: assertion failure during directory migration
debug level of the log is too low. This is dup of a long-standing bug http://tracker.ceph.com/issues/8405. I can't fi... Zheng Yan
08:43 AM Bug #17629 (Duplicate): ceph_volume_client: TypeError with wrong argument count
The fix for this just merged in master yesterday.
Python says "2 args, 3 given" because the 3 is counting the "sel...
John Spray
12:33 AM Bug #17629 (Duplicate): ceph_volume_client: TypeError with wrong argument count
From: http://pulpito.ceph.com/pdonnell-2016-10-19_23:26:58-fs:recovery-master---basic-mira/486359/... Patrick Donnelly
12:43 AM Bug #14195: test_full_fclose fails (tasks.cephfs.test_full.TestClusterFull)
I just ran across this in http://pulpito.ceph.com/pdonnell-2016-10-19_23:26:58-fs:recovery-master---basic-mira/486355... Patrick Donnelly

10/19/2016

05:39 PM Bug #17621 (Rejected): Hadoop does a bad job closing files and we end up holding too many caps
(Just scribbling this in the tracker for a record.)
We've had several reports on the mailing list of Hadoop client...
Greg Farnum
05:32 PM Feature #12141: cephfs-data-scan: File size correction from backward scan
Unassigning as I believe this was dropped? Greg Farnum
04:48 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
No I only see:
Oct 19 15:34:52 phx-r2-r3-comp5 kernel: [683656.055247] libceph: client259809 fsid <redacted>
Oc...
Aaron Bassett
04:28 PM Bug #17620: Data Integrity Issue with kernel client vs fuse client
Were there any ceph related kernel message on the client hosts? (such as "ceph: mds0 caps stale") Zheng Yan
03:49 PM Bug #17620 (Resolved): Data Integrity Issue with kernel client vs fuse client
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
kernel: 4.4.0-42-generic #62~14.04.1-Ubuntu SMP
I ...
Aaron Bassett
03:07 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
The cephx user, not the *nix user. I'd be fine with the other stuff — we already run (all our?) nightlies with our ow... Greg Farnum
03:04 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
I don't quite understand. The reproducer that does the ceph_ll_mkdir does it as an unprivileged user, so why is it al... Jeff Layton
02:46 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
There we go, that case would concern me.
Contrary to our discussion in standup today, I think MDSAuthCaps::is_capa...
Greg Farnum
12:25 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
No, the order has already been established, because each operation _really_ starts with the permission check. But ok,... Jeff Layton
04:08 AM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
No, I mean: I don't think this counts as a bug. To get in this situation we have two outstanding requests:
1) chmod
...
Greg Farnum
02:54 PM Backport #17617 (Resolved): jewel: [cephfs] fuse client crash when adding a new osd
https://github.com/ceph/ceph/pull/11860 Loïc Dachary
02:53 PM Backport #17615 (Resolved): jewel: mds: false "failing to respond to cache pressure" warning
https://github.com/ceph/ceph/pull/11861 Loïc Dachary
01:14 PM Bug #17606: multimds: assertion failure during directory migration
Fixed! Patrick Donnelly
03:49 AM Bug #17606: multimds: assertion failure during directory migration
I can't access /ceph/cephfs-perf/tmp/2016-10-18/ceph-mds.ceph-mds0.log. Please change the permissions.
Zheng Yan
11:56 AM Feature #16016 (Resolved): Populate DamageTable from forward scrub
John Spray
10:24 AM Bug #17270 (Pending Backport): [cephfs] fuse client crash when adding a new osd
John Spray
10:21 AM Bug #17611 (Resolved): mds: false "failing to respond to cache pressure" warning

Creating this ticket retroactively for a ticketless PR that needs backporting: https://github.com/ceph/ceph/pull/11373
John Spray
07:04 AM Bug #17172 (Resolved): Failure in snaptest-git-ceph.sh
Loïc Dachary

10/18/2016

11:35 PM Feature #17604: MDSMonitor: raise health warning when there are no standbys but there should be
I'll take this one. Patrick Donnelly
11:44 AM Feature #17604 (Resolved): MDSMonitor: raise health warning when there are no standbys but there ...

We could have a per-filesystem setting for how many standbys it wants to be available. Then if the available set o...
John Spray
09:22 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
FWIW, "little holes" like this are worse in that they'll often strike when things are really redlined. If the race wi... Jeff Layton
09:02 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
The client has to send out both a cap drop message and the user's request.
Those will be ordered, and if the cap dro...
Greg Farnum
08:34 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Will they be ordered after the reply comes in, or just ordered wrt how they go out onto the wire?
IOW, is there so...
Jeff Layton
08:13 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Luckily the messages are ordered so once a client sends off a request to the MDS, any caps it drops will be ordered a... Greg Farnum
07:22 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
I don't think that's sufficient.
My understanding is that once you drop the client_lock mutex (as the client does ...
Jeff Layton
06:54 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
The MDS does enforce cephx-based permissions, but unless you've deliberately constructed a limited cephx key it's goi... Greg Farnum
04:27 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Ok, and the reason setting fuse_default_permissions didn't work for me yesterday was due to a typo. When I set that o... Jeff Layton
03:26 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Ok, here's a "backported" test program. Both mkdirs still succeed, but when I set fuse_default_permissions to false o... Jeff Layton
01:02 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
No, I don't think so. Here is ceph_userperm_new in my tree:... Jeff Layton
03:22 PM Bug #17606 (Resolved): multimds: assertion failure during directory migration
This is from an experiment on Linode with 9 active MDS, 32 OSD, and 128 clients building the kernel.... Patrick Donnelly
09:47 AM Bug #17591 (Fix Under Review): Samba failures: smbtorture, dbench.sh, fsstress.sh
https://github.com/ceph/ceph/pull/11526 Zheng Yan
09:23 AM Bug #17591 (In Progress): Samba failures: smbtorture, dbench.sh, fsstress.sh
The samba daemon changes its own euid/egid before calling libcephfs functions, but libcephfs sets up a default_perm on init... Zheng Yan
02:44 AM Bug #17563: extremely slow ceph_fsync calls
No method so far; we need to extend the MClientCaps message. Zheng Yan

10/17/2016

09:45 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
And at a guess, you might be using the uid,gid I presume you have there (0,0) to create folders with the ownership sp... Greg Farnum
09:39 PM Bug #17594: cephfs: permission checking not working (MDS should enforce POSIX permissions)
Quick sanity check: is fuse_default_permissions set to false? Otherwise the assumption is that fuse (or nfs, samba) ... John Spray
09:02 PM Bug #17594 (In Progress): cephfs: permission checking not working (MDS should enforce POSIX permi...
Frank Filz noticed that cephfs doesn't seem to be enforcing permissions properly, particularly on mkdir.
This test...
Jeff Layton
05:06 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
Yep, still failing on master. John Spray
02:23 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
Rerun of failures on master (a1fd258)
http://pulpito.ceph.com/jspray-2016-10-17_14:22:43-samba-master---basic-mira/
John Spray
02:19 PM Bug #17591: Samba failures: smbtorture, dbench.sh, fsstress.sh
Failures are on a4ce1f56c1e22d4d3c190403339d5082d66fc89c, last passes were on cfc0a16e048a5868b137e2b7b89c7ae105218bc... John Spray
02:16 PM Bug #17591 (Resolved): Samba failures: smbtorture, dbench.sh, fsstress.sh

Samba tests were some of the most stable; they are suddenly failing on master.
http://pulpito.ceph.com/teutholog...
John Spray
03:14 PM Bug #17286 (Resolved): Failure in dirfrag.sh
Loïc Dachary
03:14 PM Bug #17271 (Resolved): Failure in snaptest-git-ceph.sh
Loïc Dachary
03:14 PM Bug #17253 (Resolved): Crash in Client::_invalidate_kernel_dcache when reconnecting during unmount
Loïc Dachary
03:14 PM Bug #17173 (Resolved): Duplicate damage table entries
Loïc Dachary
03:14 PM Feature #16973 (Resolved): Log path as well as ino when detecting metadata damage
Loïc Dachary
03:14 PM Bug #16668 (Resolved): client: nlink count is not maintained correctly
Loïc Dachary
02:38 PM Bug #17547 (Resolved): ceph-fuse 10.2.3 segfault
John Spray
02:32 PM Bug #17547: ceph-fuse 10.2.3 segfault
looks good so far. I guess we can close this. Thanks! Henrik Korkuc
01:42 PM Bug #17547: ceph-fuse 10.2.3 segfault
Henrik: did that build fix the issue for you? John Spray
12:31 PM Backport #16946 (Resolved): jewel: client: nlink count is not maintained correctly
Loïc Dachary
12:30 PM Backport #17244 (Resolved): jewel: Failure in snaptest-git-ceph.sh
Loïc Dachary
12:30 PM Backport #17246 (Resolved): jewel: Log path as well as ino when detecting metadata damage
Loïc Dachary
12:30 PM Backport #17474 (Resolved): jewel: Failure in dirfrag.sh
Loïc Dachary
12:30 PM Backport #17476 (Resolved): jewel: Failure in snaptest-git-ceph.sh
Loïc Dachary
12:30 PM Backport #17477 (Resolved): jewel: Crash in Client::_invalidate_kernel_dcache when reconnecting d...
Loïc Dachary
12:30 PM Backport #17479 (Resolved): jewel: Duplicate damage table entries
Loïc Dachary
09:06 AM Bug #17562 (Fix Under Review): backtrace check fails when scrubbing directory created by fsstress
https://github.com/ceph/ceph/pull/11517/ Zheng Yan

10/16/2016

10:45 AM Bug #17563: extremely slow ceph_fsync calls
Yep, exactly. I noticed this while running the cthon04 testsuite against it. It copies some source files into place a... Jeff Layton

10/15/2016

10:06 PM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
Good point. Yeah, a retry loop may be the best we can do in that case. I'll have to read up on rados asserts to make ... Jeff Layton
01:16 AM Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND
Are we *allowed* to fail writes like that? :/
It doesn't actually close the race in all cases, but for sane use it...
Greg Farnum
09:39 AM Backport #17582 (Resolved): jewel: monitor assertion failure when deactivating mds in (invalid) f...
https://github.com/ceph/ceph/pull/11862 Nathan Cutler
09:34 AM Bug #16610 (Resolved): Jewel: segfault in ObjectCacher::FlusherThread
Nathan Cutler
08:28 AM Bug #17468: CephFs: IO Pauses for more than a 40 seconds, while running write intensive IOs
Further analysis of the I/O drop points to a correlation between the MDS config parameter mds_log_max_segments and the I/O drop. Using d... Vishal Kanaujia
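
For experimentation, a sketch of adjusting that parameter on a running MDS (the daemon id and value are assumptions):

    import subprocess

    # Raise the journal segment limit to see whether the I/O pauses track it.
    subprocess.run(["ceph", "daemon", "mds.a", "config", "set",
                    "mds_log_max_segments", "120"], check=True)
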

10/14/2016

11:39 PM Bug #17563: extremely slow ceph_fsync calls
Oh right, this doesn't even have a way to ask the MDS for immediate service — notice how it just puts already-sent re... Greg Farnum
11:27 PM Bug #17562: backtrace check fails when scrubbing directory created by fsstress
It looks to me like compare() is doing the right thing, but that the scrubbing code is declaring an error if the on-d... Greg Farnum
11:19 PM Bug #17562: backtrace check fails when scrubbing directory created by fsstress
Well, in inode_backtrace_t::compare() we're trying to determine and return three different things, as described in th... Greg Farnum
08:06 PM Backport #17131 (Resolved): jewel: Jewel: segfault in ObjectCacher::FlusherThread
Loïc Dachary
05:06 PM Bug #16255 (Resolved): ceph-create-keys: sometimes blocks forever if mds "allow" is set
Nathan Cutler
02:26 PM Backport #17347 (Resolved): jewel: ceph-create-keys: sometimes blocks forever if mds "allow" is set
Sage Weil
10:04 AM Bug #17518 (Pending Backport): monitor assertion failure when deactivating mds in (invalid) fscid 0
John Spray
07:10 AM Bug #17525 (Duplicate): "[ FAILED ] LibCephFS.ThreesomeInterProcessRecordLocking" in smoke
Loïc Dachary
07:03 AM Bug #17525: "[ FAILED ] LibCephFS.ThreesomeInterProcessRecordLocking" in smoke
In jewel http://pulpito.ceph.com/loic-2016-10-13_17:04:37-fs-jewel-backports-distro-basic-smithi/471638/... Loïc Dachary
 
