Activity

From 07/23/2017 to 08/21/2017

08/21/2017

09:41 PM Backport #21047: luminous: mds,mgr: add 'is_valid=false' when failed to parse caps
Nathan, the fix for http://tracker.ceph.com/issues/21027 should also make it into Luminous with this backport. I'm go... Patrick Donnelly
04:13 PM Backport #21047 (Resolved): luminous: mds,mgr: add 'is_valid=false' when failed to parse caps
Nathan Cutler
08:53 PM Bug #21058 (New): mds: remove UNIX file permissions binary dependency
The MDS has various file permission/type bits pulled from UNIX headers. These could be different depending on what sy... Patrick Donnelly
07:46 PM Bug #20988: client: dual client segfault with racing ceph_shutdown
Moving this to main "Ceph" project as it looks more like a problem in the AdminSocket code. The thing seems to mainly... Jeff Layton
06:13 PM Bug #20988: client: dual client segfault with racing ceph_shutdown
Here's a testcase that seems to trigger it fairly reliably. You may have to run it a few times to get it to crash but... Jeff Layton
05:03 PM Bug #20988: client: dual client segfault with racing ceph_shutdown
Correct. I'll see if I can roll up a testcase for this when I get a few mins. Jeff Layton
04:50 PM Bug #20988: client: dual client segfault with racing ceph_shutdown
Jeff, just confirming this bug is with two client instances and not one instance with two threads? Patrick Donnelly
02:59 PM Bug #21007 (Resolved): The ceph fs set mds_max command must be updated
Patch merged: https://github.com/ceph/ceph/pull/17044 Bara Ancincova
01:49 PM Feature #18490: client: implement delegation support in userland cephfs
The latest set has timeout support that basically does a client->unmount() on the thing. With the patches for this bu... Jeff Layton
11:01 AM Feature #18490: client: implement delegation support in userland cephfs
For the clean-ish shutdown case, it would be neat to have a common code path with the -EBLACKLISTED handling (see Cli... John Spray
01:43 PM Bug #21025: racy is_mounted() checks in libcephfs
PR is here:
https://github.com/ceph/ceph/pull/17095
Jeff Layton
01:40 PM Bug #21004 (Fix Under Review): fs: client/mds has wrong check to clear S_ISGID on chown
Patrick Donnelly
11:41 AM Bug #20535: mds segmentation fault ceph_lock_state_t::get_overlapping_locks
Reporting in, I've had the first incident after the version upgrade.
My active MDS had committed suicide due to "d...
Webert Lima
09:03 AM Bug #20892: qa: FS_DEGRADED spurious health warnings in some sub-suites
kcephfs suite has a similar issue:
http://pulpito.ceph.com/teuthology-2017-08-19_05:20:01-kcephfs-luminous-testing-bas...
Zheng Yan

08/17/2017

06:41 PM Feature #19109 (Resolved): Use data pool's 'df' for statfs instead of global stats, if there is o...
Oh, oops. I forgot I merged this into luminous. Thanks Doug. Patrick Donnelly
06:22 PM Feature #19109: Use data pool's 'df' for statfs instead of global stats, if there is only one dat...
There's no need to wait for the kernel client since the message encoding is versioned. This has already been merged i... Douglas Fuller
06:14 PM Feature #19109 (Pending Backport): Use data pool's 'df' for statfs instead of global stats, if th...
Waiting for
https://github.com/ceph/ceph-client/commit/b7f94d6a95dfe2399476de1e0d0a7c15c01611d0
to be merged up...
Patrick Donnelly
03:15 PM Bug #21025 (Resolved): racy is_mounted() checks in libcephfs
libcephfs.cc has a bunch of is_mounted checks like this in it:... Jeff Layton
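For illustration, a toy, self-contained C++ sketch of the check-then-act race pattern being described; the names are made up and this is not the actual libcephfs.cc code:

    // Toy illustration only (not libcephfs code): the mount-state check and
    // the subsequent work are not covered by a single lock, so a concurrent
    // unmount can land in the window between them.
    #include <atomic>
    #include <cerrno>
    #include <thread>

    struct fake_mount {
      std::atomic<bool> mounted{true};
      bool is_mounted() const { return mounted.load(); }
    };

    int do_operation(fake_mount *cmount) {
      if (!cmount->is_mounted())        // check...
        return -ENOTCONN;
      // <-- a racing unmount can flip the state right here
      return 0;                         // ...then act on state that may be gone
    }

    int main() {
      fake_mount m;
      std::thread user([&] { do_operation(&m); });
      std::thread unmounter([&] { m.mounted.store(false); });  // racing teardown
      user.join();
      unmounter.join();
      return 0;
    }
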
03:02 PM Feature #18490: client: implement delegation support in userland cephfs
Patrick Donnelly wrote:
> here "client" means Ganesha. What about how does Ganesha handle its client not releasing...
Jeff Layton

08/16/2017

11:08 PM Feature #18490: client: implement delegation support in userland cephfs
Jeff Layton wrote:
> The main work to be done at this point is handling clients that don't return the delegation in ...
Patrick Donnelly
12:57 PM Feature #18490: client: implement delegation support in userland cephfs
I've been working on this for the last week or so, so this is a good place to pause and provide an update:
I have ...
Jeff Layton
09:47 PM Bug #20990 (Pending Backport): mds,mgr: add 'is_valid=false' when failed to parse caps
Patrick Donnelly
06:48 PM Bug #21014 (Resolved): fs: reduce number of helper debug messages at level 5 for client
I think we want just the initial log message for each ll_ operation and not the helpers (e.g. _rmdir).
See: http://...
Patrick Donnelly
06:23 PM Bug #21004: fs: client/mds has wrong check to clear S_ISGID on chown
https://github.com/ceph/ceph/pull/17053 Patrick Donnelly
02:58 AM Bug #21004: fs: client/mds has wrong check to clear S_ISGID on chown
The logic is from kernel_src/fs/attr.c... Zheng Yan
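For reference, a rough paraphrase of the rule in fs/attr.c that this points at (simplified, not verbatim kernel code): on a chown of a regular file, a set S_ISUID bit is always cleared, while S_ISGID is cleared only when S_IXGRP is also set, because setgid without group-exec denotes mandatory locking rather than a real setgid bit.

    #include <sys/stat.h>

    /* Simplified paraphrase of the chown rule in fs/attr.c, not the real code. */
    mode_t mode_after_chown(mode_t mode)
    {
        if (mode & S_ISUID)
            mode &= ~S_ISUID;                 /* always drop setuid */
        if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
            mode &= ~S_ISGID;                 /* drop setgid only if group-exec is set */
        return mode;
    }
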
09:54 AM Bug #21007 (Fix Under Review): The ceph fs set mds_max command must be updated
Created this PR: https://github.com/ceph/ceph/pull/17044 Bara Ancincova
08:14 AM Bug #21007 (Resolved): The ceph fs set mds_max command must be updated
Copied from Bugzilla:
Ramakrishnan Periyasamy 2017-08-16 09:14:21 CEST
Description of problem:
Upstream docume...
Bara Ancincova
01:20 AM Bug #21002 (Closed): set option "mon_pool_quota_warn_threshold && mon_pool_quota_crit_threshold" ...
xie xingguo

08/15/2017

09:58 PM Bug #21004: fs: client/mds has wrong check to clear S_ISGID on chown
Well actually the test above fails because the chown was a no-op due to an earlier chown failure. In any case I've fo... Patrick Donnelly
09:41 PM Bug #21004 (In Progress): fs: client/mds has wrong check to clear S_ISGID on chown
Patrick Donnelly
09:41 PM Bug #21004 (Resolved): fs: client/mds has wrong check to clear S_ISGID on chown
Reported in: https://bugzilla.redhat.com/show_bug.cgi?id=1480182
This causes the failure in test 88 from https://b...
Patrick Donnelly
09:12 AM Bug #21002: set option "mon_pool_quota_warn_threshold && mon_pool_quota_crit_threshold" fail
Please close it.
No error.
huanwen ren
08:59 AM Bug #21002: set option "mon_pool_quota_warn_threshold && mon_pool_quota_crit_threshold" fail
... huanwen ren
08:58 AM Bug #21002 (Closed): set option "mon_pool_quota_warn_threshold && mon_pool_quota_crit_threshold" ...
err:... huanwen ren

08/13/2017

03:28 AM Bug #20990 (Resolved): mds,mgr: add 'is_valid=false' when failed to parse caps
mds,mgr: add 'is_valid=false' when failed to parse caps.
Backport needed for the PRs:
https://github.com/ceph/cep...
Jos Collin

08/12/2017

01:44 PM Bug #20988 (Resolved): client: dual client segfault with racing ceph_shutdown
I have a testcase that I'm working on that has two threads, each with their own ceph_mount_info. If those threads end... Jeff Layton
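A hedged sketch of what a two-thread testcase of this shape might look like (not the actual reproducer; the default config search path and root mount point are assumptions):

    #include <cephfs/libcephfs.h>
    #include <thread>

    // Each thread gets its own ceph_mount_info and races mount/shutdown.
    static void worker()
    {
      struct ceph_mount_info *cmount = nullptr;
      if (ceph_create(&cmount, NULL) != 0)
        return;
      ceph_conf_read_file(cmount, NULL);   // search the default config locations
      if (ceph_mount(cmount, "/") == 0)
        ceph_shutdown(cmount);             // unmount + release; the race shows up around here
      else
        ceph_release(cmount);
    }

    int main()
    {
      std::thread a(worker), b(worker);
      a.join();
      b.join();
      return 0;
    }
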

08/11/2017

03:18 AM Bug #20938: CephFS: concurrent access to file from multiple nodes blocks for seconds
... Zheng Yan

08/10/2017

02:27 PM Bug #20938: CephFS: concurrent access to file from multiple nodes blocks for seconds
Ahh yeah, I remember seeing that in there a while back. I guess the danger is that we can end up instantiating an ino... Jeff Layton
02:29 AM Bug #20938: CephFS: concurrent access to file from multiple nodes blocks for seconds
What worries me is the comment in fuse_lowlevel.h... Zheng Yan
12:27 PM Backport #20972: jewel ceph-fuse segfaults at mount time, assert in ceph::log::Log::stop
Thanks, Dan! Jewel backport staged: https://github.com/ceph/ceph/pull/16963 Nathan Cutler
12:26 PM Backport #20972 (In Progress): jewel ceph-fuse segfaults at mount time, assert in ceph::log::Log:...
Nathan Cutler
12:25 PM Backport #20972: jewel ceph-fuse segfaults at mount time, assert in ceph::log::Log::stop
Description:
10.2.9 introduces a regression where ceph-fuse will segfault at mount time because of an attempt ...
Nathan Cutler
11:52 AM Backport #20972: jewel ceph-fuse segfaults at mount time, assert in ceph::log::Log::stop
Confirmed that 10.2.9 plus cbf18b1d80d214e4203e88637acf4b0a0a201ee7 does not segfault. Dan van der Ster
09:04 AM Backport #20972 (Resolved): jewel ceph-fuse segfaults at mount time, assert in ceph::log::Log::stop
https://github.com/ceph/ceph/pull/16963 Dan van der Ster
12:24 PM Bug #18157 (Pending Backport): ceph-fuse segfaults on daemonize
Nathan Cutler
09:42 AM Bug #20945: get_quota_root sends lookupname op for every buffered write
Could you also please add the luminous backport tag for this? Dan van der Ster
09:23 AM Bug #20945: get_quota_root sends lookupname op for every buffered write
https://github.com/ceph/ceph/pull/16959 Dan van der Ster
02:08 AM Bug #20945: get_quota_root sends lookupname op for every buffered write
lookupname is for the following case:
directories /a and /b have non-default quotas;
client A is writing /a/file;
client ...
Zheng Yan

08/09/2017

04:39 PM Bug #20945: get_quota_root sends lookupname op for every buffered write
This seems to work... Dan van der Ster
10:45 AM Bug #20945: get_quota_root sends lookupname op for every buffered write
Thanks. You're right. Here's the trivial reproducer:... Dan van der Ster
08:46 AM Bug #20945: get_quota_root sends lookupname op for every buffered write
Enabling quota and writing to an unlinked file can produce this easily. get_quota_root() uses the dentry in dn_set if it has... Zheng Yan
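For illustration, a hedged libcephfs sketch of a reproducer along those lines (directory name, quota value, and write sizes are made up; this is not the reproducer actually posted):

    #include <cephfs/libcephfs.h>
    #include <fcntl.h>
    #include <cstdint>
    #include <cstring>

    int main()
    {
      struct ceph_mount_info *cmount = nullptr;
      ceph_create(&cmount, NULL);
      ceph_conf_read_file(cmount, NULL);
      if (ceph_mount(cmount, "/") != 0)
        return 1;

      // Put a quota on a directory, then write to an unlinked file under it.
      ceph_mkdir(cmount, "/quota_dir", 0755);
      const char *limit = "10000000000";
      ceph_setxattr(cmount, "/quota_dir", "ceph.quota.max_bytes",
                    limit, strlen(limit), 0);

      int fd = ceph_open(cmount, "/quota_dir/file", O_CREAT | O_WRONLY, 0644);
      ceph_unlink(cmount, "/quota_dir/file");          // unlink while still open

      char buf[4096] = {0};
      for (int i = 0; i < 10000; ++i)                  // buffered writes
        ceph_write(cmount, fd, buf, sizeof(buf), (int64_t)i * sizeof(buf));

      ceph_close(cmount, fd);
      ceph_shutdown(cmount);
      return 0;
    }
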
01:57 PM Bug #20938: CephFS: concurrent access to file from multiple nodes blocks for seconds
FUSE is the only caller of ->ll_lookup so a simpler fix might be to just change the mask field to 0 in the _lookup ca... Jeff Layton
10:39 AM Bug #20938: CephFS: concurrent access to file from multiple nodes blocks for seconds
This slowness is due to a limitation of the FUSE API. The attached patch is a workaround. (Not 100% sure it doesn't break a... Zheng Yan

08/08/2017

05:35 PM Feature #19109: Use data pool's 'df' for statfs instead of global stats, if there is only one dat...
Partially resolved by: https://github.com/ceph/ceph/commit/eabe6626141df3f1b253c880aa6cb852c8b7ac1d Patrick Donnelly
02:25 PM Bug #20938: CephFS: concurrent access to file from multiple nodes blocks for seconds
I'm only running the fuse client. I see the problem both on Jewel (10.2.9 servers + fuse client) and on Luminous RC ... Andras Pataki
02:22 PM Bug #20938: CephFS: concurrent access to file from multiple nodes blocks for seconds
I tried the latest Luminous RC + 4.12 kernel client. I got about 7000 opens/second in the two-node read-write case.
Did ...
Zheng Yan
02:00 PM Bug #20945: get_quota_root sends lookupname op for every buffered write
Our user confirmed that without client-quota their job finishes quickly:... Dan van der Ster
01:46 PM Bug #20945 (Resolved): get_quota_root sends lookupname op for every buffered write
We have a CAD use-case (hspice) which sees very slow buffered writes, apparently due to the quota code. (We haven't y... Dan van der Ster

08/07/2017

05:15 PM Feature #20885: add syntax for generating OSD/MDS auth caps for cephfs
PR to master was https://github.com/ceph/ceph/pull/16761 Ken Dreyer
03:33 PM Bug #20938 (New): CephFS: concurrent access to file from multiple nodes blocks for seconds
When accessing the same file opened for read/write on multiple nodes via ceph-fuse, performance drops by about 3 orde... Andras Pataki

08/05/2017

03:34 AM Bug #20852 (Resolved): hadoop on cephfs would report "Invalid argument" when mount on a sub direc...
Patrick Donnelly
03:33 AM Feature #20885 (Resolved): add syntax for generating OSD/MDS auth caps for cephfs
Patrick Donnelly

08/03/2017

09:13 PM Fix #20246 (Resolved): Make clog message on scrub errors friendlier.
Patrick Donnelly
09:11 PM Bug #20799 (Resolved): Races when multiple MDS boot at once
Patrick Donnelly
09:11 PM Bug #20806 (Resolved): kclient: fails to delete tree during thrashing w/ multimds
Patrick Donnelly
09:10 PM Bug #20892 (Resolved): qa: FS_DEGRADED spurious health warnings in some sub-suites
Patrick Donnelly
04:08 AM Bug #20892: qa: FS_DEGRADED spurious health warnings in some sub-suites
https://github.com/ceph/ceph/pull/16772 Patrick Donnelly
04:05 AM Bug #20892 (Resolved): qa: FS_DEGRADED spurious health warnings in some sub-suites
From: /ceph/teuthology-archive/pdonnell-2017-08-02_17:25:29-fs-wip-pdonnell-testing-20170802-distro-basic-smithi/1474... Patrick Donnelly
09:09 PM Feature #20760 (Resolved): mds: add perf counters for all mds-to-mds messages
Patrick Donnelly
09:09 PM Bug #20889 (Resolved): qa: MDS_DAMAGED not whitelisted properly
Patrick Donnelly
08:36 PM Bug #20889: qa: MDS_DAMAGED not whitelisted properly
https://github.com/ceph/ceph/pull/16768/ Patrick Donnelly
02:54 AM Bug #20535: mds segmentation fault ceph_lock_state_t::get_overlapping_locks
Webert Lima wrote:
> Just upgraded the other 2 production clusters where the problem tends to happen frequently.
> ...
Patrick Donnelly
02:48 AM Bug #20535: mds segmentation fault ceph_lock_state_t::get_overlapping_locks
Just upgraded the other 2 production clusters where the problem tends to happen frequently.
Will watch from now on.
Webert Lima
02:44 AM Bug #20595 (Resolved): mds: export_pin should be included in `get subtrees` output
Patrick Donnelly
02:43 AM Bug #20731 (Resolved): "[ERR] : Health check failed: 1 mds daemon down (MDS_FAILED)" in upgrade:j...
Patrick Donnelly

08/02/2017

11:31 PM Bug #20889 (Resolved): qa: MDS_DAMAGED not whitelisted properly
Due to d12c51ca9129213d53c25a00447af431083ad4c9, grep no longer whitelisted MDS_DAMAGED properly. qa/suites/fs/basic_... Patrick Donnelly
03:43 PM Feature #20885 (Resolved): add syntax for generating OSD/MDS auth caps for cephfs
Add a simpler method for generating MDS auth caps based on filesystem name.
https://bugzilla.redhat.com/show_bug.c...
Douglas Fuller
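For context, a hedged example of the kind of syntax this work produced, assuming it tracks the "ceph fs authorize" command that appeared around Luminous (filesystem name, client id, and paths below are placeholders):

    # generate mon/mds/osd caps for a client, scoped to one filesystem and path
    ceph fs authorize cephfs client.foo / r /bar rw
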
03:36 AM Feature #20760 (Fix Under Review): mds: add perf counters for all mds-to-mds messages
https://github.com/ceph/ceph/pull/16743 Patrick Donnelly

08/01/2017

10:54 AM Feature #20607 (Resolved): MDSMonitor: change "mds deactivate" to clearer "mds rejoin"
Kefu Chai
10:48 AM Backport #20026 (Resolved): kraken: cephfs: MDS became unresponsive when truncating a very large ...
Nathan Cutler
07:20 AM Support #20788 (Closed): MDS report "failed to open ino 10007be02d9 err -61/0" and can not restar...
Zheng Yan

07/31/2017

10:42 PM Bug #20595 (Fix Under Review): mds: export_pin should be included in `get subtrees` output
https://github.com/ceph/ceph/pull/16714 Patrick Donnelly
09:54 PM Feature #19230 (Resolved): Limit MDS deactivation to one at a time
Mon enforces this since 2c08f58ee8353322a342ce043150aafc8dd9c381. Patrick Donnelly
09:48 PM Bug #20731: "[ERR] : Health check failed: 1 mds daemon down (MDS_FAILED)" in upgrade:jewel-x-lumi...
PR: https://github.com/ceph/ceph/pull/16713 Patrick Donnelly
08:57 PM Bug #20731 (In Progress): "[ERR] : Health check failed: 1 mds daemon down (MDS_FAILED)" in upgrad...
Obviously this error is expected when restarting the MDS. We should whitelist the warning. Patrick Donnelly
09:05 PM Subtask #20864: kill allow_multimds
Removing allow_multimds seems reasonable. [Of course, the command should remain a deprecated no-op for deployment com... Patrick Donnelly
06:05 PM Subtask #20864 (Resolved): kill allow_multimds
At this point, allow_multimds is now the default. Under this proposal, its effect is exactly the same as setting max_... Douglas Fuller
10:44 AM Bug #20535: mds segmentation fault ceph_lock_state_t::get_overlapping_locks
One of our production clusters upgraded.
Next one scheduled for next Wednesday, August 2nd.
Webert Lima

07/30/2017

05:30 AM Bug #20852: hadoop on cephfs would report "Invalid argument" when mount on a sub directory
https://github.com/ceph/ceph/pull/16671 dongdong tao

07/29/2017

10:46 AM Bug #20852 (Resolved): hadoop on cephfs would report "Invalid argument" when mount on a sub direc...
We have tested Hadoop on CephFS and HBase on CephFS,
and we got the following stack on HBase:
Failed to become active m...
dongdong tao
02:47 AM Feature #20851 (New): cephfs fuse support "secret" option
We know that the CephFS kernel mount supports the "secret" option,
example:...
huanwen ren
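For reference, the kernel-client usage being referred to looks roughly like the following (monitor address and key are made up); the request is for ceph-fuse to accept an equivalent option:

    # kernel client: the secret can be passed directly as a mount option
    mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secret=AQD...made-up-key...==
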

07/28/2017

04:57 PM Bug #20805 (Resolved): qa: test_client_limits waiting for wrong health warning
Patrick Donnelly
04:57 PM Bug #20677 (Resolved): mds: abrt during migration
Patrick Donnelly
01:38 PM Bug #20806 (Fix Under Review): kclient: fails to delete tree during thrashing w/ multimds
https://github.com/ceph/ceph/pull/16654 Zheng Yan
07:46 AM Bug #20806 (In Progress): kclient: fails to delete tree during thrashing w/ multimds
It's caused by a bug in the "open inode by inode number" function. Zheng Yan
07:33 AM Support #20788: MDS report "failed to open ino 10007be02d9 err -61/0" and can not restart success
Now we have figured out the reason:
the MDS was killed by Docker when it reached its memory limit.
Thanks for your help!
dongdong tao
07:11 AM Support #20788: MDS report "failed to open ino 10007be02d9 err -61/0" and can not restart success
"failed to open ino" is a normal when mds is recovery. what do you mean "ceph can not restart"? mds crashed or mds hu... Zheng Yan
06:16 AM Backport #20823 (Resolved): jewel: client::mkdirs not handle well when two clients send mkdir req...
https://github.com/ceph/ceph/pull/20271 Nathan Cutler
02:29 AM Bug #20566 (Resolved): "MDS health message (mds.0): Behind on trimming" in powercycle tests
Kefu Chai
12:30 AM Bug #20792: cephfs: ceph fs new is err when no default rbd pool
Maybe my version has the problem; I will check it.
Thank you, John and Sage.
huanwen ren

07/27/2017

09:39 PM Bug #20805: qa: test_client_limits waiting for wrong health warning
https://github.com/ceph/ceph/pull/16640 Patrick Donnelly
08:59 PM Bug #20805: qa: test_client_limits waiting for wrong health warning
From: /ceph/teuthology-archive/pdonnell-2017-07-26_18:40:14-fs-wip-pdonnell-testing-20170725-distro-basic-smithi/1448... Patrick Donnelly
08:58 PM Bug #20805 (Resolved): qa: test_client_limits waiting for wrong health warning
... Patrick Donnelly
09:29 PM Bug #20806: kclient: fails to delete tree during thrashing w/ multimds
Zheng, please take a look. Patrick Donnelly
09:29 PM Bug #20806 (Resolved): kclient: fails to delete tree during thrashing w/ multimds
... Patrick Donnelly
09:20 PM Support #20788: MDS report "failed to open ino 10007be02d9 err -61/0" and can not restart success
61 is ENODATA. Sounds like something broke in the cluster; you'll need to provide a timeline of events. Greg Farnum
03:06 AM Support #20788 (Closed): MDS report "failed to open ino 10007be02d9 err -61/0" and can not restar...
Ceph version is v10.2.8.
Now my MDS can not restart.
I have cephfs_metadata and cephfs_data pools.
I can not reprodu...
dongdong tao
05:48 PM Bug #20799 (Fix Under Review): Races when multiple MDS boot at once
Douglas Fuller
05:48 PM Bug #20799: Races when multiple MDS boot at once
https://github.com/ceph/ceph/pull/16631 Douglas Fuller
04:53 PM Bug #20799 (Resolved): Races when multiple MDS boot at once
There is a race in MDSRank::starting_done() between MDCache::open_root() and MDLog::start_new_segment()
An MDS in ...
Douglas Fuller
04:15 PM Bug #20792 (Need More Info): cephfs: ceph fs new is err when no default rbd pool
Sage Weil
04:15 PM Bug #20792: cephfs: ceph fs new is err when no default rbd pool
What version did you see this on? Current master already skips pool id 0. Sage Weil
02:16 PM Bug #20792 (Fix Under Review): cephfs: ceph fs new is err when no default rbd pool
https://github.com/ceph/ceph/pull/16626 John Spray
01:05 PM Bug #20792 (Need More Info): cephfs: ceph fs new is err when no default rbd pool
The default rbd pool is deleted in the new version.
link:...
huanwen ren

07/26/2017

04:23 PM Bug #20761 (Resolved): fs status: KeyError in handle_fs_status
Fixed by 71ea1716043843dd191830f0bcbcc4a88059a9c2. Patrick Donnelly

07/24/2017

08:01 PM Bug #20761 (Resolved): fs status: KeyError in handle_fs_status
... Sage Weil
06:14 PM Feature #20760 (Resolved): mds: add perf counters for all mds-to-mds messages
Idea here is to have a better idea of what the MDS are doing through external tools (graphs). Continuation of #19362. Patrick Donnelly

07/23/2017

08:38 AM Feature #20752 (Resolved): cap message flag which indicates if client still has pending capsnap
Current MDS code uses "(cap->issued() & CEPH_CAP_ANY_FILE_WR) == 0" to infer that the client has no pending capsnap. Ther... Zheng Yan
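A simplified sketch contrasting the current inference with the proposed flag (not the real MDS code; CEPH_CAP_ANY_FILE_WR is Ceph's file-write caps mask, passed in here to keep the sketch self-contained):

    // Not real MDS code: illustrates the heuristic vs. an explicit flag.
    static bool may_have_capsnap_today(unsigned issued, unsigned any_file_wr_mask) {
      // current heuristic: only a client holding file-write caps can have a pending capsnap
      return (issued & any_file_wr_mask) != 0;
    }
    static bool may_have_capsnap_proposed(bool client_flag_in_cap_message) {
      // proposed: the cap message carries an explicit "pending capsnap" flag
      return client_flag_in_cap_message;
    }
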
 
