Activity

From 02/22/2017 to 03/23/2017

03/23/2017

01:35 PM Feature #18509 (Fix Under Review): MDS: damage reporting by ino number is useless
John Spray
01:30 PM Feature #18509: MDS: damage reporting by ino number is useless
https://github.com/ceph/ceph/pull/14104 John Spray
02:16 AM Feature #19362 (Resolved): mds: add perf counters for each type of MDS operation
It's desirable to know what types of operations are being performed on the MDS so we have a better idea over time what... Patrick Donnelly
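For context, per-daemon counters (including any new per-operation ones) are read through the MDS admin socket; a minimal sketch, assuming a local admin socket for mds.<id>:
    ceph daemon mds.<id> perf dump      # dump all counters as JSON
    ceph daemon mds.<id> perf schema    # list counter names and types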

03/22/2017

06:31 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
I have used the offset parameter of the ceph_dir_llseek function, and it will be passed to the MDS in the readdir request, if ... geng jichao
03:38 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
This bug is not specific to the kernel client. Enabling directory fragments can help. The complete fix is to make the client enc... Zheng Yan

03/21/2017

08:56 PM Bug #16842: mds: replacement MDS crashes on InoTable release
If anyone needs the debug level "30/30" dump I can send it directly to them.
Also, so far I have not nuked the cep...
c sights
04:08 PM Bug #16842: mds: replacement MDS crashes on InoTable release
I don't have any brilliant ideas for debugging this (can't go back in time to find out how those bogus inode ranges m... John Spray
08:39 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Just in case it wasn't clear from the log snippet above: the current counter is continuously growing until the process... Arne Wiebalck
08:25 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm, I've looked at the code now and it turns out we do already have a client_oc_max_dirty setting that is supposed t... John Spray
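For reference, the existing ObjectCacher knobs on the client side can be tuned in ceph.conf; the values below are illustrative only, not recommendations:
    [client]
        client_oc_size = 209715200         # total bytes the object cacher may hold
        client_oc_max_dirty = 104857600    # ceiling on dirty bytes before writers block
        client_oc_target_dirty = 8388608   # dirty level at which background flushing starts
        client_oc_max_objects = 1000       # cap on cached objects (the limit this bug is about)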
05:18 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
I see ... so the fix would be to have a dirty_limit (like vm.dirty_ratio) above which all new requests would not be ca... Arne Wiebalck
03:11 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm, could be that we just have no limit on the number of dirty objects (trim is only dropping clean things), so if w... John Spray
01:10 PM Bug #19343 (New): OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limit...
Creating a 100G file on CephFS reproducibly triggers OOM to kill the
ceph-fuse process.
With objectcacher debug ...
Arne Wiebalck
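A minimal reproduction along the lines of the report (mount point and file name are hypothetical, assuming a single ceph-fuse process):
    dd if=/dev/zero of=/mnt/cephfs/100g-test bs=1M count=102400   # create a 100G file through the ceph-fuse mount
    grep VmRSS /proc/$(pidof ceph-fuse)/status                    # watch the ceph-fuse resident size grow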
01:01 PM Backport #19335 (Resolved): kraken: MDS heartbeat timeout during rejoin, when working with large ...
https://github.com/ceph/ceph/pull/14572 Nathan Cutler
01:01 PM Backport #19334 (Resolved): jewel: MDS heartbeat timeout during rejoin, when working with large a...
https://github.com/ceph/ceph/pull/14672 Nathan Cutler

03/20/2017

07:07 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
John Spray

03/19/2017

08:25 AM Bug #19306 (Resolved): fs: mount NFS to cephfs, and then ls a directory containing a large number...
The ceph_readdir function saves a lot of data in file->private_data, including the last_name, which is used as the offset. Howe... geng jichao

03/18/2017

09:17 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
Note: inodes loaded is visible in mds-ino+.png in both workflow directories. Patrick Donnelly
09:16 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
I believe I have this one figured out. The client requests graph (mds-request.png) shows that the 8 MDS workflow is r... Patrick Donnelly

03/17/2017

04:37 PM Bug #19291 (Fix Under Review): mds: log rotation doesn't work if mds has respawned
https://github.com/ceph/ceph/pull/14021 Patrick Donnelly
12:00 PM Bug #17939 (Fix Under Review): non-local cephfs quota changes not visible until some IO is done
https://github.com/ceph/ceph/pull/14018 John Spray
11:51 AM Bug #19282 (Fix Under Review): RecoveryQueue::_recovered asserts out on OSD errors
https://github.com/ceph/ceph/pull/14017 John Spray
11:36 AM Fix #19288 (Fix Under Review): Remove legacy "mds tell"
https://github.com/ceph/ceph/pull/14015 John Spray

03/16/2017

10:17 PM Bug #19291: mds: log rotation doesn't work if mds has respawned
Sorry, hit submit by accident before finishing writing this up. Standby! Patrick Donnelly
10:16 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
If an MDS respawns, then its "comm" name becomes "exe", which confuses logrotate since it relies on killall. What ends... Patrick Donnelly
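For illustration, the killall-based pattern referred to here looks roughly like the following logrotate stanza (paths and options are a sketch, not the shipped file):
    /var/log/ceph/*.log {
        daily
        rotate 7
        compress
        sharedscripts
        postrotate
            killall -q -1 ceph-mon ceph-osd ceph-mds || true   # SIGHUP asks the daemons to reopen logs; misses a respawned MDS whose comm is "exe"
        endscript
    }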
09:42 PM Bug #16842: mds: replacement MDS crashes on InoTable release

Hi All,
After upgrading to 10.2.6 on Debian Jessie, the MDS server fails to start. Below is what is written to ...
c sights
08:44 PM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
John Spray
10:32 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
OK, not such a big deal for me, I guess I can increase the fragment size until the Luminous release. elder one
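A sketch of raising the limit in the meantime (the option is the existing mds_bal_fragment_size_max; the value is only an example):
    ceph tell mds.<id> injectargs '--mds_bal_fragment_size_max 200000'
    # or persistently in ceph.conf:
    [mds]
        mds_bal_fragment_size_max = 200000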
10:14 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
Were you planning to work on this?
I don't think we're likely to work on this as a new feature, given that in Lumi...
John Spray
09:51 AM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
Since cephfs directory fragmentation is disabled by default in Jewel, it would be nice to know when we hit "mds bal frag... elder one
04:49 PM Fix #19288 (Resolved): Remove legacy "mds tell"

This has been deprecated for a while in favour of the new-style "ceph tell mds.<id>". Luminous seems like a good t...
John Spray
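For comparison, the two forms in question (the legacy one being removed and its new-style replacement):
    ceph mds tell <id> <command>     # legacy, deprecated
    ceph tell mds.<id> <command>     # new-style equivalent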
10:14 AM Bug #18151 (Resolved): Incorrect report of size when quotas are enabled.
John Spray
03:10 AM Bug #18151: Incorrect report of size when quotas are enabled.
Hi John. Let us close it now since according to Greg it has been resolved. Still did not upgrade but will open a new ... Goncalo Borges

03/15/2017

09:51 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors

There's no good reason the OSD should ever give us errors in this case, but crashing out the MDS is definitely not ...
John Spray
09:16 PM Bug #19118 (Pending Backport): MDS heartbeat timeout during rejoin, when working with large amoun...
John Spray
11:43 AM Bug #17939: non-local cephfs quota changes not visible until some IO is done
Right, finally looking at this.
Turns out all our ceph.* vxattrs return whatever is in the local cache, with no re...
John Spray
10:30 AM Bug #18151: Incorrect report of size when quotas are enabled.
Goncalo: did the issue go away for you? If so let's close this. John Spray

03/13/2017

01:47 PM Bug #18579: Fuse client has "opening" session to nonexistent MDS rank after MDS cluster shrink
After discussion during standup, a few things probably need to happen here:
* Upon receiving a new MDSMap, the cl...
Patrick Donnelly

03/10/2017

07:47 PM Bug #18579 (In Progress): Fuse client has "opening" session to nonexistent MDS rank after MDS clu...
Sorry for the delay updating this.
The issue appears to be that the stopping MDS is still authoritative for the pa...
Patrick Donnelly
01:08 PM Bug #19255 (Can't reproduce): qa: test_full_fclose failure

I suspect the rework of full handling broke this test:
https://github.com/ceph/ceph/pull/13615/files
It's faili...
John Spray

03/09/2017

01:18 PM Bug #16709 (Fix Under Review): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
https://github.com/ceph/ceph/pull/13904 John Spray
01:11 PM Bug #18238 (Can't reproduce): TestDataScan failing due to log "unmatched rstat on 100"
This seems to have gone away. John Spray
01:08 PM Bug #19103 (Won't Fix): cephfs: Out of space handling
The consensus at the moment seems to be that cephfs shouldn't be doing any nearfull special behaviour. John Spray
11:29 AM Bug #19245 (Fix Under Review): Crash in PurgeQueue::_execute_item when deletions happen extremely...
https://github.com/ceph/ceph/pull/13899 John Spray
10:37 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
There is a path here where the deletion completes so quickly that our call to gather.activate() is calling execute_it... John Spray
04:34 AM Bug #19243 (New): multimds on linode: MDSs make uneven progress on independent workloads
I have a test that just finished with 4 MDS with 400k cache object limit and 64 clients. What appears to happen in th... Patrick Donnelly
01:27 AM Bug #19240 (Closed): multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kern...
This is possibly not a bug but I thought I would put it here to solicit comments/assistance on what could be causing ... Patrick Donnelly
12:29 AM Bug #19239 (Resolved): mds: stray count remains static after workflows complete
For the Linode tests, I've observed that after the clients finish removing their sandboxes (thus making the subtree c... Patrick Donnelly
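For reference, stray counts can be watched via the MDS admin socket; the counter names below are assumed from this era's mds_cache perf section:
    ceph daemon mds.<id> perf dump mds_cache    # check num_strays / num_strays_purging over time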

03/08/2017

05:15 PM Bug #16914: multimds: pathologically slow deletions in some tests
Notes on discussion today:
* The issue is that we usually (single mds) do fine in these cases because of the proj...
John Spray
02:44 PM Feature #19230 (Resolved): Limit MDS deactivation to one at a time
This needs to happen in the thrasher code, and also in the mon. John Spray
12:17 PM Bug #19204: MDS assert failed when shutting down
https://github.com/ceph/ceph/pull/13859 John Spray
12:15 PM Bug #19204 (Fix Under Review): MDS assert failed when shutting down
Hmm, we do shut down the objecter before the finisher, which is clearly not handling this case.
Let's try swapping...
John Spray
11:20 AM Bug #19201 (Resolved): mds: status asok command prints MDS_RANK_NONE as unsigned long -1: 1844674...
John Spray
11:19 AM Feature #11950 (Resolved): Strays enqueued for purge cause MDCache to exceed size limit
PurgeQueue has merged to master, will be in Luminous. John Spray
08:21 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
John Spray wrote:
> After decreasing max_mds, you also need to use "ceph mds deactivate <rank>" to shrink the active...
xianglong wang
04:07 AM Bug #19220 (New): jewel: mds crashed (FAILED assert(mds->mds_lock.is_locked_by_me())
http://qa-proxy.ceph.com/teuthology/teuthology-2017-02-19_10:10:02-fs-jewel---basic-smithi/833293/teuthology.log
<...
Zheng Yan

03/07/2017

02:38 PM Backport #19206 (In Progress): jewel: Invalid error code returned by MDS is causing a kernel clie...
Jan Fajerski
01:28 PM Backport #19206 (Resolved): jewel: Invalid error code returned by MDS is causing a kernel client ...
https://github.com/ceph/ceph/pull/13831 Jan Fajerski
01:22 PM Bug #19205 (Resolved): Invalid error code returned by MDS is causing a kernel client WARNING
Found by an occasionally failing xfstest generic/011.
After some investigation, it was found out that a positive e...
Jan Fajerski
09:20 AM Bug #19204 (Resolved): MDS assert failed when shutting down
We encountered a failed assertion when trying to shut down an MDS. Here is a snippet of the log:
> -14> 2017-01-22 ...
Sandy Xu
07:06 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
Zheng Yan wrote:
> [...]
>
> the test passes after applying above patch and adding "-W -R" options to fsx-mpi, me...
Ivan Guan

03/06/2017

08:45 PM Bug #19201 (Fix Under Review): mds: status asok command prints MDS_RANK_NONE as unsigned long -1:...
https://github.com/ceph/ceph/pull/13816 Patrick Donnelly
08:36 PM Bug #19201 (Resolved): mds: status asok command prints MDS_RANK_NONE as unsigned long -1: 1844674...
i.e.... Patrick Donnelly
01:06 PM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
After decreasing max_mds, you also need to use "ceph mds deactivate <rank>" to shrink the active cluster, so that one... John Spray
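A minimal sketch of the shrink procedure described above (filesystem name and rank are placeholders; the max_mds syntax varies slightly between releases):
    ceph fs set <fs_name> max_mds 1    # lower the target number of active ranks
    ceph mds deactivate <rank>         # then deactivate the surplus rank so it actually leaves the active set
    ceph status                        # the rank should go through stopping and then drop out of the active set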
04:21 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
John Spray wrote:
> Having multiple active MDS daemons does not remove the need for standby daemons. Set max_mds to...
xianglong wang
12:54 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
John Spray wrote:
> https://github.com/ceph/ceph/pull/13807
>
> I've added a periodic heartbeat reset in the long...
Zheng Yan
12:00 PM Bug #19118 (Fix Under Review): MDS heartbeat timeout during rejoin, when working with large amoun...
https://github.com/ceph/ceph/pull/13807
I've added a periodic heartbeat reset in the long loop in export_remaining...
John Spray
06:13 AM Bug #18797 (Duplicate): valgrind jobs hanging in fs suite
the master now passes, see http://pulpito.ceph.com/kchai-2017-03-06_05:57:24-fs-master---basic-mira/ Kefu Chai

03/03/2017

10:18 AM Bug #16914: multimds: pathologically slow deletions in some tests
Pulled the test off wip-11950 and onto https://github.com/ceph/ceph/pull/13770 John Spray
09:09 AM Feature #19135 (Rejected): Multi-Mds: One dead, all dead; what is Robustness??
Having multiple active MDS daemons does not remove the need for standby daemons. Set max_mds to something less than ... John Spray
07:54 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
Zheng Yan wrote:
> multiple active mds is mainly for improving performance (balance load to multiple mds). robustnes...
xianglong wang
06:59 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
Multiple active MDS is mainly for improving performance (balancing load across multiple MDSs). Robustness is achieved by standb... Zheng Yan
02:57 AM Feature #19135 (Rejected): Multi-Mds: One dead, all dead; what is Robustness??
I have three MDSs, and I copy a file to a fuse-type mountpoint as a test.
When I shut down one MDS, the client hangs, the m...
xianglong wang
07:19 AM Feature #19137 (New): samba: implement aio callback for ceph vfs module
Zheng Yan

03/02/2017

11:52 AM Bug #17656: cephfs: high concurrent causing slow request
jichao sun wrote:
> I have the same problem too!!!
jichao sun
08:27 AM Bug #17656: cephfs: high concurrent causing slow request
I have the same problem too!!!

jichao sun
08:33 AM Bug #18995 (Resolved): ceph-fuse always fails If pid file is non-empty and run as daemon
Kefu Chai
08:11 AM Bug #18730: mds: backtrace issues getxattr for every file with cap on rejoin
Zheng Yan wrote:
> I think we should design a new mechanism to track in-use inodes (current method isn't scalable be...
Xiaoxi Chen
03:40 AM Bug #18730: mds: backtrace issues getxattr for every file with cap on rejoin
I think we should design a new mechanism to track in-use inodes (current method isn't scalable because it journals al... Zheng Yan

03/01/2017

11:30 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
I have that already. I did set the beacon_grace to 600s to work around the bug and bring the cluster back.
Seems r...
Xiaoxi Chen
11:16 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
I have that already. I did set the beacon_grace to 600s to work around the bug and bring the cluster back.
In firs...
Xiaoxi Chen
03:52 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
May be related to http://tracker.ceph.com/issues/18730, although perhaps not since that one shouldn't be causing the ... John Spray
03:36 PM Bug #19118 (Resolved): MDS heartbeat timeout during rejoin, when working with large amount of cap...
We set an alarm every OPTION(mds_beacon_grace, OPT_FLOAT, 15) seconds; if mds_rank doesn't finish its task within this... Xiaoxi Chen
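A sketch of the grace-period workaround mentioned in the comments (600 is the example value used there; the mons also consult this option when deciding whether the MDS is laggy, so it should not only be raised on the MDS):
    ceph tell mds.<id> injectargs '--mds_beacon_grace 600'    # runtime, on the MDS
    # and/or persistently, so the mons honour the longer grace as well:
    [global]
        mds_beacon_grace = 600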
05:36 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Zheng Yan wrote:
> probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0...
Darrell Enns
03:31 PM Feature #16523 (In Progress): Assert directory fragmentation is occurring during stress tests
John Spray
11:09 AM Bug #18883: qa: failures in samba suite
Opened a ticket for the samba build: http://tracker.ceph.com/issues/19117 Zheng Yan
08:21 AM Bug #17828 (Need More Info): libceph setxattr returns 0 without setting the attr
ceph.quota.max_files is a hidden xattr. It doesn't show in listxattr; you need to get it explicitly (getfattr -n ceph.qu... Zheng Yan
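A short usage example (the directory path is hypothetical):
    setfattr -n ceph.quota.max_files -v 100000 /mnt/cephfs/somedir   # set the quota
    getfattr -n ceph.quota.max_files /mnt/cephfs/somedir             # must be queried by name; it is not listed by listxattr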

02/28/2017

02:12 PM Bug #19103: cephfs: Out of space handling
Looking again now that I'm a few coffees into my day -- all the cephfs enospc stuff is just aimed at providing a slic... John Spray
08:58 AM Bug #19103: cephfs: Out of space handling
Believe it or not I sneezed and somehow that caused me to select some affected versions... John Spray
08:58 AM Bug #19103: cephfs: Out of space handling
Also, I'm not sure we actually need to do sync writes when the cluster is near full -- we already have machinery that... John Spray
08:56 AM Bug #19103: cephfs: Out of space handling
Could the "ENOSPC on failsafe_full_ratio" behaviour be the default? It seems like any application layer that wants t... John Spray
12:13 AM Bug #19103 (Won't Fix): cephfs: Out of space handling

Cephfs needs to be more careful on a cluster with almost full OSDs. There is a delay in OSDs reporting stats, a MO...
David Zafman
01:37 PM Feature #19109 (Resolved): Use data pool's 'df' for statfs instead of global stats, if there is o...

The client sends an MStatfs to the mon to get the info for a statfs system call. Currently the mon gives it the glo...
John Spray
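For context, the two views in question, assuming a CephFS mount at a hypothetical path:
    stat -f /mnt/cephfs    # statfs via the client, currently answered from the global stats
    ceph df                # per-pool usage, which this feature would surface when there is a single data pool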
09:03 AM Bug #17828: libceph setxattr returns 0 without setting the attr
This ticket didn't get noticed because it was filed in the 'mgr' component instead of the 'fs' component.
Chris: d...
John Spray

02/27/2017

09:14 PM Bug #19101 (Closed): "samba3error [Unknown error/failure. Missing torture_fail() or torture_asser...
This is jewel point v10.2.6
Run: http://pulpito.ceph.com/yuriw-2017-02-24_20:42:46-samba-jewel---basic-smithi/
Jo...
Yuri Weinstein
12:47 PM Bug #18883: qa: failures in samba suite
Both the ubuntu and centos samba packages have no dependency on libcephfs. It works when libcephfs1 happens to be present. Zheng Yan
11:21 AM Bug #18883: qa: failures in samba suite
Which packages were you seeing the linkage issue on? The centos ones? John Spray
09:25 AM Bug #18883: qa: failures in samba suite
... Zheng Yan
07:56 AM Bug #18757: Jewel ceph-fuse does not recover after lost connection to MDS
I updated the PR to do _closed_mds_session(s).
As for the config option, I would expect the client to reconnect automagically ...
Henrik Korkuc

02/24/2017

02:29 PM Feature #19075 (Fix Under Review): Extend 'p' mds auth cap to cover quotas and all layout fields
https://github.com/ceph/ceph/pull/13628 John Spray
02:21 PM Feature #19075 (Resolved): Extend 'p' mds auth cap to cover quotas and all layout fields
Re. mailing list thread "quota change restriction" http://marc.info/?l=ceph-devel&m=148769159329755&w=2
We should ...
John Spray
02:12 PM Bug #18600 (Resolved): multimds suite tries to run quota tests against kclient, fails
Zheng Yan
02:11 PM Bug #17990 (Resolved): newly created directory may get fragmented before it gets journaled
Zheng Yan
02:10 PM Bug #16768 (Resolved): multimds: check_rstat assertion failure
Zheng Yan
02:10 PM Bug #18159 (Resolved): "Unknown mount option mds_namespace"
Zheng Yan
02:09 PM Bug #18646 (Resolved): mds: rejoin_import_cap FAILED assert(session)
Zheng Yan
09:51 AM Bug #18953 (Resolved): mds applies 'fs full' check for CEPH_MDS_OP_SETFILELOCK
Zheng Yan
09:50 AM Bug #18663 (Resolved): teuthology teardown hangs if kclient umount fails
Zheng Yan
09:48 AM Bug #18675 (Resolved): client: during multimds thrashing FAILED assert(session->requests.empty())
Zheng Yan
01:47 AM Bug #18759 (Resolved): multimds suite tries to run norstats tests against kclient
Zheng Yan

02/23/2017

11:24 PM Feature #9754: A 'fence and evict' client eviction command
Underway on jcsp/wip-17980 along with #17980 John Spray
12:07 PM Feature #9754 (In Progress): A 'fence and evict' client eviction command
John Spray
09:42 AM Bug #18757: Jewel ceph-fuse does not recover after lost connection to MDS
you can use 'ceph daemon client.xxx kick_stale_sessions' to recover this issue. Maybe we should add config option to ... Zheng Yan
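A usage sketch of the suggested command, via the client's admin socket (the socket name/path is illustrative):
    ceph daemon /var/run/ceph/ceph-client.admin.asok kick_stale_sessions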
09:30 AM Backport #19045 (Resolved): kraken: buffer overflow in test LibCephFS.DirLs
https://github.com/ceph/ceph/pull/14571 Loïc Dachary
09:30 AM Backport #19044 (Resolved): jewel: buffer overflow in test LibCephFS.DirLs
https://github.com/ceph/ceph/pull/14671 Loïc Dachary

02/22/2017

02:45 PM Bug #19033: cephfs: mds crashed after I set about 400 64KB xattr kv pairs to a file
Thank you for the very detailed report John Spray
02:44 PM Bug #19033 (Fix Under Review): cephfs: mds crashed after I set about 400 64KB xattr kv pairs ...
John Spray
01:04 PM Bug #19033: cephfs: mds crashed after I set about 400 64KB xattr kv pairs to a file
h1. Fix proposal
https://github.com/ceph/ceph/pull/13587
Honggang Yang
12:22 PM Bug #19033 (Resolved): cephfs: mds crashed after I set about 400 64KB xattr kv pairs to a file
h1. 1. Problem
After I set about 400 64KB xattr kv pairs on a file,
the MDS crashed. Every time I try to star...
Honggang Yang
10:05 AM Bug #18941 (Pending Backport): buffer overflow in test LibCephFS.DirLs
It's a rare thing, but let's backport so that we don't have to re-diagnose it in the future. John Spray
09:48 AM Bug #18964 (Resolved): mon: fs new is no longer idempotent
John Spray
09:36 AM Bug #19022 (Fix Under Review): Crash in Client::queue_cap_snap when thrashing
https://github.com/ceph/ceph/pull/13579 Zheng Yan
09:36 AM Bug #18914 (Fix Under Review): cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume...
https://github.com/ceph/ceph/pull/13580 Zheng Yan