Activity
From 02/20/2017 to 03/21/2017
03/21/2017
- 08:56 PM Bug #16842: mds: replacement MDS crashes on InoTable release
- If anyone needs the debug level "30/30" dump I can send it directly to them.
Also, so far I have not nuked the cep...
- 04:08 PM Bug #16842: mds: replacement MDS crashes on InoTable release
- I don't have any brilliant ideas for debugging this (can't go back in time to find out how those bogus inode ranges m...
- 08:39 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Just in case it wasn't clear from the log snippet above: the current counter is continuously growing until the process...
- 08:25 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Hmm, I've looked at the code now and it turns out we do already have a client_oc_max_dirty setting that is supposed t...
- 05:18 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- I see ... so the fix would be to have a dirty_limit (like vm.dirty_ratio) above which all new requests would not be ca...
- 03:11 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Hmm, could be that we just have no limit on the number of dirty objects (trim is only dropping clean things), so if w...
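To make the failure mode above concrete, here is a minimal C++ sketch (hypothetical types, not the actual ObjectCacher code): trim only evicts clean buffers, so dirty data can grow without bound unless writers are throttled against a dirty-byte ceiling in the spirit of client_oc_max_dirty / vm.dirty_ratio.
    #include <cstdint>
    #include <list>

    struct Buffer { uint64_t len; bool dirty; };

    struct Cache {
      std::list<Buffer> lru;
      uint64_t size = 0, max_size = 200 << 20;   // bytes cached
      uint64_t dirty = 0, max_dirty = 100 << 20; // cf. client_oc_max_dirty

      // trim() frees clean buffers only; dirty ones are skipped, so dirty
      // data is unbounded unless writers are throttled elsewhere.
      void trim() {
        auto it = lru.begin();
        while (it != lru.end() && size > max_size) {
          if (it->dirty) { ++it; continue; }
          size -= it->len;
          it = lru.erase(it);
        }
      }

      // Hedged fix: block (or flush) before admitting a write that would
      // push dirty bytes past the ceiling, analogous to vm.dirty_ratio.
      bool can_admit_write(uint64_t len) const { return dirty + len <= max_dirty; }
    };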
- 01:10 PM Bug #19343 (New): OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limit...
- Creating a 100G file on CephFS reproducibly triggers OOM to kill the
ceph-fuse process.
With objectcacher debug ...
- 01:01 PM Backport #19335 (Resolved): kraken: MDS heartbeat timeout during rejoin, when working with large ...
- https://github.com/ceph/ceph/pull/14572
- 01:01 PM Backport #19334 (Resolved): jewel: MDS heartbeat timeout during rejoin, when working with large a...
- https://github.com/ceph/ceph/pull/14672
03/20/2017
- 07:07 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
03/19/2017
- 08:25 AM Bug #19306 (Resolved): fs: mount NFS to cephfs, and then ls a directory containing a large number...
- The ceph_readdir function saves a lot of data in file->private_data, including last_name, which is used as an offset. Howe...
03/18/2017
- 09:17 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
- Note: inodes loaded is visible in mds-ino+.png in both workflow directories.
- 09:16 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
- I believe I have this one figured out. The client requests graph (mds-request.png) shows that the 8 MDS workflow is r...
03/17/2017
- 04:37 PM Bug #19291 (Fix Under Review): mds: log rotation doesn't work if mds has respawned
- https://github.com/ceph/ceph/pull/14021
- 12:00 PM Bug #17939 (Fix Under Review): non-local cephfs quota changes not visible until some IO is done
- https://github.com/ceph/ceph/pull/14018
- 11:51 AM Bug #19282 (Fix Under Review): RecoveryQueue::_recovered asserts out on OSD errors
- https://github.com/ceph/ceph/pull/14017
- 11:36 AM Fix #19288 (Fix Under Review): Remove legacy "mds tell"
- https://github.com/ceph/ceph/pull/14015
03/16/2017
- 10:17 PM Bug #19291: mds: log rotation doesn't work if mds has respawned
- Sorry, hit submit by accident before finishing writing this up. Standby!
- 10:16 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
- If an MDS respawns, its "comm" name becomes "exe", which confuses logrotate since it relies on killall. What ends...
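Background on the mechanism, as a hedged sketch (not necessarily what the eventual PR does): Linux derives a process's comm from the basename of the exec'd path, so respawning via /proc/self/exe leaves comm as "exe", and logrotate's killall against "ceph-mds" matches nothing. One remedy is to restore the name after the respawn:
    #include <cstdio>
    #include <sys/prctl.h>

    int main() {
      char comm[32] = {0};
      if (FILE* f = std::fopen("/proc/self/comm", "r")) {
        std::fgets(comm, sizeof(comm), f);
        std::fclose(f);
      }
      std::printf("comm before: %s", comm);     // "exe" after exec of /proc/self/exe
      prctl(PR_SET_NAME, "ceph-mds", 0, 0, 0);  // restore the name killall looks for
      return 0;
    }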
- 09:42 PM Bug #16842: mds: replacement MDS crashes on InoTable release
Hi All,
After upgrading to 10.2.6 on Debian Jessie, the MDS server fails to start. Below is what is written to ...
- 08:44 PM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
- 10:32 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
- OK, not such a big deal for me, I guess I can increase fragment size till Luminous release.
- 10:14 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
- Were you planning to work on this?
I don't think we're likely to work on this as a new feature, given that in Lumi...
- 09:51 AM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
- Since in Jewel cephfs directory fragmentation is disabled by default, it would be nice to know when we hit "mds bal frag...
- 04:49 PM Fix #19288 (Resolved): Remove legacy "mds tell"
This has been deprecated for a while in favour of the new-style "ceph tell mds.<id>". Luminous seems like a good t...
- 10:14 AM Bug #18151 (Resolved): Incorrect report of size when quotas are enabled.
- 03:10 AM Bug #18151: Incorrect report of size when quotas are enabled.
- Hi John. Let us close it now since according to Greg it has been resolved. Still did not upgrade but will open a new ...
03/15/2017
- 09:51 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors
There's no good reason the OSD should ever give us errors in this case, but crashing out the MDS is definitely not ...
- 09:16 PM Bug #19118 (Pending Backport): MDS heartbeat timeout during rejoin, when working with large amoun...
- 11:43 AM Bug #17939: non-local cephfs quota changes not visible until some IO is done
- Right, finally looking at this.
Turns out all our ceph.* vxattrs return whatever is in the local cache, with no re...
- 10:30 AM Bug #18151: Incorrect report of size when quotas are enabled.
- Goncalo: did the issue go away for you? If so let's close this.
03/13/2017
- 01:47 PM Bug #18579: Fuse client has "opening" session to nonexistent MDS rank after MDS cluster shrink
- After discussion during standup, a few things probably need to happen here:
* Upon receiving a new MDSMap, the cl...
03/10/2017
- 07:47 PM Bug #18579 (In Progress): Fuse client has "opening" session to nonexistent MDS rank after MDS clu...
- Sorry for the delay updating this.
The issue appears to be that the stopping MDS is still authoritative for the pa...
- 01:08 PM Bug #19255 (Can't reproduce): qa: test_full_fclose failure
I suspect the rework of full handling broke this test:
https://github.com/ceph/ceph/pull/13615/files
It's faili...
03/09/2017
- 01:18 PM Bug #16709 (Fix Under Review): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
- https://github.com/ceph/ceph/pull/13904
- 01:11 PM Bug #18238 (Can't reproduce): TestDataScan failing due to log "unmatched rstat on 100"
- This seems to have gone away.
- 01:08 PM Bug #19103 (Won't Fix): cephfs: Out of space handling
- The consensus at the moment seems to be that cephfs shouldn't be doing any nearfull special behaviour.
- 11:29 AM Bug #19245 (Fix Under Review): Crash in PurgeQueue::_execute_item when deletions happen extremely...
- https://github.com/ceph/ceph/pull/13899
- 10:37 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
- There is a path here where the deletion completes so quickly that our call to gather.activate() is calling execute_it...
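A reduced C++ illustration of this class of race (hypothetical names, not the PurgeQueue code): if the deletion completes synchronously, activating the gather re-enters the execute path before the first call has returned.
    #include <functional>
    #include <iostream>

    struct Gather {
      std::function<void()> on_finish;
      void activate() { on_finish(); }  // may fire the completion immediately
    };

    struct Queue {
      bool in_execute = false;
      void execute_item() {
        if (in_execute) { std::cout << "re-entered execute_item!\n"; return; }
        in_execute = true;
        Gather gather;
        gather.on_finish = [this] { execute_item(); };  // "deletion done, start next"
        gather.activate();  // deletion finishes instantly -> re-entrant call
        in_execute = false;
      }
    };

    int main() { Queue q; q.execute_item(); }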
- 04:34 AM Bug #19243 (New): multimds on linode: MDSs make uneven progress on independent workloads
- I have a test that just finished with 4 MDS with 400k cache object limit and 64 clients. What appears to happen in th...
- 01:27 AM Bug #19240 (Closed): multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kern...
- This is possibly not a bug but I thought I would put it here to solicit comments/assistance on what could be causing ...
- 12:29 AM Bug #19239 (Resolved): mds: stray count remains static after workflows complete
- For the Linode tests, I've observed that after the clients finish removing their sandboxes (thus making the subtree c...
03/08/2017
- 05:15 PM Bug #16914: multimds: pathologically slow deletions in some tests
- Notes on discussion today:
* The issue is that we usually (single mds) do fine in these cases because of the proj...
- 02:44 PM Feature #19230 (Resolved): Limit MDS deactivation to one at a time
- This needs to happen in the thrasher code, and also in the mon
- 12:17 PM Bug #19204: MDS assert failed when shutting down
- https://github.com/ceph/ceph/pull/13859
- 12:15 PM Bug #19204 (Fix Under Review): MDS assert failed when shutting down
- Hmm, we do shut down the objecter before the finisher, which is clearly not handling this case.
Let's try swapping...
- 11:20 AM Bug #19201 (Resolved): mds: status asok command prints MDS_RANK_NONE as unsigned long -1: 1844674...
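A sketch of the objecter/finisher ordering hazard from Bug #19204 above (hypothetical classes, assuming the swap suggested in the comment):
    #include <functional>
    #include <vector>

    struct Objecter {
      bool stopped = false;
      void shutdown() { stopped = true; }
    };

    struct Finisher {
      std::vector<std::function<void()>> queue;
      void shutdown() {              // drain pending completions, then stop
        for (auto& cb : queue) cb();
        queue.clear();
      }
    };

    void shutdown_mds(Objecter& objecter, Finisher& finisher) {
      // Buggy order: objecter.shutdown() first means a still-queued
      // completion can run later and touch the already-stopped Objecter.
      finisher.shutdown();   // swapped: drain completions first...
      objecter.shutdown();   // ...then stop the Objecter they may reference
    }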
- 11:19 AM Feature #11950 (Resolved): Strays enqueued for purge cause MDCache to exceed size limit
- PurgeQueue has merged to master and will be in Luminous.
- 08:21 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- John Spray wrote:
> After decreasing max_mds, you also need to use "ceph mds deactivate <rank>" to shrink the active...
- 04:07 AM Bug #19220 (New): jewel: mds crashed (FAILED assert(mds->mds_lock.is_locked_by_me())
- http://qa-proxy.ceph.com/teuthology/teuthology-2017-02-19_10:10:02-fs-jewel---basic-smithi/833293/teuthology.log
<...
03/07/2017
- 02:38 PM Backport #19206 (In Progress): jewel: Invalid error code returned by MDS is causing a kernel clie...
- 01:28 PM Backport #19206 (Resolved): jewel: Invalid error code returned by MDS is causing a kernel client ...
- https://github.com/ceph/ceph/pull/13831
- 01:22 PM Bug #19205 (Resolved): Invalid error code returned by MDS is causing a kernel client WARNING
- Found by an occasionally failing xfstest generic/011.
After some investigation, it was found that a positive e...
- 09:20 AM Bug #19204 (Resolved): MDS assert failed when shutting down
- We encountered a failed assertion when trying to shutdown an MDS. Here is a snippet of the log:
> -14> 2017-01-22 ...
- 07:06 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
- Zheng Yan wrote:
> [...]
>
> the test passes after applying above patch and adding "-W -R" options to fsx-mpi, me...
03/06/2017
- 08:45 PM Bug #19201 (Fix Under Review): mds: status asok command prints MDS_RANK_NONE as unsigned long -1:...
- https://github.com/ceph/ceph/pull/13816
- 08:36 PM Bug #19201 (Resolved): mds: status asok command prints MDS_RANK_NONE as unsigned long -1: 1844674...
- i.e....
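A minimal C++ reproduction of the formatting bug (assuming mds_rank_t is a signed 32-bit type with MDS_RANK_NONE = -1, as in Ceph): pushing the value through an unsigned 64-bit dump path turns -1 into 18446744073709551615.
    #include <cstdint>
    #include <iostream>

    using mds_rank_t = int32_t;
    constexpr mds_rank_t MDS_RANK_NONE = -1;

    int main() {
      uint64_t as_unsigned = static_cast<uint64_t>(MDS_RANK_NONE); // sign-extended
      std::cout << as_unsigned << "\n";                  // 18446744073709551615
      std::cout << static_cast<int64_t>(MDS_RANK_NONE) << "\n";  // -1, as intended
    }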
- 01:06 PM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- After decreasing max_mds, you also need to use "ceph mds deactivate <rank>" to shrink the active cluster, so that one...
- 04:21 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- John Spray wrote:
> Having multiple active MDS daemons does not remove the need for standby daemons. Set max_mds to...
- 12:54 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
- John Spray wrote:
> https://github.com/ceph/ceph/pull/13807
>
> I've added a periodic heartbeat reset in the long... - 12:00 PM Bug #19118 (Fix Under Review): MDS heartbeat timeout during rejoin, when working with large amoun...
- https://github.com/ceph/ceph/pull/13807
I've added a periodic heartbeat reset in the long loop in export_remaining...
- 06:13 AM Bug #18797 (Duplicate): valgrind jobs hanging in fs suite
- the master now passes, see http://pulpito.ceph.com/kchai-2017-03-06_05:57:24-fs-master---basic-mira/
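The shape of the #19118 mitigation above, as a hedged sketch (stand-in types; the real change lives in the MDS rejoin/export paths): reset the heartbeat periodically inside any loop that can outlast mds_beacon_grace.
    #include <cstddef>
    #include <vector>

    struct Cap {};
    struct MDS { void heartbeat_reset() {} };  // stand-in for MDSRank
    void handle_cap(const Cap&) {}

    // Inside a long loop (e.g. over millions of caps during rejoin),
    // reset the heartbeat every N iterations so the mds_beacon_grace
    // watchdog doesn't fire while real progress is being made.
    void process_caps(MDS& mds, const std::vector<Cap>& caps) {
      std::size_t n = 0;
      for (const Cap& cap : caps) {
        handle_cap(cap);
        if (++n % 1024 == 0)
          mds.heartbeat_reset();
      }
    }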
03/03/2017
- 10:18 AM Bug #16914: multimds: pathologically slow deletions in some tests
- Pulled the test off wip-11950 and onto https://github.com/ceph/ceph/pull/13770
- 09:09 AM Feature #19135 (Rejected): Multi-Mds: One dead, all dead; what is Robustness??
- Having multiple active MDS daemons does not remove the need for standby daemons. Set max_mds to something less than ...
- 07:54 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- Zheng Yan wrote:
> multiple active mds is mainly for improving performance (balance load to multiple mds). robustnes...
- 06:59 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- multiple active mds is mainly for improving performance (balance load to multiple mds). robustness is achieved standb...
- 02:57 AM Feature #19135 (Rejected): Multi-Mds: One dead, all dead; what is Robustness??
- I have three MDSs, and I copy files to a fuse-type mountpoint for testing.
When I shut down one MDS, the client hangs, the m...
- 07:19 AM Feature #19137 (New): samba: implement aio callback for ceph vfs module
03/02/2017
- 11:52 AM Bug #17656: cephfs: high concurrent causing slow request
- jichao sun wrote:
> I have the same problem too!!!
- 08:27 AM Bug #17656: cephfs: high concurrent causing slow request
- I have the same problem too!!!
- 08:33 AM Bug #18995 (Resolved): ceph-fuse always fails If pid file is non-empty and run as daemon
- 08:11 AM Bug #18730: mds: backtrace issues getxattr for every file with cap on rejoin
- Zheng Yan wrote:
> I think we should design a new mechanism to track in-use inodes (current method isn't scalable be... - 03:40 AM Bug #18730: mds: backtrace issues getxattr for every file with cap on rejoin
- I think we should design a new mechanism to track in-use inodes (current method isn't scalable because it journals al...
03/01/2017
- 11:30 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
- I have that already. I did set the beacon_grace to 600s to work around the bug and bring the cluster back.
Seems r...
- 11:16 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
- I have that already. I did set the beacon_grace to 600s to work around the bug and bring the cluster back.
In firs...
- 03:52 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
- May be related to http://tracker.ceph.com/issues/18730, although perhaps not since that one shouldn't be causing the ...
- 03:36 PM Bug #19118 (Resolved): MDS heartbeat timeout during rejoin, when working with large amount of cap...
- We set an alarm every OPTION(mds_beacon_grace, OPT_FLOAT, 15) seconds; if mds_rank doesn't finish its task within this...
- 05:36 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- Zheng Yan wrote:
> probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0...
- 03:31 PM Feature #16523 (In Progress): Assert directory fragmentation is occurring during stress tests
- 11:09 AM Bug #18883: qa: failures in samba suite
- Opened a ticket for the samba build: http://tracker.ceph.com/issues/19117
- 08:21 AM Bug #17828 (Need More Info): libceph setxattr returns 0 without setting the attr
- ceph.quota.max_files is a hidden xattr. It doesn't show up in listxattr; you need to get it explicitly (getfattr -n ceph.qu...
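To illustrate the behaviour (Linux kernel client; the mount point below is an assumed path): the quota vxattr never appears in listxattr output, but a direct getxattr by name succeeds.
    #include <sys/types.h>
    #include <sys/xattr.h>
    #include <cstdio>

    int main() {
      const char* path = "/mnt/cephfs/somedir";  // assumed mount point
      char buf[64];
      // listxattr(path, ...) will not report ceph.quota.max_files...
      ssize_t n = getxattr(path, "ceph.quota.max_files", buf, sizeof(buf) - 1);
      if (n >= 0) {                              // ...but asking by name works
        buf[n] = '\0';
        std::printf("ceph.quota.max_files = %s\n", buf);
      }
      return 0;
    }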
02/28/2017
- 02:12 PM Bug #19103: cephfs: Out of space handling
- Looking again now that I'm a few coffees into my day -- all the cephfs enospc stuff is just aimed at providing a slic...
- 08:58 AM Bug #19103: cephfs: Out of space handling
- Believe it or not I sneezed and somehow that caused me to select some affected versions...
- 08:58 AM Bug #19103: cephfs: Out of space handling
- Also, I'm not sure we actually need to do sync writes when the cluster is near full -- we already have machinery that...
- 08:56 AM Bug #19103: cephfs: Out of space handling
- Could the "ENOSPC on failsafe_full_ratio" behaviour be the default? It seems like any application layer that wants t...
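What that default might look like, as a hedged sketch (the names and the 0.97 ratio are assumptions, not the agreed design): fail the write with ENOSPC once utilization crosses the failsafe ratio, so the application layer sees a real error instead of a hang.
    #include <cerrno>

    constexpr double failsafe_full_ratio = 0.97;  // assumed default

    int submit_write(double osd_utilization) {
      if (osd_utilization >= failsafe_full_ratio)
        return -ENOSPC;   // surface the error to the application layer
      return 0;           // otherwise proceed with the write
    }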
- 12:13 AM Bug #19103 (Won't Fix): cephfs: Out of space handling
Cephfs needs to be more careful on a cluster with almost full OSDs. There is a delay in OSDs reporting stats, a MO...
- 01:37 PM Feature #19109 (Resolved): Use data pool's 'df' for statfs instead of global stats, if there is o...
The client sends an MStatfs to the mon to get the info for a statfs system call. Currently the mon gives it the glo...
- 09:03 AM Bug #17828: libceph setxattr returns 0 without setting the attr
- This ticket didn't get noticed because it was filed in the 'mgr' component instead of the 'fs' component.
Chris: d...
02/27/2017
- 09:14 PM Bug #19101 (Closed): "samba3error [Unknown error/failure. Missing torture_fail() or torture_asser...
- This is the jewel point release v10.2.6.
Run: http://pulpito.ceph.com/yuriw-2017-02-24_20:42:46-samba-jewel---basic-smithi/
Jo...
- 12:47 PM Bug #18883: qa: failures in samba suite
- Both the Ubuntu and CentOS samba packages have no dependency on libcephfs. It works when libcephfs1 happens to be installed.
- 11:21 AM Bug #18883: qa: failures in samba suite
- Which packages were you seeing the linkage issue on? The centos ones?
- 09:25 AM Bug #18883: qa: failures in samba suite
- ...
- 07:56 AM Bug #18757: Jewel ceph-fuse does not recover after lost connection to MDS
- I updated the PR to do _closed_mds_session(s).
As for the config option, I would expect the client to reconnect automagically ...
02/24/2017
- 02:29 PM Feature #19075 (Fix Under Review): Extend 'p' mds auth cap to cover quotas and all layout fields
- https://github.com/ceph/ceph/pull/13628
- 02:21 PM Feature #19075 (Resolved): Extend 'p' mds auth cap to cover quotas and all layout fields
- Re. mailing list thread "quota change restriction" http://marc.info/?l=ceph-devel&m=148769159329755&w=2
We should ...
- 02:12 PM Bug #18600 (Resolved): multimds suite tries to run quota tests against kclient, fails
- 02:11 PM Bug #17990 (Resolved): newly created directory may get fragmented before it gets journaled
- 02:10 PM Bug #16768 (Resolved): multimds: check_rstat assertion failure
- 02:10 PM Bug #18159 (Resolved): "Unknown mount option mds_namespace"
- 02:09 PM Bug #18646 (Resolved): mds: rejoin_import_cap FAILED assert(session)
- 09:51 AM Bug #18953 (Resolved): mds applies 'fs full' check for CEPH_MDS_OP_SETFILELOCK
- 09:50 AM Bug #18663 (Resolved): teuthology teardown hangs if kclient umount fails
- 09:48 AM Bug #18675 (Resolved): client: during multimds thrashing FAILED assert(session->requests.empty())
- 01:47 AM Bug #18759 (Resolved): multimds suite tries to run norstats tests against kclient
02/23/2017
- 11:24 PM Feature #9754: A 'fence and evict' client eviction command
- Underway on jcsp/wip-17980 along with #17980
- 12:07 PM Feature #9754 (In Progress): A 'fence and evict' client eviction command
- 09:42 AM Bug #18757: Jewel ceph-fuse does not recover after lost connection to MDS
- You can use 'ceph daemon client.xxx kick_stale_sessions' to recover from this issue. Maybe we should add a config option to ...
- 09:30 AM Backport #19045 (Resolved): kraken: buffer overflow in test LibCephFS.DirLs
- https://github.com/ceph/ceph/pull/14571
- 09:30 AM Backport #19044 (Resolved): jewel: buffer overflow in test LibCephFS.DirLs
- https://github.com/ceph/ceph/pull/14671
02/22/2017
- 02:45 PM Bug #19033: cephfs: mds is crashed, after I set about 400 64KB xattr kv pairs to a file
- Thank you for the very detailed report
- 02:44 PM Bug #19033 (Fix Under Review): cephfs: mds is crashed, after I set about 400 64KB xattr kv pairs ...
- 01:04 PM Bug #19033: cephfs: mds is crashed, after I set about 400 64KB xattr kv pairs to a file
- h1. Fix proposal
https://github.com/ceph/ceph/pull/13587
- 12:22 PM Bug #19033 (Resolved): cephfs: mds is crashed, after I set about 400 64KB xattr kv pairs to a file
- h1. 1. Problem
After I set about 400 64KB xattr kv pairs on a file,
the mds crashed. Every time I try to star...
- 10:05 AM Bug #18941 (Pending Backport): buffer overflow in test LibCephFS.DirLs
- It's a rare thing, but let's backport so that we don't have to re-diagnose it in the future.
- 09:48 AM Bug #18964 (Resolved): mon: fs new is no longer idempotent
- 09:36 AM Bug #19022 (Fix Under Review): Crash in Client::queue_cap_snap when thrashing
- https://github.com/ceph/ceph/pull/13579
- 09:36 AM Bug #18914 (Fix Under Review): cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume...
- https://github.com/ceph/ceph/pull/13580
02/21/2017
- 05:15 PM Feature #18490: client: implement delegation support in userland cephfs
- Greg Farnum wrote:
> I guess I'm not sure what you're going for with the Fb versus Fc here. Sure, if you have Fwb an...
- 04:14 PM Feature #18490: client: implement delegation support in userland cephfs
- I guess I'm not sure what you're going for with the Fb versus Fc here. Sure, if you have Fwb and then get an Fr read ...
- 04:04 PM Feature #18490: client: implement delegation support in userland cephfs
- Greg Farnum wrote:
> > BTW: CEPH_CAPFILE_BUFFER does also imply CEPH_CAP_FILE_CACHE, doesn't it?
>
> No, I don't ...
- 01:21 AM Feature #18490: client: implement delegation support in userland cephfs
- > BTW: CEPH_CAPFILE_BUFFER does also imply CEPH_CAP_FILE_CACHE, doesn't it?
No, I don't think it does. In practice...
- 05:15 PM Bug #18914: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeC...
- Of course, you're right.
- 02:43 PM Bug #18914: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeC...
- I think the cephfs python binding calls ceph_setxattr instead of ceph_ll_setxattr. There is no such code in Client::setxattr.
- 12:31 PM Bug #18914: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeC...
- The thing that's confusing me is that Client::ll_setxattr has this block:...
- 09:05 AM Bug #18914: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeC...
- The error is because the MDS had an outdated osdmap and thought the newly created pool did not exist. (The MDS has code that m...
- 02:15 PM Bug #19022 (In Progress): Crash in Client::queue_cap_snap when thrashing
- 09:30 AM Backport #18707 (In Progress): kraken: failed filelock.can_read(-1) assertion in Server::_dir_is_...
- 09:28 AM Backport #18708 (Resolved): jewel: failed filelock.can_read(-1) assertion in Server::_dir_is_none...
02/20/2017
- 11:56 PM Bug #19022: Crash in Client::queue_cap_snap when thrashing
- http://pulpito.ceph.com/jspray-2017-02-20_15:59:37-fs-master---basic-smithi 837749 and 837668
- 11:55 PM Bug #19022 (Resolved): Crash in Client::queue_cap_snap when thrashing
- Seen on master. Mysterious regression....
- 02:04 PM Bug #18883: qa: failures in samba suite
- Latest run:
http://pulpito.ceph.com/jspray-2017-02-20_12:27:44-samba-master-testing-basic-smithi/
Now we're seein...
- 08:01 AM Bug #18883: qa: failures in samba suite
- fix for the ceph-fuse bug: https://github.com/ceph/ceph/pull/13532
- 11:13 AM Bug #1656 (Won't Fix): Hadoop client unit test failures
- I'm told that there is a newer hdfs test suite that we would adopt if refreshing the hdfs support, so this ticket is ...
- 08:00 AM Bug #18995 (Fix Under Review): ceph-fuse always fails If pid file is non-empty and run as daemon
- https://github.com/ceph/ceph/pull/13532
- 07:58 AM Bug #18995 (Resolved): ceph-fuse always fails If pid file is non-empty and run as daemon
- It always fails with a message like:...