Activity
From 03/08/2017 to 04/06/2017
04/06/2017
- 01:30 PM Bug #18850: Leak in MDCache::handle_dentry_unlink
- It's likely some CInode/CDir/CDentry in cache are not properly freed when mds process exits. not caused by MDCache::h...
- 01:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- Christoffer Lilja wrote:
> Hi,
>
> I have little spare time to spend on this and can't get past this compile erro...
04/05/2017
- 07:54 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- Hi,
I have little spare time to spend on this and can't get past this compile error due to my very limited knowled...
- 02:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- Christoffer Lilja wrote:
> I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got ...
- 05:10 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- Thanks for the link Zheng. Looks like that fix is in mainline as of 4.10.2. When I have a chance, I'll try upgrading ...
- 09:13 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- elder one wrote:
> My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~k...
- 08:24 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v...
- 07:39 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- elder one wrote:
> Hit the same bug today with 4.4.59 kernel client.
> 2 MDS servers (1 standby) 10.2.6-1 on Ubuntu...
- 07:36 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- Darrell Enns wrote:
> Zheng Yan wrote:
> > probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a...
- 01:33 PM Bug #19501 (Fix Under Review): C_MDSInternalNoop::complete doesn't free itself
- 01:33 PM Bug #19501: C_MDSInternalNoop::complete doesn't free itself
- https://github.com/ceph/ceph/pull/14347
- 01:17 PM Bug #19501 (Resolved): C_MDSInternalNoop::complete doesn't free itself
- This causes a memory leak.
- 09:20 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
- geng jichao wrote:
> I have a question, if the file struct is destroyed,how to ensure that cache_ctl.index is correc...
- 05:52 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
- I have a question: if the file struct is destroyed, how can we ensure that cache_ctl.index is correct?
In other words, req...
- 01:32 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
- kernel patch https://github.com/ceph/ceph-client/commit/b7e2eee12aa174bc91279a7cee85e9ea73092bad
- 06:01 AM Bug #19438: ceph mds error "No space left on device"
- ceph mds error "No space left on device": I have encountered this problem in version 10.2.6.
Need to add a lin...
- 04:12 AM Bug #19450: PurgeQueue read journal crash
- ...
04/04/2017
- 09:09 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- Very interesting! That means we have:
4.4.59 - bug
4.7.5 - no bug (or at least rare enough that I'm not seeing it...
- 06:03 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty.
From mds l...
- 09:05 PM Bug #19388: mount.ceph does not accept -s option
- The kernel client has a mount helper in src/mount/mount.ceph.c -- although that only comes into play if the ceph pack...
- 01:50 PM Bug #19306 (Fix Under Review): fs: mount NFS to cephfs, and then ls a directory containing a larg...
- https://github.com/ceph/ceph/pull/14317
- 01:18 PM Bug #19456: Hadoop suite failing on missing upstream tarball
- I see the hadoop task is in ceph/teuthology - would it make sense to move it to ceph/ceph?
- 12:44 PM Backport #19483 (Resolved): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" co...
- https://github.com/ceph/ceph/pull/14573
- 12:44 PM Backport #19482 (Resolved): jewel: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" com...
- https://github.com/ceph/ceph/pull/14674
- 12:42 PM Backport #19466 (Resolved): jewel: mds: log rotation doesn't work if mds has respawned
- https://github.com/ceph/ceph/pull/14673
- 01:03 AM Bug #19445 (In Progress): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
- https://github.com/ceph/ceph/pull/14308
04/03/2017
- 09:29 PM Bug #19456: Hadoop suite failing on missing upstream tarball
- Looks like v2.5.2 is too old. It is now in the Apache archive: https://archive.apache.org/dist/hadoop/core/hadoop-2.5...
- 08:20 PM Bug #19456 (Rejected): Hadoop suite failing on missing upstream tarball
- This is jewel v10.2.7 point release candidate
Also true on all branches
Run: http://pulpito.ceph.com/yuriw-2017-0...
- 08:15 PM Bug #19388: mount.ceph does not accept -s option
- Looking at it, it seems I'll have to update the Linux kernel as well to support the sloppy option. Doing so in a sane...
- 12:53 PM Bug #19306 (In Progress): fs: mount NFS to cephfs, and then ls a directory containing a large num...
- 12:44 PM Bug #19450 (Resolved): PurgeQueue read journal crash
- ...
- 11:16 AM Bug #19437 (Fix Under Review): fs:The mount point break off when mds switch hanppened.
- https://github.com/ceph/ceph/pull/14267
04/02/2017
- 07:30 AM Bug #19445 (Resolved): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
- The following warning appears during make. Here roll_die and empty() return bool. This can be easily fixed by removi...
04/01/2017
- 06:15 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got rejected. I guess it's writt...
03/31/2017
- 11:49 PM Bug #19291 (Pending Backport): mds: log rotation doesn't work if mds has respawned
- The bug is also in jewel 10.2.6, due to 6efad699249ba7c6928193dba111dbb23b606beb.
- 10:41 AM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- The log says "objects: max 100" but the default of client_oc_max_objects is 1000, so I assumed there was some manual ...
- 10:32 AM Bug #19438: ceph mds error "No space left on device"
- Hmm, so fragmentation is enabled but apparently isn't happening? Is your ceph.conf with "mds_bal_frag = true" presen...
- 10:17 AM Bug #19438 (Won't Fix): ceph mds error "No space left on device"
- Through testing the bash script for MDS cluster, create a test directory under the ceph mount path, in the test direc...
- 05:04 AM Bug #19437 (Resolved): fs:The mount point break off when mds switch hanppened.
- My ceph version is jewel and my cluster has two nodes. I start two mds, one active and the other is hot-standby mo...
- 02:58 AM Bug #18579 (Fix Under Review): Fuse client has "opening" session to nonexistent MDS rank after MD...
- should be fixed in https://github.com/ceph/ceph/pull/13698
03/30/2017
- 02:33 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- I don't think I have changed the defaults. How can I see the values I use?
- 12:35 PM Bug #19426: knfs blogbench hang
- I got several oopses at other places. It seems like random memory corruption. It only happens when cephfs is exported by nf...
- 09:57 AM Bug #19426: knfs blogbench hang
- ...
- 08:34 AM Bug #19426 (Can't reproduce): knfs blogbench hang
- http://pulpito.ceph.com/yuriw-2017-03-24_21:44:17-knfs-jewel-integration-ktdreyer-testing-basic-smithi/941067/
http:...
- 01:23 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- ...
03/29/2017
- 07:09 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- I don't have a build environment nor the Ceph source code on the machines.
I'll download the source code, apply the ...
- 06:45 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- Looking at what DataScan.cc currently does, looks like we were injecting things with hash set to zero, so it would pi...
- 02:02 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- looks like ~mds0's dir_layout.dl_dir_hash is wrong. Please check if below patch can prevent mds from crash...
- 01:28 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- The logfile contains file structure that in itself is sensitive information; that's why it's truncated.
I'll see if...
- 01:01 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- To work out how the metadata is now corrupted (i.e. what the recovery tools broke), it would be useful to have the fu...
- 12:43 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- Can you go back to your logs from when the original metadata damage occurred? There should be a message in your clus...
- 10:55 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- I see now that the two other MDS servers crash the same way.
- 10:44 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- A NIC broke and caused tremendous load on the server. Shortly after I had to kill the server by power, note that this ...
- 10:15 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- Please also tell us what else was going on around this event. Had you recently upgraded? Had another daemon recentl...
- 09:43 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- please set 'debug_mds = 20', restart the mds and upload the mds.log
- 07:57 AM Bug #19406 (Resolved): MDS server crashes due to inconsistent metadata.
- -48> 2017-03-26 23:34:38.886779 7f4b95030700 5 mds.neutron handle_mds_map epoch 620 from mon.1
-47> 2017-03-2...
03/28/2017
- 10:21 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Hmm.
I've tried setting some comically low limits:...
- 09:09 PM Bug #19401 (Fix Under Review): MDS goes readonly writing backtrace for a file whose data pool has...
- So on reflection I realise that the deletion/recovery cases are not an issue because those code paths already handle ...
- 05:28 PM Bug #19401 (Resolved): MDS goes readonly writing backtrace for a file whose data pool has been re...
- Reproduce:
1. create a pool
2. add it as a data pool
3. set that pool in a layout on the client, and write a file
...
- 09:05 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
- Couple of observations from today:
* after an mds failover, the issue cleared and we're back to having "ino" (CIno...
- 02:55 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
- Cache dumps in /root/19395 on mira060
- 01:36 PM Bug #19395 (Resolved): "Too many inodes in cache" warning can happen even when trimming is working
- 01:29 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
- 01:29 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors
- 01:28 PM Fix #19288 (Resolved): Remove legacy "mds tell"
- 01:27 PM Bug #16709 (Pending Backport): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
- 01:25 PM Feature #17604 (Resolved): MDSMonitor: raise health warning when there are no standbys but there ...
03/27/2017
- 05:26 PM Bug #19388: mount.ceph does not accept -s option
- OK, if you're motivated to do this (and update the pull request to implement -s rather than dropping it) then I don't...
- 05:24 PM Bug #19388: mount.ceph does not accept -s option
- I wasn't aware of this fix in autofs, thanks for pointing me to it.
However, several mount programs accept the -s ...
- 10:32 AM Bug #19388: mount.ceph does not accept -s option
- According to the release notes, autofs 5.1.1 includes a fix to avoid passing -s to filesystems other than NFS (https:...
- 03:39 AM Bug #19388 (Fix Under Review): mount.ceph does not accept -s option
- https://github.com/ceph/ceph/pull/14158
- 11:58 AM Bug #16842: mds: replacement MDS crashes on InoTable release
- A patch that should allow MDSs to throw out the invalid sessions and start up. Does not fix whatever got the system ...
03/26/2017
- 12:31 PM Bug #19388 (Closed): mount.ceph does not accept -s option
- The mount.ceph tool does not accept the -s (sloppy) option. On Debian Jessie (and possibly more distributions), autof...
03/23/2017
- 01:35 PM Feature #18509 (Fix Under Review): MDS: damage reporting by ino number is useless
- 01:30 PM Feature #18509: MDS: damage reporting by ino number is useless
- https://github.com/ceph/ceph/pull/14104
- 02:16 AM Feature #19362 (Resolved): mds: add perf counters for each type of MDS operation
- It's desirable to know which types of operations are being performed on the MDS so we have a better idea over time what...
03/22/2017
- 06:31 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
- I have used the offset parameter of the ceph_dir_llseek function, and it will be passed to mds in readdir request, if ...
- 03:38 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
- This bug is not specific to the kernel client. Enabling directory fragments can help. The complete fix is to make client enc...
03/21/2017
- 08:56 PM Bug #16842: mds: replacement MDS crashes on InoTable release
- If anyone needs the debug level "30/30" dump I can send it directly to them.
Also, so far I have not nuked the cep...
- 04:08 PM Bug #16842: mds: replacement MDS crashes on InoTable release
- I don't have any brilliant ideas for debugging this (can't go back in time to find out how those bogus inode ranges m...
- 08:39 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Just in case it wasn't clear from the log snippet above: the current counter is continuously growing until the process...
- 08:25 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Hmm, I've looked at the code now and it turns out we do already have a client_oc_max_dirty setting that is supposed t...
- 05:18 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- I see ... so the fix would be to have a dirty_limit (like vm.dirty_ratio) above which all new request would not be ca...
- 03:11 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Hmm, could be that we just have no limit on the number of dirty objects (trim is only dropping clean things), so if w...
- 01:10 PM Bug #19343 (New): OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limit...
- Creating a 100G file on CephFS reproducibly triggers OOM to kill the
ceph-fuse process.
With objectcacher debug ...
- 01:01 PM Backport #19335 (Resolved): kraken: MDS heartbeat timeout during rejoin, when working with large ...
- https://github.com/ceph/ceph/pull/14572
- 01:01 PM Backport #19334 (Resolved): jewel: MDS heartbeat timeout during rejoin, when working with large a...
- https://github.com/ceph/ceph/pull/14672
03/20/2017
- 07:07 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
03/19/2017
- 08:25 AM Bug #19306 (Resolved): fs: mount NFS to cephfs, and then ls a directory containing a large number...
- The ceph_readdir function saves a lot of data in file->private_data, including the last_name, which is used as the offset. Howe...
03/18/2017
- 09:17 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel bulid ...
- Note: inodes loaded is visible in mds-ino+.png in both workflow directories.
- 09:16 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel bulid ...
- I believe I have this one figured out. The client requests graph (mds-request.png) shows that the 8 MDS workflow is r...
03/17/2017
- 04:37 PM Bug #19291 (Fix Under Review): mds: log rotation doesn't work if mds has respawned
- https://github.com/ceph/ceph/pull/14021
- 12:00 PM Bug #17939 (Fix Under Review): non-local cephfs quota changes not visible until some IO is done
- https://github.com/ceph/ceph/pull/14018
- 11:51 AM Bug #19282 (Fix Under Review): RecoveryQueue::_recovered asserts out on OSD errors
- https://github.com/ceph/ceph/pull/14017
- 11:36 AM Fix #19288 (Fix Under Review): Remove legacy "mds tell"
- https://github.com/ceph/ceph/pull/14015
03/16/2017
- 10:17 PM Bug #19291: mds: log rotation doesn't work if mds has respawned
- Sorry hit submit on accident before finishing writing this up. Standby!
- 10:16 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
- If an MDS respawns then its "comm" name becomes "exe" which confuses logrotate since it relies on killall. What ends...
- 09:42 PM Bug #16842: mds: replacement MDS crashes on InoTable release
- Hi All,
After upgrading to 10.2.6 on Debian Jessie, the MDS server fails to start. Below is what is written to ...
- 08:44 PM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
- 10:32 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
- OK, not such a big deal for me, I guess I can increase fragment size till Luminous release.
- 10:14 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
- Were you planning to work on this?
I don't think we're likely to work on this as a new feature, given that in Lumi...
- 09:51 AM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
- Since in Jewel cephfs directory fragmentation is disabled by default, would be nice to know when we hit "mds bal frag...
- 04:49 PM Fix #19288 (Resolved): Remove legacy "mds tell"
- This has been deprecated for a while in favour of the new-style "ceph tell mds.<id>". Luminous seems like a good t...
- 10:14 AM Bug #18151 (Resolved): Incorrect report of size when quotas are enabled.
- 03:10 AM Bug #18151: Incorrect report of size when quotas are enabled.
- Hi John. Let us close it now since according to Greg it has been resolved. Still did not upgrade but will open a new ...
03/15/2017
- 09:51 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors
- There's no good reason the OSD should ever give us errors in this case, but crashing out the MDS is definitely not ...
- 09:16 PM Bug #19118 (Pending Backport): MDS heartbeat timeout during rejoin, when working with large amoun...
- 11:43 AM Bug #17939: non-local cephfs quota changes not visible until some IO is done
- Right, finally looking at this.
Turns out all our ceph.* vxattrs return whatever is in the local cache, with no re...
- 10:30 AM Bug #18151: Incorrect report of size when quotas are enabled.
- Goncalo: did the issue go away for you? If so let's close this.
03/13/2017
- 01:47 PM Bug #18579: Fuse client has "opening" session to nonexistent MDS rank after MDS cluster shrink
- After discussion during standup, a few things probably need to happen here:
* Upon receiving a new MDSMap, the cl...
03/10/2017
- 07:47 PM Bug #18579 (In Progress): Fuse client has "opening" session to nonexistent MDS rank after MDS clu...
- Sorry for the delay updating this.
The issue appears to be that the stopping MDS is still authoritative for the pa...
- 01:08 PM Bug #19255 (Can't reproduce): qa: test_full_fclose failure
- I suspect the rework of full handling broke this test:
https://github.com/ceph/ceph/pull/13615/files
It's faili...
03/09/2017
- 01:18 PM Bug #16709 (Fix Under Review): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
- https://github.com/ceph/ceph/pull/13904
- 01:11 PM Bug #18238 (Can't reproduce): TestDataScan failing due to log "unmatched rstat on 100"
- This seems to have gone away.
- 01:08 PM Bug #19103 (Won't Fix): cephfs: Out of space handling
- The consensus at the moment seems to be that cephfs shouldn't be doing any nearfull special behaviour.
- 11:29 AM Bug #19245 (Fix Under Review): Crash in PurgeQueue::_execute_item when deletions happen extremely...
- https://github.com/ceph/ceph/pull/13899
- 10:37 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
- There is a path here where the deletion completes so quickly that our call to gather.activate() is calling execute_it...
- 04:34 AM Bug #19243 (New): multimds on linode: MDSs make uneven progress on independent workloads
- I have a test that just finished with 4 MDS with 400k cache object limit and 64 clients. What appears to happen in th...
- 01:27 AM Bug #19240 (Closed): multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kern...
- This is possibly not a bug but I thought I would put it here to solicit comments/assistance on what could be causing ...
- 12:29 AM Bug #19239 (Resolved): mds: stray count remains static after workflows complete
- For the Linode tests, I've observed that after the clients finish removing their sandboxes (thus making the subtree c...
03/08/2017
- 05:15 PM Bug #16914: multimds: pathologically slow deletions in some tests
- Notes on discussion today:
* The issue is that we usually (single mds) do fine in these cases because of the proj...
- 02:44 PM Feature #19230 (Resolved): Limit MDS deactivation to one at a time
- This needs to happen in the thrasher code, and also in the mon
- 12:17 PM Bug #19204: MDS assert failed when shutting down
- https://github.com/ceph/ceph/pull/13859
- 12:15 PM Bug #19204 (Fix Under Review): MDS assert failed when shutting down
- Hmm, we do shut down the objecter before the finisher, which is clearly not handling this case.
Let's try swapping...
- 11:20 AM Bug #19201 (Resolved): mds: status asok command prints MDS_RANK_NONE as unsigned long -1: 1844674...
- 11:19 AM Feature #11950 (Resolved): Strays enqueued for purge cause MDCache to exceed size limit
- PurgeQueue has merged to master, will be in Luminous.
- 08:21 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- John Spray wrote:
> After decreasing max_mds, you also need to use "ceph mds deactivate <rank>" to shrink the active...
- 04:07 AM Bug #19220 (New): jewel: mds crashed (FAILED assert(mds->mds_lock.is_locked_by_me())
- http://qa-proxy.ceph.com/teuthology/teuthology-2017-02-19_10:10:02-fs-jewel---basic-smithi/833293/teuthology.log
<...