Activity

From 03/08/2017 to 04/06/2017

04/06/2017

01:30 PM Bug #18850: Leak in MDCache::handle_dentry_unlink
It's likely that some CInode/CDir/CDentry objects in the cache are not properly freed when the mds process exits; not caused by MDCache::h... Zheng Yan
01:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Christoffer Lilja wrote:
> Hi,
>
> I have little spare time to spend on this and can't get past this compile erro...
Zheng Yan

04/05/2017

07:54 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Hi,
I have little spare time to spend on this and can't get past this compile error due to my very limited knowled...
Christoffer Lilja
02:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Christoffer Lilja wrote:
> I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got ...
Zheng Yan
05:10 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Thanks for the link Zheng. Looks like that fix is in mainline as of 4.10.2. When I have a chance, I'll try upgrading ... Darrell Enns
09:13 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~k...
Zheng Yan
08:24 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v... elder one
07:39 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Hit the same bug today with 4.4.59 kernel client.
> 2 MDS servers (1 standby) 10.2.6-1 on Ubuntu...
Zheng Yan
07:36 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Darrell Enns wrote:
> Zheng Yan wrote:
> > probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a...
Zheng Yan
01:33 PM Bug #19501 (Fix Under Review): C_MDSInternalNoop::complete doesn't free itself
Zheng Yan
01:33 PM Bug #19501: C_MDSInternalNoop::complete doesn't free itself
https://github.com/ceph/ceph/pull/14347 Zheng Yan
01:17 PM Bug #19501 (Resolved): C_MDSInternalNoop::complete doesn't free itself
This causes a memory leak Zheng Yan
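For context, the leak pattern here is a one-shot completion object that is expected to free itself when it runs. A minimal sketch of that convention, assuming a Context-style callback class (illustrative only, not the actual Ceph sources):

    // Illustrative sketch of a self-deleting one-shot completion. In this
    // convention, complete() runs the callback and then frees the object;
    // an override that skips the delete leaks one object per use.
    struct Context {
      virtual ~Context() = default;
      virtual void finish(int r) = 0;
      virtual void complete(int r) {
        finish(r);
        delete this;   // the object owns itself; freed exactly once, here
      }
    };

    struct NoopContext : Context {
      void finish(int) override {}
      void complete(int) override {
        // Does nothing with the result, but must still release itself;
        // otherwise every NoopContext allocated is leaked.
        delete this;
      }
    };

    int main() {
      Context *c = new NoopContext;
      c->complete(0);   // frees c; omitting `delete this` above would leak it
      return 0;
    }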
09:20 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
geng jichao wrote:
> I have a question, if the file struct is destroyed,how to ensure that cache_ctl.index is correc...
Zheng Yan
05:52 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
I have a question: if the file struct is destroyed, how do we ensure that cache_ctl.index is correct?
In other words, req...
geng jichao
01:32 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
kernel patch https://github.com/ceph/ceph-client/commit/b7e2eee12aa174bc91279a7cee85e9ea73092bad Zheng Yan
06:01 AM Bug #19438: ceph mds error "No space left on device"
ceph mds error "No space left on device" ,This problem I have encountered in the 10.2.6 version 。
Need to add a lin...
berlin sun
04:12 AM Bug #19450: PurgeQueue read journal crash
... Zheng Yan

04/04/2017

09:09 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Very interesting! That means we have:
4.4.59 - bug
4.7.5 - no bug (or at least rare enough that I'm not seeing it...
Darrell Enns
06:03 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty.
From mds l...
elder one
09:05 PM Bug #19388: mount.ceph does not accept -s option
The kernel client has a mount helper in src/mount/mount.ceph.c -- although that only comes into play if the ceph pack... John Spray
01:50 PM Bug #19306 (Fix Under Review): fs: mount NFS to cephfs, and then ls a directory containing a larg...
https://github.com/ceph/ceph/pull/14317 Zheng Yan
01:18 PM Bug #19456: Hadoop suite failing on missing upstream tarball
I see the hadoop task is in ceph/teuthology - would it make sense to move it to ceph/ceph? Nathan Cutler
12:44 PM Backport #19483 (Resolved): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" co...
https://github.com/ceph/ceph/pull/14573 Nathan Cutler
12:44 PM Backport #19482 (Resolved): jewel: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" com...
https://github.com/ceph/ceph/pull/14674 Nathan Cutler
12:42 PM Backport #19466 (Resolved): jewel: mds: log rotation doesn't work if mds has respawned
https://github.com/ceph/ceph/pull/14673 Nathan Cutler
01:03 AM Bug #19445 (In Progress): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
https://github.com/ceph/ceph/pull/14308 Brad Hubbard

04/03/2017

09:29 PM Bug #19456: Hadoop suite failing on missing upstream tarball
Looks like v2.5.2 is too old. It is now in the Apache archive: https://archive.apache.org/dist/hadoop/core/hadoop-2.5... Ken Dreyer
08:20 PM Bug #19456 (Rejected): Hadoop suite failing on missing upstream tarball
This is jewel v10.2.7 point release candidate
Also true on all branches
Run: http://pulpito.ceph.com/yuriw-2017-0...
Yuri Weinstein
08:15 PM Bug #19388: mount.ceph does not accept -s option
Looking at it, it seems I'll have to update the Linux kernel as well to support the sloppy option. Doing so in a sane... Michel Roelofs
12:53 PM Bug #19306 (In Progress): fs: mount NFS to cephfs, and then ls a directory containing a large num...
Zheng Yan
12:44 PM Bug #19450 (Resolved): PurgeQueue read journal crash
... Zheng Yan
11:16 AM Bug #19437 (Fix Under Review): fs: The mount point breaks off when mds switch happened.
https://github.com/ceph/ceph/pull/14267 John Spray

04/02/2017

07:30 AM Bug #19445 (Resolved): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
The following warning appears during make. Here roll_die and empty() return bool. This can be easily fixed by removi... Jos Collin
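To make the warning concrete, here is a minimal reproduction of that diagnostic class, with stand-in helpers loosely modeled on the report's roll_die/empty (hypothetical names, not the Ceph code):

    #include <cstdlib>

    // Hypothetical stand-ins: both return bool, like the functions in the report.
    static bool roll_die() { return std::rand() % 2 != 0; }
    static bool queue_empty() { return true; }

    bool decide_warns() {
      // Newer GCC versions warn here: "'*' in boolean context, suggest '&&' instead"
      return roll_die() * queue_empty();
    }

    bool decide_fixed() {
      // The suggested fix: drop the multiplication and use a logical AND.
      return roll_die() && queue_empty();
    }

    int main() { return decide_fixed() ? 0 : 1; }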

04/01/2017

06:15 PM Bug #19406: MDS server crashes due to inconsistent metadata.
I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got rejected. I guess it's writt... Christoffer Lilja

03/31/2017

11:49 PM Bug #19291 (Pending Backport): mds: log rotation doesn't work if mds has respawned
The bug is also in jewel 10.2.6, due to 6efad699249ba7c6928193dba111dbb23b606beb. Patrick Donnelly
10:41 AM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
The log says "objects: max 100" but the default of client_oc_max_objects is 1000, so I assumed there was some manual ... John Spray
10:32 AM Bug #19438: ceph mds error "No space left on device"
Hmm, so fragmentation is enabled but apparently isn't happening? Is your ceph.conf with "mds_bal_frag = true" presen... John Spray
10:17 AM Bug #19438 (Won't Fix): ceph mds error "No space left on device"
While testing the MDS cluster with a bash script, I created a test directory under the ceph mount path; in the test direc... william sheng
05:04 AM Bug #19437 (Resolved): fs: The mount point breaks off when mds switch happened.

My ceph version is jewel and my cluster has two nodes. I started two mds, one active and the other in hot-standby mo...
Ivan Guan
02:58 AM Bug #18579 (Fix Under Review): Fuse client has "opening" session to nonexistent MDS rank after MD...
should be fixed in https://github.com/ceph/ceph/pull/13698 Zheng Yan

03/30/2017

02:33 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
I don't think I have changed the defaults. How can I see the values I use? Arne Wiebalck
12:35 PM Bug #19426: knfs blogbench hang
I got several oopses in other places; seems like random memory corruption. It only happens when cephfs is exported by nf... Zheng Yan
09:57 AM Bug #19426: knfs blogbench hang
... Zheng Yan
08:34 AM Bug #19426 (Can't reproduce): knfs blogbench hang
http://pulpito.ceph.com/yuriw-2017-03-24_21:44:17-knfs-jewel-integration-ktdreyer-testing-basic-smithi/941067/
http:...
Zheng Yan
01:23 AM Bug #19406: MDS server crashes due to inconsistent metadata.
... Zheng Yan

03/29/2017

07:09 PM Bug #19406: MDS server crashes due to inconsistent metadata.
I have neither a build environment nor the Ceph source code on the machines.
I'll download the source code, apply the ...
Christoffer Lilja
06:45 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Looking at what DataScan.cc currently does, looks like we were injecting things with hash set to zero, so it would pi... John Spray
02:02 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Looks like ~mds0's dir_layout.dl_dir_hash is wrong. Please check whether the patch below can prevent the mds from crash... Zheng Yan
01:28 PM Bug #19406: MDS server crashes due to inconsistent metadata.
The logfile contains file structure that in itself is sensitive information; that's why it's truncated.
I'll see if...
Christoffer Lilja
01:01 PM Bug #19406: MDS server crashes due to inconsistent metadata.
To work out how the metadata is now corrupted (i.e. what the recovery tools broke), it would be useful to have the fu... John Spray
12:43 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Can you go back to your logs from when the original metadata damage occurred? There should be a message in your clus... John Spray
10:55 AM Bug #19406: MDS server crashes due to inconsistent metadata.
I see now that the two other MDS servers crash the same way. Christoffer Lilja
10:44 AM Bug #19406: MDS server crashes due to inconsistent metadata.
A NIC broke and caused tremendous load on the server. Shortly after, I had to power-kill the server; note that this ... Christoffer Lilja
10:15 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Please also tell us what else was going on around this event. Had you recently upgraded? Had another daemon recentl... John Spray
09:43 AM Bug #19406: MDS server crashes due to inconsistent metadata.
please set 'debug_mds = 20', restart the mds and upload the mds.log Zheng Yan
07:57 AM Bug #19406 (Resolved): MDS server crashes due to inconsistent metadata.
-48> 2017-03-26 23:34:38.886779 7f4b95030700 5 mds.neutron handle_mds_map epoch 620 from mon.1
-47> 2017-03-2...
Christoffer Lilja

03/28/2017

10:21 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm.
I've tried setting some comically low limits:...
John Spray
09:09 PM Bug #19401 (Fix Under Review): MDS goes readonly writing backtrace for a file whose data pool has...
So on reflection I realise that the deletion/recovery cases are not an issue because those code paths already handle ... John Spray
05:28 PM Bug #19401 (Resolved): MDS goes readonly writing backtrace for a file whose data pool has been re...
Reproduce:
1. create a pool
2. add it as a data pool
3. set that pool in a layout on the client, and write a file
...
John Spray
09:05 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
Couple of observations from today:
* after an mds failover, the issue cleared and we're back to having "ino" (CIno...
John Spray
02:55 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
Cache dumps in /root/19395 on mira060 John Spray
01:36 PM Bug #19395 (Resolved): "Too many inodes in cache" warning can happen even when trimming is working
John Spray
01:29 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
John Spray
01:29 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors
John Spray
01:28 PM Fix #19288 (Resolved): Remove legacy "mds tell"
John Spray
01:27 PM Bug #16709 (Pending Backport): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
John Spray
01:25 PM Feature #17604 (Resolved): MDSMonitor: raise health warning when there are no standbys but there ...
John Spray

03/27/2017

05:26 PM Bug #19388: mount.ceph does not accept -s option
OK, if you're motivated to do this (and update the pull request to implement -s rather than dropping it) then I don't... John Spray
05:24 PM Bug #19388: mount.ceph does not accept -s option
I wasn't aware of this fix in autofs, thanks for pointing me to it.
However, several mount programs accept the -s ...
Michel Roelofs
10:32 AM Bug #19388: mount.ceph does not accept -s option
According to the release notes, autofs 5.1.1 includes a fix to avoid passing -s to filesystems other than NFS (https:... John Spray
03:39 AM Bug #19388 (Fix Under Review): mount.ceph does not accept -s option
https://github.com/ceph/ceph/pull/14158 Kefu Chai
11:58 AM Bug #16842: mds: replacement MDS crashes on InoTable release
A patch that should allow MDSs to throw out the invalid sessions and start up. Does not fix whatever got the system ... John Spray

03/26/2017

12:31 PM Bug #19388 (Closed): mount.ceph does not accept -s option
The mount.ceph tool does not accept the -s (sloppy) option. On Debian Jessie (and possibly more distributions), autof... Michel Roelofs

03/23/2017

01:35 PM Feature #18509 (Fix Under Review): MDS: damage reporting by ino number is useless
John Spray
01:30 PM Feature #18509: MDS: damage reporting by ino number is useless
https://github.com/ceph/ceph/pull/14104 John Spray
02:16 AM Feature #19362 (Resolved): mds: add perf counters for each type of MDS operation
It's desirable to know which types of operations are being performed on the MDS so we have a better idea over time what... Patrick Donnelly

03/22/2017

06:31 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
I have used the offset parameter of the ceph_dir_llseek function, and it will be passed to the mds in the readdir request; if ... geng jichao
03:38 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
This bug is not specific to kernel client. Enabling directory fragments can help. The complete fix is make client enc... Zheng Yan

03/21/2017

08:56 PM Bug #16842: mds: replacement MDS crashes on InoTable release
If anyone needs the debug level "30/30" dump I can send it directly to them.
Also, so far I have not nuked the cep...
c sights
04:08 PM Bug #16842: mds: replacement MDS crashes on InoTable release
I don't have any brilliant ideas for debugging this (can't go back in time to find out how those bogus inode ranges m... John Spray
08:39 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Just in case it wasn't clear from the log snippet above: the current counter is continuously growing until the process... Arne Wiebalck
08:25 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm, I've looked at the code now and it turns out we do already have a client_oc_max_dirty setting that is supposed t... John Spray
05:18 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
I see ... so the fix would be to have a dirty_limit (like vm.dirty_ratio) above which all new request would not be ca... Arne Wiebalck
03:11 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm, could be that we just have no limit on the number of dirty objects (trim is only dropping clean things), so if w... John Spray
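As a rough illustration of the gap being described (a generic sketch, not the ObjectCacher code; the class, names, and the max_dirty cap are assumptions): trimming only evicts clean data, so the cache also needs a dirty-bytes cap that blocks writers until writeback catches up.

    #include <condition_variable>
    #include <cstddef>
    #include <mutex>

    class WritebackCache {
      std::mutex m;
      std::condition_variable cv;
      std::size_t dirty_bytes = 0;
      const std::size_t max_dirty;   // cap analogous in spirit to a client_oc_max_dirty-style limit

     public:
      explicit WritebackCache(std::size_t cap) : max_dirty(cap) {}

      // Writer side: block while the dirty set is over the cap instead of
      // letting it grow without bound (the OOM scenario in this ticket).
      void write(std::size_t len) {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [&] { return dirty_bytes + len <= max_dirty; });
        dirty_bytes += len;          // data is now dirty, awaiting flush
      }

      // Flusher side: flushed data becomes clean (and thus trimmable),
      // and any blocked writers can proceed.
      void flushed(std::size_t len) {
        {
          std::lock_guard<std::mutex> l(m);
          dirty_bytes -= len;
        }
        cv.notify_all();
      }
    };

    int main() {
      WritebackCache cache(4 << 20);  // 4 MiB dirty cap (arbitrary for the sketch)
      cache.write(1 << 20);
      cache.flushed(1 << 20);
      return 0;
    }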
01:10 PM Bug #19343 (New): OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limit...
Creating a 100G file on CephFS reproducibly triggers OOM to kill the
ceph-fuse process.
With objectcacher debug ...
Arne Wiebalck
01:01 PM Backport #19335 (Resolved): kraken: MDS heartbeat timeout during rejoin, when working with large ...
https://github.com/ceph/ceph/pull/14572 Nathan Cutler
01:01 PM Backport #19334 (Resolved): jewel: MDS heartbeat timeout during rejoin, when working with large a...
https://github.com/ceph/ceph/pull/14672 Nathan Cutler

03/20/2017

07:07 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
John Spray

03/19/2017

08:25 AM Bug #19306 (Resolved): fs: mount NFS to cephfs, and then ls a directory containing a large number...
The ceph_readdir function saves a lot of data in file->private_data, including the last_name, which is used as the offset. Howe... geng jichao
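A generic sketch of why that cached state is fragile when the file handle is not long-lived (illustrative C++, not the kernel code; DirHandle and its fields are stand-ins): state kept only in the open-file handle is lost when the handle is recreated, whereas a position fully encoded in the offset cookie survives.

    #include <cstdint>
    #include <string>
    #include <vector>

    struct DirHandle {
      // Resume state cached in the handle (in the spirit of file->private_data):
      // lost whenever the handle is re-created for the same directory.
      std::string last_name;
    };

    // Fragile: depends on handle-local state surviving between readdir calls.
    std::size_t resume_from_handle(const std::vector<std::string> &entries,
                                   const DirHandle &h) {
      for (std::size_t i = 0; i < entries.size(); ++i)
        if (entries[i] == h.last_name)
          return i + 1;
      return 0;  // state gone (fresh handle) -> restart from the beginning
    }

    // Stateless: the offset cookie alone identifies where to resume, so a
    // fresh handle presenting the same cookie continues correctly.
    std::size_t resume_from_offset(std::uint64_t offset_cookie) {
      return static_cast<std::size_t>(offset_cookie);
    }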

03/18/2017

09:17 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
Note: inodes loaded is visible in mds-ino+.png in both workflow directories. Patrick Donnelly
09:16 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
I believe I have this one figured out. The client requests graph (mds-request.png) shows that the 8 MDS workflow is r... Patrick Donnelly

03/17/2017

04:37 PM Bug #19291 (Fix Under Review): mds: log rotation doesn't work if mds has respawned
https://github.com/ceph/ceph/pull/14021 Patrick Donnelly
12:00 PM Bug #17939 (Fix Under Review): non-local cephfs quota changes not visible until some IO is done
https://github.com/ceph/ceph/pull/14018 John Spray
11:51 AM Bug #19282 (Fix Under Review): RecoveryQueue::_recovered asserts out on OSD errors
https://github.com/ceph/ceph/pull/14017 John Spray
11:36 AM Fix #19288 (Fix Under Review): Remove legacy "mds tell"
https://github.com/ceph/ceph/pull/14015 John Spray

03/16/2017

10:17 PM Bug #19291: mds: log rotation doesn't work if mds has respawned
Sorry hit submit on accident before finishing writing this up. Standby! Patrick Donnelly
10:16 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
If an MDS respawns then its "comm" name becomes "exe", which confuses logrotate since it relies on killall. What ends... Patrick Donnelly
09:42 PM Bug #16842: mds: replacement MDS crashes on InoTable release

Hi All,
After upgrading to 10.2.6 on Debian Jessie, the MDS server fails to start. Below is what is written to ...
c sights
08:44 PM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
John Spray
10:32 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
OK, not such a big deal for me, I guess I can increase fragment size till Luminous release. elder one
10:14 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
Were you planning to work on this?
I don't think we're likely to work on this as a new feature, given that in Lumi...
John Spray
09:51 AM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
Since in Jewel cephfs directory fragmentation is disabled by default, it would be nice to know when we hit "mds bal frag... elder one
04:49 PM Fix #19288 (Resolved): Remove legacy "mds tell"

This has been deprecated for a while in favour of the new-style "ceph tell mds.<id>". Luminous seems like a good t...
John Spray
10:14 AM Bug #18151 (Resolved): Incorrect report of size when quotas are enabled.
John Spray
03:10 AM Bug #18151: Incorrect report of size when quotas are enabled.
Hi John. Let us close it now since according to Greg it has been resolved. Still did not upgrade but will open a new ... Goncalo Borges

03/15/2017

09:51 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors

There's no good reason the OSD should ever give us errors in this case, but crashing out the MDS is definitely not ...
John Spray
09:16 PM Bug #19118 (Pending Backport): MDS heartbeat timeout during rejoin, when working with large amoun...
John Spray
11:43 AM Bug #17939: non-local cephfs quota changes not visible until some IO is done
Right, finally looking at this.
Turns out all our ceph.* vxattrs return whatever is in the local cache, with no re...
John Spray
10:30 AM Bug #18151: Incorrect report of size when quotas are enabled.
Goncalo: did the issue go away for you? If so let's close this. John Spray

03/13/2017

01:47 PM Bug #18579: Fuse client has "opening" session to nonexistent MDS rank after MDS cluster shrink
After discussion during standup, a few things probably need to happen here:
* Upon receiving a new MDSMap, the cl...
Patrick Donnelly

03/10/2017

07:47 PM Bug #18579 (In Progress): Fuse client has "opening" session to nonexistent MDS rank after MDS clu...
Sorry for the delay updating this.
The issue appears to be that the stopping MDS is still authoritative for the pa...
Patrick Donnelly
01:08 PM Bug #19255 (Can't reproduce): qa: test_full_fclose failure

I suspect the rework of full handling broke this test:
https://github.com/ceph/ceph/pull/13615/files
It's faili...
John Spray

03/09/2017

01:18 PM Bug #16709 (Fix Under Review): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
https://github.com/ceph/ceph/pull/13904 John Spray
01:11 PM Bug #18238 (Can't reproduce): TestDataScan failing due to log "unmatched rstat on 100"
This seems to have gone away. John Spray
01:08 PM Bug #19103 (Won't Fix): cephfs: Out of space handling
The consensus at the moment seems to be that cephfs shouldn't be doing any nearfull special behaviour. John Spray
11:29 AM Bug #19245 (Fix Under Review): Crash in PurgeQueue::_execute_item when deletions happen extremely...
https://github.com/ceph/ceph/pull/13899 John Spray
10:37 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
There is a path here where the deletion completes so quickly that our call to gather.activate() is calling execute_it... John Spray
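The general hazard, as a hedged sketch (generic C++, not the PurgeQueue code; Gather and in_flight are illustrative names): a completion handed to a gather can fire synchronously inside activate() when all sub-operations have already finished, so the caller's bookkeeping has to be in place before activate() is called.

    #include <functional>
    #include <map>
    #include <string>

    struct Gather {
      std::function<void()> on_finish;
      int pending = 0;
      void activate() {
        if (pending == 0)
          on_finish();  // runs immediately when the work already completed
      }
    };

    int main() {
      std::map<int, std::string> in_flight;
      const int id = 42;

      Gather g;
      g.on_finish = [&] { in_flight.erase(id); };  // completion touches shared state

      // Safe ordering: record the item *before* activate(), because the
      // callback may run inside that very call.
      in_flight[id] = "item";
      g.activate();            // in_flight is empty again here
      return 0;
    }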
04:34 AM Bug #19243 (New): multimds on linode: MDSs make uneven progress on independent workloads
I have a test that just finished with 4 MDS with 400k cache object limit and 64 clients. What appears to happen in th... Patrick Donnelly
01:27 AM Bug #19240 (Closed): multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kern...
This is possibly not a bug but I thought I would put it here to solicit comments/assistance on what could be causing ... Patrick Donnelly
12:29 AM Bug #19239 (Resolved): mds: stray count remains static after workflows complete
For the Linode tests, I've observed that after the clients finish removing their sandboxes (thus making the subtree c... Patrick Donnelly

03/08/2017

05:15 PM Bug #16914: multimds: pathologically slow deletions in some tests
Notes on discussion today:
* The issue is that we usually (single mds) do fine in these cases because of the proj...
John Spray
02:44 PM Feature #19230 (Resolved): Limit MDS deactivation to one at a time
This needs to happen in the thrasher code, and also in the mon John Spray
12:17 PM Bug #19204: MDS assert failed when shutting down
https://github.com/ceph/ceph/pull/13859 John Spray
12:15 PM Bug #19204 (Fix Under Review): MDS assert failed when shutting down
Hmm, we do shut down the objecter before the finisher, which is clearly not handling this case.
Let's try swapping...
John Spray
11:20 AM Bug #19201 (Resolved): mds: status asok command prints MDS_RANK_NONE as unsigned long -1: 1844674...
John Spray
11:19 AM Feature #11950 (Resolved): Strays enqueued for purge cause MDCache to exceed size limit
PurgeQueue has merged to master, will be in Luminous. John Spray
08:21 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
John Spray wrote:
> After decreasing max_mds, you also need to use "ceph mds deactivate <rank>" to shrink the active...
xianglong wang
04:07 AM Bug #19220 (New): jewel: mds crashed (FAILED assert(mds->mds_lock.is_locked_by_me())
http://qa-proxy.ceph.com/teuthology/teuthology-2017-02-19_10:10:02-fs-jewel---basic-smithi/833293/teuthology.log
<...
Zheng Yan
 
