Activity
From 02/28/2017 to 03/29/2017
03/29/2017
- 07:09 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- I don't have a build environment or the Ceph source code on the machines.
I'll download the source code, apply the ...
- 06:45 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- Looking at what DataScan.cc currently does, looks like we were injecting things with hash set to zero, so it would pi...
- 02:02 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- Looks like ~mds0's dir_layout.dl_dir_hash is wrong. Please check whether the patch below can prevent the mds from crash...
- 01:28 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- The logfile contains the file structure, which is itself sensitive information; that's why it's truncated.
I'll see if...
- 01:01 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- To work out how the metadata is now corrupted (i.e. what the recovery tools broke), it would be useful to have the fu...
- 12:43 PM Bug #19406: MDS server crashes due to inconsistent metadata.
- Can you go back to your logs from when the original metadata damage occurred? There should be a message in your clus...
- 10:55 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- I see now that the two other MDS servers crash the same way.
- 10:44 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- A NIC broke and caused tremendous load on the server. Shortly after, I had to cut power to the server; note that this ...
- 10:15 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- Please also tell us what else was going on around this event. Had you recently upgraded? Had another daemon recentl...
- 09:43 AM Bug #19406: MDS server crashes due to inconsistent metadata.
- Please set 'debug_mds = 20', restart the MDS, and upload the mds.log.
- 07:57 AM Bug #19406 (Resolved): MDS server crashes due to inconsistent metadata.
- -48> 2017-03-26 23:34:38.886779 7f4b95030700 5 mds.neutron handle_mds_map epoch 620 from mon.1
-47> 2017-03-2...
03/28/2017
- 10:21 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Hmm.
I've tried setting some comically low limits:...
- 09:09 PM Bug #19401 (Fix Under Review): MDS goes readonly writing backtrace for a file whose data pool has...
- So on reflection I realise that the deletion/recovery cases are not an issue because those code paths already handle ...
- 05:28 PM Bug #19401 (Resolved): MDS goes readonly writing backtrace for a file whose data pool has been re...
- Reproduce:
1. create a pool
2. add it as a data pool
3. set that pool in a layout on the client, and write a file
...
- 09:05 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
- Couple of observations from today:
* after an mds failover, the issue cleared and we're back to having "ino" (CIno...
- 02:55 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
- Cache dumps in /root/19395 on mira060
- 01:36 PM Bug #19395 (Resolved): "Too many inodes in cache" warning can happen even when trimming is working
- 01:29 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
- 01:29 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors
- 01:28 PM Fix #19288 (Resolved): Remove legacy "mds tell"
- 01:27 PM Bug #16709 (Pending Backport): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
- 01:25 PM Feature #17604 (Resolved): MDSMonitor: raise health warning when there are no standbys but there ...
03/27/2017
- 05:26 PM Bug #19388: mount.ceph does not accept -s option
- OK, if you're motivated to do this (and update the pull request to implement -s rather than dropping it) then I don't...
- 05:24 PM Bug #19388: mount.ceph does not accept -s option
- I wasn't aware of this fix in autofs; thanks for pointing me to it.
However, several mount programs accept the -s ...
- 10:32 AM Bug #19388: mount.ceph does not accept -s option
- According to the release notes, autofs 5.1.1 includes a fix to avoid passing -s to filesystems other than NFS (https:...
- 03:39 AM Bug #19388 (Fix Under Review): mount.ceph does not accept -s option
- https://github.com/ceph/ceph/pull/14158
- 11:58 AM Bug #16842: mds: replacement MDS crashes on InoTable release
- A patch that should allow MDSs to throw out the invalid sessions and start up. Does not fix whatever got the system ...
03/26/2017
- 12:31 PM Bug #19388 (Closed): mount.ceph does not accept -s option
- The mount.ceph tool does not accept the -s (sloppy) option. On Debian Jessie (and possibly more distributions), autof...
03/23/2017
- 01:35 PM Feature #18509 (Fix Under Review): MDS: damage reporting by ino number is useless
- 01:30 PM Feature #18509: MDS: damage reporting by ino number is useless
- https://github.com/ceph/ceph/pull/14104
- 02:16 AM Feature #19362 (Resolved): mds: add perf counters for each type of MDS operation
- It's desirable to know the types of operations being performed on the MDS so we have a better idea over time what...
03/22/2017
- 06:31 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
- I have used the offset parameter of the ceph_dir_llseek function, and it will be passed to the MDS in the readdir request; if ...
- 03:38 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
- This bug is not specific to the kernel client. Enabling directory fragments can help. The complete fix is to make the client enc...
03/21/2017
- 08:56 PM Bug #16842: mds: replacement MDS crashes on InoTable release
- If anyone needs the debug level "30/30" dump, I can send it directly to them.
Also, so far I have not nuked the cep...
- 04:08 PM Bug #16842: mds: replacement MDS crashes on InoTable release
- I don't have any brilliant ideas for debugging this (can't go back in time to find out how those bogus inode ranges m...
- 08:39 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Just in case it wasn't clear from the log snippet above: the current counter is continuously growing until the process...
- 08:25 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Hmm, I've looked at the code now and it turns out we do already have a client_oc_max_dirty setting that is supposed t...
- 05:18 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- I see ... so the fix would be to have a dirty_limit (like vm.dirty_ratio) above which all new requests would not be ca...
- 03:11 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
- Hmm, could be that we just have no limit on the number of dirty objects (trim is only dropping clean things), so if w...
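The comments above suggest that trimming only drops clean objects, so dirty data can accumulate without bound until the OOM killer steps in, and propose a dirty limit in the spirit of vm.dirty_ratio. A toy Python sketch of that idea follows, under loudly stated assumptions: `ToyObjectCache`, `max_objects` and `max_dirty` are hypothetical names, and this is not how ObjectCacher or the existing client_oc_max_dirty setting is implemented.

```python
from collections import OrderedDict


class ToyObjectCache:
    """Toy write-back cache in which trim() may only evict clean objects.

    Hypothetical illustration of a dirty limit; not Ceph's ObjectCacher.
    """

    def __init__(self, max_objects, max_dirty=None):
        self.max_objects = max_objects
        self.max_dirty = max_dirty    # None: no limit on dirty objects
        self.objects = OrderedDict()  # name -> is_dirty, oldest first
        self.flushed = []             # names written back, in order

    def dirty_count(self):
        return sum(1 for dirty in self.objects.values() if dirty)

    def flush_oldest_dirty(self):
        """Write back the oldest dirty object; return False if none exist."""
        for name, dirty in self.objects.items():
            if dirty:
                self.objects[name] = False
                self.flushed.append(name)
                return True
        return False

    def write(self, name):
        # With a dirty limit, write back old data before admitting new dirty
        # data (a real client would make the writer wait here instead).
        if self.max_dirty is not None:
            while self.dirty_count() >= self.max_dirty and self.flush_oldest_dirty():
                pass
        self.objects[name] = True
        self.objects.move_to_end(name)
        self.trim()

    def trim(self):
        # Evict clean objects, oldest first, until within max_objects.
        # Dirty objects are never evicted, so without a dirty limit the
        # cache can stay above max_objects indefinitely.
        for name in [n for n, dirty in self.objects.items() if not dirty]:
            if len(self.objects) <= self.max_objects:
                break
            del self.objects[name]


unbounded = ToyObjectCache(max_objects=4)
bounded = ToyObjectCache(max_objects=4, max_dirty=2)
for i in range(10):
    unbounded.write(f"obj{i}")
    bounded.write(f"obj{i}")
print(len(unbounded.objects), len(bounded.objects))  # 10 4
```

With no dirty limit every written object stays dirty, so trim() never finds anything to evict; with `max_dirty=2` the oldest dirty object is written back first, becomes clean, and can then be trimmed.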
- 01:10 PM Bug #19343 (New): OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limit...
- Creating a 100G file on CephFS reproducibly triggers the OOM killer, which kills the ceph-fuse process.
With objectcacher debug ...
- 01:01 PM Backport #19335 (Resolved): kraken: MDS heartbeat timeout during rejoin, when working with large ...
- https://github.com/ceph/ceph/pull/14572
- 01:01 PM Backport #19334 (Resolved): jewel: MDS heartbeat timeout during rejoin, when working with large a...
- https://github.com/ceph/ceph/pull/14672
03/20/2017
- 07:07 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
03/19/2017
- 08:25 AM Bug #19306 (Resolved): fs: mount NFS to cephfs, and then ls a directory containing a large number...
- The ceph_readdir function saves a lot of data in file->private_data, including the last_name, which is used as the offset. Howe...
03/18/2017
- 09:17 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
- Note: inodes loaded is visible in mds-ino+.png in both workflow directories.
- 09:16 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
- I believe I have this one figured out. The client requests graph (mds-request.png) shows that the 8 MDS workflow is r...
03/17/2017
- 04:37 PM Bug #19291 (Fix Under Review): mds: log rotation doesn't work if mds has respawned
- https://github.com/ceph/ceph/pull/14021
- 12:00 PM Bug #17939 (Fix Under Review): non-local cephfs quota changes not visible until some IO is done
- https://github.com/ceph/ceph/pull/14018
- 11:51 AM Bug #19282 (Fix Under Review): RecoveryQueue::_recovered asserts out on OSD errors
- https://github.com/ceph/ceph/pull/14017
- 11:36 AM Fix #19288 (Fix Under Review): Remove legacy "mds tell"
- https://github.com/ceph/ceph/pull/14015
03/16/2017
- 10:17 PM Bug #19291: mds: log rotation doesn't work if mds has respawned
- Sorry, I hit submit by accident before finishing writing this up. Stand by!
- 10:16 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
- If an MDS respawns, its "comm" name becomes "exe", which confuses logrotate since it relies on killall. What ends...
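To illustrate why a name-based signal misses a respawned daemon, here is a Python sketch (Linux only) that matches processes by /proc/<pid>/comm, roughly the kind of lookup killall performs. `pids_by_comm` is a hypothetical helper, not part of Ceph or psmisc. A daemon whose comm reads "exe" after respawning is simply not returned when looking up "ceph-mds".

```python
import os


def pids_by_comm(name, proc_root="/proc"):
    """Return sorted PIDs whose <proc_root>/<pid>/comm equals `name`."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited or is not readable
        if comm == name:
            matches.append(int(entry))
    return sorted(matches)
```

For example, `pids_by_comm("ceph-mds")` on a host where the MDS has respawned would not include it. The kernel also truncates comm to 15 characters, another reason name matching is fragile; signalling via a pidfile would not depend on the process name at all (an observation, not a description of the fix under review).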
- 09:42 PM Bug #16842: mds: replacement MDS crashes on InoTable release
- Hi All,
After upgrading to 10.2.6 on Debian Jessie, the MDS server fails to start. Below is what is written to ...
- 08:44 PM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
- 10:32 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
- OK, not such a big deal for me; I guess I can increase the fragment size until the Luminous release.
- 10:14 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
- Were you planning to work on this?
I don't think we're likely to work on this as a new feature, given that in Lumi...
- 09:51 AM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
- Since cephfs directory fragmentation is disabled by default in Jewel, it would be nice to know when we hit "mds bal frag...
- 04:49 PM Fix #19288 (Resolved): Remove legacy "mds tell"
- This has been deprecated for a while in favour of the new-style "ceph tell mds.<id>". Luminous seems like a good t...
- 10:14 AM Bug #18151 (Resolved): Incorrect report of size when quotas are enabled.
- 03:10 AM Bug #18151: Incorrect report of size when quotas are enabled.
- Hi John. Let's close it now since, according to Greg, it has been resolved. I still haven't upgraded, but will open a new ...
03/15/2017
- 09:51 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors
- There's no good reason the OSD should ever give us errors in this case, but crashing out the MDS is definitely not ...
- 09:16 PM Bug #19118 (Pending Backport): MDS heartbeat timeout during rejoin, when working with large amoun...
- 11:43 AM Bug #17939: non-local cephfs quota changes not visible until some IO is done
- Right, finally looking at this.
Turns out all our ceph.* vxattrs return whatever is in the local cache, with no re...
- 10:30 AM Bug #18151: Incorrect report of size when quotas are enabled.
- Goncalo: did the issue go away for you? If so, let's close this.
03/13/2017
- 01:47 PM Bug #18579: Fuse client has "opening" session to nonexistent MDS rank after MDS cluster shrink
- After discussion during standup, a few things probably need to happen here:
* Upon receiving a new MDSMap, the cl...
03/10/2017
- 07:47 PM Bug #18579 (In Progress): Fuse client has "opening" session to nonexistent MDS rank after MDS clu...
- Sorry for the delay updating this.
The issue appears to be that the stopping MDS is still authoritative for the pa...
- 01:08 PM Bug #19255 (Can't reproduce): qa: test_full_fclose failure
- I suspect the rework of full handling broke this test:
https://github.com/ceph/ceph/pull/13615/files
It's faili...
03/09/2017
- 01:18 PM Bug #16709 (Fix Under Review): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
- https://github.com/ceph/ceph/pull/13904
- 01:11 PM Bug #18238 (Can't reproduce): TestDataScan failing due to log "unmatched rstat on 100"
- This seems to have gone away.
- 01:08 PM Bug #19103 (Won't Fix): cephfs: Out of space handling
- The consensus at the moment seems to be that cephfs shouldn't be doing any nearfull special behaviour.
- 11:29 AM Bug #19245 (Fix Under Review): Crash in PurgeQueue::_execute_item when deletions happen extremely...
- https://github.com/ceph/ceph/pull/13899
- 10:37 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
- There is a path here where the deletion completes so quickly that our call to gather.activate() is calling execute_it...
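The hazard described here is a synchronous completion: if every sub-operation has already finished by the time the gather is activated, the completion runs inline inside activate() and re-enters the caller. A minimal Python sketch follows; `Gather`, `new_sub`, `activate` and `ToyQueue` are hypothetical stand-ins loosely modeled on the gather pattern, not Ceph's PurgeQueue code.

```python
class Gather:
    """Run `on_finish` once every sub-operation has completed and the
    gather has been activated. Hypothetical sketch, not Ceph code."""

    def __init__(self, on_finish):
        self.on_finish = on_finish
        self.outstanding = 0
        self.activated = False
        self.finished = False

    def new_sub(self):
        # Register a sub-operation; the returned callable marks it complete.
        self.outstanding += 1
        return self._sub_complete

    def _sub_complete(self):
        self.outstanding -= 1
        self._maybe_finish()

    def activate(self):
        # If all subs already completed, on_finish runs synchronously here,
        # inside the caller's stack frame.
        self.activated = True
        self._maybe_finish()

    def _maybe_finish(self):
        if self.activated and self.outstanding == 0 and not self.finished:
            self.finished = True
            self.on_finish()


class ToyQueue:
    """Hypothetical queue whose completion callback starts the next item."""

    def __init__(self, n_items):
        self.remaining = n_items
        self.depth = 0
        self.max_depth = 0

    def execute_item(self):
        if self.remaining == 0:
            return
        self.remaining -= 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        gather = Gather(on_finish=self.execute_item)
        done = gather.new_sub()
        done()             # the deletion completes "extremely quickly"
        gather.activate()  # ...so execute_item is re-entered from in here
        self.depth -= 1


q = ToyQueue(5)
q.execute_item()
print(q.max_depth)  # 5: each item ran nested inside the previous one
```

Deferring the completion to a separate work queue instead of invoking it inline is one common way to break this recursion; the sketch says nothing about how the fix under review addresses it.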
- 04:34 AM Bug #19243 (New): multimds on linode: MDSs make uneven progress on independent workloads
- I have a test that just finished with 4 MDS with 400k cache object limit and 64 clients. What appears to happen in th...
- 01:27 AM Bug #19240 (Closed): multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kern...
- This is possibly not a bug but I thought I would put it here to solicit comments/assistance on what could be causing ...
- 12:29 AM Bug #19239 (Resolved): mds: stray count remains static after workflows complete
- For the Linode tests, I've observed that after the clients finish removing their sandboxes (thus making the subtree c...
03/08/2017
- 05:15 PM Bug #16914: multimds: pathologically slow deletions in some tests
- Notes on discussion today:
* The issue is that we usually (single mds) do fine in these cases because of the proj...
- 02:44 PM Feature #19230 (Resolved): Limit MDS deactivation to one at a time
- This needs to happen in the thrasher code, and also in the mon
- 12:17 PM Bug #19204: MDS assert failed when shutting down
- https://github.com/ceph/ceph/pull/13859
- 12:15 PM Bug #19204 (Fix Under Review): MDS assert failed when shutting down
- Hmm, we do shut down the objecter before the finisher, which is clearly not handling this case.
Let's try swapping...
- 11:20 AM Bug #19201 (Resolved): mds: status asok command prints MDS_RANK_NONE as unsigned long -1: 1844674...
- 11:19 AM Feature #11950 (Resolved): Strays enqueued for purge cause MDCache to exceed size limit
- PurgeQueue has merged to master, will be in Luminous.
- 08:21 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- John Spray wrote:
> After decreasing max_mds, you also need to use "ceph mds deactivate <rank>" to shrink the active...
- 04:07 AM Bug #19220 (New): jewel: mds crashed (FAILED assert(mds->mds_lock.is_locked_by_me())
- http://qa-proxy.ceph.com/teuthology/teuthology-2017-02-19_10:10:02-fs-jewel---basic-smithi/833293/teuthology.log
<...
03/07/2017
- 02:38 PM Backport #19206 (In Progress): jewel: Invalid error code returned by MDS is causing a kernel clie...
- 01:28 PM Backport #19206 (Resolved): jewel: Invalid error code returned by MDS is causing a kernel client ...
- https://github.com/ceph/ceph/pull/13831
- 01:22 PM Bug #19205 (Resolved): Invalid error code returned by MDS is causing a kernel client WARNING
- Found by an occasionally failing xfstest generic/011.
After some investigation, it was found that a positive e...
- 09:20 AM Bug #19204 (Resolved): MDS assert failed when shutting down
- We encountered a failed assertion when trying to shut down an MDS. Here is a snippet of the log:
> -14> 2017-01-22 ...
- 07:06 AM Subtask #10489: Mpi tests fail on both ceph-fuse and kclient
- Zheng Yan wrote:
> [...]
>
> the test passes after applying above patch and adding "-W -R" options to fsx-mpi, me...
03/06/2017
- 08:45 PM Bug #19201 (Fix Under Review): mds: status asok command prints MDS_RANK_NONE as unsigned long -1:...
- https://github.com/ceph/ceph/pull/13816
- 08:36 PM Bug #19201 (Resolved): mds: status asok command prints MDS_RANK_NONE as unsigned long -1: 1844674...
- i.e....
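The truncated number in the title is what -1 looks like when its 64-bit two's-complement bit pattern is read back as unsigned. A quick Python check of that arithmetic, assuming a 64-bit unsigned long as on typical 64-bit Linux builds (the helper names are hypothetical):

```python
UINT64_MOD = 2 ** 64


def as_uint64(value):
    """Reinterpret a signed 64-bit integer's bit pattern as unsigned."""
    return value % UINT64_MOD


def as_int64(value):
    """Reinterpret an unsigned 64-bit bit pattern as signed."""
    return value - UINT64_MOD if value >= 2 ** 63 else value


MDS_RANK_NONE = -1  # the sentinel, per the issue title
print(as_uint64(MDS_RANK_NONE))  # 18446744073709551615
print(as_int64(as_uint64(MDS_RANK_NONE)))  # -1
```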
- 01:06 PM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- After decreasing max_mds, you also need to use "ceph mds deactivate <rank>" to shrink the active cluster, so that one...
- 04:21 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- John Spray wrote:
> Having multiple active MDS daemons does not remove the need for standby daemons. Set max_mds to...
- 12:54 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
- John Spray wrote:
> https://github.com/ceph/ceph/pull/13807
>
> I've added a periodic heartbeat reset in the long...
- 12:00 PM Bug #19118 (Fix Under Review): MDS heartbeat timeout during rejoin, when working with large amoun...
- https://github.com/ceph/ceph/pull/13807
I've added a periodic heartbeat reset in the long loop in export_remaining...
- 06:13 AM Bug #18797 (Duplicate): valgrind jobs hanging in fs suite
- Master now passes; see http://pulpito.ceph.com/kchai-2017-03-06_05:57:24-fs-master---basic-mira/
03/03/2017
- 10:18 AM Bug #16914: multimds: pathologically slow deletions in some tests
- Pulled the test off wip-11950 and onto https://github.com/ceph/ceph/pull/13770
- 09:09 AM Feature #19135 (Rejected): Multi-Mds: One dead, all dead; what is Robustness??
- Having multiple active MDS daemons does not remove the need for standby daemons. Set max_mds to something less than ...
- 07:54 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- Zheng Yan wrote:
> Multiple active MDSs are mainly for improving performance (balancing load across multiple MDSs). Robustnes...
- 06:59 AM Feature #19135: Multi-Mds: One dead, all dead; what is Robustness??
- Multiple active MDSs are mainly for improving performance (balancing load across multiple MDSs). Robustness is achieved standb...
- 02:57 AM Feature #19135 (Rejected): Multi-Mds: One dead, all dead; what is Robustness??
- I have three MDSs and copy a file to a FUSE mount point as a test.
When I shut down one MDS, the client hangs, the m...
- 07:19 AM Feature #19137 (New): samba: implement aio callback for ceph vfs module
03/02/2017
- 11:52 AM Bug #17656: cephfs: high concurrent causing slow request
- jichao sun wrote:
> I have the same problem too!!!
- 08:27 AM Bug #17656: cephfs: high concurrent causing slow request
- I have the same problem too!!!
- 08:33 AM Bug #18995 (Resolved): ceph-fuse always fails If pid file is non-empty and run as daemon
- 08:11 AM Bug #18730: mds: backtrace issues getxattr for every file with cap on rejoin
- Zheng Yan wrote:
> I think we should design a new mechanism to track in-use inodes (current method isn't scalable be...
- 03:40 AM Bug #18730: mds: backtrace issues getxattr for every file with cap on rejoin
- I think we should design a new mechanism to track in-use inodes (current method isn't scalable because it journals al...
03/01/2017
- 11:30 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
- I have that already. I did set the beacon_grace to 600s to work around the bug and bring the cluster back.
Seems r...
- 11:16 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
- I have that already. I did set the beacon_grace to 600s to work around the bug and bring the cluster back.
In firs...
- 03:52 PM Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
- May be related to http://tracker.ceph.com/issues/18730, although perhaps not since that one shouldn't be causing the ...
- 03:36 PM Bug #19118 (Resolved): MDS heartbeat timeout during rejoin, when working with large amount of cap...
- We set an alarm every OPTION(mds_beacon_grace, OPT_FLOAT, 15) seconds; if mds_rank doesn't finish its task within this...
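The mechanism described above behaves like a watchdog: if the heartbeat is not reset within mds_beacon_grace seconds (15 by default, per the OPTION line), the MDS is treated as unhealthy, so one long-running loop can trip it. Below is a Python simulation under stated assumptions: a manually advanced fake clock instead of real time, a per-item check standing in for the separate thread that would check in practice, and hypothetical names (`Heartbeat`, `FakeClock`, `process_items`, `reset_every`). It contrasts a long loop with and without a periodic heartbeat reset.

```python
class FakeClock:
    """Manually advanced clock so the simulation runs instantly."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now


class Heartbeat:
    """Healthy only if reset() was called within the last `grace` seconds."""

    def __init__(self, grace, clock):
        self.grace = grace
        self.clock = clock
        self.last_reset = clock()

    def reset(self):
        self.last_reset = self.clock()

    def is_healthy(self):
        return self.clock() - self.last_reset <= self.grace


def process_items(n_items, cost, hb, clock, reset_every=None):
    """Simulate a long loop; return True if the heartbeat ever went stale."""
    timed_out = False
    for i in range(1, n_items + 1):
        clock.now += cost                  # simulated work for one item
        if not hb.is_healthy():            # stands in for the watchdog check
            timed_out = True
        if reset_every and i % reset_every == 0:
            hb.reset()                     # periodic reset inside the loop
    return timed_out


clock = FakeClock()
print(process_items(100, 1, Heartbeat(15, clock), clock))  # True
clock = FakeClock()
print(process_items(100, 1, Heartbeat(15, clock), clock, reset_every=10))  # False
```

Raising the grace period (as was done with beacon_grace set to 600s in this thread) also avoids the timeout, but only by making the check less sensitive; resetting inside the loop keeps the check meaningful.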
- 05:36 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
- Zheng Yan wrote:
> probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0...
- 03:31 PM Feature #16523 (In Progress): Assert directory fragmentation is occurring during stress tests
- 11:09 AM Bug #18883: qa: failures in samba suite
- Opened a ticket for the samba build: http://tracker.ceph.com/issues/19117
- 08:21 AM Bug #17828 (Need More Info): libceph setxattr returns 0 without setting the attr
- ceph.quota.max_files is a hidden xattr. It doesn't show up in listxattr; you need to get it explicitly (getfattr -n ceph.qu...
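Python's os.getxattr (Linux only) requests an attribute by name, which is what the comment above says is needed for this hidden attribute. A sketch assuming the path is on a CephFS mount; the function names are hypothetical, and treating ENODATA as "not set" is an assumption about how an unset quota is reported.

```python
import errno
import os

QUOTA_XATTR = "ceph.quota.max_files"


def read_quota_max_files(path):
    """Return the ceph.quota.max_files value for `path`, or None if unavailable.

    The attribute is hidden from listxattr, so it must be requested by name.
    """
    try:
        raw = os.getxattr(path, QUOTA_XATTR)
    except OSError as e:
        # ENODATA: not set; ENOTSUP/EOPNOTSUPP: not a CephFS path.
        if e.errno in (errno.ENODATA, errno.ENOTSUP, errno.EOPNOTSUPP):
            return None
        raise
    return int(raw.rstrip(b"\x00").decode().strip())


def quota_listed(path):
    """True if listxattr reports the quota attribute (expected False on CephFS)."""
    return QUOTA_XATTR in os.listxattr(path)
```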
02/28/2017
- 02:12 PM Bug #19103: cephfs: Out of space handling
- Looking again now that I'm a few coffees into my day -- all the cephfs enospc stuff is just aimed at providing a slic...
- 08:58 AM Bug #19103: cephfs: Out of space handling
- Believe it or not I sneezed and somehow that caused me to select some affected versions...
- 08:58 AM Bug #19103: cephfs: Out of space handling
- Also, I'm not sure we actually need to do sync writes when the cluster is near full -- we already have machinery that...
- 08:56 AM Bug #19103: cephfs: Out of space handling
- Could the "ENOSPC on failsafe_full_ratio" behaviour be the default? It seems like any application layer that wants t...
- 12:13 AM Bug #19103 (Won't Fix): cephfs: Out of space handling
- Cephfs needs to be more careful on a cluster with almost full OSDs. There is a delay in OSDs reporting stats, a MO...
- 01:37 PM Feature #19109 (Resolved): Use data pool's 'df' for statfs instead of global stats, if there is o...
- The client sends an MStatfs to the mon to get the info for a statfs system call. Currently the mon gives it the glo...
- 09:03 AM Bug #17828: libceph setxattr returns 0 without setting the attr
- This ticket didn't get noticed because it was filed in the 'mgr' component instead of the 'fs' component.
Chris: d...