Activity

From 03/16/2017 to 04/14/2017

04/14/2017

10:00 PM Backport #19620 (In Progress): kraken: MDS server crashes due to inconsistent metadata.
Nathan Cutler
09:59 PM Bug #19406: MDS server crashes due to inconsistent metadata.
*master PR*: https://github.com/ceph/ceph/pull/14234 Nathan Cutler
09:57 PM Backport #19483 (In Progress): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it"...
Nathan Cutler
09:55 PM Backport #19335 (In Progress): kraken: MDS heartbeat timeout during rejoin, when working with lar...
Nathan Cutler
09:54 PM Backport #19045 (In Progress): kraken: buffer overflow in test LibCephFS.DirLs
Nathan Cutler
09:52 PM Backport #18950 (In Progress): kraken: mds/StrayManager: avoid reusing deleted inode in StrayMana...
Nathan Cutler
09:49 PM Backport #18899 (In Progress): kraken: Test failure: test_open_inode
Nathan Cutler
09:48 PM Backport #18706 (In Progress): kraken: fragment space check can cause replayed request fail
Nathan Cutler
09:46 PM Backport #18700 (In Progress): kraken: client: fix the cross-quota rename boundary check conditions
Nathan Cutler
09:44 PM Backport #18616 (In Progress): kraken: segfault in handle_client_caps
Nathan Cutler
09:43 PM Backport #18566 (In Progress): kraken: MDS crashes on missing metadata object
Nathan Cutler
09:42 PM Backport #18562 (In Progress): kraken: Test Failure: kcephfs test_client_recovery.TestClientRecovery
Nathan Cutler
09:40 PM Backport #18552 (In Progress): kraken: ceph-fuse crash during snapshot tests
Nathan Cutler
09:39 PM Bug #18166 (Resolved): monitor cannot start because of "FAILED assert(info.state == MDSMap::STATE...
Nathan Cutler
09:39 PM Bug #18166: monitor cannot start because of "FAILED assert(info.state == MDSMap::STATE_STANDBY)"
kraken backport is unnecessary (fix already in v11.2.0) Nathan Cutler
09:39 PM Backport #18283 (Resolved): kraken: monitor cannot start because of "FAILED assert(info.state == ...
Already included in v11.2.0... Nathan Cutler
08:07 PM Support #16738: mount.ceph: unknown mount options: rbytes and norbytes
Ceph: v10.2.7
Linux Kernel: 4.9.21-040921-generic on Ubuntu 16.04.1
The "rbytes" option in fstab seems to be work...
Fred Drake
01:42 PM Bug #16886 (Can't reproduce): multimds: kclient hang (?) in tests
Zheng Yan
12:50 PM Bug #19630 (Fix Under Review): StrayManager::num_stray is inaccurate
https://github.com/ceph/ceph/pull/14554 Zheng Yan
09:26 AM Bug #19630 (Resolved): StrayManager::num_stray is inaccurate
Zheng Yan
09:50 AM Bug #19204 (Pending Backport): MDS assert failed when shutting down
John Spray
09:48 AM Feature #19551 (Resolved): CephFS MDS health messages should be logged in the cluster log
John Spray
08:28 AM Bug #18680 (Fix Under Review): multimds: cluster can assign active mds beyond max_mds during fail...
Zheng Yan
08:27 AM Bug #18680: multimds: cluster can assign active mds beyond max_mds during failures
commit "mon/MDSMonitor: only allow deactivating the mds with max rank" in https://github.com/ceph/ceph/pull/14550 sho... Zheng Yan
07:59 AM Bug #18755: multimds: MDCache.cc: 4735: FAILED assert(in)
by commit a1499bc4 (mds: stop purging strays when mds is being shutdown) Zheng Yan
07:56 AM Bug #18755 (Resolved): multimds: MDCache.cc: 4735: FAILED assert(in)
Zheng Yan
07:58 AM Bug #18754 (Resolved): multimds: MDCache.cc: 8569: FAILED assert(!info.ancestors.empty())
commit 20d43372 (mds: drop superfluous MMDSOpenInoReply) Zheng Yan
07:55 AM Bug #19239 (Fix Under Review): mds: stray count remains static after workflows complete
should be fixed by commits in https://github.com/ceph/ceph/pull/14550... Zheng Yan

04/13/2017

04:23 PM Bug #19395 (Fix Under Review): "Too many inodes in cache" warning can happen even when trimming i...
The bad state is long gone, so I'm just going to change this ticket to fixing the weird case where we were getting a ... John Spray
03:39 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Yeah, you're right Darrell, operator error :) I guess VM did not have correct storage network attached when I tried t... elder one
02:22 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
FYI, I don't think the module signature stuff is an issue. It's just notifying you that you've loaded a module that d... Darrell Enns
12:23 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
One difference I noticed between 4.4 and 4.9 kernels:
- with 4.4 kernel on cephfs directory sizes (total bytes of al...
elder one
08:04 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Yeah, did that.
My test server with compiled 4.9 kernel and patched ceph module mounted cephfs just fine.
It might...
elder one
02:15 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Well, managed to patch ceph kernel module with 2b1ac852 commit, but my Ubuntu 4.9 kernel will not...
Zheng Yan
03:02 PM Bug #19566 (Fix Under Review): MDS crash on mgr message during shutdown
https://github.com/ceph/ceph/pull/14505 John Spray
02:24 PM Bug #19566 (In Progress): MDS crash on mgr message during shutdown
John Spray
02:31 PM Backport #19620 (Resolved): kraken: MDS server crashes due to inconsistent metadata.
https://github.com/ceph/ceph/pull/14574 Nathan Cutler
02:31 PM Backport #19619 (Resolved): jewel: MDS server crashes due to inconsistent metadata.
https://github.com/ceph/ceph/pull/14676 Nathan Cutler
12:03 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
Patrick was going to spin up a patch for this, so reassigning to him. Patrick, if I have that wrong, then please just... Jeff Layton
11:07 AM Bug #19406 (Pending Backport): MDS server crashes due to inconsistent metadata.
Marking as pending backport for the fix to data-scan which seems likely to be the underlying cause https://github.com... John Spray
08:13 AM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Well, currently the last MDS fails over to the old balancer, so it can in fact shift its load back to the others acco... Dan van der Ster

04/12/2017

10:27 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Well, managed to patch ceph kernel module with 2b1ac852 commit, but my Ubuntu 4.9 kernel will not load unsigned modul... elder one
01:36 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
could you try adding commit 2b1ac852 (ceph: try getting buffer capability for readahead/fadvise) to your 4.9.x kernel... Zheng Yan
07:36 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
I do have mmap disabled in Dovecot conf.
Relevant bits from Dovecot conf:
mmap_disable = yes
mail_nfs_index = ...
elder one
02:17 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Got another error with 4.9.21 kernel.
>
> On the cephfs node /sys/kernel/debug/ceph/xxx/:
>
...
Zheng Yan
08:29 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
We could make the greedyspill.lua balancer check to see if it is the last MDS. Then just return instead of failing. I... Michael Sevilla
03:17 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
and the cpu use on MDS0 stays at +/- 250% Mark Guz
03:16 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
This error shouldn't be an expected occurrence. I'll create a fix for this. Patrick Donnelly
03:15 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
I see this in the logs... Mark Guz
03:12 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Nope. Is your load on mds.0? If yes, and it gets heavily loaded, and if mds.1 has load = 0, then I expect the balan... Dan van der Ster
03:09 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Did you modify the greedyspill.lua script at all? Mark Guz
02:51 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
BTW, you need to set debug_mds_balancer = 2 to see the balance working. Dan van der Ster
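A minimal sketch of how that setting could be applied, assuming the MDS admin socket is reachable; the daemon id "a" and the ceph.conf form are illustrative, not taken from the ticket:
    ceph daemon mds.a config set debug_mds_balancer 2     # on the running daemon, via the admin socket
    # or persistently in ceph.conf, under [mds]:
    #   debug mds balancer = 2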
02:49 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Yes. For example, when I have 50 clients untarring the linux kernel into unique directories, the load is moved around. Dan van der Ster
02:39 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Dan, do you see any evidence of actual load balancing? Mark Guz
02:39 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
I also see this error. I have 2 active/active MDSes. The first shows no errors, the second shows the errors above. No... Mark Guz
11:56 AM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Ahh, it's even documented:... Dan van der Ster
11:55 AM Bug #19589 (Resolved): greedyspill.lua: :18: attempt to index a nil value (field '?')
The included greedyspill.lua doesn't seem to work in a simple 3-active MDS scenario.... Dan van der Ster
01:58 PM Bug #19593 (Resolved): purge queue and standby replay mds
Current code opens the purge queue when the mds starts standby replay. The purge queue can have changed when the standby repl... Zheng Yan
12:46 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
John Spray wrote:
> Intuitively I would say that things we expose as xattrs should probably bump ctime when they're ...
Jeff Layton
12:30 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
Intuitively I would say that things we expose as xattrs should probably bump ctime when they're set, if we're being c... John Spray
11:54 AM Bug #19388 (Closed): mount.ceph does not accept -s option
John Spray
11:43 AM Bug #18578 (Resolved): failed filelock.can_read(-1) assertion in Server::_dir_is_nonempty
Nathan Cutler
11:43 AM Backport #18707 (Resolved): kraken: failed filelock.can_read(-1) assertion in Server::_dir_is_non...
Nathan Cutler
10:46 AM Bug #18461 (Resolved): failed to reconnect caps during snapshot tests
Nathan Cutler
10:46 AM Backport #18678 (Resolved): kraken: failed to reconnect caps during snapshot tests
Nathan Cutler
10:24 AM Bug #19205 (Resolved): Invalid error code returned by MDS is causing a kernel client WARNING
Nathan Cutler
10:24 AM Backport #19206 (Resolved): jewel: Invalid error code returned by MDS is causing a kernel client ...
Nathan Cutler

04/11/2017

09:03 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Also dumped the mds cache (healthy cluster state 8 hours later)
Searched inode in mds error log:...
elder one
01:54 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Got another error with 4.9.21 kernel.
On the cephfs node /sys/kernel/debug/ceph/xxx/:...
elder one
08:16 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
No, I mean the ctime. You only want to update the mtime if you're changing the directory's contents. I would consider... Jeff Layton
07:55 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
> Should we be updating the ctime and change_attr in the ceph.dir.layout case as well?
Did you mean mtime?
I th...
Patrick Donnelly
07:32 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
I think you're right -- well spotted! In particular, we need to bump it in the case where we update the ctime (ceph.f... Jeff Layton
07:21 PM Bug #19583 (Resolved): mds: change_attr not inc in Server::handle_set_vxattr
Noticed this was missing, was this intentional? Patrick Donnelly
07:22 PM Bug #19388: mount.ceph does not accept -s option
Looking at the complete picture, I think it's easiest (and acceptable for me) to wait for Jessie's successor, Stretch... Michel Roelofs
06:53 AM Feature #19578: mds: optimize CDir::_omap_commit() and CDir::_committed() for large directory
We can track dirty dentries in a dirty list; CDir::_omap_commit() and CDir::_committed() then only need to check dentries... Zheng Yan
06:51 AM Feature #19578 (Resolved): mds: optimize CDir::_omap_commit() and CDir::_committed() for large di...
CDir::_omap_commit() and CDir::_committed() need to traverse the whole dirfrag to find dirty dentries. It's not efficienc... Zheng Yan
03:38 AM Bug #19450 (Fix Under Review): PurgeQueue read journal crash
https://github.com/ceph/ceph/pull/14447 Zheng Yan
02:18 AM Bug #19450: PurgeQueue read journal crash
The mds always first reads all entries in the mds log, then writes new entries to it. The partial entry is detected and dropped... Zheng Yan

04/10/2017

10:58 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Some weird issues arose with my applications (unrelated to cephfs) with the 4.10.x kernel.
The same fix is in 4.9.x k...
elder one
02:19 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Upgraded cephfs clients to 4.10.9 kernel.
elder one
12:08 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Can you try updating the cephfs client to a 4.10.x kernel? Zheng Yan
09:59 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Error happened again:... elder one
11:48 AM Bug #19566 (Resolved): MDS crash on mgr message during shutdown
... John Spray

04/09/2017

05:30 PM Bug #19388: mount.ceph does not accept -s option
Fair point about options getting passed through.
The thing I'm not sure about here is who is going to backport a f...
John Spray
01:16 PM Bug #19388: mount.ceph does not accept -s option
mount.ceph passes options which it does not recognize directly to the kernel mount function, therefore a full impleme... Michel Roelofs
12:45 PM Bug #19450: PurgeQueue read journal crash
hit this again... Zheng Yan

04/08/2017

10:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
The master install-deps.sh installed a flood of other stuff that the earlier one didn't. But I still get fatal errors when runni... Christoffer Lilja
09:52 AM Bug #19406: MDS server crashes due to inconsistent metadata.
I'll download latest master instead and try. Christoffer Lilja
09:41 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Okay, I followed the guide on the Ceph site on how to prepare and compile Ceph.
I already did run install_deps.sh in or...
Christoffer Lilja

04/07/2017

03:04 PM Feature #19551 (Fix Under Review): CephFS MDS health messages should be logged in the cluster log

This was inspired by a ceph.log from CERN's testing, in which we could see mysterious-looking fsmap epoch bump mess...
John Spray
03:01 PM Feature #19551 (Resolved): CephFS MDS health messages should be logged in the cluster log
John Spray
08:37 AM Bug #19239 (In Progress): mds: stray count remains static after workflows complete
Zheng Yan
07:26 AM Bug #18850 (Rejected): Leak in MDCache::handle_dentry_unlink
Zheng Yan
05:53 AM Bug #19445 (Resolved): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
PR #14308 is merged. Jos Collin

04/06/2017

01:30 PM Bug #18850: Leak in MDCache::handle_dentry_unlink
It's likely that some CInode/CDir/CDentry objects in the cache are not properly freed when the mds process exits. Not caused by MDCache::h... Zheng Yan
01:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Christoffer Lilja wrote:
> Hi,
>
> I have little spare time to spend on this and can't get past this compile erro...
Zheng Yan

04/05/2017

07:54 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Hi,
I have little spare time to spend on this and can't get past this compile error due to my very limited knowled...
Christoffer Lilja
02:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Christoffer Lilja wrote:
> I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got ...
Zheng Yan
05:10 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Thanks for the link Zheng. Looks like that fix is in mainline as of 4.10.2. When I have a chance, I'll try upgrading ... Darrell Enns
09:13 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~k...
Zheng Yan
08:24 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v... elder one
07:39 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Hit the same bug today with 4.4.59 kernel client.
> 2 MDS servers (1 standby) 10.2.6-1 on Ubuntu...
Zheng Yan
07:36 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Darrell Enns wrote:
> Zheng Yan wrote:
> > probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a...
Zheng Yan
01:33 PM Bug #19501 (Fix Under Review): C_MDSInternalNoop::complete doesn't free itself
Zheng Yan
01:33 PM Bug #19501: C_MDSInternalNoop::complete doesn't free itself
https://github.com/ceph/ceph/pull/14347 Zheng Yan
01:17 PM Bug #19501 (Resolved): C_MDSInternalNoop::complete doesn't free itself
This causes a memory leak. Zheng Yan
09:20 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
geng jichao wrote:
> I have a question, if the file struct is destroyed,how to ensure that cache_ctl.index is correc...
Zheng Yan
05:52 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
I have a question: if the file struct is destroyed, how do we ensure that cache_ctl.index is correct?
In other words, req...
geng jichao
01:32 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
kernel patch https://github.com/ceph/ceph-client/commit/b7e2eee12aa174bc91279a7cee85e9ea73092bad Zheng Yan
06:01 AM Bug #19438: ceph mds error "No space left on device"
I have encountered this ceph mds error, "No space left on device", in version 10.2.6.
Need to add a lin...
berlin sun
04:12 AM Bug #19450: PurgeQueue read journal crash
... Zheng Yan

04/04/2017

09:09 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Very interesting! That means we have:
4.4.59 - bug
4.7.5 - no bug (or at least rare enough that I'm not seeing it...
Darrell Enns
06:03 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty.
From mds l...
elder one
09:05 PM Bug #19388: mount.ceph does not accept -s option
The kernel client has a mount helper in src/mount/mount.ceph.c -- although that only comes into play if the ceph pack... John Spray
01:50 PM Bug #19306 (Fix Under Review): fs: mount NFS to cephfs, and then ls a directory containing a larg...
https://github.com/ceph/ceph/pull/14317 Zheng Yan
01:18 PM Bug #19456: Hadoop suite failing on missing upstream tarball
I see the hadoop task is in ceph/teuthology - would it make sense to move it to ceph/ceph? Nathan Cutler
12:44 PM Backport #19483 (Resolved): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" co...
https://github.com/ceph/ceph/pull/14573 Nathan Cutler
12:44 PM Backport #19482 (Resolved): jewel: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" com...
https://github.com/ceph/ceph/pull/14674 Nathan Cutler
12:42 PM Backport #19466 (Resolved): jewel: mds: log rotation doesn't work if mds has respawned
https://github.com/ceph/ceph/pull/14673 Nathan Cutler
01:03 AM Bug #19445 (In Progress): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
https://github.com/ceph/ceph/pull/14308 Brad Hubbard

04/03/2017

09:29 PM Bug #19456: Hadoop suite failing on missing upstream tarball
Looks like v2.5.2 is too old. It is now in the Apache archive: https://archive.apache.org/dist/hadoop/core/hadoop-2.5... Ken Dreyer
08:20 PM Bug #19456 (Rejected): Hadoop suite failing on missing upstream tarball
This is jewel v10.2.7 point release candidate
Also true on all branches
Run: http://pulpito.ceph.com/yuriw-2017-0...
Yuri Weinstein
08:15 PM Bug #19388: mount.ceph does not accept -s option
Looking at it, it seems I'll have to update the Linux kernel as well to support the sloppy option. Doing so in a sane... Michel Roelofs
12:53 PM Bug #19306 (In Progress): fs: mount NFS to cephfs, and then ls a directory containing a large num...
Zheng Yan
12:44 PM Bug #19450 (Resolved): PurgeQueue read journal crash
... Zheng Yan
11:16 AM Bug #19437 (Fix Under Review): fs: The mount point breaks off when an mds switch happens.
https://github.com/ceph/ceph/pull/14267 John Spray

04/02/2017

07:30 AM Bug #19445 (Resolved): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
The following warning appears during make. Here roll_die and empty() return bool. This can be easily fixed by removi... Jos Collin

04/01/2017

06:15 PM Bug #19406: MDS server crashes due to inconsistent metadata.
I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got rejected. I guess it's writt... Christoffer Lilja

03/31/2017

11:49 PM Bug #19291 (Pending Backport): mds: log rotation doesn't work if mds has respawned
The bug is also in jewel 10.2.6, due to 6efad699249ba7c6928193dba111dbb23b606beb. Patrick Donnelly
10:41 AM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
The log says "objects: max 100" but the default of client_oc_max_objects is 1000, so I assumed there was some manual ... John Spray
10:32 AM Bug #19438: ceph mds error "No space left on device"
Hmm, so fragmentation is enabled but apparently isn't happening? Is your ceph.conf with "mds_bal_frag = true" presen... John Spray
10:17 AM Bug #19438 (Won't Fix): ceph mds error "No space left on device"
While testing the MDS cluster with a bash script, I create a test directory under the ceph mount path; in the test direc... william sheng
05:04 AM Bug #19437 (Resolved): fs: The mount point breaks off when an mds switch happens.

My ceph version is jewel and my cluster has two nodes. I start two mds, one active and the other in hot-standby mo...
Ivan Guan
02:58 AM Bug #18579 (Fix Under Review): Fuse client has "opening" session to nonexistent MDS rank after MD...
should be fixed in https://github.com/ceph/ceph/pull/13698 Zheng Yan

03/30/2017

02:33 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
I don't think I have changed the defaults. How can I see the values I use? Arne Wiebalck
12:35 PM Bug #19426: knfs blogbench hang
I got several oopses at other places. Seems like random memory corruption; it only happens when cephfs is exported by nf... Zheng Yan
09:57 AM Bug #19426: knfs blogbench hang
... Zheng Yan
08:34 AM Bug #19426 (Can't reproduce): knfs blogbench hang
http://pulpito.ceph.com/yuriw-2017-03-24_21:44:17-knfs-jewel-integration-ktdreyer-testing-basic-smithi/941067/
http:...
Zheng Yan
01:23 AM Bug #19406: MDS server crashes due to inconsistent metadata.
... Zheng Yan

03/29/2017

07:09 PM Bug #19406: MDS server crashes due to inconsistent metadata.
I don't have a build environment nor the Ceph source code on the machines.
I'll download the source code, apply the ...
Christoffer Lilja
06:45 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Looking at what DataScan.cc currently does, looks like we were injecting things with hash set to zero, so it would pi... John Spray
02:02 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Looks like ~mds0's dir_layout.dl_dir_hash is wrong. Please check if the patch below can prevent the mds from crash... Zheng Yan
01:28 PM Bug #19406: MDS server crashes due to inconsistent metadata.
The logfile contains file structure that in itself is sensitive information; that's why it's truncated.
I'll see if...
Christoffer Lilja
01:01 PM Bug #19406: MDS server crashes due to inconsistent metadata.
To work out how the metadata is now corrupted (i.e. what the recovery tools broke), it would be useful to have the fu... John Spray
12:43 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Can you go back to your logs from when the original metadata damage occurred? There should be a message in your clus... John Spray
10:55 AM Bug #19406: MDS server crashes due to inconsistent metadata.
I see now that the two other MDS servers crash the same way. Christoffer Lilja
10:44 AM Bug #19406: MDS server crashes due to inconsistent metadata.
A NIC broke and caused tremendous load on the server. Shortly after, I had to kill the server by power; note that this ... Christoffer Lilja
10:15 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Please also tell us what else was going on around this event. Had you recently upgraded? Had another daemon recentl... John Spray
09:43 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Please set 'debug_mds = 20', restart the mds and upload the mds.log. Zheng Yan
07:57 AM Bug #19406 (Resolved): MDS server crashes due to inconsistent metadata.
-48> 2017-03-26 23:34:38.886779 7f4b95030700 5 mds.neutron handle_mds_map epoch 620 from mon.1
-47> 2017-03-2...
Christoffer Lilja

03/28/2017

10:21 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm.
I've tried setting some comically low limits:...
John Spray
09:09 PM Bug #19401 (Fix Under Review): MDS goes readonly writing backtrace for a file whose data pool has...
So on reflection I realise that the deletion/recovery cases are not an issue because those code paths already handle ... John Spray
05:28 PM Bug #19401 (Resolved): MDS goes readonly writing backtrace for a file whose data pool has been re...
Reproduce:
1. create a pool
2. add it as a data pool
3. set that pool in a layout on the client, and write a file
...
John Spray
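A rough shell sketch of steps 1-3 above, under the assumption of a filesystem named "cephfs" mounted at /mnt/cephfs; the pool name and sizes are placeholders, and the truncated remainder of the ticket (presumably removing the pool again) is not shown:
    ceph osd pool create extra_data 8                                 # 1. create a pool
    ceph fs add_data_pool cephfs extra_data                           # 2. add it as a data pool
    mkdir /mnt/cephfs/dir
    setfattr -n ceph.dir.layout.pool -v extra_data /mnt/cephfs/dir    # 3. set that pool in a layout...
    dd if=/dev/zero of=/mnt/cephfs/dir/file bs=1M count=4             #    ...and write a file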
09:05 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
Couple of observations from today:
* after an mds failover, the issue cleared and we're back to having "ino" (CIno...
John Spray
02:55 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
Cache dumps in /root/19395 on mira060 John Spray
01:36 PM Bug #19395 (Resolved): "Too many inodes in cache" warning can happen even when trimming is working
John Spray
01:29 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
John Spray
01:29 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors
John Spray
01:28 PM Fix #19288 (Resolved): Remove legacy "mds tell"
John Spray
01:27 PM Bug #16709 (Pending Backport): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
John Spray
01:25 PM Feature #17604 (Resolved): MDSMonitor: raise health warning when there are no standbys but there ...
John Spray

03/27/2017

05:26 PM Bug #19388: mount.ceph does not accept -s option
OK, if you're motivated to do this (and update the pull request to implement -s rather than dropping it) then I don't... John Spray
05:24 PM Bug #19388: mount.ceph does not accept -s option
I wasn't aware of this fix in autofs, thanks for pointing me to it.
However, several mount programs accept the -s ...
Michel Roelofs
10:32 AM Bug #19388: mount.ceph does not accept -s option
According to the release notes, autofs 5.1.1 includes a fix to avoid passing -s to filesystems other than NFS (https:... John Spray
03:39 AM Bug #19388 (Fix Under Review): mount.ceph does not accept -s option
https://github.com/ceph/ceph/pull/14158 Kefu Chai
11:58 AM Bug #16842: mds: replacement MDS crashes on InoTable release
A patch that should allow MDSs to throw out the invalid sessions and start up. Does not fix whatever got the system ... John Spray

03/26/2017

12:31 PM Bug #19388 (Closed): mount.ceph does not accept -s option
The mount.ceph tool does not accept the -s (sloppy) option. On Debian Jessie (and possibly more distributions), autof... Michel Roelofs

03/23/2017

01:35 PM Feature #18509 (Fix Under Review): MDS: damage reporting by ino number is useless
John Spray
01:30 PM Feature #18509: MDS: damage reporting by ino number is useless
https://github.com/ceph/ceph/pull/14104 John Spray
02:16 AM Feature #19362 (Resolved): mds: add perf counters for each type of MDS operation
It's desirable to know which types of operations are being performed on the MDS so we have a better idea over time what... Patrick Donnelly

03/22/2017

06:31 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
I have used the offset parameter of the ceph_dir_llseek function, and it will be passed to the mds in the readdir request, if ... geng jichao
03:38 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
This bug is not specific to the kernel client. Enabling directory fragments can help. The complete fix is to make the client enc... Zheng Yan

03/21/2017

08:56 PM Bug #16842: mds: replacement MDS crashes on InoTable release
If anyone needs the debug level "30/30" dump I can send it directly to them.
Also, so far I have not nuked the cep...
c sights
04:08 PM Bug #16842: mds: replacement MDS crashes on InoTable release
I don't have any brilliant ideas for debugging this (can't go back in time to find out how those bogus inode ranges m... John Spray
08:39 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Just in case it wasn't clear from the log snippet above: the current counter is continuously growing until the process... Arne Wiebalck
08:25 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm, I've looked at the code now and it turns out we do already have a client_oc_max_dirty setting that is supposed t... John Spray
05:18 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
I see ... so the fix would be to have a dirty_limit (like vm.dirty_ratio) above which all new requests would not be ca... Arne Wiebalck
03:11 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm, could be that we just have no limit on the number of dirty objects (trim is only dropping clean things), so if w... John Spray
01:10 PM Bug #19343 (New): OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limit...
Creating a 100G file on CephFS reproducibly triggers OOM to kill the
ceph-fuse process.
With objectcacher debug ...
Arne Wiebalck
01:01 PM Backport #19335 (Resolved): kraken: MDS heartbeat timeout during rejoin, when working with large ...
https://github.com/ceph/ceph/pull/14572 Nathan Cutler
01:01 PM Backport #19334 (Resolved): jewel: MDS heartbeat timeout during rejoin, when working with large a...
https://github.com/ceph/ceph/pull/14672 Nathan Cutler

03/20/2017

07:07 AM Bug #19245 (Resolved): Crash in PurgeQueue::_execute_item when deletions happen extremely quickly
John Spray

03/19/2017

08:25 AM Bug #19306 (Resolved): fs: mount NFS to cephfs, and then ls a directory containing a large number...
The ceph_readdir function saves a lot of data in file->private_data, including the last_name, which is used as the offset. Howe... geng jichao

03/18/2017

09:17 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
Note: inodes loaded is visible in mds-ino+.png in both workflow directories. Patrick Donnelly
09:16 PM Bug #19240: multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build ...
I believe I have this one figured out. The client requests graph (mds-request.png) shows that the 8 MDS workflow is r... Patrick Donnelly

03/17/2017

04:37 PM Bug #19291 (Fix Under Review): mds: log rotation doesn't work if mds has respawned
https://github.com/ceph/ceph/pull/14021 Patrick Donnelly
12:00 PM Bug #17939 (Fix Under Review): non-local cephfs quota changes not visible until some IO is done
https://github.com/ceph/ceph/pull/14018 John Spray
11:51 AM Bug #19282 (Fix Under Review): RecoveryQueue::_recovered asserts out on OSD errors
https://github.com/ceph/ceph/pull/14017 John Spray
11:36 AM Fix #19288 (Fix Under Review): Remove legacy "mds tell"
https://github.com/ceph/ceph/pull/14015 John Spray

03/16/2017

10:17 PM Bug #19291: mds: log rotation doesn't work if mds has respawned
Sorry, hit submit by accident before finishing writing this up. Standby! Patrick Donnelly
10:16 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
If an MDS respawns then its "comm" name becomes "exe" which confuses logrotate since it relies on killall. What ends... Patrick Donnelly
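Roughly the failure mode described above, as a sketch; the killall line approximates what the stock logrotate postrotate does rather than quoting it:
    # logrotate signals daemons to reopen their logs by process (comm) name:
    killall -q -1 ceph-mds
    # but a respawned MDS has re-exec'd /proc/self/exe, so its comm name is now "exe";
    # killall no longer matches it and the daemon keeps writing to the old, rotated log file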
09:42 PM Bug #16842: mds: replacement MDS crashes on InoTable release

Hi All,
After upgrading to 10.2.6 on Debian Jessie, the MDS server fails to start. Below is what is written to ...
c sights
08:44 PM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
John Spray
10:32 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
OK, not such a big deal for me, I guess I can increase the fragment size until the Luminous release. elder one
10:14 AM Feature #19286: Log/warn if "mds bal fragment size max" is reached
Were you planning to work on this?
I don't think we're likely to work on this as a new feature, given that in Lumi...
John Spray
09:51 AM Feature #19286 (Closed): Log/warn if "mds bal fragment size max" is reached
Since in Jewel cephfs directory fragmentation is disabled by default, it would be nice to know when we hit "mds bal frag... elder one
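For reference, a minimal ceph.conf sketch of the two settings this request is about; mds bal frag is shown enabled, and 100000 is the default fragment size limit as I recall it, so treat this as illustration rather than a recommendation:
    [mds]
        mds bal frag = true                    # directory fragmentation, disabled by default in Jewel
        mds bal fragment size max = 100000     # max entries per dirfrag; beyond this, new entries are rejected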
04:49 PM Fix #19288 (Resolved): Remove legacy "mds tell"

This has been deprecated for a while in favour of the new-style "ceph tell mds.<id>". Luminous seems like a good t...
John Spray
10:14 AM Bug #18151 (Resolved): Incorrect report of size when quotas are enabled.
John Spray
03:10 AM Bug #18151: Incorrect report of size when quotas are enabled.
Hi John. Let us close it now since according to Greg it has been resolved. Still did not upgrade but will open a new ... Goncalo Borges
 
