Activity

From 04/03/2017 to 05/02/2017

05/02/2017

09:40 AM Bug #17408 (Can't reproduce): Possible un-needed wait on rstats when listing dir?
Can't reproduce. Probably fixed by ... Zheng Yan
09:21 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Just reporting that there are no errors after using the patched (commit 2b1ac852) cephfs kernel module.
Also upgraded ceph to 1...
elder one
08:29 AM Feature #19820 (Fix Under Review): ceph-fuse: use userspace permission check by default
https://github.com/ceph/ceph/pull/14907 Zheng Yan
07:39 AM Feature #19820 (Resolved): ceph-fuse: use userspace permission check by default
Zheng Yan
07:32 AM Bug #19812: client: not swapping directory caps efficiently leads to very slow create chains
The reason for the slow creation/deletion is that ceph-fuse sends a getattr request (to check permission of the test directory) b... Zheng Yan
06:17 AM Feature #19819 (Rejected): Add support of FS_IOC_FIEMAP ioctl on files, accessible through CephFS
The purpose of this is to have a standard interface for examining allocated extents of files. For example, xfs_i... Марк Коренберг
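
(For reference, a minimal sketch of how the generic FS_IOC_FIEMAP interface is used on a filesystem that already supports it, such as XFS or ext4; the file path is a placeholder and none of this is CephFS-specific code.)

    /* Minimal sketch: query extent mapping via the standard FS_IOC_FIEMAP ioctl.
     * The path is a placeholder; CephFS does not implement this ioctl. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>
    #include <linux/fiemap.h>

    int main(void)
    {
        int fd = open("/mnt/test/file", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Room for up to 32 extents in a single call. */
        size_t sz = sizeof(struct fiemap) + 32 * sizeof(struct fiemap_extent);
        struct fiemap *fm = calloc(1, sz);
        if (!fm) { close(fd); return 1; }
        fm->fm_start = 0;
        fm->fm_length = ~0ULL;          /* map the whole file */
        fm->fm_extent_count = 32;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            perror("FS_IOC_FIEMAP");    /* e.g. EOPNOTSUPP if unsupported */
            return 1;
        }
        for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
            printf("logical=%llu physical=%llu length=%llu flags=0x%x\n",
                   (unsigned long long)fm->fm_extents[i].fe_logical,
                   (unsigned long long)fm->fm_extents[i].fe_physical,
                   (unsigned long long)fm->fm_extents[i].fe_length,
                   fm->fm_extents[i].fe_flags);
        free(fm);
        close(fd);
        return 0;
    }
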

04/28/2017

10:09 PM Bug #19812 (New): client: not swapping directory caps efficiently leads to very slow create chains
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg34200.html
In short: if you have a ceph-fuse and a kerne...
Greg Farnum

04/27/2017

07:27 AM Bug #18872: write to cephfs mount hangs, ceph-fuse and kernel
PR: https://github.com/ceph/ceph/pull/14822 Jan Fajerski
03:21 AM Bug #19789 (New): FAIL: test_evict_client (tasks.cephfs.test_misc.TestMisc)
http://qa-proxy.ceph.com/teuthology/teuthology-2017-04-24_03:15:02-fs-master---basic-smithi/ Zheng Yan

04/25/2017

09:03 AM Bug #19755 (Fix Under Review): MDS became unresponsive when truncating a very large file
https://github.com/ceph/ceph/pull/14769 Zheng Yan
03:39 AM Bug #19755 (Resolved): MDS became unresponsive when truncating a very large file
We were trying to copy a very large file (7TB exactly) between two directories through Samba/CephFS, and cancelled it... Sandy Xu
06:52 AM Backport #19763 (Resolved): kraken: non-local cephfs quota changes not visible until some IO is done
https://github.com/ceph/ceph/pull/16108 Nathan Cutler
06:52 AM Backport #19762 (Resolved): jewel: non-local cephfs quota changes not visible until some IO is done
https://github.com/ceph/ceph/pull/15466 Nathan Cutler

04/24/2017

09:14 PM Bug #19583 (Resolved): mds: change_attr not inc in Server::handle_set_vxattr
John Spray
09:13 PM Fix #19691 (Resolved): Remove journaler_allow_split_entries option
John Spray
09:13 PM Bug #18816 (Resolved): MDS crashes with log disabled
John Spray
09:12 PM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
The userspace piece (https://github.com/ceph/ceph/pull/14317) has merged.
Zheng: please resolve the ticket when th...
John Spray
09:11 PM Feature #17855 (Resolved): Don't evict a slow client if it's the only client
John Spray
01:43 PM Bug #19706 (In Progress): Laggy mon daemons causing MDS failover (symptom: failed to set counters...
We should disable this check for the misc workunit; it was mainly intended for the workunits that have more sustained lo... John Spray
12:27 PM Feature #18425 (Resolved): mds: add the option to use tcmalloc directly
John Spray
10:13 AM Bug #17939 (Pending Backport): non-local cephfs quota changes not visible until some IO is done
John Spray
07:08 AM Support #16884 (Closed): rename() doesn't work between directories
Zheng Yan
07:02 AM Support #16884: rename() doesn't work between directories
Zheng Yan
03:01 AM Bug #19635 (Fix Under Review): Deadlock on two ceph-fuse clients accessing the same file
https://github.com/ceph/ceph/pull/14743 Zheng Yan

04/22/2017

02:10 AM Bug #19583 (Fix Under Review): mds: change_attr not inc in Server::handle_set_vxattr
https://github.com/ceph/ceph/pull/14726 Patrick Donnelly

04/21/2017

03:41 AM Bug #19734 (Resolved): mds: subsystems like ceph_subsys_mds_balancer do not log correctly
Our logging for subsystems is relying on a removed configuration macro:... Patrick Donnelly
03:36 AM Bug #19589 (Fix Under Review): greedyspill.lua: :18: attempt to index a nil value (field '?')
https://github.com/ceph/ceph/pull/14704 Patrick Donnelly

04/20/2017

09:54 PM Backport #19709 (In Progress): jewel: Enable MDS to start when session ino info is corrupt
Nathan Cutler
12:13 PM Backport #19709 (Resolved): jewel: Enable MDS to start when session ino info is corrupt
https://github.com/ceph/ceph/pull/14700 Nathan Cutler
09:49 PM Backport #19679 (In Progress): jewel: MDS: damage reporting by ino number is useless
Nathan Cutler
09:29 PM Backport #19677 (In Progress): jewel: Jewel ceph-fuse does not recover after lost connection to MDS
Nathan Cutler
05:55 PM Backport #19466 (Need More Info): jewel: mds: log rotation doesn't work if mds has respawned
Needs 3ba63063 and 1fb15a21.
Of these two, 3ba63063 is non-trivial.
Nathan Cutler
11:32 AM Backport #19466 (In Progress): jewel: mds: log rotation doesn't work if mds has respawned
Nathan Cutler
05:06 PM Backport #19620 (Resolved): kraken: MDS server crashes due to inconsistent metadata.
Nathan Cutler
05:06 PM Backport #19483 (Resolved): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" co...
Nathan Cutler
05:05 PM Backport #19335 (Resolved): kraken: MDS heartbeat timeout during rejoin, when working with large ...
Nathan Cutler
05:04 PM Backport #19045 (Resolved): kraken: buffer overflow in test LibCephFS.DirLs
Nathan Cutler
05:03 PM Backport #18950 (Resolved): kraken: mds/StrayManager: avoid reusing deleted inode in StrayManager...
Nathan Cutler
05:02 PM Backport #18899 (Resolved): kraken: Test failure: test_open_inode
Nathan Cutler
05:01 PM Backport #18706 (Resolved): kraken: fragment space check can cause replayed request fail
Nathan Cutler
04:59 PM Backport #18700 (Resolved): kraken: client: fix the cross-quota rename boundary check conditions
Nathan Cutler
04:58 PM Bug #18306 (Resolved): segfault in handle_client_caps
Nathan Cutler
04:58 PM Backport #18616 (Resolved): kraken: segfault in handle_client_caps
Nathan Cutler
04:57 PM Bug #18179 (Resolved): MDS crashes on missing metadata object
Nathan Cutler
04:57 PM Backport #18566 (Resolved): kraken: MDS crashes on missing metadata object
Nathan Cutler
04:56 PM Bug #18396 (Resolved): Test Failure: kcephfs test_client_recovery.TestClientRecovery
Nathan Cutler
07:57 AM Bug #18396: Test Failure: kcephfs test_client_recovery.TestClientRecovery
http://qa-proxy.ceph.com/teuthology/teuthology-2017-04-13_05:20:02-kcephfs-kraken-testing-basic-smithi/1019312/ Zheng Yan
04:56 PM Backport #18562 (Resolved): kraken: Test Failure: kcephfs test_client_recovery.TestClientRecovery
Nathan Cutler
04:55 PM Bug #18460 (Resolved): ceph-fuse crash during snapshot tests
Nathan Cutler
04:55 PM Backport #18552 (Resolved): kraken: ceph-fuse crash during snapshot tests
Nathan Cutler
02:53 PM Bug #19707 (Duplicate): Hadoop tests fail due to missing upstream tarball
Indeed it is -- I hadn't seen that other ticket because it was in the wrong project. John Spray
02:04 PM Bug #19707: Hadoop tests fail due to missing upstream tarball
Dup of #19456? Ken Dreyer
09:57 AM Bug #19707 (Duplicate): Hadoop tests fail due to missing upstream tarball
http://pulpito.ceph.com/teuthology-2017-04-03_03:45:03-hadoop-master---basic-mira/... John Spray
01:49 PM Bug #19712 (New): some kcephfs tests become very slow
http://qa-proxy.ceph.com/teuthology/teuthology-2017-04-16_04:20:02-kcephfs-jewel-testing-basic-smithi/
http://pulpit...
Zheng Yan
01:40 PM Backport #19675 (In Progress): jewel: cephfs: Test failure: test_data_isolated (tasks.cephfs.test...
Nathan Cutler
01:24 PM Backport #19673 (In Progress): jewel: cephfs: mds is crushed, after I set about 400 64KB xattr kv...
Nathan Cutler
01:22 PM Backport #19671 (In Progress): jewel: MDS assert failed when shutting down
Nathan Cutler
01:15 PM Backport #19668 (In Progress): jewel: MDS goes readonly writing backtrace for a file whose data p...
Nathan Cutler
01:04 PM Bug #18872 (Fix Under Review): write to cephfs mount hangs, ceph-fuse and kernel
Turns out this is an issue of ceph leaking arch-dependent flags on the wire. See kernel ml [PATCH] ceph: Fix file ope... Jan Fajerski
12:13 PM Backport #19710 (Resolved): kraken: Enable MDS to start when session ino info is corrupt
https://github.com/ceph/ceph/pull/16107 Nathan Cutler
12:08 PM Backport #19666 (In Progress): jewel: fs:The mount point break off when mds switch hanppened.
Nathan Cutler
11:42 AM Backport #19665 (In Progress): jewel: C_MDSInternalNoop::complete doesn't free itself
Nathan Cutler
11:37 AM Backport #19619 (In Progress): jewel: MDS server crashes due to inconsistent metadata.
Nathan Cutler
11:34 AM Backport #19482 (In Progress): jewel: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" ...
Nathan Cutler
10:57 AM Backport #19334 (In Progress): jewel: MDS heartbeat timeout during rejoin, when working with larg...
Nathan Cutler
10:55 AM Backport #19044 (In Progress): jewel: buffer overflow in test LibCephFS.DirLs
Nathan Cutler
10:54 AM Backport #18949 (In Progress): jewel: mds/StrayManager: avoid reusing deleted inode in StrayManag...
Nathan Cutler
10:49 AM Backport #18900 (In Progress): jewel: Test failure: test_open_inode
Nathan Cutler
10:47 AM Backport #18705 (In Progress): jewel: fragment space check can cause replayed request fail
Nathan Cutler
10:44 AM Backport #18699 (In Progress): jewel: client: fix the cross-quota rename boundary check conditions
Nathan Cutler
10:22 AM Bug #16842: mds: replacement MDS crashes on InoTable release
Created ticket for the workaround and marked it for backport here: http://tracker.ceph.com/issues/19708
I should h...
John Spray
10:21 AM Fix #19708 (Resolved): Enable MDS to start when session ino info is corrupt

This was a mitigation for issue #16842, which is itself a mystery.
Creating ticket to backport it.
Fix on mas...
John Spray
09:03 AM Bug #18816 (Fix Under Review): MDS crashes with log disabled
I'm proposing that we rip out this configuration option; it's a trap for the unwary:
https://github.com/ceph/ceph/pu...
John Spray
08:30 AM Bug #19706 (Can't reproduce): Laggy mon daemons causing MDS failover (symptom: failed to set coun...
http://qa-proxy.ceph.com/teuthology/teuthology-2017-04-15_03:15:10-fs-master---basic-smithi/1027137/ Zheng Yan

04/19/2017

10:24 PM Feature #17834 (Fix Under Review): MDS Balancer overrides
https://github.com/ceph/ceph/pull/14598 Patrick Donnelly
04:02 PM Bug #15467 (Won't Fix): After "mount -l", ceph-fuse does not work
This appears to have happened on pre-jewel code, so it's unlikely anyone is interested in investigating. John Spray
10:11 AM Fix #19691 (Fix Under Review): Remove journaler_allow_split_entries option
https://github.com/ceph/ceph/pull/14636 John Spray
10:06 AM Fix #19691 (Resolved): Remove journaler_allow_split_entries option
This has been broken in practice since at least when the MDS journal format (JournalStream etc) was changed, as the r... John Spray

04/18/2017

08:38 PM Feature #17980 (Fix Under Review): MDS should reject connections from OSD-blacklisted clients
https://github.com/ceph/ceph/pull/14610 John Spray
08:38 PM Feature #9754 (Fix Under Review): A 'fence and evict' client eviction command
See #17980 patch John Spray
07:39 PM Backport #19680 (Resolved): kraken: MDS: damage reporting by ino number is useless
https://github.com/ceph/ceph/pull/16106 Nathan Cutler
07:39 PM Backport #19679 (Resolved): jewel: MDS: damage reporting by ino number is useless
https://github.com/ceph/ceph/pull/14699 Nathan Cutler
07:38 PM Backport #19678 (Resolved): kraken: Jewel ceph-fuse does not recover after lost connection to MDS
https://github.com/ceph/ceph/pull/16105 Nathan Cutler
07:38 PM Backport #19677 (Resolved): jewel: Jewel ceph-fuse does not recover after lost connection to MDS
https://github.com/ceph/ceph/pull/14698 Nathan Cutler
07:38 PM Backport #19676 (Resolved): kraken: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_v...
https://github.com/ceph/ceph/pull/16104 Nathan Cutler
07:38 PM Backport #19675 (Resolved): jewel: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_vo...
https://github.com/ceph/ceph/pull/14685 Nathan Cutler
07:38 PM Backport #19674 (Resolved): kraken: cephfs: mds is crushed, after I set about 400 64KB xattr kv p...
https://github.com/ceph/ceph/pull/16103 Nathan Cutler
07:38 PM Backport #19673 (Resolved): jewel: cephfs: mds is crushed, after I set about 400 64KB xattr kv pa...
https://github.com/ceph/ceph/pull/14684 Nathan Cutler
07:37 PM Backport #19672 (Resolved): kraken: MDS assert failed when shutting down
https://github.com/ceph/ceph/pull/16102 Nathan Cutler
07:37 PM Backport #19671 (Resolved): jewel: MDS assert failed when shutting down
https://github.com/ceph/ceph/pull/14683 Nathan Cutler
07:37 PM Backport #19669 (Resolved): kraken: MDS goes readonly writing backtrace for a file whose data poo...
https://github.com/ceph/ceph/pull/16101 Nathan Cutler
07:37 PM Backport #19668 (Resolved): jewel: MDS goes readonly writing backtrace for a file whose data pool...
https://github.com/ceph/ceph/pull/14682 Nathan Cutler
07:37 PM Backport #19667 (Resolved): kraken: fs:The mount point break off when mds switch hanppened.
https://github.com/ceph/ceph/pull/16100 Nathan Cutler
07:37 PM Backport #19666 (Resolved): jewel: fs:The mount point break off when mds switch hanppened.
https://github.com/ceph/ceph/pull/14679 Nathan Cutler
07:37 PM Backport #19665 (Resolved): jewel: C_MDSInternalNoop::complete doesn't free itself
https://github.com/ceph/ceph/pull/14677 Nathan Cutler
07:37 PM Backport #19664 (Resolved): kraken: C_MDSInternalNoop::complete doesn't free itself
https://github.com/ceph/ceph/pull/16099 Nathan Cutler
04:20 PM Bug #16842: mds: replacement MDS crashes on InoTable release
backport? c sights
01:29 PM Feature #10792 (New): qa: enable thrasher for MDS cluster size (vary max_mds)
The thrasher exists; this ticket is now for switching it on and getting the resulting runs green. John Spray
01:28 PM Feature #15068 (Resolved): fsck: multifs: enable repair tools to read from one filesystem and wri...
John Spray
01:28 PM Feature #15069 (Resolved): MDS: multifs: enable two filesystems to point to same pools if one of ...
John Spray
01:28 PM Fix #15134 (New): multifs: test case exercising mds_thrash for multiple filesystems
(tweaking ticket to be for creating a test case that uses the new/smarter thrashing code in a multi-fs way) John Spray
01:26 PM Bug #18579 (Resolved): Fuse client has "opening" session to nonexistent MDS rank after MDS cluste...
John Spray
01:26 PM Bug #18914 (Pending Backport): cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume...
John Spray
01:26 PM Feature #19075 (Resolved): Extend 'p' mds auth cap to cover quotas and all layout fields
John Spray
01:24 PM Bug #19501 (Pending Backport): C_MDSInternalNoop::complete doesn't free itself
John Spray
01:23 PM Bug #19566 (Resolved): MDS crash on mgr message during shutdown
John Spray
12:51 PM Bug #19640 (Resolved): ceph-fuse should only return writeback errors once per file handle
John Spray
11:49 AM Feature #18509 (Pending Backport): MDS: damage reporting by ino number is useless
John Spray

04/17/2017

02:08 PM Bug #19426: knfs blogbench hang
Sorry I didn't see this sooner. Is this still cropping up?
So what might be helpful the next time this happens loc...
Jeff Layton
01:17 PM Bug #19640 (Fix Under Review): ceph-fuse should only return writeback errors once per file handle
https://github.com/ceph/ceph/pull/14589 John Spray
01:16 PM Bug #19640 (Resolved): ceph-fuse should only return writeback errors once per file handle
Currently if someone fsyncs and sees the error, we give them the same error again on fclose.
We should match the n...
John Spray
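
(For context, a minimal sketch of the application-side pattern this affects; the path below is a placeholder and the code is ordinary POSIX, not ceph-fuse internals. The question is whether an error already returned from fsync() should be returned a second time from close() on the same file handle.)

    /* Minimal sketch: where an application sees delayed writeback errors. */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/cephfs/data.log", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, "hello\n", 6) != 6)
            perror("write");

        /* Writeback errors (e.g. EIO) are typically surfaced here... */
        if (fsync(fd) < 0)
            fprintf(stderr, "fsync: %s\n", strerror(errno));

        /* ...and this ticket is about whether the same, already-reported
         * error should be returned again when the file handle is closed. */
        if (close(fd) < 0)
            fprintf(stderr, "close: %s\n", strerror(errno));

        return 0;
    }
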
11:13 AM Bug #19635 (In Progress): Deadlock on two ceph-fuse clients accessing the same file
This bug happens in the following sequence of events
- Request1 (from client1) creates file1 (mds issues caps Asx to cl...
Zheng Yan
12:15 AM Bug #19635: Deadlock on two ceph-fuse clients accessing the same file
Those requests are getting hung up on the iauth and ixattr locks on the inode for the ".syn" file the test script cre... John Spray

04/16/2017

11:02 PM Bug #19635: Deadlock on two ceph-fuse clients accessing the same file
I was wondering if d463107473 ("mds: finish lock waiters in the same order that they were added.") could have been th... John Spray
09:48 PM Bug #19635 (Resolved): Deadlock on two ceph-fuse clients accessing the same file
See Dan's reproducer script, and thread "[ceph-users] fsping, why you no work no mo?"
https://raw.githubusercontent....
John Spray
01:19 PM Support #16738 (Closed): mount.ceph: unknown mount options: rbytes and norbytes
John Spray

04/15/2017

06:47 PM Bug #19437 (Pending Backport): fs:The mount point break off when mds switch hanppened.
John Spray
06:46 PM Bug #19033 (Pending Backport): cephfs: mds is crushed, after I set about 400 64KB xattr kv pairs ...
John Spray
06:45 PM Bug #18757 (Pending Backport): Jewel ceph-fuse does not recover after lost connection to MDS
Let's backport this for the benefit of people running cephfs today John Spray
06:41 PM Bug #19401 (Pending Backport): MDS goes readonly writing backtrace for a file whose data pool has...
John Spray
03:21 PM Support #16738: mount.ceph: unknown mount options: rbytes and norbytes
This issue can be closed. I have not experienced this issue anymore and the original system the issue was found on is n... Alexander Trost
11:15 AM Bug #19022 (Resolved): Crash in Client::queue_cap_snap when thrashing
John Spray

04/14/2017

10:00 PM Backport #19620 (In Progress): kraken: MDS server crashes due to inconsistent metadata.
Nathan Cutler
09:59 PM Bug #19406: MDS server crashes due to inconsistent metadata.
*master PR*: https://github.com/ceph/ceph/pull/14234 Nathan Cutler
09:57 PM Backport #19483 (In Progress): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it"...
Nathan Cutler
09:55 PM Backport #19335 (In Progress): kraken: MDS heartbeat timeout during rejoin, when working with lar...
Nathan Cutler
09:54 PM Backport #19045 (In Progress): kraken: buffer overflow in test LibCephFS.DirLs
Nathan Cutler
09:52 PM Backport #18950 (In Progress): kraken: mds/StrayManager: avoid reusing deleted inode in StrayMana...
Nathan Cutler
09:49 PM Backport #18899 (In Progress): kraken: Test failure: test_open_inode
Nathan Cutler
09:48 PM Backport #18706 (In Progress): kraken: fragment space check can cause replayed request fail
Nathan Cutler
09:46 PM Backport #18700 (In Progress): kraken: client: fix the cross-quota rename boundary check conditions
Nathan Cutler
09:44 PM Backport #18616 (In Progress): kraken: segfault in handle_client_caps
Nathan Cutler
09:43 PM Backport #18566 (In Progress): kraken: MDS crashes on missing metadata object
Nathan Cutler
09:42 PM Backport #18562 (In Progress): kraken: Test Failure: kcephfs test_client_recovery.TestClientRecovery
Nathan Cutler
09:40 PM Backport #18552 (In Progress): kraken: ceph-fuse crash during snapshot tests
Nathan Cutler
09:39 PM Bug #18166 (Resolved): monitor cannot start because of "FAILED assert(info.state == MDSMap::STATE...
Nathan Cutler
09:39 PM Bug #18166: monitor cannot start because of "FAILED assert(info.state == MDSMap::STATE_STANDBY)"
kraken backport is unnecessary (fix already in v11.2.0) Nathan Cutler
09:39 PM Backport #18283 (Resolved): kraken: monitor cannot start because of "FAILED assert(info.state == ...
Already included in v11.2.0... Nathan Cutler
08:07 PM Support #16738: mount.ceph: unknown mount options: rbytes and norbytes
Ceph: v10.2.7
Linux Kernel: 4.9.21-040921-generic on Ubuntu 16.04.1
The "rbytes" option in fstab seems to be work...
Fred Drake
01:42 PM Bug #16886 (Can't reproduce): multimds: kclient hang (?) in tests
Zheng Yan
12:50 PM Bug #19630 (Fix Under Review): StrayManager::num_stray is inaccurate
https://github.com/ceph/ceph/pull/14554 Zheng Yan
09:26 AM Bug #19630 (Resolved): StrayManager::num_stray is inaccurate
Zheng Yan
09:50 AM Bug #19204 (Pending Backport): MDS assert failed when shutting down
John Spray
09:48 AM Feature #19551 (Resolved): CephFS MDS health messages should be logged in the cluster log
John Spray
08:28 AM Bug #18680 (Fix Under Review): multimds: cluster can assign active mds beyond max_mds during fail...
Zheng Yan
08:27 AM Bug #18680: multimds: cluster can assign active mds beyond max_mds during failures
commit "mon/MDSMonitor: only allow deactivating the mds with max rank" in https://github.com/ceph/ceph/pull/14550 sho... Zheng Yan
07:59 AM Bug #18755: multimds: MDCache.cc: 4735: FAILED assert(in)
Fixed by commit a1499bc4 (mds: stop purging strays when mds is being shutdown) Zheng Yan
07:56 AM Bug #18755 (Resolved): multimds: MDCache.cc: 4735: FAILED assert(in)
Zheng Yan
07:58 AM Bug #18754 (Resolved): multimds: MDCache.cc: 8569: FAILED assert(!info.ancestors.empty())
Fixed by commit 20d43372 (mds: drop superfluous MMDSOpenInoReply) Zheng Yan
07:55 AM Bug #19239 (Fix Under Review): mds: stray count remains static after workflows complete
Should be fixed by commits in https://github.com/ceph/ceph/pull/14550... Zheng Yan

04/13/2017

04:23 PM Bug #19395 (Fix Under Review): "Too many inodes in cache" warning can happen even when trimming i...
The bad state is long gone, so I'm just going to change this ticket to fixing the weird case where we were getting a ... John Spray
03:39 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Yeah, you're right Darrell, operator error :) I guess the VM did not have the correct storage network attached when I tried t... elder one
02:22 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
FYI, I don't think the module signature stuff is an issue. It's just notifying you that you've loaded a module that d... Darrell Enns
12:23 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
One difference I noticed between 4.4 and 4.9 kernels
- with 4.4 kernel on cephfs directory sizes (total bytes of al...
elder one
08:04 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Yeah, did that.
My test server with a compiled 4.9 kernel and patched ceph module mounted cephfs just fine.
It might...
elder one
02:15 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Well, managed to patch ceph kernel module with 2b1ac852 commit, but my Ubuntu 4.9 kernel will not...
Zheng Yan
03:02 PM Bug #19566 (Fix Under Review): MDS crash on mgr message during shutdown
https://github.com/ceph/ceph/pull/14505 John Spray
02:24 PM Bug #19566 (In Progress): MDS crash on mgr message during shutdown
John Spray
02:31 PM Backport #19620 (Resolved): kraken: MDS server crashes due to inconsistent metadata.
https://github.com/ceph/ceph/pull/14574 Nathan Cutler
02:31 PM Backport #19619 (Resolved): jewel: MDS server crashes due to inconsistent metadata.
https://github.com/ceph/ceph/pull/14676 Nathan Cutler
12:03 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
Patrick was going to spin up a patch for this, so reassigning to him. Patrick, if I have that wrong, then please just... Jeff Layton
11:07 AM Bug #19406 (Pending Backport): MDS server crashes due to inconsistent metadata.
Marking as pending backport for the fix to data-scan which seems likely to be the underlying cause https://github.com... John Spray
08:13 AM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Well, currently the last MDS fails over to the old balancer, so it can in fact shift its load back to the others acco... Dan van der Ster

04/12/2017

10:27 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Well, I managed to patch the ceph kernel module with commit 2b1ac852, but my Ubuntu 4.9 kernel will not load unsigned modul... elder one
01:36 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Could you try adding commit 2b1ac852 (ceph: try getting buffer capability for readahead/fadvise) to your 4.9.x kernel... Zheng Yan
07:36 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
I do have mmap disabled in Dovecot conf.
Relevant bits from Dovecot conf:
mmap_disable = yes
mail_nfs_index = ...
elder one
02:17 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Got another error with 4.9.21 kernel.
>
> On the cephfs node /sys/kernel/debug/ceph/xxx/:
>
...
Zheng Yan
08:29 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
We could make the greedyspill.lua balancer check to see if it is the last MDS. Then just return instead of failing. I... Michael Sevilla
03:17 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
And the CPU use on MDS0 stays at +/- 250% Mark Guz
03:16 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
This error shouldn't be an expected occurrence. I'll create a fix for this. Patrick Donnelly
03:15 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
I see this in the logs... Mark Guz
03:12 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Nope. Is your load on mds.0? If yes, and it gets heavily loaded, and if mds.1 has load = 0, then I expect the balan... Dan van der Ster
03:09 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Did you modify the greedyspill.lua script at all? Mark Guz
02:51 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
BTW, you need to set debug_mds_balancer = 2 to see the balance working. Dan van der Ster
02:49 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Yes. For example, when I have 50 clients untarring the linux kernel into unique directories, the load is moved around. Dan van der Ster
02:39 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Dan, do you see any evidence of actual load balancing? Mark Guz
02:39 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
I also see this error. I have 2 active/active MDSes. The first shows no errors; the second shows the errors above. No... Mark Guz
11:56 AM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Ahh, it's even documented:... Dan van der Ster
11:55 AM Bug #19589 (Resolved): greedyspill.lua: :18: attempt to index a nil value (field '?')
The included greedyspill.lua doesn't seem to work in a simple 3-active MDS scenario.... Dan van der Ster
01:58 PM Bug #19593 (Resolved): purge queue and standby replay mds
Current code opens the purge queue when the mds starts standby replay. The purge queue can have changed when the standby repl... Zheng Yan
12:46 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
John Spray wrote:
> Intuitively I would say that things we expose as xattrs should probably bump ctime when they're ...
Jeff Layton
12:30 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
Intuitively I would say that things we expose as xattrs should probably bump ctime when they're set, if we're being c... John Spray
11:54 AM Bug #19388 (Closed): mount.ceph does not accept -s option
John Spray
11:43 AM Bug #18578 (Resolved): failed filelock.can_read(-1) assertion in Server::_dir_is_nonempty
Nathan Cutler
11:43 AM Backport #18707 (Resolved): kraken: failed filelock.can_read(-1) assertion in Server::_dir_is_non...
Nathan Cutler
10:46 AM Bug #18461 (Resolved): failed to reconnect caps during snapshot tests
Nathan Cutler
10:46 AM Backport #18678 (Resolved): kraken: failed to reconnect caps during snapshot tests
Nathan Cutler
10:24 AM Bug #19205 (Resolved): Invalid error code returned by MDS is causing a kernel client WARNING
Nathan Cutler
10:24 AM Backport #19206 (Resolved): jewel: Invalid error code returned by MDS is causing a kernel client ...
Nathan Cutler

04/11/2017

09:03 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Dumped also mds cache (healhty cluster state 8 hours later)
Searched inode in mds error log:...
elder one
01:54 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Got another error with 4.9.21 kernel.
On the cephfs node /sys/kernel/debug/ceph/xxx/:...
elder one
08:16 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
No, I mean the ctime. You only want to update the mtime if you're changing the directory's contents. I would consider... Jeff Layton
07:55 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
> Should we be updating the ctime and change_attr in the ceph.dir.layout case as well?
Did you mean mtime?
I th...
Patrick Donnelly
07:32 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
I think you're right -- well spotted! In particular, we need to bump it in the case where we update the ctime (ceph.f... Jeff Layton
07:21 PM Bug #19583 (Resolved): mds: change_attr not inc in Server::handle_set_vxattr
Noticed this was missing; was this intentional? Patrick Donnelly
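
(As an illustration of the behaviour under discussion: on local Linux filesystems, setting an xattr bumps the inode ctime, which is the semantic being proposed for CephFS vxattrs handled in Server::handle_set_vxattr. The path and xattr name below are placeholders, and this is ordinary userspace code, not MDS code.)

    /* Minimal sketch: observe the ctime bump that accompanies an xattr set. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/xattr.h>

    int main(void)
    {
        const char *path = "/mnt/cephfs/dir";
        struct stat before, after;

        if (stat(path, &before) < 0) { perror("stat"); return 1; }
        if (setxattr(path, "user.example", "1", 1, 0) < 0)
            perror("setxattr");
        if (stat(path, &after) < 0) { perror("stat"); return 1; }

        /* On a filesystem with the expected semantics, ctime moves forward. */
        printf("ctime before=%ld after=%ld\n",
               (long)before.st_ctime, (long)after.st_ctime);
        return 0;
    }
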
07:22 PM Bug #19388: mount.ceph does not accept -s option
Looking at the complete picture, I think it's easiest (and acceptable for me) to wait for Jessie's successor, Stretch... Michel Roelofs
06:53 AM Feature #19578: mds: optimize CDir::_omap_commit() and CDir::_committed() for large directory
We can track dirty dentries in a dirty list; CDir::_omap_commit() and CDir::_committed() then only need to check dentries... Zheng Yan
06:51 AM Feature #19578 (Resolved): mds: optimize CDir::_omap_commit() and CDir::_committed() for large di...
CDir::_omap_commit() and CDir::_committed() need to traverse the whole dirfrag to find dirty dentries. It's not efficienc... Zheng Yan
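
(Illustration only: the real code is C++ inside the MDS, and the structures below are hypothetical stand-ins, but the proposed optimization is the usual dirty-list pattern, where commit walks a list of dirty entries instead of scanning the whole fragment.)

    /* Illustrative sketch of the dirty-list idea; types and names are
     * made-up stand-ins, not the actual CDir/CDentry implementation. */
    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_DENTRIES 8

    struct dentry {
        const char *name;
        bool dirty;
        struct dentry *dirty_next;   /* link in the per-dirfrag dirty list */
    };

    struct dirfrag {
        struct dentry entries[NUM_DENTRIES];   /* the whole fragment */
        struct dentry *dirty_head;             /* only the dirty ones */
    };

    static void mark_dirty(struct dirfrag *df, struct dentry *dn)
    {
        if (!dn->dirty) {
            dn->dirty = true;
            dn->dirty_next = df->dirty_head;   /* O(1) insert */
            df->dirty_head = dn;
        }
    }

    /* Before: commit scans all entries looking for dirty ones.
     * After: commit walks only the dirty list, so cost scales with the
     * number of dirty dentries rather than with directory size. */
    static void commit(struct dirfrag *df)
    {
        for (struct dentry *dn = df->dirty_head; dn; dn = dn->dirty_next) {
            printf("writing %s to omap\n", dn->name);
            dn->dirty = false;
        }
        df->dirty_head = NULL;
    }

    int main(void)
    {
        struct dirfrag df = {0};
        for (int i = 0; i < NUM_DENTRIES; i++)
            df.entries[i].name = "file";
        mark_dirty(&df, &df.entries[3]);
        mark_dirty(&df, &df.entries[5]);
        commit(&df);   /* visits 2 entries, not 8 */
        return 0;
    }
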
03:38 AM Bug #19450 (Fix Under Review): PurgeQueue read journal crash
https://github.com/ceph/ceph/pull/14447 Zheng Yan
02:18 AM Bug #19450: PurgeQueue read journal crash
The mds always first reads all entries in the mds log, then writes new entries to it. The partial entry is detected and dropped... Zheng Yan

04/10/2017

10:58 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Some weird issues arose with my applications (unrelated to cephfs) with the 4.10.x kernel.
The same fix is in 4.9.x k...
elder one
02:19 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Upgraded cephfs clients to 4.10.9 kernel.
elder one
12:08 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Can you try updating the cephfs client to a 4.10.x kernel? Zheng Yan
09:59 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Error happened again:... elder one
11:48 AM Bug #19566 (Resolved): MDS crash on mgr message during shutdown
... John Spray

04/09/2017

05:30 PM Bug #19388: mount.ceph does not accept -s option
Fair point about options getting passed through.
The thing I'm not sure about here is who is going to backport a f...
John Spray
01:16 PM Bug #19388: mount.ceph does not accept -s option
mount.ceph passes options which it does not recognize directly to the kernel mount function; therefore a full impleme... Michel Roelofs
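
(For context, a minimal sketch of the kernel mount call that eventually receives the assembled option string; the monitor address, mount point and credentials are placeholders. Options the helper does not consume end up in this string for the kernel ceph client to parse or reject.)

    /* Minimal sketch: handing an option string to the kernel ceph filesystem.
     * Monitor address, target directory and key are placeholders; run as root. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        const char *src  = "192.168.0.1:6789:/";   /* mon address and subtree */
        const char *tgt  = "/mnt/cephfs";
        /* Options mount.ceph does not handle itself are passed through here. */
        const char *opts = "name=admin,secret=<base64-key>,rbytes";

        if (mount(src, tgt, "ceph", 0, opts) < 0) {
            perror("mount");
            return 1;
        }
        printf("mounted %s on %s\n", src, tgt);
        return 0;
    }
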
12:45 PM Bug #19450: PurgeQueue read journal crash
Hit this again... Zheng Yan

04/08/2017

10:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
The master install-deps.sh installed a flood of other stuff that the earlier one didn't. But I still get fatal errors when runni... Christoffer Lilja
09:52 AM Bug #19406: MDS server crashes due to inconsistent metadata.
I'll download the latest master instead and try. Christoffer Lilja
09:41 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Okay, I followed the guide on the Ceph site on how to prepare and compile Ceph.
I already ran install_deps.sh in or...
Christoffer Lilja

04/07/2017

03:04 PM Feature #19551 (Fix Under Review): CephFS MDS health messages should be logged in the cluster log

This was inspired by a ceph.log from CERN's testing, in which we could see mysterious-looking fsmap epoch bump mess...
John Spray
03:01 PM Feature #19551 (Resolved): CephFS MDS health messages should be logged in the cluster log
John Spray
08:37 AM Bug #19239 (In Progress): mds: stray count remains static after workflows complete
Zheng Yan
07:26 AM Bug #18850 (Rejected): Leak in MDCache::handle_dentry_unlink
Zheng Yan
05:53 AM Bug #19445 (Resolved): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
PR #14308 is merged. Jos Collin

04/06/2017

01:30 PM Bug #18850: Leak in MDCache::handle_dentry_unlink
It's likely some CInode/CDir/CDentry objects in cache are not properly freed when the mds process exits; not caused by MDCache::h... Zheng Yan
01:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Christoffer Lilja wrote:
> Hi,
>
> I have little spare time to spend on this and can't get past this compile erro...
Zheng Yan

04/05/2017

07:54 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Hi,
I have little spare time to spend on this and can't get past this compile error due to my very limited knowled...
Christoffer Lilja
02:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Christoffer Lilja wrote:
> I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got ...
Zheng Yan
05:10 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Thanks for the link Zheng. Looks like that fix is in mainline as of 4.10.2. When I have a chance, I'll try upgrading ... Darrell Enns
09:13 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~k...
Zheng Yan
08:24 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v... elder one
07:39 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Hit the same bug today with 4.4.59 kernel client.
> 2 MDS servers (1 standby) 10.2.6-1 on Ubuntu...
Zheng Yan
07:36 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Darrell Enns wrote:
> Zheng Yan wrote:
> > probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a...
Zheng Yan
01:33 PM Bug #19501 (Fix Under Review): C_MDSInternalNoop::complete doesn't free itself
Zheng Yan
01:33 PM Bug #19501: C_MDSInternalNoop::complete doesn't free itself
https://github.com/ceph/ceph/pull/14347 Zheng Yan
01:17 PM Bug #19501 (Resolved): C_MDSInternalNoop::complete doesn't free itself
This causes a memory leak. Zheng Yan
09:20 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
geng jichao wrote:
> I have a question, if the file struct is destroyed,how to ensure that cache_ctl.index is correc...
Zheng Yan
05:52 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
I have a question: if the file struct is destroyed, how do we ensure that cache_ctl.index is correct?
In other words, req...
geng jichao
01:32 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
kernel patch https://github.com/ceph/ceph-client/commit/b7e2eee12aa174bc91279a7cee85e9ea73092bad Zheng Yan
06:01 AM Bug #19438: ceph mds error "No space left on device"
ceph mds error "No space left on device": I have encountered this problem in version 10.2.6.
Need to add a lin...
berlin sun
04:12 AM Bug #19450: PurgeQueue read journal crash
... Zheng Yan

04/04/2017

09:09 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Very interesting! That means we have:
4.4.59 - bug
4.7.5 - no bug (or at least rare enough that I'm not seeing it...
Darrell Enns
06:03 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty.
From mds l...
elder one
09:05 PM Bug #19388: mount.ceph does not accept -s option
The kernel client has a mount helper in src/mount/mount.ceph.c -- although that only comes into play if the ceph pack... John Spray
01:50 PM Bug #19306 (Fix Under Review): fs: mount NFS to cephfs, and then ls a directory containing a larg...
https://github.com/ceph/ceph/pull/14317 Zheng Yan
01:18 PM Bug #19456: Hadoop suite failing on missing upstream tarball
I see the hadoop task is in ceph/teuthology - would it make sense to move it to ceph/ceph? Nathan Cutler
12:44 PM Backport #19483 (Resolved): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" co...
https://github.com/ceph/ceph/pull/14573 Nathan Cutler
12:44 PM Backport #19482 (Resolved): jewel: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" com...
https://github.com/ceph/ceph/pull/14674 Nathan Cutler
12:42 PM Backport #19466 (Resolved): jewel: mds: log rotation doesn't work if mds has respawned
https://github.com/ceph/ceph/pull/14673 Nathan Cutler
01:03 AM Bug #19445 (In Progress): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
https://github.com/ceph/ceph/pull/14308 Brad Hubbard

04/03/2017

09:29 PM Bug #19456: Hadoop suite failing on missing upstream tarball
Looks like v2.5.2 is too old. It is now in the Apache archive: https://archive.apache.org/dist/hadoop/core/hadoop-2.5... Ken Dreyer
08:20 PM Bug #19456 (Rejected): Hadoop suite failing on missing upstream tarball
This is the jewel v10.2.7 point release candidate.
Also true on all branches.
Run: http://pulpito.ceph.com/yuriw-2017-0...
Yuri Weinstein
08:15 PM Bug #19388: mount.ceph does not accept -s option
Looking at it, it seems I'll have to update the Linux kernel as well to support the sloppy option. Doing so in a sane... Michel Roelofs
12:53 PM Bug #19306 (In Progress): fs: mount NFS to cephfs, and then ls a directory containing a large num...
Zheng Yan
12:44 PM Bug #19450 (Resolved): PurgeQueue read journal crash
... Zheng Yan
11:16 AM Bug #19437 (Fix Under Review): fs:The mount point break off when mds switch hanppened.
https://github.com/ceph/ceph/pull/14267 John Spray
 
