Activity

From 03/22/2017 to 04/20/2017

04/20/2017

09:54 PM Backport #19709 (In Progress): jewel: Enable MDS to start when session ino info is corrupt
Nathan Cutler
12:13 PM Backport #19709 (Resolved): jewel: Enable MDS to start when session ino info is corrupt
https://github.com/ceph/ceph/pull/14700 Nathan Cutler
09:49 PM Backport #19679 (In Progress): jewel: MDS: damage reporting by ino number is useless
Nathan Cutler
09:29 PM Backport #19677 (In Progress): jewel: Jewel ceph-fuse does not recover after lost connection to MDS
Nathan Cutler
05:55 PM Backport #19466 (Need More Info): jewel: mds: log rotation doesn't work if mds has respawned
Needs 3ba63063 and 1fb15a21.
Of these two, 3ba63063 is non-trivial.
Nathan Cutler
11:32 AM Backport #19466 (In Progress): jewel: mds: log rotation doesn't work if mds has respawned
Nathan Cutler
05:06 PM Backport #19620 (Resolved): kraken: MDS server crashes due to inconsistent metadata.
Nathan Cutler
05:06 PM Backport #19483 (Resolved): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" co...
Nathan Cutler
05:05 PM Backport #19335 (Resolved): kraken: MDS heartbeat timeout during rejoin, when working with large ...
Nathan Cutler
05:04 PM Backport #19045 (Resolved): kraken: buffer overflow in test LibCephFS.DirLs
Nathan Cutler
05:03 PM Backport #18950 (Resolved): kraken: mds/StrayManager: avoid reusing deleted inode in StrayManager...
Nathan Cutler
05:02 PM Backport #18899 (Resolved): kraken: Test failure: test_open_inode
Nathan Cutler
05:01 PM Backport #18706 (Resolved): kraken: fragment space check can cause replayed request fail
Nathan Cutler
04:59 PM Backport #18700 (Resolved): kraken: client: fix the cross-quota rename boundary check conditions
Nathan Cutler
04:58 PM Bug #18306 (Resolved): segfault in handle_client_caps
Nathan Cutler
04:58 PM Backport #18616 (Resolved): kraken: segfault in handle_client_caps
Nathan Cutler
04:57 PM Bug #18179 (Resolved): MDS crashes on missing metadata object
Nathan Cutler
04:57 PM Backport #18566 (Resolved): kraken: MDS crashes on missing metadata object
Nathan Cutler
04:56 PM Bug #18396 (Resolved): Test Failure: kcephfs test_client_recovery.TestClientRecovery
Nathan Cutler
07:57 AM Bug #18396: Test Failure: kcephfs test_client_recovery.TestClientRecovery
http://qa-proxy.ceph.com/teuthology/teuthology-2017-04-13_05:20:02-kcephfs-kraken-testing-basic-smithi/1019312/ Zheng Yan
04:56 PM Backport #18562 (Resolved): kraken: Test Failure: kcephfs test_client_recovery.TestClientRecovery
Nathan Cutler
04:55 PM Bug #18460 (Resolved): ceph-fuse crash during snapshot tests
Nathan Cutler
04:55 PM Backport #18552 (Resolved): kraken: ceph-fuse crash during snapshot tests
Nathan Cutler
02:53 PM Bug #19707 (Duplicate): Hadoop tests fail due to missing upstream tarball
Indeed it is -- I hadn't seen that other ticket because it was in the wrong project. John Spray
02:04 PM Bug #19707: Hadoop tests fail due to missing upstream tarball
Dup of #19456? Ken Dreyer
09:57 AM Bug #19707 (Duplicate): Hadoop tests fail due to missing upstream tarball
http://pulpito.ceph.com/teuthology-2017-04-03_03:45:03-hadoop-master---basic-mira/... John Spray
01:49 PM Bug #19712 (New): some kcephfs tests become very slow
http://qa-proxy.ceph.com/teuthology/teuthology-2017-04-16_04:20:02-kcephfs-jewel-testing-basic-smithi/
http://pulpit...
Zheng Yan
01:40 PM Backport #19675 (In Progress): jewel: cephfs: Test failure: test_data_isolated (tasks.cephfs.test...
Nathan Cutler
01:24 PM Backport #19673 (In Progress): jewel: cephfs: mds crashed after I set about 400 64KB xattr kv...
Nathan Cutler
01:22 PM Backport #19671 (In Progress): jewel: MDS assert failed when shutting down
Nathan Cutler
01:15 PM Backport #19668 (In Progress): jewel: MDS goes readonly writing backtrace for a file whose data p...
Nathan Cutler
01:04 PM Bug #18872 (Fix Under Review): write to cephfs mount hangs, ceph-fuse and kernel
Turns out this is an issue of ceph leaking arch-dependent flags on the wire. See kernel ml [PATCH] ceph: Fix file ope... Jan Fajerski
12:13 PM Backport #19710 (Resolved): kraken: Enable MDS to start when session ino info is corrupt
https://github.com/ceph/ceph/pull/16107 Nathan Cutler
12:08 PM Backport #19666 (In Progress): jewel: fs: The mount point breaks off when an mds switch happens.
Nathan Cutler
11:42 AM Backport #19665 (In Progress): jewel: C_MDSInternalNoop::complete doesn't free itself
Nathan Cutler
11:37 AM Backport #19619 (In Progress): jewel: MDS server crashes due to inconsistent metadata.
Nathan Cutler
11:34 AM Backport #19482 (In Progress): jewel: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" ...
Nathan Cutler
10:57 AM Backport #19334 (In Progress): jewel: MDS heartbeat timeout during rejoin, when working with larg...
Nathan Cutler
10:55 AM Backport #19044 (In Progress): jewel: buffer overflow in test LibCephFS.DirLs
Nathan Cutler
10:54 AM Backport #18949 (In Progress): jewel: mds/StrayManager: avoid reusing deleted inode in StrayManag...
Nathan Cutler
10:49 AM Backport #18900 (In Progress): jewel: Test failure: test_open_inode
Nathan Cutler
10:47 AM Backport #18705 (In Progress): jewel: fragment space check can cause replayed request fail
Nathan Cutler
10:44 AM Backport #18699 (In Progress): jewel: client: fix the cross-quota rename boundary check conditions
Nathan Cutler
10:22 AM Bug #16842: mds: replacement MDS crashes on InoTable release
Created ticket for the workaround and marked it for backport here: http://tracker.ceph.com/issues/19708
I should h...
John Spray
10:21 AM Fix #19708 (Resolved): Enable MDS to start when session ino info is corrupt

This was a mitigation for issue #16842, which is itself a mystery.
Creating ticket to backport it.
Fix on mas...
John Spray
09:03 AM Bug #18816 (Fix Under Review): MDS crashes with log disabled
I'm proposing that we rip out this configuration option; it's a trap for the unwary:
https://github.com/ceph/ceph/pu...
John Spray
08:30 AM Bug #19706 (Can't reproduce): Laggy mon daemons causing MDS failover (symptom: failed to set coun...
http://qa-proxy.ceph.com/teuthology/teuthology-2017-04-15_03:15:10-fs-master---basic-smithi/1027137/ Zheng Yan

04/19/2017

10:24 PM Feature #17834 (Fix Under Review): MDS Balancer overrides
https://github.com/ceph/ceph/pull/14598 Patrick Donnelly
04:02 PM Bug #15467 (Won't Fix): After "mount -l", ceph-fuse does not work
This appears to have happened on pre-jewel code, so it's unlikely anyone is interested in investigating. John Spray
10:11 AM Fix #19691 (Fix Under Review): Remove journaler_allow_split_entries option
https://github.com/ceph/ceph/pull/14636 John Spray
10:06 AM Fix #19691 (Resolved): Remove journaler_allow_split_entries option
This has been broken in practice since at least when the MDS journal format (JournalStream etc) was changed, as the r... John Spray

04/18/2017

08:38 PM Feature #17980 (Fix Under Review): MDS should reject connections from OSD-blacklisted clients
https://github.com/ceph/ceph/pull/14610 John Spray
08:38 PM Feature #9754 (Fix Under Review): A 'fence and evict' client eviction command
See #17980 patch John Spray
07:39 PM Backport #19680 (Resolved): kraken: MDS: damage reporting by ino number is useless
https://github.com/ceph/ceph/pull/16106 Nathan Cutler
07:39 PM Backport #19679 (Resolved): jewel: MDS: damage reporting by ino number is useless
https://github.com/ceph/ceph/pull/14699 Nathan Cutler
07:38 PM Backport #19678 (Resolved): kraken: Jewel ceph-fuse does not recover after lost connection to MDS
https://github.com/ceph/ceph/pull/16105 Nathan Cutler
07:38 PM Backport #19677 (Resolved): jewel: Jewel ceph-fuse does not recover after lost connection to MDS
https://github.com/ceph/ceph/pull/14698 Nathan Cutler
07:38 PM Backport #19676 (Resolved): kraken: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_v...
https://github.com/ceph/ceph/pull/16104 Nathan Cutler
07:38 PM Backport #19675 (Resolved): jewel: cephfs: Test failure: test_data_isolated (tasks.cephfs.test_vo...
https://github.com/ceph/ceph/pull/14685 Nathan Cutler
07:38 PM Backport #19674 (Resolved): kraken: cephfs: mds crashed after I set about 400 64KB xattr kv p...
https://github.com/ceph/ceph/pull/16103 Nathan Cutler
07:38 PM Backport #19673 (Resolved): jewel: cephfs: mds crashed after I set about 400 64KB xattr kv pa...
https://github.com/ceph/ceph/pull/14684 Nathan Cutler
07:37 PM Backport #19672 (Resolved): kraken: MDS assert failed when shutting down
https://github.com/ceph/ceph/pull/16102 Nathan Cutler
07:37 PM Backport #19671 (Resolved): jewel: MDS assert failed when shutting down
https://github.com/ceph/ceph/pull/14683 Nathan Cutler
07:37 PM Backport #19669 (Resolved): kraken: MDS goes readonly writing backtrace for a file whose data poo...
https://github.com/ceph/ceph/pull/16101 Nathan Cutler
07:37 PM Backport #19668 (Resolved): jewel: MDS goes readonly writing backtrace for a file whose data pool...
https://github.com/ceph/ceph/pull/14682 Nathan Cutler
07:37 PM Backport #19667 (Resolved): kraken: fs: The mount point breaks off when an mds switch happens.
https://github.com/ceph/ceph/pull/16100 Nathan Cutler
07:37 PM Backport #19666 (Resolved): jewel: fs: The mount point breaks off when an mds switch happens.
https://github.com/ceph/ceph/pull/14679 Nathan Cutler
07:37 PM Backport #19665 (Resolved): jewel: C_MDSInternalNoop::complete doesn't free itself
https://github.com/ceph/ceph/pull/14677 Nathan Cutler
07:37 PM Backport #19664 (Resolved): kraken: C_MDSInternalNoop::complete doesn't free itself
https://github.com/ceph/ceph/pull/16099 Nathan Cutler
04:20 PM Bug #16842: mds: replacement MDS crashes on InoTable release
backport? c sights
01:29 PM Feature #10792 (New): qa: enable thrasher for MDS cluster size (vary max_mds)
The thrasher exists, this ticket is now for switching it on and getting the resulting runs green. John Spray
01:28 PM Feature #15068 (Resolved): fsck: multifs: enable repair tools to read from one filesystem and wri...
John Spray
01:28 PM Feature #15069 (Resolved): MDS: multifs: enable two filesystems to point to same pools if one of ...
John Spray
01:28 PM Fix #15134 (New): multifs: test case exercising mds_thrash for multiple filesystems
(tweaking ticket to be for creating a test case that uses the new/smarter thrashing code in a multi-fs way) John Spray
01:26 PM Bug #18579 (Resolved): Fuse client has "opening" session to nonexistent MDS rank after MDS cluste...
John Spray
01:26 PM Bug #18914 (Pending Backport): cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume...
John Spray
01:26 PM Feature #19075 (Resolved): Extend 'p' mds auth cap to cover quotas and all layout fields
John Spray
01:24 PM Bug #19501 (Pending Backport): C_MDSInternalNoop::complete doesn't free itself
John Spray
01:23 PM Bug #19566 (Resolved): MDS crash on mgr message during shutdown
John Spray
12:51 PM Bug #19640 (Resolved): ceph-fuse should only return writeback errors once per file handle
John Spray
11:49 AM Feature #18509 (Pending Backport): MDS: damage reporting by ino number is useless
John Spray

04/17/2017

02:08 PM Bug #19426: knfs blogbench hang
Sorry I didn't see this sooner. Is this still cropping up?
So what might be helpful the next time this happens loc...
Jeff Layton
01:17 PM Bug #19640 (Fix Under Review): ceph-fuse should only return writeback errors once per file handle
https://github.com/ceph/ceph/pull/14589 John Spray
01:16 PM Bug #19640 (Resolved): ceph-fuse should only return writeback errors once per file handle
Currently if someone fsyncs and sees the error, we give them the same error again on fclose.
We should match the n...
John Spray
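As an aside, the intended semantics can be made concrete with a minimal, hedged C++ sketch; it is not the actual ceph-fuse Client code, and FileHandle/take_error()/do_fsync/do_fclose are hypothetical names. The idea is that the last writeback error is cached on the open handle and handed out only once, so whichever of fsync/close asks first gets it.

    // Sketch only: each open file handle caches the last async writeback error
    // and hands it out exactly once, so a caller who already saw the error on
    // fsync() does not get the same error again on close().
    struct FileHandle {            // hypothetical stand-in for the client's file handle
        int pending_error = 0;     // set by the writeback path, e.g. ENOSPC

        int take_error() {         // return the stored error once, then clear it
            int err = pending_error;
            pending_error = 0;
            return err;
        }
    };

    int do_fsync(FileHandle& fh)  { return fh.take_error(); } // first reporter wins
    int do_fclose(FileHandle& fh) { return fh.take_error(); } // sees 0 if fsync already reported it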
11:13 AM Bug #19635 (In Progress): Deadlock on two ceph-fuse clients accessing the same file
This bug happens in the following sequence of events:
- Request1 (from client1) creates file1 (mds issues caps Asx to cl...
Zheng Yan
12:15 AM Bug #19635: Deadlock on two ceph-fuse clients accessing the same file
Those requests are getting hung up on the iauth and ixattr locks on the inode for the ".syn" file the test script cre... John Spray

04/16/2017

11:02 PM Bug #19635: Deadlock on two ceph-fuse clients accessing the same file
I was wondering if d463107473 ("mds: finish lock waiters in the same order that they were added.") could have been th... John Spray
09:48 PM Bug #19635 (Resolved): Deadlock on two ceph-fuse clients accessing the same file
See Dan's reproducer script, and thread "[ceph-users] fsping, why you no work no mo?"
https://raw.githubusercontent....
John Spray
01:19 PM Support #16738 (Closed): mount.ceph: unknown mount options: rbytes and norbytes
John Spray

04/15/2017

06:47 PM Bug #19437 (Pending Backport): fs: The mount point breaks off when an mds switch happens.
John Spray
06:46 PM Bug #19033 (Pending Backport): cephfs: mds crashed after I set about 400 64KB xattr kv pairs ...
John Spray
06:45 PM Bug #18757 (Pending Backport): Jewel ceph-fuse does not recover after lost connection to MDS
Let's backport this for the benefit of people running cephfs today John Spray
06:41 PM Bug #19401 (Pending Backport): MDS goes readonly writing backtrace for a file whose data pool has...
John Spray
03:21 PM Support #16738: mount.ceph: unknown mount options: rbytes and norbytes
This issue can be closed. I have not experienced this issue anymore and the original systems the issue was found is n... Alexander Trost
11:15 AM Bug #19022 (Resolved): Crash in Client::queue_cap_snap when thrashing
John Spray

04/14/2017

10:00 PM Backport #19620 (In Progress): kraken: MDS server crashes due to inconsistent metadata.
Nathan Cutler
09:59 PM Bug #19406: MDS server crashes due to inconsistent metadata.
*master PR*: https://github.com/ceph/ceph/pull/14234 Nathan Cutler
09:57 PM Backport #19483 (In Progress): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it"...
Nathan Cutler
09:55 PM Backport #19335 (In Progress): kraken: MDS heartbeat timeout during rejoin, when working with lar...
Nathan Cutler
09:54 PM Backport #19045 (In Progress): kraken: buffer overflow in test LibCephFS.DirLs
Nathan Cutler
09:52 PM Backport #18950 (In Progress): kraken: mds/StrayManager: avoid reusing deleted inode in StrayMana...
Nathan Cutler
09:49 PM Backport #18899 (In Progress): kraken: Test failure: test_open_inode
Nathan Cutler
09:48 PM Backport #18706 (In Progress): kraken: fragment space check can cause replayed request fail
Nathan Cutler
09:46 PM Backport #18700 (In Progress): kraken: client: fix the cross-quota rename boundary check conditions
Nathan Cutler
09:44 PM Backport #18616 (In Progress): kraken: segfault in handle_client_caps
Nathan Cutler
09:43 PM Backport #18566 (In Progress): kraken: MDS crashes on missing metadata object
Nathan Cutler
09:42 PM Backport #18562 (In Progress): kraken: Test Failure: kcephfs test_client_recovery.TestClientRecovery
Nathan Cutler
09:40 PM Backport #18552 (In Progress): kraken: ceph-fuse crash during snapshot tests
Nathan Cutler
09:39 PM Bug #18166 (Resolved): monitor cannot start because of "FAILED assert(info.state == MDSMap::STATE...
Nathan Cutler
09:39 PM Bug #18166: monitor cannot start because of "FAILED assert(info.state == MDSMap::STATE_STANDBY)"
kraken backport is unnecessary (fix already in v11.2.0) Nathan Cutler
09:39 PM Backport #18283 (Resolved): kraken: monitor cannot start because of "FAILED assert(info.state == ...
Already included in v11.2.0... Nathan Cutler
08:07 PM Support #16738: mount.ceph: unknown mount options: rbytes and norbytes
Ceph: v10.2.7
Linux Kernel: 4.9.21-040921-generic on Ubuntu 16.04.1
The "rbytes" option in fstab seems to be work...
Fred Drake
01:42 PM Bug #16886 (Can't reproduce): multimds: kclient hang (?) in tests
Zheng Yan
12:50 PM Bug #19630 (Fix Under Review): StrayManager::num_stray is inaccurate
https://github.com/ceph/ceph/pull/14554 Zheng Yan
09:26 AM Bug #19630 (Resolved): StrayManager::num_stray is inaccurate
Zheng Yan
09:50 AM Bug #19204 (Pending Backport): MDS assert failed when shutting down
John Spray
09:48 AM Feature #19551 (Resolved): CephFS MDS health messages should be logged in the cluster log
John Spray
08:28 AM Bug #18680 (Fix Under Review): multimds: cluster can assign active mds beyond max_mds during fail...
Zheng Yan
08:27 AM Bug #18680: multimds: cluster can assign active mds beyond max_mds during failures
commit "mon/MDSMonitor: only allow deactivating the mds with max rank" in https://github.com/ceph/ceph/pull/14550 sho... Zheng Yan
07:59 AM Bug #18755: multimds: MDCache.cc: 4735: FAILED assert(in)
Fixed by commit a1499bc4 (mds: stop purging strays when mds is being shutdown). Zheng Yan
07:56 AM Bug #18755 (Resolved): multimds: MDCache.cc: 4735: FAILED assert(in)
Zheng Yan
07:58 AM Bug #18754 (Resolved): multimds: MDCache.cc: 8569: FAILED assert(!info.ancestors.empty())
Fixed by commit 20d43372 (mds: drop superfluous MMDSOpenInoReply). Zheng Yan
07:55 AM Bug #19239 (Fix Under Review): mds: stray count remains static after workflows complete
should be fixed by commits in https://github.com/ceph/ceph/pull/14550... Zheng Yan

04/13/2017

04:23 PM Bug #19395 (Fix Under Review): "Too many inodes in cache" warning can happen even when trimming i...
The bad state is long gone, so I'm just going to change this ticket to fixing the weird case where we were getting a ... John Spray
03:39 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Yeah, you're right Darrell, operator error :) I guess the VM did not have the correct storage network attached when I tried t... elder one
02:22 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
FYI, I don't think the module signature stuff is an issue. It's just notifying you that you've loaded a module that d... Darrell Enns
12:23 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
One difference I noticed between the 4.4 and 4.9 kernels:
- with the 4.4 kernel on cephfs, directory sizes (total bytes of al...
elder one
08:04 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Yeah, did that.
My test server with a compiled 4.9 kernel and patched ceph module mounted cephfs just fine.
It might...
elder one
02:15 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Well, I managed to patch the ceph kernel module with commit 2b1ac852, but my Ubuntu 4.9 kernel will not...
Zheng Yan
03:02 PM Bug #19566 (Fix Under Review): MDS crash on mgr message during shutdown
https://github.com/ceph/ceph/pull/14505 John Spray
02:24 PM Bug #19566 (In Progress): MDS crash on mgr message during shutdown
John Spray
02:31 PM Backport #19620 (Resolved): kraken: MDS server crashes due to inconsistent metadata.
https://github.com/ceph/ceph/pull/14574 Nathan Cutler
02:31 PM Backport #19619 (Resolved): jewel: MDS server crashes due to inconsistent metadata.
https://github.com/ceph/ceph/pull/14676 Nathan Cutler
12:03 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
Patrick was going to spin up a patch for this, so reassigning to him. Patrick, if I have that wrong, then please just... Jeff Layton
11:07 AM Bug #19406 (Pending Backport): MDS server crashes due to inconsistent metadata.
Marking as pending backport for the fix to data-scan which seems likely to be the underlying cause https://github.com... John Spray
08:13 AM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Well, currently the last MDS fails over to the old balancer, so it can in fact shift its load back to the others acco... Dan van der Ster

04/12/2017

10:27 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Well, I managed to patch the ceph kernel module with commit 2b1ac852, but my Ubuntu 4.9 kernel will not load unsigned modul... elder one
01:36 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Could you try adding commit 2b1ac852 (ceph: try getting buffer capability for readahead/fadvise) to your 4.9.x kernel... Zheng Yan
07:36 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
I do have mmap disabled in Dovecot conf.
Relevant bits from Dovecot conf:
mmap_disable = yes
mail_nfs_index = ...
elder one
02:17 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Got another error with 4.9.21 kernel.
>
> On the cephfs node /sys/kernel/debug/ceph/xxx/:
>
...
Zheng Yan
08:29 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
We could make the greedyspill.lua balancer check to see if it is the last MDS. Then just return instead of failing. I... Michael Sevilla
03:17 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
and the cpu use on MDS0 stays at +/- 250% Mark Guz
03:16 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
This error shouldn't be an expected occurrence. I'll create a fix for this. Patrick Donnelly
03:15 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
I see this in the logs... Mark Guz
03:12 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Nope. Is your load on mds.0? If so, and it gets heavily loaded, and if mds.1 has load = 0, then I expect the balan... Dan van der Ster
03:09 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Did you modify the greedyspill.lua script at all? Mark Guz
02:51 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
BTW, you need to set debug_mds_balancer = 2 to see the balance working. Dan van der Ster
02:49 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Yes. For example, when I have 50 clients untarring the linux kernel into unique directories, the load is moved around. Dan van der Ster
02:39 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Dan, do you see any evidence of actual load balancing? Mark Guz
02:39 PM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
I also see this error. I have 2 active/active MDSes. The first shows no errors, the second shows the errors above. No... Mark Guz
11:56 AM Bug #19589: greedyspill.lua: :18: attempt to index a nil value (field '?')
Ahh, it's even documented:... Dan van der Ster
11:55 AM Bug #19589 (Resolved): greedyspill.lua: :18: attempt to index a nil value (field '?')
The included greedyspill.lua doesn't seem to work in a simple 3-active MDS scenario.... Dan van der Ster
01:58 PM Bug #19593 (Resolved): purge queue and standby replay mds
The current code opens the purge queue when the mds starts standby replay. The purge queue can have changed when the standby repl... Zheng Yan
12:46 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
John Spray wrote:
> Intuitively I would say that things we expose as xattrs should probably bump ctime when they're ...
Jeff Layton
12:30 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
Intuitively I would say that things we expose as xattrs should probably bump ctime when they're set, if we're being c... John Spray
11:54 AM Bug #19388 (Closed): mount.ceph does not accept -s option
John Spray
11:43 AM Bug #18578 (Resolved): failed filelock.can_read(-1) assertion in Server::_dir_is_nonempty
Nathan Cutler
11:43 AM Backport #18707 (Resolved): kraken: failed filelock.can_read(-1) assertion in Server::_dir_is_non...
Nathan Cutler
10:46 AM Bug #18461 (Resolved): failed to reconnect caps during snapshot tests
Nathan Cutler
10:46 AM Backport #18678 (Resolved): kraken: failed to reconnect caps during snapshot tests
Nathan Cutler
10:24 AM Bug #19205 (Resolved): Invalid error code returned by MDS is causing a kernel client WARNING
Nathan Cutler
10:24 AM Backport #19206 (Resolved): jewel: Invalid error code returned by MDS is causing a kernel client ...
Nathan Cutler

04/11/2017

09:03 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Also dumped the mds cache (healthy cluster state 8 hours later).
Searched for the inode in the mds error log:...
elder one
01:54 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Got another error with 4.9.21 kernel.
On the cephfs node /sys/kernel/debug/ceph/xxx/:...
elder one
08:16 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
No, I mean the ctime. You only want to update the mtime if you're changing the directory's contents. I would consider... Jeff Layton
07:55 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
> Should we be updating the ctime and change_attr in the ceph.dir.layout case as well?
Did you mean mtime?
I th...
Patrick Donnelly
07:32 PM Bug #19583: mds: change_attr not inc in Server::handle_set_vxattr
I think you're right -- well spotted! In particular, we need to bump it in the case where we update the ctime (ceph.f... Jeff Layton
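As a rough illustration of the bookkeeping being discussed, a hedged C++ sketch with hypothetical simplified types (InodeAttrs and note_vxattr_change are made up; this is not the MDS's Server::handle_set_vxattr() or CInode code): when a settable ceph.* vxattr actually modifies the inode, ctime and change_attr would be bumped together.

    // Sketch only: hypothetical inode attribute struct, not the real MDS types.
    #include <cstdint>
    #include <ctime>

    struct InodeAttrs {
        uint64_t change_attr = 0;  // change counter reported to (kernel/NFS) clients
        std::time_t ctime = 0;     // inode change time
    };

    // Called whenever a vxattr write (e.g. a layout change) modifies the inode:
    // bump change_attr alongside ctime so clients notice the metadata change.
    void note_vxattr_change(InodeAttrs& in) {
        in.ctime = std::time(nullptr);
        ++in.change_attr;
    }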
07:21 PM Bug #19583 (Resolved): mds: change_attr not inc in Server::handle_set_vxattr
Noticed this was missing, was this intentional? Patrick Donnelly
07:22 PM Bug #19388: mount.ceph does not accept -s option
Looking at the complete picture, I think it's easiest (and acceptable for me) to wait for Jessie's successor, Stretch... Michel Roelofs
06:53 AM Feature #19578: mds: optimize CDir::_omap_commit() and CDir::_committed() for large directory
We can track dirty dentries in a dirty list; CDir::_omap_commit() and CDir::_committed() then only need to check dentries... Zheng Yan
06:51 AM Feature #19578 (Resolved): mds: optimize CDir::_omap_commit() and CDir::_committed() for large di...
CDir::_omap_commit() and CDir::_committed() need to traverse the whole dirfrag to find dirty dentries. It's not efficienc... Zheng Yan
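A hedged sketch of the optimisation described above, using made-up Dentry/DirFrag types rather than the real CDentry/CDir classes: dentries are appended to a per-fragment dirty list when marked dirty, so the commit pass walks only that list instead of scanning the whole dirfrag.

    #include <list>
    #include <string>
    #include <unordered_map>

    struct Dentry {
        std::string name;
        bool dirty = false;
    };

    struct DirFrag {                                   // hypothetical stand-in for CDir
        std::unordered_map<std::string, Dentry> items; // all dentries in the fragment
        std::list<Dentry*> dirty_list;                 // only the dentries needing commit

        void mark_dirty(Dentry& dn) {
            if (!dn.dirty) {
                dn.dirty = true;
                dirty_list.push_back(&dn);
            }
        }

        // _omap_commit()/_committed()-style pass: visit only dirty entries, then clear.
        template <typename Fn>
        void commit_dirty(Fn&& write_one) {
            for (Dentry* dn : dirty_list) {
                write_one(*dn);
                dn->dirty = false;
            }
            dirty_list.clear();
        }
    };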
03:38 AM Bug #19450 (Fix Under Review): PurgeQueue read journal crash
https://github.com/ceph/ceph/pull/14447 Zheng Yan
02:18 AM Bug #19450: PurgeQueue read journal crash
The mds always first reads all entries in the mds log, then writes new entries to it. The partial entry is detected and dropped... Zheng Yan
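To illustrate the replay behaviour described here (read everything back first, drop a trailing partial entry, then append after it), a hedged C++ sketch with a made-up length-prefixed framing; this is not the actual Journaler/PurgeQueue on-disk format.

    #include <cstdint>
    #include <cstring>
    #include <vector>

    struct ReplayResult {
        std::vector<std::vector<uint8_t>> entries; // fully-read entries
        size_t write_pos = 0;                      // where new entries may be appended
    };

    // Entries are framed as [4-byte length][payload]. A final frame whose payload
    // is cut short is treated as a partial write: it is dropped, and write_pos is
    // left at its start so new appends overwrite it.
    ReplayResult replay(const std::vector<uint8_t>& buf) {
        ReplayResult out;
        size_t pos = 0;
        while (pos + sizeof(uint32_t) <= buf.size()) {
            uint32_t len;
            std::memcpy(&len, buf.data() + pos, sizeof(len));
            if (pos + sizeof(len) + len > buf.size())
                break;                             // partial trailing entry: drop it
            out.entries.emplace_back(buf.begin() + pos + sizeof(len),
                                     buf.begin() + pos + sizeof(len) + len);
            pos += sizeof(len) + len;
        }
        out.write_pos = pos;
        return out;
    }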

04/10/2017

10:58 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Some weird issues arose with my applications (unrelated to cephfs) with the 4.10.x kernel.
The same fix is in the 4.9.x k...
elder one
02:19 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Upgraded cephfs clients to 4.10.9 kernel.
elder one
12:08 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Can you try updating the cephfs client to a 4.10.x kernel? Zheng Yan
09:59 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Error happened again:... elder one
11:48 AM Bug #19566 (Resolved): MDS crash on mgr message during shutdown
... John Spray

04/09/2017

05:30 PM Bug #19388: mount.ceph does not accept -s option
Fair point about options getting passed through.
The thing I'm not sure about here is who is going to backport a f...
John Spray
01:16 PM Bug #19388: mount.ceph does not accept -s option
mount.ceph passes options which it does not recognize directly to the kernel mount function, therefore a full impleme... Michel Roelofs
12:45 PM Bug #19450: PurgeQueue read journal crash
hit this again... Zheng Yan

04/08/2017

10:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
The master install-deps.sh installed a flood of other stuff that the earlier one didn't. But I still get fatal errors when runni... Christoffer Lilja
09:52 AM Bug #19406: MDS server crashes due to inconsistent metadata.
I'll download latest master instead and try. Christoffer Lilja
09:41 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Okay, I followed the guide on the Ceph site on how to prepare and compile Ceph.
I already ran install_deps.sh in or...
Christoffer Lilja

04/07/2017

03:04 PM Feature #19551 (Fix Under Review): CephFS MDS health messages should be logged in the cluster log

This was inspired by a ceph.log from CERN's testing, in which we could see mysterious-looking fsmap epoch bump mess...
John Spray
03:01 PM Feature #19551 (Resolved): CephFS MDS health messages should be logged in the cluster log
John Spray
08:37 AM Bug #19239 (In Progress): mds: stray count remains static after workflows complete
Zheng Yan
07:26 AM Bug #18850 (Rejected): Leak in MDCache::handle_dentry_unlink
Zheng Yan
05:53 AM Bug #19445 (Resolved): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
PR #14308 is merged. Jos Collin

04/06/2017

01:30 PM Bug #18850: Leak in MDCache::handle_dentry_unlink
It's likely some CInode/CDir/CDentry objects in the cache are not properly freed when the mds process exits; not caused by MDCache::h... Zheng Yan
01:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Christoffer Lilja wrote:
> Hi,
>
> I have little spare time to spend on this and can't get past this compile erro...
Zheng Yan

04/05/2017

07:54 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Hi,
I have little spare time to spend on this and can't get past this compile error due to my very limited knowled...
Christoffer Lilja
02:03 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Christoffer Lilja wrote:
> I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got ...
Zheng Yan
05:10 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Thanks for the link Zheng. Looks like that fix is in mainline as of 4.10.2. When I have a chance, I'll try upgrading ... Darrell Enns
09:13 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~k...
Zheng Yan
08:24 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
My cephfs clients (Ubuntu Xenial) are running kernel from Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v... elder one
07:39 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
elder one wrote:
> Hit the same bug today with 4.4.59 kernel client.
> 2 MDS servers (1 standby) 10.2.6-1 on Ubuntu...
Zheng Yan
07:36 AM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Darrell Enns wrote:
> Zheng Yan wrote:
> > probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a...
Zheng Yan
01:33 PM Bug #19501 (Fix Under Review): C_MDSInternalNoop::complete doesn't free itself
Zheng Yan
01:33 PM Bug #19501: C_MDSInternalNoop::complete doesn't free itself
https://github.com/ceph/ceph/pull/14347 Zheng Yan
01:17 PM Bug #19501 (Resolved): C_MDSInternalNoop::complete doesn't free itself
This causes a memory leak. Zheng Yan
09:20 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
geng jichao wrote:
> I have a question, if the file struct is destroyed,how to ensure that cache_ctl.index is correc...
Zheng Yan
05:52 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
I have a question: if the file struct is destroyed, how do we ensure that cache_ctl.index is correct?
In other words, req...
geng jichao
01:32 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
kernel patch https://github.com/ceph/ceph-client/commit/b7e2eee12aa174bc91279a7cee85e9ea73092bad Zheng Yan
06:01 AM Bug #19438: ceph mds error "No space left on device"
I have encountered the ceph mds error "No space left on device" in version 10.2.6.
Need to add a lin...
berlin sun
04:12 AM Bug #19450: PurgeQueue read journal crash
... Zheng Yan

04/04/2017

09:09 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Very interesting! That means we have:
4.4.59 - bug
4.7.5 - no bug (or at least rare enough that I'm not seeing it...
Darrell Enns
06:03 PM Bug #18798: FS activity hung, MDS reports client "failing to respond to capability release"
Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty.
From mds l...
elder one
09:05 PM Bug #19388: mount.ceph does not accept -s option
The kernel client has a mount helper in src/mount/mount.ceph.c -- although that only comes into play if the ceph pack... John Spray
01:50 PM Bug #19306 (Fix Under Review): fs: mount NFS to cephfs, and then ls a directory containing a larg...
https://github.com/ceph/ceph/pull/14317 Zheng Yan
01:18 PM Bug #19456: Hadoop suite failing on missing upstream tarball
I see the hadoop task is in ceph/teuthology - would it make sense to move it to ceph/ceph? Nathan Cutler
12:44 PM Backport #19483 (Resolved): kraken: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" co...
https://github.com/ceph/ceph/pull/14573 Nathan Cutler
12:44 PM Backport #19482 (Resolved): jewel: No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" com...
https://github.com/ceph/ceph/pull/14674 Nathan Cutler
12:42 PM Backport #19466 (Resolved): jewel: mds: log rotation doesn't work if mds has respawned
https://github.com/ceph/ceph/pull/14673 Nathan Cutler
01:03 AM Bug #19445 (In Progress): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
https://github.com/ceph/ceph/pull/14308 Brad Hubbard

04/03/2017

09:29 PM Bug #19456: Hadoop suite failing on missing upstream tarball
Looks like v2.5.2 is too old. It is now in the Apache archive: https://archive.apache.org/dist/hadoop/core/hadoop-2.5... Ken Dreyer
08:20 PM Bug #19456 (Rejected): Hadoop suite failing on missing upstream tarball
This is the jewel v10.2.7 point release candidate.
Also true on all branches.
Run: http://pulpito.ceph.com/yuriw-2017-0...
Yuri Weinstein
08:15 PM Bug #19388: mount.ceph does not accept -s option
Looking at it, it seems I'll have to update the Linux kernel as well to support the sloppy option. Doing so in a sane... Michel Roelofs
12:53 PM Bug #19306 (In Progress): fs: mount NFS to cephfs, and then ls a directory containing a large num...
Zheng Yan
12:44 PM Bug #19450 (Resolved): PurgeQueue read journal crash
... Zheng Yan
11:16 AM Bug #19437 (Fix Under Review): fs: The mount point breaks off when an mds switch happens.
https://github.com/ceph/ceph/pull/14267 John Spray

04/02/2017

07:30 AM Bug #19445 (Resolved): client: warning: ‘*’ in boolean context, suggest ‘&&’ instead #14263
The following warning appears during make. Here roll_die and empty() return bool. This can be easily fixed by removi... Jos Collin

04/01/2017

06:15 PM Bug #19406: MDS server crashes due to inconsistent metadata.
I downloaded the 10.2.6 tar and tried to apply the above patch from Zheng Yan but it got rejected. I guess it's writt... Christoffer Lilja

03/31/2017

11:49 PM Bug #19291 (Pending Backport): mds: log rotation doesn't work if mds has respawned
The bug is also in jewel 10.2.6, due to 6efad699249ba7c6928193dba111dbb23b606beb. Patrick Donnelly
10:41 AM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
The log says "objects: max 100" but the default of client_oc_max_objects is 1000, so I assumed there was some manual ... John Spray
10:32 AM Bug #19438: ceph mds error "No space left on device"
Hmm, so fragmentation is enabled but apparently isn't happening? Is your ceph.conf with "mds_bal_frag = true" presen... John Spray
10:17 AM Bug #19438 (Won't Fix): ceph mds error "No space left on device"
While testing a bash script against the MDS cluster, I created a test directory under the ceph mount path; in the test direc... william sheng
05:04 AM Bug #19437 (Resolved): fs: The mount point breaks off when an mds switch happens.

My ceph version is jewel and my cluster has two nodes. I started two MDSs, one active and the other in hot-standby mo...
Ivan Guan
02:58 AM Bug #18579 (Fix Under Review): Fuse client has "opening" session to nonexistent MDS rank after MD...
should be fixed in https://github.com/ceph/ceph/pull/13698 Zheng Yan

03/30/2017

02:33 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
I don't think I have changed the defaults. How can I see the values I use? Arne Wiebalck
12:35 PM Bug #19426: knfs blogbench hang
I got several oopses at other places. Seems like random memory corruption; it only happens when cephfs is exported by nf... Zheng Yan
09:57 AM Bug #19426: knfs blogbench hang
... Zheng Yan
08:34 AM Bug #19426 (Can't reproduce): knfs blogbench hang
http://pulpito.ceph.com/yuriw-2017-03-24_21:44:17-knfs-jewel-integration-ktdreyer-testing-basic-smithi/941067/
http:...
Zheng Yan
01:23 AM Bug #19406: MDS server crashes due to inconsistent metadata.
... Zheng Yan

03/29/2017

07:09 PM Bug #19406: MDS server crashes due to inconsistent metadata.
I don't have a build environment nor the Ceph source code on the machines.
I'll download the source code, apply the ...
Christoffer Lilja
06:45 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Looking at what DataScan.cc currently does, looks like we were injecting things with hash set to zero, so it would pi... John Spray
02:02 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Looks like ~mds0's dir_layout.dl_dir_hash is wrong. Please check whether the patch below can prevent the mds from crash... Zheng Yan
01:28 PM Bug #19406: MDS server crashes due to inconsistent metadata.
The logfile contains file structure that is in itself sensitive information; that's why it's truncated.
I'll see if...
Christoffer Lilja
01:01 PM Bug #19406: MDS server crashes due to inconsistent metadata.
To work out how the metadata is now corrupted (i.e. what the recovery tools broke), it would be useful to have the fu... John Spray
12:43 PM Bug #19406: MDS server crashes due to inconsistent metadata.
Can you go back to your logs from when the original metadata damage occurred? There should be a message in your clus... John Spray
10:55 AM Bug #19406: MDS server crashes due to inconsistent metadata.
I see now that the two other MDS servers crash the same way. Christoffer Lilja
10:44 AM Bug #19406: MDS server crashes due to inconsistent metadata.
A NIC broke and caused tremendous load on the server. Shortly after, I had to kill the server by cutting power; note that this ... Christoffer Lilja
10:15 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Please also tell us what else was going on around this event. Had you recently upgraded? Had another daemon recentl... John Spray
09:43 AM Bug #19406: MDS server crashes due to inconsistent metadata.
Please set 'debug_mds = 20', restart the mds, and upload the mds.log. Zheng Yan
07:57 AM Bug #19406 (Resolved): MDS server crashes due to inconsistent metadata.
-48> 2017-03-26 23:34:38.886779 7f4b95030700 5 mds.neutron handle_mds_map epoch 620 from mon.1
-47> 2017-03-2...
Christoffer Lilja

03/28/2017

10:21 PM Bug #19343: OOM kills ceph-fuse: objectcacher doesn't seem to respect its max objects limits (jew...
Hmm.
I've tried setting some comically low limits:...
John Spray
09:09 PM Bug #19401 (Fix Under Review): MDS goes readonly writing backtrace for a file whose data pool has...
So on reflection I realise that the deletion/recovery cases are not an issue because those code paths already handle ... John Spray
05:28 PM Bug #19401 (Resolved): MDS goes readonly writing backtrace for a file whose data pool has been re...
Reproduce:
1. create a pool
2. add it as a data pool
3. set that pool in a layout on the client, and write a file
...
John Spray
09:05 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
Couple of observations from today:
* after an mds failover, the issue cleared and we're back to having "ino" (CIno...
John Spray
02:55 PM Bug #19395: "Too many inodes in cache" warning can happen even when trimming is working
Cache dumps in /root/19395 on mira060 John Spray
01:36 PM Bug #19395 (Resolved): "Too many inodes in cache" warning can happen even when trimming is working
John Spray
01:29 PM Bug #19291 (Resolved): mds: log rotation doesn't work if mds has respawned
John Spray
01:29 PM Bug #19282 (Resolved): RecoveryQueue::_recovered asserts out on OSD errors
John Spray
01:28 PM Fix #19288 (Resolved): Remove legacy "mds tell"
John Spray
01:27 PM Bug #16709 (Pending Backport): No output for "ceph mds rmfailed 0 --yes-i-really-mean-it" command
John Spray
01:25 PM Feature #17604 (Resolved): MDSMonitor: raise health warning when there are no standbys but there ...
John Spray

03/27/2017

05:26 PM Bug #19388: mount.ceph does not accept -s option
OK, if you're motivated to do this (and update the pull request to implement -s rather than dropping it) then I don't... John Spray
05:24 PM Bug #19388: mount.ceph does not accept -s option
I wasn't aware of this fix in autofs, thanks for pointing me to it.
However, several mount programs accept the -s ...
Michel Roelofs
10:32 AM Bug #19388: mount.ceph does not accept -s option
According to the release notes, autofs 5.1.1 includes a fix to avoid passing -s to filesystems other than NFS (https:... John Spray
03:39 AM Bug #19388 (Fix Under Review): mount.ceph does not accept -s option
https://github.com/ceph/ceph/pull/14158 Kefu Chai
11:58 AM Bug #16842: mds: replacement MDS crashes on InoTable release
A patch that should allow MDSs to throw out the invalid sessions and start up. Does not fix whatever got the system ... John Spray

03/26/2017

12:31 PM Bug #19388 (Closed): mount.ceph does not accept -s option
The mount.ceph tool does not accept the -s (sloppy) option. On Debian Jessie (and possibly more distributions), autof... Michel Roelofs

03/23/2017

01:35 PM Feature #18509 (Fix Under Review): MDS: damage reporting by ino number is useless
John Spray
01:30 PM Feature #18509: MDS: damage reporting by ino number is useless
https://github.com/ceph/ceph/pull/14104 John Spray
02:16 AM Feature #19362 (Resolved): mds: add perf counters for each type of MDS operation
It's desirable to know which types of operations are being performed on the MDS so we have a better idea over time of what... Patrick Donnelly

03/22/2017

06:31 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
I have used the offset parameter of the ceph_dir_llseek function, and it will be passed to the mds in a readdir request; if ... geng jichao
03:38 AM Bug #19306: fs: mount NFS to cephfs, and then ls a directory containing a large number of files, ...
This bug is not specific to the kernel client. Enabling directory fragmentation can help. The complete fix is to make the client enc... Zheng Yan
 
