Activity
From 12/07/2021 to 01/05/2022
01/05/2022
- 12:34 PM Backport #53444: octopus: qa: "dd: error reading '/sys/kernel/debug/ceph/2a934501-6731-4052-a836-...
- Jeff Layton wrote:
> I don't think this is applicable to Octopus (unless the relevant tests also get backported).
...
- 12:31 PM Backport #53777 (Resolved): pacific: fs perf stats command crashes
- https://github.com/ceph/ceph/pull/44516
- 12:31 PM Backport #53776 (Resolved): octopus: fs perf stats command crashes
- 12:28 PM Bug #51964: qa: test_cephfs_mirror_restart_sync_on_blocklist failure
- Unable to hit this consistently: https://pulpito.ceph.com/vshankar-2022-01-05_05:43:45-fs-wip-vshankar-testing-202201...
- 05:35 AM Bug #51964: qa: test_cephfs_mirror_restart_sync_on_blocklist failure
- The mirror daemon is stuck at mounting the file system::...
- 12:26 PM Bug #48473 (Pending Backport): fs perf stats command crashes
- 12:16 PM Feature #50235 (Resolved): allow cephfs-shell to mount named filesystems
- 10:31 AM Bug #53509 (Fix Under Review): quota support for subvolumegroup
01/04/2022
- 04:41 PM Bug #53753 (Duplicate): mds: crash (assert hit) when merging dirfrags
- 04:39 PM Bug #53753: mds: crash (assert hit) when merging dirfrags
- > mds_oft_prefetch_dirfrags is by default true which means we were already testing it in the past
Thanks. This is ...
- 04:28 PM Bug #53765 (Fix Under Review): mount helper mangles the new syntax device string by qualifying th...
- 04:20 PM Bug #53765 (Resolved): mount helper mangles the new syntax device string by qualifying the name
- The new mount syntax allows you to add mounts with a syntax like this:
admin@.test=/
...but then reports th...
- 09:50 AM Backport #53761 (Resolved): pacific: mds: mds_oft_prefetch_dirfrags = false is not qa tested
- https://github.com/ceph/ceph/pull/44504
- 09:45 AM Fix #52591 (Pending Backport): mds: mds_oft_prefetch_dirfrags = false is not qa tested
- 09:45 AM Bug #44916 (Resolved): client: syncfs flush is only fast with a single MDS
- 09:15 AM Backport #53760 (Resolved): pacific: snap scheduler: cephfs snapshot schedule status doesn't list...
- https://github.com/ceph/ceph/pull/45906
- 09:15 AM Backport #53759 (Resolved): pacific: mds: heartbeat timeout by _prefetch_dirfrags during up:rejoin
- https://github.com/ceph/ceph/pull/44551
- 09:10 AM Bug #52642 (Pending Backport): snap scheduler: cephfs snapshot schedule status doesn't list the s...
- 09:10 AM Bug #53521 (Pending Backport): mds: heartbeat timeout by _prefetch_dirfrags during up:rejoin
- 09:09 AM Feature #46166 (Resolved): mds: store symlink target as xattr in data pool inode for disaster rec...
- 03:24 AM Bug #53750 (Fix Under Review): mds: FAILED ceph_assert(mut->is_wrlocked(&pin->filelock))
01/03/2022
- 05:15 PM Backport #53444: octopus: qa: "dd: error reading '/sys/kernel/debug/ceph/2a934501-6731-4052-a836-...
- I don't think this is applicable to Octopus (unless the relevant tests also get backported).
- 05:14 PM Backport #53443: pacific: qa: "dd: error reading '/sys/kernel/debug/ceph/2a934501-6731-4052-a836-...
- I don't think this is applicable to Pacific (unless the relevant tests also get backported).
- 05:14 PM Bug #53214: qa: "dd: error reading '/sys/kernel/debug/ceph/2a934501-6731-4052-a836-f42229a869be.c...
- Yes. Please do close them out. No need to backport this to those.
- 09:12 AM Bug #53753 (Duplicate): mds: crash (assert hit) when merging dirfrags
- This run: https://pulpito.ceph.com/vshankar-2021-12-22_07:37:44-fs-wip-vshankar-testing-20211216-114012-testing-defau...
12/31/2021
- 01:21 PM Bug #53750: mds: FAILED ceph_assert(mut->is_wrlocked(&pin->filelock))
- This is reproducible very easily by using the following scripts in two different terminals:...
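(The attached scripts are elided above. Purely as a hypothetical illustration of a two-terminal workload of this kind, with paths, file names and operations invented rather than taken from the tracker:)

    # terminal 1 (hypothetical): keep rewriting a file under the CephFS mount
    while true; do dd if=/dev/zero of=/mnt/cephfs/dir/file bs=4k count=256 conv=notrunc 2>/dev/null; done

    # terminal 2 (hypothetical): concurrently truncate and stat the same file
    while true; do truncate -s 0 /mnt/cephfs/dir/file; stat /mnt/cephfs/dir/file >/dev/null; done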
- 01:14 PM Bug #53750 (In Progress): mds: FAILED ceph_assert(mut->is_wrlocked(&pin->filelock))
- 01:12 PM Bug #53750 (Resolved): mds: FAILED ceph_assert(mut->is_wrlocked(&pin->filelock))
- ...
- 09:51 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Sheng Xie wrote:
> I will modify it gracefully as soon as possible after fully understanding this logic but it may t...
- 09:11 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- I will modify it gracefully as soon as possible after fully understanding this logic but it may take some time.
12/30/2021
- 05:32 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Sheng Xie wrote:
> Xiubo Li wrote:
> [...]
> Yes, I know this.
>
> This is the issue_Caps() code snippet before...
- 03:45 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Xiubo Li wrote:...
- 02:40 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Sheng Xie wrote:
> Thank you for your prompt.
> I spent some time learning the logic of MDS caps, although these co...
- 02:25 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Thank you for your prompt.
I spent some time learning the logic of MDS caps, although these contents are complex and...
- 03:48 AM Bug #51705: qa: tasks.cephfs.fuse_mount:mount command failed
- Not very sure whether the above bug could lead to the auth failure issue:...
12/29/2021
- 08:32 AM Bug #53741 (Resolved): crash just after MDS become active
- FAILED ceph_assert(lock->get_state() == LOCK_PRE_SCAN) at mds/Locker.cc:5682...
- 04:38 AM Bug #51705 (Fix Under Review): qa: tasks.cephfs.fuse_mount:mount command failed
- 03:10 AM Bug #51705 (In Progress): qa: tasks.cephfs.fuse_mount:mount command failed
- ...
- 01:05 AM Bug #53724: mds: stray directories are not purged when all past parents are clear
- Sorry, didn't carefully read the description, already renewed it :-)
- 01:03 AM Bug #53724 (New): mds: stray directories are not purged when all past parents are clear
- 01:02 AM Bug #53724 (In Progress): mds: stray directories are not purged when all past parents are clear
12/28/2021
- 04:03 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Sheng Xie wrote:
> I tested case C. as you guessed, A-node can still see file0 after B-node deletes file0. but when ...
- 02:51 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- I tested case C. as you guessed, A-node can still see file0 after B-node deletes file0. but when 'll /cephfs' is exec...
12/27/2021
- 06:00 PM Backport #53736 (Resolved): pacific: mds: recursive scrub does not trigger stray reintegration
- https://github.com/ceph/ceph/pull/44514
- 06:00 PM Backport #53735 (Rejected): octopus: mds: recursive scrub does not trigger stray reintegration
- https://github.com/ceph/ceph/pull/44657
- 05:57 PM Bug #53641 (Pending Backport): mds: recursive scrub does not trigger stray reintegration
- 12:33 PM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Could you try:
case C:
A-node: 'touch /cephfs/file0; ll /cephfs'
B-node: 'ln /cephfs/file0 /cephfs/file1; rm /ceph...
- 07:54 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Xiubo Li wrote:
> Why do you think the default value is 1s ? Should it be 0 for ceph-fuse. More detail please see ...
- 04:13 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- xie sheng wrote:
> Sorry, I didn't test kclient I just compared the performance of ceph-fuse mount point before and ...
- 03:42 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- Sorry, I didn't test kclient I just compared the performance of ceph-fuse mount point before and after setting 'timeo...
- 03:07 AM Feature #53730: ceph-fuse: support "entry_timeout" and "attr_timeout" options to improve performance
- BTW, have you tested fuse vs kclient at the same time by setting the `entry_timeout` and `attr_timeout`?...
- 02:12 AM Feature #53730 (Fix Under Review): ceph-fuse: support "entry_timeout" and "attr_timeout" options f...
- I noticed that using ceph-fuse to mount a directory has lower performance than using the kernel mode to mount a dir...
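(As a rough sketch of the fuse-vs-kclient comparison suggested above, assuming a test cluster; the mount points, credentials and test directory are placeholders:)

    # mount the same filesystem once via ceph-fuse and once via the kernel client
    ceph-fuse /mnt/cephfs-fuse
    mount -t ceph :/ /mnt/cephfs-kernel -o name=admin
    # metadata-heavy operations are where entry/attr caching should matter most
    for m in /mnt/cephfs-fuse /mnt/cephfs-kernel; do
        time ls -lR "$m/testdir" > /dev/null
    done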
12/24/2021
- 06:52 AM Bug #53726 (Fix Under Review): mds: crash when `ceph tell mds.0 dump tree ''`
- 04:40 AM Bug #53726 (Resolved): mds: crash when `ceph tell mds.0 dump tree ''`
- ...
- 01:58 AM Bug #53724 (Pending Backport): mds: stray directories are not purged when all past parents are clear
- (Note to experienced CephFS devs: let's save this for a newer dev!)
If a directory is not purged because it has pa...
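(For context, a minimal sketch of how stray counts can be observed via the MDS admin socket; "mds.a" is a placeholder daemon name:)

    ceph daemon mds.a perf dump mds_cache | grep -E 'num_strays|strays_(created|enqueued|reintegrated)'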
12/23/2021
- 01:55 PM Backport #53715 (Resolved): octopus: mds: fails to reintegrate strays if destdn's directory is fu...
- https://github.com/ceph/ceph/pull/44668
- 01:55 PM Backport #53714 (Resolved): pacific: mds: fails to reintegrate strays if destdn's directory is fu...
- https://github.com/ceph/ceph/pull/44513
- 01:52 PM Bug #53619 (Pending Backport): mds: fails to reintegrate strays if destdn's directory is full (EN...
12/22/2021
12/21/2021
- 10:48 AM Bug #53509: quota support for subvolumegroup
- Introducing subvolumegroup quotas hits this known issue [1]. The issue is hit while removing subvolume
under the sub...
- 10:46 AM Bug #53509 (In Progress): quota support for subvolumegroup
- 06:49 AM Bug #53597 (Fix Under Review): mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_v...
12/16/2021
- 07:32 PM Bug #53641 (Fix Under Review): mds: recursive scrub does not trigger stray reintegration
- 03:44 PM Bug #53641 (Resolved): mds: recursive scrub does not trigger stray reintegration
- One might think using recursive scrub would load a dentry and trigger reintegration, but it does not. Presently the c...
- 06:34 PM Bug #53649 (New): allow teuthology to create more than one named filesystem
- Varsha had asked that I create a test for this PR:
https://github.com/ceph/ceph/pull/44279
...but it wasn't...
- 04:48 PM Bug #53645 (New): MDCache::shutdown_pass: ceph_assert(!migrator->is_importing())
- I'm running a pinning/multimds thrash test (see stressfs.sh attached) on a 3 node test cluster and occasionally seein...
- 01:42 PM Bug #53623 (Fix Under Review): mds: LogSegment will only save one ESubtreeMap event if the ESubtr...
- 02:24 AM Bug #53623 (Fix Under Review): mds: LogSegment will only save one ESubtreeMap event if the ESubtr...
- ...
- 05:35 AM Bug #53459 (Won't Fix): mds: start a new MDLog segment if new coming event possibly exceeds the e...
- Will fix it in another tracker https://tracker.ceph.com/issues/53623. Closing this one.
- 02:44 AM Bug #53542: Ceph Metadata Pool disk throughput usage increasing
- I have figured out one case that may cause this; please see the tracker https://tracker.ceph.com/issues/53623.
Ju...
- 02:20 AM Bug #40002: mds: not trim log under heavy load
- More logs:...
12/15/2021
- 05:58 PM Bug #53615 (Fix Under Review): qa: upgrade test fails with "timeout expired in wait_until_healthy"
- 05:51 PM Bug #53615: qa: upgrade test fails with "timeout expired in wait_until_healthy"
- regression caused by fix for #51984
- 10:11 AM Bug #53615 (Resolved): qa: upgrade test fails with "timeout expired in wait_until_healthy"
- https://pulpito.ceph.com/vshankar-2021-12-15_07:13:38-fs-master-testing-default-smithi/6563822/
The test reached a...
- 04:05 PM Bug #53619 (Fix Under Review): mds: fails to reintegrate strays if destdn's directory is full (EN...
- 03:00 PM Bug #53619 (Resolved): mds: fails to reintegrate strays if destdn's directory is full (ENOSPC)
- This should work because no stray needs to be created and the directory's size will not increase.
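(A minimal sketch of the hard-link scenario that leaves a stray needing reintegration; the paths are placeholders, not from the tracker:)

    touch /mnt/cephfs/dir/a
    ln /mnt/cephfs/dir/a /mnt/cephfs/dir/b
    rm /mnt/cephfs/dir/a     # the primary dentry becomes a stray; 'b' is now a remote link
    stat /mnt/cephfs/dir/b   # accessing the remaining link is what should trigger reintegration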
- 03:00 PM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- > Do you have the logs when the last inode disappeared ?
I got the log for inode 0x20006fdf4cf being purged. log l...
- 05:02 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- 玮文 胡 wrote:
> Xiubo Li wrote:
> > Do you have the logs when the last inode disappeared ?
>
> No, I only have deb...
- 04:56 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- Xiubo Li wrote:
> Do you have the logs when the last inode disappeared ?
No, I only have debug_mds set to 1/5. I ...
- 04:48 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- 玮文 胡 wrote:
> Xiubo Li wrote:
> > BTW, could your lasted of `ceph fs status` ?
>
> I don't quite understand thi...
- 04:37 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- Xiubo Li wrote:
> BTW, could your lasted of `ceph fs status` ?
I don't quite understand this. the output don't c...
- 02:34 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- 玮文 胡 wrote:
> BTW, is there any way to traverse all the inodes in the stray dir, so that I can find out all such sta...
- 01:00 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- 玮文 胡 wrote:
> The inode 0x200065b309d has gone, I don't know how. But I got another inode that crashes the rank 1. I...
- 10:39 AM Bug #40002: mds: not trim log under heavy load
- Xiubo Li wrote:
> There is one case that could cause the journal logs to fill the metadata pool, such as in cas...
- 03:39 AM Bug #53611 (Triaged): mds,client: can not identify pool id if pool name is positive integer when ...
- ...
12/14/2021
- 05:29 PM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- Though almost identical, here are the logs before the crash in the new case....
- 05:25 PM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- BTW, is there any way to traverse all the inodes in the stray dir, so that I can find out all such stall caps in one ...
- 05:18 PM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- The inode 0x200065b309d has gone, I don't know how. But I got another inode that crashes the rank 1. It is very simil...
- 07:39 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- There is one thing that looks strange to me. It is rank 1 that wants to export the inode to rank 0. But when I issue ...
- 07:19 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- Xiubo Li wrote:
> 玮文 胡 wrote:
> > This dir should have been deleted about one month ago. Just found that one client...
- 06:51 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- 玮文 胡 wrote:
> This dir should have been deleted about one month ago. Just found that one client is still holding a c...
- 05:49 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- Xiubo Li wrote:
> Are you using the fuse client or kclient ?
Both. But I believe only kclient has ever accessed ...
- 05:37 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- Are you using the fuse client or kclient ?
- 03:42 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- This dir should have been deleted about one month ago. Just found that one client is still holding a cap on it. And I...
- 02:57 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- The dir "200065b309d/" is already located in the stray/, I think it's queued for being purging, will it be possible t...
- 02:29 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- If I didn't miss something important the system dirs shouldn't be migrated in theory.
- 01:58 AM Bug #53597: mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- Attached the logs after setting debug_mds to 1/20. This may be the most interesting part:...
- 10:47 AM Bug #53601 (Fix Under Review): vstart_runner: Running test_data_scan test locally fails with trac...
- 10:35 AM Bug #53601 (Resolved): vstart_runner: Running test_data_scan test locally fails with tracebacks
- Following tracebacks are seen
1....
- 10:08 AM Bug #40002: mds: not trim log under heavy load
- There is one case that could cause the journal logs to fill the metadata pool, such as in the case of tracker #53597...
- 01:06 AM Bug #44988 (Duplicate): client: track dirty inodes in a per-session list for effective cap flushing
12/13/2021
- 06:55 PM Bug #44100: cephfs rsync kworker high load.
- We've recently started using cephfs snapshots and are running into a similar issue with the kernel client. It seems ...
- 05:25 PM Bug #53597 (Resolved): mds: FAILED ceph_assert(dir->get_projected_version() == dir->get_version())
- ...
- 03:37 PM Backport #53445 (In Progress): pacific: mds: opening connection to up:replay/up:creating daemon c...
- 01:38 PM Bug #53521 (Fix Under Review): mds: heartbeat timeout by _prefetch_dirfrags during up:rejoin
- 01:38 PM Bug #53542 (Triaged): Ceph Metadata Pool disk throughput usage increasing
- 01:15 PM Fix #52824 (Closed): qa: skip internal metadata directory when scanning ceph debugfs directory
- Not relevant anymore.
- 01:12 PM Feature #49942 (Resolved): cephfs-mirror: enable running in HA
- 01:12 PM Feature #50372 (Resolved): test: Implement cephfs-mirror trasher test for HA active/active
- 08:30 AM Bug #53509: quota support for subvolumegroup
- Elaborating a bit more on the restriction mentioned by Ramana in the first comment.
If the quota is set on the sub...
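(For reference, plain CephFS directory quotas, which subvolumegroup quotas would build on, are set through virtual xattrs on a mounted path; the path and size below are placeholders:)

    setfattr -n ceph.quota.max_bytes -v 10737418240 /mnt/cephfs/volumes/mygroup   # 10 GiB
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/volumes/mygroup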
12/10/2021
- 02:49 PM Feature #50235 (Fix Under Review): allow cephfs-shell to mount named filesystems
- 11:13 AM Bug #53542: Ceph Metadata Pool disk throughput usage increasing
- We are considering increasing the "activity"-based thresholds to see if we get less metadata IO.
We were actually...
- 10:06 AM Bug #53542: Ceph Metadata Pool disk throughput usage increasing
- Thanks for the reply, we tried decreasing the mds_log_max_segments option, but didn't really notice a difference.
...
- 09:18 AM Bug #53542: Ceph Metadata Pool disk throughput usage increasing
- Under heavy load, the MDLog could accumulate many journal log events, which may be submitted in batches to the metadata pool ...
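(The mds_log_max_segments option discussed above can be changed at runtime; the value and the "mds.a" daemon name below are only examples:)

    ceph config set mds mds_log_max_segments 128
    ceph config get mds.a mds_log_max_segments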
- 08:55 AM Bug #53542: Ceph Metadata Pool disk throughput usage increasing
- We also did a dump of the objecter_requests and it seems there are some large objects written by the mds-es?
ceph ...
- 10:08 AM Backport #53332 (In Progress): pacific: ceph-fuse seems to need root permissions to mount (ceph-f...
- 10:01 AM Backport #53331 (In Progress): octopus: ceph-fuse seems to need root permissions to mount (ceph-f...
- 09:12 AM Backport #53444 (In Progress): octopus: qa: "dd: error reading '/sys/kernel/debug/ceph/2a934501-6...
- 08:58 AM Backport #52951 (Rejected): octopus: qa: skip internal metadata directory when scanning ceph debu...
- Not required since the relevant files are under sysfs.
- 08:58 AM Backport #52950 (Rejected): pacific: qa: skip internal metadata directory when scanning ceph debu...
- Not required since the relevant files are under sysfs.
- 04:46 AM Backport #52952 (Resolved): pacific: mds: crash when journaling during replay
12/09/2021
- 09:03 PM Bug #53574 (New): qa: downgrade testing of MDS/mons in minor releases
- Verify that an older mon/MDS can be brought up and can decode on-disk structures normally.
- 08:59 PM Bug #53573 (Resolved): qa: test new clients against older Ceph clusters
- Confirm that e.g. a Quincy client can still mount/use a Pacific CephFS cluster.
- 04:34 PM Bug #53487 (Resolved): qa: mount error 22 = Invalid argument
- 09:43 AM Documentation #53558 (New): Document cephfs recursive accounting
- I cannot find any user documentation for cephfs recursive accounting.
We should add something similar to https://b...
- 04:44 AM Bug #44916 (Fix Under Review): client: syncfs flush is only fast with a single MDS
- 01:52 AM Bug #51956 (Resolved): mds: switch to use ceph_assert() instead of assert()
- 01:46 AM Bug #53504: client: infinite loop "got ESTALE" after mds recovery
- Dan van der Ster wrote:
> Xiubo Li wrote:
> > Dan van der Ster wrote:
> > > more of the client log is attached. (f...
12/08/2021
- 05:05 PM Bug #53542 (Fix Under Review): Ceph Metadata Pool disk throughput usage increasing
- Hi All,
We have been observing that if we let our MDS run for some time, the bandwidth usage of the disks in the m...
- 02:22 PM Bug #53509: quota support for subvolumegroup
- Thanks, Ramana for the follow-up and validation of the use case.
- 08:10 AM Bug #53504: client: infinite loop "got ESTALE" after mds recovery
- Xiubo Li wrote:
> Dan van der Ster wrote:
> > more of the client log is attached. (from yesterday)
> >
> > do yo...
- 08:04 AM Bug #53521 (Resolved): mds: heartbeat timeout by _prefetch_dirfrags during up:rejoin
- This timeout issue happens with v14.2.19. It may also be reproduced in the latest version.
2021-12-05 20:42:13.472...
- 06:33 AM Bug #53520 (New): mds: put both fair mutex MDLog::submit_mutex and mds_lock to test under heavy load
- The related trackers:
MDLog::submit_mutex: https://tracker.ceph.com/issues/40002
mds_lock: https://tracker.ceph.c...
- 01:58 AM Bug #53459: mds: start a new MDLog segment if new coming event possibly exceeds the expected segm...
- Yeah, by creating a number of directories and setting the distributed pin on each of them, the ESubtreeMap event can reac...
12/07/2021
- 10:02 PM Bug #53509: quota support for subvolumegroup
- As per Venky, we need to keep in mind the following limitation with CephFS quotas:
"Quotas must be configured carefu... - 01:31 PM Bug #53509 (Resolved): quota support for subvolumegroup
- Today, we can apply quota to an individual subvolume. However, when working in a multi-tenant environment, the storage ad...
- 01:25 PM Bug #53504: client: infinite loop "got ESTALE" after mds recovery
- Dan van der Ster wrote:
> more of the client log is attached. (from yesterday)
>
> do you still need mds? which d...
- 08:16 AM Bug #53504: client: infinite loop "got ESTALE" after mds recovery
- more of the client log is attached. (from yesterday)
do you still need mds? which debug level?
- 06:26 AM Bug #53504 (Fix Under Review): client: infinite loop "got ESTALE" after mds recovery
- 02:28 AM Bug #53504: client: infinite loop "got ESTALE" after mds recovery
- Dan van der Ster wrote:
> Cluster had max_mds 3 at the time of those logs. It's running 14.2.22 -- we didn't upgrade... - 02:20 AM Bug #53504: client: infinite loop "got ESTALE" after mds recovery
- Cluster had max_mds 3 at the time of those logs. It's running 14.2.22 -- we didn't upgrade; the latest recovery was f...
- 12:37 AM Bug #53504: client: infinite loop "got ESTALE" after mds recovery
- BTW, what's the `max_mds` in your setup? And how many up MDSes after you upgraded? It seems there is only one.
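(A quick way to check both values, with "cephfs" as a placeholder filesystem name:)

    ceph fs get cephfs | grep max_mds
    ceph fs status cephfs    # lists the active and standby MDS daemons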