Ceph : Issues
https://tracker.ceph.com/
2023-10-18T12:24:10Z

CephFS - Bug #63233 (New): mon|client|mds: valgrind reports possible leaks in the MDS
https://tracker.ceph.com/issues/63233
2023-10-18T12:24:10Z, Venky Shankar <vshankar@redhat.com>
<p>/a/vshankar-2023-10-14_01:51:22-fs-wip-vshankar-testing-20231013.093215-testing-default-smithi/7427332</p>
<pre>
2023-10-16T04:24:05.833 DEBUG:teuthology.orchestra.run.smithi031:> sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq
2023-10-16T04:24:05.842 DEBUG:teuthology.orchestra.run.smithi062:> sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq
2023-10-16T04:24:05.886 INFO:teuthology.orchestra.run.smithi062.stdout:/var/log/ceph/valgrind/mds.b.log: <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.886 INFO:teuthology.orchestra.run.smithi062.stdout:/var/log/ceph/valgrind/mds.d.log: <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.887 INFO:teuthology.orchestra.run.smithi062.stdout:/var/log/ceph/valgrind/mds.f.log: <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.887 INFO:teuthology.orchestra.run.smithi062.stdout:/var/log/ceph/valgrind/mon.b.log: <kind>Leak_StillReachable</kind>
2023-10-16T04:24:05.887 INFO:teuthology.orchestra.run.smithi062.stdout:/var/log/ceph/valgrind/mon.c.log: <kind>Leak_StillReachable</kind>
2023-10-16T04:24:05.978 INFO:teuthology.orchestra.run.smithi031.stdout:/var/log/ceph/valgrind/mds.a.log: <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.978 INFO:teuthology.orchestra.run.smithi031.stdout:/var/log/ceph/valgrind/mds.c.log: <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.978 INFO:teuthology.orchestra.run.smithi031.stdout:/var/log/ceph/valgrind/mon.a.log: <kind>Leak_StillReachable</kind>
2023-10-16T04:24:05.979 DEBUG:tasks.ceph:file /var/log/ceph/valgrind/mds.a.log kind <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.980 DEBUG:tasks.ceph:file /var/log/ceph/valgrind/mds.c.log kind <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.980 DEBUG:tasks.ceph:file /var/log/ceph/valgrind/mon.a.log kind <kind>Leak_StillReachable</kind>
2023-10-16T04:24:05.980 ERROR:tasks.ceph:saw valgrind issue <kind>Leak_StillReachable</kind> in /var/log/ceph/valgrind/mon.a.log
2023-10-16T04:24:05.980 DEBUG:tasks.ceph:file /var/log/ceph/valgrind/mds.b.log kind <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.981 DEBUG:tasks.ceph:file /var/log/ceph/valgrind/mds.d.log kind <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.981 DEBUG:tasks.ceph:file /var/log/ceph/valgrind/mds.f.log kind <kind>Leak_PossiblyLost</kind>
2023-10-16T04:24:05.981 DEBUG:tasks.ceph:file /var/log/ceph/valgrind/mon.b.log kind <kind>Leak_StillReachable</kind>
2023-10-16T04:24:05.981 ERROR:tasks.ceph:saw valgrind issue <kind>Leak_StillReachable</kind> in /var/log/ceph/valgrind/mon.b.log
2023-10-16T04:24:05.981 DEBUG:tasks.ceph:file /var/log/ceph/valgrind/mon.c.log kind <kind>Leak_StillReachable</kind>
2023-10-16T04:24:05.982 ERROR:tasks.ceph:saw valgrind issue <kind>Leak_StillReachable</kind> in /var/log/ceph/valgrind/mon.c.log
</pre>
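<p>For reference, the check that produced the errors above amounts to scanning each valgrind log for <kind> entries and failing the run when a kind that is not tolerated (here, Leak_StillReachable) shows up. Below is a rough, self-contained Python sketch of that idea; it is not the actual tasks.ceph implementation, and the tolerated set is only illustrative.</p>
<pre>
#!/usr/bin/env python3
# Sketch of the valgrind-log scan: collect the <kind> values reported in each
# log under /var/log/ceph/valgrind and flag any kind we do not tolerate.
import glob
import gzip
import re

# Illustrative only; in the run above, Leak_PossiblyLost was logged at DEBUG
# while Leak_StillReachable produced the ERROR lines.
TOLERATED = {"Leak_PossiblyLost"}

def kinds_in(path):
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as f:
        return set(re.findall(r"<kind>(\w+)</kind>", f.read()))

for log in sorted(glob.glob("/var/log/ceph/valgrind/*")):
    bad = kinds_in(log) - TOLERATED
    if bad:
        print(f"saw valgrind issue {sorted(bad)} in {log}")
</pre>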

CephFS - Feature #63191 (New): tools/cephfs: provide an estimate completion time for offline tools
https://tracker.ceph.com/issues/63191
2023-10-13T05:56:37Z, Venky Shankar <vshankar@redhat.com>
<p>Especially cephfs-data-scan, which scales with the number of objects in the data pool (the scan_{extents,inodes} steps), and scan_links, which iterates over the objects in the metadata pool twice.</p>
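<p>As a rough illustration of what such an estimate could report (this is not existing cephfs-data-scan behaviour), a worker that knows the total object count of the pool it is iterating and its own processing rate so far can print a simple ETA:</p>
<pre>
# Hypothetical progress/ETA helper for an offline scan; the object total and
# the reporting interval are assumptions made for this sketch.
import time

class ScanProgress:
    def __init__(self, total_objects):
        self.total = total_objects
        self.done = 0
        self.start = time.monotonic()

    def update(self, n=1):
        self.done += n

    def eta_seconds(self):
        elapsed = time.monotonic() - self.start
        if self.done == 0 or elapsed == 0:
            return None                      # no rate information yet
        rate = self.done / elapsed           # objects processed per second
        return (self.total - self.done) / rate

# Example: a pool with 1,000,000 objects, reporting every 100,000 objects.
progress = ScanProgress(total_objects=1_000_000)
for _ in range(1_000_000):
    progress.update()
    if progress.done % 100_000 == 0:
        eta = progress.eta_seconds()
        if eta is not None:
            print(f"{progress.done}/{progress.total} objects, ~{eta:.1f}s remaining")
</pre>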

CephFS - Backport #62333 (New): quincy: MDSAuthCaps: minor improvements
https://tracker.ceph.com/issues/62333
2023-08-04T19:52:01Z, Rishabh Dave

CephFS - Bug #62126 (New): test failure: suites/blogbench.sh stops running
https://tracker.ceph.com/issues/62126
2023-07-24T12:39:03Z, Rishabh Dave
<p>I found this failure while running integration tests for a few CephFS PRs. The failure occurred even after the buggy PR had been removed from the batch of PRs under testing, and it even occurred against the main branch (IOW, no PRs were on the testing branch that time).<br /><a class="external" href="https://pulpito.ceph.com/rishabh-2023-07-21_21:30:53-fs-wip-rishabh-2023Jul13-base-2-testing-default-smithi/7347715">https://pulpito.ceph.com/rishabh-2023-07-21_21:30:53-fs-wip-rishabh-2023Jul13-base-2-testing-default-smithi/7347715</a></p>
<pre>
2023-07-21T22:18:55.848 INFO:tasks.workunit:Stopping ['suites/blogbench.sh'] on client.0...
</pre>
<pre>
2023-07-21T22:15:30.678 INFO:tasks.workunit.client.0.smithi073.stdout:
2023-07-21T22:15:30.678 INFO:tasks.workunit.client.0.smithi073.stdout:Final score for writes: 425
2023-07-21T22:15:30.678 INFO:tasks.workunit.client.0.smithi073.stdout:Final score for reads : 1132026
2023-07-21T22:15:30.679 INFO:tasks.workunit.client.0.smithi073.stdout:
</pre>

CephFS - Documentation #61902 (New): Recommend pinning _deleting directory to another rank for ce...
https://tracker.ceph.com/issues/61902
2023-07-05T20:08:18Z, Patrick Donnelly <pdonnell@redhat.com>
<p>The _deleting directory can often receive sudden, large volumes of entries to recursively unlink. Rank 0 is not an ideal default target for this extra workload. We should select rank 1 by default, but introduce a configuration option to alter that setting.</p>
<p>If max_mds==1, of course, the _deleting directory still stays on rank 0.</p>
<p>As an add-on to this, add a file to the directory which persists (i.e. is not deleted by the async deleter threads) so that the directory is not exported back to rank 0 whenever it becomes empty.</p>
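<p>For context, a directory can already be pinned to a rank by setting the <code>ceph.dir.pin</code> extended attribute on a mounted CephFS, so the default proposed here would automate something like the sketch below. The mount point and the <code>/volumes/_deleting</code> path are assumptions for illustration; whether the _deleting directory is reachable for manual pinning depends on the deployment.</p>
<pre>
# Hypothetical: pin a "_deleting" directory to MDS rank 1 via the
# ceph.dir.pin xattr. Linux-only (os.setxattr); paths are examples.
import os

MOUNTPOINT = "/mnt/cephfs"                                   # example mount
deleting_dir = os.path.join(MOUNTPOINT, "volumes", "_deleting")

# ceph.dir.pin takes the target rank as a decimal string; "-1" clears the pin.
os.setxattr(deleting_dir, "ceph.dir.pin", b"1")
</pre>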

Linux kernel client - Bug #59735 (New): fs/ceph: cross check passed in fsid during mount with clu...
https://tracker.ceph.com/issues/59735
2023-05-12T03:10:42Z, Venky Shankar <vshankar@redhat.com>
<p>Right now, only basic checks are done on the passed-in fsid - just that it "looks" like an fsid. However, the kclient could cross-check this fsid against the cluster fsid and fail the mount (maybe?) on a mismatch.</p>

CephFS - Bug #58945 (New): qa: xfstests-dev's generic test suite has 20 failures with fuse client
https://tracker.ceph.com/issues/58945
2023-03-09T18:40:53Z, Rishabh Dave
<p><a href="https://github.com/ceph/ceph/pull/45960" class="external">PR #45960</a> enables running tests from xfstests-dev against CephFS. For FUSE mounted CephFS, the generic test suite from xfstests-dev has 20 failures.</p>
<p>Following are the failures -<br /><code>generic/020 generic/093 generic/125 generic/126 generic/184 generic/192 generic/193 generic/209 generic/213 generic/258 generic/355 generic/426 generic/434 generic/451 generic/467 generic/478 generic/504 generic/597 generic/598 generic/633</code></p>
<p>Failures have been seen on the following teuthology runs -<br /><a class="external" href="http://pulpito.front.sepia.ceph.com/rishabh-2023-03-03_17:32:02-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/">http://pulpito.front.sepia.ceph.com/rishabh-2023-03-03_17:32:02-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/</a><br /><a class="external" href="http://pulpito.front.sepia.ceph.com/rishabh-2023-03-06_20:45:05-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/">http://pulpito.front.sepia.ceph.com/rishabh-2023-03-06_20:45:05-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/</a></p>
<p>These runs above have been respectively reported here -<br /><a class="external" href="https://github.com/ceph/ceph/pull/45960#issuecomment-1456941244">https://github.com/ceph/ceph/pull/45960#issuecomment-1456941244</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/45960#issuecomment-1458012782">https://github.com/ceph/ceph/pull/45960#issuecomment-1458012782</a></p>

CephFS - Bug #58938 (New): qa: xfstests-dev's generic test suite has 7 failures with kclient
https://tracker.ceph.com/issues/58938
2023-03-08T14:16:17Z, Rishabh Dave
<p><a href="https://github.com/ceph/ceph/pull/45960" class="external">PR #45960</a> enables running tests from xfstests-dev against CephFS. For kernel mounted CephFS, the generic test suite from xfstests-dev has 7 failures.</p>
<p>Following are the failures -<br /><code>generic/003 generic/093 generic/125 generic/192 generic/317 generic/631 generic/684</code></p>
<p>Failures have been seen on the following teuthology runs -<br /><a class="external" href="http://pulpito.front.sepia.ceph.com/rishabh-2023-03-03_17:30:49-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/">http://pulpito.front.sepia.ceph.com/rishabh-2023-03-03_17:30:49-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/</a><br /><a class="external" href="http://pulpito.front.sepia.ceph.com/rishabh-2023-03-07_11:30:13-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/">http://pulpito.front.sepia.ceph.com/rishabh-2023-03-07_11:30:13-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/</a></p>
<p>These runs have been respectively reported here -<br /><a class="external" href="https://github.com/ceph/ceph/pull/45960#issuecomment-1456941244">https://github.com/ceph/ceph/pull/45960#issuecomment-1456941244</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/45960#issuecomment-1458012782">https://github.com/ceph/ceph/pull/45960#issuecomment-1458012782</a></p>
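<p>Since the failure lists for the FUSE client (issue #58945 above) and the kernel client overlap, a quick set comparison separates client-specific failures from common ones. A small sketch using the two lists exactly as reported:</p>
<pre>
# Compare the xfstests-dev generic failures from tracker #58945 (fuse) and
# tracker #58938 (kclient).
fuse_failures = {
    "generic/020", "generic/093", "generic/125", "generic/126", "generic/184",
    "generic/192", "generic/193", "generic/209", "generic/213", "generic/258",
    "generic/355", "generic/426", "generic/434", "generic/451", "generic/467",
    "generic/478", "generic/504", "generic/597", "generic/598", "generic/633",
}
kclient_failures = {
    "generic/003", "generic/093", "generic/125", "generic/192",
    "generic/317", "generic/631", "generic/684",
}

print("failing on both clients:", sorted(fuse_failures & kclient_failures))
print("fuse-only failures:     ", sorted(fuse_failures - kclient_failures))
print("kclient-only failures:  ", sorted(kclient_failures - fuse_failures))
</pre>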

CephFS - Documentation #58620 (New): document asok commands
https://tracker.ceph.com/issues/58620
2023-01-31T13:26:19Z, Dhairya Parmar
<p>Some of the asok commands can be found here and there in the docs, but most lack documentation; it would be good to have all of them documented if possible.</p>
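<p>As a starting point for such documentation, the commands a running daemon actually exposes on its admin socket can be enumerated by asking the daemon itself. A minimal sketch, assuming a local cluster with a daemon named mds.a and that the <code>help</code> output is a JSON map of command name to description:</p>
<pre>
# Enumerate the asok commands of a running daemon via "ceph daemon <name> help".
import json
import subprocess

out = subprocess.check_output(["ceph", "daemon", "mds.a", "help"])
commands = json.loads(out)            # assumed: {"command name": "description", ...}

for name, desc in sorted(commands.items()):
    print(f"{name}: {desc}")
</pre>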

CephFS - Feature #56428 (New): add command "fs deauthorize"
https://tracker.ceph.com/issues/56428
2022-06-30T10:04:25Z, Rishabh Dave
<p>Since entity auth keyrings can now hold auth caps for multiple Ceph FSs, it is very tedious and error-prone to remove the caps for a single FS from a keyring. Here's an example from a vstart cluster I have locally -</p>
<pre>
build$ ./bin/ceph fs ls
name: a, metadata pool: cephfs.a.meta, data pools: [cephfs.a.data ]
name: b, metadata pool: cephfs.b.meta, data pools: [cephfs.b.data ]
name: c, metadata pool: cephfs.c.meta, data pools: [cephfs.c.data ]
name: d, metadata pool: cephfs.d.meta, data pools: [cephfs.d.data ]
build$ ./bin/ceph auth get client.x
[client.x]
key = AQD8TMpgdaLvEBAAS1IDcMDvIGt1Yw2NYKtjeg==
caps mds = "allow rw fsname=a, allow rw fsname=b, allow rw
fsname=c, allow rw fsname=d"
caps mon = "allow r fsname=a, allow r fsname=b, allow r
fsname=c, allow r fsname=d"
caps osd = "allow rw tag cephfs data=a, allow rw tag cephfs
data=b, allow rw tag cephfs data=c, allow rw tag cephfs data=d"
exported keyring for client.x
</pre>
<p>The only current way to do it is to use the <code>ceph auth caps</code> command and pass all the new caps -</p>
<pre>
build$ ./bin/ceph auth caps client.x mon "allow r fsname=a, allow r
fsname=b, allow r fsname=c" osd "allow rw tag cephfs data=a, allow rw
tag cephfs data=b, allow rw tag cephfs data=c" mds "allow rw fsname=a,
allow rw fsname=b, allow rw fsname=c"
updated caps for client.x
build$ ./bin/ceph auth get client.x
[client.x]
key = AQD8TMpgdaLvEBAAS1IDcMDvIGt1Yw2NYKtjeg==
caps mds = "allow rw fsname=a, allow rw fsname=b, allow rw fsname=c"
caps mon = "allow r fsname=a, allow r fsname=b, allow r fsname=c"
caps osd = "allow rw tag cephfs data=a, allow rw tag cephfs
data=b, allow rw tag cephfs data=c"
exported keyring for client.x
</pre>
<p>The other way of doing this is to copy the keyring to a file, modify that file, and pass the file path to <code>ceph auth import -i</code>. But what would make it really short and straightforward, IMO, is something like the following -</p>
<pre>
./bin/ceph fs deauthorize d client.x
</pre>
<p>It would have exactly the same effect as the <code>ceph auth caps</code> command above. I think it would also be a good idea to extend this command to take a path -</p>
<pre>
./bin/ceph fs deauthorize d client.x /dir1/dir2/
</pre>
<p>In this case, the command would remove only the caps for the path <code>/dir1/dir2</code> on FS "d" from the keyring for the entity "client.x". Every other cap held by client.x (both for FS "d" and for other FSs) would remain unaffected.</p>
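<p>Until such a command exists, the rewrite can be scripted. The sketch below emulates the proposed <code>fs deauthorize</code> for a given FS and entity by dropping every cap clause that references that FS and re-applying the rest with <code>ceph auth caps</code>; the JSON shape of <code>ceph auth get -f json</code>, the comma-separated cap grammar, and the naive substring match are assumptions, and the per-path variant is not handled.</p>
<pre>
#!/usr/bin/env python3
# Sketch: emulate the proposed "fs deauthorize" by removing every cap clause
# that mentions a given CephFS from an entity's mon/mds/osd caps and
# re-applying the reduced caps. Assumes "ceph auth get <entity> -f json"
# returns [{"entity": ..., "caps": {"mds": ..., "mon": ..., "osd": ...}}] and
# that clauses are comma-separated, as in the keyring shown above.
import json
import subprocess
import sys

def deauthorize(fs_name, entity):
    out = subprocess.check_output(["ceph", "auth", "get", entity, "-f", "json"])
    caps = json.loads(out)[0]["caps"]

    def keep(clause):
        # mon/mds clauses name the FS via "fsname=<fs>", osd clauses via
        # "data=<fs>". Naive substring match; a real implementation would
        # parse the cap grammar (e.g. "fsname=d" would also match "fsname=dx").
        return f"fsname={fs_name}" not in clause and f"data={fs_name}" not in clause

    args = []
    for daemon, capstr in caps.items():
        kept = [c.strip() for c in capstr.split(",") if keep(c)]
        args += [daemon, ", ".join(kept)]     # assumes at least one clause survives

    subprocess.check_call(["ceph", "auth", "caps", entity] + args)

if __name__ == "__main__":
    deauthorize(sys.argv[1], sys.argv[2])     # e.g.: ./deauth.py d client.x
</pre>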
<p><a class="external" href="http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a6be1eafcec3445c3e9779d3e775e339dd9c48c7512cf89ac5737c2abff13a12">http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a6be1eafcec3445c3e9779d3e775e339dd9c48c7512cf89ac5737c2abff13a12</a></p>
<p>Sanitized backtrace:<br /><pre> double const ceph::common::ConfigProxy::get_val<double>(std::basic_string_view<char, std::char_traits<char> >) const
MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)
MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)
MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)
MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)
DispatchQueue::entry()
DispatchQueue::DispatchThread::entry()
</pre><br />Crash dump sample:<br /><pre>{
"backtrace": [
"/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7fd485a76140]",
"gsignal()",
"abort()",
"/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7fd4859277ec]",
"/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7fd485932966]",
"/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7fd4859329d1]",
"/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5c65) [0x7fd485932c65]",
"/usr/bin/ceph-mds(+0x13c2b6) [0x558c1c8c82b6]",
"(double const ceph::common::ConfigProxy::get_val<double>(std::basic_string_view<char, std::char_traits<char> >) const+0x99) [0x558c1c92ae39]",
"(MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x1ae) [0x558c1c95119e]",
"(MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xd5a) [0x558c1c92178a]",
"(MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x37f) [0x558c1c92496f]",
"(MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x187) [0x558c1c9251e7]",
"(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7fd485f77448]",
"(DispatchQueue::entry()+0x5ff) [0x7fd485f74abf]",
"(DispatchQueue::DispatchThread::entry()+0xd) [0x7fd486036c3d]",
"/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd485a6aea7]",
"clone()"
],
"ceph_version": "16.2.7",
"crash_id": "2022-02-20T04:03:27.693374Z_d519e925-1413-42c0-b0ca-49c24cacdd6e",
"entity_name": "mds.20e498aa737f24767ed073d524fb7f19ff86197f",
"os_id": "11",
"os_name": "Debian GNU/Linux 11 (bullseye)",
"os_version": "11 (bullseye)",
"os_version_id": "11",
"process_name": "ceph-mds",
"stack_sig": "7e8e26a42e3616f3f9e5ca48bef80648ffe3f56da687a80cb469f84f71882368",
"timestamp": "2022-02-20T04:03:27.693374Z",
"utsname_machine": "x86_64",
"utsname_release": "5.15.0-0.bpo.2-amd64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Debian 5.15.5-2~bpo11+1 (2022-01-02)"
}</pre></p>

CephFS - Bug #49644 (New): vstart_runner: run_ceph_w() doesn't work with shell=True
https://tracker.ceph.com/issues/49644
2021-03-08T09:05:49Z, Rishabh Dave
<p>Setting <code>shell</code> to <code>True</code> leads to a crash when <code>tasks.mgr.test_module_selftest.TestModuleSelftest.test_selftest_cluster_log</code> is run on "ceph API" GitHub CI job. The crash happens when <code>self.subproc.communicate()</code> is run in <code>self.watcher_process.finished()</code> (<code>self.watcher_process</code> is an instance of <code>LocalRemoteProcess</code>) in <code>ContextManager.__exit__()</code>.</p>
<p>See the ceph API job on this PR (<a class="external" href="https://github.com/ceph/ceph/pull/38471">https://github.com/ceph/ceph/pull/38471</a>) to see the crash. The console output also contains extra debug messages that show that the crash happened on the call to <code>self.proc.communicate()</code> in <code>LocalRemoteProcess.finished()</code>.</p>
<p>Ideally, the <code>run_ceph_w()</code> should work even when <code>shell</code> is set to <code>True</code>.</p>
<p>Also see this PR - <a class="external" href="https://github.com/ceph/ceph/pull/38443">https://github.com/ceph/ceph/pull/38443</a>. Especially this discussion on it might be helpful - <a class="external" href="https://github.com/ceph/ceph/pull/38443#discussion_r537673068">https://github.com/ceph/ceph/pull/38443#discussion_r537673068</a>.</p>
<p>Why do we need to set <code>shell</code> to <code>True</code>?<br />----------------------------------------<br />Currently, <code>teuthology.orchestra.run.run()</code> executes commands with <code>shell</code> set to <code>True</code>, while <code>vstart_runner.LocalRemoteProcess.run()</code> runs with <code>shell</code> set to <code>False</code>. This inconsistency leads to incompatibilities and bugs, and makes code in <code>teuthology.remote</code> non-reusable.</p>
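<p>For readers unfamiliar with the distinction, the two modes expect differently shaped arguments from Python's <code>subprocess</code>, which is why code written for one tends to break under the other. A minimal standalone illustration (not vstart_runner code):</p>
<pre>
# With shell=True the command is a single string interpreted by /bin/sh; with
# shell=False (the default) it must be an argv list executed directly.
import subprocess

# shell=True: one string; shell features such as pipes work.
subprocess.run("echo hello | tr a-z A-Z", shell=True, check=True)

# shell=False: an argv list; passing the string above here would be treated
# as a single program name and fail with FileNotFoundError.
subprocess.run(["echo", "hello"], check=True)
</pre>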

CephFS - Bug #15783 (New): client: enable acls by default
https://tracker.ceph.com/issues/15783
2016-05-09T18:05:09Z, Eric Eastman <eric0e@yahoo.com>
<p>While doing some SAMBA testing with Jewel on both a kernel-mounted and a FUSE-mounted Ceph file system, I found that ACLs cannot be set on directories on the FUSE-mounted Ceph file system. SAMBA gave the following error in the smbd log file, with log level = 20, when I tried to add an additional user to have access to a directory:</p>
<pre>
[2016/05/07 23:41:19.213997, 10, pid=2823630, effective(2000501,2000514), real(2000501, 0)] ../source3/modules/vfs_posixacl.c:92(posixacl_sys_acl_set_file)
  Calling acl_set_file: New folder (4), 0
[2016/05/07 23:41:19.214170, 10, pid=2823630, effective(2000501,2000514), real(2000501, 0)] ../source3/modules/vfs_posixacl.c:111(posixacl_sys_acl_set_file)
  acl_set_file failed: Operation not supported
</pre>
<p>This same SAMBA test works without errors on the same Ceph file system if it is kernel mounted.</p>
<p>A simple test of setting an ACL from the command line to a fuse mounted Ceph file system also fails:<br /><pre>
# mkdir /cephfsFUSE/x
# setfacl -m d:o:rw /cephfsFUSE/x
setfacl: /cephfsFUSE/x: Operation not supported
</pre></p>
<p>The same test to the same Ceph file system using the kernel mount method works.</p>
<p>This was first reported on the ceph-user email list: <a class="external" href="http://www.spinics.net/lists/ceph-users/msg27568.html">http://www.spinics.net/lists/ceph-users/msg27568.html</a></p>
<p>Test setup info:<br />ceph -v<br />ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)</p>
<p>Ubuntu version is 14.04 with the 4.6rc4 PPA kernel:<br />uname -a<br />Linux ede-c1-gw04 4.6.0-040600rc4-generic #201604172330 SMP Mon Apr 18 03:32:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux</p>
<p>Samba version 4.4.2</p>
<p>Ceph file system mount info:<br />grep ceph /proc/mounts<br />10.14.2.11,10.14.2.12,10.14.2.13:/ /cephfs ceph rw,noatime,name=cephfs,secret=<hidden>,acl 0 0<br />ceph-fuse /cephfsFUSE fuse.ceph-fuse rw,noatime,user_id=0,group_id=0,default_permissions,allow_other 0 0</p>
<p>I put instructions on how I built SAMBA, the smb.conf file, /etc/fstab, and the ceph.conf file in pastebin at: <a class="external" href="http://pastebin.com/hv7PEqNm">http://pastebin.com/hv7PEqNm</a></p>

CephFS - Feature #13999 (New): client: richacl support
https://tracker.ceph.com/issues/13999
2015-12-07T09:00:23Z, Zheng Yan <ukernel@gmail.com>
<p><a class="external" href="http://www.bestbits.at/richacl/">http://www.bestbits.at/richacl/</a></p>
<p>So far no other project has merged the richacl patches. Should we be the first one?</p>