Ceph : Issues https://tracker.ceph.com/ 2023-06-06T10:51:16Z Ceph Redmine
Ceph - Bug #61598 (New): gcc-14: FTBFS "error: call to non-'constexpr' function 'virtual unsigned... https://tracker.ceph.com/issues/61598 2023-06-06T10:51:16Z Tim Serong tserong@suse.com
<p>gcc 14 has introduced a change which results in ceph build failures:</p>
<pre>
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/osd/osd_types.h: In lambda function:
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:184:73: error: call to non-'constexpr' function 'virtual unsigned int DoutPrefixProvider::get_subsys() const'
[ 270s] 184 | dout_impl(pdpp->get_cct(), ceph::dout::need_dynamic(pdpp->get_subsys()), v) \
[ 270s] | ~~~~~~~~~~~~~~~~^~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:155:58: note: in definition of macro 'dout_impl'
[ 270s] 155 | return (cctX->_conf->subsys.template should_gather<sub, v>()); \
[ 270s] | ^~~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/osd/osd_types.h:3618:3: note: in expansion of macro 'ldpp_dout'
[ 270s] 3618 | ldpp_dout(dpp, 10) << "build_prior all_probe " << all_probe << dendl;
[ 270s] | ^~~~~~~~~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:51:20: note: 'virtual unsigned int DoutPrefixProvider::get_subsys() const' declared here
[ 270s] 51 | virtual unsigned get_subsys() const = 0;
[ 270s] | ^~~~~~~~~~
</pre>
<p>The gcc change is described at <a class="external" href="https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617196.html">https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617196.html</a>.</p>
<p>The ceph FTBFS was mentioned in a follow-up post at <a class="external" href="https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618384.html">https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618384.html</a>, and apparently this failure is now expected, as <code>DoutPrefixProvider::get_subsys()</code> isn't declared <code>constexpr</code> but really should be.</p>
<p>I tried to fix this experimentally by simply declaring <code>constexpr get_subsys()</code>, e.g.:</p>
<pre>
diff --git a/src/common/dout.h b/src/common/dout.h
index a1375fbb910..6e91750708a 100644
--- a/src/common/dout.h
+++ b/src/common/dout.h
@@ -61,7 +61,7 @@ class NoDoutPrefix : public DoutPrefixProvider {
std::ostream& gen_prefix(std::ostream& out) const override { return out; }
CephContext *get_cct() const override { return cct; }
- unsigned get_subsys() const override { return subsys; }
+ constexpr unsigned get_subsys() const override { return subsys; }
};
// a prefix provider with static (const char*) prefix
@@ -88,7 +88,7 @@ class DoutPrefixPipe : public DoutPrefixProvider {
return out;
}
CephContext *get_cct() const override { return dpp.get_cct(); }
- unsigned get_subsys() const override { return dpp.get_subsys(); }
+ constexpr unsigned get_subsys() const override { return dpp.get_subsys(); }
virtual void add_prefix(std::ostream& out) const = 0;
};
</pre>
<p>...but that has some problems:</p>
<p>1) Instead of an outright build failure, I get <code>warning: virtual functions cannot be 'constexpr' before C++20 [-Winvalid-constexpr]</code>. I imagine this is undesirable.<br />2) Even if 1 <em>is</em> desirable, there are plenty of other subclasses of <code>DoutPrefixProvider</code> which would all <em>also</em> need to have their <code>get_subsys()</code> methods declared <code>constexpr</code> for the build to complete.</p>
<p>TBH the whole <code>dout</code> thing is black magic to me, so I could really use some assistance with how best to fix this.</p>
Ceph - Bug #58501 (Resolved): ceph.spec.in: need to replace SUSE usrmerged macro with version check https://tracker.ceph.com/issues/58501 2023-01-19T07:23:05Z Tim Serong tserong@suse.com
<p><a class="external" href="https://github.com/ceph/ceph/commit/e4c4a4ce97fff8a5b4efa747d9cffeabcceedd25">https://github.com/ceph/ceph/commit/e4c4a4ce97fff8a5b4efa747d9cffeabcceedd25</a> introduced the use of the <code>usrmerged</code> macro on SUSE distros to guard against installing the /sbin/mount.ceph symlink. This macro has since been deprecated and should be replaced with a version check instead (<code>%if 0%{?suse_version} < 1550</code>). See <a class="external" href="https://en.opensuse.org/openSUSE:Usr_merge">https://en.opensuse.org/openSUSE:Usr_merge</a> for more details.</p>
Ceph - Bug #57967 (Resolved): ceph-crash service should run as unprivileged user, not root (CVE-2... https://tracker.ceph.com/issues/57967 2022-11-03T05:11:53Z Tim Serong tserong@suse.com
<p>As reported at <a class="external" href="https://www.openwall.com/lists/oss-security/2022/10/25/1">https://www.openwall.com/lists/oss-security/2022/10/25/1</a>, ceph-crash runs as root, which makes it vulnerable to a potential ceph user to root privilege escalation. This is fixable by making the ceph-crash process drop privileges and run as the ceph user, just as the other ceph daemons do.</p>
Ceph - Bug #57893 (Pending Backport): make-dist creates ceph.spec with incorrect Release tag for ... https://tracker.ceph.com/issues/57893 2022-10-19T08:04:36Z Tim Serong tserong@suse.com
<p><code>ceph.spec.in</code> says:</p>
<pre>
Name: ceph
Version: @PROJECT_VERSION@
Release: @RPM_RELEASE@%{?dist}
%if 0%{?fedora} || 0%{?rhel}
Epoch: 2
%endif
</pre>
<p>When the <code>make-dist</code> script generates the final <code>ceph.spec</code> file for RPM builds, it will set PROJECT_VERSION to the version from the latest tag (e.g.: 17.0.0), and set RPM_RELEASE to the number of additional commits plus the last commit hash (e.g.: 14883.gc49b81c7d61). This doesn't work properly when building in SUSE's Open Build Service, because OBS overwrites the Release tag with checkin and build counters (see <a class="external" href="https://en.opensuse.org/openSUSE:Package_versioning_guidelines">https://en.opensuse.org/openSUSE:Package_versioning_guidelines</a>).</p>
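<p>To make the Version/Release split concrete, here is a minimal Python sketch. It is not the actual <code>make-dist</code> logic (which is a shell script); the function name <code>split_describe</code> and the <code>git describe --long</code> style input string are assumptions chosen to reproduce the example values above.</p>

```python
def split_describe(describe):
    """Split a describe-style string into (version, release).

    Assumed input shape: "v<tag>-<commits since tag>-g<short hash>",
    e.g. "v17.0.0-14883-gc49b81c7d61". This shape is an assumption
    made for illustration, not taken from make-dist itself.
    """
    if describe.startswith("v"):
        describe = describe[1:]
    # Split from the right so that a tag containing "-" stays intact.
    version, commits, commit_hash = describe.rsplit("-", 2)
    # Release is "<commits>.<hash>", as in the example above.
    return version, "{}.{}".format(commits, commit_hash)
```

<p>With the input <code>v17.0.0-14883-gc49b81c7d61</code> this yields <code>17.0.0</code> for Version and <code>14883.gc49b81c7d61</code> for Release, and it is that second value which OBS then overwrites.</p>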
<p>We've long carried a downstream patch for <code>make-dist</code> to fix this, by putting everything in PROJECT_VERSION, so you end up with something like <code>Version: 17.0.0.14883+gc49b81c7d61</code> (see <a class="external" href="https://github.com/SUSE/ceph/commit/9ee636cdca3">https://github.com/SUSE/ceph/commit/9ee636cdca3</a>), so I figure I should really submit that upstream.</p>
Ceph - Bug #57860 (Pending Backport): disable system_pmdk on s390x for SUSE distros https://tracker.ceph.com/issues/57860 2022-10-13T04:28:31Z Tim Serong tserong@suse.com
<p>Same as <a class="external" href="https://tracker.ceph.com/issues/56491">https://tracker.ceph.com/issues/56491</a> which addressed RHEL and Fedora not shipping libpmem on s390x, but for SUSE.</p>
Ceph - Bug #57497 (Pending Backport): openSUSE Leap 15.x needs to explicitly specify gcc-11 https://tracker.ceph.com/issues/57497 2022-09-12T01:06:36Z Tim Serong tserong@suse.com
<p>This is a recurrence of <a class="external" href="https://tracker.ceph.com/issues/55237">https://tracker.ceph.com/issues/55237</a>. I wrote <a class="external" href="https://github.com/ceph/ceph/commit/80949babab4">https://github.com/ceph/ceph/commit/80949babab4</a> to use gcc-c++ >= 11 on SUSE distros, which works fine on Tumbleweed (our latest and greatest), but doesn't work on openSUSE Leap 15, which has gcc 11, but not packaged in a way in which that nice neat >= requirement works. So I need to reinstate part of <a class="external" href="https://github.com/ceph/ceph/pull/45845/commits/8ab5d7eea07">https://github.com/ceph/ceph/pull/45845/commits/8ab5d7eea07</a>.</p>
Ceph - Bug #57390 (Pending Backport): denc-mod-osd.so: undefined symbol _ZN4ceph25ErasureCodePlug... https://tracker.ceph.com/issues/57390 2022-09-02T08:42:22Z Tim Serong tserong@suse.com
<p>When running <code>ceph-dencoder</code> on openSUSE Tumbleweed (built with GCC 12 and LTO, in case that's relevant), I get the following failure:</p>
<pre>
# ceph-dencoder
failed to dlopen("/usr/lib64/ceph/denc/denc-mod-osd.so"): /usr/lib64/ceph/denc/denc-mod-osd.so: undefined symbol: _ZN4ceph25ErasureCodePluginRegistry9singletonE
-h for help
</pre>
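<p>For anyone reading the mangled name in that error: it is a plain nested name in the Itanium C++ ABI scheme, where each component is written as its length followed by its characters. Below is a minimal Python sketch that decodes only this simple form (and the <code>_ZTI</code> typeinfo variant of it). The function name <code>demangle_nested</code> is made up for illustration, and real tooling such as <code>c++filt</code> handles the full grammar.</p>

```python
import re

# Only two simple symbol shapes are handled in this sketch.
_PREFIXES = {
    "_ZTIN": "typeinfo for ",  # typeinfo object for a nested name
    "_ZN": "",                 # ordinary nested name
}


def demangle_nested(symbol):
    """Decode symbols of the form _ZN<len><name>...E or _ZTIN<len><name>...E."""
    for prefix, label in _PREFIXES.items():
        if symbol.startswith(prefix):
            rest = symbol[len(prefix):]
            break
    else:
        raise ValueError("unsupported symbol: " + symbol)

    parts = []
    while rest and rest[0].isdigit():
        # Each component is a decimal length followed by that many characters.
        length_text = re.match(r"\d+", rest).group(0)
        length = int(length_text)
        start = len(length_text)
        parts.append(rest[start:start + length])
        rest = rest[start + length:]

    # A nested name is terminated by a single "E".
    if rest != "E" or not parts:
        raise ValueError("unsupported symbol: " + symbol)
    return label + "::".join(parts)
```

<p>Applied to the symbol in the error, this gives <code>ceph::ErasureCodePluginRegistry::singleton</code>, i.e. the missing symbol belongs to the erasure code plugin registry.</p>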
<p>This is fixable by adding "erasure_code" to denc-mod-osd's target_link_libraries.</p>
Ceph - Bug #56658 (Resolved): build: cephfs-shell fails to build/install with python setuptools >... https://tracker.ceph.com/issues/56658 2022-07-21T07:56:14Z Tim Serong tserong@suse.com
<p>python setuptools v61 changed package discovery so that if it finds what it thinks are multiple top-level packages in a directory, it will fail to build. This was introduced by <a class="external" href="https://github.com/pypa/setuptools/pull/3177">https://github.com/pypa/setuptools/pull/3177</a>, and causes the ceph RPM build to fail with:</p>
<pre>
...
[ 9562s] error: Multiple top-level packages discovered in a flat-layout: ['top', 'CMakeFiles'].
[ 9562s]
[ 9562s] To avoid accidental inclusion of unwanted files or directories,
[ 9562s] setuptools will not proceed with this build.
[ 9562s]
[ 9562s] If you are trying to create a single distribution with multiple packages
[ 9562s] on purpose, you should not rely on automatic discovery.
[ 9562s] Instead, consider the following options:
[ 9562s]
[ 9562s] 1. set up custom discovery (`find` directive with `include` or `exclude`)
[ 9562s] 2. use a `src-layout`
[ 9562s] 3. explicitly set `py_modules` or `packages` with a list of names
[ 9562s]
[ 9562s] To find more information, look for "package discovery" on setuptools docs.
...
[ 9833s] RPM build errors:
[ 9833s] File not found: /home/abuild/rpmbuild/BUILDROOT/ceph-16.2.9.158+gd93952c7eea-2.3.x86_64/usr/lib/python3.10/site-packages/cephfs_shell-*.egg-info
[ 9833s] File not found: /home/abuild/rpmbuild/BUILDROOT/ceph-16.2.9.158+gd93952c7eea-2.3.x86_64/usr/bin/cephfs-shell
</pre>
<p>This has been fixed in Fedora downstream by moving src/tools/cephfs/cephfs-shell to a separate subdirectory (see <a class="external" href="https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/0021-cephfs-shell.patch">https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/0021-cephfs-shell.patch</a>). I've confirmed this approach also works for openSUSE.</p>
Ceph - Bug #55237 (Resolved): rpm: openSUSE build fails - needs explicit gcc version, also can't ... https://tracker.ceph.com/issues/55237 2022-04-08T06:23:06Z Tim Serong tserong@suse.com
<p>Two issues here which are strictly speaking unrelated, but I thought it'd be less annoying to just fix the openSUSE build with one bug.</p>
<p>Issue 1: openSUSE Leap 15.3 and 15.4 use gcc 7 by default, which is not new enough to build ceph. Both distros do provide gcc 11, but we have to explicitly request that version if we want to use it.</p>
<p>Issue 2: Parquet, which in turn requires Arrow, can't currently be built for openSUSE. The problem here is that we don't have those dependencies packaged as RPMs, and when trying to build Arrow out of the submodule in the ceph source tree, one of its dependencies (xsimd) tries to download source from the internet, which doesn't work in the openSUSE Build Service (build workers have no internet access).</p>
Ceph - Bug #55087 (Resolved): rpm: openSUSE needs libthrift-devel, not thrift-devel https://tracker.ceph.com/issues/55087 2022-03-28T09:42:57Z Tim Serong tserong@suse.com
<p>In <a class="external" href="https://github.com/ceph/ceph/pull/38783">https://github.com/ceph/ceph/pull/38783</a>, <a class="external" href="https://github.com/ceph/ceph/pull/38783/commits/80e82686eba">https://github.com/ceph/ceph/pull/38783/commits/80e82686eba</a> added "thrift-devel >= 0.13.0" as a BuildRequires. On SUSE distros, this package is named libthrift-devel, so we need an <code>%if 0%{?suse_version}</code> block around that one.</p>
Ceph - Bug #55079 (Pending Backport): rpm: remove contents of build directory at end of %install ... https://tracker.ceph.com/issues/55079 2022-03-28T04:02:45Z Tim Serong tserong@suse.com
<p>I've been doing some measurements of disk usage during SUSE RPM builds (of Pacific, but this should roughly apply for newer Cephs too). In our particular build environment, which builds everything in VMs, we see something like this:</p>
<pre>
Filesystem Size Used Avail Use% Mounted on
df start of build: /dev/vda 53G 14G 40G 25% /
df end of build: /dev/vda 53G 31G 23G 58% /
df end of install: /dev/vda 53G 39G 15G 74% /
df before clamscan: /dev/vda 53G 41G 13G 78% /
df after clamscan: /dev/vda 53G 50G 3.9G 93% /
</pre>
<p>So after compiling everything, we've consumed about 17GB (that's all the binaries and object files and whatnot that end up in the "build" directory in the source tree). Then, after %install (which installs everything in the build root, ready to be turned into actual RPMs), we've used another 8GB. The next part - the clamscan bit - is one of the rpmlint checks SUSE runs, which takes another 9G when it extracts all the built RPMs (including debuginfo RPMs), in order to scan them.</p>
<p>In summary, our build worker VMs currently need a bit over 50G disk to build Ceph.</p>
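<p>The per-phase figures quoted above are just differences between consecutive <code>df</code> readings. A small Python sketch of that arithmetic, using the "Used" column from the table (the names <code>DF_USED_GB</code>, <code>phase_deltas</code> and <code>peak_usage</code> are made up for illustration):</p>

```python
# "Used" column from the df table above, in GB, in build order.
DF_USED_GB = [
    ("start of build", 14),
    ("end of build", 31),
    ("end of install", 39),
    ("before clamscan", 41),
    ("after clamscan", 50),
]


def phase_deltas(samples):
    """Return (label, growth in GB) for each reading after the first."""
    return [
        (label, used - previous_used)
        for (_, previous_used), (label, used) in zip(samples, samples[1:])
    ]


def peak_usage(samples):
    """Return the largest 'Used' value seen across all readings."""
    return max(used for _, used in samples)
```

<p>This gives 17GB for the compile, 8GB for %install, 2GB between the end of %install and the start of clamscan, and 9GB for clamscan, with a peak of 50GB.</p>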
<p>If I add <code>rm -rf build</code> to the very end of the %install section, to get rid of the 17GB of built binaries, we go into clamscan with 24G used, rather than 41G used, and when clamscan finishes we're using 32G. This means the peak build disk usage with that change is about 39G, so we reduce our build worker's disk space requirements by about 11G (or 20%).</p>
Ceph - Bug #53060 (Closed): Unable to load libceph_snappy.so due to undefined symbol _ZTIN6snappy... https://tracker.ceph.com/issues/53060 2021-10-27T09:21:47Z Tim Serong tserong@suse.com
<p>If you try to run Ceph with snappy 1.1.9 installed, <code>ceph status</code> will show HEALTH_WARN, and tell you that your OSDs "have broken BlueStore compression". <code>ceph health detail</code> will tell you that each of your OSDs is "unable to load:snappy". The OSD logs will show something like this:</p>
<pre>
Oct 27 08:55:33 node1 ceph-osd[561817]: load failed dlopen(): "/usr/lib64/ceph/compressor/libceph_snappy.so: undefined symbol: _ZTIN6snappy6SourceE" or "/usr/lib64/ceph/libceph_snappy.so: cannot open shared object file: No such file or directory"
Oct 27 08:55:33 node1 ceph-osd[561817]: create cannot load compressor of type snappy
</pre>
<p>This is because RTTI was disabled in snappy 1.1.9, so the typeinfo for the <code>snappy::Source</code> class - which Ceph's SnappyCompressor creates a subclass of - isn't included in libsnappy.so. Ceph still <em>builds</em> just fine, because the compressors are built as shared libraries. The problem only manifests when our snappy plugin is dlopen()ed at runtime, and then the linker kicks in and can't find that missing symbol.</p>
<p>This would ideally be fixed by getting RTTI re-enabled in snappy, so I've gone ahead and opened <a class="external" href="https://github.com/google/snappy/pull/144">https://github.com/google/snappy/pull/144</a>.</p>
Ceph - Bug #37503 (Resolved): Audit log: mgr module passwords set on CLI written as plaintext in ... https://tracker.ceph.com/issues/37503 2018-12-03T10:57:04Z Tim Serong tserong@suse.com
<p>A number of mgr modules need passwords set for one reason or another, either to authenticate with external systems (deepsea, influx, diskprediction), or to define credentials for users of those modules (dashboard, restful).</p>
<p>In all cases, these passwords are set from the command line, either via module-specific commands (<code>ceph dashboard ac-user-create</code>, <code>deepsea config-set salt_api_password</code>, etc.) or via <code>ceph config set</code> with some particular key (e.g.: mgr/influx/password).</p>
<p>All module-specific commands go through <code>DaemonServer::_handle_command()</code>, which then logs the command via <code>audit_clog->debug()</code> (or <code>audit_clog->info()</code> in case of access denied). This all ends up written to <code>/var/log/ceph/ceph-mgr.$ID.log</code>, which is world-readable, e.g.:</p>
<pre>
2018-12-03 10:45:28.864 7f67e7f8f700 0 log_channel(audit) log [DBG] : from='client.343880 172.16.1.254:39896/3560370796' entity='client.admin' cmd=[{"prefix": "deepsea config-set", "key": "salt_api_password", "value": "foo", "target": ["mgr", ""]}]: dispatch
</pre>
<p>Additionally, anything that results in a "config set" lands in the mon log, e.g.:</p>
<pre>
2018-12-03 10:45:28.881552 [INF] from='mgr.295252 172.16.1.21:56636/175641' entity='mgr.data1' cmd='[{"prefix":"config set","who":"mgr","name":"mgr/deepsea/salt_api_password","value":"foo"}]': finished
</pre>
<p>This also appears in the Audit log in the Dashboard.</p>
<p>Some things that land in the mon log probably don't matter; for any module that hashes passwords before saving them, only the hashed password should land in the mon log. But there's still the problem of the CLI commands in the mgr log, and in any case, modules that need to authenticate with external services will need to store plaintext passwords.</p>
<p>ISTM we need to either never log these things, or somehow keep the command logging, but filter the passwords out, so it renders the value as "*****" instead of the actual password.</p>
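<p>As a rough illustration of the filtering option, here is a minimal Python sketch. It is not the mgr's actual logging code, and the rule it applies (mask <code>value</code> whenever the <code>key</code> or <code>name</code> field contains "password") is an assumption for illustration only. The function name <code>redact_command</code> is likewise made up.</p>

```python
import json

MASK = "*****"


def redact_command(cmd_json, marker="password"):
    """Mask the "value" field of commands whose key/name looks like a password.

    cmd_json is the JSON text of a list of command dicts, in the shape
    shown in the log excerpts above. The matching rule is a hypothetical
    heuristic, not something the existing code does.
    """
    commands = json.loads(cmd_json)
    for cmd in commands:
        label = "{} {}".format(cmd.get("key", ""), cmd.get("name", ""))
        if marker in label.lower() and "value" in cmd:
            cmd["value"] = MASK
    return json.dumps(commands)
```

<p>A heuristic like this would catch the two logged commands shown above, but not commands that pass a password under some other argument name, so it only covers part of the problem.</p>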
<p>I'm not sure how best to approach this, given the way command logging is structured. At the point commands are logged, the commands themselves are just strings. Admittedly, they're strings of JSON, but they're effectively opaque at that point - we'd have to parse the JSON, then look for things that might be passwords, blank them out, and turn the whole lot back into a string. Yuck.</p>
Ceph - Bug #18163 (Resolved): platform.linux_distribution() is deprecated; stop using it https://tracker.ceph.com/issues/18163 2016-12-07T04:10:56Z Tim Serong tserong@suse.com
<p>platform.linux_distribution() is deprecated, so we should stop using it. Notably it uses /etc/SuSE-release on SUSE systems, and the latest SUSE versions don't ship this file; instead they ship /etc/os-release, which platform.linux_distribution() doesn't know about, so it returns ('','','').</p>
<p>AFAICT, platform.linux_distribution() is currently used by ceph-detect-init, which in turn is used by ceph-disk. If ceph-detect-init can't determine the distro because it sees ('','',''), this results in ceph-disk always tagging the init system as sysvinit.</p>
<p>There are also platform.linux_distribution() invocations in qa/workunits/ceph-disk/ceph-disk-no-lockbox and src/ceph-disk/ceph_disk/main.py, but they look like dead code to me.</p>
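<p>For reference, /etc/os-release is a simple list of <code>KEY=value</code> lines with optionally quoted values, so reading it directly is not much work. A minimal Python sketch, assuming nothing about how ceph-detect-init would actually be changed (the function name <code>parse_os_release</code> is made up, and shell-style escapes inside values are not handled):</p>

```python
def parse_os_release(text):
    """Parse os-release style text into a dict of KEY -> value."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        info[key] = value
    return info
```

<p>Typical use would be <code>parse_os_release(open("/etc/os-release").read())</code> followed by a lookup of the <code>ID</code> and <code>VERSION_ID</code> keys. On Python 3.10 and later, <code>platform.freedesktop_os_release()</code> provides the same information from the standard library.</p>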
<p>See also bug <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: platform.linux_distribution() is deprecated; stop using it (Resolved)" href="https://tracker.ceph.com/issues/18141">#18141</a>.</p>
Ceph - Bug #14864 (Resolved): ceph-detect-init requires python-setuptools at runtime https://tracker.ceph.com/issues/14864 2016-02-25T14:58:49Z Tim Serong tserong@suse.com
<p>Testing a reasonably recent ceph-10.0.2 on openSUSE Leap 42.1, my OSDs weren't mounting. I tracked this back to /usr/lib/systemd/system/ceph-disk@.service which invokes <code>flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f</code>. This in turn results in:</p>
<pre>
ceph-disk: main_trigger: Namespace(dev='/dev/sdb1', func=<function main_trigger at 0x7fa6ebf6b050>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
ceph-disk: Running command: /sbin/init --version
/sbin/init: unrecognized option '--version'
ceph-disk: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
ceph-disk: Running command: /usr/sbin/sgdisk -i 1 /dev/sdb
ceph-disk: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
ceph-disk: Running command: /usr/sbin/sgdisk -i 1 /dev/sdb
ceph-disk: trigger /dev/sdb1 parttype 4fbd7e29-9d25-41b8-afd0-062c0ceff05d uuid 93b72ed5-7d84-4b0b-a227-330fcd22513e
ceph-disk: Running command: /usr/sbin/ceph-disk activate /dev/sdb1
Traceback (most recent call last):
File "/usr/bin/ceph-detect-init", line 5, in <module>
from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
ERROR:ceph-disk:Failed to activate
Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 4036, in <module>
main(sys.argv[1:])
File "/usr/sbin/ceph-disk", line 3992, in main
main_catch(args.func, args)
File "/usr/sbin/ceph-disk", line 4014, in main_catch
func(args)
File "/usr/sbin/ceph-disk", line 2530, in main_activate
reactivate=args.reactivate,
File "/usr/sbin/ceph-disk", line 2296, in mount_activate
(osd_id, cluster) = activate(path, activate_key_template, init)
File "/usr/sbin/ceph-disk", line 2477, in activate
init = init_get()
File "/usr/sbin/ceph-disk", line 799, in init_get
'--default', 'sysvinit',
File "/usr/sbin/ceph-disk", line 902, in _check_output
raise error
subprocess.CalledProcessError: Command '/usr/bin/ceph-detect-init' returned non-zero exit status 1
</pre>
<p>The important part is:</p>
<pre>
Traceback (most recent call last):
File "/usr/bin/ceph-detect-init", line 5, in <module>
from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
</pre>
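<p>A quick way to check for this condition outside of ceph-disk is to ask the interpreter whether the module can be found at all. A minimal sketch, written for Python 3 rather than the Python 2 in the traceback above (the function name <code>missing_modules</code> is made up for illustration):</p>

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    Only top-level names are supported here; looking up a dotted name
    can raise if its parent package is itself missing.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

<p>Calling <code>missing_modules(["pkg_resources"])</code> on an affected system would be expected to return a list containing <code>pkg_resources</code>, and an empty list once the package that provides it is installed.</p>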
<p>This is fixable by installing python-setuptools, suggesting that package needs to be added to the RPM Requires and, I assume, the Debian Depends.</p>