Ceph : Issues — https://tracker.ceph.com/ — 2023-06-06
Ceph - Bug #61598 (New): gcc-14: FTBFS "error: call to non-'constexpr' function 'virtual unsigned... — https://tracker.ceph.com/issues/61598 — 2023-06-06 — Tim Serong <tserong@suse.com>
<p>gcc 14 has introduced a change which results in ceph build failures:</p>
<pre>
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/osd/osd_types.h: In lambda function:
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:184:73: error: call to non-'constexpr' function 'virtual unsigned int DoutPrefixProvider::get_subsys() const'
[ 270s] 184 | dout_impl(pdpp->get_cct(), ceph::dout::need_dynamic(pdpp->get_subsys()), v) \
[ 270s] | ~~~~~~~~~~~~~~~~^~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:155:58: note: in definition of macro 'dout_impl'
[ 270s] 155 | return (cctX->_conf->subsys.template should_gather<sub, v>()); \
[ 270s] | ^~~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/osd/osd_types.h:3618:3: note: in expansion of macro 'ldpp_dout'
[ 270s] 3618 | ldpp_dout(dpp, 10) << "build_prior all_probe " << all_probe << dendl;
[ 270s] | ^~~~~~~~~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:51:20: note: 'virtual unsigned int DoutPrefixProvider::get_subsys() const' declared here
[ 270s] 51 | virtual unsigned get_subsys() const = 0;
[ 270s] | ^~~~~~~~~~
</pre>
<p>The gcc change is described at <a class="external" href="https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617196.html">https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617196.html</a>.</p>
<p>The ceph FTBFS was mentioned in a followup post at <a class="external" href="https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618384.html">https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618384.html</a>, and apparently this failure is now expected, as <code>DoutPrefixProvider::get_subsys()</code> isn't declared <code>constexpr</code> but really should be.</p>
<p>I tried to fix this experimentally by simply declaring <code>get_subsys()</code> as <code>constexpr</code>, e.g.:</p>
<pre>
diff --git a/src/common/dout.h b/src/common/dout.h
index a1375fbb910..6e91750708a 100644
--- a/src/common/dout.h
+++ b/src/common/dout.h
@@ -61,7 +61,7 @@ class NoDoutPrefix : public DoutPrefixProvider {
std::ostream& gen_prefix(std::ostream& out) const override { return out; }
CephContext *get_cct() const override { return cct; }
- unsigned get_subsys() const override { return subsys; }
+ constexpr unsigned get_subsys() const override { return subsys; }
};
// a prefix provider with static (const char*) prefix
@@ -88,7 +88,7 @@ class DoutPrefixPipe : public DoutPrefixProvider {
return out;
}
CephContext *get_cct() const override { return dpp.get_cct(); }
- unsigned get_subsys() const override { return dpp.get_subsys(); }
+ constexpr unsigned get_subsys() const override { return dpp.get_subsys(); }
virtual void add_prefix(std::ostream& out) const = 0;
};
</pre>
<p>...but that has some problems:</p>
<p>1) Instead of an outright build failure, I get <code>warning: virtual functions cannot be 'constexpr' before C++20 [-Winvalid-constexpr]</code>. I imagine this is undesirable.<br />2) Even if 1 <em>is</em> acceptable, there are plenty of other subclasses of <code>DoutPrefixProvider</code> which would all <em>also</em> need to have their <code>get_subsys()</code> methods declared <code>constexpr</code> for the build to complete.</p>
<p>TBH the whole <code>dout</code> thing is black magic to me, so I could really use some assistance with how best to fix this.</p>

Ceph - Bug #58501 (Resolved): ceph.spec.in: need to replace SUSE usrmerged macro with version check — https://tracker.ceph.com/issues/58501 — 2023-01-19 — Tim Serong <tserong@suse.com>
<p><a class="external" href="https://github.com/ceph/ceph/commit/e4c4a4ce97fff8a5b4efa747d9cffeabcceedd25">https://github.com/ceph/ceph/commit/e4c4a4ce97fff8a5b4efa747d9cffeabcceedd25</a> introduced the use of the <code>usrmerged</code> macro on SUSE distros to guard against installing the /sbin/mount.ceph symlink. This macro has since been deprecated and should be replaced with a version check instead (<code>%if 0%{?suse_version} < 1550</code>). See <a class="external" href="https://en.opensuse.org/openSUSE:Usr_merge">https://en.opensuse.org/openSUSE:Usr_merge</a> for more details.</p>

Ceph - Bug #57967 (Resolved): ceph-crash service should run as unprivileged user, not root (CVE-2... — https://tracker.ceph.com/issues/57967 — 2022-11-03 — Tim Serong <tserong@suse.com>
<p>As reported at <a class="external" href="https://www.openwall.com/lists/oss-security/2022/10/25/1">https://www.openwall.com/lists/oss-security/2022/10/25/1</a>, ceph-crash runs as root, which leaves it open to a potential ceph-user-to-root privilege escalation. This is fixable by making the ceph-crash process drop privileges and run as the ceph user, just as the other ceph daemons do.</p>

Ceph - Bug #57893 (Pending Backport): make-dist creates ceph.spec with incorrect Release tag for ... — https://tracker.ceph.com/issues/57893 — 2022-10-19 — Tim Serong <tserong@suse.com>
<p><code>ceph.spec.in</code> says:</p>
<pre>
Name: ceph
Version: @PROJECT_VERSION@
Release: @RPM_RELEASE@%{?dist}
%if 0%{?fedora} || 0%{?rhel}
Epoch: 2
%endif
</pre>
<p>When the <code>make-dist</code> script generates the final <code>ceph.spec</code> file for RPM builds, it will set PROJECT_VERSION to the version from the latest tag (e.g.: 17.0.0), and set RPM_RELEASE to the number of additional commits plus the last commit hash (e.g.: 14883.gc49b81c7d61). This doesn't work properly when building in SUSE's Open Build Service, because OBS overwrites the Release tag with checkin and build counters (see <a class="external" href="https://en.opensuse.org/openSUSE:Package_versioning_guidelines">https://en.opensuse.org/openSUSE:Package_versioning_guidelines</a>).</p>
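<p>The version/release split described above can be sketched in shell; the <code>describe</code> string below is an illustrative value, not real output, and the exact mechanics of <code>make-dist</code> may differ:</p>

```shell
# Hedged sketch of the version/release split described above.
describe="v17.0.0-14883-gc49b81c7d61"   # tag, commit count, commit hash
ver="${describe#v}"                     # strip leading "v"
version="${ver%%-*}"                    # -> 17.0.0 (PROJECT_VERSION)
rest="${ver#*-}"                        # -> 14883-gc49b81c7d61
release="${rest%%-*}.g${rest#*-g}"      # -> 14883.gc49b81c7d61 (RPM_RELEASE)
# The SUSE downstream variant folds everything into Version instead,
# so OBS can freely overwrite Release:
suse_version="${version}.${release/./+}"  # -> 17.0.0.14883+gc49b81c7d61
echo "$version $release $suse_version"
```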
<p>We've long carried a downstream patch for <code>make-dist</code> to fix this by putting everything in PROJECT_VERSION, so you end up with something like <code>Version: 17.0.0.14883+gc49b81c7d61</code> (see <a class="external" href="https://github.com/SUSE/ceph/commit/9ee636cdca3">https://github.com/SUSE/ceph/commit/9ee636cdca3</a>). I figure I should really submit that upstream.</p>

Ceph - Bug #57860 (Pending Backport): disable system_pmdk on s390x for SUSE distros — https://tracker.ceph.com/issues/57860 — 2022-10-13 — Tim Serong <tserong@suse.com>
<p>Same as <a class="external" href="https://tracker.ceph.com/issues/56491">https://tracker.ceph.com/issues/56491</a>, which addressed RHEL and Fedora not shipping libpmem on s390x, but for SUSE.</p>

Orchestrator - Bug #57609 (Resolved): applying osd service spec with size filter fails if there's... — https://tracker.ceph.com/issues/57609 — 2022-09-20 — Tim Serong <tserong@suse.com>
<p>This issue came up on a system with a 4KB virtual floppy disk drive.</p>
<p><code>ceph-volume inventory</code> gives:</p>
<pre>
Device Path Size rotates available Model name
/dev/fd0 4.00 KB True False
/dev/sda 50.00 GB True False Virtual disk
/dev/sdb 50.00 GB True False Virtual disk
/dev/sdc 50.00 GB True False Virtual disk
/dev/sdd 50.00 GB True False Virtual disk
</pre>
<p>Doing a simple <code>ceph orch apply osd --all-available-devices</code> works just fine, but service specs utilising size specifiers will fail to apply. For example:</p>
<pre>
service_id: at_least_8g
service_type: osd
placement:
host_pattern: '*'
spec:
data_devices:
size: '8G:'
</pre>
<p>Applying the above will give the following error in <code>ceph log last cephadm</code>:</p>
<pre>
ceph.deployment.drive_group.DriveGroupValidationError: Failed to validate OSD spec "at_least_8g.data_devices": Unit 'KB' not supported
</pre>
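<p>A unit-tolerant parser avoids the <code>Unit 'KB' not supported</code> error above. A minimal Python sketch (illustrative only, with hypothetical names — not cephadm's actual SizeMatcher implementation, and assuming base-2 units as <code>ceph-volume inventory</code> appears to report them):</p>

```python
# Hedged sketch: a size parser that accepts KB alongside MB/GB/TB.
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(text: str) -> int:
    """Parse strings like '4.00 KB' or '50.00 GB' into bytes."""
    number, unit = text.strip().split()
    try:
        factor = _UNITS[unit.upper()]
    except KeyError:
        # Mirrors the error seen in the traceback above.
        raise ValueError(f"Unit '{unit}' not supported") from None
    return int(float(number) * factor)
```

<p>With KB in the table, the 4 KB floppy parses to 4096 bytes and is simply rejected by the <code>8G:</code> filter, instead of blowing up validation for the whole spec.</p>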
<p>The problem is that the SizeMatcher class only understands MB, GB and TB. When presented with a disk whose size is expressed in KB, it doesn't know what to do with it.</p>

Ceph - Bug #57497 (Pending Backport): openSUSE Leap 15.x needs to explicitly specify gcc-11 — https://tracker.ceph.com/issues/57497 — 2022-09-12 — Tim Serong <tserong@suse.com>
<p>This is a recurrence of <a class="external" href="https://tracker.ceph.com/issues/55237">https://tracker.ceph.com/issues/55237</a>. I wrote <a class="external" href="https://github.com/ceph/ceph/commit/80949babab4">https://github.com/ceph/ceph/commit/80949babab4</a> to use gcc-c++ >= 11 on SUSE distros, which works fine on Tumbleweed (our latest and greatest), but doesn't work on openSUSE Leap 15, which has gcc 11, just not packaged in a way that lets that nice neat >= requirement work. So I need to re-instate part of <a class="external" href="https://github.com/ceph/ceph/pull/45845/commits/8ab5d7eea07">https://github.com/ceph/ceph/pull/45845/commits/8ab5d7eea07</a>.</p>

Ceph - Bug #57390 (Pending Backport): denc-mod-osd.so: undefined symbol _ZN4ceph25ErasureCodePlug... — https://tracker.ceph.com/issues/57390 — 2022-09-02 — Tim Serong <tserong@suse.com>
<p>When running <code>ceph-dencoder</code> on openSUSE Tumbleweed (built with GCC 12 and LTO, in case that's relevant), I get the following failure:</p>
<pre>
# ceph-dencoder
failed to dlopen("/usr/lib64/ceph/denc/denc-mod-osd.so"): /usr/lib64/ceph/denc/denc-mod-osd.so: undefined symbol: _ZN4ceph25ErasureCodePluginRegistry9singletonE
-h for help
</pre>
<p>This is fixable by adding "erasure_code" to denc-mod-osd's target_link_libraries.</p>

Ceph - Bug #56658 (Resolved): build: cephfs-shell fails to build/install with python setuptools >... — https://tracker.ceph.com/issues/56658 — 2022-07-21 — Tim Serong <tserong@suse.com>
<p>python setuptools v61 changed package discovery so that if it finds what it thinks are multiple top-level packages in a directory, it will fail to build. This was introduced by <a class="external" href="https://github.com/pypa/setuptools/pull/3177">https://github.com/pypa/setuptools/pull/3177</a>, and causes the ceph RPM build to fail with:</p>
<pre>
...
[ 9562s] error: Multiple top-level packages discovered in a flat-layout: ['top', 'CMakeFiles'].
[ 9562s]
[ 9562s] To avoid accidental inclusion of unwanted files or directories,
[ 9562s] setuptools will not proceed with this build.
[ 9562s]
[ 9562s] If you are trying to create a single distribution with multiple packages
[ 9562s] on purpose, you should not rely on automatic discovery.
[ 9562s] Instead, consider the following options:
[ 9562s]
[ 9562s] 1. set up custom discovery (`find` directive with `include` or `exclude`)
[ 9562s] 2. use a `src-layout`
[ 9562s] 3. explicitly set `py_modules` or `packages` with a list of names
[ 9562s]
[ 9562s] To find more information, look for "package discovery" on setuptools docs.
...
[ 9833s] RPM build errors:
[ 9833s] File not found: /home/abuild/rpmbuild/BUILDROOT/ceph-16.2.9.158+gd93952c7eea-2.3.x86_64/usr/lib/python3.10/site-packages/cephfs_shell-*.egg-info
[ 9833s] File not found: /home/abuild/rpmbuild/BUILDROOT/ceph-16.2.9.158+gd93952c7eea-2.3.x86_64/usr/bin/cephfs-shell
</pre>
<p>This has been fixed in Fedora downstream by moving src/tools/cephfs/cephfs-shell to a separate subdirectory (see <a class="external" href="https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/0021-cephfs-shell.patch">https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/0021-cephfs-shell.patch</a>). I've confirmed this approach also works for openSUSE.</p>

Ceph - Bug #56466 (Resolved): pacific: boost 1.73.0 is incompatible with python 3.10 — https://tracker.ceph.com/issues/56466 — 2022-07-05 — Tim Serong <tserong@suse.com>
<p>Ceph pacific includes boost 1.73.0, which uses the <code>_Py_fopen()</code> function, which is no longer available in python 3.10. This means it's not possible to build ceph pacific RPMs against python 3.10. Builds will fail with:</p>
<pre>[ 182s] libs/python/src/exec.cpp: In function 'boost::python::api::object boost::python::exec_file(const char*, api::object, api::object)':
[ 182s] libs/python/src/exec.cpp:109:14: error: '_Py_fopen' was not declared in this scope; did you mean '_Py_wfopen'?
[ 182s] 109 | FILE *fs = _Py_fopen(f, "r");
[ 182s] | ^~~~~~~~~
[ 182s] | _Py_wfopen
</pre>
<p>This is not a problem with quincy or newer, as those use boost 1.75.0, which includes a patch that switches to using fopen() for python versions >= 3.10.</p>

sepia - Support #55535 (Resolved): Sepia Lab Access Request — https://tracker.ceph.com/issues/55535 — 2022-05-04 — Tim Serong <tserong@suse.com>
<p>1) Do you just need VPN access or will you also be running teuthology jobs?</p>
<p>Both</p>
<p>2) Desired Username:</p>
<p>tserong</p>
<p>3) Alternate e-mail address(es) we can reach you at:</p>
<p><a class="email" href="mailto:tserong@suse.com">tserong@suse.com</a></p>
<p>4) If you don't already have an established history of code contributions to Ceph, is there an existing community or core developer you've worked with who has reviewed your work and can vouch for your access request?</p>
<p style="padding-left:2em;">If you answered "No" to # 4, please answer the following (paste directly below the question to keep indentation):</p>
<p style="padding-left:2em;">4a) Paste a link to a Blueprint or planning doc of yours that was reviewed at a Ceph Developer Monthly.</p>
<p style="padding-left:2em;">4b) Paste a link to an accepted pull request for a major patch or feature.</p>
<p style="padding-left:2em;">4c) If applicable, include a link to the current project (planning doc, dev branch, or pull request) that you are looking to test.</p>
<p>5) Paste your SSH public key(s) between the <code>pre</code> tags<br /><pre>ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAs6Qm1chlR0DVvDfkjqm7t+mjmXS/AkMBblcruV1qFTOYi4zkryJiqIUe/41gmDlMi4Nqa7dzgAoS/OPMCIXFE+XxyuhggvAa87YxDZCkE5XjZ54jbbSdRYn/xa/S3yLLAXhd2J8hfcBj2qmmosV/iT0pWQm5lXD7fQBqLiSt/5DyLzOFVL45oDYy2mN2u5VWBWrKXMPzCF2HQQWv4gISLBY1gF6WozNDChOrmujK6buMjdA29IbrjTZ1YykngeK8fCAxPTaXWdDxbY5fxPF1dw6xAWz3dvtre2OIcm1Z5Au9yuiMCecA5nb+OxFho8zOhwv1+bQmqpcM4kfjWYb33Q== tserong@suse.com</pre></p>
<p>6) Paste your hashed VPN credentials between the <code>pre</code> tags (Format: <code>user@hostname 22CharacterSalt 65CharacterHashedPassword</code>)<br /><pre>tserong@thor VhPmkUKJdgNE/RJBN0nZvA 11671c34eb25c5fdff7f0002de197c1e72922a0e2122901d062a6d3ed758a9cf</pre></p>

Ceph - Bug #55237 (Resolved): rpm: openSUSE build fails - needs explicit gcc version, also can't ... — https://tracker.ceph.com/issues/55237 — 2022-04-08 — Tim Serong <tserong@suse.com>
<p>Two issues here which are strictly speaking unrelated, but I thought it'd be less annoying to just fix the openSUSE build with one bug.</p>
<p>Issue 1: openSUSE Leap 15.3 and 15.4 use gcc 7 by default, which is not new enough to build ceph. Both distros do provide gcc 11, but we have to explicitly request that version if we want to use it.</p>
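<p>For issue 1, the change amounts to a conditional in <code>ceph.spec.in</code> that requests the versioned compiler on those distros. A hedged sketch only — the merged fix may differ, and the version guard is illustrative (<code>gcc11-c++</code> is openSUSE's package name for the versioned C++ compiler; Leap 15.x reports <code>suse_version</code> 1500):</p>
<pre>
%if 0%{?suse_version} && 0%{?suse_version} < 1599
BuildRequires:  gcc11-c++
%else
BuildRequires:  gcc-c++ >= 11
%endif
</pre>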
<p>Issue 2: Parquet, which in turn requires Arrow, can't currently be built for openSUSE. The problem here is that we don't have those dependencies packaged as RPMs, and when trying to build Arrow out of the submodule in the ceph source tree, one of its dependencies (xsimd) tries to download source from the internet, which doesn't work in the openSUSE Build Service (build workers have no internet access).</p>

Ceph - Bug #55087 (Resolved): rpm: openSUSE needs libthrift-devel, not thrift-devel — https://tracker.ceph.com/issues/55087 — 2022-03-28 — Tim Serong <tserong@suse.com>
<p>In <a class="external" href="https://github.com/ceph/ceph/pull/38783">https://github.com/ceph/ceph/pull/38783</a>, <a class="external" href="https://github.com/ceph/ceph/pull/38783/commits/80e82686eba">https://github.com/ceph/ceph/pull/38783/commits/80e82686eba</a> added "thrift-devel >= 0.13.0" as a BuildRequires. On SUSE distros, this package is named libthrift-devel, so we need an <code>%if 0%{?suse_version}</code> block around that one.</p>

Ceph - Bug #55079 (Pending Backport): rpm: remove contents of build directory at end of %install ... — https://tracker.ceph.com/issues/55079 — 2022-03-28 — Tim Serong <tserong@suse.com>
<p>I've been doing some measurements of disk usage during SUSE RPM builds (of Pacific, but this should roughly apply for newer Cephs too). In our particular build environment, which builds everything in VMs, we see something like this:</p>
<pre>
Filesystem Size Used Avail Use% Mounted on
df start of build: /dev/vda 53G 14G 40G 25% /
df end of build: /dev/vda 53G 31G 23G 58% /
df end of install: /dev/vda 53G 39G 15G 74% /
df before clamscan: /dev/vda 53G 41G 13G 78% /
df after clamscan: /dev/vda 53G 50G 3.9G 93% /
</pre>
<p>So after compiling everything, we've consumed about 17GB (that's all the binaries and object files and whatnot that end up in the "build" directory in the source tree). Then, after %install (which installs everything in the build root, ready to be turned into actual RPMs), we've used another 8GB. The next part - the clamscan bit - is one of the rpmlint checks SUSE runs, which takes another 9G when it extracts all the built RPMs (including debuginfo RPMs), in order to scan them.</p>
<p>In summary, our build worker VMs currently need a bit over 50G disk to build Ceph.</p>
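<p>The per-phase figures quoted above can be cross-checked directly against the <code>df</code> table:</p>

```python
# Cross-check of the df figures above (GB used, as reported).
start, end_build, end_install, pre_clam, post_clam = 14, 31, 39, 41, 50
build_tree   = end_build - start        # binaries/objects in "build"
install_root = end_install - end_build  # staged files in the buildroot
clamscan_tmp = post_clam - pre_clam     # RPMs extracted for scanning
assert (build_tree, install_root, clamscan_tmp) == (17, 8, 9)
```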
<p>If I add <code>rm -rf build</code> to the very end of the %install section, to get rid of the 17GB of built binaries, we go into clamscan using 24G rather than 41G, and when clamscan finishes we're using 32G. This means the peak build disk usage with that change is about 39G, so we reduce our build worker's disk space requirements by about 11G (or 20%).</p>

Dashboard - Bug #54215 (Resolved): mgr/dashboard: "Please expand your cluster first" shouldn't be... — https://tracker.ceph.com/issues/54215 — 2022-02-09 — Tim Serong <tserong@suse.com>
<a name="Description-of-problem"></a>
<h3 >Description of problem<a href="#Description-of-problem" class="wiki-anchor">¶</a></h3>
<p><a class="external" href="https://tracker.ceph.com/issues/50336">https://tracker.ceph.com/issues/50336</a> introduced a neat "Create Cluster" workflow to help set up new clusters. When you first log in to the dashboard you're prompted to expand the cluster (or skip that step). IMO this screen should not be present at all for clusters that have already been "expanded": for example, if I've already used <code>ceph orch</code> to create OSDs, add hosts, etc., this step is redundant. I shouldn't have to click "skip"; it should just not be there in the first place. Likewise if I'm upgrading from an earlier (pre-Pacific) release.</p>
<a name="Environment"></a>
<h3 >Environment<a href="#Environment" class="wiki-anchor">¶</a></h3>
<ul>
<li><code>ceph version</code> string: 16.2.7-37-gb3be69440db</li>
<li>Platform (OS/distro/release): SUSE Linux Enterprise Server 15 SP3</li>
<li>Cluster details (nodes, monitors, OSDs): 4 nodes, 3 mons, 8 OSDs</li>
<li>Did it happen on a stable environment or after a migration/upgrade?: seen after an upgrade from Octopus</li>
<li>Browser used (e.g.: <code>Version 86.0.4240.198 (Official Build) (64-bit)</code>): Firefox 96.0.1 64 bit</li>
</ul>
<a name="How-reproducible"></a>
<h3 >How reproducible<a href="#How-reproducible" class="wiki-anchor">¶</a></h3>
<p>Steps:</p>
<ol>
<li>Deploy Octopus</li>
<li>Configure the cluster (add some hosts, OSDs etc.)</li>
<li>Upgrade to Pacific</li>
<li>Log in to the dashboard</li>
</ol>
<p>or:</p>
<ol>
<li>Deploy Pacific</li>
<li>Use <code>ceph orch</code> to add hosts, deploy OSDs, ...</li>
<li>Log in to the dashboard</li>
</ol>
<a name="Actual-results"></a>
<h3 >Actual results<a href="#Actual-results" class="wiki-anchor">¶</a></h3>
<p>I see a screen that says "Welcome to Ceph Dashboard - Please expand your cluster first"</p>
<a name="Expected-results"></a>
<h3 >Expected results<a href="#Expected-results" class="wiki-anchor">¶</a></h3>
<p>I see the regular status screen</p>
<a name="Additional-info"></a>
<h3 >Additional info<a href="#Additional-info" class="wiki-anchor">¶</a></h3>
<p>Would it be enough to add a check to see if there's > 1 node and/or > 0 OSDs, and in this case assume we don't need to show this screen?</p>
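<p>That check might look like the following sketch (hypothetical helper and thresholds, not the actual mgr/dashboard code): treat the cluster as "fresh" only when there is at most one host and no OSDs.</p>

```python
# Hedged sketch of the check proposed above: skip the
# "expand cluster" screen when the cluster already looks configured.
def cluster_needs_expansion(host_count: int, osd_count: int) -> bool:
    """Show the expand-cluster screen only for a fresh, empty cluster."""
    return host_count <= 1 and osd_count == 0
```

<p>With this rule, a freshly deployed single-node cluster still gets the workflow, while an upgraded or already-expanded cluster (like the 4-node, 8-OSD one above) goes straight to the regular status screen.</p>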