Ceph: Issues
https://tracker.ceph.com/ 2023-06-06T10:51:16Z Ceph (Redmine)
Ceph - Bug #61598 (New): gcc-14: FTBFS "error: call to non-'constexpr' function 'virtual unsigned...
https://tracker.ceph.com/issues/61598 2023-06-06T10:51:16Z Tim Serong tserong@suse.com
<p>gcc 14 has introduced a change which results in ceph build failures:</p>
<pre>
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/osd/osd_types.h: In lambda function:
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:184:73: error: call to non-'constexpr' function 'virtual unsigned int DoutPrefixProvider::get_subsys() const'
[ 270s] 184 | dout_impl(pdpp->get_cct(), ceph::dout::need_dynamic(pdpp->get_subsys()), v) \
[ 270s] | ~~~~~~~~~~~~~~~~^~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:155:58: note: in definition of macro 'dout_impl'
[ 270s] 155 | return (cctX->_conf->subsys.template should_gather<sub, v>()); \
[ 270s] | ^~~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/osd/osd_types.h:3618:3: note: in expansion of macro 'ldpp_dout'
[ 270s] 3618 | ldpp_dout(dpp, 10) << "build_prior all_probe " << all_probe << dendl;
[ 270s] | ^~~~~~~~~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:51:20: note: 'virtual unsigned int DoutPrefixProvider::get_subsys() const' declared here
[ 270s] 51 | virtual unsigned get_subsys() const = 0;
[ 270s] | ^~~~~~~~~~
</pre>
<p>The gcc change is described at <a class="external" href="https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617196.html">https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617196.html</a>.</p>
<p>The ceph FTBFS was mentioned in a follow-up post at <a class="external" href="https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618384.html">https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618384.html</a>, and apparently this failure is now expected, as <code>DoutPrefixProvider::get_subsys()</code> isn't declared <code>constexpr</code> but really should be.</p>
<p>I tried to fix this experimentally by simply declaring <code>constexpr get_subsys()</code>, e.g.:</p>
<pre>
diff --git a/src/common/dout.h b/src/common/dout.h
index a1375fbb910..6e91750708a 100644
--- a/src/common/dout.h
+++ b/src/common/dout.h
@@ -61,7 +61,7 @@ class NoDoutPrefix : public DoutPrefixProvider {
std::ostream& gen_prefix(std::ostream& out) const override { return out; }
CephContext *get_cct() const override { return cct; }
- unsigned get_subsys() const override { return subsys; }
+ constexpr unsigned get_subsys() const override { return subsys; }
};
// a prefix provider with static (const char*) prefix
@@ -88,7 +88,7 @@ class DoutPrefixPipe : public DoutPrefixProvider {
return out;
}
CephContext *get_cct() const override { return dpp.get_cct(); }
- unsigned get_subsys() const override { return dpp.get_subsys(); }
+ constexpr unsigned get_subsys() const override { return dpp.get_subsys(); }
virtual void add_prefix(std::ostream& out) const = 0;
};
</pre>
<p>...but that has some problems:</p>
<p>1) Instead of an outright build failure, I get <code>warning: virtual functions cannot be 'constexpr' before C++20 [-Winvalid-constexpr]</code>. I imagine this is undesirable.<br />2) Even if 1 <em>is</em> acceptable, there are plenty of other subclasses of <code>DoutPrefixProvider</code> which would all <em>also</em> need to have their <code>get_subsys()</code> methods declared <code>constexpr</code> for the build to complete.</p>
<p>TBH the whole <code>dout</code> thing is black magic to me, so I could really use some assistance with how best to fix this.</p>
Ceph - Bug #58501 (Resolved): ceph.spec.in: need to replace SUSE usrmerged macro with version check
https://tracker.ceph.com/issues/58501 2023-01-19T07:23:05Z Tim Serong tserong@suse.com
<p><a class="external" href="https://github.com/ceph/ceph/commit/e4c4a4ce97fff8a5b4efa747d9cffeabcceedd25">https://github.com/ceph/ceph/commit/e4c4a4ce97fff8a5b4efa747d9cffeabcceedd25</a> introduced the use of the <code>usrmerged</code> macro on SUSE distros to guard against installing the /sbin/mount.ceph symlink. This macro has since been deprecated and should be replaced with a version check instead (<code>%if 0%{?suse_version} < 1550</code>). See <a class="external" href="https://en.opensuse.org/openSUSE:Usr_merge">https://en.opensuse.org/openSUSE:Usr_merge</a> for more details.</p>
Ceph - Bug #57967 (Resolved): ceph-crash service should run as unprivileged user, not root (CVE-2...
https://tracker.ceph.com/issues/57967 2022-11-03T05:11:53Z Tim Serong tserong@suse.com
<p>As reported at <a class="external" href="https://www.openwall.com/lists/oss-security/2022/10/25/1">https://www.openwall.com/lists/oss-security/2022/10/25/1</a>, ceph-crash runs as root, which makes it vulnerable to a potential ceph user to root privilege escalation. This is fixable by making the ceph-crash process drop privileges and run as the ceph user, just as the other ceph daemons do.</p>
Ceph - Bug #57893 (Pending Backport): make-dist creates ceph.spec with incorrect Release tag for ...
https://tracker.ceph.com/issues/57893 2022-10-19T08:04:36Z Tim Serong tserong@suse.com
<p><code>ceph.spec.in</code> says:</p>
<pre>
Name: ceph
Version: @PROJECT_VERSION@
Release: @RPM_RELEASE@%{?dist}
%if 0%{?fedora} || 0%{?rhel}
Epoch: 2
%endif
</pre>
<p>When the <code>make-dist</code> script generates the final <code>ceph.spec</code> file for RPM builds, it will set PROJECT_VERSION to the version from the latest tag (e.g.: 17.0.0), and set RPM_RELEASE to the number of additional commits plus the last commit hash (e.g.: 14883.gc49b81c7d61). This doesn't work properly when building in SUSE's Open Build Service, because OBS overwrites the Release tag with checkin and build counters (see <a class="external" href="https://en.opensuse.org/openSUSE:Package_versioning_guidelines">https://en.opensuse.org/openSUSE:Package_versioning_guidelines</a>).</p>
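<p>The version/release split described above can be sketched in a few lines of Python. This is only an illustration, not the actual <code>make-dist</code> logic: it assumes the input looks like <code>git describe --long</code> output (<code>v17.0.0-14883-gc49b81c7d61</code>), and the function names are made up for the example. The second helper shows a single-field form that does not depend on the Release tag at all.</p>

```python
import re

# Assumed input shape (git describe --long style): v<version>-<commits>-g<hash>
DESCRIBE_RE = re.compile(
    r"^v?(?P<version>\d+(?:\.\d+)*)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)$"
)


def split_describe(describe):
    """Return (version, release), e.g. ('17.0.0', '14883.gc49b81c7d61')."""
    m = DESCRIBE_RE.match(describe)
    if m is None:
        raise ValueError("unexpected describe string: %r" % describe)
    # Release is "<commits since tag>.g<short hash>"
    return m["version"], "%s.g%s" % (m["commits"], m["sha"])


def combined_version(describe):
    """Fold everything into one version string, leaving Release free for OBS."""
    version, release = split_describe(describe)
    commits, sha = release.split(".", 1)
    return "%s.%s+%s" % (version, commits, sha)
```

<p>With the example values from this report, the first helper yields a version of <code>17.0.0</code> and a release of <code>14883.gc49b81c7d61</code>; the second folds both into one string.</p>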
<p>We've long carried a downstream patch for <code>make-dist</code> to fix this, by putting everything in PROJECT_VERSION, so you end up with something like <code>Version: 17.0.0.14883+gc49b81c7d61</code> (see <a class="external" href="https://github.com/SUSE/ceph/commit/9ee636cdca3">https://github.com/SUSE/ceph/commit/9ee636cdca3</a>), so I figure I should really submit that upstream.</p>
Ceph - Bug #57860 (Pending Backport): disable system_pmdk on s390x for SUSE distros
https://tracker.ceph.com/issues/57860 2022-10-13T04:28:31Z Tim Serong tserong@suse.com
<p>Same as <a class="external" href="https://tracker.ceph.com/issues/56491">https://tracker.ceph.com/issues/56491</a> which addressed RHEL and Fedora not shipping libpmem on s390x, but for SUSE.</p>
Ceph - Bug #57497 (Pending Backport): openSUSE Leap 15.x needs to explicitly specify gcc-11
https://tracker.ceph.com/issues/57497 2022-09-12T01:06:36Z Tim Serong tserong@suse.com
<p>This is a recurrence of <a class="external" href="https://tracker.ceph.com/issues/55237">https://tracker.ceph.com/issues/55237</a>. I wrote <a class="external" href="https://github.com/ceph/ceph/commit/80949babab4">https://github.com/ceph/ceph/commit/80949babab4</a> to use gcc-c++ >= 11 on SUSE distros, which works fine on Tumbleweed (our latest and greatest), but doesn't work on openSUSE Leap 15, which has gcc 11, but not packaged in a way in which that nice neat >= requirement works. So I need to reinstate part of <a class="external" href="https://github.com/ceph/ceph/pull/45845/commits/8ab5d7eea07">https://github.com/ceph/ceph/pull/45845/commits/8ab5d7eea07</a>.</p>
Ceph - Bug #57390 (Pending Backport): denc-mod-osd.so: undefined symbol _ZN4ceph25ErasureCodePlug...
https://tracker.ceph.com/issues/57390 2022-09-02T08:42:22Z Tim Serong tserong@suse.com
<p>When running <code>ceph-dencoder</code> on openSUSE Tumbleweed (built with GCC 12 and LTO, in case that's relevant), I get the following failure:</p>
<pre>
# ceph-dencoder
failed to dlopen("/usr/lib64/ceph/denc/denc-mod-osd.so"): /usr/lib64/ceph/denc/denc-mod-osd.so: undefined symbol: _ZN4ceph25ErasureCodePluginRegistry9singletonE
-h for help
</pre>
<p>This is fixable by adding "erasure_code" to denc-mod-osd's target_link_libraries.</p>
Ceph - Bug #56658 (Resolved): build: cephfs-shell fails to build/install with python setuptools >...
https://tracker.ceph.com/issues/56658 2022-07-21T07:56:14Z Tim Serong tserong@suse.com
<p>python setuptools v61 changed package discovery so that if it finds what it thinks are multiple top-level packages in a directory, it will fail to build. This was introduced by <a class="external" href="https://github.com/pypa/setuptools/pull/3177">https://github.com/pypa/setuptools/pull/3177</a>, and causes the ceph RPM build to fail with:</p>
<pre>
...
[ 9562s] error: Multiple top-level packages discovered in a flat-layout: ['top', 'CMakeFiles'].
[ 9562s]
[ 9562s] To avoid accidental inclusion of unwanted files or directories,
[ 9562s] setuptools will not proceed with this build.
[ 9562s]
[ 9562s] If you are trying to create a single distribution with multiple packages
[ 9562s] on purpose, you should not rely on automatic discovery.
[ 9562s] Instead, consider the following options:
[ 9562s]
[ 9562s] 1. set up custom discovery (`find` directive with `include` or `exclude`)
[ 9562s] 2. use a `src-layout`
[ 9562s] 3. explicitly set `py_modules` or `packages` with a list of names
[ 9562s]
[ 9562s] To find more information, look for "package discovery" on setuptools docs.
...
[ 9833s] RPM build errors:
[ 9833s] File not found: /home/abuild/rpmbuild/BUILDROOT/ceph-16.2.9.158+gd93952c7eea-2.3.x86_64/usr/lib/python3.10/site-packages/cephfs_shell-*.egg-info
[ 9833s] File not found: /home/abuild/rpmbuild/BUILDROOT/ceph-16.2.9.158+gd93952c7eea-2.3.x86_64/usr/bin/cephfs-shell
</pre>
<p>This has been fixed in Fedora downstream by moving src/tools/cephfs/cephfs-shell into a separate subdirectory (see <a class="external" href="https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/0021-cephfs-shell.patch">https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/0021-cephfs-shell.patch</a>). I've confirmed this approach also works for openSUSE.</p>
Ceph - Bug #55079 (Pending Backport): rpm: remove contents of build directory at end of %install ...
https://tracker.ceph.com/issues/55079 2022-03-28T04:02:45Z Tim Serong tserong@suse.com
<p>I've been doing some measurements of disk usage during SUSE RPM builds (of Pacific, but this should roughly apply for newer Cephs too). In our particular build environment, which builds everything in VMs, we see something like this:</p>
<pre>
Filesystem Size Used Avail Use% Mounted on
df start of build: /dev/vda 53G 14G 40G 25% /
df end of build: /dev/vda 53G 31G 23G 58% /
df end of install: /dev/vda 53G 39G 15G 74% /
df before clamscan: /dev/vda 53G 41G 13G 78% /
df after clamscan: /dev/vda 53G 50G 3.9G 93% /
</pre>
<p>So after compiling everything, we've consumed about 17GB (that's all the binaries and object files and whatnot that end up in the "build" directory in the source tree). Then, after %install (which installs everything in the build root, ready to be turned into actual RPMs), we've used another 8GB. The next part - the clamscan bit - is one of the rpmlint checks SUSE runs, which takes another 9G when it extracts all the built RPMs (including debuginfo RPMs), in order to scan them.</p>
<p>In summary, our build worker VMs currently need a bit over 50G disk to build Ceph.</p>
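<p>The figures above can be cross-checked with a short Python sketch. It works only from the rounded "Used" column of the <code>df</code> output, so results may be off by a gigabyte or so from real measurements. The projection at the end is an assumption for illustration: it supposes the build directory accounts for all growth during compilation and is deleted at the very end of %install, so only the stages after that point shrink.</p>

```python
# "Used" column from the df output above, in GB, in build order.
USED_GB = {
    "start of build": 14,
    "end of build": 31,
    "end of install": 39,
    "before clamscan": 41,
    "after clamscan": 50,
}


def stage_growth(used):
    """Map each stage to how much disk usage grew since the previous stage."""
    stages = list(used)
    return {
        later: used[later] - used[earlier]
        for earlier, later in zip(stages, stages[1:])
    }


growth = stage_growth(USED_GB)
build_dir_gb = growth["end of build"]   # space taken by compiling
peak_gb = max(USED_GB.values())         # current peak usage

# Hypothetical projection: the build directory is removed once %install is
# done, so the peak at "end of install" is still reached, but every later
# stage is smaller by the size of the build directory.
projected = dict(USED_GB)
for stage in ("before clamscan", "after clamscan"):
    projected[stage] -= build_dir_gb
projected_peak_gb = max(projected.values())
saved_gb = peak_gb - projected_peak_gb
```

<p>Here <code>growth</code> holds the per-stage increases, and <code>saved_gb</code> is the difference between the current and projected peaks.</p>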
<p>If I add <code>rm -rf build</code> to the very end of the %install section, to get rid of the 17GB of built binaries, we go into clamscan with 24G used, rather than 41G used, and when clamscan finishes we're using 32G. This means the peak build disk usage with that change is about 39G, so we reduce our build worker's disk space requirements by about 11G (or 20%).</p>
Orchestrator - Feature #45996 (New): adopted prometheus instance uses port 9095, regardless of or...
https://tracker.ceph.com/issues/45996 2020-06-15T11:13:22Z Tim Serong tserong@suse.com
<p>When adopting prometheus (<code>cephadm adopt --style legacy --name prometheus.HOSTNAME</code>), the new prometheus daemon starts listening on port 9095, regardless of what port the original daemon was running on. This is a problem for upgrades, as if you have an existing grafana instance it will still be looking at the old prometheus port number.</p>
Ceph - Bug #37503 (Resolved): Audit log: mgr module passwords set on CLI written as plaintext in ...
https://tracker.ceph.com/issues/37503 2018-12-03T10:57:04Z Tim Serong tserong@suse.com
<p>A number of mgr modules need passwords set for one reason or another, either to authenticate with external systems (deepsea, influx, diskprediction), or to define credentials for users of those modules (dashboard, restful).</p>
<p>In all cases, these passwords are set from the command line, either via module-specific commands (<code>ceph dashboard ac-user-create</code>, <code>deepsea config-set salt_api_password</code>, etc.) or via <code>ceph config set</code> with some particular key (e.g.: mgr/influx/password).</p>
<p>All module-specific commands go through <code>DaemonServer::_handle_command()</code>, which then logs the command via <code>audit_clog->debug()</code> (or <code>audit_clog->info()</code> in case of access denied). This all ends up written to <code>/var/log/ceph/ceph-mgr.$ID.log</code>, which is world-readable, e.g.:</p>
<pre>
2018-12-03 10:45:28.864 7f67e7f8f700 0 log_channel(audit) log [DBG] : from='client.343880 172.16.1.254:39896/3560370796' entity='client.admin' cmd=[{"prefix": "deepsea config-set", "key": "salt_api_password", "value": "foo", "target": ["mgr", ""]}]: dispatch
</pre>
<p>Additionally, anything that results in a "config set" lands in the mon log, e.g.:</p>
<pre>
2018-12-03 10:45:28.881552 [INF] from='mgr.295252 172.16.1.21:56636/175641' entity='mgr.data1' cmd='[{"prefix":"config set","who":"mgr","name":"mgr/deepsea/salt_api_password","value":"foo"}]': finished
</pre>
<p>This also appears in the Audit log in the Dashboard.</p>
<p>Some things that land in the mon log probably don't matter; for any module that hashes passwords before saving them, only the hashed password should land in the mon log. But there's still the problem of the CLI commands in the mgr log, and in any case, modules that need to authenticate with external services will need to store plaintext passwords.</p>
<p>ISTM we need to either never log these things, or somehow keep the command logging, but filter the passwords out, so it renders the value as "*****" instead of the actual password.</p>
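<p>To make the filtering option concrete, here is a rough Python sketch of masking password values in a logged command. It is illustrative only: the real logging happens in C++ in the mgr and mon, the function names and the list of "sensitive" hints are invented for this example, and it assumes the command is a JSON list of objects shaped like the log excerpts above.</p>

```python
import json

# Assumption: substrings that mark a key or option name as sensitive.
SENSITIVE_HINTS = ("password", "passwd", "secret")
MASK = "*****"


def looks_sensitive(text):
    """True if text is a string containing one of the sensitive hints."""
    return isinstance(text, str) and any(
        hint in text.lower() for hint in SENSITIVE_HINTS
    )


def redact_command(cmd_json):
    """Parse a JSON command string, mask password values, re-serialize."""
    commands = json.loads(cmd_json)
    for cmd in commands:
        if not isinstance(cmd, dict):
            continue
        # "config-set"/"config set" style: the secret sits in "value" and
        # the sensitive name sits in "key" or "name".
        if "value" in cmd and (
            looks_sensitive(cmd.get("key")) or looks_sensitive(cmd.get("name"))
        ):
            cmd["value"] = MASK
        # Commands that carry the secret under a sensitively named field.
        for field in cmd:
            if looks_sensitive(field):
                cmd[field] = MASK
    return json.dumps(commands)
```

<p>Applied to the <code>deepsea config-set</code> command shown earlier, this would leave the prefix and key intact and replace only the value with <code>*****</code>. A name-based heuristic like this will miss secrets stored under unrelated key names, so it is a starting point rather than a guarantee.</p>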
<p>I'm not sure how best to approach this, given the way command logging is structured. At the point commands are logged, the commands themselves are just strings. Admittedly, they're strings of JSON, but they're effectively opaque at that point - we'd have to parse the JSON, then look for things that might be passwords, blank them out, and turn the whole lot back into a string. Yuck.</p>
mgr - Bug #37377 (New): ceph-mgr/influx: verify "no metadata" fix is complete
https://tracker.ceph.com/issues/37377 2018-11-23T10:11:07Z Tim Serong tserong@suse.com
<p>Seen while reviewing <a class="external" href="https://github.com/ceph/ceph/pull/25184">https://github.com/ceph/ceph/pull/25184</a>. The fix for <a class="external" href="http://tracker.ceph.com/issues/25191">http://tracker.ceph.com/issues/25191</a> in <a class="external" href="https://github.com/ceph/ceph/pull/22794">https://github.com/ceph/ceph/pull/22794</a> is applied to the get_pg_summary() function, but not to the get_daemon_stats() function. We need to verify whether this fix should also be applied to the latter function (my guess is "yes", but I don't know for certain).</p>
Ceph - Bug #35906 (Resolved): ceph-disk: is_mounted() returns None for mounted OSDs with Python 3
https://tracker.ceph.com/issues/35906 2018-09-10T11:10:49Z Tim Serong tserong@suse.com
<p><code>ceph-disk list --format=json</code> on Python 3 gives null for the <code>mount</code> member, even for mounted OSDs, e.g.:</p>
<pre>
# ceph-disk list --format=json|json_pp
...
{
"path" : "/dev/vdg",
"partitions" : [
{
"whoami" : "23",
"is_partition" : true,
"path" : "/dev/vdg1",
"ceph_fsid" : "00296336-7bf2-43f1-a48c-24c7212bf478",
"dmcrypt" : {},
"uuid" : "b447f027-f116-47d0-9cd1-ca2348e8e3db",
"block_uuid" : "dfaf6613-f958-497a-9dfb-ad343e897639",
"block_dev" : "/dev/vdg2",
"type" : "data",
*"mount" : null,*
"ptype" : "4fbd7e29-9d25-41b8-afd0-062c0ceff05d",
"magic" : "ceph osd volume v026",
"cluster" : "ceph",
"state" : "prepared",
"fs_type" : "xfs"
},
{
"type" : "block",
"is_partition" : true,
"path" : "/dev/vdg2",
"ptype" : "cafecafe-9b03-4f30-b4c6-b4b80ceff106",
"block_for" : "/dev/vdg1",
"dmcrypt" : {},
"uuid" : "dfaf6613-f958-497a-9dfb-ad343e897639"
}
]
}
...
</pre>
Ceph - Bug #18163 (Resolved): platform.linux_distribution() is deprecated; stop using it
https://tracker.ceph.com/issues/18163 2016-12-07T04:10:56Z Tim Serong tserong@suse.com
<p>platform.linux_distribution() is deprecated, so we should stop using it. Notably it uses /etc/SuSE-release on SUSE systems, and the latest SUSE versions don't ship this file; instead they ship /etc/os-release, which platform.linux_distribution() doesn't know about, so it returns ('','','').</p>
<p>AFAICT, platform.linux_distribution() is currently used by ceph-detect-init, which in turn is used by ceph-disk. If ceph-detect-init can't determine the distro because it sees ('','',''), this results in ceph-disk always tagging the init system as sysvinit.</p>
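<p>As a rough idea of what reading /etc/os-release directly could look like, here is a minimal Python sketch. It is not the code used by ceph-detect-init; the function names are invented for illustration, and it assumes the file follows the usual <code>KEY=value</code> format with optionally shell-quoted values and fields such as <code>ID</code> and <code>VERSION_ID</code>.</p>

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted; shlex.split removes the quoting.
        parts = shlex.split(raw)
        info[key] = parts[0] if parts else ""
    return info


def read_os_release(path="/etc/os-release"):
    """Read and parse the os-release file at the given path."""
    with open(path, encoding="utf-8") as f:
        return parse_os_release(f.read())


def distro_id_and_version(info):
    """Return (ID, VERSION_ID), using empty strings for missing fields."""
    return info.get("ID", ""), info.get("VERSION_ID", "")
```

<p>A caller could then pick the init system based on the returned <code>ID</code> rather than falling through to the sysvinit default when the distro is unknown.</p>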
<p>There are also platform.linux_distribution() invocations in qa/workunits/ceph-disk/ceph-disk-no-lockbox and src/ceph-disk/ceph_disk/main.py, but they look like dead code to me.</p>
<p>See also bug <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: platform.linux_distribution() is deprecated; stop using it (Resolved)" href="https://tracker.ceph.com/issues/18141">#18141</a></p>
Ceph - Bug #14864 (Resolved): ceph-detect-init requires python-setuptools at runtime
https://tracker.ceph.com/issues/14864 2016-02-25T14:58:49Z Tim Serong tserong@suse.com
<p>Testing a reasonably recent ceph-10.0.2 on openSUSE Leap 42.1, my OSDs weren't mounting. I tracked this back to /usr/lib/systemd/system/ceph-disk@.service which invokes <code>flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f</code>. This in turn results in:</p>
<pre>
ceph-disk: main_trigger: Namespace(dev='/dev/sdb1', func=<function main_trigger at 0x7fa6ebf6b050>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
ceph-disk: Running command: /sbin/init --version
/sbin/init: unrecognized option '--version'
ceph-disk: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
ceph-disk: Running command: /usr/sbin/sgdisk -i 1 /dev/sdb
ceph-disk: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
ceph-disk: Running command: /usr/sbin/sgdisk -i 1 /dev/sdb
ceph-disk: trigger /dev/sdb1 parttype 4fbd7e29-9d25-41b8-afd0-062c0ceff05d uuid 93b72ed5-7d84-4b0b-a227-330fcd22513e
ceph-disk: Running command: /usr/sbin/ceph-disk activate /dev/sdb1
Traceback (most recent call last):
File "/usr/bin/ceph-detect-init", line 5, in <module>
from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
ERROR:ceph-disk:Failed to activate
Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 4036, in <module>
main(sys.argv[1:])
File "/usr/sbin/ceph-disk", line 3992, in main
main_catch(args.func, args)
File "/usr/sbin/ceph-disk", line 4014, in main_catch
func(args)
File "/usr/sbin/ceph-disk", line 2530, in main_activate
reactivate=args.reactivate,
File "/usr/sbin/ceph-disk", line 2296, in mount_activate
(osd_id, cluster) = activate(path, activate_key_template, init)
File "/usr/sbin/ceph-disk", line 2477, in activate
init = init_get()
File "/usr/sbin/ceph-disk", line 799, in init_get
'--default', 'sysvinit',
File "/usr/sbin/ceph-disk", line 902, in _check_output
raise error
subprocess.CalledProcessError: Command '/usr/bin/ceph-detect-init' returned non-zero exit status 1
</pre>
<p>The important part is:</p>
<pre>
Traceback (most recent call last):
File "/usr/bin/ceph-detect-init", line 5, in <module>
from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
</pre>
<p>This is fixable by installing python-setuptools, suggesting that package needs to be added to the RPM Requires and, I assume, the Debian Depends.</p>
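<p>A quick way to check whether a host is affected, without triggering the traceback, is to ask the interpreter whether the module can be found. The sketch below is only a diagnostic aid with an invented helper name, and it assumes a Python 3 interpreter (<code>importlib.util.find_spec</code> is not available on the Python 2 shown in the traceback above).</p>

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # pkg_resources is shipped as part of setuptools.
    if module_available("pkg_resources"):
        print("pkg_resources found")
    else:
        print("pkg_resources missing: install python-setuptools")
```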