Ceph : Issues
https://tracker.ceph.com/ (2023-06-06T10:51:16Z)
Ceph - Bug #61598 (New): gcc-14: FTBFS "error: call to non-'constexpr' function 'virtual unsigned...
https://tracker.ceph.com/issues/61598 (2023-06-06T10:51:16Z, Tim Serong, tserong@suse.com)
<p>gcc 14 has introduced a change which results in ceph build failures:</p>
<pre>
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/osd/osd_types.h: In lambda function:
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:184:73: error: call to non-'constexpr' function 'virtual unsigned int DoutPrefixProvider::get_subsys() const'
[ 270s] 184 | dout_impl(pdpp->get_cct(), ceph::dout::need_dynamic(pdpp->get_subsys()), v) \
[ 270s] | ~~~~~~~~~~~~~~~~^~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:155:58: note: in definition of macro 'dout_impl'
[ 270s] 155 | return (cctX->_conf->subsys.template should_gather<sub, v>()); \
[ 270s] | ^~~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/osd/osd_types.h:3618:3: note: in expansion of macro 'ldpp_dout'
[ 270s] 3618 | ldpp_dout(dpp, 10) << "build_prior all_probe " << all_probe << dendl;
[ 270s] | ^~~~~~~~~
[ 270s] /home/abuild/rpmbuild/BUILD/ceph-18.0.0-4135-g87cd54281c8/src/common/dout.h:51:20: note: 'virtual unsigned int DoutPrefixProvider::get_subsys() const' declared here
[ 270s] 51 | virtual unsigned get_subsys() const = 0;
[ 270s] | ^~~~~~~~~~
</pre>
<p>The gcc change is described at <a class="external" href="https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617196.html">https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617196.html</a>.</p>
<p>The ceph FTBFS was mentioned in a followup post at <a class="external" href="https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618384.html">https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618384.html</a>, and apparently this failure is now expected, as <code> DoutPrefixProvider::get_subsys()</code> isn't declared <code>constexpr</code> but really should be.</p>
<p>I tried to fix this experimentally by simply declaring <code>get_subsys()</code> as <code>constexpr</code>, e.g.:</p>
<pre>
diff --git a/src/common/dout.h b/src/common/dout.h
index a1375fbb910..6e91750708a 100644
--- a/src/common/dout.h
+++ b/src/common/dout.h
@@ -61,7 +61,7 @@ class NoDoutPrefix : public DoutPrefixProvider {
std::ostream& gen_prefix(std::ostream& out) const override { return out; }
CephContext *get_cct() const override { return cct; }
- unsigned get_subsys() const override { return subsys; }
+ constexpr unsigned get_subsys() const override { return subsys; }
};
// a prefix provider with static (const char*) prefix
@@ -88,7 +88,7 @@ class DoutPrefixPipe : public DoutPrefixProvider {
return out;
}
CephContext *get_cct() const override { return dpp.get_cct(); }
- unsigned get_subsys() const override { return dpp.get_subsys(); }
+ constexpr unsigned get_subsys() const override { return dpp.get_subsys(); }
virtual void add_prefix(std::ostream& out) const = 0;
};
</pre>
<p>...but that has some problems:</p>
<p>1) Instead of an outright build failure, I get <code>warning: virtual functions cannot be 'constexpr' before C++20 [-Winvalid-constexpr]</code>. I imagine this is undesirable.<br />2) Even if the warning in 1 <em>is</em> acceptable, there are plenty of other subclasses of <code>DoutPrefixProvider</code> which would all <em>also</em> need to have their <code>get_subsys()</code> methods declared <code>constexpr</code> for the build to complete.</p>
<p>TBH the whole <code>dout</code> thing is black magic to me, so I could really use some assistance with how best to fix this.</p>

Ceph - Bug #56466 (Resolved): pacific: boost 1.73.0 is incompatible with python 3.10
https://tracker.ceph.com/issues/56466 (2022-07-05T05:54:10Z, Tim Serong, tserong@suse.com)
<p>Ceph pacific includes boost 1.73.0, which uses the <code>_Py_fopen()</code> function, which is no longer available in python 3.10. This means it's not possible to build ceph pacific RPMs against python 3.10. Builds will fail with:</p>
<pre>[ 182s] libs/python/src/exec.cpp: In function 'boost::python::api::object boost::python::exec_file(const char*, api::object, api::object)':
[ 182s] libs/python/src/exec.cpp:109:14: error: '_Py_fopen' was not declared in this scope; did you mean '_Py_wfopen'?
[ 182s] 109 | FILE *fs = _Py_fopen(f, "r");
[ 182s] | ^~~~~~~~~
[ 182s] | _Py_wfopen
</pre>
<p>This is not a problem with quincy or newer, as those use boost 1.75.0, which includes a patch that switches to using fopen() for python versions >= 3.1.</p>

sepia - Support #55535 (Resolved): Sepia Lab Access Request
https://tracker.ceph.com/issues/55535 (2022-05-04T04:39:55Z, Tim Serong, tserong@suse.com)
<p>1) Do you just need VPN access or will you also be running teuthology jobs?</p>
<p>Both</p>
<p>2) Desired Username:</p>
<p>tserong</p>
<p>3) Alternate e-mail address(es) we can reach you at:</p>
<p><a class="email" href="mailto:tserong@suse.com">tserong@suse.com</a></p>
<p>4) If you don't already have an established history of code contributions to Ceph, is there an existing community or core developer you've worked with who has reviewed your work and can vouch for your access request?</p>
<p style="padding-left:2em;">If you answered "No" to # 4, please answer the following (paste directly below the question to keep indentation):</p>
<p style="padding-left:2em;">4a) Paste a link to a Blueprint or planning doc of yours that was reviewed at a Ceph Developer Monthly.</p>
<p style="padding-left:2em;">4b) Paste a link to an accepted pull request for a major patch or feature.</p>
<p style="padding-left:2em;">4c) If applicable, include a link to the current project (planning doc, dev branch, or pull request) that you are looking to test.</p>
<p>5) Paste your SSH public key(s) between the <code>pre</code> tags<br /><pre>ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAs6Qm1chlR0DVvDfkjqm7t+mjmXS/AkMBblcruV1qFTOYi4zkryJiqIUe/41gmDlMi4Nqa7dzgAoS/OPMCIXFE+XxyuhggvAa87YxDZCkE5XjZ54jbbSdRYn/xa/S3yLLAXhd2J8hfcBj2qmmosV/iT0pWQm5lXD7fQBqLiSt/5DyLzOFVL45oDYy2mN2u5VWBWrKXMPzCF2HQQWv4gISLBY1gF6WozNDChOrmujK6buMjdA29IbrjTZ1YykngeK8fCAxPTaXWdDxbY5fxPF1dw6xAWz3dvtre2OIcm1Z5Au9yuiMCecA5nb+OxFho8zOhwv1+bQmqpcM4kfjWYb33Q== tserong@suse.com</pre></p>
<p>6) Paste your hashed VPN credentials between the <code>pre</code> tags (Format: <code>user@hostname 22CharacterSalt 65CharacterHashedPassword</code>)<br /><pre>tserong@thor VhPmkUKJdgNE/RJBN0nZvA 11671c34eb25c5fdff7f0002de197c1e72922a0e2122901d062a6d3ed758a9cf</pre></p>

Ceph - Bug #55087 (Resolved): rpm: openSUSE needs libthrift-devel, not thrift-devel
https://tracker.ceph.com/issues/55087 (2022-03-28T09:42:57Z, Tim Serong, tserong@suse.com)
<p>In <a class="external" href="https://github.com/ceph/ceph/pull/38783">https://github.com/ceph/ceph/pull/38783</a>, <a class="external" href="https://github.com/ceph/ceph/pull/38783/commits/80e82686eba">https://github.com/ceph/ceph/pull/38783/commits/80e82686eba</a> added "thrift-devel >= 0.13.0" as a BuildRequires. On SUSE distros, this package is named libthrift-devel, so we need an <code>%if 0%{?suse_version}</code> block around that one.</p>

Dashboard - Bug #54215 (Resolved): mgr/dashboard: "Please expand your cluster first" shouldn't be...
https://tracker.ceph.com/issues/54215 (2022-02-09T04:12:58Z, Tim Serong, tserong@suse.com)
<a name="Description-of-problem"></a>
<h3 >Description of problem<a href="#Description-of-problem" class="wiki-anchor">¶</a></h3>
<p><a class="external" href="https://tracker.ceph.com/issues/50336">https://tracker.ceph.com/issues/50336</a> introduced a neat "Create Cluster" workflow to help set up new clusters. When you first log in to the dashboard you're prompted to expand the cluster (or skip that step). IMO this screen should not be present at all for clusters that have already been "expanded". For example, if I've already used <code>ceph orch</code> to create OSDs, add hosts, etc., this step is redundant: I shouldn't have to click "skip"; the screen just shouldn't be there in the first place. Likewise if I'm upgrading from an earlier (pre-Pacific) release.</p>
<a name="Environment"></a>
<h3 >Environment<a href="#Environment" class="wiki-anchor">¶</a></h3>
<ul>
<li><code>ceph version</code> string: 16.2.7-37-gb3be69440db</li>
<li>Platform (OS/distro/release): SUSE Linux Enterprise Server 15 SP3</li>
<li>Cluster details (nodes, monitors, OSDs): 4 nodes, 3 mons, 8 OSDs</li>
<li>Did it happen on a stable environment or after a migration/upgrade?: seen after an upgrade from Octopus</li>
<li>Browser used (e.g.: <code>Version 86.0.4240.198 (Official Build) (64-bit)</code>): Firefox 96.0.1 64 bit</li>
</ul>
<a name="How-reproducible"></a>
<h3 >How reproducible<a href="#How-reproducible" class="wiki-anchor">¶</a></h3>
<p>Steps:</p>
<ol>
<li>Deploy Octopus</li>
<li>Configure the cluster (add some hosts, OSDs etc.)</li>
<li>Upgrade to Pacific</li>
<li>Log in to the dashboard</li>
</ol>
<p>or:</p>
<ol>
<li>Deploy Pacific</li>
<li>Use `ceph orch` to add hosts, deploy OSDs, ...</li>
<li>Log in to the dashboard</li>
</ol>
<a name="Actual-results"></a>
<h3 >Actual results<a href="#Actual-results" class="wiki-anchor">¶</a></h3>
<p>I see a screen that says "Welcome to Ceph Dashboard - Please expand your cluster first"</p>
<a name="Expected-results"></a>
<h3 >Expected results<a href="#Expected-results" class="wiki-anchor">¶</a></h3>
<p>I see the regular status screen</p>
<a name="Additional-info"></a>
<h3 >Additional info<a href="#Additional-info" class="wiki-anchor">¶</a></h3>
<p>Would it be enough to add a check to see if there's > 1 node and/or > 0 OSDs, and in this case assume we don't need to show this screen?</p>
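<p>Something along these lines might be enough (a rough standalone sketch of that check; the <code>orch.get_hosts()</code> / <code>orch.get_osd_count()</code> helpers are hypothetical placeholders, not the actual mgr/dashboard or orchestrator API):</p>
<pre>
# Hypothetical sketch: decide whether the "expand cluster" welcome screen is needed.
def should_show_expand_cluster_screen(orch) -> bool:
    hosts = orch.get_hosts()          # hosts already known to the orchestrator
    osd_count = orch.get_osd_count()  # OSDs already deployed
    # More than one host, or any OSDs at all, means the cluster has already
    # been "expanded", so the welcome screen would be redundant.
    return len(hosts) <= 1 and osd_count == 0
</pre>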
ceph-volume - Bug #53846 (Resolved): ceph-volume should ignore /dev/rbd* devices
https://tracker.ceph.com/issues/53846 (2022-01-12T07:10:56Z, Tim Serong, tserong@suse.com)

<p>If rbd devices are mapped on ceph cluster nodes (as they may be if you're running an iSCSI gateway for example), then <code>ceph-volume inventory</code> will list those RBD devices, and quite possibly list them as being "available". This causes a couple of problems:</p>
<p>1) Because /dev/rbd0 appears in the list of available devices, the orchestrator will actually try to deploy OSDs on top of those RBD devices. Luckily, this will fail, because the various LVM invocations will die with "Device /dev/rbd0 excluded by a filter", but really we shouldn't even be trying to do this in the first place. Let's not rely on luck ;-)<br />2) It's possible for /dev/rbd* devices to be locked/stuck in such a way that when ceph-volume invokes <code>blkid</code>, it hangs indefinitely (the process ends up in D-state). This can actually block the entire orchestrator, because the orchestrator calls out to cephadm periodically to inventory devices, and the latter tries to acquire a lock, which it can't get because a prior invocation is stuck running <code>ceph-volume inventory</code>.</p>
<p>I suggest we make ceph-volume completely ignore /dev/rbd* when doing a device inventory. I know we had a similar discussion on dev@ceph.io regarding ceph-volume listing, or not listing, GPT devices (see <a class="external" href="https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/N3TK4IO2QYHXIZMQTZ4AMPU5BE56J5MP/#T7UM53WCW2MDD62DDH6KLI4EZXKBXZBY">https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/N3TK4IO2QYHXIZMQTZ4AMPU5BE56J5MP/#T7UM53WCW2MDD62DDH6KLI4EZXKBXZBY</a>) but the difference here is that mapped RBD volumes really <em>aren't</em> part of the host inventory, so IMO they should be excluded.</p>
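<p>For illustration, the exclusion I have in mind amounts to a simple path-prefix filter applied before any device is considered for inventory (a standalone sketch with a hypothetical <code>device_paths</code> input, not ceph-volume's actual device-listing code):</p>
<pre>
import os

RBD_PREFIX = "/dev/rbd"

def filter_rbd_devices(device_paths):
    """Drop mapped RBD devices (e.g. /dev/rbd0) from an inventory candidate list."""
    return [
        path for path in device_paths
        if not os.path.realpath(path).startswith(RBD_PREFIX)
    ]

# filter_rbd_devices(["/dev/sda", "/dev/rbd0"]) -> ["/dev/sda"]
</pre>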
Ceph - Bug #53060 (Closed): Unable to load libceph_snappy.so due to undefined symbol _ZTIN6snappy...
https://tracker.ceph.com/issues/53060 (2021-10-27T09:21:47Z, Tim Serong, tserong@suse.com)

<p>If you try to run Ceph with snappy 1.1.9 installed, <code>ceph status</code> will show HEALTH_WARN, and tell you that your OSDs "have broken BlueStore compression". <code>ceph health detail</code> will tell you that each of your OSDs is "unable to load:snappy". The OSD logs will show something like this:</p>
<pre>
Oct 27 08:55:33 node1 ceph-osd[561817]: load failed dlopen(): "/usr/lib64/ceph/compressor/libceph_snappy.so: undefined symbol: _ZTIN6snappy6SourceE" or "/usr/lib64/ceph/libceph_snappy.so: cannot open shared object file: No such file or directory"
Oct 27 08:55:33 node1 ceph-osd[561817]: create cannot load compressor of type snappy
</pre>
<p>This is because RTTI was disabled in snappy 1.1.9, so the typeinfo for the <code>snappy::Source</code> class - which Ceph's SnappyCompressor creates a subclass of - isn't included in libsnappy.so. Ceph still <em>builds</em> just fine, because the compressors are built as shared libraries. The problem only manifests when our snappy plugin is dlopen()ed at runtime, and then the linker kicks in and can't find that missing symbol.</p>
<p>This would ideally be fixed by getting RTTI re-enabled in snappy, so I've gone ahead and opened <a class="external" href="https://github.com/google/snappy/pull/144">https://github.com/google/snappy/pull/144</a></p>

Orchestrator - Bug #51665 (Resolved): document unfortunate interactions between cephadm and restri...
https://tracker.ceph.com/issues/51665 (2021-07-14T08:41:05Z, Tim Serong, tserong@suse.com)
<p>This one is a little obscure, so please bear with me.</p>
<p>If you deploy ceph using ceph-salt, it will invoke <code>cephadm bootstrap [...] --ssh-user cephadm</code>, i.e. it's setting a non-root user for ssh access. That's fine, unless you happen to also have a restrictive /etc/ssh/sshd_config, e.g.: AllowUsers or AllowGroups is specified, and doesn't mention that user/group, in which case ssh access won't work, and it's not immediately obvious what the problem is.</p>
<p>I don't expect the general ceph docs to cover ceph-salt and its choice of user, but I went looking through the docs for mention of cephadm's --ssh-user option, and only found it in the manpage, and also on <a class="external" href="https://docs.ceph.com/en/latest/cephadm/install/">https://docs.ceph.com/en/latest/cephadm/install/</a>, which says "The --ssh-user <em>user</em> option makes it possible to choose which ssh user cephadm will use to connect to hosts. The associated ssh key will be added to /home/<em>user</em>/.ssh/authorized_keys. The user that you designate with this option must have passwordless sudo access."</p>
<p>Should we elaborate on this further? Add a tip along the lines of "if you're using a non-root user, make sure your ssh config allows them access"? Or is the blanket "The user that you designate with this option must have passwordless sudo access" sufficient?</p>

Orchestrator - Documentation #49488 (Resolved): Document service specs for iSCSI deployment
https://tracker.ceph.com/issues/49488 (2021-02-25T08:58:17Z, Tim Serong, tserong@suse.com)
<p>There's currently no documentation on how to deploy iSCSI gateways (or, if there is, I can't find it beyond what's listed in the API documentation at <a class="external" href="https://docs.ceph.com/en/latest/api/mon_command_api/?highlight=iscsi#orch-apply-iscsi">https://docs.ceph.com/en/latest/api/mon_command_api/?highlight=iscsi#orch-apply-iscsi</a>). We should at least add an example service spec somewhere which includes all possible parameters, e.g.:</p>
<pre>
service_type: iscsi
service_id: SERVICE_ID
placement:
  hosts:
    - [...]
spec:
  pool: ISCSI_POOL
  trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2,IP_ADDRESS_3,..."
  api_user: API_USERNAME
  api_password: API_PASSWORD
  api_secure: true
  ssl_cert: |
    -----BEGIN CERTIFICATE-----
    MIIDtTCCAp2gAwIBAgIYMC4xNzc1NDQxNjEzMzc2MjMyXzxvQ7EcMA0GCSqGSIb3
    DQEBCwUAMG0xCzAJBgNVBAYTAlVTMQ0wCwYDVQQIDARVdGFoMRcwFQYDVQQHDA5T
    [...]
    -----END CERTIFICATE-----
  ssl_key: |
    -----BEGIN PRIVATE KEY-----
    MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC5jdYbjtNTAKW4
    /CwQr/7wOiLGzVxChn3mmCIF3DwbL/qvTFTX2d8bDf6LjGwLYloXHscRfxszX/4h
    [...]
    -----END PRIVATE KEY-----
</pre>

Orchestrator - Bug #47745 (Resolved): cephadm: adopt {prometheus,grafana,alertmanager} fails with...
https://tracker.ceph.com/issues/47745 (2020-10-05T09:12:09Z, Tim Serong, tserong@suse.com)
<p>This is essentially the same problem as <a class="external" href="https://tracker.ceph.com/issues/46398">https://tracker.ceph.com/issues/46398</a>, but occurs when adopting prometheus/grafana/alertmanager, as opposed to when initially deploying those services:</p>
<pre>
# cephadm --image 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 adopt --style=legacy --name prometheus.$(hostname)
INFO:cephadm:Pulling container image 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0...
INFO:cephadm:Non-zero exit code 1 from /usr/bin/podman run --rm --net=host --ipc=host -e CONTAINER_IMAGE=192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 -e NODE_NAME=admin --entrypoint stat 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 -c %u %g /var/lib/ceph
INFO:cephadm:stat:stderr stat: cannot stat '/var/lib/ceph': No such file or directory
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 5758, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 1213, in _default_image
    return func()
  File "/usr/sbin/cephadm", line 3648, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/usr/sbin/cephadm", line 3874, in command_adopt_prometheus
    copy_files([config_src], config_dst, uid=uid, gid=gid)
  File "/usr/sbin/cephadm", line 1332, in copy_files
    (uid, gid) = extract_uid_gid()
  File "/usr/sbin/cephadm", line 1864, in extract_uid_gid
    raise RuntimeError('uid/gid not found')
RuntimeError: uid/gid not found
</pre>
<p>The problem here is that <code>command_adopt_prometheus()</code> correctly calls <code>extract_uid_gid_monitoring(daemon_type)</code>, but if the uid and gid returned are 0 (i.e. the root user), the subsequent call to <code>copy_files()</code> checks <code>if not uid or not gid</code> (which of course matches, because those values are 0), then calls <code>extract_uid_gid()</code>, which tries to stat /var/lib/ceph, which of course doesn't exist inside the monitoring container images. The fix is to change this to <code>if uid is None or gid is None</code>.</p>
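<p>A standalone illustration of the truthiness pitfall described above (not the actual cephadm code): uid and gid of 0 are perfectly valid values for root, but a plain truthiness test can't tell them apart from "not set":</p>
<pre>
def needs_uid_gid_fallback_buggy(uid, gid):
    # 0 is falsy, so root's uid/gid (0/0) incorrectly triggers the fallback
    return not uid or not gid

def needs_uid_gid_fallback_fixed(uid, gid):
    # only fall back when the values are genuinely unknown
    return uid is None or gid is None

print(needs_uid_gid_fallback_buggy(0, 0))   # True  -> extract_uid_gid() gets called and fails
print(needs_uid_gid_fallback_fixed(0, 0))   # False -> the uid/gid from the monitoring image are used
</pre>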
Orchestrator - Bug #47360 (Resolved): cephadm: osd unit.run creates /var/run/ceph/$FSID too late,...
https://tracker.ceph.com/issues/47360 (2020-09-08T10:44:58Z, Tim Serong, tserong@suse.com)

<p>The OSD unit.run file currently has the following form:</p>
<pre>
/usr/bin/podman run [...] -v /var/run/ceph/$FSID:/var/run/ceph:z [...] ceph-volume lvm activate [...]
/usr/bin/install -d -m0770 -o 167 -g 167 /var/run/ceph/$FSID
# osd.$ID
/usr/bin/podman run [...] ceph-osd [...]
</pre>
<p>i.e. first it invokes <code>podman [...] ceph-volume lvm activate</code>, then creates /var/run/ceph/$FSID, then starts the OSD container. The problem is that the <code>podman [...] ceph-volume lvm activate</code> call will fail because /var/run/ceph/$FSID doesn't exist (you'll see something like 'Error: error checking path "/var/run/ceph/8f5be3a6-f1bb-11ea-9130-525400a64977": stat /var/run/ceph/8f5be3a6-f1bb-11ea-9130-525400a64977: no such file or directory' in the journal and your OSD won't start).</p>
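<p>In other words, the directory creation just needs to move ahead of the first <code>podman</code> invocation when cephadm writes the OSD unit.run file. A rough sketch of the intended ordering (hypothetical helper, not cephadm's actual unit-file generation code):</p>
<pre>
def osd_unit_run_lines(fsid, activate_cmd, osd_cmd):
    """Return unit.run lines with /var/run/ceph/$FSID created first.

    activate_cmd / osd_cmd stand in for the full podman invocations shown above.
    """
    return [
        "/usr/bin/install -d -m0770 -o 167 -g 167 /var/run/ceph/%s" % fsid,
        activate_cmd,  # podman [...] ceph-volume lvm activate [...]
        osd_cmd,       # podman [...] ceph-osd [...]
    ]
</pre>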
<p>I assume most users have never experienced this, because every other ceph daemon's unit.run file <strong>also</strong> creates /var/run/ceph/$FSID, so if any other ceph daemon (including the crash daemon, which usually runs on all nodes) starts first, the directory is already created, and so the OSDs start up just fine. During upgrades however, it's entirely possible to adopt a bunch of OSDs, and not start any other services on that node yet (including the crash service), reboot, and then have all the OSDs on that node fail to start. Ouch.</p>

Orchestrator - Bug #46833 (Resolved): simple (ceph-disk style) OSDs adopted by cephadm must not c...
https://tracker.ceph.com/issues/46833 (2020-08-05T06:32:11Z, Tim Serong, tserong@suse.com)
<p>The unit files created when adopting OSDs include a few <code>chown</code> calls to handle simple (ceph-disk style) OSDs, and also include a call to <code>ceph-volume lvm activate</code>, to handle LVM OSDs. The problem here is that <code>ceph-volume lvm activate</code> fails for simple OSDs, and this in turn causes the OSD to not start. Last time I was working on this (see <a class="external" href="https://tracker.ceph.com/issues/45129">https://tracker.ceph.com/issues/45129</a>), I didn't hit this problem, but something must have changed in the meantime. Whether that's a new systemd version, or a slightly different error handling path, I'm not sure, but the point is it would be best if we only generated exactly the correct commands, depending on the type of OSD, rather than including both the <code>chown</code> and the <code>ceph-volume lvm activate</code> calls in all cases.</p>
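<p>Conceptually, the unit file generation just needs to branch on the OSD style instead of always emitting both sets of commands (a hedged sketch; the real cephadm code paths and exact commands differ):</p>
<pre>
def osd_activation_commands(osd_id, osd_fsid, is_lvm_osd):
    """Emit only the activation steps matching the adopted OSD's style."""
    if is_lvm_osd:
        # LVM OSDs: activation is ceph-volume's job
        return ["ceph-volume lvm activate %s %s" % (osd_id, osd_fsid)]
    # simple (ceph-disk style) OSDs: just fix ownership of the data dir;
    # running "ceph-volume lvm activate" here is what breaks startup
    return ["chown -R ceph:ceph /var/lib/ceph/osd/ceph-%s" % osd_id]
</pre>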
<p>(This issue was originally reported downstream as <a class="external" href="https://bugzilla.suse.com/show_bug.cgi?id=1174594">https://bugzilla.suse.com/show_bug.cgi?id=1174594</a>)</p>

Orchestrator - Feature #45996 (New): adopted prometheus instance uses port 9095, regardless of or...
https://tracker.ceph.com/issues/45996 (2020-06-15T11:13:22Z, Tim Serong, tserong@suse.com)
<p>When adopting prometheus (<code>cephadm adopt --style legacy --name prometheus.HOSTNAME</code>), the new prometheus daemon starts listening on port 9095, regardless of what port the original daemon was running on. This is a problem for upgrades, as if you have an existing grafana instance it will still be looking at the old prometheus port number.</p>

Orchestrator - Bug #45973 (Rejected): Adopted MDS daemons are removed by the orchestrator because...
https://tracker.ceph.com/issues/45973 (2020-06-11T10:13:13Z, Tim Serong, tserong@suse.com)
<p>The <a href="https://docs.ceph.com/docs/master/cephadm/adoption/" class="external">docs</a> say that when converting to cephadm, one needs to redeploy MDS daemons. However, it <strong>is</strong> possible to adopt them (<code>cephadm adopt [...] --name mds.myhost</code> seems to work just fine). The problem is that shortly after being adopted, the cephadm orchestrator decides that the MDS is an orphan (there's no service spec), and goes and removes the daemon.</p>
<p>If the correct procedure is always to redeploy, and never to adopt an MDS, then <code>cephadm adopt</code> should presumably be changed to refuse to adopt MDSes (the same is possibly true for RGW, but I haven't verified this).</p>
<p>If, on the other hand, it's permitted to adopt an MDS, then I guess a service spec needs to be created for it automatically?</p>
<p>What's the right thing to do here?</p>

Orchestrator - Bug #45572 (Rejected): cephadm: ceph-crash isn't deployed anywhere
https://tracker.ceph.com/issues/45572 (2020-05-16T06:50:24Z, Tim Serong, tserong@suse.com)
<p>AFAICT when deploying a containerized cluster with cephadm, ceph-crash is never deployed anywhere. This means that if a daemon crashes and dumps info to /var/lib/ceph/crash, the crash data is saved to disk, but <code>ceph crash stat</code> always prints "0 crashes recorded", because the crashes never actually get posted.</p>
<p>Previously, installing ceph-base would enable the ceph-crash service, but if you're running a cephadm-deployed cluster, you never install that package.</p>
<p>Do we perhaps need to run an additional ceph-crash container on each host?</p>