Ceph : Issues
https://tracker.ceph.com/ (2022-11-03T05:11:53Z)
Ceph - Bug #57967 (Resolved): ceph-crash service should run as unprivileged user, not root (CVE-2...
https://tracker.ceph.com/issues/57967 (2022-11-03T05:11:53Z, Tim Serong <tserong@suse.com>)
<p>As reported at <a class="external" href="https://www.openwall.com/lists/oss-security/2022/10/25/1">https://www.openwall.com/lists/oss-security/2022/10/25/1</a>, ceph-crash runs as root, which opens a potential ceph-user-to-root privilege escalation. This is fixable by making the ceph-crash process drop privileges and run as the ceph user, just as the other Ceph daemons do.</p>
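<p>For illustration, the usual way a Python daemon drops root is a sketch along these lines (the helper below is hypothetical, not the actual patch):</p>
<pre>
import os
import pwd

def drop_privileges(username='ceph'):
    # Hypothetical sketch: switch from root to the unprivileged ceph user.
    # Order matters: supplementary groups and the gid must be dropped while
    # we are still root, because setuid() gives up the right to change them.
    p = pwd.getpwnam(username)
    os.setgroups([])
    os.setgid(p.pw_gid)
    os.setuid(p.pw_uid)

if os.getuid() == 0:
    drop_privileges()
</pre>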
Ceph - Bug #57860 (Pending Backport): disable system_pmdk on s390x for SUSE distros
https://tracker.ceph.com/issues/57860 (2022-10-13T04:28:31Z, Tim Serong <tserong@suse.com>)

<p>Same as <a class="external" href="https://tracker.ceph.com/issues/56491">https://tracker.ceph.com/issues/56491</a>, which addressed RHEL and Fedora not shipping libpmem on s390x, but for SUSE.</p>

Ceph - Bug #55087 (Resolved): rpm: openSUSE needs libthrift-devel, not thrift-devel
https://tracker.ceph.com/issues/55087 (2022-03-28T09:42:57Z, Tim Serong <tserong@suse.com>)
<p>In <a class="external" href="https://github.com/ceph/ceph/pull/38783">https://github.com/ceph/ceph/pull/38783</a>, commit <a class="external" href="https://github.com/ceph/ceph/pull/38783/commits/80e82686eba">80e82686eba</a> added "thrift-devel >= 0.13.0" as a BuildRequires. On SUSE distros, this package is named libthrift-devel, so we need an <code>%if 0%{?suse_version}</code> block around that one.</p>

ceph-volume - Bug #53846 (Resolved): ceph-volume should ignore /dev/rbd* devices
https://tracker.ceph.com/issues/53846 (2022-01-12T07:10:56Z, Tim Serong <tserong@suse.com>)
<p>If rbd devices are mapped on ceph cluster nodes (as they may be if, for example, you're running an iSCSI gateway), then <code>ceph-volume inventory</code> will list those RBD devices, and quite possibly list them as being "available". This causes a couple of problems:</p>
<p>1) Because /dev/rbd0 appears in the list of available devices, the orchestrator will actually try to deploy OSDs on top of those RBD devices. Luckily, this will fail, because the various LVM invocations will die with "Device /dev/rbd0 excluded by a filter", but really we shouldn't even be trying to do this in the first place. Let's not rely on luck ;-)<br />2) It's possible for /dev/rbd* devices to be locked/stuck in such a way that when ceph-volume invokes <code>blkid</code>, it hangs indefinitely (the process ends up in D-state). This can actually block the entire orchestrator, because the orchestrator calls out to cephadm periodically to inventory devices, and the latter tries to acquire a lock, which it can't get because a prior invocation is stuck running <code>ceph-volume inventory</code>.</p>
<p>I suggest we make ceph-volume completely ignore /dev/rbd* when doing a device inventory. I know we had a similar discussion on dev@ceph.io regarding ceph-volume listing, or not listing, GPT devices (see <a class="external" href="https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/N3TK4IO2QYHXIZMQTZ4AMPU5BE56J5MP/#T7UM53WCW2MDD62DDH6KLI4EZXKBXZBY">https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/N3TK4IO2QYHXIZMQTZ4AMPU5BE56J5MP/#T7UM53WCW2MDD62DDH6KLI4EZXKBXZBY</a>), but the difference here is that mapped RBD volumes really <em>aren't</em> part of the host inventory, so IMO they should be excluded.</p>
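<p>A minimal sketch of the proposed behaviour, assuming a device list of path strings (the helper name is made up for illustration):</p>
<pre>
def filter_host_devices(devices):
    # Mapped RBD images (/dev/rbd0, /dev/rbd0p1, ...) are Ceph's own block
    # devices, not host disks, so drop them from the inventory entirely.
    return [dev for dev in devices if not dev.startswith('/dev/rbd')]

print(filter_host_devices(['/dev/sda', '/dev/rbd0', '/dev/vdb']))
# ['/dev/sda', '/dev/vdb']
</pre>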
RADOS - Bug #52553 (Resolved): pybind: rados.RadosStateError raised when closed watch object goes...
https://tracker.ceph.com/issues/52553 (2021-09-09T07:06:25Z, Tim Serong <tserong@suse.com>)

<p>This one is easiest to demonstrate by example. Here's some code:</p>
<pre>
#!/usr/bin/env python3

import rados

def notify(notify_id, notifier_id, watch_id, data):
    pass

if __name__ == "__main__":
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("aquarium")
    watch = ioctx.watch("kvstore", notify)
    watch.close()
    cluster.shutdown()
</pre>
<p>If I run that, I see the following error output:</p>
<pre>
Traceback (most recent call last):
  File "rados.pyx", line 477, in rados.Rados.require_state
rados.RadosStateError: RADOS rados state (You cannot perform that operation on a Rados object in state shutdown.)
Exception ignored in: 'rados.Watch.__dealloc__'
Traceback (most recent call last):
  File "rados.pyx", line 477, in rados.Rados.require_state
rados.RadosStateError: RADOS rados state (You cannot perform that operation on a Rados object in state shutdown.)
</pre>
<p>What's happening here is that even though I called <code>watch.close()</code>, later, once the watch goes out of scope, its <code>__dealloc__()</code> method tries to close the watch <em>again</em>, after first calling <code>self.ioctx.rados.require_state("connected")</code>, which results in that exception.</p>
<p>The fix is easy:</p>
<pre>
diff --git a/src/pybind/rados/rados.pyx b/src/pybind/rados/rados.pyx
index 4a5db349516..8772942e7ca 100644
--- a/src/pybind/rados/rados.pyx
+++ b/src/pybind/rados/rados.pyx
@@ -2025,6 +2025,8 @@ cdef class Watch(object):
         return False

     def __dealloc__(self):
+        if self.id == 0:
+            return
         self.ioctx.rados.require_state("connected")
         self.close()
</pre>
<p>The one thing I can't work out how to do is write a test for this case: because the exception is raised in <code>__dealloc__</code>, it gets printed to stderr but is otherwise ignored, so I can't seem to catch it anywhere in src/test/pybind/test_rados.py.</p>
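<p>One possible approach (a sketch, untested, and it assumes a running cluster with the pool and object used above): run the reproducer in a subprocess and assert that nothing was reported on stderr, since exceptions raised in <code>__dealloc__</code> are printed there rather than propagated:</p>
<pre>
import subprocess
import sys

REPRODUCER = '''
import rados
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("aquarium")
watch = ioctx.watch("kvstore", lambda *args: None)
watch.close()
cluster.shutdown()
'''

def test_watch_close_then_shutdown():
    # Exceptions ignored in __dealloc__ show up as "Exception ignored in:"
    # lines on the child's stderr, which we can assert against here.
    result = subprocess.run([sys.executable, '-c', REPRODUCER],
                            capture_output=True, text=True)
    assert 'Exception ignored' not in result.stderr
</pre>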
Orchestrator - Bug #45572 (Rejected): cephadm: ceph-crash isn't deployed anywhere
https://tracker.ceph.com/issues/45572 (2020-05-16T06:50:24Z, Tim Serong <tserong@suse.com>)

<p>AFAICT when deploying a containerized cluster with cephadm, ceph-crash is never deployed anywhere. This means that if a daemon crashes and dumps info to /var/lib/ceph/crash, the crash data is saved to disk, but <code>ceph crash stat</code> always prints "0 crashes recorded", because the crashes never actually get posted.</p>
<p>Previously, installing ceph-base would enable the ceph-crash service, but if you're running a cephadm-deployed cluster, you never install that package.</p>
<p>Do we perhaps need to run an additional ceph-crash container on each host?</p>

Orchestrator - Bug #40801 (Duplicate): ceph orchestrator cli alleges support for json-pretty, xml...
https://tracker.ceph.com/issues/40801 (2019-07-17T11:27:42Z, Tim Serong <tserong@suse.com>)
<p>The <code>_list_devices()</code> and <code>_list_services()</code> methods of the orchestrator CLI both specify <code>"name=format,type=CephChoices,strings=json|plain,req=false"</code> for the format parameter, so they should only accept json or plain as formats. However, if I run <code>ceph orchestrator service ls --format=json_pretty</code>, I get:</p>
<p><code>ceph: error: argument -f/--format: invalid choice: 'json_pretty' (choose from 'json', 'json-pretty', 'xml', 'xml-pretty', 'plain')</code></p>
<p>Given those methods only support json and plain, I'd not expect to see json-pretty, xml or xml-pretty as available options.</p>
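<p>For what it's worth, the accepted values are already machine-readable in that spec string, so the CLI could derive its choices from the spec instead of a hard-coded list; a hedged sketch (the helper is hypothetical):</p>
<pre>
def choices_from_spec(spec):
    # spec: "name=format,type=CephChoices,strings=json|plain,req=false"
    fields = dict(field.split('=', 1) for field in spec.split(','))
    return fields.get('strings', '').split('|')

print(choices_from_spec(
    "name=format,type=CephChoices,strings=json|plain,req=false"))
# ['json', 'plain']
</pre>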
Orchestrator - Bug #37514 (Can't reproduce): mgr CLI commands block one another (indefinitely if ...
https://tracker.ceph.com/issues/37514 (2018-12-04T06:46:49Z, Tim Serong <tserong@suse.com>)

<p>There seem to be two problems here:</p>
<p>1) Only one mgr CLI command runs at a time. This isn't obvious unless you find an mgr command that takes a few seconds to run. For example, when testing the deepsea orchestrator, <code>ceph orchestrator device ls</code> takes about five seconds to complete. If I invoke that command in one terminal window, then invoke <code>ceph osd status</code> in another terminal window, the latter will block until the former completes. That's irritating, but probably not fatal.</p>
<p>2) Orchestrator CLI commands spin waiting for command completions from whatever orchestrator module is active. If you manage to break an orchestrator in such a way that commands never complete (try e.g. stopping the salt-api while using DeepSea), <code>ceph orchestrator device ls</code> will never complete. Even if you hit CTRL-C, mgr is (presumably) still stuck in that loop waiting for completions that are never going to happen, which means it's unable to service any subsequent CLI command. Trying to run, say, <code>ceph osd status</code> at this point will also simply hang. You can't even quickly restart mgr at this point: <code>systemctl restart ceph-mgr@$(hostname)</code> hangs until it hits "State 'stop-sigterm' timed out." (after maybe a minute and a half), then it sends a SIGKILL and mgr is finally restarted.</p>
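<p>At minimum, the completion-wait loop could carry a deadline, so a broken orchestrator fails the one command instead of wedging the mgr; a rough sketch (the names and polling interval are assumptions, not the actual orchestrator API):</p>
<pre>
import time

def wait_for_completion(completion, timeout=60):
    # Poll the orchestrator completion, but give up after `timeout` seconds
    # instead of spinning forever and blocking every later CLI command.
    deadline = time.time() + timeout
    while not completion.is_complete:
        if time.time() > deadline:
            raise TimeoutError('orchestrator never completed the request')
        time.sleep(0.5)
</pre>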
Ceph - Bug #37503 (Resolved): Audit log: mgr module passwords set on CLI written as plaintext in ...
https://tracker.ceph.com/issues/37503 (2018-12-03T10:57:04Z, Tim Serong <tserong@suse.com>)

<p>A number of mgr modules need passwords set for one reason or another, either to authenticate with external systems (deepsea, influx, diskprediction), or to define credentials for users of those modules (dashboard, restful).</p>
<p>In all cases, these passwords are set from the command line, either via module-specific commands (<code>ceph dashboard ac-user-create</code>, <code>deepsea config-set salt_api_password</code>, etc.) or via <code>ceph config set</code> with some particular key (e.g. mgr/influx/password).</p>
<p>All module-specific commands go through <code>DaemonServer::_handle_command()</code>, which then logs the command via <code>audit_clog->debug()</code> (or <code>audit_clog->info()</code> in case of access denied). This all ends up written to <code>/var/log/ceph/ceph-mgr.$ID.log</code>, which is world-readable, e.g.:</p>
<pre>
2018-12-03 10:45:28.864 7f67e7f8f700 0 log_channel(audit) log [DBG] : from='client.343880 172.16.1.254:39896/3560370796' entity='client.admin' cmd=[{"prefix": "deepsea config-set", "key": "salt_api_password", "value": "foo", "target": ["mgr", ""]}]: dispatch
</pre>
<p>Additionally, anything that results in a "config set" lands in the mon log, e.g.:</p>
<pre>
2018-12-03 10:45:28.881552 [INF] from='mgr.295252 172.16.1.21:56636/175641' entity='mgr.data1' cmd='[{"prefix":"config set","who":"mgr","name":"mgr/deepsea/salt_api_password","value":"foo"}]': finished
</pre>
<p>This also appears in the Audit log in the Dashboard.</p>
<p>Some things that land in the mon log probably don't matter; for any module that hashes passwords before saving them, only the hashed password should land in the mon log. But there's still the problem of the CLI commands in the mgr log, and in any case, modules that need to authenticate with external services will need to store plaintext passwords.</p>
<p>ISTM we need to either never log these things, or keep the command logging but filter the passwords out, so the log renders the value as "*****" instead of the actual password.</p>
<p>I'm not sure how best to approach this, given the way command logging is structured. At the point commands are logged, the commands themselves are just strings. Admittedly, they're strings of JSON, but they're effectively opaque at that point - we'd have to parse the JSON, then look for things that might be passwords, blank them out, and turn the whole lot back into a string. Yuck.</p>
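<p>For the sake of argument, that "yuck" approach is at least only a few lines; a sketch, assuming we can recognize password-carrying commands by key name (the heuristics here are made up):</p>
<pre>
import json

SENSITIVE = ('password', 'secret')

def redact(cmd_str):
    # Parse the logged command string, blank any "value" whose key/name/prefix
    # looks password-like, and re-serialize before it hits the audit log.
    cmds = json.loads(cmd_str)
    for cmd in cmds:
        hint = ' '.join(str(cmd.get(f, ''))
                        for f in ('prefix', 'key', 'name')).lower()
        if 'value' in cmd and any(s in hint for s in SENSITIVE):
            cmd['value'] = '*****'
    return json.dumps(cmds)

print(redact('[{"prefix": "deepsea config-set", '
             '"key": "salt_api_password", "value": "foo"}]'))
# [{"prefix": "deepsea config-set", "key": "salt_api_password", "value": "*****"}]
</pre>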
Ceph - Bug #35906 (Resolved): ceph-disk: is_mounted() returns None for mounted OSDs with Python 3
https://tracker.ceph.com/issues/35906 (2018-09-10T11:10:49Z, Tim Serong <tserong@suse.com>)

<p><code>ceph-disk list --format=json</code> on Python 3 gives null for the mount member, even for mounted OSDs, e.g.:</p>
<pre>
# ceph-disk list --format=json|json_pp
...
{
  "path" : "/dev/vdg",
  "partitions" : [
    {
      "whoami" : "23",
      "is_partition" : true,
      "path" : "/dev/vdg1",
      "ceph_fsid" : "00296336-7bf2-43f1-a48c-24c7212bf478",
      "dmcrypt" : {},
      "uuid" : "b447f027-f116-47d0-9cd1-ca2348e8e3db",
      "block_uuid" : "dfaf6613-f958-497a-9dfb-ad343e897639",
      "block_dev" : "/dev/vdg2",
      "type" : "data",
      "mount" : null,
      "ptype" : "4fbd7e29-9d25-41b8-afd0-062c0ceff05d",
      "magic" : "ceph osd volume v026",
      "cluster" : "ceph",
      "state" : "prepared",
      "fs_type" : "xfs"
    },
    {
      "type" : "block",
      "is_partition" : true,
      "path" : "/dev/vdg2",
      "ptype" : "cafecafe-9b03-4f30-b4c6-b4b80ceff106",
      "block_for" : "/dev/vdg1",
      "dmcrypt" : {},
      "uuid" : "dfaf6613-f958-497a-9dfb-ad343e897639"
    }
  ]
}
...
</pre>
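<p>A guess at the mechanism, since this only bites under Python 3 (purely illustrative code, not ceph-disk's actual implementation): if the mount lookup compares a <code>str</code> device path against <code>bytes</code> read from the mounts table, the comparison is always False on Python 3, so the function falls through and returns None:</p>
<pre>
def is_mounted(dev):
    # Illustrative pitfall: opening in binary mode yields bytes lines, and
    # on Python 3 `bytes == str` is always False, so no entry ever matches.
    with open('/proc/mounts', 'rb') as f:
        for line in f:
            fields = line.split()
            if fields[0] == dev:           # bytes vs str: never true on py3
                return fields[1].decode()  # mount point
    return None
</pre>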
mgr - Bug #19954 (Resolved): mgr key needs "mon 'allow *'" caps
https://tracker.ceph.com/issues/19954 (2017-05-17T03:14:44Z, Tim Serong <tserong@suse.com>)

<p>Prior to <a class="external" href="https://github.com/ceph/ceph/commit/5906e359bd">https://github.com/ceph/ceph/commit/5906e359bd</a> and <a class="external" href="https://github.com/ceph/ceph/commit/6625fcd8fd">https://github.com/ceph/ceph/commit/6625fcd8fd</a>, the mgr key had "mon 'allow *'" caps. Since those commits, it only has "mon 'allow profile mgr'". This means that mgr modules now can't set config keys, so, for example, the new REST API (<a class="external" href="https://github.com/ceph/ceph/pull/14457">https://github.com/ceph/ceph/pull/14457</a>) won't actually be able to persist its settings.</p>
<p>Is there a reason the mgr key should have limited mon capabilities, or can we revert those two commits?</p>

mgr - Bug #19629 (Resolved): mgr: set_config from python module crashes mgr (assertion failure du...
https://tracker.ceph.com/issues/19629 (2017-04-14T05:40:39Z, Tim Serong <tserong@suse.com>)
<p>Tried <code>ceph tell mgr enable_auth false</code> (enable_auth being implemented by the rest module). The ceph CLI became unresponsive (never returned to the shell). Meanwhile, mgr crashed with:</p>
<pre>
-8> 2017-04-14 15:34:45.938295 7f00c778a700 4 mgr[rest] handle_command: {
"prefix": "enable_auth",
"val": "false"
}
-7> 2017-04-14 15:34:45.938336 7f00c778a700 1 lockdep using id 41
-6> 2017-04-14 15:34:45.938369 7f00c778a700 10 monclient: _send_command 5 [{"prefix":"config-key put","key":"mgr.rest.enable_auth","val":"false"}]
-5> 2017-04-14 15:34:45.938376 7f00c778a700 10 monclient: _send_mon_message to mon.b at 127.0.0.1:40447/0
-4> 2017-04-14 15:34:45.938381 7f00c778a700 1 -- 127.0.0.1:0/3288703021 --> 127.0.0.1:40447/0 -- mon_command({"prefix":"config-key put","key":"mgr.rest.enable_auth","val":"false"} v 0) v1 -- 0x55ea04db41c0 con 0
-3> 2017-04-14 15:34:45.938756 7f00caf91700 1 -- 127.0.0.1:0/3288703021 <== mon.1 127.0.0.1:40447/0 25 ==== mon_command_ack([{"prefix":"config-key put","key":"mgr.rest.enable_auth","val":"false"}]=-13 access denied v0) v1 ==== 117+0+0 (2523711155 0 0) 0x55ea04db41c0 con 0x55ea02d59800
-2> 2017-04-14 15:34:45.938773 7f00caf91700 10 monclient: handle_mon_command_ack 5 [{"prefix":"config-key put","key":"mgr.rest.enable_auth","val":"false"}]
-1> 2017-04-14 15:34:45.938778 7f00caf91700 10 monclient: _finish_command 5 = -13 access denied
0> 2017-04-14 15:34:45.940929 7f00c778a700 -1 /home/tserong/src/github/SUSE/ceph/src/mgr/PyModules.cc: In function 'void PyModules::set_config(const string&, const string&, const string&)' thread 7f00c778a700 time 2017-04-14 15:34:45.938831
/home/tserong/src/github/SUSE/ceph/src/mgr/PyModules.cc: 546: FAILED assert(set_cmd.r == 0)
</pre>
<p>Why is the mon giving mgr "access denied" when it tries to set a config key?</p>
<p>We should consider getting rid of that assert too...</p>

Ceph-deploy - Bug #18164 (Resolved): platform.linux_distribution() fails on distros with /etc/os-...
https://tracker.ceph.com/issues/18164 (2016-12-07T04:14:41Z, Tim Serong <tserong@suse.com>)
<p>As with bugs <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: platform.linux_distribution() is deprecated; stop using it (Resolved)" href="https://tracker.ceph.com/issues/18141">#18141</a> and <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: platform.linux_distribution() is deprecated; stop using it (Resolved)" href="https://tracker.ceph.com/issues/18163">#18163</a>, on the latest SUSE systems, platform.linux_distribution() returns ('','',''), causing ceph-deploy to fail with "UnsupportedPlatform: Platform is not supported:"</p>
<p>Given platform.linux_distribution() is deprecated and doesn't understand /etc/os-release, we should stop using it.</p>

Ceph - Bug #18163 (Resolved): platform.linux_distribution() is deprecated; stop using it
https://tracker.ceph.com/issues/18163 (2016-12-07T04:10:56Z, Tim Serong <tserong@suse.com>)
<p>platform.linux_distribution() is deprecated, so we should stop using it. Notably it uses /etc/SuSE-release on SUSE systems, and the latest SUSE versions don't ship this file; instead they ship /etc/os-release, which platform.linux_distribution() doesn't know about, so it returns ('','','').</p>
<p>AFAICT, platform.linux_distribution() is currently used by ceph-detect-init, which in turn is used by ceph-disk. If ceph-detect-init can't determine the distro because it sees ('','',''), this results in ceph-disk always tagging the init system as sysvinit.</p>
<p>There are also platform.linux_distribution() invocations in qa/workunits/ceph-disk/ceph-disk-no-lockbox and src/ceph-disk/ceph_disk/main.py, but they look like dead code to me.</p>
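<p>A minimal sketch of the kind of replacement that does understand /etc/os-release (a hypothetical helper, not code from the tree):</p>
<pre>
def os_release():
    # Parse /etc/os-release into a dict, e.g.
    # {'ID': 'opensuse', 'VERSION_ID': '42.1', ...}
    info = {}
    with open('/etc/os-release') as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#') and '=' in line:
                key, value = line.split('=', 1)
                info[key] = value.strip('"')
    return info
</pre>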
<p>See also bug <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: platform.linux_distribution() is deprecated; stop using it (Resolved)" href="https://tracker.ceph.com/issues/18141">#18141</a></p>

Ceph - Bug #14864 (Resolved): ceph-detect-init requires python-setuptools at runtime
https://tracker.ceph.com/issues/14864 (2016-02-25T14:58:49Z, Tim Serong <tserong@suse.com>)
<p>Testing a reasonably recent ceph-10.0.2 on openSUSE Leap 42.1, my OSDs weren't mounting. I tracked this back to /usr/lib/systemd/system/ceph-disk@.service, which invokes <code>flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f</code>. This in turn results in:</p>
<pre>
ceph-disk: main_trigger: Namespace(dev='/dev/sdb1', func=<function main_trigger at 0x7fa6ebf6b050>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
ceph-disk: Running command: /sbin/init --version
/sbin/init: unrecognized option '--version'
ceph-disk: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
ceph-disk: Running command: /usr/sbin/sgdisk -i 1 /dev/sdb
ceph-disk: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
ceph-disk: Running command: /usr/sbin/sgdisk -i 1 /dev/sdb
ceph-disk: trigger /dev/sdb1 parttype 4fbd7e29-9d25-41b8-afd0-062c0ceff05d uuid 93b72ed5-7d84-4b0b-a227-330fcd22513e
ceph-disk: Running command: /usr/sbin/ceph-disk activate /dev/sdb1
Traceback (most recent call last):
  File "/usr/bin/ceph-detect-init", line 5, in <module>
    from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
ERROR:ceph-disk:Failed to activate
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 4036, in <module>
    main(sys.argv[1:])
  File "/usr/sbin/ceph-disk", line 3992, in main
    main_catch(args.func, args)
  File "/usr/sbin/ceph-disk", line 4014, in main_catch
    func(args)
  File "/usr/sbin/ceph-disk", line 2530, in main_activate
    reactivate=args.reactivate,
  File "/usr/sbin/ceph-disk", line 2296, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/sbin/ceph-disk", line 2477, in activate
    init = init_get()
  File "/usr/sbin/ceph-disk", line 799, in init_get
    '--default', 'sysvinit',
  File "/usr/sbin/ceph-disk", line 902, in _check_output
    raise error
subprocess.CalledProcessError: Command '/usr/bin/ceph-detect-init' returned non-zero exit status 1
</pre>
<p>The important part is:</p>
<pre>
Traceback (most recent call last):
  File "/usr/bin/ceph-detect-init", line 5, in <module>
    from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
</pre>
<p>This is fixable by installing python-setuptools, suggesting that package needs to be added to the RPM Requires and, I assume, the Debian Depends.</p>
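<p>For context, /usr/bin/ceph-detect-init is a setuptools-generated console-script stub, which is why pkg_resources is needed at runtime rather than just at build time; such stubs look roughly like this (illustrative shape, not the exact generated file):</p>
<pre>
#!/usr/bin/python
# Generated console-script stub: it resolves the real entry point through
# pkg_resources, so python-setuptools must be installed at runtime.
from pkg_resources import load_entry_point

if __name__ == '__main__':
    import sys
    sys.exit(
        load_entry_point('ceph-detect-init', 'console_scripts',
                         'ceph-detect-init')()
    )
</pre>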