Ceph : Issues (https://tracker.ceph.com/, feed retrieved 2023-01-19)
Ceph - Bug #58501 (Resolved): ceph.spec.in: need to replace SUSE usrmerged macro with version check
https://tracker.ceph.com/issues/58501 (2023-01-19, Tim Serong <tserong@suse.com>)
<p><a class="external" href="https://github.com/ceph/ceph/commit/e4c4a4ce97fff8a5b4efa747d9cffeabcceedd25">https://github.com/ceph/ceph/commit/e4c4a4ce97fff8a5b4efa747d9cffeabcceedd25</a> introduced the use of the <code>usrmerged</code> macro on SUSE distros to guard against installing the /sbin/mount.ceph symlink. This macro has since been deprecated and should be replaced with a version check instead (<code>%if 0%{?suse_version} < 1550</code>). See <a class="external" href="https://en.opensuse.org/openSUSE:Usr_merge">https://en.opensuse.org/openSUSE:Usr_merge</a> for more details.</p> Ceph - Bug #57967 (Resolved): ceph-crash service should run as unprivileged user, not root (CVE-2...https://tracker.ceph.com/issues/579672022-11-03T05:11:53ZTim Serongtserong@suse.com
Ceph - Bug #57967 (Resolved): ceph-crash service should run as unprivileged user, not root (CVE-2...
https://tracker.ceph.com/issues/57967 (2022-11-03, Tim Serong <tserong@suse.com>)
<p>As reported at <a class="external" href="https://www.openwall.com/lists/oss-security/2022/10/25/1">https://www.openwall.com/lists/oss-security/2022/10/25/1</a>, ceph-crash runs as root, which makes it vulnerable to a potential ceph-user-to-root privilege escalation. This is fixable by making the ceph-crash process drop privileges and run as the ceph user, just as the other ceph daemons do.</p>
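<p>A minimal sketch of the privilege drop, assuming a "ceph" user and group exist (illustrative Python, not the actual ceph-crash change):</p>
<pre>
import os
import pwd
import grp

def drop_privileges(user="ceph", group="ceph"):
    if os.getuid() != 0:
        return  # already unprivileged
    gid = grp.getgrnam(group).gr_gid
    uid = pwd.getpwnam(user).pw_uid
    os.setgroups([])  # drop supplementary groups
    os.setgid(gid)    # group first: after setuid() we lose the right
    os.setuid(uid)
</pre>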
Orchestrator - Bug #57609 (Resolved): applying osd service spec with size filter fails if there's...
https://tracker.ceph.com/issues/57609 (2022-09-20, Tim Serong <tserong@suse.com>)
<p>This issue came up on a system with a 4KB virtual floppy disk drive.</p>
<p><code>ceph-volume inventory</code> gives:</p>
<pre>
Device Path   Size       rotates   available   Model name
/dev/fd0      4.00 KB    True      False
/dev/sda      50.00 GB   True      False       Virtual disk
/dev/sdb      50.00 GB   True      False       Virtual disk
/dev/sdc      50.00 GB   True      False       Virtual disk
/dev/sdd      50.00 GB   True      False       Virtual disk
</pre>
<p>Doing a simple <code>ceph orch apply osd --all-available-devices</code> works just fine, but service specs utilising size specifiers will fail to apply. For example:</p>
<pre>
service_id: at_least_8g
service_type: osd
placement:
  host_pattern: '*'
spec:
  data_devices:
    size: '8G:'
</pre>
<p>Applying the above will give the following error in <code>ceph log last cephadm</code>:</p>
<pre>
ceph.deployment.drive_group.DriveGroupValidationError: Failed to validate OSD spec "at_least_8g.data_devices": Unit 'KB' not supported
</pre>
<p>The problem is that the SizeMatcher class only understands MB, GB and TB. When presented with a disk whose size is expressed in KB, it doesn't know what to do with it.</p>
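<p>A minimal sketch of the missing suffix handling, as a normalise-to-bytes helper (illustrative only; the real SizeMatcher in ceph.deployment is structured differently):</p>
<pre>
# Hypothetical helper: normalise a size string such as "4.00 KB" to bytes.
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def to_bytes(size):
    for suffix, factor in _UNITS.items():
        if size.upper().endswith(suffix):
            return int(float(size[:-len(suffix)]) * factor)
    raise ValueError("Unit not supported: %s" % size)
</pre>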
Ceph - Bug #56658 (Resolved): build: cephfs-shell fails to build/install with python setuptools >...
https://tracker.ceph.com/issues/56658 (2022-07-21, Tim Serong <tserong@suse.com>)
<p>Python setuptools v61 changed package discovery so that if it finds what it thinks are multiple top-level packages in a directory, it will fail to build. This was introduced by <a class="external" href="https://github.com/pypa/setuptools/pull/3177">https://github.com/pypa/setuptools/pull/3177</a>, and causes the ceph RPM build to fail with:</p>
<pre>
...
[ 9562s] error: Multiple top-level packages discovered in a flat-layout: ['top', 'CMakeFiles'].
[ 9562s]
[ 9562s] To avoid accidental inclusion of unwanted files or directories,
[ 9562s] setuptools will not proceed with this build.
[ 9562s]
[ 9562s] If you are trying to create a single distribution with multiple packages
[ 9562s] on purpose, you should not rely on automatic discovery.
[ 9562s] Instead, consider the following options:
[ 9562s]
[ 9562s] 1. set up custom discovery (`find` directive with `include` or `exclude`)
[ 9562s] 2. use a `src-layout`
[ 9562s] 3. explicitly set `py_modules` or `packages` with a list of names
[ 9562s]
[ 9562s] To find more information, look for "package discovery" on setuptools docs.
...
[ 9833s] RPM build errors:
[ 9833s] File not found: /home/abuild/rpmbuild/BUILDROOT/ceph-16.2.9.158+gd93952c7eea-2.3.x86_64/usr/lib/python3.10/site-packages/cephfs_shell-*.egg-info
[ 9833s] File not found: /home/abuild/rpmbuild/BUILDROOT/ceph-16.2.9.158+gd93952c7eea-2.3.x86_64/usr/bin/cephfs-shell
</pre>
<p>This has been fixed in Fedora downstream by moving src/tools/cephfs/cephfs-shell to a separate subdirectory (see <a class="external" href="https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/0021-cephfs-shell.patch">https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/0021-cephfs-shell.patch</a>). I've confirmed this approach also works for openSUSE.</p>
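<p>For reference, option 3 from the setuptools message above amounts to declaring the distribution contents explicitly so that auto-discovery never runs; a minimal sketch (illustrative, not the actual cephfs-shell setup.py):</p>
<pre>
from setuptools import setup

setup(
    name="cephfs-shell",
    version="0.0.1",
    # An explicit py_modules list disables setuptools' flat-layout
    # auto-discovery, so stray directories like CMakeFiles are ignored.
    py_modules=["cephfs_shell"],
)
</pre>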
ceph-volume - Bug #53846 (Resolved): ceph-volume should ignore /dev/rbd* devices
https://tracker.ceph.com/issues/53846 (2022-01-12, Tim Serong <tserong@suse.com>)
<p>If rbd devices are mapped on ceph cluster nodes (as they may be if you're running an iSCSI gateway, for example), then <code>ceph-volume inventory</code> will list those RBD devices, and quite possibly list them as being "available". This causes a couple of problems:</p>
<p>1) Because /dev/rbd0 appears in the list of available devices, the orchestrator will actually try to deploy OSDs on top of those RBD devices. Luckily, this will fail, because the various LVM invocations will die with "Device /dev/rbd0 excluded by a filter", but really we shouldn't even be trying to do this in the first place. Let's not rely on luck ;-)</p>
<p>2) It's possible for /dev/rbd* devices to be locked/stuck in such a way that when ceph-volume invokes <code>blkid</code>, it hangs indefinitely (the process ends up in D-state). This can actually block the entire orchestrator, because the orchestrator calls out to cephadm periodically to inventory devices, and the latter tries to acquire a lock, which it can't get because a prior invocation is stuck running <code>ceph-volume inventory</code>.</p>
<p>I suggest we make ceph-volume completely ignore /dev/rbd* when doing a device inventory. I know we had a similar discussion on dev@ceph.io regarding ceph-volume listing, or not listing, GPT devices (see <a class="external" href="https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/N3TK4IO2QYHXIZMQTZ4AMPU5BE56J5MP/#T7UM53WCW2MDD62DDH6KLI4EZXKBXZBY">https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/N3TK4IO2QYHXIZMQTZ4AMPU5BE56J5MP/#T7UM53WCW2MDD62DDH6KLI4EZXKBXZBY</a>) but the difference here is that mapped RBD volumes really <em>aren't</em> part of the host inventory, so IMO should be excluded.</p>
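<p>The proposed filter itself is small; a minimal sketch of the idea (illustrative, not the actual ceph-volume change):</p>
<pre>
def is_rbd(dev_path):
    # Mapped RBD volumes appear as /dev/rbd0, /dev/rbd1p1, and so on.
    return dev_path.startswith("/dev/rbd")

def inventory(candidate_paths):
    # Hypothetical inventory step: skip RBD devices entirely, so we never
    # call blkid (which can hang in D-state) on them.
    return [d for d in candidate_paths if not is_rbd(d)]
</pre>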
Orchestrator - Feature #45996 (New): adopted prometheus instance uses port 9095, regardless of or...
https://tracker.ceph.com/issues/45996 (2020-06-15, Tim Serong <tserong@suse.com>)
<p>When adopting prometheus (<code>cephadm adopt --style legacy --name prometheus.HOSTNAME</code>), the new prometheus daemon starts listening on port 9095, regardless of what port the original daemon was running on. This is a problem for upgrades: an existing grafana instance will still be looking at the old prometheus port number.</p>
Orchestrator - Bug #45973 (Rejected): Adopted MDS daemons are removed by the orchestrator because...
https://tracker.ceph.com/issues/45973 (2020-06-11, Tim Serong <tserong@suse.com>)
<p>The <a href="https://docs.ceph.com/docs/master/cephadm/adoption/" class="external">docs</a> say that when converting to cephadm, one needs to redeploy MDS daemons. However, it <strong>is</strong> possible to adopt them (<code>cephadm adopt [...] --name mds.myhost</code> seems to work just fine). The problem is that shortly after being adopted, the cephadm orchestrator decides that the MDS is an orphan (there's no service spec), and goes and removes the daemon.</p>
<p>If the correct procedure is always to redeploy, and never to adopt an MDS, then <code>cephadm adopt</code> should presumably be changed to refuse to adopt MDSes (the same is possibly true for RGW, but I haven't verified this).</p>
<p>If, on the other hand, it's permitted to adopt an MDS, then I guess a service spec needs to be created for it automatically?</p>
<p>What's the right thing to do here?</p>
Orchestrator - Bug #45095 (Resolved): cephadm adopt can't handle offline OSDs
https://tracker.ceph.com/issues/45095 (2020-04-15, Tim Serong <tserong@suse.com>)
<p><code>cephadm adopt</code> for OSDs relies on the OSD actually being up and running (it checks /var/lib/ceph/osd/ceph-$ID/{fsid,type}, and also moves files from that directory to /var/lib/ceph/$FSID/osd.$ID). This means that if your OSDs are down for any reason, they can't be adopted.</p>
Orchestrator - Bug #40801 (Duplicate): ceph orchestrator cli alleges support for json-pretty, xml...
https://tracker.ceph.com/issues/40801 (2019-07-17, Tim Serong <tserong@suse.com>)
<p>The <code>_list_devices()</code> and <code>_list_services()</code> methods of the orchestrator CLI both specify <code>"name=format,type=CephChoices,strings=json|plain,req=false"</code> for the format parameter, so they should only accept json or plain as formats. However, if I run <code>ceph orchestrator service ls --format=json_pretty</code>, I get:</p>
<p><code>ceph: error: argument -f/--format: invalid choice: 'json_pretty' (choose from 'json', 'json-pretty', 'xml', 'xml-pretty', 'plain')</code></p>
<p>Given those methods only support json and plain, I'd not expect to see json-pretty, xml or xml-pretty as available options.</p>
mgr - Bug #39662 (Resolved): ceph-mgr should log an error if it can't find any modules to load
https://tracker.ceph.com/issues/39662 (2019-05-10, Tim Serong <tserong@suse.com>)
<p>I had a downstream SUSE build of ceph-14.1.0 which was showing "7 mgr modules have failed" in <code>ceph status</code>, and <code>ceph mgr dump</code> showed "available_modules": [] (i.e. an empty list). The 7 failed modules were, of course, the always-on modules. I eventually figured out that mgr_module_path was somehow set to /usr/local/share/ceph/mgr in that build, but the modules were actually correctly installed in /usr/share/ceph/mgr, so the mgr couldn't find them. We should log the path we're loading modules from, and log an error if none are found, so that if mgr_module_path is ever set incorrectly in future, the problem will be obvious.</p>
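<p>A minimal sketch of the proposed behaviour (illustrative Python; discover_modules is a hypothetical stand-in for the mgr's actual module scan):</p>
<pre>
import logging

log = logging.getLogger("mgr")

def load_modules(mgr_module_path):
    log.info("loading mgr modules from %s", mgr_module_path)
    modules = discover_modules(mgr_module_path)  # hypothetical helper
    if not modules:
        # An empty result almost certainly means mgr_module_path is wrong;
        # say so loudly instead of failing the always-on modules later.
        log.error("no mgr modules found in %s; check mgr_module_path",
                  mgr_module_path)
    return modules
</pre>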
Ceph - Bug #37503 (Resolved): Audit log: mgr module passwords set on CLI written as plaintext in ...
https://tracker.ceph.com/issues/37503 (2018-12-03, Tim Serong <tserong@suse.com>)
<p>A number of mgr modules need passwords set for one reason or another, either to authenticate with external systems (deepsea, influx, diskprediction), or to define credentials for users of those modules (dashboard, restful).</p>
<p>In all cases, these passwords are set from the command line, either via module-specific commands (<code>ceph dashboard ac-user-create</code>, <code>deepsea config-set salt_api_password</code>, etc.) or via <code>ceph config set</code> with some particular key (e.g. mgr/influx/password).</p>
<p>All module-specific commands go through <code>DaemonServer::_handle_command()</code>, which then logs the command via <code>audit_clog->debug()</code> (or <code>audit_clog->info()</code> in case of access denied). This all ends up written to <code>/var/log/ceph/ceph-mgr.$ID.log</code>, which is world-readable, e.g.:</p>
<pre>
2018-12-03 10:45:28.864 7f67e7f8f700 0 log_channel(audit) log [DBG] : from='client.343880 172.16.1.254:39896/3560370796' entity='client.admin' cmd=[{"prefix": "deepsea config-set", "key": "salt_api_password", "value": "foo", "target": ["mgr", ""]}]: dispatch
</pre>
<p>Additionally, anything that results in a "config set" lands in the mon log, e.g.:</p>
<pre>
2018-12-03 10:45:28.881552 [INF] from='mgr.295252 172.16.1.21:56636/175641' entity='mgr.data1' cmd='[{"prefix":"config set","who":"mgr","name":"mgr/deepsea/salt_api_password","value":"foo"}]': finished
</pre>
<p>This also appears in the Audit log in the Dashboard.</p>
<p>Some things that land in the mon log probably don't matter; for any module that hashes passwords before saving them, only the hashed password should land in the mon log. But there's still the problem of the CLI commands in the mgr log, and in any case, modules that need to authenticate with external services will need to store plaintext passwords.</p>
<p>ISTM we need to either never log these things, or somehow keep the command logging, but filter the passwords out, so it renders the value as "*****" instead of the actual password.</p>
<p>I'm not sure how best to approach this, given the way command logging is structured. At the point commands are logged, the commands themselves are just strings. Admittedly, they're strings of JSON, but they're effectively opaque at that point - we'd have to parse the JSON, then look for things that might be passwords, blank them out, and turn the whole lot back into a string. Yuck.</p>
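<p>Ugly but mechanical; a minimal sketch of what that scrubbing could look like, assuming we only blank values whose target key mentions "password" (illustrative, not mgr code):</p>
<pre>
import json

def scrub_passwords(cmd_json):
    try:
        cmds = json.loads(cmd_json)
    except ValueError:
        return cmd_json  # not JSON after all; log unchanged
    for cmd in cmds:
        # Covers both "config set" commands ({"name":
        # "mgr/deepsea/salt_api_password", "value": "foo"}) and
        # module commands ({"key": ..., "value": ...}).
        target = (cmd.get("name") or cmd.get("key") or "").lower()
        if "password" in target and "value" in cmd:
            cmd["value"] = "*****"
    return json.dumps(cmds)
</pre>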
mgr - Bug #37377 (New): ceph-mgr/influx: verify "no metadata" fix is complete
https://tracker.ceph.com/issues/37377 (2018-11-23, Tim Serong <tserong@suse.com>)
<p>Seen while reviewing <a class="external" href="https://github.com/ceph/ceph/pull/25184">https://github.com/ceph/ceph/pull/25184</a>. The fix for <a class="external" href="http://tracker.ceph.com/issues/25191">http://tracker.ceph.com/issues/25191</a> in <a class="external" href="https://github.com/ceph/ceph/pull/22794">https://github.com/ceph/ceph/pull/22794</a> is applied to the get_pg_summary() function, but not to the get_daemon_stats() function. We need to verify whether this fix should also be applied to the latter function (my guess is "yes", but I don't know for certain).</p>
Ceph - Bug #35906 (Resolved): ceph-disk: is_mounted() returns None for mounted OSDs with Python 3
https://tracker.ceph.com/issues/35906 (2018-09-10, Tim Serong <tserong@suse.com>)
<p><code>ceph-disk list --format=json</code> on Python 3 gives null for the mount member, even for mounted OSDs, e.g.:</p>
<pre>
# ceph-disk list --format=json|json_pp
...
{
   "path" : "/dev/vdg",
   "partitions" : [
      {
         "whoami" : "23",
         "is_partition" : true,
         "path" : "/dev/vdg1",
         "ceph_fsid" : "00296336-7bf2-43f1-a48c-24c7212bf478",
         "dmcrypt" : {},
         "uuid" : "b447f027-f116-47d0-9cd1-ca2348e8e3db",
         "block_uuid" : "dfaf6613-f958-497a-9dfb-ad343e897639",
         "block_dev" : "/dev/vdg2",
         "type" : "data",
         "mount" : null,
         "ptype" : "4fbd7e29-9d25-41b8-afd0-062c0ceff05d",
         "magic" : "ceph osd volume v026",
         "cluster" : "ceph",
         "state" : "prepared",
         "fs_type" : "xfs"
      },
      {
         "type" : "block",
         "is_partition" : true,
         "path" : "/dev/vdg2",
         "ptype" : "cafecafe-9b03-4f30-b4c6-b4b80ceff106",
         "block_for" : "/dev/vdg1",
         "dmcrypt" : {},
         "uuid" : "dfaf6613-f958-497a-9dfb-ad343e897639"
      }
   ]
}
...
</pre>
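<p>The tracker entry doesn't name the root cause, but a plausible failure mode under Python 3 is a bytes-vs-str mismatch when scanning /proc/mounts: every comparison silently fails and the lookup falls through to None. A minimal sketch of an is_mounted() that decodes explicitly (illustrative, not the actual ceph-disk code):</p>
<pre>
import os

def is_mounted(dev):
    dev = os.path.realpath(dev)
    with open("/proc/mounts", "rb") as mounts:
        for line in mounts:
            fields = line.split()[:2]
            if len(fields) < 2:
                continue
            # Decode bytes to str so the comparisons below can succeed on
            # Python 3 (on Python 2 both sides were already str).
            mounts_dev = fields[0].decode("utf-8")
            path = fields[1].decode("utf-8")
            if mounts_dev.startswith("/") and os.path.realpath(mounts_dev) == dev:
                return path
    return None
</pre>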
Ceph - Bug #18163 (Resolved): platform.linux_distribution() is deprecated; stop using it
https://tracker.ceph.com/issues/18163 (2016-12-07, Tim Serong <tserong@suse.com>)
<p>platform.linux_distribution() is deprecated, so we should stop using it. Notably, it uses /etc/SuSE-release on SUSE systems, and the latest SUSE versions don't ship this file; instead they ship /etc/os-release, which platform.linux_distribution() doesn't know about, so it returns ('','','').</p>
<p>AFAICT, platform.linux_distribution() is currently used by ceph-detect-init, which in turn is used by ceph-disk. If ceph-detect-init can't determine the distro because it sees ('','',''), this results in ceph-disk always tagging the init system as sysvinit.</p>
<p>There are also platform.linux_distribution() invocations in qa/workunits/ceph-disk/ceph-disk-no-lockbox and src/ceph-disk/ceph_disk/main.py, but they look like dead code to me.</p>
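<p>A minimal sketch of an /etc/os-release-based replacement for the old three-tuple (illustrative; a real fix might prefer a maintained library instead):</p>
<pre>
def linux_distribution():
    # Parse the KEY=value (possibly quoted) lines of /etc/os-release.
    info = {}
    with open("/etc/os-release") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                info[key] = value.strip('"')
    # Roughly mirrors the old (distname, version, id) tuple.
    return (info.get("NAME", ""), info.get("VERSION_ID", ""),
            info.get("ID", ""))
</pre>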
<p>See also bug <a class="issue" href="https://tracker.ceph.com/issues/18141">#18141</a>.</p>
Ceph - Bug #14864 (Resolved): ceph-detect-init requires python-setuptools at runtime
https://tracker.ceph.com/issues/14864 (2016-02-25, Tim Serong <tserong@suse.com>)
<p>Testing a reasonably recent ceph-10.0.2 on openSUSE Leap 42.1, my OSDs weren't mounting. I tracked this back to /usr/lib/systemd/system/ceph-disk@.service, which invokes <code>flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f</code>. This in turn results in:</p>
<pre>
ceph-disk: main_trigger: Namespace(dev='/dev/sdb1', func=<function main_trigger at 0x7fa6ebf6b050>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
ceph-disk: Running command: /sbin/init --version
/sbin/init: unrecognized option '--version'
ceph-disk: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
ceph-disk: Running command: /usr/sbin/sgdisk -i 1 /dev/sdb
ceph-disk: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
ceph-disk: Running command: /usr/sbin/sgdisk -i 1 /dev/sdb
ceph-disk: trigger /dev/sdb1 parttype 4fbd7e29-9d25-41b8-afd0-062c0ceff05d uuid 93b72ed5-7d84-4b0b-a227-330fcd22513e
ceph-disk: Running command: /usr/sbin/ceph-disk activate /dev/sdb1
Traceback (most recent call last):
  File "/usr/bin/ceph-detect-init", line 5, in <module>
    from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
ERROR:ceph-disk:Failed to activate
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 4036, in <module>
    main(sys.argv[1:])
  File "/usr/sbin/ceph-disk", line 3992, in main
    main_catch(args.func, args)
  File "/usr/sbin/ceph-disk", line 4014, in main_catch
    func(args)
  File "/usr/sbin/ceph-disk", line 2530, in main_activate
    reactivate=args.reactivate,
  File "/usr/sbin/ceph-disk", line 2296, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/sbin/ceph-disk", line 2477, in activate
    init = init_get()
  File "/usr/sbin/ceph-disk", line 799, in init_get
    '--default', 'sysvinit',
  File "/usr/sbin/ceph-disk", line 902, in _check_output
    raise error
subprocess.CalledProcessError: Command '/usr/bin/ceph-detect-init' returned non-zero exit status 1
</pre>
<p>The important part is:</p>
<pre>
Traceback (most recent call last):
  File "/usr/bin/ceph-detect-init", line 5, in <module>
    from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
</pre>
<p>This is fixable by installing python-setuptools, suggesting that package needs to be added to the RPM Requires and, I assume, the Debian Depends.</p>
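<p>A minimal sketch of the packaging change, assuming the dependency belongs wherever ceph-detect-init is packaged (package names are illustrative and vary by distro and Python version):</p>
<pre>
# ceph.spec.in, in the relevant (sub)package stanza:
Requires:       python-setuptools

# debian/control, in the corresponding Depends line:
Depends: python-setuptools, ...
</pre>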