Ceph : Issues - https://tracker.ceph.com/ - 2024-02-20T17:57:40Z
Ceph - Backport #64508 (New): quincy: Debian bookworm package needs to explicitly specify cephadm... - https://tracker.ceph.com/issues/64508 - 2024-02-20T17:57:40Z - Backport Bot
Ceph - Bug #64069 (Pending Backport): Debian bookworm package needs to explicitly specify cephadm... - https://tracker.ceph.com/issues/64069 - 2024-01-17T18:46:25Z - Chris Palmer
<p>The bookworm/reef cephadm package needs updating to accommodate the latest change noted in /usr/share/doc/adduser/NEWS.Debian.gz:</p>
<pre><code>System user home defaults to /nonexistent if --home is not specified.
Packages that call adduser to create system accounts should explicitly
specify a location for /home (see Lintian check
maintainer-script-lacks-home-in-adduser).</code></pre>
<p>I.e. when the package creates the cephadm user as a system user, it needs to explicitly specify the expected home directory of /home/cephadm.</p>
<p>A workaround is to manually create the user+directory before installing ceph.</p>
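<p>A minimal sketch of that workaround, assuming the package expects the user <code>cephadm</code> with home /home/cephadm as described above (the exact flags used by the packaging scripts are not shown here):</p>
<pre><code># Sketch only: pre-create the cephadm system user with an explicit home
# directory so the packaged adduser call cannot fall back to /nonexistent.
sudo adduser --system --home /home/cephadm cephadm

# Then install the ceph packages as usual.
sudo apt-get install -y cephadm
</code></pre>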
<p>Kefu Chai has created PR55218 to address this.</p>
RADOS - Bug #50012 (Fix Under Review): Ceph-osd refuses to bind on an IP on the local loopback lo... - https://tracker.ceph.com/issues/50012 - 2021-03-26T12:10:11Z - Kefu Chai (tchaikov@gmail.com)
RADOS - Bug #49359 (New): osd: warning: unused variable - https://tracker.ceph.com/issues/49359 - 2021-02-18T17:17:22Z - Patrick Donnelly (pdonnell@redhat.com)
<pre>
/build/ceph-17.0.0-888-g5ce097b4/src/osd/osd_types.h: In member function 'void object_ref_delta_t::mut_ref(const hobject_t&, int)':
/build/ceph-17.0.0-888-g5ce097b4/src/osd/osd_types.h:5577:35: warning: unused variable '_' [-Wunused-variable]
[[maybe_unused]] auto [iter, _] = ref_delta.try_emplace(hoid, 0);
^
</pre>
<p>From: <a class="external" href="https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=bionic,DIST=bionic,MACHINE_SIZE=gigantic/46981//consoleFull">https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=bionic,DIST=bionic,MACHINE_SIZE=gigantic/46981//consoleFull</a></p>
Ceph - Bug #48498 (Fix Under Review): octopus: timeout when running the "ceph" command - https://tracker.ceph.com/issues/48498 - 2020-12-08T16:00:28Z - Mathew Clarke
<p>Running "Ubuntu 20.04.1" with kernel "5.4.77-217"</p>
<p>I've installed octopus version 15.2.5 from the distro armhf repo, because "deb-src <a class="external" href="https://download.ceph.com/debian-octopus/">https://download.ceph.com/debian-octopus/</a> focal main" fails with unmet dependencies (see issue: <a class="external" href="https://tracker.ceph.com/issues/45915">https://tracker.ceph.com/issues/45915</a>).</p>
<p>When I run "ceph" from a bash prompt I get the error below.</p>
<pre>
Traceback (most recent call last):
File "/usr/bin/ceph", line 1278, in <module>
retval = main()
File "/usr/bin/ceph", line 984, in main
cluster_handle = run_in_thread(rados.Rados,
File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1339, in run_in_thread
raise Exception("timed out")
Exception: timed out
</pre>
<p>Even when I run "ceph -h" I get the same error halfway through the help output. I'm new to ceph so any pointers on how I can troubleshoot this would be greatly appreciated.</p>
<p>Thanks</p>
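<p>A few generic first checks, assuming the usual /etc/ceph/ceph.conf and client.admin keyring locations (the paths and the monitor address below are placeholders, not taken from this report), that can help narrow down where the ceph CLI stalls:</p>
<pre><code># Does the client have a config and keyring to find the monitors with?
ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
grep mon_host /etc/ceph/ceph.conf

# Can this host reach a monitor on the msgr ports? (replace MON_IP)
nc -zv MON_IP 3300
nc -zv MON_IP 6789

# Fail fast instead of hanging, with client-side messenger debug output.
ceph --connect-timeout 10 --debug-ms 1 -s
</code></pre>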
Ceph - Bug #46366 (Fix Under Review): Octopus: Recovery and backfilling causes OSDs to crash afte... - https://tracker.ceph.com/issues/46366 - 2020-07-05T12:36:26Z - Wout van Heeswijk (wout@42on.com)
<p>A customer has upgraded the cluster from nautilus to octopus after experiencing issues with osds not being able to connect to each other or to clients/mons/mgrs. The connectivity issues were related to msgrV2 and the require_osd_release setting not being set to nautilus. After fixing this, the OSDs were restarted and all placement groups became active again.</p>
<p>After unsetting the norecover and nobackfill flags, some OSDs started crashing every few minutes. The OSD log, even with high debug settings, doesn't seem to reveal anything; it just stops logging mid log line.</p>
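<p>As an immediate mitigation while collecting debug data (not taken from this report, just the inverse of the step above), the flags can be re-applied to pause recovery and backfill:</p>
<pre><code># Re-set the flags that were unset above to stop the crash-triggering
# recovery/backfill activity while the segfault is investigated.
ceph osd set norecover
ceph osd set nobackfill
</code></pre>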
<p>In the systemd journal there is the following message:</p>
<pre><code class="sh syntaxhl"><span class="CodeRay">Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: *** Caught signal (Segmentation fault) **
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: in thread 557dc6fb3510 thread_name:tp_osd_tp
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: src/tcmalloc.cc:283] Attempt to free invalid pointer 0x363bbb77000
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: *** Caught signal (Aborted) **
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: in thread 557dc6fb3510 thread_name:tp_osd_tp
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: src/tcmalloc.cc:283] Attempt to free invalid pointer 0x363bbb77000
</span></code></pre>
<p>A snippet of the log from around the time of the crash:</p>
<pre><code class="sh syntaxhl"><span class="CodeRay">2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.17:6836 osd.111 since back 2020-07-05T06:28:30.776006+0200 front 2020-07-05T06:28:30.775261+0200 (oldest deadline 2020-07-05T06:28:53.0
73588+0200)
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.37:6901 osd.146 since back 2020-07-05T06:31:01.434299+0200 front 2020-07-05T06:31:01.434534+0200 (oldest deadline 2020-07-05T06:31:27.2
33589+0200)
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.38:6929 osd.180 since back 2020-07-05T06:28:18.971489+0200 front 2020-07-05T06:28:18.971597+0200 (oldest deadline 2020-07-05T06:28:50.7
71298+0200)
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.38:6891 osd.189 since back 2020-07-05T06:28:18.971678+0200 front 2020-07-05T06:28:18.971894+0200 (oldest deadline 2020-07-05T06:28:44.8
69635+0200)
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.48:6836 osd.229 since back 2020-07-05T06:31:07.237691+0200 front 2020-07-05T06:31:07.237226+0200 (oldest deadline 2020-07-05T06:31:30.7
34951+0200)
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 0 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable), process ceph-osd, pid 1667604
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 0 pidfile_write: ignore empty --pid-file
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 1 bdev create path /var/lib/ceph/osd/ceph-127/block type kernel
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6380 /var/lib/ceph/osd/ceph-127/block) open path /var/lib/ceph/osd/ceph-127/block
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6380 /var/lib/ceph/osd/ceph-127/block) open size 12000134430720 (0xae9ffc00000, 11 TiB) block_size 4096 (4 KiB) rotational discard not supported
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bluestore(/var/lib/ceph/osd/ceph-127) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev create path /var/lib/ceph/osd/ceph-127/block.db type kernel
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6a80 /var/lib/ceph/osd/ceph-127/block.db) open path /var/lib/ceph/osd/ceph-127/block.db
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6a80 /var/lib/ceph/osd/ceph-127/block.db) open size 128849018880 (0x1e00000000, 120 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-127/block.db size 120 GiB
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev create path /var/lib/ceph/osd/ceph-127/block type kernel
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6e00 /var/lib/ceph/osd/ceph-127/block) open path /var/lib/ceph/osd/ceph-127/block
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6e00 /var/lib/ceph/osd/ceph-127/block) open size 12000134430720 (0xae9ffc00000, 11 TiB) block_size 4096 (4 KiB) rotational discard not supported
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-127/block size 11 TiB
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev create path /var/lib/ceph/osd/ceph-127/block.wal type kernel
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f7180 /var/lib/ceph/osd/ceph-127/block.wal) open path /var/lib/ceph/osd/ceph-127/block.wal
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f7180 /var/lib/ceph/osd/ceph-127/block.wal) open size 2147483648 (0x80000000, 2 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-127/block.wal size 2 GiB
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f7180 /var/lib/ceph/osd/ceph-127/block.wal) close
</span></code></pre>
<p>A gdb backtrace is attached that reveals some more info.</p>
mgr - Bug #42655 (New): mgr-diskprediction-cloud is missing python dependencies - https://tracker.ceph.com/issues/42655 - 2019-11-05T15:28:32Z - Kaleb KEITHLEY
<p>see <a class="external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1768017">https://bugzilla.redhat.com/show_bug.cgi?id=1768017</a></p>
<p>partial fix: add missing Requires: to mgr subpackage</p>
RADOS - Bug #40772 (New): mon: pg size change delayed 1 minute because osdmap 35 delay - https://tracker.ceph.com/issues/40772 - 2019-07-12T21:21:34Z - David Zafman (dzafman@redhat.com)
<p>osd-recovery-prio.sh TEST_recovery_pool_priority fails intermittently due to a delay in recovery starting on a pg. The test requires simultaneous recovery of 2 PGs.</p>
<pre><code>ceph osd pool set $pool1 size 2
ceph osd pool set $pool2 size 2</code></pre>
<p>Running the test alone doesn't necessarily reproduce the problem, which I saw when running all the standalone tests (although they are run sequentially).</p>
<ol>
<li>../qa/run-standalone.sh "osd-recovery-prio.sh TEST_recovery_pool_priority"</li>
</ol>
<p>No pg[2.0 messages for almost 1 minute. Map 35 seems to be the one that has the size change.</p>
<pre>
2019-07-12T00:29:49.072-0700 7fc35220f700 10 osd.0 pg_epoch: 34 pg[2.0( v 31'600 (0'0,31'600] local-lis/les=29/30 n=200 ec=15/15 lis/c 29/29 les/c/f 30/30/0 29/29/15) [0] r=0 lpr=29 crt=31'600 lcod 31'599 mlcod 31'599 active+clean] do_peering_event: epoch_sent: 34 epoch_requested: 34 NullEvt
2019-07-12T00:30:44.588-0700 7fc35220f700 10 osd.0 pg_epoch: 34 pg[2.0( v 31'600 (0'0,31'600] local-lis/les=29/30 n=200 ec=15/15 lis/c 29/29 les/c/f 30/30/0 29/29/15) [0] r=0 lpr=29 crt=31'600 lcod 31'599 mlcod 31'599 active+clean] handle_advance_map: 35
</pre>
"message": "pg 2.0 is stuck undersized for 61.208213, current state active+recovering+undersized+degraded+remapped, last acting [0]" <br /> "message": "pg 2.0 is stuck undersized for 63.209241, current state active+recovering+undersized+degraded+remapped, last acting [0]" RADOS - Tasks #25186 (In Progress): setup repo for building dependencies like boost, rocksdb, whi...https://tracker.ceph.com/issues/251862018-07-31T03:57:05ZKefu Chaitchaikov@gmail.com
<p>We need to build boost, spdk, dpdk, fio, rocksdb, gperftools, and seastar to prepare the build dependencies for each ceph PR and for each CI build. The list grows over time. It would be much more efficient if we could cache the built artifacts in a repo.</p>
<p>We could use a PPA or chacra for hosting the repo, and update the ceph-build and cmake scripts accordingly to pick up these pre-built packages.</p>
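<p>As an illustration only (the hosting location is not decided in this ticket; the repo URL and package name below are placeholders), a builder would then pick up the cached packages roughly like this instead of rebuilding them per PR:</p>
<pre><code># Placeholder sketch: point apt at a hypothetical pre-built dependency repo
# and install the cached builds of the heavyweight dependencies.
echo "deb https://chacra.example.invalid/ceph-deps focal main" |
  sudo tee /etc/apt/sources.list.d/ceph-deps.list
sudo apt-get update
sudo apt-get install -y ceph-libboost-dev   # placeholder package name
</code></pre>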
<p>Action items:</p>
<ul>
<li>package ceph-libboost for centos on amd64/aarch64</li>
<li>package c-ares, libfmt, zstd and rocksdb.</li>
</ul>
mgr - Bug #24614 (New): luminous: AssertionError: Lists differ in test_selftest_command_spam() - https://tracker.ceph.com/issues/24614 - 2018-06-21T23:51:11Z - Neha Ojha (nojha@redhat.com)
<pre>
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner:======================================================================
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner:FAIL: test_selftest_command_spam (tasks.mgr.test_module_selftest.TestModuleSelftest)
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuri3-testing-2018-06-11-1421-luminous/qa/tasks/mgr/test_module_selftest.py", line 79, in test_selftest_command_spam
2018-06-11T22:05:27.607 INFO:tasks.cephfs_test_runner: self.assertEqual(original_standbys, self.mgr_cluster.get_standby_ids())
2018-06-11T22:05:27.607 INFO:tasks.cephfs_test_runner:AssertionError: Lists differ: [u'z', u'y'] != [u'z']
2018-06-11T22:05:27.607 INFO:tasks.cephfs_test_runner:
2018-06-11T22:05:27.607 INFO:tasks.cephfs_test_runner:First list contains 1 additional elements.
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:First extra element 1:
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:u'y'
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:- [u'z', u'y']
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:+ [u'z']
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2018-06-11T22:05:27.609 INFO:tasks.cephfs_test_runner:Ran 3 tests in 185.305s
2018-06-11T22:05:27.609 INFO:tasks.cephfs_test_runner:
2018-06-11T22:05:27.609 INFO:tasks.cephfs_test_runner:FAILED (failures=1)
2018-06-11T22:05:27.609 INFO:tasks.cephfs_test_runner:
</pre>
<p>/a/yuriw-2018-06-11_16:27:32-rados-wip-yuri3-testing-2018-06-11-1421-luminous-distro-basic-smithi/2654731</p>
RADOS - Bug #24515 (New): "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, ... - https://tracker.ceph.com/issues/24515 - 2018-06-13T20:13:58Z - Yuri Weinstein (yweinste@redhat.com)
<p>This seems to be RHEL-specific.</p>
<p>Run: <a class="external" href="http://pulpito.ceph.com/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smithi/">http://pulpito.ceph.com/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smithi/</a><br />Jobs: '2659781', '2659792', '2659770', '2659840'<br />Logs: <a class="external" href="http://qa-proxy.ceph.com/teuthology/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smithi/2659781/teuthology.log">http://qa-proxy.ceph.com/teuthology/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smithi/2659781/teuthology.log</a></p>
<pre>
2018-06-13T03:57:47.950 INFO:teuthology.orchestra.run.smithi075.stdout:2018-06-13 03:05:37.853886 mon.a (mon.0) 197 : cluster [WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c has slow ops (SLOW_OPS)
....
2018-06-13T04:09:34.350 INFO:teuthology.run:Summary data:
{description: 'fs/snaps/{begin.yaml clusters/fixed-2-ucephfs.yaml mount/fuse.yaml
objectstore-ec/bluestore-comp.yaml overrides/{debug.yaml frag_enable.yaml whitelist_health.yaml
whitelist_wrongly_marked_down.yaml} supported-random-distros$/{rhel_latest.yaml}
tasks/snaptests.yaml}', duration: 4418.809953927994, failure_reason: '"2018-06-13
03:05:37.853886 mon.a (mon.0) 197 : cluster [WRN] Health check failed: 1 slow
ops, oldest one blocked for 32 sec, mon.c has slow ops (SLOW_OPS)" in cluster
log', flavor: basic, owner: scheduled_yuriw@teuthology, success: false}
</pre>
mgr - Bug #23756 (New): mgr is not updated with latest pgmap in qa/workunits/rados/test_large_oma... - https://tracker.ceph.com/issues/23756 - 2018-04-16T07:24:08Z - Kefu Chai (tchaikov@gmail.com)
<p>see <a class="external" href="http://pulpito.ceph.com/kchai-2018-04-12_15:56:07-rados-wip-kefu-testing-2018-04-12-2211-distro-basic-mira/2390083/">http://pulpito.ceph.com/kchai-2018-04-12_15:56:07-rados-wip-kefu-testing-2018-04-12-2211-distro-basic-mira/2390083/</a></p>
<p>Need to test without <a class="external" href="https://github.com/ceph/ceph/pull/21410">https://github.com/ceph/ceph/pull/21410</a>. Before this PR, we instruct the mgr, giving it the OSD id, to perform the scrub; if the mgr does not possess the latest pgmap, it will fail to send the OSD all the pgs to be scrubbed.</p>
<p>that's the case of kchai-2018-04-12_15:56:07-rados-wip-kefu-testing-2018-04-12-2211-distro-basic-mira/2390083.</p>
mgr - Bug #23136 (In Progress): mgr: disable then enable of Restful plugin does not work without ... - https://tracker.ceph.com/issues/23136 - 2018-02-26T15:42:19Z - Hans van den Bogert (hansbogert@gmail.com)
<p>The restful plugin does not work after a disable/enable cycle:</p>
<blockquote>
<p>curl -k <a class="external" href="https://mon03:8003">https://mon03:8003</a>
{
"api_version": 1,
"auth": "Use \"ceph restful create-key <key>\" to create a key pair, pass it as HTTP Basic auth to authenticate",
"doc": "See /doc endpoint",
"info": "Ceph Manager RESTful API server"
cephadmin@mon03:~$ ceph mgr module disable restful
cephadmin@mon03:~$ ceph mgr module enable restful
cephadmin@mon03:~$ curl --connect-timeout 20 -k <a class="external" href="https://mon03:8003">https://mon03:8003</a>
curl: (28) Operation timed out after 0 milliseconds with 0 out of 0 bytes received
cephadmin@mon03:~$</p>
</blockquote>
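<p>A workaround commonly suggested for stuck mgr modules (an assumption here, not confirmed in this ticket; the daemon name is taken from the host shown in the transcript) is to restart or fail over the active mgr so the module is reloaded cleanly:</p>
<pre><code># Either fail over the active mgr so a standby takes over ...
ceph mgr fail mon03                 # mgr name is an assumption based on the host above
# ... or restart the mgr daemon on that host.
sudo systemctl restart ceph-mgr@mon03
</code></pre>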
Ceph - Tasks #12797 (In Progress): create the upgrade test suite for gmt and sortbitmap change - https://tracker.ceph.com/issues/12797 - 2015-08-26T16:19:39Z - Kefu Chai (tchaikov@gmail.com)
<p>for more context, see <a class="external" href="http://tracker.ceph.com/issues/9732#note-16">http://tracker.ceph.com/issues/9732#note-16</a></p>
Ceph - Fix #10877 (In Progress): CLI error numbers are not described anywhere - https://tracker.ceph.com/issues/10877 - 2015-02-13T16:37:54Z - Alfredo Deza (adeza@redhat.com)
<p>While trying to run the following command:</p>
<pre>
rados lspools
</pre>
<p>I got this as the last line of the output:</p>
<pre>
couldn't connect to cluster! error -2
</pre>
<p>The man page doesn't list error numbers, and searching around I saw mailing list threads that showed `-1` as well.</p>
<p>I understand that there is an error connecting to the cluster, but if the CLI is using error numbers, those should be listed/explained somewhere, if possible in the man page too.</p>
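<p>For what it's worth, these appear to be negated errno values (so `-2` would be ENOENT and `-1` EPERM); a quick way to decode one, assuming Python 3 is available:</p>
<pre><code># Decode a negative return code such as -2 into its errno name and message.
python3 -c 'import errno, os; e = 2; print(errno.errorcode[e], "-", os.strerror(e))'
# ENOENT - No such file or directory
</code></pre>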