Ceph : Issues
https://tracker.ceph.com/ (feed updated 2024-03-07T11:48:12Z)

RADOS - Bug #64788 (Fix Under Review): EpollDriver::del_event() crashes when the nic is unplugged
https://tracker.ceph.com/issues/64788 - 2024-03-07T11:48:12Z - Kefu Chai <tchaikov@gmail.com>
librbd uses the messenger (msgr) to talk to its Ceph cluster. If the client's NIC is hot-unplugged, there is a chance that EpollDriver::del_event() crashes, because epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &...) returns -ENOENT and the caller, EventCenter::delete_file_event(), treats that as a sign of a bug.
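Below is a minimal standalone sketch (an illustration, not the actual EpollDriver/EventCenter code) of how a delete can tolerate this case: once the kernel has already dropped the fd, epoll_ctl(EPOLL_CTL_DEL, ...) fails with ENOENT, and treating that as "already removed" rather than as a bug avoids the crash path described above.

<pre>
// Hypothetical standalone sketch, not Ceph's EpollDriver: tolerate ENOENT from
// EPOLL_CTL_DEL, which the kernel returns when the fd is no longer registered
// (for example after the NIC/socket has already gone away underneath us).
#include <sys/epoll.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

static int del_event(int epfd, int fd) {
  if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, nullptr) == 0)
    return 0;
  if (errno == ENOENT) {
    // The fd is already gone from this epoll instance: nothing left to delete,
    // so report success instead of letting the caller treat it as a bug.
    return 0;
  }
  fprintf(stderr, "del_event(%d): %s\n", fd, strerror(errno));
  return -errno;  // only other errors are real failures
}

int main() {
  int epfd = epoll_create1(0);
  // stdin was never registered with epfd, so this exercises the ENOENT branch.
  return del_event(epfd, 0);
}
</pre>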
Ceph - Backport #64508 (New): quincy: Debian bookworm package needs to explicitly specify cephadm...
https://tracker.ceph.com/issues/64508 - 2024-02-20T17:57:40Z - Backport Bot

Ceph - Bug #64069 (Pending Backport): Debian bookworm package needs to explicitly specify cephadm...
https://tracker.ceph.com/issues/64069 - 2024-01-17T18:46:25Z - Chris Palmer

The bookworm/reef cephadm package needs updating to accommodate the following change in /usr/share/doc/adduser/NEWS.Debian.gz:
<pre>
System user home defaults to /nonexistent if --home is not specified.
Packages that call adduser to create system accounts should explicitly
specify a location for /home (see Lintian check
maintainer-script-lacks-home-in-adduser).
</pre>
I.e. when creating the cephadm user as a system user, the package needs to explicitly specify the expected home directory of /home/cephadm.
A workaround is to manually create the user and directory before installing Ceph.
Kefu Chai has created PR 55218 to address this.

RADOS - Bug #50012 (Fix Under Review): Ceph-osd refuses to bind on an IP on the local loopback lo...
https://tracker.ceph.com/issues/50012 - 2021-03-26T12:10:11Z - Kefu Chai <tchaikov@gmail.com>

RADOS - Bug #49359 (New): osd: warning: unused variable
https://tracker.ceph.com/issues/49359 - 2021-02-18T17:17:22Z - Patrick Donnelly <pdonnell@redhat.com>
<pre>
/build/ceph-17.0.0-888-g5ce097b4/src/osd/osd_types.h: In member function 'void object_ref_delta_t::mut_ref(const hobject_t&, int)':
/build/ceph-17.0.0-888-g5ce097b4/src/osd/osd_types.h:5577:35: warning: unused variable '_' [-Wunused-variable]
[[maybe_unused]] auto [iter, _] = ref_delta.try_emplace(hoid, 0);
^
</pre>
From: https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=bionic,DIST=bionic,MACHINE_SIZE=gigantic/46981//consoleFull
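One possible way to silence the warning, sketched with a std::map stand-in rather than the real osd_types.h types (the actual fix may differ): take only the iterator from try_emplace() instead of using a structured binding, so there is no discarded '_' for this compiler to warn about, since the log above shows it warns even with [[maybe_unused]] applied.

<pre>
// Illustrative only -- a std::map stand-in, not the real osd_types.h code.
#include <map>

using ref_delta_t = std::map<int, int>;   // placeholder for the real map type

void mut_ref(ref_delta_t& ref_delta, int hoid, int delta) {
  // Original form that triggers -Wunused-variable on this compiler:
  //   [[maybe_unused]] auto [iter, _] = ref_delta.try_emplace(hoid, 0);
  // Keeping only the iterator avoids naming a binding that is never used:
  auto iter = ref_delta.try_emplace(hoid, 0).first;
  iter->second += delta;
}

int main() {
  ref_delta_t d;
  mut_ref(d, 42, 1);   // inserts 42 -> 0, then bumps it to 1
  return d.at(42) == 1 ? 0 : 1;
}
</pre>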
Ceph - Bug #48498 (Fix Under Review): octopus: timeout when running the "ceph" command
https://tracker.ceph.com/issues/48498 - 2020-12-08T16:00:28Z - Mathew Clarke

Running "Ubuntu 20.04.1" with kernel "5.4.77-217".
I've installed Octopus version 15.2.5 from the distro armhf repo, since "deb-src https://download.ceph.com/debian-octopus/ focal main" fails with unmet dependencies (see issue: https://tracker.ceph.com/issues/45915).
When I run "ceph" from a bash prompt I get the error below.
<pre>
Traceback (most recent call last):
File "/usr/bin/ceph", line 1278, in <module>
retval = main()
File "/usr/bin/ceph", line 984, in main
cluster_handle = run_in_thread(rados.Rados,
File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1339, in run_in_thread
raise Exception("timed out")
Exception: timed out
</pre>
Even when I run "ceph -h" I get the same error halfway through the help output. I'm new to Ceph, so any pointers on how I can troubleshoot this would be greatly appreciated.
Thanks.

Ceph - Bug #46366 (Fix Under Review): Octopus: Recovery and backfilling causes OSDs to crash afte...
https://tracker.ceph.com/issues/46366 - 2020-07-05T12:36:26Z - Wout van Heeswijk <wout@42on.com>
A customer has upgraded the cluster from Nautilus to Octopus after experiencing issues with OSDs not being able to connect to each other or to clients/mons/mgrs. The connectivity issues were related to msgr v2 and the require_osd_release setting not being set to nautilus. After fixing this, the OSDs were restarted and all placement groups became active again.
After unsetting the norecover and nobackfill flags, some OSDs started crashing every few minutes. The OSD logs, even with high debug settings, don't seem to reveal anything; the log just stops mid line.
In the systemd journal there is the following message:
<pre>
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: *** Caught signal (Segmentation fault) **
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: in thread 557dc6fb3510 thread_name:tp_osd_tp
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: src/tcmalloc.cc:283] Attempt to free invalid pointer 0x363bbb77000
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: *** Caught signal (Aborted) **
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: in thread 557dc6fb3510 thread_name:tp_osd_tp
Jul 05 13:41:50 st0.r23.spod1.rtm0.transip.io ceph-osd[92605]: src/tcmalloc.cc:283] Attempt to free invalid pointer 0x363bbb77000
</pre>
Snippet of the OSD log from around the time of the crash:
<pre>
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.17:6836 osd.111 since back 2020-07-05T06:28:30.776006+0200 front 2020-07-05T06:28:30.775261+0200 (oldest deadline 2020-07-05T06:28:53.073588+0200)
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.37:6901 osd.146 since back 2020-07-05T06:31:01.434299+0200 front 2020-07-05T06:31:01.434534+0200 (oldest deadline 2020-07-05T06:31:27.233589+0200)
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.38:6929 osd.180 since back 2020-07-05T06:28:18.971489+0200 front 2020-07-05T06:28:18.971597+0200 (oldest deadline 2020-07-05T06:28:50.771298+0200)
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.38:6891 osd.189 since back 2020-07-05T06:28:18.971678+0200 front 2020-07-05T06:28:18.971894+0200 (oldest deadline 2020-07-05T06:28:44.869635+0200)
2020-07-05T06:31:33.547+0200 7f8860296700 -1 osd.127 1496224 heartbeat_check: no reply from 10.200.19.48:6836 osd.229 since back 2020-07-05T06:31:07.237691+0200 front 2020-07-05T06:31:07.237226+0200 (oldest deadline 2020-07-05T06:31:30.734951+0200)
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 0 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable), process ceph-osd, pid 1667604
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 0 pidfile_write: ignore empty --pid-file
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 1 bdev create path /var/lib/ceph/osd/ceph-127/block type kernel
2020-07-05T06:35:04.026+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6380 /var/lib/ceph/osd/ceph-127/block) open path /var/lib/ceph/osd/ceph-127/block
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6380 /var/lib/ceph/osd/ceph-127/block) open size 12000134430720 (0xae9ffc00000, 11 TiB) block_size 4096 (4 KiB) rotational discard not supported
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bluestore(/var/lib/ceph/osd/ceph-127) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev create path /var/lib/ceph/osd/ceph-127/block.db type kernel
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6a80 /var/lib/ceph/osd/ceph-127/block.db) open path /var/lib/ceph/osd/ceph-127/block.db
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6a80 /var/lib/ceph/osd/ceph-127/block.db) open size 128849018880 (0x1e00000000, 120 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-127/block.db size 120 GiB
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev create path /var/lib/ceph/osd/ceph-127/block type kernel
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6e00 /var/lib/ceph/osd/ceph-127/block) open path /var/lib/ceph/osd/ceph-127/block
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f6e00 /var/lib/ceph/osd/ceph-127/block) open size 12000134430720 (0xae9ffc00000, 11 TiB) block_size 4096 (4 KiB) rotational discard not supported
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-127/block size 11 TiB
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev create path /var/lib/ceph/osd/ceph-127/block.wal type kernel
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f7180 /var/lib/ceph/osd/ceph-127/block.wal) open path /var/lib/ceph/osd/ceph-127/block.wal
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f7180 /var/lib/ceph/osd/ceph-127/block.wal) open size 2147483648 (0x80000000, 2 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-127/block.wal size 2 GiB
2020-07-05T06:35:04.030+0200 7ff24a7e8d80 1 bdev(0x55f03b8f7180 /var/lib/ceph/osd/ceph-127/block.wal) close
</pre>
A gdb backtrace is attached that reveals some more info.

RADOS - Bug #45457 (Pending Backport): CEPH Graylog Logging Missing "host" Field
https://tracker.ceph.com/issues/45457 - 2020-05-10T02:39:52Z - Daniel Neilson
Hello,

I have tried sending Ceph logs to Graylog with the following configuration:

mon_cluster_log_to_graylog = true
mon_cluster_log_to_graylog host = 10.50.100.70
mon_cluster_log_to_graylog port = 12202
The server.log entries in Graylog show:

2020-05-09T20:28:26.063-06:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=ed6876e1-9265-11ea-bc94-1ad2db8b5489, journalOffset=5055367, codec=gelf, payloadSize=317, timestamp=2020-05-10T02:28:26.062Z, remoteAddress=/10.10.10.12:36509}
java.lang.IllegalArgumentException: GELF message <ed6876e1-9265-11ea-bc94-1ad2db8b5489> (received from <10.10.10.12:36509>) has empty mandatory "host" field.
There do not appear to be any other options that can be enabled for Graylog logging in Ceph; I'm not sure what else to try.
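For reference, GELF 1.1 treats "host" as a mandatory non-empty field, which matches the DecodingProcessor error above; the messages Ceph sends are evidently missing it. The sketch below is not Ceph's Graylog backend and only reuses the address and port from this report (the "mon01" hostname is a placeholder); it just shows the shape of a minimal GELF payload that Graylog would accept.

<pre>
// Illustration only: send one GELF 1.1 message over UDP. GELF requires
// non-empty "version", "host" and "short_message"; Graylog drops messages
// whose "host" field is empty, as seen in the error above.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <string>

int main() {
  std::string payload =
      "{\"version\":\"1.1\","
      "\"host\":\"mon01\","                 // mandatory; "mon01" is a placeholder
      "\"short_message\":\"cluster log line\","
      "\"level\":6}";

  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) { perror("socket"); return 1; }

  sockaddr_in dst{};
  dst.sin_family = AF_INET;
  dst.sin_port = htons(12202);                        // GELF UDP input port
  inet_pton(AF_INET, "10.50.100.70", &dst.sin_addr);  // Graylog server

  if (sendto(fd, payload.data(), payload.size(), 0,
             reinterpret_cast<sockaddr*>(&dst), sizeof(dst)) < 0)
    perror("sendto");
  close(fd);
  return 0;
}
</pre>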
teuthology - Bug #45384 (Fix Under Review): bootstrapping teuthology on Ubuntu 20.04 does not work
https://tracker.ceph.com/issues/45384 - 2020-05-05T12:39:31Z - Michael Roesch

We are trying to get teuthology working on Ubuntu 20.04, following this guide: https://docs.ceph.com/teuthology/docs/LAB_SETUP.html. However, we are currently stuck on the worker part: when we execute "worker_start magna 1", after it git-clones the repo it tries to run the bootstrap script, which fails because the packages the bootstrap script expects are named differently.
Changing the bootstrap script also does not work, because on every start the repo gets cloned again and the changes are overwritten.

mgr - Bug #42655 (New): mgr-diskprediction-cloud is missing python dependencies
https://tracker.ceph.com/issues/42655 - 2019-11-05T15:28:32Z - Kaleb KEITHLEY
See https://bugzilla.redhat.com/show_bug.cgi?id=1768017
Partial fix: add the missing Requires: to the mgr subpackage.

RADOS - Tasks #25186 (In Progress): setup repo for building dependencies like boost, rocksdb, whi...
https://tracker.ceph.com/issues/25186 - 2018-07-31T03:57:05Z - Kefu Chai <tchaikov@gmail.com>
We need to build boost, spdk, dpdk, fio, rocksdb, gperftools and seastar to prepare the build dependencies for each Ceph PR and for each CI build. The list grows over time. It would be much more efficient if we could cache the built artifacts in a repo.
We could use a PPA or chacra to host the repo, and update the ceph-build and CMake scripts accordingly to pick up these pre-built packages.
Action items:

* package ceph-libboost for CentOS on amd64/aarch64
* package c-ares, libfmt, zstd and rocksdb

mgr - Bug #23136 (In Progress): mgr: disable then enable of Restful plugin does not work without ...
https://tracker.ceph.com/issues/23136 - 2018-02-26T15:42:19Z - Hans van den Bogert <hansbogert@gmail.com>
The restful plugin does not work after a disable/enable cycle:

curl -k https://mon03:8003
{
"api_version": 1,
"auth": "Use \"ceph restful create-key <key>\" to create a key pair, pass it as HTTP Basic auth to authenticate",
"doc": "See /doc endpoint",
"info": "Ceph Manager RESTful API server"
cephadmin@mon03:~$ ceph mgr module disable restful
cephadmin@mon03:~$ ceph mgr module enable restful
cephadmin@mon03:~$ curl --connect-timeout 20 -k https://mon03:8003
curl: (28) Operation timed out after 0 milliseconds with 0 out of 0 bytes received
cephadmin@mon03:~$
RADOS - Bug #22233 (In Progress): prime_pg_temp breaks on uncreated pgs
https://tracker.ceph.com/issues/22233 - 2017-11-24T08:40:53Z - Kefu Chai <tchaikov@gmail.com>

1. mon.b instructed osd.3 to create pg 92.4; the up set was [3,6].
2. osd.3 created pg 92.4 and sent a "created" message to the mon.
3. But osd.3 was pending on an up_thru update from the mon: it wanted its up_thru to be 92 in the osdmap, but it was currently 89.
4. osd.3 was killed, then marked down and out by the thrashosd task.
5. mon.b primed pg_temp for pg 92.4 and mapped it to [4,6], but osd.4 was not updated with an osd_pg_create message.
6. So the wait_for_clean task times out and fails.
/a/kchai-2017-11-23_12:43:54-rados-wip-kefu-testing-2017-11-23-1812-distro-basic-mira/1881963

Ceph - Tasks #12797 (In Progress): create the upgrade test suite for gmt and sortbitmap change
https://tracker.ceph.com/issues/12797 - 2015-08-26T16:19:39Z - Kefu Chai <tchaikov@gmail.com>
For more context, see http://tracker.ceph.com/issues/9732#note-16

Ceph - Fix #10877 (In Progress): CLI error numbers are not described anywhere
https://tracker.ceph.com/issues/10877 - 2015-02-13T16:37:54Z - Alfredo Deza <adeza@redhat.com>
While trying to run the following command:
<pre>
rados lspools
</pre>
I got this as the last line of the output:
<pre>
couldn't connect to cluster! error -2
</pre>
The man page doesn't list error numbers, and searching around I saw mailing list threads that showed `-1` as well.
I understand that there is an error connecting to the cluster, but if the CLI is using error numbers, those should be listed/explained somewhere, if possible in the man page too.
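One likely reading of that number (hedged, since the CLI itself doesn't document it): librados-style calls return negative errno values, so "error -2" would be -ENOENT ("No such file or directory"), which when connecting usually points at a missing ceph.conf, keyring or monitor address. A tiny sketch for decoding such values:

<pre>
// Sketch: decode a negative errno-style return code (assumption: the CLI is
// passing through a librados error, which uses negative errno values).
#include <cerrno>
#include <cstdio>
#include <cstring>

int main() {
  int ret = -2;  // the value printed by `rados lspools` above
  std::printf("error %d -> %s (errno %d)\n", ret, std::strerror(-ret), -ret);
  // On Linux this prints: error -2 -> No such file or directory (errno 2),
  // i.e. ENOENT.
  return 0;
}
</pre>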