Ceph : Issues | https://tracker.ceph.com/ | 2024-03-07T11:48:12Z
RADOS - Bug #64788 (Fix Under Review): EpollDriver::del_event() crashes when the nic is unplugged | https://tracker.ceph.com/issues/64788 | 2024-03-07T11:48:12Z | Kefu Chai <tchaikov@gmail.com>
<p>librbd uses msgr to talk to its Ceph cluster. If the client's NIC is hot-unplugged, there is a chance that <code>EpollDriver::del_event()</code> crashes because <code>epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &……)</code> returns <code>-ENOENT</code>; its caller, <code>EventCenter::delete_file_event()</code>, treats this as a sign of a bug.</p>

Ceph - Bug #64069 (Pending Backport): Debian bookworm package needs to explicitly specify cephadm... | https://tracker.ceph.com/issues/64069 | 2024-01-17T18:46:25Z | Chris Palmer
<p>The bookworm/reef cephadm package needs updating to accommodate the latest change in /usr/share/doc/adduser/NEWS.Debian.gz:</p>
<pre><code>System user home defaults to /nonexistent if --home is not specified.<br /> Packages that call adduser to create system accounts should explicitly<br /> specify a location for /home (see Lintian check<br /> maintainer-script-lacks-home-in-adduser).</code></pre>
<p>i.e., when creating the cephadm user as a system user, the package needs to explicitly specify the expected home directory of /home/cephadm.</p>
<p>A workaround is to create the user and its home directory manually before installing Ceph.</p>
<p>Kefu Chai has created PR 55218 to address this.</p>

RADOS - Bug #50012 (Fix Under Review): Ceph-osd refuses to bind on an IP on the local loopback lo... | https://tracker.ceph.com/issues/50012 | 2021-03-26T12:10:11Z | Kefu Chai <tchaikov@gmail.com>

RADOS - Bug #49359 (New): osd: warning: unused variable | https://tracker.ceph.com/issues/49359 | 2021-02-18T17:17:22Z | Patrick Donnelly <pdonnell@redhat.com>
<pre>
/build/ceph-17.0.0-888-g5ce097b4/src/osd/osd_types.h: In member function 'void object_ref_delta_t::mut_ref(const hobject_t&, int)':
/build/ceph-17.0.0-888-g5ce097b4/src/osd/osd_types.h:5577:35: warning: unused variable '_' [-Wunused-variable]
[[maybe_unused]] auto [iter, _] = ref_delta.try_emplace(hoid, 0);
^
</pre>
<p>From: <a class="external" href="https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=bionic,DIST=bionic,MACHINE_SIZE=gigantic/46981//consoleFull">https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=bionic,DIST=bionic,MACHINE_SIZE=gigantic/46981//consoleFull</a></p>

Ceph - Bug #48498 (Fix Under Review): octopus: timeout when running the "ceph" command | https://tracker.ceph.com/issues/48498 | 2020-12-08T16:00:28Z | Mathew Clarke
<p>Running "Ubuntu 20.04.1" with kernel "5.4.77-217"</p>
<p>I've installed octopus version 15.2.5 from distro armhf repo as "deb-src <a class="external" href="https://download.ceph.com/debian-octopus/">https://download.ceph.com/debian-octopus/</a> focal main" fails with unmet dependencies (see issue: <a class="external" href="https://tracker.ceph.com/issues/45915">https://tracker.ceph.com/issues/45915</a>)</p>
<p>When I run "ceph" from a bash prompt I get the error below.</p>
<pre>
Traceback (most recent call last):
File "/usr/bin/ceph", line 1278, in <module>
retval = main()
File "/usr/bin/ceph", line 984, in main
cluster_handle = run_in_thread(rados.Rados,
File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1339, in run_in_thread
raise Exception("timed out")
Exception: timed out
</pre>
<p>Even when I run "ceph -h" I get the same error halfway through the help output. I'm new to ceph so any pointers on how I can troubleshoot this would be greatly appreciated.</p>
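For context on where the exception comes from: the traceback shows <code>ceph_argparse.run_in_thread</code> running the librados call in a worker thread and raising when it does not finish within a timeout, which is why even <code>ceph -h</code> can fail partway through (parts of the help output are fetched from the cluster). A simplified sketch of that pattern, with illustrative names and timeout value, not Ceph's actual code:

```python
import threading

def run_in_thread(func, *args, timeout=5, **kwargs):
    """Run func in a worker thread; raise if it doesn't finish in time.

    Simplified sketch of the pattern used by ceph_argparse.run_in_thread;
    the signature and timeout here are illustrative, not Ceph's real code.
    """
    result = {}

    def worker():
        result["value"] = func(*args, **kwargs)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # The call (e.g. rados.Rados connecting to the cluster) never
        # returned -- typically because no monitor is reachable.
        raise Exception("timed out")
    return result["value"]
```

So a <code>timed out</code> here usually means the client cannot reach any monitor at all; checking connectivity to the addresses listed under <code>mon_host</code> in /etc/ceph/ceph.conf is a reasonable first step.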
<p>Thanks</p>

RADOS - Bug #45457 (Pending Backport): CEPH Graylog Logging Missing "host" Field | https://tracker.ceph.com/issues/45457 | 2020-05-10T02:39:52Z | Daniel Neilson
<p>Hello,</p>
<p>I have tried sending CEPH logs to Graylog with the following configuration:</p>
<p>mon_cluster_log_to_graylog = true<br />mon_cluster_log_to_graylog_host = 10.50.100.70<br />mon_cluster_log_to_graylog_port = 12202</p>
<p>The server.log entries in Graylog show:</p>
<blockquote>
<p>2020-05-09T20:28:26.063-06:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=ed6876e1-9265-11ea-bc94-1ad2db8b5489, journalOffset=5055367, codec=gelf, payloadSize=317, timestamp=2020-05-10T02:28:26.062Z, remoteAddress=/10.10.10.12:36509}<br />java.lang.IllegalArgumentException: GELF message <ed6876e1-9265-11ea-bc94-1ad2db8b5489> (received from <10.10.10.12:36509>) has empty mandatory "host" field.</p>
</blockquote>
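For reference, the GELF format that Graylog's gelf codec decodes treats <code>host</code> (along with <code>version</code> and <code>short_message</code>) as mandatory, which is exactly what the error above complains about. A minimal sketch of a well-formed payload (an illustrative helper, not Ceph's sender code):

```python
import json
import socket

def gelf_payload(short_message, host=None, level=6):
    """Build a minimal GELF 1.1 message as a JSON string.

    "version", "host" and "short_message" are mandatory fields; Graylog
    rejects messages whose "host" is missing or empty, which is the
    DecodingProcessor error quoted above.
    """
    return json.dumps({
        "version": "1.1",
        "host": host or socket.gethostname(),  # must be non-empty
        "short_message": short_message,
        "level": level,  # syslog severity, 6 = informational
    })
```

The fix on the Ceph side is for the mon's Graylog sender to populate <code>host</code> before shipping the message.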
<p>There do not appear to be any other options that can be enabled for Graylog logging in Ceph; not sure what else to try.</p>

teuthology - Bug #45384 (Fix Under Review): bootstrapping teuthology on Ubuntu 20.04 does not work | https://tracker.ceph.com/issues/45384 | 2020-05-05T12:39:31Z | Michael Roesch
<p>We are trying to get teuthology working on Ubuntu 20.04, following this guide: <a class="external" href="https://docs.ceph.com/teuthology/docs/LAB_SETUP.html">https://docs.ceph.com/teuthology/docs/LAB_SETUP.html</a>. However, we are currently stuck on the worker part, i.e. when we try to execute "worker_start magna 1": after it git-clones the repo, it tries to execute the bootstrap script, which fails because the packages the bootstrap script expects are named differently.<br />Changing the bootstrap script also does not work, because on every start the repo gets cloned again and overwrites the changes.</p>

mgr - Bug #42655 (New): mgr-diskprediction-cloud is missing python dependencies | https://tracker.ceph.com/issues/42655 | 2019-11-05T15:28:32Z | Kaleb KEITHLEY
<p>see <a class="external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1768017">https://bugzilla.redhat.com/show_bug.cgi?id=1768017</a></p>
<p>partial fix: add missing Requires: to mgr subpackage</p>

RADOS - Bug #40772 (New): mon: pg size change delayed 1 minute because osdmap 35 delay | https://tracker.ceph.com/issues/40772 | 2019-07-12T21:21:34Z | David Zafman <dzafman@redhat.com>
<p>osd-recovery-prio.sh TEST_recovery_pool_priority fails intermittently due to a delay in recovery starting on a pg. The test requires simultaneous recovery of 2 PGs.</p>
<pre><code>ceph osd pool set $pool1 size 2<br /> ceph osd pool set $pool2 size 2</code></pre>
Running the test alone doesn't necessarily reproduce the problem, which I saw when running all the standalone tests (although they are run sequentially).
<ol>
<li>../qa/run-standalone.sh "osd-recovery-prio.sh TEST_recovery_pool_priority"</li>
</ol>
<p>No pg[2.0 log messages for almost 1 minute. Map 35 seems to be the one that has the size change.</p>
<pre>
2019-07-12T00:29:49.072-0700 7fc35220f700 10 osd.0 pg_epoch: 34 pg[2.0( v 31'600 (0'0,31'600] local-lis/les=29/30 n=200 ec=15/15 lis/c 29/29 les/c/f 30/30/0 29/29/15) [0] r=0 lpr=29 crt=31'600 lcod 31'599 mlcod 31'599 active+clean] do_peering_event: epoch_sent: 34 epoch_requested: 34 NullEvt
2019-07-12T00:30:44.588-0700 7fc35220f700 10 osd.0 pg_epoch: 34 pg[2.0( v 31'600 (0'0,31'600] local-lis/les=29/30 n=200 ec=15/15 lis/c 29/29 les/c/f 30/30/0 29/29/15) [0] r=0 lpr=29 crt=31'600 lcod 31'599 mlcod 31'599 active+clean] handle_advance_map: 35
</pre>
"message": "pg 2.0 is stuck undersized for 61.208213, current state active+recovering+undersized+degraded+remapped, last acting [0]" <br /> "message": "pg 2.0 is stuck undersized for 63.209241, current state active+recovering+undersized+degraded+remapped, last acting [0]"

mgr - Bug #24614 (New): luminous: AssertionError: Lists differ in test_selftest_command_spam() | https://tracker.ceph.com/issues/24614 | 2018-06-21T23:51:11Z | Neha Ojha <nojha@redhat.com>
<pre>
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner:======================================================================
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner:FAIL: test_selftest_command_spam (tasks.mgr.test_module_selftest.TestModuleSelftest)
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2018-06-11T22:05:27.606 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuri3-testing-2018-06-11-1421-luminous/qa/tasks/mgr/test_module_selftest.py", line 79, in test_selftest_command_spam
2018-06-11T22:05:27.607 INFO:tasks.cephfs_test_runner: self.assertEqual(original_standbys, self.mgr_cluster.get_standby_ids())
2018-06-11T22:05:27.607 INFO:tasks.cephfs_test_runner:AssertionError: Lists differ: [u'z', u'y'] != [u'z']
2018-06-11T22:05:27.607 INFO:tasks.cephfs_test_runner:
2018-06-11T22:05:27.607 INFO:tasks.cephfs_test_runner:First list contains 1 additional elements.
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:First extra element 1:
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:u'y'
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:- [u'z', u'y']
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:+ [u'z']
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:
2018-06-11T22:05:27.608 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2018-06-11T22:05:27.609 INFO:tasks.cephfs_test_runner:Ran 3 tests in 185.305s
2018-06-11T22:05:27.609 INFO:tasks.cephfs_test_runner:
2018-06-11T22:05:27.609 INFO:tasks.cephfs_test_runner:FAILED (failures=1)
2018-06-11T22:05:27.609 INFO:tasks.cephfs_test_runner:
</pre>
<p>/a/yuriw-2018-06-11_16:27:32-rados-wip-yuri3-testing-2018-06-11-1421-luminous-distro-basic-smithi/2654731</p>

RADOS - Bug #24515 (New): "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, ... | https://tracker.ceph.com/issues/24515 | 2018-06-13T20:13:58Z | Yuri Weinstein <yweinste@redhat.com>
<p>This seems to be RHEL-specific.</p>
<p>Run: <a class="external" href="http://pulpito.ceph.com/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smithi/">http://pulpito.ceph.com/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smithi/</a><br />Jobs: '2659781', '2659792', '2659770', '2659840'<br />Logs: <a class="external" href="http://qa-proxy.ceph.com/teuthology/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smithi/2659781/teuthology.log">http://qa-proxy.ceph.com/teuthology/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smithi/2659781/teuthology.log</a></p>
<pre>
2018-06-13T03:57:47.950 INFO:teuthology.orchestra.run.smithi075.stdout:2018-06-13 03:05:37.853886 mon.a (mon.0) 197 : cluster [WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c has slow ops (SLOW_OPS)
....
2018-06-13T04:09:34.350 INFO:teuthology.run:Summary data:
{description: 'fs/snaps/{begin.yaml clusters/fixed-2-ucephfs.yaml mount/fuse.yaml
objectstore-ec/bluestore-comp.yaml overrides/{debug.yaml frag_enable.yaml whitelist_health.yaml
whitelist_wrongly_marked_down.yaml} supported-random-distros$/{rhel_latest.yaml}
tasks/snaptests.yaml}', duration: 4418.809953927994, failure_reason: '"2018-06-13
03:05:37.853886 mon.a (mon.0) 197 : cluster [WRN] Health check failed: 1 slow
ops, oldest one blocked for 32 sec, mon.c has slow ops (SLOW_OPS)" in cluster
log', flavor: basic, owner: scheduled_yuriw@teuthology, success: false}
</pre>

mgr - Bug #23756 (New): mgr is not updated with latest pgmap in qa/workunits/rados/test_large_oma... | https://tracker.ceph.com/issues/23756 | 2018-04-16T07:24:08Z | Kefu Chai <tchaikov@gmail.com>
<p>see <a class="external" href="http://pulpito.ceph.com/kchai-2018-04-12_15:56:07-rados-wip-kefu-testing-2018-04-12-2211-distro-basic-mira/2390083/">http://pulpito.ceph.com/kchai-2018-04-12_15:56:07-rados-wip-kefu-testing-2018-04-12-2211-distro-basic-mira/2390083/</a></p>
<p>Need to test without <a class="external" href="https://github.com/ceph/ceph/pull/21410">https://github.com/ceph/ceph/pull/21410</a>. Before this PR, we instruct the mgr with the OSD id to perform the scrub; if the mgr does not possess the latest pgmap, it will fail to send the OSD all the pgs to be scrubbed.</p>
<p>That's the case in kchai-2018-04-12_15:56:07-rados-wip-kefu-testing-2018-04-12-2211-distro-basic-mira/2390083.</p>

RADOS - Bug #23648 (New): max-pg-per-osd.from-primary fails because of activating pg | https://tracker.ceph.com/issues/23648 | 2018-04-11T04:56:16Z | Kefu Chai <tchaikov@gmail.com>
<p>The reason we see an activating pg even though the number of pgs is under the hard limit of max-pg-per-osd is:</p>
<p>1. osd.1 received the osdmap instructing it to create pg 1.0, so it instructed its replica osd.2 to create pg 1.0.<br />2. osd.2 was capped by max-pg-per-osd when it was about to create pg 1.0, so it dropped the request to create the pg.<br />3. pool 15 was removed in osdmap#58.<br />4. but the updated osdmap was not sent to osd.1 or osd.2 before wait_for_clean() timed out.</p>

mgr - Bug #23136 (In Progress): mgr: disable then enable of Restful plugin does not work without ... | https://tracker.ceph.com/issues/23136 | 2018-02-26T15:42:19Z | Hans van den Bogert <hansbogert@gmail.com>
<p>The restful plugin does not work after a disable/enable cycle:</p>
<blockquote>
<p>curl -k <a class="external" href="https://mon03:8003">https://mon03:8003</a>
{<br />"api_version": 1,<br />"auth": "Use \"ceph restful create-key <key>\" to create a key pair, pass it as HTTP Basic auth to authenticate",<br />"doc": "See /doc endpoint",<br />"info": "Ceph Manager RESTful API server"<br />}<br />cephadmin@mon03:~$ ceph mgr module disable restful<br />cephadmin@mon03:~$ ceph mgr module enable restful<br />cephadmin@mon03:~$ curl --connect-timeout 20 -k <a class="external" href="https://mon03:8003">https://mon03:8003</a><br />curl: (28) Operation timed out after 0 milliseconds with 0 out of 0 bytes received<br />cephadmin@mon03:~$</p>
</blockquote>

RADOS - Bug #22233 (In Progress): prime_pg_temp breaks on uncreated pgs | https://tracker.ceph.com/issues/22233 | 2017-11-24T08:40:53Z | Kefu Chai <tchaikov@gmail.com>
<ol>
<li>mon.b instructed osd.3 to create pg 92.4; the up set was [3,6]</li>
<li>osd.3 created pg 92.4 and sent a "created" message to the mon</li>
<li>but osd.3 was pending on an up_thru update from the mon: it wanted its up_thru to be 92 in the osdmap, but it was still 89</li>
<li>osd.3 was killed, then marked down and out by the thrashosd task</li>
<li>mon.b primed pg_temp for pg 92.4 and mapped it to [4,6], but osd.4 was never sent the osd_pg_create message</li>
<li>so the wait_for_clean task times out and the job fails</li>
</ol>
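One plausible guard suggested by the sequence above: when priming pg_temp, skip pgs whose creation has not been acknowledged, since the newly primed acting set may never have been sent an osd_pg_create message. A sketch with illustrative stand-ins (Ceph's real <code>prime_pg_temp</code> is C++ in the OSDMonitor; the names and data structures here are hypothetical):

```python
def prime_pg_temp(pg_temp, creating_pgs, acked_pgs, pgid, new_acting):
    """Prime a pg_temp mapping, skipping pgs that are still being created.

    pg_temp:      dict mapping pgid -> acting-set override
    creating_pgs: set of pgids the mon has asked OSDs to create
    acked_pgs:    set of pgids whose creation the primary has acked
    These are illustrative stand-ins for mon state, not Ceph's real types.
    """
    if pgid in creating_pgs and pgid not in acked_pgs:
        # The prospective acting set was never sent an osd_pg_create
        # message, so remapping here would leave the pg stuck (the
        # wait_for_clean timeout described above).
        return False
    pg_temp[pgid] = list(new_acting)
    return True
```

In the scenario above, pg 92.4's remap to [4,6] would be held back until its creation is fully settled, rather than pointing at an osd.4 that never learned about the pg.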
<p>/a/kchai-2017-11-23_12:43:54-rados-wip-kefu-testing-2017-11-23-1812-distro-basic-mira/1881963</p>