Ceph - Bug #48498: octopus: timeout when running the "ceph" command
https://tracker.ceph.com/issues/48498

Comment by Mathew Clarke, 2020-12-09T09:07:25Z (journal_id=180946)
<p>The Ubuntu packages have just updated to 15.2.7 and I'm still getting the same issue.</p>
<pre>
ceph -v
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
</pre>
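<p>For context, the traceback below comes out of ceph's CLI wrapper, which dispatches librados calls through run_in_thread. A minimal sketch of that pattern follows; it is illustrative only, not Ceph's actual implementation (names and error handling are simplified):</p>

```python
import threading
import time

def run_in_thread(func, *args, timeout=None, **kwargs):
    """Run func in a daemon thread; raise if it outlives the timeout."""
    result = {}

    def target():
        result["value"] = func(*args, **kwargs)

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(timeout)            # with timeout=None this would block forever
    if t.is_alive():
        raise Exception("timed out")
    return result.get("value")

print(run_in_thread(lambda: 42, timeout=1))       # 42
try:
    run_in_thread(time.sleep, 5, timeout=0.1)     # sleeps past the timeout
except Exception as e:
    print(e)                                      # timed out
```

<p>The point is that "Exception: timed out" only means the wrapped call did not return in time; it says nothing about why the cluster was unreachable.</p>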
<pre>
ceph -s
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1278, in &lt;module&gt;
    retval = main()
  File "/usr/bin/ceph", line 984, in main
    cluster_handle = run_in_thread(rados.Rados,
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1339, in run_in_thread
    raise Exception("timed out")
Exception: timed out
</pre>

Comment by Rocky Cardwell, 2020-12-22T20:49:23Z (journal_id=181802)
<p>Mathew Clarke wrote:</p>
<blockquote>
<p>The ubuntu packages have just updated to 15.2.7 and I'm still getting the same issue</p>
<p>[...]</p>
<p>[...]</p>
</blockquote>
<p>I managed to get Ceph running on the ODROID-HC2 after having this and other issues. To fix this one, I edited /usr/lib/python3/dist-packages/ceph_argparse.py, line 1332, changing the 32 to 16. I suspect it's something to do with armhf being 32-bit. Like this:</p>
<pre><code class="python">    if timeout == 0 or timeout == None:
        # python threading module will just get blocked if timeout is `None`,
        # otherwise it will keep polling until timeout or thread stops.
        # wait for INT32_MAX, as python 3.6.8 use int32_t to present the
        # timeout in integer when converting it to nanoseconds
        timeout = (1 &lt;&lt; (16 - 1)) - 1
    t = RadosThread(func, *args, **kwargs)
</code></pre>
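<p>To see why shrinking the shift helps, here is the arithmetic, assuming (as suggested above, not confirmed in this thread) that the deadline overflows a 32-bit integer once the current epoch time is added to the timeout:</p>

```python
# Illustrative arithmetic only; assumes the root cause is a 32-bit overflow
# when the timeout is added to the current epoch time on armhf.
import time

INT32_MAX = (1 << 31) - 1   # the original "wait forever" timeout, in seconds
INT16_MAX = (1 << 15) - 1   # the value after the edit above

now = int(time.time())
print(now + INT32_MAX > INT32_MAX)    # True: the deadline no longer fits in 32 bits
print(now + INT16_MAX <= INT32_MAX)   # True: the smaller timeout still fits
```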
<p>The other issue I ran into was a segmentation fault when starting the manager daemon. I fixed that by adding UNW_ARM_UNWIND_METHOD=4 to /etc/default/ceph, like:</p>
<pre>
# /etc/default/ceph
#
# Environment file for ceph daemon systemd unit files.
#
# Increase tcmalloc cache size
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
UNW_ARM_UNWIND_METHOD=4
</pre>
<p>Hope that helps.</p>

Comment by Mathew Clarke, 2020-12-23T16:58:30Z (journal_id=181828)
<p>Thanks for responding.</p>
<p>I'm only running the OSDs on the Odroid HC2, but I applied "UNW_ARM_UNWIND_METHOD=4" anyway, as "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728" was already set.</p>
<p>I also patched the "timeout = (1 &lt;&lt; (16 - 1)) - 1" line in "/usr/lib/python3/dist-packages/ceph_argparse.py".</p>
<p>It no longer errors as before, but it hangs for around 50 minutes and then reports "[errno 110] RADOS timed out (error connecting to the cluster)" when running "ceph -s".</p>
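<p>As an aside, the "[errno 110]" in that message is the standard POSIX ETIMEDOUT code, meaning the client gave up while connecting to the monitors; this can be checked with Python's stdlib (values shown are for Linux):</p>

```python
import errno
import os

# errno 110 is ETIMEDOUT on Linux; RADOS reports it when it cannot reach
# a monitor before its connect timeout expires.
print(errno.ETIMEDOUT)                 # 110
print(os.strerror(errno.ETIMEDOUT))    # Connection timed out
```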
<p>Is this something you've run in to?</p> Ceph - Bug #48498: octopus: timeout when running the "ceph" commandhttps://tracker.ceph.com/issues/48498?journal_id=1818312020-12-23T18:21:50ZRocky Cardwell
<p>I did have some issues with ceph -s hanging. It always turned out that the monitor process wasn't running, or not enough monitors were running for quorum. In my case, there were errors in /var/log/syslog saying it couldn't access files in /var/lib/ceph. I ended up fixing those by changing the files to be owned by the ceph user/group.</p>

Comment by Mathew Clarke, 2021-01-06T15:32:28Z (journal_id=182054)
<p>Sorry for the slow reply, I just got around to checking this one. "ceph:ceph" was already set on "/var/lib/ceph", but I ran "chown -R ceph:ceph /var/lib/ceph" anyway, which didn't resolve the hanging issue.</p>
<p>The syslog only shows the following errors to do with Ceph:</p>
<pre>
Dec 22 21:37:01 cn-svr-osd-01 ansible-ceph_volume: [WARNING] The value 5120 (type int) in a string field was converted to '5120' (type string). If this does not look like what you expect, quote the entire value to ensure it does not change.
Dec 22 21:37:01 cn-svr-osd-01 ansible-ceph_volume: [WARNING] The value -1 (type int) in a string field was converted to '-1' (type string). If this does not look like what you expect, quote the entire value to ensure it does not change.
Dec 22 21:37:01 cn-svr-osd-01 ansible-ceph_volume: [WARNING] The value False (type bool) in a string field was converted to 'False' (type string). If this does not look like what you expect, quote the entire value to ensure it does not change.
</pre>
<p>Thanks for your help with this one, but I think I'm going to invest in some x86 hardware to continue this project.</p>

Comment by O W, 2021-01-24T15:15:39Z (journal_id=183173)
<p>Hi,</p>
<p>I'm also struggling to get Ceph 15.2.7 running with ceph-ansible on Ubuntu Hirsute on the Odroid HC2. Currently I only have two HC2s and mix them with Vagrant VMs to get a mon quorum. Your fixes helped a lot in getting further, but the mgr doesn't come up.</p>
<pre>
TASK [ceph-mgr : wait for all mgr to be up] ************************************
Sunday 24 January 2021  15:10:08 +0000 (0:00:00.026)       0:02:34.816 ********
FAILED - RETRYING: wait for all mgr to be up (30 retries left).
FAILED - RETRYING: wait for all mgr to be up (29 retries left).
FAILED - RETRYING: wait for all mgr to be up (28 retries left).
FAILED - RETRYING: wait for all mgr to be up (27 retries left).
FAILED - RETRYING: wait for all mgr to be up (26 retries left).
FAILED - RETRYING: wait for all mgr to be up (25 retries left).
FAILED - RETRYING: wait for all mgr to be up (24 retries left).
FAILED - RETRYING: wait for all mgr to be up (23 retries left).
FAILED - RETRYING: wait for all mgr to be up (22 retries left).
FAILED - RETRYING: wait for all mgr to be up (21 retries left).
FAILED - RETRYING: wait for all mgr to be up (20 retries left).
FAILED - RETRYING: wait for all mgr to be up (19 retries left).
FAILED - RETRYING: wait for all mgr to be up (18 retries left).
FAILED - RETRYING: wait for all mgr to be up (17 retries left).
FAILED - RETRYING: wait for all mgr to be up (16 retries left).
FAILED - RETRYING: wait for all mgr to be up (15 retries left).
FAILED - RETRYING: wait for all mgr to be up (14 retries left).
FAILED - RETRYING: wait for all mgr to be up (13 retries left).
FAILED - RETRYING: wait for all mgr to be up (12 retries left).
FAILED - RETRYING: wait for all mgr to be up (11 retries left).
FAILED - RETRYING: wait for all mgr to be up (10 retries left).
FAILED - RETRYING: wait for all mgr to be up (9 retries left).
FAILED - RETRYING: wait for all mgr to be up (8 retries left).
FAILED - RETRYING: wait for all mgr to be up (7 retries left).
FAILED - RETRYING: wait for all mgr to be up (6 retries left).
FAILED - RETRYING: wait for all mgr to be up (5 retries left).
FAILED - RETRYING: wait for all mgr to be up (4 retries left).
FAILED - RETRYING: wait for all mgr to be up (3 retries left).
FAILED - RETRYING: wait for all mgr to be up (2 retries left).
FAILED - RETRYING: wait for all mgr to be up (1 retries left).
fatal: [odroidxu4 -&gt; odroidxu4]: FAILED! =&gt; changed=false
  attempts: 30
  cmd:
  - ceph
  - --cluster
  - ceph
  - mgr
  - dump
  - -f
  - json
  delta: '0:00:00.758196'
  end: '2021-01-24 16:13:22.642994'
  rc: 0
  start: '2021-01-24 16:13:21.884798'
  stderr: ''
  stderr_lines: &lt;omitted&gt;
  stdout: |2
    {"epoch":1,"active_gid":0,"active_name":"","active_addrs":{"addrvec":[]},"active_addr":"(unrecognized address family 0)/0","active_change":"0.000000","active_mgr_features":0,"available":false,"standbys":[],"modules":["iostat","restful"],"available_modules":[],"services":{},"always_on_modules":{"nautilus":["balancer","crash","devicehealth","orchestrator_cli","progress","rbd_support","status","volumes"],"octopus":["balancer","crash","devicehealth","orchestrator","pg_autoscaler","progress","rbd_support","status","telemetry","volumes"]},"last_failure_osd_epoch":0,"active_clients":[]}
  stdout_lines: &lt;omitted&gt;
</pre>
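<p>For what it's worth, the playbook's "wait for all mgr to be up" retry loop is effectively checking fields of that ceph mgr dump output. A small illustrative check (not ceph-ansible's actual code) against a trimmed version of the JSON above:</p>

```python
import json

# Trimmed-down version of the `ceph mgr dump -f json` output shown above.
mgr_dump = '{"epoch": 1, "active_gid": 0, "active_name": "", "available": false, "standbys": []}'

status = json.loads(mgr_dump)
# No mgr is "up" until one is marked available and has an active name.
mgr_up = status["available"] and bool(status["active_name"])
print(mgr_up)   # False: no active mgr, so the retry loop keeps failing
```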
<pre>
root@odroidxu4:~# dpkg -l ceph
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version         Architecture Description
+++-==============-===============-============-===================================
ii  ceph           15.2.7-0ubuntu4 armhf        distributed storage and file system
</pre>
<p>Any ideas?</p>
<p>Cheers,<br />Oliver</p>

Comment by Kefu Chai (tchaikov@gmail.com), 2021-03-29T16:13:44Z (journal_id=188913)
<p>I think this issue was fixed by <a class="external" href="https://github.com/ceph/ceph/pull/38665">https://github.com/ceph/ceph/pull/38665</a>, <a class="external" href="https://github.com/ceph/ceph/pull/38665/commits/e59693fd96c34672c4c743514bd173fc70a3a544">https://github.com/ceph/ceph/pull/38665/commits/e59693fd96c34672c4c743514bd173fc70a3a544</a> to be specific.</p>
<p>I will backport it to octopus.</p>

Comment by Kefu Chai, 2021-03-29T16:19:04Z (journal_id=188914)
<ul><li><strong>Subject</strong> changed from <i>timeout when running the "ceph" command</i> to <i>octopus: timeout when running the "ceph" command</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>Triaged</i></li><li><strong>Assignee</strong> set to <i>Kefu Chai</i></li><li><strong>Pull request ID</strong> set to <i>40476</i></li></ul>

Comment by Kefu Chai, 2021-03-29T16:19:15Z (journal_id=188915)
<ul><li><strong>Status</strong> changed from <i>Triaged</i> to <i>Fix Under Review</i></li></ul>

Comment by Yuri Weinstein (yweinste@redhat.com), 2021-04-07T15:36:14Z (journal_id=189731)
<p><a class="external" href="https://github.com/ceph/ceph/pull/40476">https://github.com/ceph/ceph/pull/40476</a> merged</p>