Ceph : Issueshttps://tracker.ceph.com/https://tracker.ceph.com/favicon.ico2020-07-15T14:14:32ZCeph
Redmine Orchestrator - Tasks #46551 (Resolved): cephadm: Add better a better hint how to add a hosthttps://tracker.ceph.com/issues/465512020-07-15T14:14:32ZStephan Müller
<p>Currently:</p>
<pre>
master:~ # ceph orch host add mgr0 192.168.121.230
Error ENOENT: Failed to connect to mgr0 (192.168.121.230).
Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@mgr0
</pre>
<p>What actually needs to be done:<br /><pre>
master:~ # ceph config-key get mgr/cephadm/ssh_identity_pub > key.pub
master:~ # ssh-copy-id -i "key.pub" root@mgr0
</pre></p>
<p>What the message should look like in the end:<br /><pre>
master:~ # ceph orch host add mgr0 192.168.121.230
Error ENOENT: Failed to connect to mgr0 (192.168.121.230).
Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to add the SSH key to the host:
> ceph config-key get mgr/cephadm/ssh_identity_pub > ~/cephadm_ssh_key.pub
> ssh-copy-id -i ~/cephadm_ssh_key.pub root@mgr0
you may want to check that everything works, before rerunning the command:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_ssh_key
> ssh -F ssh_config -i ~/cephadm_ssh_key root@mgr0
</pre></p> Orchestrator - Support #46547 (Resolved): cephadm: Exception adding host via FQDN if host was alr...https://tracker.ceph.com/issues/465472020-07-15T12:17:02ZStephan Müller
<p>To reproduce you need nodes that have a subdomain (not like in current Vagrantfile). I used sesdev to find this issue.</p>
<pre>
master:~ # ceph orch host add node1.pacific.test
Error ENOENT: New host node1.pacific.test (node1.pacific.test) failed check: [
'INFO:cephadm:podman|docker (/usr/bin/podman) is present',
'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is present',
'INFO:cephadm:Unit chronyd.service is enabled and running',
'INFO:cephadm:Hostname "node1.pacific.test" matches what is expected.',
'ERROR: hostname "node1" does not match expected hostname "node1.pacific.test"'
]
</pre>
<p>With `ceph -W cephadm` one observes</p>
<pre>
2020-07-15T13:24:21.159126+0200 mgr.node1.zybwkb [ERR] _Promise failed
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 277, in _finalize
next_result = self._on_complete(self._value)
File "/usr/share/ceph/mgr/cephadm/module.py", line 132, in <lambda>
return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 1098, in add_host
return self._add_host(spec)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1087, in _add_host
spec.hostname, spec.addr, err))
orchestrator._interface.OrchestratorError: New host node1.pacific.test (node1.pacific.test) failed check: ['INFO:cephadm:podman|docker (/usr/bin/podman) is present', 'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is present', 'INFO:cephadm:Unit chronyd.service is enabled and running', 'INFO:cephadm:Hostname "node1.pacific.test" matches what is expected.', 'ERROR: hostname "node1" does not match expected hostname "node1.pacific.test"']
</pre> Orchestrator - Documentation #46377 (Resolved): cephadm: Missing 'service_id' in last example in ...https://tracker.ceph.com/issues/463772020-07-06T15:33:59ZStephan Müller
<p>Missing 'service_id' in last example in orchestrator#service-specification. Example can be found right above <a class="external" href="https://docs.ceph.com/docs/master/mgr/orchestrator/#placement-specification">https://docs.ceph.com/docs/master/mgr/orchestrator/#placement-specification</a> and it should look like specified in <a class="external" href="https://docs.ceph.com/docs/master/cephadm/drivegroups/#osd-service-specification">https://docs.ceph.com/docs/master/cephadm/drivegroups/#osd-service-specification</a> .</p> Orchestrator - Tasks #46376 (Resolved): cephadm: Make vagrant usage more comfortablehttps://tracker.ceph.com/issues/463762020-07-06T15:28:51ZStephan Müller
<p>Currently you can only use a big scale factor using the vagrant setup. You can have x * (mgr, mon, osd with 2 disks). It would be nicer to use the same constants as vstart is using to select how many mgr, mons and osds one likes to have. I would go further and add a disks constant two.</p>
<p>This would make the creation a lot more flexible. Another thing that is missing is an script to easily snapshot the created vm's and recreate them</p> Ceph - Documentation #45874 (Fix Under Review): doc: Extend resolving conflict section in "Submit...https://tracker.ceph.com/issues/458742020-06-04T08:09:20ZStephan Müller
<p>Currently it's not clear how to easily continue with the backport script when a conflict is encountered.</p> Orchestrator - Bug #45724 (Resolved): check-host should not fail using fqdn or not that hardhttps://tracker.ceph.com/issues/457242020-05-27T09:26:51ZStephan Müller
<p>I would suggest either identify that it's an FQDN or answer "Host not found. Use 'ceph orch host ls' to see all managed hosts."</p>
<pre>
# ceph cephadm check-host node3.ses7.com
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1153, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 110, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 308, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 72, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 63, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1482, in check_host
error_ok=True, no_fsid=True)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1569, in _run_cephadm
conn, connr = self._get_connection(addr)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1521, in _get_connection
n = self.ssh_user + '@' + host
TypeError: must be str, not NoneType
</pre> Dashboard - Feature #44621 (Pending Backport): mgr/dashboard: Automatic preselection of failure d...https://tracker.ceph.com/issues/446212020-03-16T12:12:19ZStephan Müller
<p>Use the automatic preselection of the crush rule creation form inside the erasure code profile form to prevent wrong configured ec profiles which can't be used in the end.</p> Dashboard - Bug #44223 (Duplicate): mgr/dashboard: Timeouts for rbd.py callshttps://tracker.ceph.com/issues/442232020-02-20T10:05:49ZStephan Müller
<p>As the corner cases are not implemented in many rbd methods, they can fail without a response on a specific pool (mostly bad pools).</p>
<p>If this is implemented remove the workaround that was implemented to fix <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: mgr/dashboard: Dashboard breaks on the selection of a bad pool (Resolved)" href="https://tracker.ceph.com/issues/43765">#43765</a>.</p>
<p>For details what known issue exists see <a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: pybind/rbd: config_list hangs if given an pool with a bad pg state (Rejected)" href="https://tracker.ceph.com/issues/43771">#43771</a>.</p>
<p>For details about the discussion that was made look at the PR that fixed <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: mgr/dashboard: Dashboard breaks on the selection of a bad pool (Resolved)" href="https://tracker.ceph.com/issues/43765">#43765</a>.</p>
<p>Make sure that <a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: pybind/rbd: config_list hangs if given an pool with a bad pg state (Rejected)" href="https://tracker.ceph.com/issues/43771">#43771</a> is still not addressed before starting with this issue.</p> Dashboard - Bug #43938 (Duplicate): mgr/dashboard: Test failure in test_safe_to_destroy in tasks....https://tracker.ceph.com/issues/439382020-01-31T15:14:52ZStephan Müller
<p>There is an API test failure in master not sure where it came from.</p>
<pre>
2020-01-29 15:05:17,789.789 INFO:__main__:Stopped test: test_safe_to_destroy (tasks.mgr.dashboard.test_osd.OsdTest) in 1.567141s
2020-01-29 15:05:17,789.789 INFO:__main__:
2020-01-29 15:05:17,790.790 INFO:__main__:======================================================================
2020-01-29 15:05:17,790.790 INFO:__main__:FAIL: test_safe_to_destroy (tasks.mgr.dashboard.test_osd.OsdTest)
2020-01-29 15:05:17,790.790 INFO:__main__:----------------------------------------------------------------------
2020-01-29 15:05:17,790.790 INFO:__main__:Traceback (most recent call last):
2020-01-29 15:05:17,790.790 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/test_osd.py", line 113, in test_safe_to_destroy
2020-01-29 15:05:17,790.790 INFO:__main__: 'stored_pgs': [],
2020-01-29 15:05:17,790.790 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 343, in assertJsonBody
2020-01-29 15:05:17,790.790 INFO:__main__: self.assertEqual(body, data)
2020-01-29 15:05:17,790.790 INFO:__main__:AssertionError: {u'is_safe_to_destroy': False, u'message': u'[errno -11] 3 pgs have unknown stat [truncated]... != {'active': [], 'is_safe_to_destroy': True, 'stored_pgs': [], 'safe_to_destroy': [truncated]...
2020-01-29 15:05:17,790.790 INFO:__main__:+ {'active': [],
2020-01-29 15:05:17,791.791 INFO:__main__:- {u'is_safe_to_destroy': False,
2020-01-29 15:05:17,791.791 INFO:__main__:? ^^ ^^^^
2020-01-29 15:05:17,791.791 INFO:__main__:
2020-01-29 15:05:17,791.791 INFO:__main__:+ 'is_safe_to_destroy': True,
2020-01-29 15:05:17,791.791 INFO:__main__:? ^ ^^^
2020-01-29 15:05:17,791.791 INFO:__main__:
2020-01-29 15:05:17,791.791 INFO:__main__:- u'message': u'[errno -11] 3 pgs have unknown state; cannot draw any conclusions'}
2020-01-29 15:05:17,791.791 INFO:__main__:+ 'missing_stats': [],
2020-01-29 15:05:17,791.791 INFO:__main__:+ 'safe_to_destroy': [13],
2020-01-29 15:05:17,791.791 INFO:__main__:+ 'stored_pgs': []}
2020-01-29 15:05:17,791.791 INFO:__main__:
2020-01-29 15:05:17,792.792 INFO:__main__:----------------------------------------------------------------------
2020-01-29 15:05:17,792.792 INFO:__main__:Ran 93 tests in 1125.270s
2020-01-29 15:05:17,792.792 INFO:__main__:
2020-01-29 15:05:17,792.792 INFO:__main__:FAILED (failures=1)
2020-01-29 15:05:17,792.792 INFO:__main__:
2020-01-29 15:05:17,792.792 INFO:__main__:======================================================================
2020-01-29 15:05:17,792.792 INFO:__main__:FAIL: test_safe_to_destroy (tasks.mgr.dashboard.test_osd.OsdTest)
2020-01-29 15:05:17,792.792 INFO:__main__:----------------------------------------------------------------------
2020-01-29 15:05:17,792.792 INFO:__main__:Traceback (most recent call last):
2020-01-29 15:05:17,792.792 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/test_osd.py", line 113, in test_safe_to_destroy
2020-01-29 15:05:17,793.793 INFO:__main__: 'stored_pgs': [],
2020-01-29 15:05:17,793.793 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 343, in assertJsonBody
2020-01-29 15:05:17,793.793 INFO:__main__: self.assertEqual(body, data)
2020-01-29 15:05:17,793.793 INFO:__main__:AssertionError: {u'is_safe_to_destroy': False, u'message': u'[errno -11] 3 pgs have unknown stat [truncated]... != {'active': [], 'is_safe_to_destroy': True, 'stored_pgs': [], 'safe_to_destroy': [truncated]...
2020-01-29 15:05:17,793.793 INFO:__main__:+ {'active': [],
2020-01-29 15:05:17,793.793 INFO:__main__:- {u'is_safe_to_destroy': False,
2020-01-29 15:05:17,793.793 INFO:__main__:? ^^ ^^^^
2020-01-29 15:05:17,793.793 INFO:__main__:
2020-01-29 15:05:17,793.793 INFO:__main__:+ 'is_safe_to_destroy': True,
2020-01-29 15:05:17,793.793 INFO:__main__:? ^ ^^^
2020-01-29 15:05:17,793.793 INFO:__main__:
2020-01-29 15:05:17,794.794 INFO:__main__:- u'message': u'[errno -11] 3 pgs have unknown state; cannot draw any conclusions'}
2020-01-29 15:05:17,794.794 INFO:__main__:+ 'missing_stats': [],
2020-01-29 15:05:17,794.794 INFO:__main__:+ 'safe_to_destroy': [13],
2020-01-29 15:05:17,794.794 INFO:__main__:+ 'stored_pgs': []}
2020-01-29 15:05:17,794.794 INFO:__main__:
Using guessed paths /home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/build/lib/ ['/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa', '/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/src/pybind']
</pre> Dashboard - Bug #43384 (Duplicate): mgr/dashboard: Pool size is calculated with data with differe...https://tracker.ceph.com/issues/433842019-12-19T14:01:20ZStephan Müller
<p>Currently we use the following calculation in the dashboard to calculate the usage:</p>
<pre><code>const avail = stats.bytes_used.latest + stats.max_avail.latest;<br /> pool['usage'] = avail > 0 ? stats.bytes_used.latest / avail : avail;</code></pre>
<p>[pool-list.component.ts:253-4]</p>
<p>The problem is that "max_avail" is calculated somewhat strangely as it does not show the real available space as it divides the available space at least through the number of replications for a replicated and by "(m+k)/k" for an ec pool.</p>
<p>To look the calculation up go to the following locations:<br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6033">https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6033</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6040">https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6040</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6054">https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6054</a></p>
<p><a class="external" href="https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L909">https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L909</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L920">https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L920</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L924">https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L924</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L947">https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L947</a></p>
<p>This problem was found out by a user who notified us about the percentage mismatch, here is the link to the mail:<br /><a class="external" href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-December/037680.html">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-December/037680.html</a></p>
<p>Also found this thread on the new mailing list regarding the used percentage and max_avail topic:<br /><a class="external" href="https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NH2LMMX5KVRWCURI3BARRUAETKE2T2QN/#JDHXOQKWF6NZLQMOGEPAQCLI44KB54A3">https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NH2LMMX5KVRWCURI3BARRUAETKE2T2QN/#JDHXOQKWF6NZLQMOGEPAQCLI44KB54A3</a></p> Dashboard - Bug #42243 (Duplicate): mgr/dashboard: Fix unit test that is failing in a negative ti...https://tracker.ceph.com/issues/422432019-10-09T08:54:36ZStephan Müller
<p>Found a test that is failing in a negative timezone (Washington (-05:00))</p>
<pre>
● RbdSnapshotListComponent › snapshot modal dialog › should display suggested snapshot name
expect(received).toMatch(expected)
Expected pattern: /^image01_[\d-]+T[\d.:]+\+[\d:]+$/
Received string: "image01_2019-10-09T03:40:18.664-05:00"
204 | it('should display suggested snapshot name', () => {
205 | component.openCreateSnapshotModal();
> 206 | expect(component.modalRef.content.snapName).toMatch(
| ^
207 | RegExp(`^${component.rbdName}_[\\d-]+T[\\d.:]+\\+[\\d:]+\$`)
208 | );
209 | });
at src/app/ceph/block/rbd-snapshot-list/rbd-snapshot-list.component.spec.ts:206:51
at ZoneDelegate.Object.<anonymous>.ZoneDelegate.invoke (node_modules/zone.js/dist/zone.js:391:26)
at ProxyZoneSpec.Object.<anonymous>.ProxyZoneSpec.onInvoke (node_modules/zone.js/dist/proxy.js:129:39)
at ZoneDelegate.Object.<anonymous>.ZoneDelegate.invoke (node_modules/zone.js/dist/zone.js:390:52)
at Zone.Object.<anonymous>.Zone.run (node_modules/zone.js/dist/zone.js:150:43)
at Object.testBody.length (node_modules/jest-preset-angular/zone-patch/index.js:52:27)
</pre> Dashboard - Feature #40332 (Duplicate): mgr/dashboard: Make PGs in pool creation form optional if...https://tracker.ceph.com/issues/403322019-06-13T12:37:47ZStephan Müller
<p>Make PGs in pool creation form optional if pg autoscaler is in use</p>
<p><a class="external" href="https://docker.pkg.github.com/ceph/ceph/tree/master/src/pybind/mgr/pg_autoscaler">https://docker.pkg.github.com/ceph/ceph/tree/master/src/pybind/mgr/pg_autoscaler</a></p> Dashboard - Tasks #37585 (Closed): mgr/dashboard/docker: Add Alertmanager and rules to docker env...https://tracker.ceph.com/issues/375852018-12-10T14:35:24ZStephan Müller
<p>Add Alertmanager and Prometheus rules to the <a href="https://github.com/ricardoasmarques/ceph-dev-docker" class="external">ceph-docker</a> environment which is used for dashboard development.</p> Dashboard - Feature #36723 (Closed): mgr/dashboard: Show the graph of the alert expressionhttps://tracker.ceph.com/issues/367232018-11-07T15:52:07ZStephan Müller
<p>Show the graph of the alert expression</p> Dashboard - Feature #25165 (Rejected): mgr/dashboard: Manage Ceph pool snapshotshttps://tracker.ceph.com/issues/251652018-07-30T14:42:38ZStephan Müller
<p>Currently there is no way to create / delete pool snapshots</p>