Ceph : Issueshttps://tracker.ceph.com/https://tracker.ceph.com/favicon.ico2020-07-15T14:14:32ZCeph
Redmine Orchestrator - Tasks #46551 (Resolved): cephadm: Add a better hint on how to add a hosthttps://tracker.ceph.com/issues/465512020-07-15T14:14:32ZStephan Müller
<p>Currently:</p>
<pre>
master:~ # ceph orch host add mgr0 192.168.121.230
Error ENOENT: Failed to connect to mgr0 (192.168.121.230).
Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to run:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > key
> ssh -F ssh_config -i key root@mgr0
</pre>
<p>What actually needs to be done:<br /><pre>
master:~ # ceph config-key get mgr/cephadm/ssh_identity_pub > key.pub
master:~ # ssh-copy-id -i "key.pub" root@mgr0
</pre></p>
<p>What the message should look like in the end:<br /><pre>
master:~ # ceph orch host add mgr0 192.168.121.230
Error ENOENT: Failed to connect to mgr0 (192.168.121.230).
Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to add the SSH key to the host:
> ceph config-key get mgr/cephadm/ssh_identity_pub > ~/cephadm_ssh_key.pub
> ssh-copy-id -i ~/cephadm_ssh_key.pub root@mgr0
you may want to check that everything works, before rerunning the command:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_ssh_key
> ssh -F ssh_config -i ~/cephadm_ssh_key root@mgr0
</pre></p> Orchestrator - Support #46547 (Resolved): cephadm: Exception adding host via FQDN if host was alr...https://tracker.ceph.com/issues/465472020-07-15T12:17:02ZStephan Müller
<p>To reproduce this, you need nodes that have a subdomain (unlike in the current Vagrantfile). I used sesdev to find this issue.</p>
<pre>
master:~ # ceph orch host add node1.pacific.test
Error ENOENT: New host node1.pacific.test (node1.pacific.test) failed check: [
'INFO:cephadm:podman|docker (/usr/bin/podman) is present',
'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is present',
'INFO:cephadm:Unit chronyd.service is enabled and running',
'INFO:cephadm:Hostname "node1.pacific.test" matches what is expected.',
'ERROR: hostname "node1" does not match expected hostname "node1.pacific.test"'
]
</pre>
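<p>The output above reports both a match for "node1.pacific.test" and a mismatch between "node1" and "node1.pacific.test". A minimal sketch of a more lenient comparison follows. The function name <code>hostname_matches</code> and its rules are assumptions for illustration, not cephadm's actual check.</p>

```python
def hostname_matches(actual: str, expected: str) -> bool:
    """Return True if both names plausibly refer to the same host.

    Hypothetical helper: accepts an exact match, or a short hostname on one
    side that equals the first label of the FQDN on the other side.
    """
    actual = actual.strip().rstrip(".").lower()
    expected = expected.strip().rstrip(".").lower()
    if actual == expected:
        return True
    # Only fall back to short-name comparison when exactly one side is short,
    # so two different FQDNs with the same first label do not match.
    one_is_short = ("." in actual) != ("." in expected)
    return one_is_short and actual.split(".")[0] == expected.split(".")[0]


if __name__ == "__main__":
    print(hostname_matches("node1", "node1.pacific.test"))
    print(hostname_matches("node2", "node1.pacific.test"))
```

<p>Whether a short name should be accepted at all is a design choice; a stricter alternative would keep the exact comparison and only improve the error message.</p>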
<p>With `ceph -W cephadm`, one observes:</p>
<pre>
2020-07-15T13:24:21.159126+0200 mgr.node1.zybwkb [ERR] _Promise failed
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 277, in _finalize
next_result = self._on_complete(self._value)
File "/usr/share/ceph/mgr/cephadm/module.py", line 132, in <lambda>
return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 1098, in add_host
return self._add_host(spec)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1087, in _add_host
spec.hostname, spec.addr, err))
orchestrator._interface.OrchestratorError: New host node1.pacific.test (node1.pacific.test) failed check: ['INFO:cephadm:podman|docker (/usr/bin/podman) is present', 'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is present', 'INFO:cephadm:Unit chronyd.service is enabled and running', 'INFO:cephadm:Hostname "node1.pacific.test" matches what is expected.', 'ERROR: hostname "node1" does not match expected hostname "node1.pacific.test"']
</pre> Orchestrator - Documentation #46377 (Resolved): cephadm: Missing 'service_id' in last example in ...https://tracker.ceph.com/issues/463772020-07-06T15:33:59ZStephan Müller
<p>The 'service_id' key is missing in the last example in orchestrator#service-specification. The example can be found right above <a class="external" href="https://docs.ceph.com/docs/master/mgr/orchestrator/#placement-specification">https://docs.ceph.com/docs/master/mgr/orchestrator/#placement-specification</a>, and it should look as specified in <a class="external" href="https://docs.ceph.com/docs/master/cephadm/drivegroups/#osd-service-specification">https://docs.ceph.com/docs/master/cephadm/drivegroups/#osd-service-specification</a>.</p> Orchestrator - Tasks #46376 (Resolved): cephadm: Make vagrant usage more comfortablehttps://tracker.ceph.com/issues/463762020-07-06T15:28:51ZStephan Müller
<p>Currently you can only use one big scale factor with the vagrant setup: you get x * (mgr, mon, osd with 2 disks). It would be nicer to use the same constants that vstart uses to select how many mgrs, mons and osds one would like to have. I would go further and add a disks constant, too.</p>
<p>This would make the creation a lot more flexible. Another thing that is missing is a script to easily snapshot the created VMs and recreate them.</p> Orchestrator - Bug #45724 (Resolved): check-host should not fail using fqdn or not that hardhttps://tracker.ceph.com/issues/457242020-05-27T09:26:51ZStephan Müller
<p>I would suggest either detecting that it's an FQDN or answering "Host not found. Use 'ceph orch host ls' to see all managed hosts."</p>
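<p>A minimal sketch of the second suggestion: validate the host name before the connection string is built, so the user gets a readable message instead of the TypeError shown in the traceback below. The names <code>HostNotFoundError</code>, <code>connection_target</code> and <code>known_hosts</code> are hypothetical and not part of the cephadm module.</p>

```python
class HostNotFoundError(Exception):
    """Hypothetical error for a host that is not managed by the orchestrator."""


def connection_target(ssh_user, host, known_hosts):
    """Build 'user@host' only after checking that the inputs are usable."""
    if host is None or host not in known_hosts:
        # Readable message instead of a TypeError from str + None.
        raise HostNotFoundError(
            "Host not found. Use 'ceph orch host ls' to see all managed hosts."
        )
    if ssh_user is None:
        raise ValueError("No SSH user is configured.")
    return ssh_user + "@" + host


if __name__ == "__main__":
    managed = {"node1", "node2", "node3"}
    print(connection_target("root", "node3", managed))
    try:
        connection_target("root", "node3.ses7.com", managed)
    except HostNotFoundError as err:
        print(err)
```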
<pre>
# ceph cephadm check-host node3.ses7.com
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1153, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 110, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 308, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 72, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 63, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1482, in check_host
error_ok=True, no_fsid=True)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1569, in _run_cephadm
conn, connr = self._get_connection(addr)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1521, in _get_connection
n = self.ssh_user + '@' + host
TypeError: must be str, not NoneType
</pre> Dashboard - Bug #44224 (New): mgr/dashboard: Timeouts for rbd.py callshttps://tracker.ceph.com/issues/442242020-02-20T10:10:30ZStephan Müller
<p>As the corner cases are not handled in many rbd.py methods, these methods can fail without ever returning a response on a specific pool (mostly bad pools).</p>
<p>Once this is implemented, remove the workaround that was implemented to fix <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: mgr/dashboard: Dashboard breaks on the selection of a bad pool (Resolved)" href="https://tracker.ceph.com/issues/43765">#43765</a>.</p>
<p>For details on the known issue, see <a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: pybind/rbd: config_list hangs if given an pool with a bad pg state (Rejected)" href="https://tracker.ceph.com/issues/43771">#43771</a>.</p>
<p>For details on the related discussion, see the PR that fixed <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: mgr/dashboard: Dashboard breaks on the selection of a bad pool (Resolved)" href="https://tracker.ceph.com/issues/43765">#43765</a>.</p>
<p>Make sure that <a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: pybind/rbd: config_list hangs if given an pool with a bad pg state (Rejected)" href="https://tracker.ceph.com/issues/43771">#43771</a> is still not addressed before starting with this issue.</p>
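<p>One generic way to bound a blocking call is to run it in a worker thread and stop waiting after a deadline. The following is only a sketch under that assumption; <code>call_with_timeout</code> is a hypothetical helper, not existing dashboard code, and it does not cancel the underlying call, which keeps running in its thread.</p>

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_s, *args, default=None, **kwargs):
    """Run func(*args, **kwargs) and return `default` if it takes too long."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker thread is not killed; we only stop waiting for it.
        return default
    finally:
        # Do not block here on a call that may never return.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    print(call_with_timeout(lambda: "fast result", 1.0))
    print(call_with_timeout(time.sleep, 0.05, 0.3, default="timed out"))
```

<p>Because a hung call still occupies its thread, this approach limits the impact on the caller but does not replace proper error handling in the lower layers.</p>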
<p>For details on how this was implemented in openATTIC, look <a href="https://bitbucket.org/openattic/openattic/pull-requests/682/add-librados-command-name-to-external/diff" class="external">here</a>.</p> rbd - Bug #43771 (Rejected): pybind/rbd: config_list hangs if given an pool with a bad pg statehttps://tracker.ceph.com/issues/437712020-01-23T16:53:11ZStephan Müller
<p>If the dashboard tries to get the configuration of RBDs on a per-pool basis for a pool in the pg state 'creating+incomplete', it hangs while waiting for a response from `config_list` in `rbd.pyx`.</p>
<p>The pg state 'creating+incomplete' is an edge case, as it only appears if one creates a pool that needs more buckets than the cluster can provide. The current workaround in the dashboard is to omit this call if a pool is in this state.</p>
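<p>The workaround described above can be sketched roughly as follows. The pool dictionary layout (<code>pool_name</code>, <code>pg_states</code>) and the <code>fetch_config</code> callable are assumptions made for illustration; the real dashboard code may be structured differently.</p>

```python
# PG states for which the potentially blocking call is skipped.
BAD_PG_STATES = {"creating+incomplete"}


def pool_rbd_configuration(pool, fetch_config):
    """Return the RBD configuration of a pool, or [] for a pool in a bad state.

    `pool` is a hypothetical dict with 'pool_name' and 'pg_states' keys;
    `fetch_config` stands in for the call that may hang on such pools.
    """
    if BAD_PG_STATES.intersection(pool.get("pg_states", [])):
        return []
    return fetch_config(pool["pool_name"])


if __name__ == "__main__":
    def fake_fetch(name):
        return [{"name": "rbd_qos_iops_limit", "pool": name}]

    healthy = {"pool_name": "rbd", "pg_states": ["active+clean"]}
    broken = {"pool_name": "bad", "pg_states": ["creating+incomplete"]}
    print(pool_rbd_configuration(healthy, fake_fetch))
    print(pool_rbd_configuration(broken, fake_fetch))
```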
<p>Here is the manual stack trace found by debugging:<br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/controllers/pool.py#L206">https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/controllers/pool.py#L206</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/services/rbd.py#L104">https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/services/rbd.py#L104</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/rbd/rbd.pyx#L2215">https://github.com/ceph/ceph/blob/master/src/pybind/rbd/rbd.pyx#L2215</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/rbd/rbd.pyx#L2935">https://github.com/ceph/ceph/blob/master/src/pybind/rbd/rbd.pyx#L2935</a></p> Dashboard - Bug #43594 (Resolved): mgr/dashboard: E2E pools page failurehttps://tracker.ceph.com/issues/435942020-01-14T09:58:44ZStephan Müller
<p>On the current master, the pools page E2E test fails:</p>
<pre>
npm run e2e:ci -- --specs e2e/pools/pools.e2e-spec.ts
> ceph-dashboard@0.0.0 e2e:ci /srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend
> npm run env_build && npm run e2e:update && ng e2e --dev-server-target --webdriverUpdate=false "--specs" "e2e/pools/pools.e2e-spec.ts"
> ceph-dashboard@0.0.0 env_build /srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend
> cp src/environments/environment.tpl.ts src/environments/environment.prod.ts && cp src/environments/environment.tpl.ts src/environments/environment.ts && node ./environment.build.js
Environment variables have been set
> ceph-dashboard@0.0.0 e2e:update /srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend
> npx webdriver-manager update --gecko=false --versions.chrome=$(google-chrome --version | awk '{ print $3 }')
[10:50:42] I/update - chromedriver: file exists /srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/protractor/node_modules/webdriver-manager/selenium/chromedriver_78.0.3904.70.zip
[10:50:42] I/update - chromedriver: unzipping chromedriver_78.0.3904.70.zip
[10:50:42] I/update - chromedriver: setting permissions to 0755 for /srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/protractor/node_modules/webdriver-manager/selenium/chromedriver_78.0.3904.70
[10:50:42] I/update - chromedriver: chromedriver_78.0.3904.70 up to date
[10:50:42] I/update - selenium standalone: file exists /srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/protractor/node_modules/webdriver-manager/selenium/selenium-server-standalone-3.141.59.jar
[10:50:42] I/update - selenium standalone: selenium-server-standalone-3.141.59.jar up to date
[10:50:56] I/launcher - Running 1 instances of WebDriver
[10:50:56] I/direct - Using ChromeDriver directly...
Activated Protractor Screenshoter Plugin, ver. 0.10.3 (c) 2016 - 2020 Andrej Zachar and contributors
Creating reporter at .protractor-report/
Jasmine started
Pools page
breadcrumb and tab tests
✓ should open and show breadcrumb (0.181 sec)
✓ should show two tabs (0.066 sec)
✓ should show pools list tab at first (0.075 sec)
✓ should show overall performance as a second tab (0.08 sec)
✗ should create a pool (15 secs)
- Failed: No element found using locator: By(css selector, input[name=pgNum])
at elementArrayFinder.getWebElements.then (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/protractor/built/element.js:814:27)
at process._tickCallback (internal/process/next_tick.js:68:7)Error:
at ElementArrayFinder.applyAction_ (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/protractor/built/element.js:459:27)
at ElementArrayFinder.(anonymous function).args [as sendKeys] (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/protractor/built/element.js:91:29)
at ElementFinder.(anonymous function).args [as sendKeys] (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/protractor/built/element.js:831:22)
at PoolPageHelper.<anonymous> (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/e2e/pools/pools.po.ts:44:34)
at step (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/tslib/tslib.js:136:27)
at Object.next (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/tslib/tslib.js:117:57)
at fulfilled (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/tslib/tslib.js:107:62)
at process._tickCallback (internal/process/next_tick.js:68:7)
From asynchronous test:
Error:
at Suite.<anonymous> (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/e2e/pools/pools.e2e-spec.ts:34:3)
at apply (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/lodash/lodash.js:476:27)
at Env.wrapper [as describe] (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/node_modules/lodash/lodash.js:5317:16)
at Object.<anonymous> (/srv/cephmgr/ceph-dev/src/pybind/mgr/dashboard/frontend/e2e/pools/pools.e2e-spec.ts:3:1)
at Module._compile (internal/modules/cjs/loader.js:689:30)
**************************************************
* Failures *
**************************************************
1) Pools page should create a pool
- Failed: No element found using locator: By(css selector, input[name=pgNum])
Executed 5 of 7 specs (1 FAILED) (2 SKIPPED) in 19 secs.
[10:51:46] I/launcher - 0 instance(s) of WebDriver still running
[10:51:46] I/launcher - chrome #01 failed 1 test(s)
[10:51:46] I/launcher - overall: 1 failed spec(s)
[10:51:46] E/launcher - Process exited with error code 1
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! ceph-dashboard@0.0.0 e2e:ci: `npm run env_build && npm run e2e:update && ng e2e --dev-server-target --webdriverUpdate=false "--specs" "e2e/pools/pools.e2e-spec.ts"`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the ceph-dashboard@0.0.0 e2e:ci script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /home/albatros/.npm/_logs/2020-01-14T09_51_46_846Z-debug.log
</pre> mgr - Bug #41795 (New): mgr: Time series data of pool decreases itself when reducing the amount o...https://tracker.ceph.com/issues/417952019-09-12T14:18:54ZStephan Müller
<p>The time series data of a pool decreases when the number of PGs of the pool is reduced.</p>
<p>Time series data should only increase, not decrease.</p>
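<p>To illustrate why a decreasing series is a problem, here is a small sketch that derives rates from (timestamp, value) samples and flags the points where the value drops. Both helper names are hypothetical, and the sample values are made up.</p>

```python
def rates_from_samples(samples):
    """Turn [(timestamp, value), ...] into [(timestamp, rate), ...]."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip samples without a positive time delta
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


def decreasing_points(samples):
    """Return the timestamps at which the value is lower than before."""
    return [t1 for (_, v0), (t1, v1) in zip(samples, samples[1:]) if v1 < v0]


if __name__ == "__main__":
    # Made-up samples: the value drops between t=10 and t=20.
    samples = [(0, 10), (10, 30), (20, 20)]
    print(rates_from_samples(samples))
    print(decreasing_points(samples))
```

<p>A drop in the underlying value shows up as a negative rate, which is misleading for data that is expected to grow monotonically.</p>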
<p>(I'm not sure if this is the right place for this bug.)</p> mgr - Feature #41793 (Closed): mgr: Run doc tests for mgr_util.pyhttps://tracker.ceph.com/issues/417932019-09-12T13:23:31ZStephan Müller
<p>Run doc tests in mgr_util.py</p> mgr - Feature #40365 (Resolved): mgr: Add get_rates_from_data from the dashboard to the mgr_util.pyhttps://tracker.ceph.com/issues/403652019-06-14T13:31:58ZStephan Müller
<p>Other modules need this too.</p>
<p>Origin: <a class="external" href="https://github.com/ceph/ceph/pull/28153#discussion_r285974000">https://github.com/ceph/ceph/pull/28153#discussion_r285974000</a></p> mgr - Feature #40363 (Resolved): mgr: Run python unit tests with tox in the mgrhttps://tracker.ceph.com/issues/403632019-06-14T13:28:15ZStephan Müller
<p>Jenkins should run python unit tests that are included inside the core of the manager.</p> Dashboard - Bug #39579 (Resolved): mgr/dashboard: Fix run-tox script to accept cli arguments againhttps://tracker.ceph.com/issues/395792019-05-03T10:30:12ZStephan Müller
<p>A regression was introduced by <a href="https://github.com/ceph/ceph/commit/9426f1f2045d0ae0f319530c3dc3a9240d838d07#diff-cc2ee9d8e56f3a2cd98b8148935d3829L37" class="external">this change</a>, causing the script to no longer accept command line arguments. As a result, the command described in hacking.rst for running only a single tox test ("WITH_PYTHON2=OFF ./run-tox.sh pytest tests/test_rgw_client.py::RgwClientTest::test_ssl_verify") no longer worked and caused tox to <a href="https://paste.opensuse.org/view//64659695" class="external">fail</a>.</p>
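<p>The traceback below ends in an IndexError at <code>argv[0].startswith("-")</code>, which is what Python raises when the list is empty. The sketch below only demonstrates that failure mode and a guarded variant; it is not tox code, and both function names are made up.</p>

```python
def first_arg_is_option_unguarded(argv):
    """Mirrors the failing expression: raises IndexError for an empty list."""
    return argv[0].startswith("-")


def first_arg_is_option(argv):
    """Guarded variant: an empty argument list simply has no leading option."""
    return bool(argv) and argv[0].startswith("-")


if __name__ == "__main__":
    print(first_arg_is_option(["-v", "tests"]))
    print(first_arg_is_option([]))
    try:
        first_arg_is_option_unguarded([])
    except IndexError as err:
        print("IndexError:", err)
```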
<pre>
Traceback (most recent call last):
File "/usr/bin/tox", line 11, in <module>
load_entry_point('tox==3.7.0', 'console_scripts', 'tox')()
File "/usr/lib/python3.7/site-packages/tox/session.py", line 47, in cmdline
main(args)
File "/usr/lib/python3.7/site-packages/tox/session.py", line 54, in main
retcode = build_session(config).runcommand()
File "/usr/lib/python3.7/site-packages/tox/session.py", line 467, in runcommand
return self.subcommand_test()
File "/usr/lib/python3.7/site-packages/tox/session.py", line 590, in subcommand_test
self.run_sequential()
File "/usr/lib/python3.7/site-packages/tox/session.py", line 609, in run_sequential
self.runtestenv(venv)
File "/usr/lib/python3.7/site-packages/tox/session.py", line 728, in runtestenv
self.hook.tox_runtest(venv=venv, redirect=redirect)
File "/usr/lib/python3.7/site-packages/pluggy/hooks.py", line 289, in __call__
return self._hookexec(self, self.get_hookimpls(), kwargs)
File "/usr/lib/python3.7/site-packages/pluggy/manager.py", line 68, in _hookexec
return self._inner_hookexec(hook, methods, kwargs)
File "/usr/lib/python3.7/site-packages/pluggy/manager.py", line 62, in <lambda>
firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
File "/usr/lib/python3.7/site-packages/pluggy/callers.py", line 208, in _multicall
return outcome.get_result()
File "/usr/lib/python3.7/site-packages/pluggy/callers.py", line 80, in get_result
raise ex[1].with_traceback(ex[2])
File "/usr/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
res = hook_impl.function(*args)
File "/usr/lib/python3.7/site-packages/tox/venv.py", line 597, in tox_runtest
venv.test(redirect=redirect)
File "/usr/lib/python3.7/site-packages/tox/venv.py", line 468, in test
if argv[0].startswith("-"):
IndexError: list index out of range
</pre> Dashboard - Bug #39300 (Resolved): mgr/dashboard: Can't login with a bigger time difference betwe...https://tracker.ceph.com/issues/393002019-04-15T15:45:19ZStephan Müller
<p>With a time difference of -7h to the backend, I couldn't log in. The log showed the error `AMT: user info changed after token was issued, iat=%s lastUpdate=%s`, which can be found in line 150 of `dashboard/services/auth.py`. As a quick fix, I removed line 146 in the same file, which required `user.lastUpdate <= token['iat']` to be true in order to log in.</p> mgr - Tasks #25157 (New): Refine the details of the Ceph pools opticallyhttps://tracker.ceph.com/issues/251572018-07-30T14:04:23ZStephan Müller
<p>The details of the Ceph pools in the listing are displayed in a relatively raw form. This should be enhanced, and the details should be refined visually.</p>
<a name="Data-Table"></a>
<h2 >Data Table<a href="#Data-Table" class="wiki-anchor">¶</a></h2>
<ul>
<li>Replica size is only valid for replicated pools.</li>
<li>The "type" defines which column is valid.</li>
<li>The minimum number of replicas is missing. Maybe even as an optional column.</li>
<li>Show the pool quota.</li>
</ul>
<a name="Details"></a>
<h2 >Details<a href="#Details" class="wiki-anchor">¶</a></h2>
<ul>
<li>Show Replica size only for replicated pools.</li>
<li>Only show erasure code profile on erasure coded pools.</li>
<li>Add a mouseover or hyperlink for the properties.</li>
</ul>
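<p>The type-dependent rules listed above could be expressed along these lines. This is only a sketch of the selection logic in Python; the field names (<code>pool_name</code>, <code>type</code>, <code>size</code>, <code>min_size</code>, <code>erasure_code_profile</code>) are assumptions and do not describe the actual dashboard data model.</p>

```python
def visible_pool_details(pool):
    """Select which details to show for a pool, depending on its type."""
    details = {"name": pool["pool_name"], "type": pool["type"]}
    if pool["type"] == "replicated":
        # Replica size and minimum replicas only apply to replicated pools.
        details["size"] = pool.get("size")
        details["min_size"] = pool.get("min_size")
    elif pool["type"] == "erasure":
        # The erasure code profile only applies to erasure coded pools.
        details["erasure_code_profile"] = pool.get("erasure_code_profile")
    return details


if __name__ == "__main__":
    replicated = {"pool_name": "rbd", "type": "replicated", "size": 3, "min_size": 2}
    erasure = {"pool_name": "ec", "type": "erasure", "erasure_code_profile": "default"}
    print(visible_pool_details(replicated))
    print(visible_pool_details(erasure))
```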