Ceph: Issues - https://tracker.ceph.com/ (2020-07-29T14:03:08Z)
Dashboard - Bug #46757 (New): mgr/dashboard: Only show identify action if inventory device can blink
https://tracker.ceph.com/issues/46757 (2020-07-29T14:03:08Z, Stephan Müller)
<p>If a device can't blink but is managed by cephadm, the "Identify" action is still shown on the inventory page. The problem is that the command doesn't raise an error in the dashboard when it fails. I observed the following error by running `ceph -W cephadm` in parallel with the execution.</p>
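<p>A minimal sketch of the failure path (the lsmcli command is taken from the traceback below; the `run` parameter is injected so the behaviour can be exercised without lsmcli installed, and `BlinkError` stands in for OrchestratorError):</p>

```python
import subprocess


class BlinkError(Exception):
    """Stand-in for orchestrator._interface.OrchestratorError."""


def blink_device(host, dev, run=subprocess.run):
    # Same command cephadm shells out to (see the traceback below).
    cmd = ['lsmcli', 'local-disk-ident-led-on', '--path', dev]
    result = run(cmd, capture_output=True)
    if result.returncode != 0:
        # Surface the failure instead of swallowing it, so the dashboard
        # can report it or hide the "Identify" action for this device.
        raise BlinkError('Unable to affect ident light for %s:%s. Command: %s'
                         % (host, dev, ' '.join(cmd)))
```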
<pre>
2020-07-29T08:53:30.649950-0500 mgr.x [ERR] executing blink(([DeviceLightLoc(host='osd0', dev='/dev/vdb', path='/dev/vdb')],)) failed.
Traceback (most recent call last):
File "/ceph/src/pybind/mgr/cephadm/utils.py", line 67, in do_work
return f(*arg)
File "/ceph/src/pybind/mgr/cephadm/module.py", line 1591, in blink
raise OrchestratorError(
orchestrator._interface.OrchestratorError: Unable to affect ident light for osd0:/dev/vdb. Command: lsmcli local-disk-ident-led-on --path /dev/vdb
2020-07-29T08:53:30.653157-0500 mgr.x [ERR] _Promise failed
Traceback (most recent call last):
File "/ceph/src/pybind/mgr/orchestrator/_interface.py", line 292, in _finalize
next_result = self._on_complete(self._value)
File "/ceph/src/pybind/mgr/cephadm/module.py", line 102, in <lambda>
return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
File "/ceph/src/pybind/mgr/cephadm/module.py", line 1599, in blink_device_light
return blink(locs)
File "/ceph/src/pybind/mgr/cephadm/utils.py", line 73, in forall_hosts_wrapper
return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
File "/usr/lib64/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib64/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "/usr/lib64/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib64/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/ceph/src/pybind/mgr/cephadm/utils.py", line 67, in do_work
return f(*arg)
File "/ceph/src/pybind/mgr/cephadm/module.py", line 1591, in blink
raise OrchestratorError(
orchestrator._interface.OrchestratorError: Unable to affect ident light for osd0:/dev/vdb. Command: lsmcli local-disk-ident-led-on --path /dev/vdb
</pre>

Dashboard - Bug #44223 (Duplicate): mgr/dashboard: Timeouts for rbd.py calls
https://tracker.ceph.com/issues/44223 (2020-02-20T10:05:49Z, Stephan Müller)
<p>Since many rbd methods do not handle corner cases, they can hang without a response on specific pools (mostly pools in a bad state).</p>
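<p>One generic mitigation (a sketch, not the actual dashboard fix) is to run the potentially hanging librbd call in a worker thread and give up after a timeout. Note the underlying thread keeps running, which is why a real fix needs to happen closer to librbd:</p>

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_worker_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, *args, timeout=5.0):
    """Run fn(*args) in a worker thread; raise FutureTimeout if it hangs."""
    future = _worker_pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # Best effort only: a call that is already running is not interrupted.
        future.cancel()
        raise
```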
<p>Once this is implemented, remove the workaround that was added to fix <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: mgr/dashboard: Dashboard breaks on the selection of a bad pool (Resolved)" href="https://tracker.ceph.com/issues/43765">#43765</a>.</p>
<p>For details on the known issue, see <a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: pybind/rbd: config_list hangs if given an pool with a bad pg state (Rejected)" href="https://tracker.ceph.com/issues/43771">#43771</a>.</p>
<p>For details of the discussion, see the PR that fixed <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: mgr/dashboard: Dashboard breaks on the selection of a bad pool (Resolved)" href="https://tracker.ceph.com/issues/43765">#43765</a>.</p>
<p>Before starting on this issue, make sure that <a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: pybind/rbd: config_list hangs if given an pool with a bad pg state (Rejected)" href="https://tracker.ceph.com/issues/43771">#43771</a> has not been addressed in the meantime.</p>

Dashboard - Bug #43938 (Duplicate): mgr/dashboard: Test failure in test_safe_to_destroy in tasks....
https://tracker.ceph.com/issues/43938 (2020-01-31T15:14:52Z, Stephan Müller)
<p>There is an API test failure in master; I'm not sure where it came from.</p>
<pre>
2020-01-29 15:05:17,789.789 INFO:__main__:Stopped test: test_safe_to_destroy (tasks.mgr.dashboard.test_osd.OsdTest) in 1.567141s
2020-01-29 15:05:17,789.789 INFO:__main__:
2020-01-29 15:05:17,790.790 INFO:__main__:======================================================================
2020-01-29 15:05:17,790.790 INFO:__main__:FAIL: test_safe_to_destroy (tasks.mgr.dashboard.test_osd.OsdTest)
2020-01-29 15:05:17,790.790 INFO:__main__:----------------------------------------------------------------------
2020-01-29 15:05:17,790.790 INFO:__main__:Traceback (most recent call last):
2020-01-29 15:05:17,790.790 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/test_osd.py", line 113, in test_safe_to_destroy
2020-01-29 15:05:17,790.790 INFO:__main__: 'stored_pgs': [],
2020-01-29 15:05:17,790.790 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 343, in assertJsonBody
2020-01-29 15:05:17,790.790 INFO:__main__: self.assertEqual(body, data)
2020-01-29 15:05:17,790.790 INFO:__main__:AssertionError: {u'is_safe_to_destroy': False, u'message': u'[errno -11] 3 pgs have unknown stat [truncated]... != {'active': [], 'is_safe_to_destroy': True, 'stored_pgs': [], 'safe_to_destroy': [truncated]...
2020-01-29 15:05:17,790.790 INFO:__main__:+ {'active': [],
2020-01-29 15:05:17,791.791 INFO:__main__:- {u'is_safe_to_destroy': False,
2020-01-29 15:05:17,791.791 INFO:__main__:? ^^ ^^^^
2020-01-29 15:05:17,791.791 INFO:__main__:
2020-01-29 15:05:17,791.791 INFO:__main__:+ 'is_safe_to_destroy': True,
2020-01-29 15:05:17,791.791 INFO:__main__:? ^ ^^^
2020-01-29 15:05:17,791.791 INFO:__main__:
2020-01-29 15:05:17,791.791 INFO:__main__:- u'message': u'[errno -11] 3 pgs have unknown state; cannot draw any conclusions'}
2020-01-29 15:05:17,791.791 INFO:__main__:+ 'missing_stats': [],
2020-01-29 15:05:17,791.791 INFO:__main__:+ 'safe_to_destroy': [13],
2020-01-29 15:05:17,791.791 INFO:__main__:+ 'stored_pgs': []}
2020-01-29 15:05:17,791.791 INFO:__main__:
2020-01-29 15:05:17,792.792 INFO:__main__:----------------------------------------------------------------------
2020-01-29 15:05:17,792.792 INFO:__main__:Ran 93 tests in 1125.270s
2020-01-29 15:05:17,792.792 INFO:__main__:
2020-01-29 15:05:17,792.792 INFO:__main__:FAILED (failures=1)
Using guessed paths /home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/build/lib/ ['/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa', '/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/src/pybind']
</pre>

rbd - Bug #43771 (Rejected): pybind/rbd: config_list hangs if given an pool with a bad pg state
https://tracker.ceph.com/issues/43771 (2020-01-23T16:53:11Z, Stephan Müller)
<p>If the dashboard tries to get the RBD configuration on a per-pool basis and a pool is in the PG state 'creating+incomplete', it will hang waiting for a response from `config_list` in `rbd.pyx`.</p>
<p>The PG state 'creating+incomplete' is an edge case, as it will only appear if one creates a pool that requires more buckets than the cluster can provide. The current workaround in the dashboard is to omit this call if a pool is in this state.</p>
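<p>The workaround could look roughly like this (the shape of the pool dict and its `pg_states` field is an assumption for illustration):</p>

```python
# PG states in which rbd.config_list is known to hang (see this issue).
BAD_PG_STATES = {'creating+incomplete'}


def pools_safe_for_config_list(pools):
    """Keep only pools whose PG states don't intersect the bad set."""
    return [pool for pool in pools
            if not set(pool.get('pg_states', [])) & BAD_PG_STATES]
```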
<p>Here is the call chain found by manual debugging:<br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/controllers/pool.py#L206">https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/controllers/pool.py#L206</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/services/rbd.py#L104">https://github.com/ceph/ceph/blob/master/src/pybind/mgr/dashboard/services/rbd.py#L104</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/rbd/rbd.pyx#L2215">https://github.com/ceph/ceph/blob/master/src/pybind/rbd/rbd.pyx#L2215</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/rbd/rbd.pyx#L2935">https://github.com/ceph/ceph/blob/master/src/pybind/rbd/rbd.pyx#L2935</a></p>

Dashboard - Bug #43384 (Duplicate): mgr/dashboard: Pool size is calculated with data with differe...
https://tracker.ceph.com/issues/43384 (2019-12-19T14:01:20Z, Stephan Müller)
<p>Currently we use the following calculation in the dashboard to calculate the usage:</p>
<pre><code>const avail = stats.bytes_used.latest + stats.max_avail.latest;
pool['usage'] = avail > 0 ? stats.bytes_used.latest / avail : avail;</code></pre>
<p>[pool-list.component.ts:253-4]</p>
<p>The problem is that "max_avail" does not show the real available raw space: it is calculated by dividing the available raw space by at least the replica count for a replicated pool, and by "(m+k)/k" for an EC pool.</p>
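<p>A worked example of the resulting mismatch (the numbers are illustrative, not real cluster data):</p>

```python
def usage_fraction(bytes_used, max_avail):
    # The current dashboard formula (see pool-list.component.ts above).
    total = bytes_used + max_avail
    return bytes_used / total if total > 0 else 0.0


# Hypothetical 3x replicated pool, all figures in GiB:
raw_used = 300       # raw space consumed on disk
logical_used = 100   # the same data as the user sees it (300 / 3)
max_avail = 200      # logical space left, i.e. already divided by 3

# Mixing raw used with logical avail overstates usage: 300 / 500 = 0.60
mixed = usage_fraction(raw_used, max_avail)
# Using consistently logical values: 100 / 300 = 0.33...
consistent = usage_fraction(logical_used, max_avail)
```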
<p>The calculation can be found at the following locations:<br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6033">https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6033</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6040">https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6040</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6054">https://github.com/ceph/ceph/blob/master/src/osd/OSDMap.cc#L6054</a></p>
<p><a class="external" href="https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L909">https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L909</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L920">https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L920</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L924">https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L924</a><br /><a class="external" href="https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L947">https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L947</a></p>
<p>This problem was reported by a user who noticed the percentage mismatch; here is the link to the mail:<br /><a class="external" href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-December/037680.html">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-December/037680.html</a></p>
<p>I also found this thread on the new mailing list regarding the used percentage and max_avail topic:<br /><a class="external" href="https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NH2LMMX5KVRWCURI3BARRUAETKE2T2QN/#JDHXOQKWF6NZLQMOGEPAQCLI44KB54A3">https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NH2LMMX5KVRWCURI3BARRUAETKE2T2QN/#JDHXOQKWF6NZLQMOGEPAQCLI44KB54A3</a></p>

Dashboard - Feature #42232 (New): mgr/dashboard: CephFs directory size calculation
https://tracker.ceph.com/issues/42232 (2019-10-08T15:02:25Z, Stephan Müller)
<p>Add a button to calculate the size of the currently selected directory.</p>

mgr - Bug #41795 (New): mgr: Time series data of pool decreases itself when reducing the amount o...
https://tracker.ceph.com/issues/41795 (2019-09-12T14:18:54Z, Stephan Müller)
<p>The time series data of a pool decreases when the number of PGs of the pool is reduced.</p>
<p>Time series data should only increase, not decrease.</p>
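<p>One way to enforce this invariant at the presentation layer (a hypothetical helper, not existing mgr code) is to clamp each sample of a cumulative counter to the running maximum:</p>

```python
def clamp_monotonic(samples):
    """Replace each sample with the running maximum so the series never drops."""
    clamped, high = [], 0
    for value in samples:
        high = max(high, value)
        clamped.append(high)
    return clamped
```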
<p>(I'm not sure if this is the right place for this bug.)</p>

Dashboard - Feature #40332 (Duplicate): mgr/dashboard: Make PGs in pool creation form optional if...
https://tracker.ceph.com/issues/40332 (2019-06-13T12:37:47Z, Stephan Müller)
<p>Make PGs in pool creation form optional if pg autoscaler is in use</p>
<p><a class="external" href="https://docker.pkg.github.com/ceph/ceph/tree/master/src/pybind/mgr/pg_autoscaler">https://docker.pkg.github.com/ceph/ceph/tree/master/src/pybind/mgr/pg_autoscaler</a></p> Dashboard - Feature #25166 (New): mgr/dashboard: Add cache pool supporthttps://tracker.ceph.com/issues/251662018-07-30T14:47:06ZStephan Müller
<p>Ceph provides a concept of <a href="http://ceph.com/docs/master/dev/cache-pool/" class="external">cache pools</a>. It should be possible for an administrator to define such a cache pool and assign it to an existing pool via dashboard:</p>
<a name="Purpose"></a>
<h3 >Purpose<a href="#Purpose" class="wiki-anchor">¶</a></h3>
<p>Use a pool of fast storage devices (probably SSDs) and use it as a cache for an existing slower and larger pool.</p>
<p>Use a replicated pool as a front-end to service most I/O, and destage cold data to a separate erasure coded pool that does not currently (and cannot efficiently) handle the workload.</p>
<p>We should be able to create and add a cache pool to an existing pool of data, and later remove it, without disrupting service or migrating data around.</p>
<a name="See-also"></a>
<h3 >See also:<a href="#See-also" class="wiki-anchor">¶</a></h3>
<p><a class="external" href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/013880.html">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/013880.html</a><br /><a class="external" href="http://docs.ceph.com/docs/master/rados/operations/cache-tiering/">http://docs.ceph.com/docs/master/rados/operations/cache-tiering/</a><br /><a class="external" href="https://www.packtpub.com/packtlib/book/Application-Development/9781784393502/8/ch08lvl1sec91/Creating%20a%20pool%20for%20cache%20tiering">https://www.packtpub.com/packtlib/book/Application-Development/9781784393502/8/ch08lvl1sec91/Creating%20a%20pool%20for%20cache%20tiering</a></p> Dashboard - Feature #25165 (Rejected): mgr/dashboard: Manage Ceph pool snapshotshttps://tracker.ceph.com/issues/251652018-07-30T14:42:38ZStephan Müller
<p>Currently there is no way to create or delete pool snapshots.</p>

Dashboard - Feature #25164 (New): mgr/dashboard: Display basic performance/utilization metrics of...
https://tracker.ceph.com/issues/25164 (2018-07-30T14:30:23Z, Stephan Müller)
<p>When clicking on a pool in the list of pools, the pool details should show graphs of the pool's performance and utilization.</p>

Dashboard - Tasks #25163 (New): mgr/dashboard: Extend the Ceph pool by configurations
https://tracker.ceph.com/issues/25163 (2018-07-30T14:26:53Z, Stephan Müller)
<p>The Ceph pool details found at /api/pools aren't complete yet and should be extended with the missing configuration values listed in <a class="external" href="http://docs.ceph.com/docs/master/rados/operations/pools/#get-pool-values">http://docs.ceph.com/docs/master/rados/operations/pools/#get-pool-values</a>.</p>

Dashboard - Feature #25160 (New): mgr/dashboard: Create a "Create Ceph Cluster Pool Configuration...
https://tracker.ceph.com/issues/25160 (2018-07-30T14:19:56Z, Stephan Müller)
<p>This issue is a port from this openATTIC <a href="https://tracker.openattic.org/browse/OP-1072" class="external">issue</a>.<br />For all comments and pictures please look at the original issue.</p>
<p>This Wizard should consist of these three steps:</p>
<ol>
<li>Provide a check list of options that this cluster will be used for. e.g. "openstack", "iSCSI", RGW, CephFS. Also, ask for the expected final size of this cluster.</li>
<li>Generate a dialog similar to the Ceph PG calculator, which contains a table of all pools that will be created. Each pool should be editable and removable.</li>
<li>Apply</li>
</ol>
<p>This wizard will only be useful if the cluster is newly created.</p>
<p>Original Description for creating a single pool:</p>
<blockquote>
<p>Once the basic/generic functionality for creating a Ceph Pool exists (e.g. the required REST API call), we should consider creating a "Create Pool" Wizard, that guides the user through the required steps.</p>
<p>Some rough notes about this that were gathered during a call with SUSE about this:</p>
<p>First step after installation - creation of a CRUSH map depending on whether there is just one set of disks (one size, all rotational) or rotating disks plus SSDs (likely two different sizes?)<br />(<strong>Not</strong> for SSDs used as journal devices)</p>
<p>Crush Map creation? All your disks in one rule set, or use two separate groupings? Reason: to constrain what disks a pool can use (e.g. for creating a cache pool)<br />Can I query an OSD for the size of its disk?<br />Propose a grouping based on what is reported.</p>
<p>Create a pool for one of the following purposes</p>
<p>- Replicated or Erasure Coded? => Explain the pros and cons in sidebar<br />- Cache tiering (only if there are separate rule sets)</p>
<p>If Erasure Coded: Propose k/m values e.g. 5/3 4/2 (dropdown showing existing profiles), algorithm</p>
<p>- Suggest conservative Placement Group Number (use pgcalc algorithm?) (Hint that it can't be decreased and depends on the estimated number of Pools, probably propose a conservative number)<br />Maybe a Checkbox? "Do you intend to create additional pools?"</p>
<p>- Block devices (iSCSI), Virtual Machine Images<br />- Generic object storage</p>
<p>Note: (Cache tiering does not work with RBDs)</p>
<p>- (CephFS)</p>
</blockquote>
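<p>The "conservative Placement Group number" suggestion mentioned in the notes could follow the classic pgcalc heuristic. A rough sketch (the exact pgcalc rules have more cases; the helper name and the target_per_osd default are assumptions):</p>

```python
def suggest_pg_num(osd_count, pool_ratio, replica_size, target_per_osd=100):
    """Suggest a PG count: ~target_per_osd PGs per OSD, weighted by the
    pool's expected share of the data, rounded up to a power of two."""
    raw = osd_count * target_per_osd * pool_ratio / replica_size
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num
```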
<p>See also: <a class="external" href="http://ceph.com/pgcalc/">http://ceph.com/pgcalc/</a></p>

Dashboard - Feature #25159 (New): mgr/dashboard: Add CRUSH ruleset management to CRUSH viewer
https://tracker.ceph.com/issues/25159 (2018-07-30T14:07:19Z, Stephan Müller)
<p>Add support to view / create / update / delete a CRUSH ruleset in the CRUSH map viewer.</p>

mgr - Tasks #25157 (New): Refine the details of the Ceph pools optically
https://tracker.ceph.com/issues/25157 (2018-07-30T14:04:23Z, Stephan Müller)
<p>The details of the Ceph pools in the listing are displayed in a relatively raw form. They should be enhanced and visually refined.</p>
<a name="Data-Table"></a>
<h2 >Data Table<a href="#Data-Table" class="wiki-anchor">¶</a></h2>
<ul>
<li>Replica size is only valid for replicated pools.</li>
<li>The "type" defines, which column is valid.</li>
<li>The minimum number of replicas is missing. Maybe even as an optional column.</li>
<li>Show the pool quota.</li>
</ul>
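<p>The type-dependent visibility rules above could be sketched as a single helper (the column names are illustrative, not the dashboard's actual model):</p>

```python
def visible_columns(pool_type):
    """Which detail columns make sense for a given pool type."""
    common = ['name', 'pg_num', 'quota']
    if pool_type == 'replicated':
        return common + ['size', 'min_size']  # replica size and min replicas
    if pool_type == 'erasure':
        return common + ['erasure_code_profile']
    return common
```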
<a name="Details"></a>
<h2 >Details<a href="#Details" class="wiki-anchor">¶</a></h2>
<ul>
<li>Show Replica size only for replicated pools.</li>
<li>Only show erasure code profile on erasure coded pools.</li>
<li>Add a mouseover or hyperlink for the properties.</li>
</ul>