Ceph: Issues (https://tracker.ceph.com/, updated 2022-01-28T15:55:57Z)
Orchestrator - Documentation #54056 (New): https://docs.ceph.com/en/latest/cephadm/install/#deplo... (https://tracker.ceph.com/issues/54056, 2022-01-28T15:55:57Z, Sebastian Wagner)
<p><a class="external" href="https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment">https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment</a></p>
<ul>
<li>e.g. the monitoring stack and ingress are completely missing</li>
<li>link to the registry-login functionality</li>
</ul>
Orchestrator - Cleanup #54042 (New): move pybind/ceph_argparse.py and pybind/ceph_daemon.py into ... (https://tracker.ceph.com/issues/54042, 2022-01-27T15:28:18Z, Sebastian Wagner)
<p>Why? Because there is no reason we need those as extra RPM packages. We can simply remove those RPM packages and make use of python-common instead.</p>
<p>This is basically just a <strong>git mv src/pybind/ceph_argparse.py src/python_common/ceph/argparse.py</strong>, plus some import fixes, plus some ceph.spec cleanup.</p>
<p>The benefit is clear: as those modules are huge, we'd finally have the ability to properly refactor them.</p>
Orchestrator - Tasks #54011 (New): Improve osdspec previews (https://tracker.ceph.com/issues/54011, 2022-01-25T11:32:04Z, Sebastian Wagner)
<p>There are three major things that need to be improved for osd spec previews:</p>
<ul>
<li>Can we make it so that we don't actually need to SSH to remote hosts, and instead rely solely on the data cephadm already has available? This is needed in order to make use of it from the dashboard</li>
<li>Can we get some decent test coverage? I don't know if this is still working properly</li>
<li>Can we prettify the output it generates?</li>
</ul>
Orchestrator - Cleanup #54002 (New): orchestrator interface: service type should be a Python Enum (https://tracker.ceph.com/issues/54002, 2022-01-24T16:36:01Z, Sebastian Wagner)
<pre><code class="python syntaxhl"><span class="CodeRay"><span class="keyword">class</span> <span class="class">ServiceType</span>(enum.Enum):
mon = <span class="string"><span class="delimiter">'</span><span class="content">mon</span><span class="delimiter">'</span></span>
mgr = <span class="string"><span class="delimiter">'</span><span class="content">mgr</span><span class="delimiter">'</span></span>
rbd_mirror = <span class="string"><span class="delimiter">'</span><span class="content">rbd-mirror</span><span class="delimiter">'</span></span>
cephfs_mirror = <span class="string"><span class="delimiter">'</span><span class="content">cephfs-mirror</span><span class="delimiter">'</span></span>
crash = <span class="string"><span class="delimiter">'</span><span class="content">crash</span><span class="delimiter">'</span></span>
alertmanager = <span class="string"><span class="delimiter">'</span><span class="content">alertmanager</span><span class="delimiter">'</span></span>
grafana = <span class="string"><span class="delimiter">'</span><span class="content">grafana</span><span class="delimiter">'</span></span>
node_exporter = <span class="string"><span class="delimiter">'</span><span class="content">node-exporter</span><span class="delimiter">'</span></span>
prometheus = <span class="string"><span class="delimiter">'</span><span class="content">prometheus</span><span class="delimiter">'</span></span>
mds = <span class="string"><span class="delimiter">'</span><span class="content">mds</span><span class="delimiter">'</span></span>
rgw = <span class="string"><span class="delimiter">'</span><span class="content">rgw</span><span class="delimiter">'</span></span>
nfs = <span class="string"><span class="delimiter">'</span><span class="content">nfs</span><span class="delimiter">'</span></span>
iscsi = <span class="string"><span class="delimiter">'</span><span class="content">iscsi</span><span class="delimiter">'</span></span>
snmp_gateway = <span class="string"><span class="delimiter">'</span><span class="content">snmp-gateway</span><span class="delimiter">'</span></span>
</span></code></pre>
<p>This should be used throughout the whole cephadm code base; it makes things a bit clearer.</p>
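<p>A quick sketch of what this buys (trimmed to a few members, purely illustrative): the Enum validates strings coming in from specs up front and turns typos into immediate errors instead of silently mismatching strings:</p>

```python
import enum

class ServiceType(enum.Enum):
    mon = 'mon'
    rgw = 'rgw'
    node_exporter = 'node-exporter'

# parsing a spec's service_type validates it immediately:
st = ServiceType('node-exporter')
assert st is ServiceType.node_exporter

# an unknown type fails right here instead of propagating a bad string:
try:
    ServiceType('nod-exporter')
except ValueError:
    print('rejected')
```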
<p>Is it worth it? I don't know.</p>
Orchestrator - Cleanup #54001 (New): type safe Python mon_command API clients (https://tracker.ceph.com/issues/54001, 2022-01-24T16:13:41Z, Sebastian Wagner)
<p>For the last six years, I have always wondered why we don't have a type-safe way to use the mon command API. It turns out I never had the time to actually work on it. But my idea looks more or less like this:</p>
<p><a class="external" href="https://pad.ceph.com/p/mon_command_python_api">https://pad.ceph.com/p/mon_command_python_api</a></p>
<p>What do you think? Is there any use for something like this?</p>
<pre><code class="python syntaxhl"><span class="CodeRay"><span class="comment"># mon_command_api.py</span>
<span class="keyword">class</span> <span class="class">CommandResult</span>(NamedTuple):
ret: <span class="predefined">int</span>
stdout: <span class="predefined">str</span>
stderr: <span class="predefined">str</span>
<span class="keyword">def</span> <span class="function">Call</span>(NamedTuple):
api: MonCommandApi
prefix: <span class="predefined">str</span>
<span class="keyword">def</span> <span class="function">__call__</span>(<span class="predefined-constant">self</span>, **kwargs):
kwargs[<span class="string"><span class="delimiter">'</span><span class="content">prefix</span><span class="delimiter">'</span></span>] = prefix
<span class="keyword">return</span> CommandResult(<span class="predefined-constant">self</span>.api._mon_command(kwargs))
<span class="keyword">class</span> <span class="class">MonCommandApi</span>:
<span class="keyword">def</span> <span class="function">__init__</span>(<span class="predefined-constant">self</span>, rados):
<span class="predefined-constant">self</span>.rados = rados
<span class="keyword">def</span> <span class="function">_mon_command</span>(<span class="predefined-constant">self</span>, <span class="predefined">dict</span>):
<span class="keyword">return</span> CommandResult(<span class="predefined-constant">self</span>.rados.mon_command(<span class="predefined">dict</span>))
<span class="keyword">def</span> <span class="function">__get_attr__</span>(<span class="predefined-constant">self</span>, aname):
<span class="keyword">return</span> Call(<span class="predefined-constant">self</span>, aname.replace(<span class="string"><span class="delimiter">'</span><span class="content">_</span><span class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content"> </span><span class="delimiter">'</span></span>))
<span class="comment"># mon_command_api.pyi. Autogenerated by a script, exactly like https://docs.ceph.com/en/latest/api/mon_command_api/</span>
<span class="keyword">class</span> <span class="class">MonCommandApi</span>:
<span class="keyword">def</span> <span class="function">__init__</span>(<span class="predefined-constant">self</span>, rados): <span class="predefined-constant">None</span>:
<span class="predefined-constant">self</span>.rados = rados
<span class="keyword">def</span> <span class="function">_mon_command</span>(<span class="predefined-constant">self</span>, <span class="predefined">dict</span>) -> CommandResult:
...
<span class="keyword">def</span> <span class="function">orch_rm_daemon</span>(name: <span class="predefined">str</span>) -> CommandResult
...
<span class="keyword">def</span> <span class="function">osd_create_pool</span>(pool: <span class="predefined">str</span>, pg_num: <span class="predefined">int</span>, size: <span class="predefined">int</span>)
...
<span class="comment"># example use:</span>
<span class="keyword">def</span> <span class="function">orch_rm_daemon</span>(daemon_name):
api = MonCommandApi(...)
<span class="keyword">return</span> api.orch_rm_daemon(name=daemon_name)
<span class="keyword">def</span> <span class="function">other_function</span>(daemon_name):
api = MonCommandApi(...)
<span class="keyword">return</span> api.osd_create_pool(pool=<span class="string"><span class="delimiter">'</span><span class="content">my_pool</span><span class="delimiter">'</span></span>, pg_num=<span class="integer">42</span>, size=<span class="integer">3</span>)
</span></code></pre> Orchestrator - Cleanup #54000 (New): cephadm: upgrade commands should return yamlhttps://tracker.ceph.com/issues/540002022-01-24T15:39:53ZSebastian Wagner
<p>Right now, commands like</p>
<ul>
<li>ceph orch upgrade ls</li>
<li>ceph orch upgrade status</li>
</ul>
<p>are returning JSON. YAML is much more readable; let's return YAML instead!</p>
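<p>For illustration, a sketch assuming PyYAML is available (it is a ceph-mgr dependency); the status dict and its values here are made up, but the same structure is much easier to scan as YAML:</p>

```python
import json
import yaml  # PyYAML

# illustrative upgrade-status payload, not cephadm's actual schema
status = {
    'target_image': 'quay.io/ceph/ceph:v16.2.7',
    'in_progress': True,
    'services_complete': ['mgr', 'mon'],
    'message': 'Upgrade in progress',
}

print(json.dumps(status))      # one dense line
print(yaml.safe_dump(status))  # one key per line, human-readable
```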
<p><strong>Note</strong>: this then needs a <strong>--format=...</strong> argument.</p>
Orchestrator - Cleanup #53999 (New): orch interface: cephadm contains a lot of special apply methods (https://tracker.ceph.com/issues/53999, 2022-01-24T15:28:49Z, Sebastian Wagner)
<pre><code class="python syntaxhl"><span class="CodeRay"> <span class="decorator">@handle_orch_error</span>
<span class="keyword">def</span> <span class="function">apply_rgw</span>(<span class="predefined-constant">self</span>, spec: ServiceSpec) -> <span class="predefined">str</span>:
<span class="keyword">return</span> <span class="predefined-constant">self</span>._apply(spec)
</span></code></pre>
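<p>The per-type wrappers above could collapse into one generic entry point that dispatches on the spec's service type; a minimal sketch (class and method names are illustrative, not cephadm's actual API):</p>

```python
class ServiceSpec:
    def __init__(self, service_type: str) -> None:
        self.service_type = service_type

class Orchestrator:
    # illustrative subset of service types
    KNOWN_TYPES = frozenset({'mon', 'mgr', 'mds', 'rgw', 'nfs', 'iscsi'})

    def apply(self, spec: ServiceSpec) -> str:
        # one apply() instead of apply_rgw(), apply_mds(), apply_nfs(), ...
        if spec.service_type not in self.KNOWN_TYPES:
            raise ValueError(f'unknown service type: {spec.service_type!r}')
        return self._apply(spec)

    def _apply(self, spec: ServiceSpec) -> str:
        return f'Scheduled {spec.service_type} update...'

print(Orchestrator().apply(ServiceSpec('rgw')))
```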
<p>Let's remove them! It's getting out of hand by now. Just use <strong>apply</strong> for everything.</p>
Orchestrator - Cleanup #53998 (New): Drop support for old-style service spec (https://tracker.ceph.com/issues/53998, 2022-01-24T15:25:56Z, Sebastian Wagner)
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">service_type</span>: <span class="string"><span class="content">nfs</span></span>
<span class="key">service_id</span>: <span class="string"><span class="content">foo</span></span>
<span class="key">pool</span>: <span class="string"><span class="content">mypool</span></span>
<span class="key">namespace</span>: <span class="string"><span class="content">myns</span></span>
</span></code></pre>
<p>I.e. without any indentation. This is irritating for more complicated specs, and it is already deprecated and no longer documented. Let's drop it accordingly!</p>
Orchestrator - Feature #53967 (New): support minor downgrades (https://tracker.ceph.com/issues/53967, 2022-01-21T16:47:54Z, Sebastian Wagner)
<p>There is a general demand to support minor downgrades (which need to be properly supported by all components!). For that, we have to solve things like:</p>
<pre>
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.359+0000 7f1657d84700 0 log_channel(cephadm) log [DBG] : mgr option log_to_file = False
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.359+0000 7f1657d84700 0 log_channel(cephadm) log [DBG] : mgr option log_to_cluster_level = debug
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.374+0000 7f1657d84700 -1 mgr load Failed to construct class in 'cephadm'
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.374+0000 7f1657d84700 -1 mgr load Traceback (most recent call last):
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/module.py", line 405, in __init__
Jan 21 15:09:08 standalone.localdomain conmon[286544]: self.upgrade = CephadmUpgrade(self)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 76, in __init__
Jan 21 15:09:08 standalone.localdomain conmon[286544]: self.upgrade_state: Optional[UpgradeState] = UpgradeState.from_json(json.loads(t))
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 58, in from_json
Jan 21 15:09:08 standalone.localdomain conmon[286544]: return cls(**c)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: TypeError: __init__() got an unexpected keyword argument 'fs_original_allow_standby_replay'
Jan 21 15:09:08 standalone.localdomain conmon[286544]:
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.379+0000 7f1657d84700 -1 mgr operator() Failed to run module in active mode ('cephadm')
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.380+0000 7f1657d84700 0 [crash DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.380+0000 7f1657d84700 1 mgr load Constructed class from module: crash
</pre>
Orchestrator - Bug #53965 (New): cephadm: RGW container is crashing at 'rados_nobjects_list_next2... (https://tracker.ceph.com/issues/53965, 2022-01-21T15:53:15Z, Sebastian Wagner)
<p>when upgrading from ceph-ansible, we're getting:</p>
<pre>
Jan 12 18:30:14 host conmon[12897]: debug 2022-01-12T17:30:14.112+0000 0 starting handler: beast
Jan 12 18:30:14 host conmon[12897]: ceph version 16.2.0-146.x (sha) pacific (stable)
Jan 12 18:30:14 host conmon[12897]: 1: /lib64/libpthread.so.0(+0x12c20) [0x7fb4b1e86c20]
Jan 12 18:30:14 host conmon[12897]: 2: gsignal()
Jan 12 18:30:14 host conmon[12897]: 3: abort()
Jan 12 18:30:14 host conmon[12897]: 4: /lib64/libstdc++.so.6(+0x9009b) [0x7fb4b0e7a09b]
Jan 12 18:30:14 host conmon[12897]: 5: /lib64/libstdc++.so.6(+0x9653c) [0x7fb4b0e8053c]
Jan 12 18:30:14 host conmon[12897]: 6: /lib64/libstdc++.so.6(+0x96597) [0x7fb4b0e80597]
Jan 12 18:30:14 host conmon[12897]: 7: /lib64/libstdc++.so.6(+0x967f8) [0x7fb4b0e807f8]
Jan 12 18:30:14 host conmon[12897]: 8: /lib64/librados.so.2(+0x3a4c0) [0x7fb4bc3f74c0]
Jan 12 18:30:14 host conmon[12897]: 9: /lib64/librados.so.2(+0x809d2) [0x7fb4bc43d9d2]
Jan 12 18:30:14 host conmon[12897]: 10: (librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5c) [0x7fb4bc43da5c]
Jan 12 18:30:14 host conmon[12897]: 11: (RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7fb4bd26d8e5]
Jan 12 18:30:14 host conmon[12897]: 12: (RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x24b) [0x7fb4bcd9b0bb]
Jan 12 18:30:14 host conmon[12897]: 13: (RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1e0) [0x7fb4bd2609d0]
Jan 12 18:30:14 host conmon[12897]: 14: (RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x40) [0x7fb4bcead940]
Jan 12 18:30:14 host conmon[12897]: 15: (RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x73) [0x7fb4bceaff13]
Jan 12 18:30:14 host conmon[12897]: 16: (RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x3e) [0x7fb4bceaffce]
Jan 12 18:30:14 host conmon[12897]: 17: (RGWUserStatsCache::sync_all_users(optional_yield)+0x71) [0x7fb4bd052361]
Jan 12 18:30:14 host conmon[12897]: 18: (RGWUserStatsCache::UserSyncThread::entry()+0x10f) [0x7fb4bd05984f]
Jan 12 18:30:14 host conmon[12897]: 19: /lib64/libpthread.so.0(+0x817a) [0x7fb4b1e7c17a]
Jan 12 18:30:14 host conmon[12897]: 20: clone()
</pre>
<p>One possibility is that the pools might not have the rgw tag enabled.</p>
<p><strong>possible workaround</strong></p>
<p>add the rgw tag to all the rgw pools.</p>
<p><strong>known workaround</strong></p>
<p>change the cephx caps of all affected daemons from</p>
<pre>
mon allow *
mgr allow rw
osd allow rwx tag rgw *=*
</pre>
<p>to</p>
<pre>
mon allow *
mgr allow rw
osd allow rwx
</pre>
Orchestrator - Bug #53652 (Closed): cephadm "Verifying IP <ip> port 3300" ... -> "OSError: [Errno... (https://tracker.ceph.com/issues/53652, 2021-12-17T11:21:27Z, Sebastian Wagner)
<p>We have to get better at returning useful error messages:</p>
<pre>
ubuntu@admin:~$ sudo cephadm --docker --image "admin:5000/ceph/ceph:v16" bootstrap --skip-monitoring-stack --mon-ip 192.168.122.51
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chrony.service is enabled and running
Repeating the final host check...
systemctl is present
lvcreate is present
Unit chrony.service is enabled and running
Host looks OK
Cluster fsid: fsid
Verifying IP <ip> port 3300 ...
Traceback (most recent call last):
File "/usr/sbin/cephadm", line 6242, in <module>
r = args.func()
File "/usr/sbin/cephadm", line 1451, in _default_image
return func()
File "/usr/sbin/cephadm", line 2923, in command_bootstrap
check_ip_port(args.mon_ip, 3300)
File "/usr/sbin/cephadm", line 771, in check_ip_port
attempt_bind(s, ip, port)
File "/usr/sbin/cephadm", line 731, in attempt_bind
raise e
File "/usr/sbin/cephadm", line 724, in attempt_bind
s.bind((address, port))
OSError: [Errno 99] Cannot assign requested address
</pre>
<ol>
<li>This is <strong>not</strong> helpful and users need better guidance, e.g. a better error message with more details</li>
<li>Why port 3300?</li>
<li>Also, we shouldn't print a Traceback here.</li>
</ol>
Orchestrator - Feature #53562 (New): cephadm doesn't support osd crush_location_hook (https://tracker.ceph.com/issues/53562, 2021-12-09T11:59:53Z, Sebastian Wagner)
<p>crush_location_hook is a path to an executable that is executed in order to update the current OSD's crush location. Executed like so:</p>
<pre>
$crush_location_hook --cluster {cluster-name} --id {ID} --type {daemon-type}
</pre>
<p>and prints out the current crush locations.</p>
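<p>For reference, such a hook only has to print a crush location (space-separated key=value pairs) to stdout; a minimal sketch, where the rack value is hardcoded and purely illustrative:</p>

```python
#!/usr/bin/env python3
# invoked by the OSD as:
#   $crush_location_hook --cluster {cluster-name} --id {ID} --type {daemon-type}
import argparse
import socket

def crush_location(hostname: str, rack: str) -> str:
    # a crush location is a space-separated list of key=value pairs
    return f'host={hostname} rack={rack} root=default'

def main() -> None:
    p = argparse.ArgumentParser()
    p.add_argument('--cluster')
    p.add_argument('--id')
    p.add_argument('--type')
    p.parse_known_args()
    # site-specific: derive the rack from the hostname, a config file, etc.
    print(crush_location(socket.gethostname(), 'rack1'))

if __name__ == '__main__':
    main()
```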
<p>Workarounds:</p>
<ul>
<li>For a per-host based location, we have: <a class="external" href="https://docs.ceph.com/en/latest/cephadm/host-management/#setting-the-initial-crush-location-of-host">https://docs.ceph.com/en/latest/cephadm/host-management/#setting-the-initial-crush-location-of-host</a> which should cover a lot of use cases.</li>
<li>Build a new container image locally and add the crush_location_hook executable to it. Then set the <strong>crush_location_hook</strong> config option to the file path within the container</li>
</ul>
Orchestrator - Feature #53539 (New): ceph orch exposes no way to see what’s queued or remove work... (https://tracker.ceph.com/issues/53539, 2021-12-08T13:50:27Z, Sebastian Wagner)
<p>ceph orch exposes no way to see what’s queued or remove work from the queue</p>
<p>We should probably expose the queue of scheduled ops in <strong>ceph orch status</strong>.</p>
Orchestrator - Tasks #53533 (New): cephadm: performance: Make ceph.conf change reconfiguration Pa... (https://tracker.ceph.com/issues/53533, 2021-12-08T12:11:17Z, Sebastian Wagner)
<p>Reconfiguring daemons (e.g. after a ceph.conf change, or when a new mon is added) is serialized and slow, and it happens before anything else (like deploying new services).</p>
Orchestrator - Documentation #53532 (New): cephadm document Prometheus Sizing observations (https://tracker.ceph.com/issues/53532, 2021-12-08T11:50:52Z, Sebastian Wagner)
Prometheus Sizing observations
<ul>
<li>22/11: 48 OSDs growing to 600+: 124 GB disk space, peak of 3000 IOPS, 150 MB/s, 2 cores (peak 6), 43 GB RAM</li>
<li>with 4147 OSDs, the storage requirement is ~230 GB, 2 cores (peak 8)</li>
</ul>
<p>Should be added to <a class="external" href="https://docs.ceph.com/en/latest/cephadm/services/monitoring/">https://docs.ceph.com/en/latest/cephadm/services/monitoring/</a></p>
<p>Paul:</p>
<ul>
<li>What does <strong>22/11</strong> mean?</li>
</ul>