Ceph : Issues
https://tracker.ceph.com/ (feed generated 2023-03-20T20:41:49Z)
Dashboard - Bug #59111 (New): dashboard should use rgw_dns_name when talking to rgw api
https://tracker.ceph.com/issues/59111 (2023-03-20T20:41:49Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>We have an orchestrated ceph cluster (16.2.11) with 2 radosgw services on 2 separate hosts without HA (i.e. no ingress/haproxy in front). Both of the rgw servers use SSL and have a properly signed certificate. We can access them with standard S3 tools like s3cmd, cyberduck, etc.</p>
<p>The problem seems to be that the Ceph mgr dashboard fails to access the RGW API because it uses the shortname "gw01" instead of the FQDN "gw01.domain.com" when forming the S3 signature. This makes the S3 signature check fail, and we get the following error:</p>
<p>Error connecting to Object Gateway: RGW REST API failed request with status code 403 (b'{"Code":"SignatureDoesNotMatch","RequestId":"tx00000521ceca28974e94b-006408e' b'f93-454bbb4e-default","HostId":"454bbb4e-default-default"}')</p>
<p>It seems that the ceph mgr (which we have restarted several times) uses just the short hostname from the cephadm inventory and I don't see how to tell it to use the FQDN (rgw_dns_name). Neither is it possible to configure the RGW to listen on an alternate non-SSL port on the cluster private network since the service spec for RGW only allows to set the rgw_frontend_port and rgw_frontend_type, but not the full frontend spec (which would allow for multiple listeners).</p>
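<p>For reference, a minimal sketch of pointing the cluster at the FQDN (these are standard config commands; whether the dashboard actually honors rgw_dns_name is exactly what this report questions):</p>
<pre>
# Set the DNS name the RGW answers to (the FQDN from this report):
ceph config set client.rgw rgw_dns_name gw01.domain.com

# Fail over the active mgr so the dashboard re-reads its RGW settings:
ceph mgr fail
</pre>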
<p>So it seems like there are a couple of issues:<br />1. The RGW spec doesn't support enough options to fully control the configuration of an RGW gateway.<br />2. ceph-mgr dashboard should probably use the rgw_dns_name for the RGW instead of defaulting to the short hostname from the inventory, especially when using SSL.</p>

Orchestrator - Feature #57944 (Resolved): add option to allow for setting extra daemon args for c...
https://tracker.ceph.com/issues/57944 (2022-10-28T19:16:52Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>The ceph orchestrator YML specs for service templates has an option for "extra_container_args" which allows the user to add additional arguments to the docker/podman command line.<br />We also need to be able to pass additional arguments to the service inside the container, such as node-exporter. Having a parameter like "extra_daemon_args" that passed the given args to the service itself inside the container would allow more control over the containerized service.</p>
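<p>As a sketch of what such a spec could look like (extra_daemon_args is the field name proposed in this ticket, not an existing one; the node-exporter flag is illustrative):</p>
<pre>
# node-exporter.yml
service_type: node-exporter
placement:
  host_pattern: '*'
extra_container_args:       # exists today: extra args for docker/podman
  - "--cpus=1"
extra_daemon_args:          # proposed here: args for the daemon inside the container
  - "--no-collector.timex"

# apply with: ceph orch apply -i node-exporter.yml
</pre>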
<p>In the case of node-exporter it would enable activating (or deactivating) some of the metrics collected.</p>

Ceph - Bug #57763 (New): monitor DB grows without bound during rebalance
https://tracker.ceph.com/issues/57763 (2022-10-04T19:17:27Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>We have a very large cluster of about 680 OSDs across 18 storage servers. The largest and most active pool is our RGW data pool, which is a 12+4 EC pool (host failure domain).</p>
<p>A few months ago, after upgrading to Pacific (from Mimic), we got into a badly unbalanced situation and have never recovered. Eventually, a large number of our OSDs were well over 95% full and yet continued to grow even though no new data was being written to the cluster. Additionally, even after setting norebalance and nobackfill and disabling the PG autoscaler, the monitor DBs continue to grow until they consume all of the on-disk space (800GB) and have to be manually shut down, compacted offline (on a separate host with enough space), and then brought back online.</p>
<p>We eventually shut down all ceph services on all nodes and tried to manually move PGs using ceph-objectstore-tool from full OSDs to less-full OSDs on the same host (to avoid crossing crush boundaries). After bringing up the cluster with all of the "no" flags enabled and even after setting "osd pause", the monitors again continued growing without bound and we were unable to get them into quorum and OSDs appeared to be once again growing and eating up any leftover space on the device.</p>
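<p>For context, the offline compaction round-trip described above looks roughly like this (a sketch, assuming a rocksdb-backed mon store at the default data path on a systemd host; in our case the store must first be copied to a host with enough free space):</p>
<pre>
systemctl stop ceph-mon@$(hostname -s)
ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db compact
systemctl start ceph-mon@$(hostname -s)
</pre>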
<p>Several issues appear to be bugs, or at least need some clarification:</p>
<ul>
<li>why do OSDs continue to consume more and more space when no data is being written and rebalance and backfill are disabled?</li>
<li>why do the monitor DBs continue to grow in size even when the cluster is quiesced (norebalance/nobackfill set and "osd pause" enabled)?</li>
</ul>
<p>Is there any way to quiet a system in this state long enough to slowly bring it back online and get it back into balance?</p>

rgw - Bug #48122 (Won't Fix): rgw cannot find keyring after config file is minimized
https://tracker.ceph.com/issues/48122 (2020-11-04T20:25:23Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>Given a ceph.conf file like this:</p>
<pre>
[client.radosgw.gateway]
keyring = /etc/ceph/ceph.client.radosgw.keyring
</pre>
<p>When converting to the new minimized configuration file format using the "ceph config generate-minimal-conf" command, the gateway configurations are removed. If the keyring location was one of the values and it pointed to a location other than /var/lib/ceph/radosgw/ceph-radosgw.gwname/keyring, the radosgw service cannot start: it cannot find its keyring, and it cannot ask the monitor for the location since it needs that keyring to authenticate to the monitor in the first place.</p>
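<p>A sketch of the workaround this leaves us with: regenerate the minimal conf, then re-append by hand the local-only options the monitor cannot store:</p>
<pre>
ceph config generate-minimal-conf > /etc/ceph/ceph.conf

# Manually restore the section the conversion silently dropped:
cat >> /etc/ceph/ceph.conf <<'EOF'

[client.radosgw.gateway]
keyring = /etc/ceph/ceph.client.radosgw.keyring
EOF
</pre>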
<p>The generate-minimal-conf command should either NOT remove parameters that cannot be stored in the monitor, or at least warn the user that some parameters could not be converted so they can be added to the new config file as needed.</p>

RADOS - Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
https://tracker.ceph.com/issues/23200 (2018-03-02T16:18:42Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>When requesting JSON-formatted results while querying pool parameters, the list that comes back is not valid JSON. It's just a series of records, not a comma-separated JSON list.</p>
<p>For example:</p>
<pre>
# ceph osd pool get cephfs_data all --format json-pretty
{
"pool": "cephfs_data",
"pool_id": 1,
"size": 2
}
{
"pool": "cephfs_data",
"pool_id": 1,
"min_size": 1
}
{
"pool": "cephfs_data",
"pool_id": 1,
"crash_replay_interval": 0
}
{
"pool": "cephfs_data",
"pool_id": 1,
"pg_num": 256
}
...
</pre>
<p>Note that there are no commas separating the items and the overall list is not enclosed in [].</p>
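<p>A quick shell demonstration (the jq slurp flag is a workaround, not a fix: -s happens to accept a stream of bare JSON values and wraps them in the array this command should have emitted):</p>
<pre>
# Standard parsers reject the concatenated objects ("Extra data"):
ceph osd pool get cephfs_data all --format json | python3 -m json.tool

# jq -s slurps the stream into a single valid JSON array:
ceph osd pool get cephfs_data all --format json | jq -s '.'
</pre>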
<p>This cannot be parsed with standard JSON tools like the Python json module. I would expect that when requesting JSON, it would return valid JSON.</p>

Ceph - Bug #21761 (Can't reproduce): ceph-osd consumes way too much memory during recovery
https://tracker.ceph.com/issues/21761 (2017-10-11T15:01:26Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>The ceph-osd processes have been seen consuming upwards of 20GB of RAM per process when a system is recovering. Even on systems provisioned with 192GB RAM, 8 or 10 OSD processes can consume all of the RAM and make it impossible to bring up all of the OSDs, which further complicates the recovery since the ones that don't have enough RAM to start get marked out, causing more rebalancing.</p>
<p>This is a critical issue for Jewel users (10.2.9) using filestore.</p>
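<p>For reference, the standard Jewel-era knobs for throttling recovery work per OSD (a sketch; these reduce recovery concurrency and therefore memory pressure, but are not a fix for the consumption described here):</p>
<pre>
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
</pre>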
<p>Example of a system with 128GB RAM, 11 OSDs (4TB drives) running a recovery operation (triggered by cephfs snapshot bug <a href="https://tracker.ceph.com/issues/21412">#21412</a>):</p>
<pre>
$ top
Tasks: 460 total, 1 running, 459 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.4 us, 2.0 sy, 0.0 ni, 83.2 id, 10.3 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 13193300+total, 1150304 free, 12910152+used, 1681184 buff/cache
KiB Swap: 999420 total, 2368 free, 997052 used. 483084 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4771 ceph 20 0 18.831g 0.014t 30392 S 25.0 11.3 158:41.12 ceph-osd
35087 ceph 20 0 21.398g 0.017t 20092 S 25.0 13.5 145:30.42 ceph-osd
3929 ceph 20 0 22.269g 0.018t 27516 S 18.8 14.5 105:34.46 ceph-osd
86341 ceph 20 0 23.413g 0.019t 33712 S 12.5 15.4 100:29.06 ceph-osd
108322 ceph 20 0 4398468 2.628g 483264 D 12.5 2.1 0:22.57 ceph-osd
4068 ceph 20 0 22.543g 0.018t 49780 S 6.2 14.4 161:15.71 ceph-osd
11898 ceph 20 0 19.647g 0.014t 34524 S 6.2 11.6 144:21.33 ceph-osd
$ free -h
total used free shared buff/cache available
Mem: 125G 123G 1.1G 88M 1.3G 385M
Swap: 975M 973M 2.3M
</pre>
<p>Starting new OSDs becomes nearly impossible once system memory runs low. The ceph-osd process will start, but then soon crashes and restarts again, endlessly; the crash log looks like this:</p>
<pre>
-10> 2017-10-11 10:18:44.480698 7f6cf18658c0 5 osd.72 pg_epoch: 303792 pg[28.36b(unlocked)] enter Initial
-9> 2017-10-11 10:18:44.480996 7f6cf18658c0 5 osd.72 pg_epoch: 303792 pg[28.36b( v 269495'33 (0'0,269495'33] local-les=303763 n=12 ec=269489 les/c/f 303608/300422/0 303758/303760/284901) [100,72,37] r=1 lpr=0 pi=269489-303759/139 crt=269495'33 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.000298 0 0.000000
-8> 2017-10-11 10:18:44.481010 7f6cf18658c0 5 osd.72 pg_epoch: 303792 pg[28.36b( v 269495'33 (0'0,269495'33] local-les=303763 n=12 ec=269489 les/c/f 303608/300422/0 303758/303760/284901) [100,72,37] r=1 lpr=0 pi=269489-303759/139 crt=269495'33 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
-7> 2017-10-11 10:18:44.645087 7f6cf18658c0 5 osd.72 pg_epoch: 303768 pg[1.4f4(unlocked)] enter Initial
-6> 2017-10-11 10:18:46.762577 7f6cf18658c0 5 osd.72 pg_epoch: 303768 pg[1.4f4( v 279387'245575 (202648'235117,279387'245575] local-les=303763 n=8170 ec=23117 les/c/f 303101/280936/0 303758/303760/302603) [67,72,100] r=1 lpr=0 pi=278053-303759/75 crt=279387'245575 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 2.117488 0 0.000000
-5> 2017-10-11 10:18:46.762616 7f6cf18658c0 5 osd.72 pg_epoch: 303768 pg[1.4f4( v 279387'245575 (202648'235117,279387'245575] local-les=303763 n=8170 ec=23117 les/c/f 303101/280936/0 303758/303760/302603) [67,72,100] r=1 lpr=0 pi=278053-303759/75 crt=279387'245575 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
-4> 2017-10-11 10:18:46.816171 7f6cf18658c0 5 osd.72 pg_epoch: 303791 pg[9.644(unlocked)] enter Initial
-3> 2017-10-11 10:18:46.825798 7f6cf18658c0 5 osd.72 pg_epoch: 303791 pg[9.644( v 269495'38555 (35683'35552,269495'38555] local-les=303763 n=27 ec=1076 les/c/f 300534/300248/0 303758/303759/295662) [62,68,72] r=2 lpr=0 pi=203019-303758/217 crt=269495'38555 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.009627 0 0.000000
-2> 2017-10-11 10:18:46.825815 7f6cf18658c0 5 osd.72 pg_epoch: 303791 pg[9.644( v 269495'38555 (35683'35552,269495'38555] local-les=303763 n=27 ec=1076 les/c/f 300534/300248/0 303758/303759/295662) [62,68,72] r=2 lpr=0 pi=203019-303758/217 crt=269495'38555 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
-1> 2017-10-11 10:18:46.974700 7f6cf18658c0 5 osd.72 pg_epoch: 303798 pg[1.5a4(unlocked)] enter Initial
0> 2017-10-11 10:18:47.517124 7f6cf18658c0 -1 *** Caught signal (Aborted) **
in thread 7f6cf18658c0 thread_name:ceph-osd
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
1: (()+0x984c4e) [0x56400a2d3c4e]
2: (()+0x11390) [0x7f6cf0723390]
3: (gsignal()+0x38) [0x7f6cee6c1428]
4: (abort()+0x16a) [0x7f6cee6c302a]
5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f6cef00384d]
6: (()+0x8d6b6) [0x7f6cef0016b6]
7: (()+0x8d701) [0x7f6cef001701]
8: (()+0x8d919) [0x7f6cef001919]
9: (()+0x1230f) [0x7f6cf13fb30f]
10: (operator new[](unsigned long)+0x4e7) [0x7f6cf141f4b7]
11: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x313) [0x7f6cf0dace63]
12: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x276) [0x7f6cf0db1426]
13: (()+0x421be) [0x7f6cf0db51be]
14: (()+0x42240) [0x7f6cf0db5240]
15: (()+0x4261e) [0x7f6cf0db561e]
16: (()+0x3d835) [0x7f6cf0db0835]
17: (()+0x1fffb) [0x7f6cf0d92ffb]
18: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x8f) [0x56400a18f40f]
19: (DBObjectMap::DBObjectMapIteratorImpl::next(bool)+0x34) [0x56400a142954]
20: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, bool, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xafb) [0x564009f7e97b]
21: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x313) [0x564009dacfb3]
22: (OSD::load_pgs()+0x87a) [0x564009ce796a]
23: (OSD::init()+0x2026) [0x564009cf2c56]
24: (main()+0x2ef1) [0x564009c64391]
25: (__libc_start_main()+0xf0) [0x7f6cee6ac830]
26: (_start()+0x29) [0x564009ca5b99]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>

CephFS - Bug #21412 (Closed): cephfs: too many cephfs snapshots chokes the system
https://tracker.ceph.com/issues/21412 (2017-09-15T21:53:10Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>We have a cluster whose /cephfs/.snap directory has over 4800 entries. Trying to delete older snapshots (some are over 6 months old on a pretty active file system) causes the "rmdir" command to hang, as does any subsequent operation on the .snap directory (such as "ls"). It is also causing the number of blocked requests to grow indefinitely.</p>
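<p>For illustration, the operations that hang (the snapshot name is a placeholder):</p>
<pre>
ls /cephfs/.snap | wc -l             # over 4800 entries
rmdir /cephfs/.snap/SNAPSHOT_NAME    # hangs; later operations on .snap hang too
</pre>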
<p>Ceph 10.2.7<br />Ubuntu 16.04.2<br />Kernel: 4.9.10</p>

Ceph - Bug #17685 (Resolved): ceph osd metadata produced bad json on error
https://tracker.ceph.com/issues/17685 (2016-10-24T17:25:16Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>I think there is still a bug in the "osd metadata" reporting in 10.2.3: the JSON structure returned is not terminated when the OSD has been added but is not running or has not yet been added to the crush map.</p>
<p>It's an odd condition to get into, but when adding a disk, if something causes the add operation to fail to complete (such as the permissions on /var/lib/ceph/osd/osd/XXX being incorrectly set to root:root instead of ceph:ceph), the metadata output does not terminate with the final closing bracket "]".</p>
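<p>A quick validity check (an illustrative one-liner; any strict JSON parser will do):</p>
<pre>
ceph osd metadata | python3 -m json.tool > /dev/null \
    || echo "osd metadata emitted invalid JSON"
</pre>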
<p>Here is the end of the (truncated) output from "ceph osd tree" showing the disk just recently added, but without any weight and marked "down".</p>
<pre>
-2 130.67999 host ic1ss06
0 3.62999 osd.0 up 1.00000 1.00000
6 3.62999 osd.6 up 1.00000 1.00000
7 3.62999 osd.7 up 1.00000 1.00000
13 3.62999 osd.13 up 1.00000 1.00000
21 3.62999 osd.21 up 1.00000 1.00000
27 3.62999 osd.27 up 1.00000 1.00000
33 3.62999 osd.33 up 1.00000 1.00000
39 3.62999 osd.39 up 1.00000 1.00000
46 3.62999 osd.46 up 1.00000 1.00000
48 3.62999 osd.48 up 1.00000 1.00000
55 3.62999 osd.55 up 1.00000 1.00000
60 3.62999 osd.60 up 1.00000 1.00000
66 3.62999 osd.66 up 1.00000 1.00000
72 3.62999 osd.72 up 1.00000 1.00000
75 3.62999 osd.75 up 1.00000 1.00000
81 3.62999 osd.81 up 1.00000 1.00000
88 3.62999 osd.88 up 1.00000 1.00000
97 3.62999 osd.97 up 1.00000 1.00000
99 3.62999 osd.99 up 1.00000 1.00000
102 3.62999 osd.102 up 1.00000 1.00000
110 3.62999 osd.110 up 1.00000 1.00000
120 3.62999 osd.120 up 1.00000 1.00000
127 3.62999 osd.127 up 1.00000 1.00000
129 3.62999 osd.129 up 1.00000 1.00000
136 3.62999 osd.136 up 1.00000 1.00000
140 3.62999 osd.140 up 1.00000 1.00000
147 3.62999 osd.147 up 1.00000 1.00000
155 3.62999 osd.155 up 1.00000 1.00000
165 3.62999 osd.165 up 1.00000 1.00000
166 3.62999 osd.166 up 1.00000 1.00000
174 3.62999 osd.174 up 1.00000 1.00000
184 3.62999 osd.184 up 1.00000 1.00000
190 3.62999 osd.190 up 1.00000 1.00000
194 3.62999 osd.194 up 1.00000 1.00000
202 3.62999 osd.202 up 1.00000 1.00000
209 3.62999 osd.209 up 1.00000 1.00000
173 0 osd.173 down 1.00000 1.00000
</pre>
<p>Now when I run "ceph osd metadata", note that the closing "]" is missing.</p>
<pre>
$ ceph osd metadata
[
...
"osd": {
"id": 213,
"arch": "x86_64",
"back_addr": "10.10.21.54:6861\/168468",
"backend_filestore_dev_node": "unknown",
"backend_filestore_partition_path": "unknown",
"ceph_version": "ceph version 10.2.3
(ecc23778eb545d8dd55e2e4735b53cc93f92e65b)",
"cpu": "Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz",
"distro": "Ubuntu",
"distro_codename": "trusty",
"distro_description": "Ubuntu 14.04.3 LTS",
"distro_version": "14.04",
"filestore_backend": "xfs",
"filestore_f_type": "0x58465342",
"front_addr": "10.10.20.54:6825\/168468",
"hb_back_addr": "10.10.21.54:6871\/168468",
"hb_front_addr": "10.10.20.54:6828\/168468",
"hostname": "ic1ss04",
"kernel_description": "#26~14.04.1-Ubuntu SMP Fri Jul 24
21:16:20 UTC 2015",
"kernel_version": "3.19.0-25-generic",
"mem_swap_kb": "15998972",
"mem_total_kb": "131927464",
"os": "Linux",
"osd_data": "\/var\/lib\/ceph\/osd\/ceph-213",
"osd_journal": "\/var\/lib\/ceph\/osd\/ceph-213\/journal",
"osd_objectstore": "filestore"
},
"osd": {
"id": 214,
"arch": "x86_64",
"back_addr": "10.10.21.55:6877\/177645",
"backend_filestore_dev_node": "unknown",
"backend_filestore_partition_path": "unknown",
"ceph_version": "ceph version 10.2.3
(ecc23778eb545d8dd55e2e4735b53cc93f92e65b)",
"cpu": "Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz",
"distro": "Ubuntu",
"distro_codename": "trusty",
"distro_description": "Ubuntu 14.04.3 LTS",
"distro_version": "14.04",
"filestore_backend": "xfs",
"filestore_f_type": "0x58465342",
"front_addr": "10.10.20.55:6844\/177645",
"hb_back_addr": "10.10.21.55:6879\/177645",
"hb_front_addr": "10.10.20.55:6848\/177645",
"hostname": "ic1ss05",
"kernel_description": "#26~14.04.1-Ubuntu SMP Fri Jul 24
21:16:20 UTC 2015",
"kernel_version": "3.19.0-25-generic",
"mem_swap_kb": "15998972",
"mem_total_kb": "131927464",
"os": "Linux",
"osd_data": "\/var\/lib\/ceph\/osd\/ceph-214",
"osd_journal": "\/var\/lib\/ceph\/osd\/ceph-214\/journal",
"osd_objectstore": "filestore"
}
}
^^^^
Missing closing "]"
</pre>

Ceph - Bug #13214 (Resolved): ceph upstart script rbdmap.conf incorrectly processes parameters
https://tracker.ceph.com/issues/13214 (2015-09-23T19:28:09Z) Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>
<p>The upstart script for mapping rbd devices incorrectly creates the "CMDPARAMS" value such that it becomes cumulative for every device being mapped.</p>
<pre>
while read DEV PARAMS; do
    case "$DEV" in
      ""|\#*)
        continue
        ;;
      */*)
        ;;
      *)
        DEV=rbd/$DEV
        ;;
    esac
    for PARAM in $(echo $PARAMS | tr ',' '\n'); do
        CMDPARAMS="$CMDPARAMS --$(echo $PARAM | tr '=' ' ')"
    done
    if [ ! -b /dev/rbd/$DEV ]; then
        echo "rbd map $DEV"
        rbd map $DEV $CMDPARAMS
    fi
done < $RBDMAPFILE
</pre>
<p>See that "$CMDPARAMS" is constantly growing with each line that gets processed. It needs to be unique for each line, and not accumulate the options associated with previous lines.</p>
<p>Fix: change</p>
<pre>
    for PARAM in $(echo $PARAMS | tr ',' '\n'); do
        CMDPARAMS="$CMDPARAMS --$(echo $PARAM | tr '=' ' ')"
    done
</pre>
<p>to (resetting CMDPARAMS for each device line, so options from previous lines are discarded while multiple options on one line still accumulate):</p>
<pre>
    CMDPARAMS=""
    for PARAM in $(echo $PARAMS | tr ',' '\n'); do
        CMDPARAMS="$CMDPARAMS --$(echo $PARAM | tr '=' ' ')"
    done
</pre>
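<p>For illustration, with the reset in place each line of /etc/ceph/rbdmap maps with only its own options (device names and parameters here are hypothetical):</p>
<pre>
# /etc/ceph/rbdmap
rbd/vol1    id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
rbd/vol2    id=backup

# resulting commands:
#   rbd map rbd/vol1 --id admin --keyring /etc/ceph/ceph.client.admin.keyring
#   rbd map rbd/vol2 --id backup
</pre>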