Ceph : Issues
https://tracker.ceph.com/ (feed generated 2017-12-15T08:16:01Z)

RADOS - Backport #22450 (Resolved): luminous: Visibility for snap trim queue length
https://tracker.ceph.com/issues/22450 | 2017-12-15T08:16:01Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

https://github.com/ceph/ceph/pull/20098

RADOS - Backport #22449 (Resolved): jewel: Visibility for snap trim queue length
https://tracker.ceph.com/issues/22449 | 2017-12-15T08:13:47Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

https://github.com/ceph/ceph/pull/21200

RADOS - Feature #22448 (Resolved): Visibility for snap trim queue length
https://tracker.ceph.com/issues/22448 | 2017-12-15T08:11:49Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

We observed an unexplained, constant increase in disk space usage on a few of our production clusters. At first we thought customers were abusing them, but that wasn't it. Then we thought the images were constantly being filled with data, but the space usage reported by Ceph wasn't consistent with the filesystems. After further digging, we realized that the snap trim queues of some PGs were in 250k-element territory... We increased the snap trimmer frequency and the number of parallel snap trim ops, and disk space usage finally started to drop.

Ceph needs a feature to efficiently and conveniently expose snap trim queue lengths so they can be fed into monitoring, and a feature to warn cluster admins when snap trim queues are long enough to require attention.
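
Once exposed, the queue length can be wired straight into monitoring. A minimal sketch, assuming the feature adds a per-PG "snaptrimq_len" field to the output of ceph pg dump --format json (the field name is taken from the linked PR; the surrounding JSON layout varies between releases, so adjust the paths):

    # Sum and report snap trim queue lengths across all PGs.
    import json
    import subprocess

    def snaptrimq_lengths():
        raw = subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])
        # On some releases the PG stats are nested under a "pg_map" key.
        stats = json.loads(raw)["pg_stats"]
        return {pg["pgid"]: pg.get("snaptrimq_len", 0) for pg in stats}

    lengths = snaptrimq_lengths()
    print("total queued snap trims:", sum(lengths.values()))
    # Show the five PGs with the longest queues.
    for pgid, qlen in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True)[:5]:
        print(" ", pgid, qlen)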

https://github.com/ceph/ceph/pull/19520

Ceph - Feature #21862 (Resolved): ceph-conf: dump parsed config in plain text or as json
https://tracker.ceph.com/issues/21862 | 2017-10-20T08:30:33Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

https://github.com/ceph/ceph/pull/18350

ceph-conf lacks a way to dump the entire parsed configuration. This PR implements that.

RADOS - Bug #21354 (Closed): Possible bug in interval_set.intersect_of()
https://tracker.ceph.com/issues/21354 | 2017-09-11T15:44:26Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

I've been working on a different kind of optimization of pg_pool_t::build_removed_snaps (one that gets rid of the intersected interval set, saving time and memory, since it isn't actually needed by PGPool::update) when I realized that I was getting different results than expected. By backtracking, I managed to get the expected results again by rolling back https://github.com/ceph/ceph/pull/17088/commits/825470fcf9190c94422716454dbf11b24350a748 (actually, just commenting out the call to intersection_size_asym() is enough to make it produce correct results). I suspect that change is buggy, but I'm not certain of it.

My test program and both the expected and the invalid output are here: http://ceph.predictor.org.pl/ceph_intersect_bug.txt

Can someone look into this and either point me to the actual error in my code, or confirm my observations?
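
For context, the expected behaviour is plain set intersection over sorted, disjoint intervals. A minimal Python reference of those semantics (illustrative only, not Ceph's implementation; intervals are (start, length) pairs, as in interval_set), handy for cross-checking outputs like those in the linked test program:

    def intersect(a, b):
        """Intersect two interval sets given as sorted, disjoint (start, length) lists."""
        out = []
        i = j = 0
        while i < len(a) and j < len(b):
            lo = max(a[i][0], b[j][0])
            hi = min(a[i][0] + a[i][1], b[j][0] + b[j][1])
            if lo < hi:
                out.append((lo, hi - lo))
            # Advance whichever interval ends first.
            if a[i][0] + a[i][1] <= b[j][0] + b[j][1]:
                i += 1
            else:
                j += 1
        return out

    assert intersect([(0, 10)], [(5, 10)]) == [(5, 5)]
    assert intersect([(0, 2), (4, 2)], [(1, 4)]) == [(1, 1), (4, 1)]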

RADOS - Bug #19512 (Won't Fix): Sparse file info in filestore not propagated to other OSDs
https://tracker.ceph.com/issues/19512 | 2017-04-06T10:21:14Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

We recently hit an interesting issue with RBD images and filestore on Jewel 10.2.5. We have a pool of RBD images, all of them mostly untouched (large areas of the images unused). Once we added 3 new OSDs to the cluster, the objects representing these images grew substantially on the new OSDs: objects backing unused areas of the images stayed small on the original OSDs (~8K of space actually used, 4M allocated), but were large on the new OSDs (4M allocated and actually used). After investigating, we concluded that Ceph didn't propagate sparse-file information during the cluster rebalance; the data contents are correct on all OSDs, but the files are no longer sparse on the new OSDs, hence the increased disk space usage there.

Example on a test cluster, before growing it by one OSD:

ls:

    osd-01-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-01-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
    osd-02-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-02-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
    osd-03-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-03-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0

du:

    osd-01-cluster: 12 /var/lib/ceph/osd-01-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
    osd-02-cluster: 12 /var/lib/ceph/osd-02-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
    osd-03-cluster: 12 /var/lib/ceph/osd-03-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0

    mon-01-cluster:~ # rbd diff test
    Offset   Length   Type
    8388608  4194304  data
    16777216 4096     data
    33554432 4194304  data
    37748736 2048     data

And after growing it:

ls:

    clush> find /var/lib/ceph/osd-*/current/0.*head/ -type f -name '*data*' -exec ls -l {} \+
    osd-02-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-02-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
    osd-03-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-03-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
    osd-04-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:25 /var/lib/ceph/osd-04-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0

du:

    clush> find /var/lib/ceph/osd-*/current/0.*head/ -type f -name '*data*' -exec du -k {} \+
    osd-02-cluster: 12   /var/lib/ceph/osd-02-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
    osd-03-cluster: 12   /var/lib/ceph/osd-03-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
    osd-04-cluster: 4100 /var/lib/ceph/osd-04-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0

Note that "rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0" grew from 12 KB to 4100 KB when copied from the other OSDs to osd-04.
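
A quick way to check for this symptom is to compare each object file's apparent size with its allocated size. A minimal diagnostic sketch using standard stat fields (nothing Ceph-specific; st_blocks is in 512-byte units on Linux):

    # Report apparent vs. allocated size for each file given on the command line.
    import os
    import sys

    for path in sys.argv[1:]:
        st = os.stat(path)
        allocated = st.st_blocks * 512
        state = "sparse" if allocated < st.st_size else "fully allocated"
        print("%s: %d bytes apparent, %d allocated (%s)" % (path, st.st_size, allocated, state))

Run against the object files above, only osd-04's copy would report "fully allocated".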

Ceph - Backport #18793 (Resolved): kraken: Client message throttles are not changeable without restart
https://tracker.ceph.com/issues/18793 | 2017-02-02T08:31:05Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

https://github.com/ceph/ceph/pull/13216

Ceph - Backport #18792 (Resolved): jewel: Client message throttles are not changeable without restart
https://tracker.ceph.com/issues/18792 | 2017-02-02T08:30:01Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

https://github.com/ceph/ceph/pull/13214

Ceph - Bug #18791 (Resolved): Client message throttles are not changeable without restart
https://tracker.ceph.com/issues/18791 | 2017-02-02T08:28:16Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

In cases where a throttler limits cluster performance, it would be desirable to be able to change its max values, even if only temporarily. Currently this is impossible without restarting the OSDs, which in many cases is simply unacceptable.

https://github.com/ceph/ceph/pull/13213 fixes this.
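
With the fix in place, a throttle cap can be raised on a live cluster. A minimal sketch driving ceph tell ... injectargs from Python (osd_client_message_size_cap is a standard OSD option; the point of the fix is that the injected value actually takes effect without a restart):

    # Raise a client message throttle cap on running OSDs, no restart needed.
    import subprocess

    def inject_osd_option(osd_spec, option, value):
        """Inject a config option into running OSD(s), e.g. osd_spec="osd.*"."""
        subprocess.check_call([
            "ceph", "tell", osd_spec, "injectargs",
            "--%s=%s" % (option, value),
        ])

    # Temporarily raise the client message size cap to 1 GiB on all OSDs.
    inject_osd_option("osd.*", "osd_client_message_size_cap", str(1 << 30))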

Ceph - Backport #18582 (Resolved): Issue with upgrade from 0.94.9 to 10.2.5
https://tracker.ceph.com/issues/18582 | 2017-01-18T11:25:02Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

https://github.com/ceph/ceph/pull/13131

teuthology - Bug #14335 (Won't Fix): openstack: 'NoneType' object has no attribute '__getitem__'
https://tracker.ceph.com/issues/14335 | 2016-01-11T12:50:31Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

Some jobs fail with:

    2016-01-10T13:26:07.782 DEBUG:teuthology.misc::sh: neutron subnet-list -f json -c id -c ip_version
    2016-01-10T13:26:08.525 DEBUG:teuthology.misc:[
    2016-01-10T13:26:08.525 DEBUG:teuthology.misc:{
    2016-01-10T13:26:08.526 DEBUG:teuthology.misc:"ip_version": 4,
    2016-01-10T13:26:08.526 DEBUG:teuthology.misc:"id": "a92ff450-0b85-4f46-b98a-33d3cfe0dbe3"
    2016-01-10T13:26:08.526 DEBUG:teuthology.misc:}
    2016-01-10T13:26:08.559 DEBUG:teuthology.misc:]
    2016-01-10T13:26:08.559 DEBUG:teuthology.misc::sh: neutron port-list -f json -c fixed_ips -c device_id
    2016-01-10T13:26:33.841 DEBUG:teuthology.misc:[
    2016-01-10T13:26:33.843 DEBUG:teuthology.misc:{
    2016-01-10T13:26:33.843 DEBUG:teuthology.misc:"fixed_ips": "{\"subnet_id\": \"a92ff450-0b85-4f46-b98a-33d3cfe0dbe3\", \"ip_address\": \"149.202.167.162\"}",
    2016-01-10T13:26:33.844 DEBUG:teuthology.misc:"device_id": "1ff96af2-4a33-4d21-9aab-4cc580dc9037"
    2016-01-10T13:26:33.844 DEBUG:teuthology.misc:},
    2016-01-10T13:26:33.844 DEBUG:teuthology.misc:{
    2016-01-10T13:26:33.844 DEBUG:teuthology.misc:"fixed_ips": "{\"subnet_id\": \"a92ff450-0b85-4f46-b98a-33d3cfe0dbe3\", \"ip_address\": \"149.202.168.132\"}",
    2016-01-10T13:26:33.844 DEBUG:teuthology.misc:"device_id": "63b811b5-69c2-4d2b-a9f6-bfff5b81fa49"
    2016-01-10T13:26:33.845 DEBUG:teuthology.misc:},
    2016-01-10T13:26:33.845 DEBUG:teuthology.misc:{
    2016-01-10T13:26:33.845 DEBUG:teuthology.misc:"fixed_ips": "{\"subnet_id\": \"a92ff450-0b85-4f46-b98a-33d3cfe0dbe3\", \"ip_address\": \"149.202.169.159\"}",
    2016-01-10T13:26:33.845 DEBUG:teuthology.misc:"device_id": "eec60001-87c9-4f80-a542-7967d5e288b0"
    2016-01-10T13:26:33.845 DEBUG:teuthology.misc:},
    2016-01-10T13:26:33.845 DEBUG:teuthology.misc:{
    2016-01-10T13:26:33.845 DEBUG:teuthology.misc:"fixed_ips": "{\"subnet_id\": \"a92ff450-0b85-4f46-b98a-33d3cfe0dbe3\", \"ip_address\": \"149.202.167.15\"}",
    2016-01-10T13:26:33.846 DEBUG:teuthology.misc:"device_id": "68e5bffe-735a-465f-aab3-fe9d054ef137"
    2016-01-10T13:26:33.846 DEBUG:teuthology.misc:},
    2016-01-10T13:26:33.850 DEBUG:teuthology.misc:{
    2016-01-10T13:26:33.850 DEBUG:teuthology.misc:"fixed_ips": "{\"subnet_id\": \"a92ff450-0b85-4f46-b98a-33d3cfe0dbe3\", \"ip_address\": \"149.202.169.158\"}",
    2016-01-10T13:26:33.850 DEBUG:teuthology.misc:"device_id": "c35ab4ad-b743-40ea-83d3-45a7cd653987"
    2016-01-10T13:26:33.850 DEBUG:teuthology.misc:}
    2016-01-10T13:26:33.877 DEBUG:teuthology.misc:]
    2016-01-10T13:26:33.878 DEBUG:teuthology.openstack:ignoring get_ip_neutron exception 'NoneType' object has no attribute '__getitem__'
    2016-01-10T13:26:33.878 ERROR:teuthology.run_tasks:Saw exception from tasks.
    Traceback (most recent call last):
      File "/home/ubuntu/teuthology/teuthology/run_tasks.py", line 53, in run_tasks
        manager = run_one_task(taskname, ctx=ctx, config=config)
      File "/home/ubuntu/teuthology/teuthology/run_tasks.py", line 41, in run_one_task
        return fn(**kwargs)
      File "/home/ubuntu/src/ceph-qa-suite_master/tasks/buildpackages.py", line 189, in task
        teuth_config.gitbuilder_host = openstack.get_ip('packages-repository', '')
      File "/home/ubuntu/teuthology/teuthology/openstack/__init__.py", line 429, in get_ip
        return OpenStackInstance(instance_id).get_ip(network)
      File "/home/ubuntu/teuthology/teuthology/openstack/__init__.py", line 142, in get_ip
        self.get_addresses())[0]
      File "/home/ubuntu/teuthology/teuthology/openstack/__init__.py", line 99, in get_addresses
        action="get ip " + self['id']) as proceed:
      File "/home/ubuntu/teuthology/teuthology/openstack/__init__.py", line 75, in __getitem__
        return self.info[name.lower()]
    TypeError: 'NoneType' object has no attribute '__getitem__'
    2016-01-10T13:26:33.880 DEBUG:teuthology.run_tasks:Exception was not quenched, exiting: TypeError: 'NoneType' object has no attribute '__getitem__'
    2016-01-10T13:26:33.883 INFO:teuthology.run:Summary data:
    {description: 'rados/thrash/{hobj-sort.yaml 0-size-min-size-overrides/3-size-2-min-size.yaml
      1-pg-log-overrides/short_pg_log.yaml clusters/{fixed-2.yaml openstack.yaml} fs/xfs.yaml
      msgr/simple.yaml msgr-failures/fastclose.yaml thrashers/pggrow.yaml workloads/cache-snaps.yaml}',
      failure_reason: '''NoneType'' object has no attribute ''__getitem__''', owner: scheduled_ubuntu@teuthology,
      status: fail, success: false}

This could probably be fixed by adding a guard in https://github.com/ceph/teuthology/blob/master/teuthology/openstack/__init__.py#L66.
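
A minimal sketch of such a guard, assuming self.info ends up None when the instance lookup fails (names mirror teuthology's OpenStackInstance; the real fix may differ):

    class OpenStackInstance:
        def __init__(self, name_or_id, info=None):
            self.name_or_id = name_or_id
            self.info = info  # None when the OpenStack query returned nothing

        def __getitem__(self, name):
            # Guard: raise a descriptive error instead of letting
            # "'NoneType' object has no attribute '__getitem__'" escape.
            if self.info is None:
                raise RuntimeError("no info for instance %s; does it exist?"
                                   % self.name_or_id)
            return self.info[name.lower()]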

rbd - Bug #14059 (Resolved): Can't build Ceph with --with-rbd (Cython)
https://tracker.ceph.com/issues/14059 | 2015-12-11T12:50:34Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

When using Cython version 0.14.1 and

    CXXFLAGS="-g -ggdb" ./configure \
        --without-fuse --prefix=/usr --localstatedir=/var --sysconfdir=/etc \
        --with-nss --without-cryptopp --with-rest-bench --without-lttng \
        --disable-gitversion --without-man-page --disable-cephfs-java \
        --without-libzs --without-babeltrace --with-debug

the build fails:

    building 'rbd' extension
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -iquote /home/dalekp/ceph-bp-smallpglog/ceph/src/include -Wall -Wtype-limits -Wignored-qualifiers -Winit-self -Wpointer-arith -fno-strict-aliasing -fsigned-char -rdynamic -O2 -g -pipe -Wall -Wp,-U_FORTIFY_SOURCE -Wp,-D_FORTIFY_SOURCE=2 -fexceptions --param=ssp-buffer-size=4 -fPIE -fstack-protector -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -g -O2 -iquote /home/dalekp/ceph-bp-smallpglog/ceph/src/include -D__CEPH__ -D_FILE_OFFSET_BITS=64 -D_THREAD_SAFE -D__STDC_FORMAT_MACROS -D_GNU_SOURCE -DCEPH_LIBDIR=/usr/lib -DCEPH_PKGLIBDIR=/usr/lib/ceph -DGTEST_USE_OWN_TR1_TUPLE=0 -D_REENTRANT -fPIC -I/usr/include/python2.6 -c rbd.c -o /home/dalekp/ceph-bp-smallpglog/ceph/src/build/temp.linux-x86_64-2.6/rbd.o
    gcc: rbd.c: No such file or directory
    gcc: no input files
    error: command 'gcc' failed with exit status 1

Possibly introduced with https://github.com/ceph/ceph/pull/6768.

If Cython 0.14.1 is deliberately not supported, that should be reported during the "configure" step (Cython's presence is already checked there).
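
A minimal sketch of the kind of configure-time check meant here (the 0.19 cutoff is illustrative; the actual minimum supported Cython version would need to be confirmed):

    # Fail early if Cython is missing or known-too-old.
    import sys

    try:
        from Cython import __version__ as cython_version
    except ImportError:
        sys.exit("configure: error: Cython is required to build the rbd bindings")

    if tuple(int(x) for x in cython_version.split(".")[:2]) < (0, 19):
        sys.exit("configure: error: Cython %s is too old" % cython_version)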

rbd - Bug #14058 (Resolved): Can't build Ceph with --without-rbd and --with-debug
https://tracker.ceph.com/issues/14058 | 2015-12-11T12:47:13Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

When using

    CXXFLAGS="-g -ggdb" ./configure \
        --without-fuse --prefix=/usr --localstatedir=/var --sysconfdir=/etc \
        --with-nss --without-cryptopp --with-rest-bench --without-lttng \
        --disable-gitversion --without-man-page --disable-cephfs-java \
        --without-libzs --without-babeltrace --without-rbd --with-debug

the build fails:

    test/test_get_blkdev_size.o: In function `main':
    /home/dalekp/ceph-bp-smallpglog/ceph/src/test/test_get_blkdev_size.cc:27: undefined reference to `get_block_device_size(int, long*)'
    collect2: error: ld returned 1 exit status
    make[3]: *** [ceph_test_get_blkdev_size] Error 1
    make[3]: *** Waiting for unfinished jobs....

Ceph - Cleanup #13981 (Resolved): configure script doesn't check bzip2-devel presence
https://tracker.ceph.com/issues/13981 | 2015-12-04T11:58:46Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

After cloning git master (state as of 4 Dec 2015), it is not possible to build Ceph from source when the bzip2-devel package (or an equivalent) is not installed. That alone is not a problem; the problem is that the requirement only manifests itself at link time:

    /usr/bin/ld: cannot find -lbz2
    collect2: error: ld returned 1 exit status

Bzip2 is required by RocksDB, so the configure script should check for it up front.
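
A minimal sketch of such an early probe (it only detects the shared library; a real configure test would also try compiling against bzlib.h):

    # Fail at configure time, not link time, when libbz2 is absent.
    import ctypes.util
    import sys

    if ctypes.util.find_library("bz2") is None:
        sys.exit("configure: error: libbz2 not found (install bzip2-devel); "
                 "it is required by RocksDB")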

Ceph - Bug #13962 (Resolved): "shard missing" errors in logs during Teuthology run
https://tracker.ceph.com/issues/13962 | 2015-12-02T18:35:38Z | Piotr Dalek <piotr.dalek@corp.ovh.com>

In the logs we can find:

    "2015-11-28 14:28:00.302371 osd.3 149.202.179.212:6804/9517 48 : cluster [ERR] 1.16 shard 2 missing 1/638ed856/target179214.teuthology10504-303/head" in cluster log

Seen on http://149.202.162.14:8081/ubuntu-2015-11-27_20:07:13-rados:thrash-bp-delayed-pglog-index-v2---basic-openstack/13/ and http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2015-11-29_21:00:02-rados-infernalis-distro-basic-openstack/22514/ (both logs attached).