Ceph : Issues
https://tracker.ceph.com/ | 2019-09-20T16:35:28Z

RADOS - Bug #41943 (Closed): ceph-mgr fails to report OSD status correctly
https://tracker.ceph.com/issues/41943 | 2019-09-20T16:35:28Z | Brian Andrus <brian.andrus@inktank.com>
After an inexplicable cluster event that left around 10% of our OSDs falsely reported down (and shortly after back up), we had an OSD that was seemingly not functioning correctly while health reported HEALTH_OK, which greatly prolonged the outage.
For some time after the storage blip we saw strange, inconsistent behavior in a portion of our libvirt guests: some would boot okay but then go into 100% IOWAIT and crash shortly after, while others had random issues preventing the boot process from completing. Some could be remedied by copying rbd image data to new images, but those copies were often held up indefinitely at some point in the copy process with no failure reported. Once/if a copy completed successfully, the VM would usually boot successfully.
One of our engineers restarted a ceph-mgr for an unrelated reason, and we then had HEALTH_WARN with 119 PGs reporting inactive and no other information (Bug #23049, https://tracker.ceph.com/issues/23049, is a partial match to this issue). A "ceph health detail" showed the 119 PGs inactive only since the mgr restart, and each of those PGs had no OSDs listed in its OSD list. A "ceph pg map" quickly showed that they all had the same OSD as their primary; after kicking that OSD the cluster was restored to full functionality, any remaining VMs having issues were immediately unblocked, and rbd image copies completed without hanging from then on.
My best guess is that when write requests were sent to the down PGs / misbehaving OSD, they would hang but never surface in ceph health output at all. The cluster did not seem to recognize, report, or log any blocked requests.
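For reference, the diagnostic sequence that exposed the stuck primary looked roughly like the sketch below; the PG and OSD ids are hypothetical placeholders, and whether marking the OSD down or restarting its daemon is the right "kick" depends on how wedged it is (the report above just says it was kicked).

    ceph health detail      # after the mgr restart: inactive pgs, each with an empty OSD list
    ceph pg map 2.1af       # hypothetical pg id; every inactive pg mapped to the same primary
    ceph osd down 17        # hypothetical osd id; marking the stuck primary down forces re-peering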
Other info:
Recently Luminous-upgraded from Jewel; 25600 PGs, 3x replication, 225 OSDs (224 up, 222 in).
I am aware the PG count is approximately double what it should be for this cluster.
Approximately 1 hour prior to the still-unexplained cluster outage, the machine hosting the OSD in question experienced a kernel oops for an unknown reason.

rbd - Feature #19123 (New): rbd/rados drivers in PyPI repo
https://tracker.ceph.com/issues/19123 | 2017-03-01T19:15:27Z | Brian Andrus <brian.andrus@inktank.com>

We have a need to install modules wholly contained within a Python virtualenv. As of now, we're extracting the compiled drivers from Ceph packages in order to fulfill our requirements. Previously it was possible to point to the python modules directly in the Ceph source tree, but those have been removed in favor of the compiled drivers.
It would be nice to be able to install the python rados/rbd modules via PyPI to easily support packages installed in a virtualenv.
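In the meantime, a minimal sketch of a workaround, assuming the distro-packaged bindings (python-rados/python-rbd or equivalent) are installed system-wide, is to let the virtualenv see the system site-packages:

    # Workaround sketch: reuse the system-installed compiled bindings inside a venv
    virtualenv --system-site-packages venv
    . venv/bin/activate
    python -c "import rados, rbd"   # sanity check that the bindings resolve

This obviously weakens the "wholly contained within the virtualenv" requirement, which is exactly why a PyPI package would be preferable.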

Ceph - Feature #18851 (New): Ability to add comments in certain views of Ceph daemons or status
https://tracker.ceph.com/issues/18851 | 2017-02-07T22:13:31Z | Brian Andrus <brian.andrus@inktank.com>

It would be nice to maintain a record of why an OSD is down, or why a flag has been set within the cluster, so that any operator can tell at any time what needs immediate attention versus what is a known longer-term maintenance item, such as a deliberately downed OSD.
For example, a "ceph comment" command whose entries can be easily searched by daemon/status.
Use-case #1:
Cluster has 10 OSDs down over a period of many months. As a cluster operator, it would be nice for me or the datacenter technicians to be able to see why a particular OSD is down; any OSD without a comment still needs to be investigated.
Use-case #2:
If I ask my datacenter technicians to replace bad hard drives, it would be nice for them to be able to search for specific terms or states (e.g. "replaceme") while doing a monthly hardware sweep.
Use-case #3:
Cluster has noout set, and I would like to know why and/or who to blame.
This would be good information to have in dashboards, and it should be accessible to anyone who holds the necessary keys. The comment author could be populated automatically from the Ceph key used, or entered manually where shared keys are in play.
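As a stopgap, something similar can be approximated today with the config-key store, though it is invisible in health output and dashboards. A sketch, with an invented key-naming convention and a hypothetical operator handle:

    # Hypothetical convention: comment/<daemon> holds the operator note
    ceph config-key put comment/osd.12 "2017-02-07 bandrus: bad sectors, replaceme"
    ceph config-key get comment/osd.12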

Ceph - Bug #18762 (Closed): OSDs stuck using previous version Upstart script until all have been ...
https://tracker.ceph.com/issues/18762 | 2017-02-01T18:47:02Z | Brian Andrus <brian.andrus@inktank.com>

After a Hammer -> Jewel upgrade, OSD daemons will not start. The upstart logs show:
    /proc/self/fd/9: 8: /proc/self/fd/9: /usr/libexec/ceph/ceph-osd-prestart.sh: not found
The correct Jewel init script seems to be in place:
    $ sudo grep prestart /etc/init/ceph-osd.conf
    /usr/lib/ceph/ceph-osd-prestart.sh --cluster="${cluster:-ceph}" -i "$id"
However, it would seem that open FDs keep the Hammer-era init script in use, and that script references the wrong location for ceph-osd-prestart.sh. This forces us to upgrade a whole node at a time, which is more disruptive than we'd like to be.
It should be noted that changes like this affect upgrade plans; the documentation should perhaps note that single-OSD rolling restarts are not possible when upstart changes are pushed.
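Consistent with the node-at-a-time observation above, a sketch of bouncing all OSDs on a host in one step via the stock upstart jobs (assuming the ceph-osd-all job is present), so upstart re-reads the job file rather than reusing the stale FD:

    # Per-OSD rolling restarts keep the old prestart path alive; restart them all at once
    sudo stop ceph-osd-all
    sudo start ceph-osd-all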

rbd - Bug #18436 (Resolved): Qemu crash triggered by network issues
https://tracker.ceph.com/issues/18436 | 2017-01-06T00:14:41Z | Brian Andrus <brian.andrus@inktank.com>

Noticed in the form of I/O errors and disk corruption at the operating system level. Further investigation uncovered this in the qemu log:
    osdc/ObjectCacher.cc: In function 'void ObjectCacher::trim()' thread 7fdb28f8e700 time 2017-01-05 21:56:28.221215
    osdc/ObjectCacher.cc: 1001: FAILED assert(bh->is_clean() || bh->is_zero())
     ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
     1: (()+0x16c3db) [0x7fdfe86513db]
     2: (()+0x3a2836) [0x7fdfe8887836]
     3: (()+0x3b0c7b) [0x7fdfe8895c7b]
     4: (()+0x3b7c70) [0x7fdfe889cc70]
     5: (()+0x4c7c9) [0x7fdfe85317c9]
     6: (()+0x3b4bd4) [0x7fdfe8899bd4]
     7: (()+0x3aa12d) [0x7fdfe888f12d]
     8: (()+0x3b6ee1) [0x7fdfe889bee1]
     9: (()+0x4c7c9) [0x7fdfe85317c9]
     10: (()+0xa2105) [0x7fdfe8587105]
     11: (()+0x4c7c9) [0x7fdfe85317c9]
     12: (()+0x953ad) [0x7fdfeab623ad]
     13: (()+0x70469) [0x7fdfeab3d469]
     14: (()+0x134928) [0x7fdfeac01928]
     15: (()+0x8184) [0x7fdfe43aa184]
     16: (clone()+0x6d) [0x7fdfe40d737d]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    terminate called after throwing an instance of 'ceph::FailedAssertion'
    2017-01-05 21:56:28.636+0000: shutting down
At the time of the crash, we were experiencing network infrastructure issues that were causing intermittent packet loss.
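Until the ObjectCacher assert is addressed, one possible hedge on affected clients is disabling the librbd writeback cache, which avoids the trim() path at some latency cost. A ceph.conf sketch; this is a mitigation guess, not a confirmed fix, so weigh the performance impact first:

    [client]
    rbd cache = false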

Ceph - Documentation #17701 (Resolved): osd_max_backfills default has changed, documentation shou...
https://tracker.ceph.com/issues/17701 | 2016-10-25T20:54:13Z | Brian Andrus <brian.andrus@inktank.com>

The Jewel default for osd_max_backfills is 1:
https://github.com/ceph/ceph/blob/jewel/src/common/config_opts.h#L569
Yet the documentation states otherwise:
http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/#backfilling
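The compiled-in default is easy to confirm against a running daemon via the admin socket; osd.0 below is a placeholder:

    ceph daemon osd.0 config get osd_max_backfills
    # expected on Jewel: { "osd_max_backfills": "1" }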

Ceph - Feature #16462 (New): Include more detail in perf dump
https://tracker.ceph.com/issues/16462 | 2016-06-23T20:05:44Z | Brian Andrus <brian.andrus@inktank.com>

More detailed statistics (beyond avgcount) are desired; a percentage-of-distribution breakdown a la fio would be on the wish list.
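For context, the admin socket currently exposes only sum/avgcount pairs per counter, from which only a mean can be derived; osd.0 below is a placeholder and counter names vary by version:

    ceph daemon osd.0 perf dump
    # e.g. "op_r_latency": { "avgcount": 12345, "sum": 67.89 }  <- a mean, no percentiles
    # fio-style latency percentiles (50th/95th/99th) would need histogram buckets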

Ceph-deploy - Bug #16451 (Closed): Using ceph-deploy with --zap-disk and --dmcrypt fails
https://tracker.ceph.com/issues/16451 | 2016-06-23T14:30:42Z | Brian Andrus <brian.andrus@inktank.com>

Description of problem:
When using ceph-deploy with the --zap-disk and --dmcrypt options, ceph-deploy seems to call the zap function in ceph-disk without first unmounting the disk from the osd-lockbox. sgdisk times out (5 timeouts of 60 seconds each) and the OSD creation fails.
Version-Release number of selected component (if applicable), tested:
ceph-deploy: 1.5.24, 1.5.30, 1.5.34
ceph-disk: v10.2.0, v10.2.1, v10.2.2
How reproducible:
100%
Steps to Reproduce:
1. ceph-deploy osd create --zap-disk --dmcrypt host:sd{a..b}
Actual results:
The OSD creation times out while waiting on udevadm. Note that the osd-lockbox does not get unmounted, which may or may not be by design, and that sgdisk zap is run against the drive while the partition is mounted (which fails).
Expected results:
The OSD creation should succeed.
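An untested guess at a workaround is splitting the zap into its own step, so nothing is mounted when sgdisk runs, then creating the OSDs without --zap-disk:

    ceph-deploy disk zap host:sda host:sdb
    ceph-deploy osd create --dmcrypt host:sd{a..b}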
Additional info:

[redacted-host][WARNIN] populate: Mounting lockbox mount -t ext4 /dev/sda3 /var/lib/ceph/osd-lockbox/redacted
[redacted-host][WARNIN] command_check_call: Running command: /bin/mount -t ext4 /dev/sda3 /var/lib/ceph/osd-lockbox/redacted
[redacted-host][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd-lockbox/redacted/osd-uuid.3089.tmp
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[redacted-host][WARNIN] command_check_call: Running command: /usr/bin/ceph config-key put dm-crypt/osd/redacted/luks iE0N25NKgkvOlZxfnN9IEJBfBwO6HCcM0oZeIuowpgFuxsn/yLxz8hDmXZzesQY3MKI1wPWkyzETpV+dw0yBECX/TbAldHqTxYj/W+d6zbKkVe61TABZfIYxjdS+KFu80QaFGlHqBnY5Gj3rXalHE/qquS81XUvsXfafAFTqY8E=
[redacted-host][WARNIN] value stored
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph auth get-or-create client.osd-lockbox.redacted mon allow command "config-key get" with key="dm-crypt/osd/redacted/luks"
[redacted-host][WARNIN] create_key: stderr
[redacted-host][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd-lockbox/redacted/key-management-mode.3089.tmp
[redacted-host][WARNIN] adjust_symlink: Creating symlink /var/lib/ceph/osd-lockbox/8dc95d04-65a7-4dee-97d4-6b5ff1117f0d -> /var/lib/ceph/osd-lockbox/redacted
[redacted-host][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd-lockbox/redacted/journal-uuid.3089.tmp
[redacted-host][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd-lockbox/redacted/magic.3089.tmp
[redacted-host][WARNIN] command_check_call: Running command: /sbin/sgdisk --typecode=3:fb3aabf9-d25f-47cc-bf5e-721d1816496b -- /dev/sda
[redacted-host][DEBUG ] Warning: The kernel is still using the old partition table.
[redacted-host][DEBUG ] The new table will be used at the next reboot.
[redacted-host][DEBUG ] The operation has completed successfully.
[redacted-host][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[redacted-host][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[redacted-host][WARNIN] zap: Zapping partition table on /dev/sda
[redacted-host][WARNIN] command_check_call: Running command: /sbin/sgdisk --zap-all -- /dev/sda
[redacted-host][WARNIN] Caution: invalid backup GPT header, but valid main header; regenerating
[redacted-host][WARNIN] backup header from main header.
[redacted-host][WARNIN]
[redacted-host][WARNIN] Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
[redacted-host][WARNIN] on the recovery & transformation menu to examine the two tables.
[redacted-host][WARNIN]
[redacted-host][WARNIN] Warning! One or more CRCs don't match. You should repair the disk!
[redacted-host][WARNIN]
[redacted-host][DEBUG ] ****************************************************************************
[redacted-host][DEBUG ] Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
[redacted-host][DEBUG ] verification and recovery are STRONGLY recommended.
[redacted-host][DEBUG ] ****************************************************************************
[redacted-host][DEBUG ] Warning: The kernel is still using the old partition table.
[redacted-host][DEBUG ] The new table will be used at the next reboot.
[redacted-host][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[redacted-host][DEBUG ] other utilities.
[redacted-host][WARNIN] command_check_call: Running command: /sbin/sgdisk --clear --mbrtogpt -- /dev/sda
[redacted-host][DEBUG ] Creating new GPT entries.
[redacted-host][DEBUG ] Warning: The kernel is still using the old partition table.
[redacted-host][DEBUG ] The new table will be used at the next reboot.
[redacted-host][DEBUG ] The operation has completed successfully.
[redacted-host][WARNIN] update_partition: Calling partprobe on zapped device /dev/sda
[redacted-host][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[redacted-host][WARNIN] command: Running command: /sbin/partprobe /dev/sda
[redacted-host][WARNIN] update_partition: partprobe /dev/sda failed : Error: Partition(s) 3 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[redacted-host][WARNIN] (ignored, waiting 60s)
[redacted-host][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[redacted-host][WARNIN] command: Running command: /sbin/partprobe /dev/sda
[redacted-host][WARNIN] update_partition: partprobe /dev/sda failed : Error: Partition(s) 3 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[redacted-host][WARNIN] (ignored, waiting 60s)
[redacted-host][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[redacted-host][WARNIN] command: Running command: /sbin/partprobe /dev/sda
[redacted-host][WARNIN] update_partition: partprobe /dev/sda failed : Error: Partition(s) 3 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[redacted-host][WARNIN] (ignored, waiting 60s)
Eventually times out.

CephFS - Support #11923 (Resolved): MDS init script starts multiple instances when MDS is referen...
https://tracker.ceph.com/issues/11923 | 2015-06-08T21:55:53Z | Brian Andrus <brian.andrus@inktank.com>

The MDS component init script does not seem to be able to properly differentiate between auto-detected instances and instances stated explicitly in ceph.conf.
If an MDS is defined in ceph.conf, it is seemingly started twice:
root 4042 0.0 0.0 115212 1472 ? Ss 13:46 0:00 /usr/bin/bash -c ulimit -n 32768; /usr/bin/ceph-mds -i <NAME> --pid-file /var/run/ceph/mds.<NAME>.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root 4047 0.0 0.0 159016 6928 ? Sl 13:46 0:00 /usr/bin/ceph-mds -i <NAME> --pid-file /var/run/ceph/mds.<NAME>.pid -c /etc/ceph/ceph.conf --cluster ceph -f
If the statement is removed from ceph.conf, the daemon starts as expected (with only one process).
As with the OSD and MON daemons, the init script should not start more than one MDS process, even if the MDS is explicitly stated in ceph.conf (for convenient stop/start and other administrative purposes).
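A quick way to spot the duplicate after a service start (a sketch, assuming a procps with pgrep -a):

    pgrep -af ceph-mds   # two lines carrying the same '-i <NAME>' indicate the double start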

rgw - Bug #10099 (Duplicate): radosgw-agent - error geting op state: list index out of range
https://tracker.ceph.com/issues/10099 | 2014-11-13T12:25:23Z | Brian Andrus <brian.andrus@inktank.com>

radosgw-agent logs the following, and objects are not synced to the secondary gateway.
    INFO:urllib3.connectionpool:Starting new HTTP connection (1): redacted.com
    DEBUG:urllib3.connectionpool:"GET /admin/opstate?client-id=radosgw-agent&object=Sample%2F%3CKey%3A+Sample_Folder_1%2FF%3A%2FShared%2FSAMPLE+DOCS%2FSAMPLE+SAMPLE+%26+SAMPLE%2FSAMPLE+series%2FSAMPLE_SPEC.pdf%3E&op-id=redacted%3A17649%3A37606 HTTP/1.1" 200 None
    DEBUG:radosgw_agent.worker:op state is []
    DEBUG:radosgw_agent.worker:error geting op state: list index out of range
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/radosgw_agent/worker.py", line 220, in wait_for_object
        state = state[0]['state']
    IndexError: list index out of range
    DEBUG:boto:StringToSign:
    GET
    Thu, 13 Nov 2014 19:12:16 GMT
    /admin/opstate
    DEBUG:boto:Signature:
    AWS ZBR8UZ8FV1RMQDLSAGDH:0yaxu0zutgVnq1STiSOEMYnXuyM=
    DEBUG:boto:url = 'http://redacted.com/admin/opstate'
    params={'client-id': 'radosgw-agent', 'object': u'Backups/<Key: Backups,Sample_Folder/F:/Shared/SAMPLE DOCS/SAMPLE SAMPLE & SAMPLE/SAMPLE series/SAMPLE_SPEC.pdf>', 'op-id': 'redacted:17649:37606'}
    headers={'Date': 'Thu, 13 Nov 2014 19:12:16 GMT', 'Content-Length': 0, 'Authorization': u'AWS ZBR8UZ8FV1RMQDLSAGDH:0yaxu0zutgVnq1STiSOEMYnXuyM=', 'User-Agent': 'Boto/2.32.1 Python/2.7.5 Linux/3.10.0-123.6.3.el7.x86_64'}
    data=None

Ceph - Feature #9031 (Resolved): List RADOS namespaces and list all objects in all namespaces
https://tracker.ceph.com/issues/9031 | 2014-08-06T14:30:39Z | Brian Andrus <brian.andrus@inktank.com>

We can currently create namespaces, but cannot easily view those that have been created. A method of listing namespaces with the rados utility is desired.
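For context, a sketch of what works today, assuming a rados build with --namespace support (pool and namespace names below are placeholders): operations can target a single, already-known namespace, but there is no way to enumerate the namespaces themselves:

    rados -p rbd --namespace myns put obj1 /etc/hostname   # write into namespace 'myns'
    rados -p rbd --namespace myns ls                       # list only within 'myns'
    # missing: a way to list all namespaces / all objects across namespaces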

rgw - Bug #9002 (Duplicate): Creating swift key with --gen-secret in separate step from subuser c...
https://tracker.ceph.com/issues/9002 | 2014-08-04T11:20:10Z | Brian Andrus <brian.andrus@inktank.com>

Customer reported on CentOS with Ceph v0.80.4.
Steps to reproduce:

    radosgw-admin user create --uid=testuser1 --display-name="Test User One"
    radosgw-admin subuser create --uid=testuser1 --subuser=testuser1:swift --access=full
    radosgw-admin key create --uid=testuser1 --subuser=testuser1:swift --key-type=swift --gen-secret
Result:

    could not create key: unable to add access key, unable to store user info
    WARNING: can't store user info, swift id () already mapped to another user (testuser1)
Customer discovered workaround:

    radosgw-admin user create --subuser=testuser1:swift --display-name="Test User One" --key-type=swift --access=full

rgw - Bug #9001 (Won't Fix): Starting gateway with radosgw init script fails to create socket
https://tracker.ceph.com/issues/9001 | 2014-08-04T11:00:00Z | Brian Andrus <brian.andrus@inktank.com>

Ceph Version: v0.80.4
Distro: CentOS
Customer reported; unable to reproduce.
/var/run/ceph directory owned by apache:apache.
chmod 644 on /var/run/ceph (even temporarily elevated permissions do not allow socket creation).
The gateway will start, but no socket is created. The ceph.conf socket settings match those in the radosgw.conf HTTP conf file. Starting the gateway manually as root does allow socket creation.

Ceph - Bug #8598 (Can't reproduce): Monitor performance during intensive operations
https://tracker.ceph.com/issues/8598 | 2014-06-13T22:03:24Z | Brian Andrus <brian.andrus@inktank.com>

Problem: During times of intensive operations such as deep scrubbing, monitors are observed dropping in and out of quorum frequently, seemingly spurring frequent re-elections.
Symptoms: Elections can compound other issues, such as OSDs timing out while starting up or being wrongly marked down. The monitor process is often observed at 100% CPU utilization during these times.
This is a tracking issue; more information will be added as we gather it.
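For anyone hitting this, a sketch of the data points worth capturing during an election storm (mon.a is a placeholder for a local monitor id):

    ceph quorum_status               # who is in/out of quorum, election epoch
    ceph daemon mon.a mon_status     # local view: state, rank, quorum membership
    ceph daemon mon.a perf dump      # monitor counters during the event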

RADOS - Feature #8594 (New): Monitor multi-thread awareness
https://tracker.ceph.com/issues/8594 | 2014-06-13T17:00:10Z | Brian Andrus <brian.andrus@inktank.com>

(Struck out in the original: First noticed by a support customer in firefly while starting OSDs; creating this as a tracking issue initially until more information can be gathered.)
Revised to a feature request per the information provided in Greg's update.
Especially with the future growth of Ceph, monitors may be able to make good use of multiple cores in some scenarios. Currently, a user is observing mon processes pegging 100% CPU on one core during intensive deep-scrub operations.
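A sketch of how to confirm the single-core bottleneck on a running monitor; top -H shows per-thread CPU, and one thread pinned near 100% while the rest idle points at a single-threaded hot path:

    top -H -p "$(pidof ceph-mon)"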