Ceph : Issues (https://tracker.ceph.com/) 2023-11-02T23:45:33Z

Ceph - Bug #63425 (Pending Backport): tasks.cephadm: ceph.log No such file or directory
https://tracker.ceph.com/issues/63425
2023-11-02T23:45:33Z
Dan van der Ster
<p>cephadm tasks don't have a cluster log to egrep for ERR|WRN|SEC, e.g.:</p>
<p><a class="external" href="http://qa-proxy.ceph.com/teuthology/teuthology-2023-10-27_14:23:02-upgrade:pacific-x-quincy-distro-default-smithi/7438907/teuthology.log">http://qa-proxy.ceph.com/teuthology/teuthology-2023-10-27_14:23:02-upgrade:pacific-x-quincy-distro-default-smithi/7438907/teuthology.log</a><br /><pre>
2023-10-27T16:06:59.111 DEBUG:teuthology.orchestra.run.smithi150:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/38cc7fce-74d9-11ee-8db9-212e2dc638e7/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-10-27T16:06:59.141 INFO:teuthology.orchestra.run.smithi150.stderr:grep: /var/log/ceph/38cc7fce-74d9-11ee-8db9-212e2dc638e7/ceph.log: No such file or directory
</pre></p>
<p><a class="external" href="https://pulpito.ceph.com/teuthology-2023-10-28_14:23:03-upgrade:quincy-x-reef-distro-default-smithi/7439369/">https://pulpito.ceph.com/teuthology-2023-10-28_14:23:03-upgrade:quincy-x-reef-distro-default-smithi/7439369/</a><br /><pre>
2023-10-28T15:59:53.486 DEBUG:teuthology.orchestra.run.smithi007:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/10bc8c0a-75a0-11ee-8db9-212e2dc638e7/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-10-28T15:59:53.517 INFO:teuthology.orchestra.run.smithi007.stderr:grep: /var/log/ceph/10bc8c0a-75a0-11ee-8db9-212e2dc638e7/ceph.log: No such file or directory
</pre></p>
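<p>The scan shown in the logs above can be sketched in Python (a hypothetical re-implementation for illustration, not the actual teuthology code): flag the first cluster-log line carrying [ERR], [WRN], or [SEC], skipping the whitelisted MDS health warnings.</p>
<pre>
```python
import re

# Match the severity tags the teuthology scan greps for.
MATCH = re.compile(r"\[ERR\]|\[WRN\]|\[SEC\]")
# Health checks the scan explicitly ignores.
IGNORE = re.compile(r"\(MDS_ALL_DOWN\)|\(MDS_UP_LESS_THAN_MAX\)")

def first_bad_line(lines):
    """Return the first non-whitelisted ERR/WRN/SEC line, or None."""
    for line in lines:
        if MATCH.search(line) and not IGNORE.search(line):
            return line
    return None

sample = [
    "cluster [WRN] Health check failed: 1 MDSs down (MDS_ALL_DOWN)",
    "cluster [ERR] OSD full",
]
print(first_bad_line(sample))  # -> "cluster [ERR] OSD full"
```
</pre>
<p>Without a ceph.log on disk, this scan fails before it can even run, which is what the "No such file or directory" errors above show.</p>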
<p>mon_cluster_log_to_file is not true for the cephadm task.</p>

Ceph - Bug #54432 (Fix Under Review): it is unclear to disable mon_osd_down_out_subtree_limit fun...
https://tracker.ceph.com/issues/54432
2022-03-01T07:20:27Z
Yuma Ogami
<p>I'd like to mark any DOWN OSD OUT regardless of the status of the other OSDs. In other words, I want to disable the effect of the mon_osd_down_out_subtree_limit parameter.</p>
<p>I found two values for that parameter that seem to work:<br />A. Specify "root". With this setting, OSDs are prevented from being marked out only when all OSDs go down, so it practically accomplishes my purpose.<br />B. Specify "osd" (*1). However, according to the official documentation (*2), it looks like no OSD should be marked OUT in this case.</p>
<p>Please let me know what is the proper way to achieve my goal.</p>
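<p>A simplified model of how the subtree limit behaves (an illustrative sketch only, not the actual OSDMonitor logic, with a reduced set of CRUSH type ranks): auto mark-out is suppressed when an entirely-down subtree is at or above the configured CRUSH type.</p>
<pre>
```python
# Simplified subset of CRUSH bucket type ranks (assumption for illustration).
CRUSH_RANK = {"osd": 0, "host": 1, "rack": 3, "root": 10}

def subtree_suppressed(limit_type, down_subtree_type):
    """True if auto mark-out is suppressed for a fully-down subtree
    of the given type, under mon_osd_down_out_subtree_limit = limit_type."""
    return CRUSH_RANK[down_subtree_type] >= CRUSH_RANK[limit_type]

# limit "root": only a fully-down root is spared, so single OSDs get outed (case A).
print(subtree_suppressed("root", "osd"))   # -> False
# limit "osd": every down OSD is itself a down subtree of type "osd" (case B).
print(subtree_suppressed("osd", "osd"))    # -> True
```
</pre>
<p>Under this model, case A ("root") achieves the reporter's goal because no smaller failure domain reaches the limit, while case B ("osd") would suppress mark-out everywhere, matching the documentation's reading.</p>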
<p>(*1) <a class="external" href="https://github.com/ceph/ceph/blob/5cdf8929e9f857a53820c4690ccfe30288b6ca91/src/mon/OSDMonitor.cc#L5189">https://github.com/ceph/ceph/blob/5cdf8929e9f857a53820c4690ccfe30288b6ca91/src/mon/OSDMonitor.cc#L5189</a><br />(*2) <a class="external" href="https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_subtree_limit">https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_subtree_limit</a></p>

Ceph - Bug #54385 (Fix Under Review): better test mon and osd smart command
https://tracker.ceph.com/issues/54385
2022-02-23T14:50:03Z
Dan van der Ster
<p>It appears that the daemon 'smart' command on the mon and osd is not directly tested.</p>

bluestore - Support #54315 (New): 1 fsck error per osd during nautilus -> octopus upgrade (S3 clu...
https://tracker.ceph.com/issues/54315
2022-02-17T17:23:37Z
Dan van der Ster
<p>At the end of the conversion to per-pool omap, around half of our OSDs reported 1 error, but the log doesn't say what the error was.</p>
<pre>
2022-02-15T16:02:16.554+0100 7fdfde8d8f00 0 bluestore(/var/lib/ceph/osd/ceph-1247) _fsck_check_objects partial offload, done myself 7925084 of 7942492objects, threads 2
2022-02-15T16:02:16.678+0100 7fdfde8d8f00 1 bluestore(/var/lib/ceph/osd/ceph-1247) _fsck_on_open checking shared_blobs
2022-02-15T16:02:16.693+0100 7fdfde8d8f00 1 bluestore(/var/lib/ceph/osd/ceph-1247) _fsck_on_open checking pool_statfs
2022-02-15T16:17:37.407+0100 7fdfde8d8f00 1 bluestore(/var/lib/ceph/osd/ceph-1247) _fsck_on_open <<<FINISH>>> with 1 errors, 318 warnings, 319 repaired, 0 remaining in 1672.130946 seconds
</pre>
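<p>When tallying this across many OSD logs, the _fsck_on_open summary line can be parsed mechanically (a hypothetical helper, assuming the summary format shown above):</p>
<pre>
```python
import re

# Matches the fsck summary line emitted by _fsck_on_open, as quoted above.
FINISH = re.compile(
    r"<<<FINISH>>> with (\d+) errors, (\d+) warnings, (\d+) repaired, "
    r"(\d+) remaining in ([\d.]+) seconds")

def parse_fsck_summary(line):
    """Extract error/warning/repair counts from a fsck FINISH line."""
    m = FINISH.search(line)
    if not m:
        return None
    errors, warnings, repaired, remaining = map(int, m.groups()[:4])
    return {"errors": errors, "warnings": warnings, "repaired": repaired,
            "remaining": remaining, "seconds": float(m.group(5))}

line = ("_fsck_on_open <<<FINISH>>> with 1 errors, 318 warnings, "
        "319 repaired, 0 remaining in 1672.130946 seconds")
print(parse_fsck_summary(line))
```
</pre>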
<p>Full log is posted: ceph-post-file: 82f661a7-b10f-4a80-acaf-37f1268f275e</p>

Ceph - Bug #54313 (Fix Under Review): device health scraping triggers monitor elections
https://tracker.ceph.com/issues/54313
2022-02-17T12:14:49Z
Ruben Kerkhof
<p>One of our customers has 5 monitors, and each night one or more of them are briefly marked down around the same time.<br />After a bit of digging through the logs I noticed that each time this happens, I see this:</p>
<p><pre>
mon.mon0.log:2022-02-17T04:23:58.487+0100 7f51ed252700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[json]: dispatch
mon.mon0.log:2022-02-17T04:24:24.365+0100 7f51ed252700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[json]: finished
mon.mon0.log-2022-02-17T04:24:24.375+0100 7f51e59c4700 0 log_channel(cluster) log [INF] : mon.mon0 calling monitor election
</pre></p>
<p>I can reproduce this locally on Ceph master on a vagrant-libvirt cluster by running ceph device scrape-daemon-health-metrics mon.mon0.</p>
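<p>One plausible reading of the audit log above (an assumption about the cause, not a confirmed diagnosis): the 'smart' command occupies the monitor for far longer than the default mon lease of 5 seconds, so the mon misses its lease and an election follows. The gap between dispatch and finish can be checked directly from the quoted timestamps:</p>
<pre>
```python
from datetime import datetime

# Timestamps copied from the audit log lines above.
fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
dispatch = datetime.strptime("2022-02-17T04:23:58.487+0100", fmt)
finish = datetime.strptime("2022-02-17T04:24:24.365+0100", fmt)

gap = (finish - dispatch).total_seconds()
print(gap)  # -> 25.878, well over the 5 s default mon_lease
```
</pre>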