Ceph : Issues
https://tracker.ceph.com/ - 2024-03-11T19:07:31Z
Ceph - Feature #64845 (New): Support read_from_replica everywhere
https://tracker.ceph.com/issues/64845 - 2024-03-11T19:07:31Z - Greg Farnum <gfarnum@redhat.com>
<p>RADOS supports reads from replicas now, and has done so for a while. It is not on by default and requires setting a flag in the Objecter/librados, because reading from random OSDs is less cache-efficient. But it is desirable in stretch cluster scenarios, which Ceph is seeing more of as time goes by.</p>
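<p>For context, a rough sketch of what "setting a flag in librados" looks like today with the C++ API. The flag names are the per-operation read flags from librados.hpp; the cluster credentials, pool name ("rbd"), and object name ("foo") are just placeholders for this example:</p>
<pre>#include <rados/librados.hpp>
#include <cassert>

int main() {
  librados::Rados cluster;
  assert(cluster.init2("client.admin", "ceph", 0) == 0);
  assert(cluster.conf_read_file(nullptr) == 0);      // default ceph.conf search path
  assert(cluster.connect() == 0);

  librados::IoCtx ioctx;
  assert(cluster.ioctx_create("rbd", ioctx) == 0);   // pool name is just an example

  librados::ObjectReadOperation op;
  op.read(0, 4096, nullptr, nullptr);                // data comes back via operate()'s bufferlist
  // The per-operation flag that opts this read in to replica reads;
  // OPERATION_BALANCE_READS instead picks a random OSD rather than the "nearest" one.
  op.set_op_flags2(librados::OPERATION_LOCALIZE_READS);

  librados::bufferlist bl;
  assert(ioctx.operate("foo", &op, &bl) == 0);       // object name is just an example
  cluster.shutdown();
  return 0;
}</pre>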
<p>Right now, only RBD supports read-from-replica, because the initial implementation did not properly order reads with in-progress writes on the OSD side, so it was only viable for snapshots. Happily, that changed several releases ago. So, we should extend this option to every component! This ticket will serve as an umbrella, and perhaps hold the actual work.</p>
<p>I see two obvious approaches to resolving this:<br />1) Every component adds a config analogous to rbd_read_from_replica_policy. (I see CephFS has a libcephfs::ceph_localize_reads() function, but no obvious way to set it — weird?)<br />2) We make a RADOS config option that can be set by any random Ceph client.</p>
<p>The second is obviously the more broadly applicable option and a shorter config, but it leaves out the possibility of components applying appropriate system-specific checks or tweaks. The first is a little more work?</p>

bluestore - Bug #47453 (New): checksum failures lead to assert on OSD shutdown in lab tests
https://tracker.ceph.com/issues/47453 - 2020-09-15T02:33:27Z - Greg Farnum <gfarnum@redhat.com>
<pre>2020-09-14T11:50:01.150 INFO:tasks.ceph.osd.0.smithi186.stderr:2020-09-14T11:50:01.151+0000 7f68f74b6700 -1 received signal: Hangup from /usr/bin/python3 /usr/bin/daemon-helper kill ceph-osd -f --cluster ceph -i 0 (PID: 13568) UID: 0
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr:/build/ceph-16.0.0-5071-gd026253/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fd8452fdb80 time 2020-09-14T11:50:01.157742+0000
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr:/build/ceph-16.0.0-5071-gd026253/src/kv/RocksDBStore.cc: 1616: ceph_abort_msg("block checksum mismatch: expected 4204593633, got 976185286 in db/000073.sst offset 155593 size 4231")
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr: ceph version 16.0.0-5071-gd026253 (d02625331c4e06ca213d9720d98137d83a87cb90) pacific (dev)
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe1) [0x7fd83b35ba27]
2020-09-14T11:50:01.161 INFO:teuthology.orchestra.run.smithi186.stderr: 2: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3ec) [0x563d20f485ec]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 3: (()+0xdd3fb9) [0x563d20da7fb9]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 4: (()+0xdbf8c1) [0x563d20d938c1]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 5: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x27b) [0x563d20daeaeb]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 6: (BlueStore::fsck_check_objects_shallow(BlueStore::FSCKDepth, long, boost::intrusive_ptr<BlueStore::Collection>, ghobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list const&, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char
> >, mempool::pool_allocator<(mempool::pool_index_t)11, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, std::map<boost::intrusive_ptr<BlueStore::Blob>, unsigned short, std::less<boost::intrusive_ptr<BlueStore::Blob> >, std::allocator<std::pair<boost::intrusive_ptr<BlueStore::Blob> const, unsigned short> > >*, BlueStore::FSCK_ObjectCtx const&)+0x36b) [0x563d20e07ceb]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 7: (BlueStore::_fsck_check_objects(BlueStore::FSCKDepth, BlueStore::FSCK_ObjectCtx&)+0x18a2) [0x563d20e0bac2]
2020-09-14T11:50:01.162 INFO:teuthology.orchestra.run.smithi186.stderr: 8: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x1626) [0x563d20e0ffc6]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 9: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c2) [0x563d20e29252]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 10: (BlueStore::_mount(bool, bool)+0x504) [0x563d20e29da4]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 11: (main()+0x2cb1) [0x563d2084c231]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 12: (__libc_start_main()+0xe7) [0x7fd839af6b97]
2020-09-14T11:50:01.163 INFO:teuthology.orchestra.run.smithi186.stderr: 13: (_start()+0x2a) [0x563d2086109a]
</pre>
<p>Showed up twice on two different machines:<br /><a class="external" href="https://pulpito.ceph.com/gregf-2020-09-14_05:25:36-rados-wip-stretch-mode-distro-basic-smithi/5433145">https://pulpito.ceph.com/gregf-2020-09-14_05:25:36-rados-wip-stretch-mode-distro-basic-smithi/5433145</a><br /><a class="external" href="https://pulpito.ceph.com/gregf-2020-09-14_05:25:36-rados-wip-stretch-mode-distro-basic-smithi/5433164">https://pulpito.ceph.com/gregf-2020-09-14_05:25:36-rados-wip-stretch-mode-distro-basic-smithi/5433164</a></p>

bluestore - Bug #45133 (Resolved): BlueStore asserting on fs upgrade tests
https://tracker.ceph.com/issues/45133 - 2020-04-17T15:10:01Z - Greg Farnum <gfarnum@redhat.com>
<p>See for instance <a class="external" href="http://pulpito.front.sepia.ceph.com/gregf-2020-04-15_20:54:50-fs-wip-greg-testing-415-1-distro-basic-smithi/4957531">http://pulpito.front.sepia.ceph.com/gregf-2020-04-15_20:54:50-fs-wip-greg-testing-415-1-distro-basic-smithi/4957531</a></p>
<pre>2020-04-15T21:27:04.353 INFO:tasks.ceph.osd.0.smithi095.stderr:/build/ceph-16.0.0-779-g9da325c/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_prepare_ondisk_format_super(KeyValueDB::Transaction&)' thread 7f45ba1aed80 time 2020-04-15T21:27:04.351535+0000
2020-04-15T21:27:04.354 INFO:tasks.ceph.osd.0.smithi095.stderr:/build/ceph-16.0.0-779-g9da325c/src/os/bluestore/BlueStore.cc: 10916: FAILED ceph_assert(ondisk_format == latest_ondisk_format)
2020-04-15T21:27:04.358 INFO:tasks.ceph.osd.0.smithi095.stderr: ceph version 16.0.0-779-g9da325c (9da325c9a5f9c40917142095f49236282c250d8e) pacific (dev)
2020-04-15T21:27:04.358 INFO:tasks.ceph.osd.0.smithi095.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x154) [0x559c71a93504]
2020-04-15T21:27:04.358 INFO:tasks.ceph.osd.0.smithi095.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x559c71a936df]
2020-04-15T21:27:04.358 INFO:tasks.ceph.osd.0.smithi095.stderr: 3: (BlueStore::_prepare_ondisk_format_super(std::shared_ptr<KeyValueDB::TransactionImpl>&)+0x332) [0x559c71fca0a2]
2020-04-15T21:27:04.358 INFO:tasks.ceph.osd.0.smithi095.stderr: 4: (BlueStore::_upgrade_super()+0x4fe) [0x559c71fcbfae]
2020-04-15T21:27:04.358 INFO:tasks.ceph.osd.0.smithi095.stderr: 5: (BlueStore::_mount(bool, bool)+0x599) [0x559c7201f719]
2020-04-15T21:27:04.358 INFO:tasks.ceph.osd.0.smithi095.stderr: 6: (OSD::init()+0x4d1) [0x559c71b30031]
2020-04-15T21:27:04.358 INFO:tasks.ceph.osd.0.smithi095.stderr: 7: (main()+0x3da5) [0x559c71a9a8d5]
2020-04-15T21:27:04.359 INFO:tasks.ceph.osd.0.smithi095.stderr: 8: (__libc_start_main()+0xe7) [0x7f45b6fadb97]
2020-04-15T21:27:04.359 INFO:tasks.ceph.osd.0.smithi095.stderr: 9: (_start()+0x2a) [0x559c71aaf52a]
2020-04-15T21:27:04.359 INFO:tasks.ceph.osd.0.smithi095.stderr:2020-04-15T21:27:04.351+0000 7f45ba1aed80 -1 /build/ceph-16.0.0-779-g9da325c/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_prepare_ondisk_format_super(KeyValueDB::Transaction&)' thread 7f45ba1aed80 time 2020-04-15T21:27:04.351535+0000
2020-04-15T21:27:04.359 INFO:tasks.ceph.osd.0.smithi095.stderr:/build/ceph-16.0.0-779-g9da325c/src/os/bluestore/BlueStore.cc: 10916: FAILED ceph_assert(ondisk_format == latest_ondisk_format)
</pre>
<p>As far as I can tell this all looks kosher — we install nautilus, then upgrade to master branch, and this occurs 11 seconds after restarting the OSDs.</p>

bluestore - Feature #44978 (New): support "bad block" isolation
https://tracker.ceph.com/issues/44978 - 2020-04-07T16:11:51Z - Greg Farnum <gfarnum@redhat.com>
<p>A while ago we had a question at a meetup:</p>
<blockquote>
<p>Does BlueStore support bad block isolation?<br />In FileStore, if a file hit a bad block you could just move the file into a root "badblock" folder and it wouldn't bother you again.</p>
</blockquote>
<p>Sam suggested you <strong>might</strong> be able to do this by mounting BlueStore via FUSE and moving the object around, but we're not sure if that's actually possible. If it is, perhaps document that and close this ticket. If not, come up with a way to avoid using bad blocks on disk in the future (presumably by similarly leaking the space).</p>

bluestore - Bug #40557 (Duplicate): Rocksdb lookups slow until manual compaction
https://tracker.ceph.com/issues/40557 - 2019-06-26T15:01:10Z - Greg Farnum <gfarnum@redhat.com>
<p><a class="external" href="https://www.spinics.net/lists/ceph-users/msg53623.html">https://www.spinics.net/lists/ceph-users/msg53623.html</a><br />"[ceph-users] OSDs taking a long time to boot due to 'clear_temp_objects', even with fresh PGs"</p>
<p>I'm guessing this is due to a combination of RocksDB not being compacted to clear out many deleted objects, and the lookup issue fixed in github.com/ceph/ceph/pull/27627</p>
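<p>For reference, a hedged sketch of what "manual compaction" means at the RocksDB layer: a full-range CompactRange rewrites the SST files and drops the tombstones left behind by bulk deletes, which is what makes lookups fast again. Ceph drives this through its own tooling; this is only the underlying library call, against a standalone example database path:</p>
<pre>#include <cassert>
#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::DB *db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/demo-rocksdb", &db);
  assert(s.ok());

  // nullptr..nullptr means "the whole key range": rewrite everything and
  // discard deletion tombstones along the way.
  s = db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);
  assert(s.ok());

  delete db;
  return 0;
}</pre>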
<p>ceph-post-file: 1829bf40-cce1-4f65-8b35-384935d11446</p>

Ceph - Bug #38173 (Resolved): Remove helm+kubernetes install from ceph docs
https://tracker.ceph.com/issues/38173 - 2019-02-04T22:16:22Z - Greg Farnum <gfarnum@redhat.com>
<p>We have three install methods described on docs.ceph.com: ceph-deploy, manual, and "(KUBERNETES + HELM)". We should certainly remove the last one, as the project is not maintained or in anybody's future plans.</p>

bluestore - Bug #23141 (Resolved): BlueFS reports rotational journals if BDEV_WAL is not set
https://tracker.ceph.com/issues/23141 - 2018-02-26T21:32:45Z - Greg Farnum <gfarnum@redhat.com>
<p>This came in from two different users on the mailing list (<a class="external" href="https://www.spinics.net/lists/ceph-users/msg42873.html">https://www.spinics.net/lists/ceph-users/msg42873.html</a>, <a class="external" href="https://www.spinics.net/lists/ceph-users/msg42977.html">https://www.spinics.net/lists/ceph-users/msg42977.html</a>) and is important because the OSD makes different config choices when the journal is rotational; in these cases, it inserts default sleeps on backfill.</p>

bluestore - Bug #22161 (Resolved): bluestore: do not crash on over-large objects
https://tracker.ceph.com/issues/22161 - 2017-11-19T10:21:52Z - Greg Farnum <gfarnum@redhat.com>
<p>See "[ceph-users] Getting errors on erasure pool writes k=2, m=1" for one example. We've had a number of reports from users putting in overly-large objects and causing their OSDs to crash.</p>
<p>Obviously it would be good if we prevented them from doing this at higher layers (...which I thought we did, now?), but we also need BlueStore to start rejecting such IOs instead of crashing.</p>
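<p>As a sketch of the kind of higher-layer check meant here (a hypothetical helper, not the actual OSD or BlueStore code; the cap is modeled on the osd_max_object_size option), the write path would return an error such as -EFBIG instead of carrying on toward a crash:</p>
<pre>#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical guard: reject writes that would grow an object past a
// configured cap (cf. osd_max_object_size) rather than letting the backend
// fall over on them later.
static int check_object_write(uint64_t offset, uint64_t length,
                              uint64_t max_object_size) {
  if (offset + length < offset)            // offset+length overflowed
    return -EFBIG;
  if (offset + length > max_object_size)   // write would exceed the cap
    return -EFBIG;                         // surfaced to the client as an error
  return 0;
}

int main() {
  const uint64_t cap = 128ull << 20;       // 128 MiB, the usual default cap
  assert(check_object_write(0, 4096, cap) == 0);
  assert(check_object_write(cap - 1, 4096, cap) == -EFBIG);
  return 0;
}</pre>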
Ceph - Bug #21738 (Closed): audit for asserts on shutdown
https://tracker.ceph.com/issues/21738 - 2017-10-09T20:23:49Z - Greg Farnum <gfarnum@redhat.com>
<p>In various places we assert conditions are true on shutdown. We would really prefer not to do that for users when it's just internal bookkeeping. See <a href="https://tracker.ceph.com/issues/21737" title="Bug: OSDMap cache assert on shutdown (Resolved)">#21737</a> for one example.</p>
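<p>To make the pattern concrete, a small illustrative sketch (hypothetical classes, not actual Ceph code): a destructor that asserts on leftover internal bookkeeping takes the whole daemon down, whereas the friendlier variant just complains and cleans up:</p>
<pre>#include <cassert>
#include <cstdio>
#include <map>

// The pattern the audit is after: shutting down with leftover state aborts
// the process over pure bookkeeping.
struct AssertingCache {
  std::map<int, int> pinned;              // stand-in for internal bookkeeping
  ~AssertingCache() {
    assert(pinned.empty());               // crashes the daemon on shutdown
  }
};

// The friendlier shape: log loudly, clean up, keep shutting down.
struct ForgivingCache {
  std::map<int, int> pinned;
  ~ForgivingCache() {
    if (!pinned.empty()) {
      fprintf(stderr, "WARNING: %zu entries still pinned at shutdown\n",
              pinned.size());
      pinned.clear();
    }
  }
};

int main() {
  ForgivingCache ok;
  ok.pinned[1] = 1;                       // leftover bookkeeping: warns, does not abort
  // AssertingCache bad; bad.pinned[1] = 1;  // would abort in its destructor
  return 0;
}</pre>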
<p>I tried to do a general audit for this by going through destructors, but the best regex I could come up with was: <pre>git grep "~[a-zA-Z]*()[^}]*{[^}]*" -- src/</pre><br />That returned 783 results. Many of them are actually empty, but all my attempts to exclude those failed, for my regex-fu was not strong enough. (The plan I had was to find a regex that only returned non-empty implementations, then stick a "-A 20" or similar on it, and then search the results for "assert", which ought to have a high enough real hit ratio to be useful.)</p>

Ceph - Feature #20300 (New): run check_commands.sh as part of make check
https://tracker.ceph.com/issues/20300 - 2017-06-14T18:05:51Z - Greg Farnum <gfarnum@redhat.com>
<p>We don't have any programmatic way of making sure all our commands are actually tested in either make check or teuthology. We should.</p>

Ceph - Backport #20084 (Resolved): OSDs assert on shutdown when PGs are in snaptrim_wait() state
https://tracker.ceph.com/issues/20084 - 2017-05-25T20:55:04Z - Greg Farnum <gfarnum@redhat.com>
<p><a class="external" href="https://github.com/ceph/ceph/pull/15322">https://github.com/ceph/ceph/pull/15322</a></p> Ceph - Feature #19799 (New): Expose snap trim speed along with client and recovery IO rateshttps://tracker.ceph.com/issues/197992017-04-27T21:59:23ZGreg Farnumgfarnum@redhat.com
<p>Something like<br /><pre>"snaptrim io 144 MB/s, 721 objects/s"</pre></p>
<p>I'm not sure we actually want to expose an MB/s rate, since the limiter is really IOPS and that's all about total object counts, but the number can be an important component of cluster load, which we should surface to administrators.</p>

Ceph - Bug #19752 (Rejected): pg_pool_t::remove_snap() allocates a snapid that is never deleted
https://tracker.ceph.com/issues/19752 - 2017-04-24T19:56:22Z - Greg Farnum <gfarnum@redhat.com>
<p>That means the removed_snaps interval set is remarkably inefficient: the snapid allocated on each removal is never itself deleted, so it sits as a permanent gap between removed intervals and stops adjacent entries from ever merging. If you always create a snapshot before removing one, the set ends up at its worst-case size!</p>
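<p>A toy demonstration of that worst case, using a simplified stand-in for Ceph's interval_set (not the real class): alternating create/remove leaves one interval per removed snap, while removing a contiguous run of snapids collapses into a single interval:</p>
<pre>#include <cstdint>
#include <iostream>
#include <map>

// Toy interval set (start -> end, end exclusive) that merges overlapping and
// adjacent intervals on insert, roughly like Ceph's interval_set does.
struct ToyIntervalSet {
  std::map<uint64_t, uint64_t> m;

  void insert(uint64_t start, uint64_t end) {
    auto it = m.lower_bound(start);
    if (it != m.begin()) {
      auto prev = std::prev(it);
      if (prev->second >= start) {               // touches or overlaps the previous interval
        start = prev->first;
        end = std::max(end, prev->second);
        m.erase(prev);
      }
    }
    while (it != m.end() && it->first <= end) {  // swallow everything we now touch
      end = std::max(end, it->second);
      it = m.erase(it);
    }
    m[start] = end;
  }

  size_t num_intervals() const { return m.size(); }
};

int main() {
  // Worst case from the ticket: each removal also burns a snapid that never
  // joins removed_snaps, so the removed ids are 1, 3, 5, ... and no two
  // intervals are ever adjacent.
  ToyIntervalSet alternating;
  for (uint64_t snap = 1; snap <= 999; snap += 2)
    alternating.insert(snap, snap + 1);
  std::cout << alternating.num_intervals() << " intervals for 500 removals\n";  // 500

  // Contrast: removing a contiguous run collapses into one interval.
  ToyIntervalSet contiguous;
  for (uint64_t snap = 1; snap <= 500; ++snap)
    contiguous.insert(snap, snap + 1);
  std::cout << contiguous.num_intervals() << " interval for 500 removals\n";    // 1
  return 0;
}</pre>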
<p><a class="external" href="https://github.com/ceph/ceph/pull/14646">https://github.com/ceph/ceph/pull/14646</a></p> Ceph - Backport #19701 (Resolved): jewel: osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)https://tracker.ceph.com/issues/197012017-04-20T00:40:46ZGreg Farnumgfarnum@redhat.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/14596">https://github.com/ceph/ceph/pull/14596</a></p>