Ceph : Issues - https://tracker.ceph.com/ - 2021-10-18T02:27:18Z
rbd - Bug #52962 (Fix Under Review): rbd: increase overlap range when overlap occurs
https://tracker.ceph.com/issues/52962 - 2021-10-18T02:27:18Z - jianpeng ma <jianpeng.ma@intel.com>
We use BlockGuard to avoid reordering when requests overlap. But the following case doesn't work:

a: write(3, 5)
b: write(4, 9)
c: write(8, 11)

b overlaps with a, so b is detained behind a. c overlaps with b, so c should be detained behind b. But currently the guarded overlap range is not extended when b overlaps a, so c is not detained and can be issued before b. We should extend the overlap range from (3, 5) to (3, 9) when b overlaps a.
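A minimal sketch of the proposed behavior (hypothetical names, not the actual BlockGuard code): when a detained request itself overlaps the guarded extent, the extent grows to cover it, so later requests that only overlap the newcomer still queue behind it.

#include <algorithm>
#include <cstdint>
#include <deque>

struct GuardedExtent {
  uint64_t begin, end;       // [begin, end) currently guarded
  std::deque<int> detained;  // request ids queued behind this extent
};

// Detain request `id` covering [begin, end) against `g`.
// Returns false if there is no overlap and the request may proceed.
bool detain(GuardedExtent &g, int id, uint64_t begin, uint64_t end) {
  if (begin >= g.end || end <= g.begin)
    return false;
  g.detained.push_back(id);
  // The fix this ticket proposes: grow the guarded range, e.g. from
  // (3,5) to (3,9) when b=write(4,9) is detained behind a=write(3,5),
  // so that c=write(8,11) is also detained instead of being issued
  // before b.
  g.begin = std::min(g.begin, begin);
  g.end = std::max(g.end, end);
  return true;
}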
RADOS - Feature #49089 (Fix Under Review): msg: add new func support_reencode
https://tracker.ceph.com/issues/49089 - 2021-02-02T08:13:27Z - jianpeng ma <jianpeng.ma@intel.com>

Currently we use Messenger::ms_can_fast_dispatch to verify whether a Message supports reencode. Now we add a new API on Message to report reencode support directly.
Why MOSDMap can't support reencode: MOSDMap::encode_payload clears its source data, see
https://github.com/ceph/ceph/blob/master/src/messages/MOSDMap.h#L116
https://github.com/ceph/ceph/blob/master/src/messages/MOSDMap.h#L139
So it can't be re-encoded.
But other messages can be re-encoded, so we add a new function to report reencode support.
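A minimal sketch of the new API (assuming the shape implied by the ticket title; not the actual patch):

// Sketch only: a per-message predicate replacing the indirect
// ms_can_fast_dispatch check.
struct Message {
  virtual ~Message() = default;
  // By default a message's payload can be encoded again for a peer.
  virtual bool support_reencode() const { return true; }
};

struct MOSDMap : Message {
  // encode_payload() clears the source data (see the MOSDMap.h lines
  // linked above), so encoding a second time would produce a wrong
  // (empty) payload.
  bool support_reencode() const override { return false; }
};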
Ceph - Bug #41216 (Resolved): os/bluestore: Don't forget to decrement kv_submitted_waiters.
https://tracker.ceph.com/issues/41216 - 2019-08-13T01:18:42Z - jianpeng ma <jianpeng.ma@intel.com>

In flush_all_but_last, the function forgets to decrement kv_submitted_waiters when it returns on the condition "it->state >= TransContext::STATE_KV_SUBMITTED":

void flush_all_but_last() {
  std::unique_lock l(qlock);
  assert(q.size() >= 1);
  while (true) {
    // set flag before the check because the condition
    // may become true outside qlock, and we need to make
    // sure those threads see waiters and signal qcond.
    ++kv_submitted_waiters;
    if (q.size() <= 1) {
      --kv_submitted_waiters;
      return;
    } else {
      auto it = q.rbegin();
      it++;
      if (it->state >= TransContext::STATE_KV_SUBMITTED) {
        return;  // bug: kv_submitted_waiters is not decremented here
      }
    }
    qcond.wait(l);
    --kv_submitted_waiters;
  }
}
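A minimal sketch of the fix (assuming the loop above is otherwise unchanged): mirror the decrement done in the q.size() <= 1 branch on the other early-return path.

if (it->state >= TransContext::STATE_KV_SUBMITTED) {
  --kv_submitted_waiters;  // the decrement the original code forgot
  return;
}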
Ceph - Bug #39643 (In Progress): make jemalloc/tcmalloc work.
https://tracker.ceph.com/issues/39643 - 2019-05-09T08:26:53Z - jianpeng ma <jianpeng.ma@intel.com>

Commit 2e01287 made libcommon/libceph-common responsible for pulling in the allocator. But in fact libcommon/libceph-common did not link ALLOC_LIBS, so other binaries could not use jemalloc/tcmalloc.
Ceph - Bug #39623 (Resolved): make cluster_network work well.
https://tracker.ceph.com/issues/39623 - 2019-05-08T03:26:59Z - jianpeng ma <jianpeng.ma@intel.com>

This temporary parameter makes the address zero, so cluster_addr ends up equal to public_addr and cluster_network is effectively disabled.
bluestore - Bug #24761 (Resolved): set correct shard for existing Collections.
https://tracker.ceph.com/issues/24761 - 2018-07-03T23:29:03Z - jianpeng ma <jianpeng.ma@intel.com>

For an existing Collection, the constructor is called in _open_collections, but m_finisher_num is not yet set up when bluestore_shard_finishers is enabled.
So move the m_finisher_num setup before _open_collections.
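A minimal sketch of the ordering fix (hypothetical names, not the actual BlueStore code): the value a constructor reads must be assigned before any such object is constructed.

struct Store {
  int m_finisher_num = 1;

  void mount(bool shard_finishers, int shards) {
    // Fix: decide the shard count first ...
    m_finisher_num = shard_finishers ? shards : 1;
    // ... and only then open collections, whose Collection constructor
    // uses m_finisher_num to pick its finisher shard.
    _open_collections();
  }

  void _open_collections() { /* constructs existing Collections */ }
};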
Ceph - Bug #24569 (Fix Under Review): fix ordering between Thread::create and Thread::set_ioprio
https://tracker.ceph.com/issues/24569 - 2018-06-19T06:25:06Z - jianpeng ma <jianpeng.ma@intel.com>

In ThreadPool::start_threads:
int r = wt->set_ioprio(ioprio_class, ioprio_priority);
if (r < 0)
  lderr(cct) << " set_ioprio got " << cpp_strerror(r) << dendl;

wt->create(thread_name.c_str());
In fact, it should call create first.
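A sketch of the corrected order, reusing the lines quoted above: the thread has to exist before its I/O priority can be applied.

wt->create(thread_name.c_str());

int r = wt->set_ioprio(ioprio_class, ioprio_priority);
if (r < 0)
  lderr(cct) << " set_ioprio got " << cpp_strerror(r) << dendl;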
Ceph - Bug #24567 (Fix Under Review): fix a race between Thread::create and Thread::set_ioprio
https://tracker.ceph.com/issues/24567 - 2018-06-19T06:07:39Z - jianpeng ma <jianpeng.ma@intel.com>

We may execute the following sequence:
a) Thread::create/try_create
b) Thread::set_ioprio
But the thread's entry may not have executed yet after step a), in which case step b) cannot set the ioprio correctly. To fix this potential race, I add a mutex that makes step b) wait until entry has executed.
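A minimal standalone sketch of the proposed fix (not the actual Thread code): set_ioprio() waits on a condition variable that entry() signals once it is actually running.

#include <condition_variable>
#include <mutex>

class ThreadSketch {
  std::mutex m;
  std::condition_variable cv;
  bool entry_started = false;

public:
  // Runs first inside the new thread created in step a).
  void entry() {
    {
      std::lock_guard<std::mutex> l(m);
      entry_started = true;
    }
    cv.notify_all();
    // ... the thread's real work ...
  }

  // Step b): blocks until entry() has started, so the tid is valid.
  void set_ioprio(int cls, int prio) {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return entry_started; });
    // ... apply the ioprio to the now-running thread ...
  }
};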
bluestore - Bug #24561 (Resolved): if disableWAL is set, submit_transaction_sync meets an error.
https://tracker.ceph.com/issues/24561 - 2018-06-19T03:03:44Z - jianpeng ma <jianpeng.ma@intel.com>

If disableWAL is set, we hit this error:
rocksdb: submit_common error: Invalid argument: Sync writes has to enable WAL. code = 4 Rocksdb transaction:
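A standalone illustration of the RocksDB constraint being hit (placeholder path; not the bluestore code): sync writes and disableWAL are mutually exclusive.

#include <cassert>
#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

int main() {
  rocksdb::DB *db = nullptr;
  rocksdb::Options opts;
  opts.create_if_missing = true;
  assert(rocksdb::DB::Open(opts, "/tmp/wal-test", &db).ok());

  rocksdb::WriteBatch batch;
  batch.Put("key", "value");

  rocksdb::WriteOptions wo;
  wo.sync = true;        // what submit_transaction_sync requires
  wo.disableWAL = true;  // what the disableWAL option sets
  rocksdb::Status s = db->Write(wo, &batch);
  // Rejected: "Invalid argument: Sync writes has to enable WAL."
  assert(s.IsInvalidArgument());

  delete db;
  return 0;
}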
bluestore - Bug #24560 (Resolved): BitmapAllocator::_mark_allocated parameter overflow.
https://tracker.ceph.com/issues/24560 - 2018-06-19T02:20:05Z - jianpeng ma <jianpeng.ma@intel.com>

The length fields of struct interval_t and struct bluestore_pextent_t are uint32_t, but the len parameter of AllocatorLevel02::_mark_allocated is uint64_t. This can overflow and cause a bug.
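A standalone illustration of the overflow: a 64-bit length narrowed to a 32-bit field silently loses its high bits.

#include <cstdint>
#include <iostream>

int main() {
  uint64_t len = (1ull << 32) + 0x1000;          // a > 4 GiB length
  uint32_t stored = static_cast<uint32_t>(len);  // a uint32_t length
                                                 // field, as in interval_t
  std::cout << len << " -> " << stored << "\n";  // 4294971392 -> 4096
  return 0;
}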
Ceph - Bug #14920 (Resolved): If bluestore_sync_submit_transaction == true, ceph doesn't work.
https://tracker.ceph.com/issues/14920 - 2016-02-29T06:21:06Z - jianpeng ma <jianpeng.ma@intel.com>

If bluestore_sync_submit_transaction = true, the transaction is submitted in _txc_state_proc. The extent allocate/release info is updated later, in _kv_sync_thread, but because bluestore_sync_submit_transaction == true the transaction is not submitted again. So the extent allocate/release info is never stored in the kv store, and after a reboot fsck reports errors.
Ceph - Bug #14918 (Resolved): [BlueStore] If bluestore_sync_transaction == true, bluestore deadlocks.
https://tracker.ceph.com/issues/14918 - 2016-02-29T05:28:26Z - jianpeng ma <jianpeng.ma@intel.com>

In _txc_finish_io, we take the osr lock and call _txc_state_proc. If bluestore_sync_transaction is true and txc->wal_txn is null, it calls _txc_finish, which needs to take the osr lock again, so a deadlock occurs. The simple fix is to put the txc on kv_queue whether sync or async.
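A minimal standalone reproduction of the pattern (generic names, not the bluestore code): re-acquiring a non-recursive mutex on the same call path self-deadlocks.

#include <mutex>

std::mutex osr_lock;  // stands in for the osr's lock

void txc_finish() {
  std::lock_guard<std::mutex> l(osr_lock);  // blocks forever: already held
}

void txc_finish_io() {
  std::lock_guard<std::mutex> l(osr_lock);  // lock taken here ...
  txc_finish();                             // ... and requested again
}

int main() { txc_finish_io(); }  // hangs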
RADOS - Bug #14100 (New): the semantics of CEPH_OSD_OP_ZERO.
https://tracker.ceph.com/issues/14100 - 2015-12-17T07:23:45Z - jianpeng ma <jianpeng.ma@intel.com>

In ReplicatedPG::do_osd_ops, CEPH_OSD_OP_ZERO is currently handled as follows:
a) if the object doesn't exist, the op becomes a no-op.
b) if the object exists:
    if (offset + len >= ob.size)
      if (offset >= ob.size)
        op becomes a no-op
      else
        op becomes a truncate and ob.size becomes offset
    else
      zero(offset, len) like a normal write operation
These behaviors are fine for rbd. The native semantics of zero is to write the range (off, len) with zeros, so that a later read of that range returns zeros. For rbd we know the object size; if the object doesn't exist or is smaller, we logically append zero data at the end.
rados has an API, rados_write_op_zero, which uses zero as a write that doesn't carry data from client to osd. But for a normal object:
a) zeroing a non-existent object doesn't work
b) zero(offset >= ob.size) doesn't work (but write(offset >= ob.size) does)
c) zero(offset + len >= ob.size && offset < ob.size) doesn't work (zero becomes a truncate and ob.size = offset); a later read(offset, len) doesn't return any data.
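A small librados C++ sketch of the zero-as-write usage discussed here (pool and object names are placeholders):

#include <rados/librados.hpp>

int main() {
  librados::Rados cluster;
  cluster.init("admin");            // client id; reads default conf paths
  cluster.conf_read_file(nullptr);
  if (cluster.connect() < 0)
    return 1;

  librados::IoCtx ioctx;
  cluster.ioctx_create("rbd", ioctx);

  librados::ObjectWriteOperation op;
  op.zero(4096, 8192);              // zero [4096, 12288) without shipping
                                    // a data buffer from client to osd
  int r = ioctx.operate("some-object", &op);
  // Per the cases above: if the object is missing or the range reaches
  // past ob.size, this does not extend/zero the way a write() would.
  (void)r;
  cluster.shutdown();
  return 0;
}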
In fact, for rbd, zero means discard. But for rados, zero is a special write.
CephFS - Feature #12265 (New): Add IOhint in CephFS
https://tracker.ceph.com/issues/12265 - 2015-07-10T05:27:31Z - jianpeng ma <jianpeng.ma@intel.com>

This is the plan for the BP "Add IOhint in CephFS" (http://tracker.ceph.com/projects/ceph/wiki/Add_IOhint_in_CephFS).
The BP will be implemented in the following steps (a sketch of step c follows the list):
a: add iohint in the mds, mostly for the mds journal. Like the osd journal, most journal data is never read again and will be overwritten, so writing with DONTNEED makes sense.
b: add an iohint mount option. Different subdirectories have different purposes; for example, one directory may not need its data cached.
c: add iohint in the cephfs clients (kernel/client). The hope is that posix_fadvise can affect the whole ceph cluster rather than only the client kernel page cache.
d: maybe add iohint in cephfs repair (need to review the code).
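A minimal sketch of the client-side hint in step c (placeholder path): today posix_fadvise only affects the local page cache; the BP wants the hint to reach the rest of the cluster.

#include <fcntl.h>
#include <unistd.h>

int main() {
  int fd = open("/mnt/cephfs/journal.bin", O_RDONLY);
  if (fd < 0)
    return 1;
  // Journal-like data: written once, rarely read again, so ask the
  // kernel to drop cached pages for the whole file (len 0 = to EOF).
  posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
  close(fd);
  return 0;
}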
Ceph - Bug #9342 (Resolved): Different implementations of PGTransaction::get_bytes_written between erasure and replicated pools
https://tracker.ceph.com/issues/9342 - 2014-09-04T07:20:02Z - jianpeng ma <jianpeng.ma@intel.com>

l_osd_op_w_inb uses this value, so perf dump shows different values when the same object is written to an erasure pool and to a replicated pool. I think the ECTransaction implementation is the right one: it should record the size of the data the client wrote.