Bug #62918
All write IO blocks but no flush is triggered
Description
If the memory page is large, _maybe_wait_for_writeback may block because the number of dirty bufferheads exceeds its limit while the dirty data size is still smaller than the target dirty size, so flusher_entry will not run a flush and all write IO blocks.
Updated by Venky Shankar 7 months ago
- Project changed from CephFS to RADOS
Jack Lv wrote:
If the memory page is large, _maybe_wait_for_writeback may block because the number of dirty bufferheads exceeds its limit while the dirty data size is still smaller than the target dirty size, so flusher_entry will not run a flush and all write IO blocks.
Do you have a reproducer? Going by your description, setting some ObjectCacher config lower could possibly reproduce this?
Also, this is more suited for the RADOS project.
Updated by Radoslaw Zarzynski 7 months ago
- Status changed from New to Need More Info
Updated by Jack Lv 6 months ago
Hi, run ceph-fuse on an OS that has a 64k (or 32k, etc.) memory page, then do multithreaded small-IO (4k) writes on the ceph-fuse directory; it is easy to reproduce (use a larger client_oc_max_dirty_age).
When the number of dirty bufferheads exceeds the limit but the dirty data size does not, flusher_entry just does nothing even though all IOs have blocked at _maybe_wait_for_writeback, which seems unreasonable.
void ObjectCacher::_maybe_wait_for_writeback(uint64_t len,
                                             ZTracer::Trace *trace)
{
  ceph_assert(ceph_mutex_is_locked(lock));
  ceph::mono_time start = ceph::mono_clock::now();
  int blocked = 0;
  // wait for writeback?
  //  - wait for dirty and tx bytes (relative to the max_dirty threshold)
  //  - do not wait for bytes other waiters are waiting on. this means that
  //    threads do not wait for each other. this effectively allows the cache
  //    size to balloon proportional to the data that is in flight.
  uint64_t max_dirty_bh = max_dirty >> BUFFER_MEMORY_WEIGHT;
Updated by Radoslaw Zarzynski 6 months ago
The BUFFER_MEMORY_WEIGHT is just an alias for CEPH_PAGE_SHIFT:

./osdc/ObjectCacher.cc:#define BUFFER_MEMORY_WEIGHT CEPH_PAGE_SHIFT  // memory usage of BufferHead, count in (1<<n)

which in turn is:

namespace ceph {
  // these are in common/page.cc
  extern unsigned _page_size;
  extern unsigned long _page_mask;
  extern unsigned _page_shift;
}
#endif

#define CEPH_PAGE_SIZE ceph::_page_size
#define CEPH_PAGE_MASK ceph::_page_mask
#define CEPH_PAGE_SHIFT ceph::_page_shift
These globals are calculated the following way:

namespace ceph {

  // page size crap, see page.h
  int _get_bits_of(int v) {
    int n = 0;
    while (v) {
      n++;
      v = v >> 1;
    }
    return n;
  }

#ifdef _WIN32
  unsigned _get_page_size() {
    SYSTEM_INFO system_info;
    GetSystemInfo(&system_info);
    return system_info.dwPageSize;
  }
  unsigned _page_size = _get_page_size();
#else
  unsigned _page_size = sysconf(_SC_PAGESIZE);
#endif
  unsigned long _page_mask = ~(unsigned long)(_page_size - 1);
  unsigned _page_shift = _get_bits_of(_page_size - 1);

}
Could you maybe provide the value of sysconf(_SC_PAGESIZE)?
Updated by Radoslaw Zarzynski 6 months ago
So your point is that the while's body isn't executed when max_dirty_bh is calculated for 64k pages, right?
void ObjectCacher::_maybe_wait_for_writeback(uint64_t len,
                                             ZTracer::Trace *trace)
{
  ceph_assert(ceph_mutex_is_locked(lock));
  ceph::mono_time start = ceph::mono_clock::now();
  int blocked = 0;
  // wait for writeback?
  //  - wait for dirty and tx bytes (relative to the max_dirty threshold)
  //  - do not wait for bytes other waiters are waiting on. this means that
  //    threads do not wait for each other. this effectively allows the cache
  //    size to balloon proportional to the data that is in flight.
  uint64_t max_dirty_bh = max_dirty >> BUFFER_MEMORY_WEIGHT;
  while (get_stat_dirty() + get_stat_tx() > 0 &&
         (((uint64_t)(get_stat_dirty() + get_stat_tx()) >=
           max_dirty + get_stat_dirty_waiting()) ||
          (dirty_or_tx_bh.size() >=
           max_dirty_bh + get_stat_nr_dirty_waiters()))) {
    if (blocked == 0) {
      trace->event("start wait for writeback");
    }
    ldout(cct, 10) << __func__ << " waiting for dirty|tx "
                   << (get_stat_dirty() + get_stat_tx()) << " >= max "
                   << max_dirty << " + dirty_waiting "
                   << get_stat_dirty_waiting() << dendl;
    flusher_cond.notify_all();
    stat_dirty_waiting += len;
    ++stat_nr_dirty_waiters;
    std::unique_lock l{lock, std::adopt_lock};
    stat_cond.wait(l);
    l.release();
    stat_dirty_waiting -= len;
    --stat_nr_dirty_waiters;
    ++blocked;
    ldout(cct, 10) << __func__ << " woke up" << dendl;
  }
Updated by Patrick Donnelly 5 months ago
- Status changed from Need More Info to Fix Under Review
- Pull request ID set to 53622