Bug #18046

filestore: sync_entry timed out, causing osd crash

Added by huang jun over 7 years ago. Updated almost 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello, I got the filestore sync_entry thread hung, and the OSD crashed after 600s.
Here is the sync_entry thread backtrace:

(gdb) thread 192
[Switching to thread 192 (Thread 0x7fe425457700 (LWP 1165749))]
#0 0x00007fe4340c3f4d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007fe4340c3f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe4340bfd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007fe4340bfc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000abe12f in Mutex::Lock (this=this@entry=0x7fe4318d0c88, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:92
#4 0x00000000009f5385 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5 FileJournal::committed_thru (this=0x7fe4318d0700, seq=43449) at os/FileJournal.cc:1601
#6 0x00000000009017ae in JournalingObjectStore::ApplyManager::commit_finish (this=this@entry=0x7fe4318651c8) at os/JournalingObjectStore.cc:247
#7 0x00000000008b7a00 in FileStore::sync_entry (this=0x7fe431865000) at os/FileStore.cc:3900
#8 0x00000000008efc7d in FileStore::SyncThread::entry (this=<optimized out>) at os/FileStore.h:173
#9 0x00007fe4340bddc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fe432ba117d in clone () from /lib64/libc.so.6
(gdb) f 5
#5 FileJournal::committed_thru (this=0x7fe4318d0700, seq=43449) at os/FileJournal.cc:1601
1601 Mutex::Locker locker(write_lock);
(gdb) p write_lock
$2 = {name = 0xc8a36c "FileJournal::write_lock", id = -1, recursive = false, lockdep = true, backtrace = false, _m = {__data = {__lock = 2, __count = 0, __owner = 1165754, __nusers = 1,
__kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\272\311\021\000\001\000\000\000\002", '\000' <repeats 22 times>,
__align = 2}, nlock = 1, locked_by = 140617846187776, cct = 0x7fe431835100, logger = 0x7fe431813d80}
(gdb) thread 492
[Switching to thread 492 (Thread 0x7fe424c56700 (LWP 1165754))]
#0 0x00007fe432bf0aae in __memcpy_ssse3_back () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fe432bf0aae in __memcpy_ssse3_back () from /lib64/libc.so.6
#1 0x0000000000b85389 in ceph::buffer::list::rebuild (this=0x7fe424c557f0, nb=...) at common/buffer.cc:1223
#2 0x0000000000b85b2d in ceph::buffer::list::rebuild_aligned_size_and_memory (this=this@entry=0x7fe424c55ce0, align_size=4096, align_memory=4096) at common/buffer.cc:1271
#3 0x0000000000b85c77 in ceph::buffer::list::rebuild_aligned (this=this@entry=0x7fe424c55ce0, align=<optimized out>) at common/buffer.cc:1233
#4 0x0000000000b85c8e in ceph::buffer::list::rebuild_page_aligned (this=this@entry=0x7fe424c55ce0) at common/buffer.cc:1280
#5 0x00000000009ead1e in FileJournal::align_bl (this=this@entry=0x7fe4318d0700, pos=7729348608, bl=...) at os/FileJournal.cc:960
#6 0x00000000009f2ca5 in FileJournal::do_write (this=this@entry=0x7fe4318d0700, bl=...) at os/FileJournal.cc:1020
#7 0x00000000009f73eb in FileJournal::write_thread_entry (this=0x7fe4318d0700) at os/FileJournal.cc:1245
#8 0x00000000008ed8dd in FileJournal::Writer::entry (this=<optimized out>) at os/FileJournal.h:338
#9 0x00007fe4340bddc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fe432ba117d in clone () from /lib64/libc.so.6
(gdb) f 5
#5 0x00000000009ead1e in FileJournal::align_bl (this=this@entry=0x7fe4318d0700, pos=7729348608, bl=...) at os/FileJournal.cc:960
960 bl.rebuild_page_aligned();
(gdb) p bl
$3 = (ceph::bufferlist &) @0x7fe424c55ce0: {_buffers = empty std::list, _len = 210534400, _memcopy_count = 0, append_buffer = {_raw = 0x7fe429046080, _off = 0, _len = 3904}, last_p = {
bl = 0x7fe424c55ce0, ls = 0x7fe424c55ce0, off = 0, p = {_raw = , _off = 688152704, _len = 32740}, p_off = 0}, static CLAIM_DEFAULT = 0, static CLAIM_ALLOW_NONSHAREABLE = 1}

We use jemalloc-3.6.0-1.el7.x86_64 as the memory allocator.
The Ceph version is 0.94.5.
OS version:
Linux server2 4.1.12-37.5.1.el7uek.x86_64 #2 SMP Wed Jun 15 14:46:02 CST 2016 x86_64 x86_64 x86_64 GNU/Linux
What could cause this problem? Thank you.
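
The two backtraces show the commit path (sync_entry -> commit_finish -> committed_thru) blocked on FileJournal::write_lock while the journal write thread holds that same lock inside align_bl()/rebuild_page_aligned() on a ~200 MB bufferlist; the 600s in the report is consistent with the filestore commit timeout aborting the OSD when a sync takes too long. A minimal, self-contained sketch of that pattern (illustrative only, not Ceph code; the sleep stands in for the stalled rebuild):

// Illustrative sketch (not Ceph code): one thread holds a lock across a long
// stall, the way write_thread_entry holds FileJournal::write_lock during
// align_bl(), while a second thread (standing in for sync_entry ->
// committed_thru) can do nothing but wait for the same lock.
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex write_lock;  // stand-in for FileJournal::write_lock

void writer() {
  std::lock_guard<std::mutex> l(write_lock);
  // Stand-in for a stalled rebuild_page_aligned(): the lock stays held for the
  // entire stall (allocation plus a ~200 MB copy in the real backtrace).
  std::this_thread::sleep_for(std::chrono::seconds(2));
}

void committer() {
  auto start = std::chrono::steady_clock::now();
  std::lock_guard<std::mutex> l(write_lock);  // blocks until writer releases it
  auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
  std::cout << "committer waited " << waited.count() << " ms for write_lock\n";
}

int main() {
  std::thread w(writer);
  std::this_thread::sleep_for(std::chrono::milliseconds(100));  // let writer take the lock first
  std::thread c(committer);
  w.join();
  c.join();
  return 0;
}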

#1

Updated by huang jun over 7 years ago

Can we do this?

diff --git a/src/os/filestore/FileJournal.cc b/src/os/filestore/FileJournal.cc
index a0985ff..016f606 100644
--- a/src/os/filestore/FileJournal.cc
+++ b/src/os/filestore/FileJournal.cc
@@ -1045,13 +1045,15 @@ void FileJournal::do_write(bufferlist& bl)
   off64_t pos = write_pos;

   // Adjust write_pos
-  align_bl(pos, bl);
   write_pos += bl.length();
   if (write_pos >= header.max_size)
     write_pos = write_pos - header.max_size + get_top();

   write_lock.Unlock();

+  // align bl after unlock the write_lock
+  align_bl(pos, bl);
+
#2

Updated by Samuel Just over 7 years ago

Hmm, I'm not sure about that. Why would align_bl take that long to begin with? Is this reproducible? If so, can you add debugging to determine what is being passed in?
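
One way to approach that question outside of Ceph is a small, self-contained microbenchmark (illustrative only; the 210534400-byte size is taken from the bufferlist in the core dump above) that separates the cost of the page-aligned allocation from the cost of the copy, roughly the work rebuild_page_aligned() performs while write_lock is held:

// Illustrative microbenchmark: time a ~200 MB page-aligned allocation and the
// memcpy into it separately.
#include <chrono>
#include <cstring>
#include <iostream>
#include <stdlib.h>  // posix_memalign
#include <vector>

int main() {
  const size_t len = 210534400;     // _len of the bufferlist in the core dump
  std::vector<char> src(len, 'x');  // stand-in for the unaligned buffers

  auto t0 = std::chrono::steady_clock::now();
  void* dst = nullptr;
  if (posix_memalign(&dst, 4096, len) != 0) {
    std::cerr << "allocation failed\n";
    return 1;
  }
  auto t1 = std::chrono::steady_clock::now();
  std::memcpy(dst, src.data(), len);
  auto t2 = std::chrono::steady_clock::now();

  using ms = std::chrono::duration<double, std::milli>;
  std::cout << "alloc: " << ms(t1 - t0).count() << " ms, "
            << "copy: " << ms(t2 - t1).count() << " ms\n";
  free(dst);
  return 0;
}

If the allocation dominates on the affected host, that points at the allocator or the kernel's memory-reclaim behavior rather than the copy itself.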

#3

Updated by huang jun over 7 years ago

It may be related to our configuration: we set vm.zone_reclaim_mode=3, which will block memory allocation if there is not enough free memory in the local zone.
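
For reference, a small standalone sketch (illustrative only; it simply reads the standard procfs path) that reports the current value and decodes the bitmask described in the kernel's vm sysctl documentation:

// Illustrative: read vm.zone_reclaim_mode and decode the bitmask
// (1 = reclaim in the local zone, 2 = write out dirty pages during reclaim,
//  4 = swap pages during reclaim).
#include <fstream>
#include <iostream>

int main() {
  std::ifstream f("/proc/sys/vm/zone_reclaim_mode");
  int mode = -1;
  if (!(f >> mode)) {
    std::cerr << "could not read /proc/sys/vm/zone_reclaim_mode\n";
    return 1;
  }
  std::cout << "vm.zone_reclaim_mode = " << mode << "\n";
  if (mode & 1) std::cout << "  local zone reclaim enabled\n";
  if (mode & 2) std::cout << "  dirty pages written out during reclaim\n";
  if (mode & 4) std::cout << "  pages swapped during reclaim\n";
  return 0;
}

With mode 3, an allocation in a nearly full local zone can block on reclaim and dirty-page writeback instead of falling back to another zone, which fits a long stall inside align_bl() while write_lock is held.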

#4

Updated by huang jun over 7 years ago

We changed vm.zone_reclaim_mode to 0 and ran the same tests; the problem has not turned up since.

#5

Updated by Greg Farnum almost 7 years ago

  • Status changed from New to Closed

I don't think we can do anything at a micro level about running out of memory inside of mutexes.
