Bug #41346

closed

mds: MDSIOContextBase instance leak

Added by Xuehan Xu over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: nautilus,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From time to time, we see the MDS crash when shutting down:

#0  0x00002adc071141a0 in std::ostream::operator<<(unsigned int) () from /lib64/libstdc++.so.6
#1  0x00002adbfc818169 in operator<< (out=..., e=...) at /home/xuxuehan/ceph-orig/src/common/escape.cc:278
#2  0x00002adbfc6796c3 in ceph::JSONFormatter::print_quoted_string (this=0x7ffd366c74b0, s=...) at /home/xuxuehan/ceph-orig/src/common/Formatter.cc:167
#3  0x00002adbfc679e4f in ceph::JSONFormatter::add_value (this=0x7ffd366c74b0, name=0x55a09b79d518 "entity_name", val=..., quoted=true)
    at /home/xuxuehan/ceph-orig/src/common/Formatter.cc:280
#4  0x00002adbfc679f19 in ceph::JSONFormatter::dump_string (this=0x7ffd366c74b0, name=0x55a09b79d518 "entity_name", s=...) at /home/xuxuehan/ceph-orig/src/common/Formatter.cc:301
#5  0x000055a09b6913b7 in handle_fatal_signal (signum=11) at /home/xuxuehan/ceph-orig/src/global/signal_handler.cc:200
#6  <signal handler called>
#7  0x00002adc06610c30 in pthread_mutex_lock () from /lib64/libpthread.so.0
#8  0x00002adbfc6eae4f in __gthread_mutex_lock (__mutex=0x38) at /opt/rh/devtoolset-8/root/usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:748
#9  0x00002adbfc6eaecc in std::mutex::lock (this=0x38) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:103
#10 0x00002adbfc6ec765 in std::unique_lock<std::mutex>::lock (this=0x7ffd366cd5a0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:267
#11 0x00002adbfc6ebed6 in std::unique_lock<std::mutex>::unique_lock (this=0x7ffd366cd5a0, __m=...) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:197
#12 0x00002adbfcaef351 in ceph::logging::Log::submit_entry(ceph::logging::Entry&&) (this=0x0, 
    e=<unknown type in /home/xuxuehan/ceph-orig/build/lib/libceph-common.so.0, CU 0x3c1c626, DIE 0x3c6a97c>) at /home/xuxuehan/ceph-orig/src/log/Log.cc:180
#13 0x000055a09b5a31ac in elist<MDSIOContextBase*>::~elist (this=0x55a09bbfe040 <MDSIOContextBase::ctx_list>, __in_chrg=<optimized out>) at /home/xuxuehan/ceph-orig/src/include/elist.h:95
#14 0x00002adc078d5c29 in __run_exit_handlers () from /lib64/libc.so.6
#15 0x00002adc078d5c77 in exit () from /lib64/libc.so.6
#16 0x00002adc078be49c in __libc_start_main () from /lib64/libc.so.6
#17 0x000055a09b0ecb19 in _start ()

After debugging, we believe this is due to MDSIOContextBase::complete not deleting itself during the shutdown process.
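
The leak described above can be illustrated with a minimal, self-contained sketch. This is not the Ceph source and not the merged fix: `Registry`, `SketchContext`, `complete_leaky`, `leaked_after`, and the `stopping` flag are hypothetical names standing in for the global `MDSIOContextBase::ctx_list`, the context class, and the daemon's shutdown state. It assumes the problem is an early return in `complete()` that skips `delete this` while stopping, which leaves the object registered in a global list that is then found non-empty when static destructors run at exit (frames #13 to #15 in the trace).

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for the global list of live contexts
// (the trace shows an elist<MDSIOContextBase*> named ctx_list).
struct Registry {
  std::set<const void*> live;
  std::size_t size() const { return live.size(); }
};

inline Registry& registry() {
  static Registry r;
  return r;
}

// Hypothetical self-deleting context: registers itself on construction
// and unregisters in its destructor.
class SketchContext {
public:
  SketchContext() { registry().live.insert(this); }
  virtual ~SketchContext() { registry().live.erase(this); }

  // Assumed buggy shape: returns early while stopping, so the object
  // is never deleted and stays in the registry.
  void complete_leaky(int r, bool stopping) {
    if (stopping)
      return;  // leak: no delete on this path
    finish(r);
    delete this;
  }

  // Assumed fixed shape: skips the callback while stopping,
  // but deletes the object on every path.
  void complete(int r, bool stopping) {
    if (!stopping)
      finish(r);
    delete this;
  }

protected:
  virtual void finish(int r) { (void)r; }
};

// Completes one context and reports how many entries it left behind
// in the registry (0 or 1).
std::size_t leaked_after(bool use_fixed, bool stopping) {
  const std::size_t before = registry().size();
  SketchContext* c = new SketchContext();
  if (use_fixed)
    c->complete(0, stopping);
  else
    c->complete_leaky(0, stopping);
  const std::size_t leaked = registry().size() - before;
  if (leaked)
    delete c;  // clean up so the sketch itself does not leak
  return leaked;
}
```

Calling `leaked_after(false, true)` exercises the leaky path during shutdown and reports one leftover entry, while `leaked_after(true, true)` reports none. In the real daemon the leftover entry would be discovered only when the list's destructor runs during `exit()`, by which point the logging subsystem may already be gone, consistent with `this=0x0` in frame #12.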


Related issues 3 (0 open, 3 closed)

Related to CephFS - Bug #44295: mds: MDCache.cc: 6400: FAILED ceph_assert(r == 0 || r == -2) (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #41851: nautilus: mds: MDSIOContextBase instance leak (Resolved, Prashant D)
Copied to CephFS - Backport #41852: mimic: mds: MDSIOContextBase instance leak (Resolved, Prashant D)
Actions #1

Updated by Patrick Donnelly over 4 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Zheng Yan
  • Start date deleted (08/20/2019)
  • Backport set to nautilus,mimic
Actions #2

Updated by Patrick Donnelly over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #3

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41851: nautilus: mds: MDSIOContextBase instance leak added
Actions #4

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41852: mimic: mds: MDSIOContextBase instance leak added
Actions #5

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #6

Updated by Patrick Donnelly about 4 years ago

  • Related to Bug #44295: mds: MDCache.cc: 6400: FAILED ceph_assert(r == 0 || r == -2) added