Project

General

Profile

Actions

Bug #5105

closed

mds/CInode.cc: 1996: FAILED assert(auth_pins >= 0)

Added by Jim Schutt almost 11 years ago. Updated almost 11 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While trying to reproduce #4999, I collected this in an MDS.

I was running next branch (commit c80c6a032c) merged into wip-4999.

   -16> 2013-05-17 15:53:06.697268 7ffff26e5700 10 monclient: tick
   -15> 2013-05-17 15:53:06.697302 7ffff26e5700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2013-05-17 15:52:36.697300)
   -14> 2013-05-17 15:53:06.697317 7ffff26e5700 10 monclient: renew subs? (now: 2013-05-17 15:53:06.697317; renew after: 2013-05-17 15:54:16.696127) -- no
   -13> 2013-05-17 15:53:06.744855 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 230) from client.4590
   -12> 2013-05-17 15:53:06.746292 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4656
   -11> 2013-05-17 15:53:06.756144 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4644
   -10> 2013-05-17 15:53:06.757865 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 228) from client.4536
    -9> 2013-05-17 15:53:06.765052 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4641
    -8> 2013-05-17 15:53:06.830053 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4593
    -7> 2013-05-17 15:53:06.872074 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 230) from client.4518
    -6> 2013-05-17 15:53:06.923462 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 230) from client.4599
    -5> 2013-05-17 15:53:07.022330 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4572
    -4> 2013-05-17 15:53:07.084271 7ffff36e7700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4632
    -3> 2013-05-17 15:53:07.188288 7ffff14e1700 10 monclient: _send_mon_message to mon.cs30 at 172.17.131.30:6789/0
    -2> 2013-05-17 15:53:09.779806 7ffff14e1700  2 mds.0.cache check_memory_usage total 4358556, rss 2717152, heap 4444, malloc 9336 mmap 0, baseline 8540, buffers 0, max 1048576, 795 / 808 inodes have caps, 1387 caps, 1.71658 caps per inode
    -1> 2013-05-17 15:53:09.779942 7ffff14e1700  5 mds.0.bal mds.0 epoch 290 load mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 6, cpu 2.27>
     0> 2013-05-17 15:53:09.786259 7ffff36e7700 -1 mds/CInode.cc: In function 'virtual void CInode::auth_unpin(void*)' thread 7ffff36e7700 time 2013-05-17 15:53:09.780291
mds/CInode.cc: 1996: FAILED assert(auth_pins >= 0)

 ceph version 0.62-190-g2265d88 (2265d88dbc5bbafa097b8fc8fb8895493d572fef)
 1: (CInode::auth_unpin(void*)+0x59d) [0x6b6a5d]
 2: (Mutation::drop_local_auth_pins()+0x34) [0x56a114]
 3: (Mutation::cleanup()+0x9) [0x56a899]
 4: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x19f) [0x62ba7f]
 5: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x63c25c]
 6: (Context::complete(int)+0xa) [0x4a97ea]
 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x9d) [0x6662dd]
 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20e) [0x6fc7ce]
 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe84) [0x71e504]
 10: (MDS::handle_core_message(Message*)+0x957) [0x4caf07]
 11: (MDS::_dispatch(Message*)+0x2f) [0x4cafbf]
 12: (MDS::ms_dispatch(Message*)+0x1fb) [0x4ccaeb]
 13: (DispatchQueue::entry()+0x3e1) [0x83da11]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x79a07d]
 15: (()+0x77f1) [0x7ffff76507f1]
 16: (clone()+0x6d) [0x7ffff6ca5ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/ceph/log/mds.cs29.log
--- end dump of recent events ---
2013-05-17 15:53:10.015470 7ffff36e7700 -1 *** Caught signal (Aborted) **
 in thread 7ffff36e7700

 ceph version 0.62-190-g2265d88 (2265d88dbc5bbafa097b8fc8fb8895493d572fef)
 1: /usr/bin/ceph-mds() [0x85ede9]
 2: (()+0xf4a0) [0x7ffff76584a0]
 3: (gsignal()+0x35) [0x7ffff6bf2885]
 4: (abort()+0x175) [0x7ffff6bf4065]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3be84bea7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2013-05-17 15:53:10.015470 7ffff36e7700 -1 *** Caught signal (Aborted) **
 in thread 7ffff36e7700

 ceph version 0.62-190-g2265d88 (2265d88dbc5bbafa097b8fc8fb8895493d572fef)
 1: /usr/bin/ceph-mds() [0x85ede9]
 2: (()+0xf4a0) [0x7ffff76584a0]
 3: (gsignal()+0x35) [0x7ffff6bf2885]
 4: (abort()+0x175) [0x7ffff6bf4065]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3be84bea7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/ceph/log/mds.cs29.log
--- end dump of recent events ---

From gdb:

Core was generated by `/usr/bin/ceph-mds -i cs29 --pid-file /var/run/ceph/mds.cs29.pid -c /etc/ceph/ce'.
Program terminated with signal 6, Aborted.
#0  0x00007ffff765836b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42

warning: Source file is more recent than executable.
42                 sig);
Missing separate debuginfos, use: debuginfo-install gperftools-libs-2.0-3.el6.2.x86_64 libgcc-4.4.6-3.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64 libuuid-2.17.2-12.4.el6.x86_64 nspr-4.9-1.el6.x86_64 nss-3.13.3-8.el6.x86_64 nss-softokn-3.12.9-11.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 nss-util-3.13.3-2.el6.x86_64 sqlite-3.6.20-1.el6.x86_64
(gdb) bt
#0  0x00007ffff765836b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x000000000085f217 in reraise_fatal (signum=6) at global/signal_handler.cc:58
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
#3  <signal handler called>
#4  0x00007ffff6bf2885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5  0x00007ffff6bf4065 in abort () at abort.c:92
#6  0x0000003be84bea7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#7  0x0000003be84bcc06 in ?? () from /usr/lib64/libstdc++.so.6
#8  0x0000003be84bcc33 in std::terminate() () from /usr/lib64/libstdc++.so.6
#9  0x0000003be84bcd2e in __cxa_throw () from /usr/lib64/libstdc++.so.6
#10 0x0000000000755c09 in ceph::__ceph_assert_fail (assertion=0x1438000 "\001", file=0x19fe180 "pC\035\002", line=1996, func=0x88a0e0 "virtual void CInode::auth_unpin(void*)") at common/assert.cc:77
#11 0x00000000006b6a5d in CInode::auth_unpin (this=0x15b29e0, by=0x6d2fd280) at mds/CInode.cc:1996
#12 0x000000000056a114 in Mutation::drop_local_auth_pins (this=0x6d2fd280) at mds/Mutation.cc:93
#13 0x000000000056a899 in Mutation::cleanup (this=0x6d2fd280) at mds/Mutation.cc:165
#14 0x000000000062ba7f in Locker::file_update_finish (this=0x14d5fc0, in=0x15b29e0, mut=0x6d2fd280, share=true, client=..., cap=0x0, ack=0x0) at mds/Locker.cc:1589
#15 0x000000000063c25c in C_Locker_FileUpdate_finish::finish (this=<value optimized out>, r=<value optimized out>) at mds/Locker.cc:1571
#16 0x00000000004a97ea in Context::complete (this=0x71975500, r=<value optimized out>) at ./include/Context.h:41
#17 0x00000000006662dd in finish_contexts (cct=0x1438000, finished=<value optimized out>, result=0) at ./include/Context.h:78
#18 0x00000000006fc7ce in Journaler::_finish_flush (this=0x15509c0, r=<value optimized out>, start=<value optimized out>, stamp=<value optimized out>) at osdc/Journaler.cc:429
#19 0x000000000071e504 in Objecter::handle_osd_op_reply (this=0x1568000, m=0x69023e00) at osdc/Objecter.cc:1486
#20 0x00000000004caf07 in MDS::handle_core_message (this=0x1500b00, m=0x69023e00) at mds/MDS.cc:1733
#21 0x00000000004cafbf in MDS::_dispatch (this=0x1500b00, m=0x69023e00) at mds/MDS.cc:1857
#22 0x00000000004ccaeb in MDS::ms_dispatch (this=0x1500b00, m=0x69023e00) at mds/MDS.cc:1668
#23 0x000000000083da11 in ms_deliver_dispatch (this=0x15000e8) at msg/Messenger.h:566
#24 DispatchQueue::entry (this=0x15000e8) at msg/DispatchQueue.cc:119
#25 0x000000000079a07d in DispatchQueue::DispatchThread::entry (this=<value optimized out>) at msg/DispatchQueue.h:104
#26 0x00007ffff76507f1 in start_thread (arg=0x7ffff36e7700) at pthread_create.c:301
#27 0x00007ffff6ca5ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115


Related issues 1 (0 open1 closed)

Related to CephFS - Bug #4832: mds: failed auth_unpin assertResolvedSage Weil04/26/2013

Actions
Actions #1

Updated by Sage Weil almost 11 years ago

  • Project changed from Ceph to CephFS

probably a dup of #4832?

Actions #2

Updated by Sage Weil almost 11 years ago

  • Priority changed from Normal to High
Actions #3

Updated by Zheng Yan almost 11 years ago

I think uncomment MDS_AUTHPIN_SET in src/mds/mdstypes.h would help

Actions #4

Updated by Sage Weil almost 11 years ago

  • Status changed from New to Duplicate
Actions

Also available in: Atom PDF