Actions
Bug #5105
closedmds/CInode.cc: 1996: FAILED assert(auth_pins >= 0)
Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
While trying to reproduce #4999, I collected this in an MDS.
I was running next branch (commit c80c6a032c) merged into wip-4999.
-16> 2013-05-17 15:53:06.697268 7ffff26e5700 10 monclient: tick -15> 2013-05-17 15:53:06.697302 7ffff26e5700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2013-05-17 15:52:36.697300) -14> 2013-05-17 15:53:06.697317 7ffff26e5700 10 monclient: renew subs? (now: 2013-05-17 15:53:06.697317; renew after: 2013-05-17 15:54:16.696127) -- no -13> 2013-05-17 15:53:06.744855 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 230) from client.4590 -12> 2013-05-17 15:53:06.746292 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4656 -11> 2013-05-17 15:53:06.756144 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4644 -10> 2013-05-17 15:53:06.757865 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 228) from client.4536 -9> 2013-05-17 15:53:06.765052 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4641 -8> 2013-05-17 15:53:06.830053 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4593 -7> 2013-05-17 15:53:06.872074 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 230) from client.4518 -6> 2013-05-17 15:53:06.923462 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 230) from client.4599 -5> 2013-05-17 15:53:07.022330 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4572 -4> 2013-05-17 15:53:07.084271 7ffff36e7700 3 mds.0.server handle_client_session client_session(request_renewcaps seq 229) from client.4632 -3> 2013-05-17 15:53:07.188288 7ffff14e1700 10 monclient: _send_mon_message to mon.cs30 at 172.17.131.30:6789/0 -2> 2013-05-17 15:53:09.779806 7ffff14e1700 2 mds.0.cache check_memory_usage total 4358556, rss 2717152, heap 4444, malloc 9336 mmap 0, baseline 8540, buffers 0, max 1048576, 795 / 808 inodes have caps, 1387 caps, 1.71658 caps per inode -1> 2013-05-17 15:53:09.779942 7ffff14e1700 5 mds.0.bal mds.0 epoch 290 load mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 6, cpu 2.27> 0> 2013-05-17 15:53:09.786259 7ffff36e7700 -1 mds/CInode.cc: In function 'virtual void CInode::auth_unpin(void*)' thread 7ffff36e7700 time 2013-05-17 15:53:09.780291 mds/CInode.cc: 1996: FAILED assert(auth_pins >= 0) ceph version 0.62-190-g2265d88 (2265d88dbc5bbafa097b8fc8fb8895493d572fef) 1: (CInode::auth_unpin(void*)+0x59d) [0x6b6a5d] 2: (Mutation::drop_local_auth_pins()+0x34) [0x56a114] 3: (Mutation::cleanup()+0x9) [0x56a899] 4: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x19f) [0x62ba7f] 5: (C_Locker_FileUpdate_finish::finish(int)+0x2c) [0x63c25c] 6: (Context::complete(int)+0xa) [0x4a97ea] 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x9d) [0x6662dd] 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20e) [0x6fc7ce] 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe84) [0x71e504] 10: (MDS::handle_core_message(Message*)+0x957) [0x4caf07] 11: (MDS::_dispatch(Message*)+0x2f) [0x4cafbf] 12: (MDS::ms_dispatch(Message*)+0x1fb) [0x4ccaeb] 13: (DispatchQueue::entry()+0x3e1) [0x83da11] 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x79a07d] 15: (()+0x77f1) [0x7ffff76507f1] 16: (clone()+0x6d) [0x7ffff6ca5ccd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels --- 0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 journaler 0/ 5 objectcacher 0/ 5 client 0/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 journal 0/ 0 ms 1/ 5 mon 0/10 monc 0/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 hadoop 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle -2/-2 (syslog threshold) -1/-1 (stderr threshold) max_recent 10000 max_new 1000 log_file /var/ceph/log/mds.cs29.log --- end dump of recent events --- 2013-05-17 15:53:10.015470 7ffff36e7700 -1 *** Caught signal (Aborted) ** in thread 7ffff36e7700 ceph version 0.62-190-g2265d88 (2265d88dbc5bbafa097b8fc8fb8895493d572fef) 1: /usr/bin/ceph-mds() [0x85ede9] 2: (()+0xf4a0) [0x7ffff76584a0] 3: (gsignal()+0x35) [0x7ffff6bf2885] 4: (abort()+0x175) [0x7ffff6bf4065] 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3be84bea7d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- begin dump of recent events --- 0> 2013-05-17 15:53:10.015470 7ffff36e7700 -1 *** Caught signal (Aborted) ** in thread 7ffff36e7700 ceph version 0.62-190-g2265d88 (2265d88dbc5bbafa097b8fc8fb8895493d572fef) 1: /usr/bin/ceph-mds() [0x85ede9] 2: (()+0xf4a0) [0x7ffff76584a0] 3: (gsignal()+0x35) [0x7ffff6bf2885] 4: (abort()+0x175) [0x7ffff6bf4065] 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3be84bea7d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels --- 0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 journaler 0/ 5 objectcacher 0/ 5 client 0/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 journal 0/ 0 ms 1/ 5 mon 0/10 monc 0/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 hadoop 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle -2/-2 (syslog threshold) -1/-1 (stderr threshold) max_recent 10000 max_new 1000 log_file /var/ceph/log/mds.cs29.log --- end dump of recent events ---
From gdb:
Core was generated by `/usr/bin/ceph-mds -i cs29 --pid-file /var/run/ceph/mds.cs29.pid -c /etc/ceph/ce'. Program terminated with signal 6, Aborted. #0 0x00007ffff765836b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42 warning: Source file is more recent than executable. 42 sig); Missing separate debuginfos, use: debuginfo-install gperftools-libs-2.0-3.el6.2.x86_64 libgcc-4.4.6-3.el6.x86_64 libstdc++-4.4.6-3.el6.x86_64 libuuid-2.17.2-12.4.el6.x86_64 nspr-4.9-1.el6.x86_64 nss-3.13.3-8.el6.x86_64 nss-softokn-3.12.9-11.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 nss-util-3.13.3-2.el6.x86_64 sqlite-3.6.20-1.el6.x86_64 (gdb) bt #0 0x00007ffff765836b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42 #1 0x000000000085f217 in reraise_fatal (signum=6) at global/signal_handler.cc:58 #2 handle_fatal_signal (signum=6) at global/signal_handler.cc:104 #3 <signal handler called> #4 0x00007ffff6bf2885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #5 0x00007ffff6bf4065 in abort () at abort.c:92 #6 0x0000003be84bea7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6 #7 0x0000003be84bcc06 in ?? () from /usr/lib64/libstdc++.so.6 #8 0x0000003be84bcc33 in std::terminate() () from /usr/lib64/libstdc++.so.6 #9 0x0000003be84bcd2e in __cxa_throw () from /usr/lib64/libstdc++.so.6 #10 0x0000000000755c09 in ceph::__ceph_assert_fail (assertion=0x1438000 "\001", file=0x19fe180 "pC\035\002", line=1996, func=0x88a0e0 "virtual void CInode::auth_unpin(void*)") at common/assert.cc:77 #11 0x00000000006b6a5d in CInode::auth_unpin (this=0x15b29e0, by=0x6d2fd280) at mds/CInode.cc:1996 #12 0x000000000056a114 in Mutation::drop_local_auth_pins (this=0x6d2fd280) at mds/Mutation.cc:93 #13 0x000000000056a899 in Mutation::cleanup (this=0x6d2fd280) at mds/Mutation.cc:165 #14 0x000000000062ba7f in Locker::file_update_finish (this=0x14d5fc0, in=0x15b29e0, mut=0x6d2fd280, share=true, client=..., cap=0x0, ack=0x0) at mds/Locker.cc:1589 #15 0x000000000063c25c in C_Locker_FileUpdate_finish::finish (this=<value optimized out>, r=<value optimized out>) at mds/Locker.cc:1571 #16 0x00000000004a97ea in Context::complete (this=0x71975500, r=<value optimized out>) at ./include/Context.h:41 #17 0x00000000006662dd in finish_contexts (cct=0x1438000, finished=<value optimized out>, result=0) at ./include/Context.h:78 #18 0x00000000006fc7ce in Journaler::_finish_flush (this=0x15509c0, r=<value optimized out>, start=<value optimized out>, stamp=<value optimized out>) at osdc/Journaler.cc:429 #19 0x000000000071e504 in Objecter::handle_osd_op_reply (this=0x1568000, m=0x69023e00) at osdc/Objecter.cc:1486 #20 0x00000000004caf07 in MDS::handle_core_message (this=0x1500b00, m=0x69023e00) at mds/MDS.cc:1733 #21 0x00000000004cafbf in MDS::_dispatch (this=0x1500b00, m=0x69023e00) at mds/MDS.cc:1857 #22 0x00000000004ccaeb in MDS::ms_dispatch (this=0x1500b00, m=0x69023e00) at mds/MDS.cc:1668 #23 0x000000000083da11 in ms_deliver_dispatch (this=0x15000e8) at msg/Messenger.h:566 #24 DispatchQueue::entry (this=0x15000e8) at msg/DispatchQueue.cc:119 #25 0x000000000079a07d in DispatchQueue::DispatchThread::entry (this=<value optimized out>) at msg/DispatchQueue.h:104 #26 0x00007ffff76507f1 in start_thread (arg=0x7ffff36e7700) at pthread_create.c:301 #27 0x00007ffff6ca5ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Updated by Sage Weil almost 11 years ago
- Project changed from Ceph to CephFS
probably a dup of #4832?
Updated by Zheng Yan almost 11 years ago
I think uncomment MDS_AUTHPIN_SET in src/mds/mdstypes.h would help
Updated by Sage Weil almost 11 years ago
- Status changed from New to Duplicate
Actions