Bug #170

closed

null pointer dereference in journal_cow_dentry causes assertion failure

Added by Greg Farnum almost 14 years ago. Updated over 7 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Description

I've seen this a few times today.
Using the latest unstable servers (08afc8df680dc0cd5ad26f3f89152aa25a72b639) and master kclient (558d3499bd059d4534b1f2b69dc1c562acc733fe).
Running standard vstart (so a 3-MDS system).
Attempting to restart all the MDSes by means of "./init-ceph restart mds" results in more crashed MDSes, for different reasons involving refcounting.

Core was generated by `./cmds i a -c ceph.conf'.
Program terminated with signal 6, Aborted.
#0 0x00007f849c15df45 in *__GI_raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0 0x00007f849c15df45 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f849c160d80 in *__GI_abort () at abort.c:88
#2 0x00007f849c15708a in *__GI___assert_fail (assertion=0x6cfa70 "oldfirst == dnl->get_inode()->first", file=<value optimized out>, line=1352, function=0x6d30a0 "void MDCache::journal_cow_dentry(Mutation, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)") at assert.c:78
#3 0x0000000000526a4d in MDCache::journal_cow_dentry (this=0x2625170, mut=0x7f8471454760, metablob=<value optimized out>, dn=0x53fa990, follows=..., pcow_inode=0x0, dnl=0x0) at mds/MDCache.cc:1352
#4 0x0000000000526be9 in MDCache::journal_dirty_inode (this=0x7eb0, mut=0x7eb3, metablob=0x7f84718d5d18, in=0x7f848eadf290, follows=<value optimized out>) at mds/MDCache.cc:1390
#5 0x000000000057689c in Locker::_do_cap_update (this=0x2620b60, in=0x7f848eadf290, cap=<value optimized out>, dirty=<value optimized out>, follows=..., m=<value optimized out>, ack=0x0) at mds/Locker.cc:2087
#6 0x000000000057808b in Locker::handle_client_caps (this=0x2620b60, m=0x7f847228ae30) at mds/Locker.cc:1796
#7 0x000000000049e995 in MDS::_dispatch (this=0x2622660, m=0x7f847228ae30) at mds/MDS.cc:1427
#8 0x000000000049ee6d in MDS::ms_dispatch (this=0x2622660, m=0x7f847228ae30) at mds/MDS.cc:1281
#9 0x00000000004801d9 in Messenger::ms_deliver_dispatch (this=<value optimized out>) at msg/Messenger.h:97
#10 SimpleMessenger::dispatch_entry (this=<value optimized out>) at msg/SimpleMessenger.cc:332
#11 0x00000000004739dc in SimpleMessenger::DispatchThread::entry (this=0x2623f70) at msg/SimpleMessenger.h:494
#12 0x000000000048523a in Thread::_entry_func (arg=0x7eb0) at ./common/Thread.h:39
#13 0x00007f849cfd173a in start_thread (arg=<value optimized out>) at pthread_create.c:300
#14 0x00007f849c1f769d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()
(gdb) up
#1 0x00007f849c160d80 in *__GI_abort () at abort.c:88
88 abort.c: No such file or directory.
in abort.c
(gdb)
#2 0x00007f849c15708a in *__GI___assert_fail (assertion=0x6cfa70 "oldfirst == dnl->get_inode()->first", file=<value optimized out>, line=1352, function=0x6d30a0 "void MDCache::journal_cow_dentry(Mutation, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)") at assert.c:78
78 assert.c: No such file or directory.
in assert.c
(gdb)
#3 0x0000000000526a4d in MDCache::journal_cow_dentry (this=0x2625170, mut=0x7f8471454760, metablob=<value optimized out>, dn=0x53fa990, follows=..., pcow_inode=0x0, dnl=0x0) at mds/MDCache.cc:1352
Current language: auto
The current source language is "auto; currently c++".
(gdb) p oldfirst
$1 = <value optimized out>
(gdb) p dnl
$2 = (CDentry::linkage_t *) 0x0
(gdb)

Actions #1

Updated by Sage Weil almost 14 years ago

this is actually a failed assertion, not a null deref. it looks like gdb is having trouble resolving the symbols properly or something, making it look like dnl is null. (and dnl->is_primary() strongly implies dnl->inode is non-null.)

hmm.. do you have matching mds logs for this? i was fiddling with the ->first stuff for replicas yesterday, that may be related in some way.

Actions #2

Updated by Greg Farnum almost 14 years ago

Unfortunately I don't -- on Yehuda's suggestion I recompiled with optimization off and have been trying to reproduce it, but I keep running into other issues or just not hitting any problems so far.
Will keep trying, though!

Actions #3

Updated by Sage Weil almost 14 years ago

  • Status changed from New to Rejected

Actions #4

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
