Bug #20494

closed

cephfs_data_scan: try_remove_dentries_for_stray assertion failure

Added by Ivan Guan almost 7 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
High
Assignee:
Category:
fsck/damage handling
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
tools
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Using teuthology I ran data-scan.yaml. After the test_data_scan.py:test_parallel_execution test case completed, I wanted to delete all dentries under my mountpoint. The MDS dumped core when I ran "rm -rf *" under the mountpoint.

The stack trace is:
#0 0x00007f16a6b3bfcb in raise () from /lib64/libpthread.so.0
#1 0x00007f16a7c610d5 in reraise_fatal (signum=6) at global/signal_handler.cc:71
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:133
#3 <signal handler called>
#4 0x00007f16a55405f7 in raise () from /lib64/libc.so.6
#5 0x00007f16a5541ce8 in abort () from /lib64/libc.so.6
#6 0x00007f16a7d5b0c7 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7f16a7eead67 "dn->get_linkage()->is_null()",
file=file@entry=0x7f16a7f00492 "mds/CDir.cc", line=line@entry=697,
func=func@entry=0x7f16a7f02d40 <CDir::try_remove_dentries_for_stray()::__PRETTY_FUNCTION__> "void CDir::try_remove_dentries_for_stray()") at common/assert.cc:78
#7 0x00007f16a7b30860 in CDir::try_remove_dentries_for_stray (this=0x7f16b2904400) at mds/CDir.cc:697
#8 0x00007f16a7ab9b49 in StrayManager::__eval_stray (this=0x7f16b28c0b28, dn=dn@entry=0x7f16b2953c20, delay=<optimized out>) at mds/StrayManager.cc:575
#9 0x00007f16a7ab9fee in StrayManager::eval_stray (this=<optimized out>, dn=0x7f16b2953c20, delay=<optimized out>) at mds/StrayManager.cc:656
#10 0x00007f16a7a05e61 in put (by=-1003, this=0x7f16b2953c20) at mds/mdstypes.h:1521
#11 MutationImpl::drop_pins (this=0x7f16b2ab2d00) at mds/Mutation.cc:53
#12 0x00007f16a7a2bf28 in MDCache::request_cleanup (this=this@entry=0x7f16b28c0000, mdr=std::shared_ptr (count 3, weak 0) 0x7f16b2ab2d00) at mds/MDCache.cc:9003
#13 0x00007f16a7a2c3c1 in MDCache::request_finish (this=0x7f16b28c0000, mdr=std::shared_ptr (count 3, weak 0) 0x7f16b2ab2d00) at mds/MDCache.cc:8853
#14 0x00007f16a79b16a8 in Server::reply_client_request (this=this@entry=0x7f16b27f59d0, mdr=std::shared_ptr (count 3, weak 0) 0x7f16b2ab2d00,
reply=reply@entry=0x7f16b2b84b00) at mds/Server.cc:1210
#15 0x00007f16a79b22a1 in Server::respond_to_request (this=this@entry=0x7f16b27f59d0, mdr=std::shared_ptr (count 3, weak 0) 0x7f16b2ab2d00, r=r@entry=0)
at mds/Server.cc:1040
#16 0x00007f16a79bd4df in Server::_unlink_local_finish (this=0x7f16b27f59d0, mdr=std::shared_ptr (count 3, weak 0) 0x7f16b2ab2d00, dn=0x7f16b2952000,
straydn=0x7f16b2953c20, dnpv=4) at mds/Server.cc:5666
#17 0x00007f16a7bb054b in complete (r=0, this=0x7f16b27c1440) at include/Context.h:64
#18 MDSInternalContextBase::complete (this=0x7f16b27c1440, r=0) at mds/MDSContext.cc:30
#19 0x00007f16a7bb054b in complete (r=0, this=0x7f16b2af8c60) at include/Context.h:64
#20 MDSInternalContextBase::complete (this=0x7f16b2af8c60, r=0) at mds/MDSContext.cc:30
#21 0x00007f16a7bc7463 in C_MDL_Flushed::finish (this=0x7f16b291bd20, r=<optimized out>) at mds/MDLog.cc:350
#22 0x00007f16a7bb0874 in complete (r=0, this=0x7f16b291bd20) at include/Context.h:64
#23 MDSIOContextBase::complete (this=0x7f16b291bd20, r=0) at mds/MDSContext.cc:65
#24 0x00007f16a7c8a856 in Finisher::finisher_thread_entry (this=0x7f16b28b26e0) at common/Finisher.cc:68
#25 0x00007f16a6b34dc5 in start_thread () from /lib64/libpthread.so.0
#26 0x00007f16a560128d in clone () from /lib64/libc.so.6

ceph_version: jewel (ceph-v10.2.2)

Bug Description:
My mountpoint has one directory containing 25 files, like:
/subdir/0
/subdir/1
/subdir/2
...
/subdir/24

The root cause is that the subdir's fnode.nfiles is 1 after cephfs-data-scan completes. As everyone knows, all files are deleted before their parent directory 'subdir' is deleted. The crux is that when the first file is deleted, subdir's fnode.nfiles drops to 0, so the 'subdir' directory is treated as empty and purged prematurely, which leads to the MDS crash.
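To make the failure mode concrete, here is a minimal, hypothetical C++ model of the mismatch (the struct and member names are simplified stand-ins, not the real CDir/fnode code): the recovered nfiles count is 1 while 25 dentries actually exist, so the count hits 0 after the first unlink even though 24 live dentries remain, and the purge path's check on null linkages (the real `assert(dn->get_linkage()->is_null())` in CDir::try_remove_dentries_for_stray) would fail.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical minimal model of the bug; names are simplified from
// Ceph's CDir/fnode and this is NOT the real MDS code.
struct Dentry { bool null_linkage; };

struct Dir {
    int64_t nfiles;               // fnode.nfiles as recovered by data-scan
    std::vector<Dentry> dentries; // dentries actually present in the dirfrag

    void unlink_one() {
        // Unlink one file: its dentry goes away and the count is decremented.
        dentries.pop_back();
        --nfiles;                 // with nfiles==1, this reaches 0 too early
    }

    // Models the check in CDir::try_remove_dentries_for_stray(): every
    // remaining dentry's linkage must be null before it can be dropped.
    bool safe_to_purge() const {
        for (const auto &d : dentries)
            if (!d.null_linkage)
                return false;     // in the real MDS the assertion fires here
        return true;
    }
};
```

With nfiles recovered as 1 but 25 dentries present, one unlink brings nfiles to 0 (directory looks empty), yet safe_to_purge() is false because 24 non-null dentries remain.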

Solutions:
First:
We can decrement dir->fnode.nfiles when doing 'rados rmomapkey' and increment it correspondingly when cephfs-data-scan injects a dentry.

Second:
We don't need to decrement dir->fnode.nfiles when doing 'rados rmomapkey', but we do have to increment it when the file's parent directory's omap_header does not exist and we create it manually during cephfs-data-scan.
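The second solution can be sketched as follows. This is a hedged, hypothetical C++ illustration (DirFrag, inject_dentry, and the member names are made up for this sketch, not the real cephfs-data-scan code): whenever data-scan injects a recovered dentry into a parent dirfrag, it recreates a missing omap header with a zeroed fnode and bumps fnode.nfiles for each dentry key it actually adds, so the count matches the real dentry population afterwards.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch of solution 2; not the real cephfs-data-scan code.
struct Fnode { int64_t nfiles = 0; };

struct DirFrag {
    bool has_omap_header = false;
    Fnode fnode;                                 // stored in the omap header
    std::map<std::string, std::string> dentries; // omap key -> dentry value

    void inject_dentry(const std::string &key, const std::string &val) {
        if (!has_omap_header) {
            // Parent's omap_header is missing: create it manually with a
            // fresh fnode, as the solution describes.
            has_omap_header = true;
            fnode = Fnode{};
        }
        auto [it, inserted] = dentries.emplace(key, val);
        if (inserted)
            ++fnode.nfiles;   // count only newly added keys, so nfiles
        else                  // stays consistent with the dentry count
            it->second = val;
    }
};
```

Under this scheme, after injecting the 25 recovered files, fnode.nfiles is 25 rather than 1, and 'rm -rf *' decrements it to exactly 0 when the last dentry goes away.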
