Bug #56592
mds: crash when mounting a client while a scrub repair is in progress
% Done: 0%
Source:
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash, scrub
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
 -18> 2022-07-18T13:43:18.426+0800 14b9e13ee700  2 mds.0.cache Memory usage: total 242512, rss 96724, heap 57612, baseline 47372, 0 / 14 inodes have caps, 0 caps, 0 caps per inode
 -17> 2022-07-18T13:43:18.678+0800 14b9e23f6700  5 mds.c ms_handle_reset on v1:10.72.47.117:0/707352342
 -16> 2022-07-18T13:43:18.678+0800 14b9e23f6700  3 mds.c ms_handle_reset closing connection for session client.5694 v1:10.72.47.117:0/707352342
 -15> 2022-07-18T13:43:18.728+0800 14b9e2ffc700  1 mds.c asok_command: scrub start {path=/,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
 -14> 2022-07-18T13:43:18.729+0800 14b9e0bea700  1 lockdep reusing last freed id 89
 -13> 2022-07-18T13:43:18.729+0800 14b9e0bea700  0 log_channel(cluster) log [INF] : scrub queued for path: /
 -12> 2022-07-18T13:43:18.729+0800 14b9e0bea700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [/]
 -11> 2022-07-18T13:43:18.729+0800 14b9e0bea700  0 log_channel(cluster) log [INF] : scrub summary: active paths [/]
 -10> 2022-07-18T13:43:18.730+0800 14b9e0bea700  0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x10000000000(/mydir), rewriting it
  -9> 2022-07-18T13:43:18.730+0800 14b9e0bea700  0 log_channel(cluster) log [INF] : Scrub repaired inode 0x10000000000 (/mydir)
  -8> 2022-07-18T13:43:18.730+0800 14b9e0bea700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000000000 [...2,head] /mydir/ auth v4 pv7 ap=3 DIRTYPARENT f(v1 1=1+0) n() (inest lock dirty) (ifile lock->sync w=1 flushing) (iversion lock) | dirtyscattered=2 lock=1 dirfrag=1 dirtyrstat=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 scrubqueue=0 0x55dc6ca32680]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(4)0x10000000000:[<0x1/mydir v4>]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 1=1+0)","ondisk_value.rstat":"n(v0 1=0+1)","memory_value.dirstat":"f(v1 1=1+0)","memory_value.rstat":"n()","error_str":"freshly-calculated rstats don't match existing ones (will be fixed)"},"return_code":-61}
  -7> 2022-07-18T13:43:18.731+0800 14b9e07e8700  5 mds.0.log _submit_thread 4200668~1381 : EUpdate scatter_writebehind [metablob 0x1, 2 dirs]
  -6> 2022-07-18T13:43:18.731+0800 14b9e07e8700  5 mds.0.log _submit_thread 4202069~872 : ESubtreeMap 2 subtrees, 0 ambiguous [metablob 0x1, 2 dirs]
  -5> 2022-07-18T13:43:18.731+0800 14b9e23f6700  5 mds.c ms_handle_reset on v1:10.72.47.117:0/2723416997
  -4> 2022-07-18T13:43:18.731+0800 14b9e23f6700  3 mds.c ms_handle_reset closing connection for session client.5650 v1:10.72.47.117:0/2723416997
  -3> 2022-07-18T13:43:18.731+0800 14b9e11ed700  1 mds.0.cache.dir(0x10000000000) mismatch between head items and fnode.fragstat! printing dentries
  -2> 2022-07-18T13:43:18.731+0800 14b9e11ed700  1 mds.0.cache.dir(0x10000000000) get_num_head_items() = 2; fnode.fragstat.nfiles=1 fnode.fragstat.nsubdirs=0
  -1> 2022-07-18T13:43:18.736+0800 14b9e11ed700 -1 /data/ceph/src/mds/ScrubStack.cc: In function 'void ScrubStack::dequeue(MDSCacheObject*)' thread 14b9e11ed700 time 2022-07-18T13:43:18.733038+0800
/data/ceph/src/mds/ScrubStack.cc: 57: FAILED ceph_assert(obj->item_scrub.is_on_list())

 ceph version 17.0.0-13587-gdcc92e07b25 (dcc92e07b2557170293e55675763614717c12d98) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x14b9ea40c207]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x14b9ea40c439]
 3: (ScrubStack::dequeue(MDSCacheObject*)+0x1cc) [0x55dc6b140cee]
 4: (ScrubStack::kick_off_scrubs()+0x68b) [0x55dc6b136cd5]
 5: (ScrubStack::remove_from_waiting(MDSCacheObject*, bool)+0xbc) [0x55dc6b137280]
 6: (C_RetryScrub::finish(int)+0x19) [0x55dc6b140d79]
 7: (MDSContext::complete(int)+0x6dc) [0x55dc6b1740b6]
 8: (MDSRank::_advance_queues()+0x386) [0x55dc6ae0c460]
 9: (MDSRank::ProgressThread::entry()+0xe19) [0x55dc6ae0d63f]
 10: (Thread::entry_wrapper()+0x3f) [0x14b9ea3dec29]
 11: (Thread::_entry_func(void*)+0x9) [0x14b9ea3dec41]
 12: /lib64/libpthread.so.0(+0x817a) [0x14b9e860817a]
 13: clone()

   0> 2022-07-18T13:43:18.741+0800 14b9e11ed700 -1 *** Caught signal (Aborted) **
 in thread 14b9e11ed700 thread_name:mds_rank_progr
Updated by Venky Shankar over 1 year ago
Xiubo,
Were you trying to mount /mydir when it was getting repaired?
Updated by Xiubo Li over 1 year ago
Venky Shankar wrote:
Xiubo,
Were you trying to mount /mydir when it was getting repaired?
No, I was just trying to mount the / directory.
Updated by Xiubo Li over 1 year ago
More info:
I was just simulating the customer case we hit: I removed one of the directory's objects from the metadata pool, then ran the scrub command to repair it, and while the repair was running I tried to mount the filesystem.
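Roughly, the reproduction described above could look like the following sketch. The pool name, MDS name, object name, and mount details are all assumptions (not taken from this report), and deleting objects from the metadata pool is destructive, so this is for throwaway test clusters only:

```shell
# Sketch of the reproduction steps; names below are assumptions.
# 1. Corrupt a directory by removing its object from the metadata pool
#    (directory objects are named <inode-hex>.<frag>).
rados -p cephfs_metadata rm 10000000000.00000000

# 2. Start a recursive repair scrub from the root on rank 0.
ceph tell mds.a scrub start / recursive,repair

# 3. While the repair is still running, mount the root from a client.
mount -t ceph mon-host:/ /mnt/cephfs -o name=admin
```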
Updated by Venky Shankar over 1 year ago
- Category set to Correctness/Safety
- Status changed from New to Triaged
- Assignee set to Xiubo Li
- Target version set to v18.0.0