Bug #56592

mds: crash when mounting a client while scrub repair is in progress

Added by Xiubo Li almost 2 years ago. Updated 8 months ago.

Status: Triaged
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport: quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash, scrub
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -18> 2022-07-18T13:43:18.426+0800 14b9e13ee700  2 mds.0.cache Memory usage:  total 242512, rss 96724, heap 57612, baseline 47372, 0 / 14 inodes have caps, 0 caps, 0 caps per inode
   -17> 2022-07-18T13:43:18.678+0800 14b9e23f6700  5 mds.c ms_handle_reset on v1:10.72.47.117:0/707352342
   -16> 2022-07-18T13:43:18.678+0800 14b9e23f6700  3 mds.c ms_handle_reset closing connection for session client.5694 v1:10.72.47.117:0/707352342
   -15> 2022-07-18T13:43:18.728+0800 14b9e2ffc700  1 mds.c asok_command: scrub start {path=/,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
   -14> 2022-07-18T13:43:18.729+0800 14b9e0bea700  1 lockdep reusing last freed id 89
   -13> 2022-07-18T13:43:18.729+0800 14b9e0bea700  0 log_channel(cluster) log [INF] : scrub queued for path: /
   -12> 2022-07-18T13:43:18.729+0800 14b9e0bea700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [/]
   -11> 2022-07-18T13:43:18.729+0800 14b9e0bea700  0 log_channel(cluster) log [INF] : scrub summary: active paths [/]
   -10> 2022-07-18T13:43:18.730+0800 14b9e0bea700  0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x10000000000(/mydir), rewriting it
    -9> 2022-07-18T13:43:18.730+0800 14b9e0bea700  0 log_channel(cluster) log [INF] : Scrub repaired inode 0x10000000000 (/mydir)
    -8> 2022-07-18T13:43:18.730+0800 14b9e0bea700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x10000000000 [...2,head] /mydir/ auth v4 pv7 ap=3 DIRTYPARENT f(v1 1=1+0) n() (inest lock dirty) (ifile lock->sync w=1 flushing) (iversion lock) | dirtyscattered=2 lock=1 dirfrag=1 dirtyrstat=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 scrubqueue=0 0x55dc6ca32680]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(4)0x10000000000:[<0x1/mydir v4>]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 1=1+0)","ondisk_value.rstat":"n(v0 1=0+1)","memory_value.dirstat":"f(v1 1=1+0)","memory_value.rstat":"n()","error_str":"freshly-calculated rstats don't match existing ones (will be fixed)"},"return_code":-61}
    -7> 2022-07-18T13:43:18.731+0800 14b9e07e8700  5 mds.0.log _submit_thread 4200668~1381 : EUpdate scatter_writebehind [metablob 0x1, 2 dirs]
    -6> 2022-07-18T13:43:18.731+0800 14b9e07e8700  5 mds.0.log _submit_thread 4202069~872 : ESubtreeMap 2 subtrees , 0 ambiguous [metablob 0x1, 2 dirs]
    -5> 2022-07-18T13:43:18.731+0800 14b9e23f6700  5 mds.c ms_handle_reset on v1:10.72.47.117:0/2723416997
    -4> 2022-07-18T13:43:18.731+0800 14b9e23f6700  3 mds.c ms_handle_reset closing connection for session client.5650 v1:10.72.47.117:0/2723416997
    -3> 2022-07-18T13:43:18.731+0800 14b9e11ed700  1 mds.0.cache.dir(0x10000000000) mismatch between head items and fnode.fragstat! printing dentries
    -2> 2022-07-18T13:43:18.731+0800 14b9e11ed700  1 mds.0.cache.dir(0x10000000000) get_num_head_items() = 2; fnode.fragstat.nfiles=1 fnode.fragstat.nsubdirs=0
    -1> 2022-07-18T13:43:18.736+0800 14b9e11ed700 -1 /data/ceph/src/mds/ScrubStack.cc: In function 'void ScrubStack::dequeue(MDSCacheObject*)' thread 14b9e11ed700 time 2022-07-18T13:43:18.733038+0800
/data/ceph/src/mds/ScrubStack.cc: 57: FAILED ceph_assert(obj->item_scrub.is_on_list())

 ceph version 17.0.0-13587-gdcc92e07b25 (dcc92e07b2557170293e55675763614717c12d98) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x14b9ea40c207]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x14b9ea40c439]
 3: (ScrubStack::dequeue(MDSCacheObject*)+0x1cc) [0x55dc6b140cee]
 4: (ScrubStack::kick_off_scrubs()+0x68b) [0x55dc6b136cd5]
 5: (ScrubStack::remove_from_waiting(MDSCacheObject*, bool)+0xbc) [0x55dc6b137280]
 6: (C_RetryScrub::finish(int)+0x19) [0x55dc6b140d79]
 7: (MDSContext::complete(int)+0x6dc) [0x55dc6b1740b6]
 8: (MDSRank::_advance_queues()+0x386) [0x55dc6ae0c460]
 9: (MDSRank::ProgressThread::entry()+0xe19) [0x55dc6ae0d63f]
 10: (Thread::entry_wrapper()+0x3f) [0x14b9ea3dec29]
 11: (Thread::_entry_func(void*)+0x9) [0x14b9ea3dec41]
 12: /lib64/libpthread.so.0(+0x817a) [0x14b9e860817a]
 13: clone()

     0> 2022-07-18T13:43:18.741+0800 14b9e11ed700 -1 *** Caught signal (Aborted) **
 in thread 14b9e11ed700 thread_name:mds_rank_progr

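The assertion that aborts the MDS is FAILED ceph_assert(obj->item_scrub.is_on_list()) at ScrubStack.cc:57: ScrubStack::dequeue() expects the cache object to still be linked on the scrub stack's intrusive list at the moment it is removed, and the backtrace (C_RetryScrub::finish -> remove_from_waiting -> kick_off_scrubs -> dequeue) shows it being reached for an object that is no longer linked. Below is a minimal, self-contained C++ sketch of that invariant only; the ScrubItem/CacheObject/ScrubStack types are simplified stand-ins for illustration (the real code in src/mds/ScrubStack.cc uses Ceph's own intrusive list type), and only the names item_scrub, is_on_list() and dequeue() are taken from the report.

    #include <cassert>
    #include <list>

    // Simplified stand-in for an intrusive-list anchor: the object itself
    // remembers whether it is currently linked into a list.
    struct ScrubItem {
        bool linked = false;
        bool is_on_list() const { return linked; }
    };

    struct CacheObject {
        ScrubItem item_scrub;           // field name taken from the failed assert
    };

    struct ScrubStack {
        std::list<CacheObject*> stack;

        void enqueue(CacheObject* obj) {
            obj->item_scrub.linked = true;
            stack.push_back(obj);
        }

        // Mirrors the invariant the crash trips over: dequeue() assumes the
        // object is still linked on the scrub list when it is removed.
        void dequeue(CacheObject* obj) {
            assert(obj->item_scrub.is_on_list());   // the ceph_assert in the report
            obj->item_scrub.linked = false;
            stack.remove(obj);
        }
    };

    int main() {
        ScrubStack ss;
        CacheObject dir;

        ss.enqueue(&dir);
        ss.dequeue(&dir);   // fine: the object is on the list

        // Reaching dequeue() again for an object that was already unlinked
        // reproduces the same failure mode as the backtrace above.
        ss.dequeue(&dir);   // assert fires -> abort
        return 0;
    }

Plain assert() above stands in for ceph_assert(), which, unlike assert(), is not compiled out in release builds. The report's title suggests that a client mount racing with the in-progress scrub repair is what leads to dequeue() being called on an already-unlinked object.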