Bug #51178

open

MDS became read-only while using rsync to copy files

Added by Jérôme Poulin almost 3 years ago. Updated almost 3 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Crash info from MDS rank 1, recorded before the FS became read-only.

{
    "os_version_id": "18.04", 
    "utsname_release": "4.15.0-142-generic", 
    "os_name": "Ubuntu", 
    "entity_name": "mds.sg1vosrv46", 
    "timestamp": "2021-06-11 14:34:14.603569Z", 
    "process_name": "ceph-mds", 
    "utsname_machine": "x86_64", 
    "utsname_sysname": "Linux", 
    "os_version": "18.04.5 LTS (Bionic Beaver)", 
    "os_id": "ubuntu", 
    "utsname_version": "#146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021", 
    "backtrace": [
        "(()+0x12980) [0x7fb52da1d980]", 
        "(gsignal()+0xc7) [0x7fb52cb15fb7]", 
        "(abort()+0x141) [0x7fb52cb17921]", 
        "(()+0x8c957) [0x7fb52d50a957]", 
        "(()+0x92ae6) [0x7fb52d510ae6]", 
        "(()+0x92b21) [0x7fb52d510b21]", 
        "(()+0x92d54) [0x7fb52d510d54]", 
        "(Capability::Import::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x2d5) [0x55708d27dc75]", 
        "(Server::_commit_slave_rename(boost::intrusive_ptr<MDRequestImpl>&, int, CDentry*, CDentry*, CDentry*)+0x7c3) [0x55708d013163]", 
        "(MDSContext::complete(int)+0x73) [0x55708d235993]", 
        "(MDCache::request_finish(boost::intrusive_ptr<MDRequestImpl>&)+0x1c3) [0x55708d085a13]", 
        "(Server::dispatch_slave_request(boost::intrusive_ptr<MDRequestImpl>&)+0xd5) [0x55708d022d85]", 
        "(Server::handle_slave_request(boost::intrusive_ptr<MMDSSlaveRequest const> const&)+0x9f0) [0x55708d025bb0]", 
        "(Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x82) [0x55708d026492]", 
        "(MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0x72c) [0x55708cf8c93c]", 
        "(MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x4b3) [0x55708cf8efe3]", 
        "(MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0xb0) [0x55708cf8f870]", 
        "(MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xfc) [0x55708cf7bc9c]", 
        "(DispatchQueue::entry()+0x1219) [0x7fb52e32e1d9]", 
        "(DispatchQueue::DispatchThread::entry()+0xd) [0x7fb52e3df08d]", 
        "(()+0x76db) [0x7fb52da126db]", 
        "(clone()+0x3f) [0x7fb52cbf871f]" 
    ], 
    "utsname_hostname": "sg1vosrv46", 
    "crash_id": "2021-06-11_14:34:14.603569Z_9c0aad25-7231-43da-97f7-ab7f3aff54c6", 
    "ceph_version": "14.2.21" 
}
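For triage, it can help to reduce a backtrace like the one above to its symbolized frames only. The following is a minimal Python sketch, not part of any Ceph tooling: the helper name `symbol_name` and the regular expression are illustrative assumptions based on the frame layout seen in this report, `(symbol+0xOFFSET) [0xADDRESS]`. The frames are a subset copied from the crash info.

```python
import re

# Subset of frames copied from the crash report, in their original order.
FRAMES = [
    "(()+0x92d54) [0x7fb52d510d54]",
    "(Capability::Import::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x2d5) [0x55708d27dc75]",
    "(Server::_commit_slave_rename(boost::intrusive_ptr<MDRequestImpl>&, int, CDentry*, CDentry*, CDentry*)+0x7c3) [0x55708d013163]",
    "(MDSContext::complete(int)+0x73) [0x55708d235993]",
]

# Assumed frame layout: "(<symbol>+0x<offset>) [0x<address>]".
FRAME_RE = re.compile(r"^\((?P<symbol>.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def symbol_name(frame):
    """Return the function name of a frame, or None if it is unsymbolized."""
    match = FRAME_RE.match(frame.strip())
    if match is None:
        return None
    symbol = match.group("symbol")
    if symbol == "()":
        # Frames such as "(()+0x92d54)" carry no symbol, only an offset.
        return None
    # Drop the argument list, keeping only the qualified function name.
    return symbol.split("(", 1)[0]


named_frames = [name for name in map(symbol_name, FRAMES) if name is not None]
for name in named_frames:
    print(name)
```

Applied to the full backtrace, the first symbolized MDS frame after the unsymbolized library frames is `Capability::Import::decode`, called from `Server::_commit_slave_rename`.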

I had to run recover_dentries and reset the journal, sessions and FS before it came back online.

root@sg1vosrv43:~# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
2021-06-11 11:54:03.240 7f9e10e07c40  1 recover_dentries: frag 1000001bcb4.00000000 is corrupt, overwriting
2021-06-11 11:55:45.188 7f9e10e07c40  1 recover_dentries: frag 1000001bcb6.00000000 is corrupt, overwriting
2021-06-11 12:01:34.220 7f9e10e07c40  1 recover_dentries: frag 1000001f566.00000000 is corrupt, overwriting
Events by type:
  COMMITTED: 412
  EXPORT: 29
  FRAGMENT: 3
  IMPORTFINISH: 35
  IMPORTSTART: 35
  OPEN: 33
  SESSION: 3
  SESSIONS: 2
  SLAVEUPDATE: 782
  SUBTREEMAP: 129
  UPDATE: 116151
Errors: 0

root@sg1vosrv43:~# cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary
Events by type:
  COMMITTED: 393
  EXPORT: 66
  IMPORTFINISH: 70
  IMPORTSTART: 70
  OPEN: 15
  SESSION: 12
  SESSIONS: 3
  SLAVEUPDATE: 826
  SUBTREEMAP: 40
  UPDATE: 31545
Errors: 0
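To compare the two ranks, the "Events by type" summaries above can be turned into dictionaries. This is a minimal Python sketch that assumes the output layout shown in this report (an `Events by type:` header, indented `NAME: COUNT` lines, then an `Errors:` line); the function name `parse_event_summary` is hypothetical, and the sample text is the rank 1 output copied from above.

```python
RANK1_OUTPUT = """\
Events by type:
  COMMITTED: 393
  EXPORT: 66
  IMPORTFINISH: 70
  IMPORTSTART: 70
  OPEN: 15
  SESSION: 12
  SESSIONS: 3
  SLAVEUPDATE: 826
  SUBTREEMAP: 40
  UPDATE: 31545
Errors: 0
"""


def parse_event_summary(text):
    """Parse a recover_dentries summary into (event_counts, error_count)."""
    counts = {}
    errors = None
    in_events = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Events by type:":
            in_events = True
            continue
        if stripped.startswith("Errors:"):
            # The error line ends the event list.
            errors = int(stripped.split(":", 1)[1])
            in_events = False
            continue
        if in_events and ":" in stripped:
            name, _, value = stripped.partition(":")
            counts[name.strip()] = int(value)
    return counts, errors


rank1_counts, rank1_errors = parse_event_summary(RANK1_OUTPUT)
print(rank1_counts["SLAVEUPDATE"], rank1_errors)
```

Log lines that precede the header, such as the "frag ... is corrupt, overwriting" messages in the rank 0 output, are ignored because parsing only starts at `Events by type:`.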

The MDS log file containing the ino errors is attached.


Files

ceph-mds.sg1vosrv43.log.zst (8.15 KB), Jérôme Poulin, 06/11/2021 04:35 PM
ceph-mds.sg1vosrv43.log (287 KB), Jérôme Poulin, 06/14/2021 02:02 PM
ceph-mds.sg1vosrv43-second-crash.log.zip (467 KB), Jérôme Poulin, 06/14/2021 03:34 PM