Bug #45080 (open): mds slow journal ops then cephfs damaged after restarting an osd host

Added by Dan van der Ster about 4 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We had a cephfs cluster get damaged following a routine osd restart.

  • All MDSs have ms type = simple (see the config sketch after the logs below), which seems to solve a buffer_anon memory leak.
  • There are two active MDSs, with automatic balancing enabled and working well for weeks.
  • We restarted the osds on one out of 3 osd hosts (20 out of 60 osds restarted; PGs were active throughout).
  • After all OSDs were back up and all PGs were active+clean, there were a few hundred stuck ops on both MDSs, e.g.
    2020-04-14 10:29:54.817599 mon.cephdwight-mon-1633994557 (mon.0) 1591523 : cluster [WRN] Health check failed: 1 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
    2020-04-14 10:29:54.833090 mon.cephdwight-mon-1633994557 (mon.0) 1591524 : cluster [DBG] mds.1 v1:137.138.122.17:6800/871421306 up:active
    2020-04-14 10:29:54.833187 mon.cephdwight-mon-1633994557 (mon.0) 1591525 : cluster [DBG] fsmap cephfs:2 {0=cephdwightmds0=up:active,1=cephdwightmds2=up:active} 1 up:standby
    2020-04-14 10:29:53.123282 mds.cephdwightmds0 (mds.0) 137 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.240923 secs
    2020-04-14 10:29:53.123300 mds.cephdwightmds0 (mds.0) 138 : cluster [WRN] slow request 30.240922 seconds old, received at 2020-04-14 10:29:22.881996: client_request(client.893240419:110923293 create #0x10039d3aa76/.83f17681-34d5-4a23-a6be-08b6dcfad25e.tmp 2020-04-14 10:29:22.881825 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply
    2020-04-14 10:29:53.123303 mds.cephdwightmds0 (mds.0) 139 : cluster [WRN] slow request 30.220358 seconds old, received at 2020-04-14 10:29:22.902560: client_request(client.893240419:110923299 create #0x10039d3aa76/..83f17681-34d5-4a23-a6be-08b6dcfad25e.tmp.crc 2020-04-14 10:29:22.902402 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply
    
  • Then I restarted mds.1, which hit an assert during shutdown, after which the rank was marked damaged:
2020-04-14 10:37:19.806 7f2f93596700  0 -- v1:137.138.122.17:6800/871421306 >> v1:188.185.72.230:0/1226312550 pipe(0x55fd33ea9400 sd=205 :6800 s=2 pgs=58569 cs=1 l=0 c=0x55fd25f2b520).fault with nothing to send, going to standby
2020-04-14 10:37:20.097 7f2f942a3700  0 -- v1:137.138.122.17:6800/871421306 >> v1:188.184.84.39:0/1767657975 pipe(0x55fd15d1e800 sd=197 :6800 s=2 pgs=405022 cs=1 l=0 c=0x55fd24c89860).fault with nothing to send, going to standby
2020-04-14 10:37:22.125 7f2fb247f700  1 mds.1.77095 shutdown: shutting down rank 1
2020-04-14 10:37:22.127 7f2fabce4700  2 mds.1.cache Memory usage:  total 3199000, rss 1287996, heap 675068, baseline 773372, 11094 / 424727 inodes have caps, 12151 caps, 0.028609 caps per inode
2020-04-14 10:37:22.127 7f2fb11f3700  0 ms_deliver_dispatch: unhandled message 0x55fd128ba280 osd_map(315270..315270 src has 314665..315270) v4 from mon.2 v1:188.185.66.208:6790/0
2020-04-14 10:37:22.127 7f2fb11f3700  0 ms_deliver_dispatch: unhandled message 0x55fd2cebf680 mdsmap(e 77284) v1 from mon.2 v1:188.185.66.208:6790/0
2020-04-14 10:37:22.127 7f2fb11f3700  0 ms_deliver_dispatch: unhandled message 0x55fd9ef25d40 mdsmap(e 77285) v1 from mon.2 v1:188.185.66.208:6790/0
2020-04-14 10:37:22.129 7f2fb11f3700  0 ms_deliver_dispatch: unhandled message 0x55fd9ef24d80 mdsmap(e 77286) v1 from mon.2 v1:188.185.66.208:6790/0
2020-04-14 10:37:22.136 7f2fa9ce0700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/mds/Server.cc: In function 'virtual void C_MDS_unlink_local_finish::finish(int)' thread 7f2fa9ce0700 time 2020-04-14 10:37:22.132033
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/mds/Server.cc: 6810: FAILED ceph_assert(r == 0)

 ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f2fbaf07325]
 2: (()+0x2534ed) [0x7f2fbaf074ed]
 3: (()+0x20611d) [0x55fd0f8de11d]
 4: (MDSContext::complete(int)+0x74) [0x55fd0faa3cd4]
 5: (MDSIOContextBase::complete(int)+0x155) [0x55fd0faa3f15]
 6: (MDSLogContextBase::complete(int)+0x40) [0x55fd0faa4150]
 7: (Finisher::finisher_thread_entry()+0x16f) [0x7f2fbaf9026f]
 8: (()+0x7e65) [0x7f2fb8dc7e65]
 9: (clone()+0x6d) [0x7f2fb7a7588d]

and mds.1 wouldn't restart:
2020-04-14 10:41:49.204 7fc473d64700  1 mds.cephdwightmds2 Updating MDS map to version 77344 from mon.1
2020-04-14 10:41:56.410 7fc473d64700  1 mds.cephdwightmds2 Updating MDS map to version 77345 from mon.1
2020-04-14 10:41:59.955 7fc469a38700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
2020-04-14 10:41:59.955 7fc469a38700  0 mds.1.log _replay journaler got error -22, aborting
2020-04-14 10:41:59.956 7fc469a38700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid argument
2020-04-14 10:41:59.976 7fc469a38700  1 mds.cephdwightmds2 respawn!
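
For reference, the messenger override mentioned in the first bullet is a plain ceph.conf setting on the MDS hosts; a minimal sketch of how we have it set (the [mds] section placement is our choice, not the only way to apply it):

    # ceph.conf on the MDS hosts: force the simple messenger for the MDS,
    # which seems to avoid the buffer_anon memory growth mentioned above
    [mds]
        ms type = simple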

Following the cephfs/disaster-recovery-experts docs, I reset the journals and all sessions, after which the MDSs could go active again. I am now running a recursive scrub; a sketch of the corresponding commands is below.
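
Roughly the commands this corresponds to, for the record (the fs name "cephfs" and the rank numbers are our setup; this is a sketch of the steps from the disaster-recovery-experts doc, not a general recipe):

    # reset the journal of each active rank
    # (the doc also suggests a `journal export` backup beforehand)
    cephfs-journal-tool --rank=cephfs:0 journal reset
    cephfs-journal-tool --rank=cephfs:1 journal reset
    # wipe the session table so clients reconnect from scratch
    cephfs-table-tool all reset session
    # clear the damaged flag so rank 1 is allowed to start again
    ceph mds repaired cephfs:1
    # once both ranks are active again, kick off the recursive scrub (Nautilus syntax)
    ceph tell mds.cephdwightmds0 scrub start / recursive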

Logs are all at ceph-post-file: 5d09c35d-eb9f-430c-9936-91ee3ed2ef3e

Any idea why the journal ops got blocked following this routine osd restart?
Is there a better way to unblock the slow journal ops in this scenario?
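
For triage, a sketch of the admin-socket commands that expose these stuck ops (daemon name taken from the logs above; command names as in Nautilus):

    # client requests currently tracked by the MDS op tracker
    ceph daemon mds.cephdwightmds0 dump_ops_in_flight
    # only the requests past the complaint threshold (the "slow requests" above)
    ceph daemon mds.cephdwightmds0 dump_blocked_ops
    # outstanding RADOS ops from the MDS, e.g. the blocked journal writes
    ceph daemon mds.cephdwightmds0 objecter_requests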
