Bug #45080
openmds slow journal ops then cephfs damaged after restarting an osd host
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
ceph-qa-suite: fs
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We had a CephFS cluster get damaged following a routine OSD restart.
- All MDSs have ms type = simple, which seems to work around a buffer_anon memory leak.
- There are two active MDSs, with automatic balancing enabled and working well for weeks.
- We restarted the OSDs on one of our 3 OSD hosts (20 of 60 OSDs restarted; PGs stayed active throughout).
- After all OSDs were back up and all PGs were active+clean, there were a few hundred stuck ops on both MDSs, e.g.:
2020-04-14 10:29:54.817599 mon.cephdwight-mon-1633994557 (mon.0) 1591523 : cluster [WRN] Health check failed: 1 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
2020-04-14 10:29:54.833090 mon.cephdwight-mon-1633994557 (mon.0) 1591524 : cluster [DBG] mds.1 v1:137.138.122.17:6800/871421306 up:active
2020-04-14 10:29:54.833187 mon.cephdwight-mon-1633994557 (mon.0) 1591525 : cluster [DBG] fsmap cephfs:2 {0=cephdwightmds0=up:active,1=cephdwightmds2=up:active} 1 up:standby
2020-04-14 10:29:53.123282 mds.cephdwightmds0 (mds.0) 137 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.240923 secs
2020-04-14 10:29:53.123300 mds.cephdwightmds0 (mds.0) 138 : cluster [WRN] slow request 30.240922 seconds old, received at 2020-04-14 10:29:22.881996: client_request(client.893240419:110923293 create #0x10039d3aa76/.83f17681-34d5-4a23-a6be-08b6dcfad25e.tmp 2020-04-14 10:29:22.881825 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply
2020-04-14 10:29:53.123303 mds.cephdwightmds0 (mds.0) 139 : cluster [WRN] slow request 30.220358 seconds old, received at 2020-04-14 10:29:22.902560: client_request(client.893240419:110923299 create #0x10039d3aa76/..83f17681-34d5-4a23-a6be-08b6dcfad25e.tmp.crc 2020-04-14 10:29:22.902402 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply
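For anyone reproducing this, the stuck requests and any outstanding RADOS I/O can be inspected through the MDS admin socket; a sketch (the daemon name here is from this cluster, adjust to your own):

    ceph daemon mds.cephdwightmds0 ops
    ceph daemon mds.cephdwightmds0 objecter_requests

In our case the ops were all sitting in "submit entry: journal_and_reply", i.e. waiting on the journal flush rather than on client caps.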
- Then I restarted mds.1, which hit an assertion during shutdown, after which the rank was marked damaged:
2020-04-14 10:37:19.806 7f2f93596700 0 -- v1:137.138.122.17:6800/871421306 >> v1:188.185.72.230:0/1226312550 pipe(0x55fd33ea9400 sd=205 :6800 s=2 pgs=58569 cs=1 l=0 c=0x55fd25f2b520).fault with nothing to send, going to standby
2020-04-14 10:37:20.097 7f2f942a3700 0 -- v1:137.138.122.17:6800/871421306 >> v1:188.184.84.39:0/1767657975 pipe(0x55fd15d1e800 sd=197 :6800 s=2 pgs=405022 cs=1 l=0 c=0x55fd24c89860).fault with nothing to send, going to standby
2020-04-14 10:37:22.125 7f2fb247f700 1 mds.1.77095 shutdown: shutting down rank 1
2020-04-14 10:37:22.127 7f2fabce4700 2 mds.1.cache Memory usage: total 3199000, rss 1287996, heap 675068, baseline 773372, 11094 / 424727 inodes have caps, 12151 caps, 0.028609 caps per inode
2020-04-14 10:37:22.127 7f2fb11f3700 0 ms_deliver_dispatch: unhandled message 0x55fd128ba280 osd_map(315270..315270 src has 314665..315270) v4 from mon.2 v1:188.185.66.208:6790/0
2020-04-14 10:37:22.127 7f2fb11f3700 0 ms_deliver_dispatch: unhandled message 0x55fd2cebf680 mdsmap(e 77284) v1 from mon.2 v1:188.185.66.208:6790/0
2020-04-14 10:37:22.127 7f2fb11f3700 0 ms_deliver_dispatch: unhandled message 0x55fd9ef25d40 mdsmap(e 77285) v1 from mon.2 v1:188.185.66.208:6790/0
2020-04-14 10:37:22.129 7f2fb11f3700 0 ms_deliver_dispatch: unhandled message 0x55fd9ef24d80 mdsmap(e 77286) v1 from mon.2 v1:188.185.66.208:6790/0
2020-04-14 10:37:22.136 7f2fa9ce0700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/mds/Server.cc: In function 'virtual void C_MDS_unlink_local_finish::finish(int)' thread 7f2fa9ce0700 time 2020-04-14 10:37:22.132033
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/mds/Server.cc: 6810: FAILED ceph_assert(r == 0)
ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f2fbaf07325]
 2: (()+0x2534ed) [0x7f2fbaf074ed]
 3: (()+0x20611d) [0x55fd0f8de11d]
 4: (MDSContext::complete(int)+0x74) [0x55fd0faa3cd4]
 5: (MDSIOContextBase::complete(int)+0x155) [0x55fd0faa3f15]
 6: (MDSLogContextBase::complete(int)+0x40) [0x55fd0faa4150]
 7: (Finisher::finisher_thread_entry()+0x16f) [0x7f2fbaf9026f]
 8: (()+0x7e65) [0x7f2fb8dc7e65]
 9: (clone()+0x6d) [0x7f2fb7a7588d]
- and mds.1 wouldn't restart:
2020-04-14 10:41:49.204 7fc473d64700 1 mds.cephdwightmds2 Updating MDS map to version 77344 from mon.1
2020-04-14 10:41:56.410 7fc473d64700 1 mds.cephdwightmds2 Updating MDS map to version 77345 from mon.1
2020-04-14 10:41:59.955 7fc469a38700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
2020-04-14 10:41:59.955 7fc469a38700 0 mds.1.log _replay journaler got error -22, aborting
2020-04-14 10:41:59.956 7fc469a38700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid argument
2020-04-14 10:41:59.976 7fc469a38700 1 mds.cephdwightmds2 respawn!
Following the cephfs/disaster-recovery-experts docs, I reset the journals and all sessions, after which the MDSs could go active again. I am now running a recursive scrub.
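Concretely, the recovery roughly followed the disaster-recovery-experts procedure; a sketch of the commands used, assuming ranks 0 and 1 of filesystem "cephfs" (export a journal backup first, as the docs advise -- run per rank):

    cephfs-journal-tool --rank=cephfs:1 journal export backup.rank1.bin
    cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary
    cephfs-journal-tool --rank=cephfs:1 journal reset
    cephfs-table-tool all reset session

and the recursive scrub was then started via the admin socket with something like:

    ceph daemon mds.cephdwightmds0 scrub_path / recursive repair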
Logs are all at ceph-post-file: 5d09c35d-eb9f-430c-9936-91ee3ed2ef3e
Any idea why the journal ops got blocked following this routine OSD restart?
Is there a better way to unblock slow journal ops in this scenario?