Bug #22523
Status: Closed
Jewel 10.2.10: CephFS journal corrupt, later events jump back to an earlier position
Description
Hi all.
==============================
Version: Jewel 10.2.10 (professional RPMs)
Nodes: 3 x CentOS 7.3
CephFS: kernel client
Pools: metadata 3 replicas (2 SSD x 3), data 2 replicas (26 HDD x 3)
Network: 10 Gb/s (2 x 3)
================================
In one environment we were running an HA test (pulling out and reinserting optical cables).
Because the MDS state changed, the MDS replayed the journal (to go from standby to active), and it threw an exception:
throw buffer::malformed_input("Invalid sentinel"); (src/osdc/Journaler.cc:1361)
All MDS daemons stopped replaying the journal and remained in standby. The file system was unavailable: ls, read and write all failed.
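To make the "Invalid sentinel" failure concrete, here is a minimal Python sketch of entry framing as we understand the resilient journal format used by JournalStream in src/osdc/Journaler.cc: each entry starts with a fixed 64-bit sentinel (0x3141592653589793) and a 32-bit payload length, and ends with a 64-bit pointer back to the entry's start. The little-endian layout and the names encode_entry, decode_entry and MalformedInput are assumptions for illustration, not Ceph's actual code. If replay reads at an offset that does not hold a valid entry header, for example inside a damaged region, the sentinel check fails immediately.

```python
import struct

# ASSUMPTION: resilient journal entry framing, little-endian:
# [u64 sentinel][u32 payload length][payload][u64 start pointer]
SENTINEL = 0x3141592653589793
HEADER = struct.Struct("<QI")   # sentinel + payload length
TRAILER = struct.Struct("<Q")   # offset where this entry starts


class MalformedInput(Exception):
    """Hypothetical stand-in for Ceph's buffer::malformed_input."""


def encode_entry(payload: bytes, start_ptr: int) -> bytes:
    """Frame one payload as a journal entry (hypothetical helper)."""
    return HEADER.pack(SENTINEL, len(payload)) + payload + TRAILER.pack(start_ptr)


def decode_entry(buf: bytes, offset: int = 0):
    """Decode the entry at offset; return (payload, start_ptr, next_offset)."""
    if len(buf) - offset < HEADER.size:
        raise MalformedInput("short header")
    sentinel, length = HEADER.unpack_from(buf, offset)
    # A wrong sentinel means we are not positioned at the start of a valid entry.
    if sentinel != SENTINEL:
        raise MalformedInput("Invalid sentinel")
    body = offset + HEADER.size
    end = body + length
    if len(buf) < end + TRAILER.size:
        raise MalformedInput("short entry")
    payload = buf[body:end]
    (start_ptr,) = TRAILER.unpack_from(buf, end)
    return payload, start_ptr, end + TRAILER.size
```

Because every entry carries the sentinel, a reader that lands mid-entry or on garbage fails fast instead of trusting a bogus length field, which matches the symptom above where replay stops rather than continuing.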
=================================
We ran cephfs-journal-tool journal inspect, which reported a corrupt region.
Running cephfs-journal-tool event get list (modified to also print each event's time) showed the suspicious position:
event time:2017-12-16 03:50:32.543091
event time:2017-12-16 03:50:32.543180
event time:2017-12-16 03:50:32.543296
event time:2017-12-16 03:50:32.543393
event time:2017-12-16 03:50:32.543518
event time:2017-12-16 03:14:44.205316
event time:2017-12-16 03:14:44.206388
event time:2017-12-16 03:14:44.207265
event time:2017-12-16 03:14:44.208103
There are 20 events (timestamped 2017-12-16 03:50:32.*) before the 2017-12-16 03:14:44 event.
These events should have appeared after the 2017-12-16 03:50:31.* events.
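The out-of-order position can also be located mechanically. The sketch below assumes each relevant line of the event list looks like the "event time:YYYY-MM-DD HH:MM:SS.ffffff" lines above (the time print was added locally, so this format is an assumption) and reports every event whose timestamp is earlier than the one before it. find_time_regressions is a hypothetical helper, not part of cephfs-journal-tool.

```python
from datetime import datetime

PREFIX = "event time:"            # assumed line prefix from the patched event list
FMT = "%Y-%m-%d %H:%M:%S.%f"      # assumed timestamp format


def find_time_regressions(lines):
    """Return (index, previous, current) for each event older than its predecessor.

    index counts only the "event time:" lines, in journal order.
    """
    times = [
        datetime.strptime(line.strip()[len(PREFIX):], FMT)
        for line in lines
        if line.strip().startswith(PREFIX)
    ]
    return [
        (i, times[i - 1], times[i])
        for i in range(1, len(times))
        if times[i] < times[i - 1]
    ]
```

Fed the excerpt above, it flags the single jump from 03:50:32.543518 back to 03:14:44.205316; on the full list, each reported index marks a boundary worth inspecting.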
======================
We erased the corrupt region of the journal, after which the MDS dumped core.
We modified two failing asserts (osdmap version) and adjusted an argument (wip_session).
The MDS then started, and the file system was available for reads and writes.
==============
Finally, we switched back to the previous MDS version, and the file system is OK.
However, the MDS seems to output a large number of "dump inodes links" messages.
======
For the journal and the event list, please refer to the attached files.
Event list file: https://pan.baidu.com/s/1bo7rlwj
Journal file: https://pan.baidu.com/s/1slV1zGh