Bug #57048

osdc/Journaler: better handle ENOENT during replay as up:standby-replay

Added by Patrick Donnelly 4 months ago. Updated 4 months ago.

Status:
New
Priority:
High
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Support
Tags:
Backport:
quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -15> 2022-07-29T13:23:34.738+0000 7f1ee3d5a700  1 mds.21387370.journaler.mdlog(ro) recover start
   -14> 2022-07-29T13:23:34.738+0000 7f1ee3d5a700  1 mds.21387370.journaler.mdlog(ro) read_head
   -13> 2022-07-29T13:23:34.738+0000 7f1ee3d5a700  4 mds.0.log Waiting for journal 0x200 to recover...
   -12> 2022-07-29T13:23:34.742+0000 7f1ee455b700  1 mds.21387370.journaler.mdlog(ro) _finish_read_head loghead(trim 7788696698880, expire 7788721262080, write 7789114335530, stream_format 1).  probing for end of log (from 7789114335530)...
   -11> 2022-07-29T13:23:34.742+0000 7f1ee455b700  1 mds.21387370.journaler.mdlog(ro) probing for end of the log
   -10> 2022-07-29T13:23:34.742+0000 7f1eed56d700 10 monclient: get_auth_request con 0x55c5deab7000 auth_method 0
    -9> 2022-07-29T13:23:34.755+0000 7f1ee455b700  1 mds.21387370.journaler.mdlog(ro) _finish_probe_end write_pos = 7789125276317 (header had 7789114335530). recovered.
    -8> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700  4 mds.0.log Journal 0x200 recovered.
    -7> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700  4 mds.0.log Recovered journal 0x200 in format 1
    -6> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700  2 mds.0.0 Booting: 1: loading/discovering base inodes
    -5> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700  0 mds.0.cache creating system inode with ino:0x100
    -4> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700  0 mds.0.cache creating system inode with ino:0x1
    -3> 2022-07-29T13:23:34.757+0000 7f1ee455b700  2 mds.0.0 Booting: 2: replaying mds log
    -2> 2022-07-29T13:23:34.798+0000 7f1ee455b700  0 mds.21387370.journaler.mdlog(ro) _finish_read got error -2
    -1> 2022-07-29T13:23:34.800+0000 7f1ee2d58700 -1 /builddir/build/BUILD/ceph-16.2.0/src/mds/MDLog.cc: In function 'void MDLog::_replay_thread()' thread 7f1ee2d58700 time 2022-07-29T13:23:34.799865+0000
/builddir/build/BUILD/ceph-16.2.0/src/mds/MDLog.cc: 1383: FAILED ceph_assert(journaler->is_readable() || mds->is_daemon_stopping())

 ceph version 16.2.0-152.el8cp (e456e8b705cb2f4a779689a0d80b122bcb0d67c9) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f1ef2fc2b60]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x274d7a) [0x7f1ef2fc2d7a]
 3: (MDLog::_replay_thread()+0x1d7c) [0x55c5dc72e2ec]
 4: (MDLog::ReplayThread::entry()+0x11) [0x55c5dc430101]
 5: /lib64/libpthread.so.0(+0x817a) [0x7f1ef1d6317a]
 6: clone()

     0> 2022-07-29T13:23:34.801+0000 7f1ee2d58700 -1 *** Caught signal (Aborted) **
 in thread 7f1ee2d58700 thread_name:md_log_replay

 ceph version 16.2.0-152.el8cp (e456e8b705cb2f4a779689a0d80b122bcb0d67c9) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12c20) [0x7f1ef1d6dc20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f1ef2fc2bb1]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x274d7a) [0x7f1ef2fc2d7a]
 6: (MDLog::_replay_thread()+0x1d7c) [0x55c5dc72e2ec]
 7: (MDLog::ReplayThread::entry()+0x11) [0x55c5dc430101]
 8: /lib64/libpthread.so.0(+0x817a) [0x7f1ef1d6317a]
 9: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Journaler should not cause the MDS to assert in this situation. We should handle this more gracefully.
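
To make the failure mode concrete, here is a minimal self-contained model of the interaction, with illustrative stand-in names only (this is not Ceph code): a Journaler-like object latches a read error, is_readable() then pins to false, and the replay thread's assert fires.

// Minimal model of the failure; all names are illustrative stand-ins.
#include <cassert>
#include <cerrno>
#include <cstdio>

struct ModelJournaler {
  int error = 0;            // latched by a failed object read
  bool have_entries = true; // whether a whole entry is buffered

  void finish_read(int r) { // analogue of _finish_read in the log above
    if (r < 0) {
      std::printf("_finish_read got error %d\n", r);
      error = r;
    }
  }

  bool is_readable() const {
    if (error != 0)
      return false;         // any latched error pins this to false
    return have_entries;
  }
};

int main() {
  ModelJournaler journaler;
  bool daemon_stopping = false;

  // A journal object was trimmed by the active MDS: the backing read
  // returns -ENOENT, exactly as in the log excerpt above.
  journaler.finish_read(-ENOENT);

  // The replay thread's invariant no longer holds, so this aborts,
  // mirroring FAILED ceph_assert(journaler->is_readable() || ...).
  assert(journaler.is_readable() || daemon_stopping);
  return 0;
}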

History

#1 Updated by Venky Shankar 4 months ago

Patrick,

Do you mean a standby-replay MDS should tolerate missing journal objects? How can it end up in such a situation? (I recall you mentioning this in a BZ somewhere).

#2 Updated by Patrick Donnelly 4 months ago

Venky Shankar wrote:

Patrick,

Do you mean a standby-replay MDS should tolerate missing journal objects?

Well, it's supposed to already:

https://github.com/ceph/ceph/blob/5ab5ff46d6d799c46d8a1c01cc08068a22e4c9b5/src/mds/MDLog.cc#L1320-L1325

Note that ENOENT (-2) gets converted to EAGAIN there. However, I don't see why the assert on line 1381 gets triggered. Maybe the assert is a different bug?
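
For readers without the link handy, the linked block's effect is roughly the following (a paraphrase of my reading of it, not the exact source; the function wrapper is illustrative):

// Rough paraphrase of the linked error-handling block; the real code
// lives in MDLog::_replay_thread().
#include <cerrno>

int remap_replay_error(int r, bool standby_replay) {
  if (r == -ENOENT && standby_replay) {
    // The active MDS trimmed journal objects we had not read yet;
    // standby-replay treats that as a retryable condition, not damage.
    r = -EAGAIN;
  }
  return r;
}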

How can it end up in such a situation? (I recall you mentioning this in a BZ somewhere).

It can happen if the standby-replay daemon falls behind the MDS it's following. In that case, the only thing to do is reboot the standby-replay daemon, because it cannot know what the missing journal segments contained. It should do that gracefully, in a way which does not freak out the cluster admin. But a cluster log INFO/WRN message would be useful to record the event in case there is a pattern.
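
A sketch of what that graceful path could look like; log_cluster_warning() and respawn() below are hypothetical stand-ins for the MDS's cluster-log and respawn machinery, not real Ceph APIs:

// Hypothetical sketch: graceful handling when standby-replay falls behind.
#include <cerrno>
#include <cstdio>
#include <cstdlib>

// Stand-in for emitting a cluster log INFO/WRN message.
void log_cluster_warning(const char* msg) {
  std::fprintf(stderr, "cluster WRN: %s\n", msg);
}

// Stand-in for the daemon respawn path: restart cleanly, do not abort.
[[noreturn]] void respawn() {
  std::exit(0);
}

void handle_replay_error(int r, bool standby_replay) {
  if (r == -ENOENT && standby_replay) {
    // Record the event so an admin can spot a pattern, then restart
    // instead of tripping an assert and dumping a backtrace.
    log_cluster_warning("standby-replay fell behind the active MDS's journal; respawning");
    respawn();
  }
}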

#3 Updated by Greg Farnum 4 months ago

Patrick Donnelly wrote:

Venky Shankar wrote:

Patrick,

Do you mean a standby-replay MDS should tolerate missing journal objects?

Well, it's supposed to already:

https://github.com/ceph/ceph/blob/5ab5ff46d6d799c46d8a1c01cc08068a22e4c9b5/src/mds/MDLog.cc#L1320-L1325

Note that ENOENT (-2) gets converted to EAGAIN there. However, I don't see why the assert on line 1381 gets triggered. Maybe the assert is a different bug?

is_readable() is guaranteed to be false if we've hit the block at 1322 because of:

if (error != 0) {
  return false;
}

I've seen plenty of instances of standby-replay MDSes respawning when they fall behind the journal, so there are some working code paths, but this doesn't seem like one of them to me.

#4 Updated by Patrick Donnelly 4 months ago

Greg Farnum wrote:

Patrick Donnelly wrote:

Venky Shankar wrote:

Patrick,

Do you mean a standby-replay MDS should tolerate missing journal objects?

Well, it's supposed to already:

https://github.com/ceph/ceph/blob/5ab5ff46d6d799c46d8a1c01cc08068a22e4c9b5/src/mds/MDLog.cc#L1320-L1325

Note that ENOENT (-2) gets converted to EAGAIN there. However, I don't see why the assert on line 1381 gets triggered. Maybe the assert is a different bug?

is_readable() is guaranteed to be false if we've hit the block at 1322 because of:

if (error != 0) {
  return false;
}

If the block at 1322 is run, then line 1381 should be unreachable because of:

https://github.com/ceph/ceph/blob/5ab5ff46d6d799c46d8a1c01cc08068a22e4c9b5/src/mds/MDLog.cc#L1374
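
Condensing the control flow being discussed, and assuming the line at 1374 is the early exit Patrick is pointing at (this skeleton is my reading of the linked code's shape, not a copy of it):

// Illustrative control-flow skeleton; the ~L markers refer to the lines
// linked in this thread.
#include <cassert>
#include <cerrno>

int replay_iteration(int error, bool standby_replay,
                     bool readable, bool stopping) {
  if (error != 0) {                          // ~L1320: error observed
    if (error == -ENOENT && standby_replay)
      error = -EAGAIN;                       // ~L1322: remapped for standby-replay
    return error;                            // ~L1374: leave before the assert
  }
  assert(readable || stopping);              // ~L1381: should now be unreachable
  return 0;                                  // replay one entry and loop
}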
