Project

General

Profile

Bug #36093

mds: fix mds damaged due to unexpected journal length

Added by Zhi Zhang 3 months ago. Updated 3 months ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
Start date:
09/20/2018
Due date:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:

Description

Recently we met a rare but serious mds damaged issue during switching/restarting mds frequently. MDS reports damaged due to journaler found log size is less than expected. Since this is production cluster, we have to recalculate write_pos and reset it to activate mds quickly, but this might cause some data loss in some case.

2018-09-10 17:26:54.499445 7fe688cc0180  0 pidfile_write: ignore empty --pid-file
2018-09-10 17:26:54.678165 7fe68249e700  1 mds.xxx handle_mds_map standby
2018-09-10 17:26:54.690824 7fe68249e700  1 mds.0.1900764 handle_mds_map i am now mds.0.1900764
2018-09-10 17:26:54.690828 7fe68249e700  1 mds.0.1900764 handle_mds_map state change up:boot --> up:replay
2018-09-10 17:26:54.690836 7fe68249e700  1 mds.0.1900764 replay_start
2018-09-10 17:26:54.690850 7fe68249e700  1 mds.0.1900764  recovery set is 
2018-09-10 17:26:54.690857 7fe68249e700  1 mds.0.1900764  waiting for osdmap 106280 (which blacklists prior instance)
2018-09-10 17:26:54.800741 7fe67dc95700  0 mds.0.cache creating system inode with ino:100
2018-09-10 17:26:54.800861 7fe67dc95700  0 mds.0.cache creating system inode with ino:1
2018-09-10 17:26:56.465455 7fe67dc95700  0 mds.0.journaler(ro) _finish_read got less than expected (4194304)
2018-09-10 17:26:56.465481 7fe67bc91700  0 mds.0.log _replay journaler got error -22, aborting
2018-09-10 17:26:56.467651 7fe67bc91700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
2018-09-10 17:26:56.471741 7fe67bc91700  1 mds.xxx respawn

We use "cephfs-journal-tool journal inspect" to locate the last log object was damaged (200.00834942). This object size is only 4193284 bytes. Then the write_pos we recalculated from this offset should be 36087669980164, but from "cephfs-journal-tool header get", we get the write_pos is 36087669981184. This means the write_pos in the journal header is larger than the real pos, so mds thinks it is damaged.

It is very difficult to reproduce this issue, but we seems to find out such a scenario that might trigger this issue.

t1:
Journaler::_do_flush
pending_safe[flush_pos_1] -> next_safe_pos_1
filer.write

            t2: 
            Journaler::_do_flush
            pending_safe[flush_pos_2] -> next_safe_pos_2
            (next_safe_pos_2 > next_safe_pos_1)
            filer.write

                        t3:
                        Journaler::_finish_flush (for flush_pos_1)
                        pending_safe.erase (flush_pos_1)
                        pending_safe not empty, safe_pos = next_safe_pos_2 

                                                 t4:
                                                 switch/restart mds
                                                 Journaler::_write_head
                                                 write_pos = safe_pos (next_safe_pos_2)
                                                 (something wrong with flush_pos_2, not commit to osd?)

Related issues

Copied to fs - Backport #36199: mimic: mds: fix mds damaged due to unexpected journal length Resolved
Copied to fs - Backport #36200: luminous: mds: fix mds damaged due to unexpected journal length In Progress

History

#1 Updated by Zhi Zhang 3 months ago

I submit a PR according to above analysis.

https://github.com/ceph/ceph/pull/24194

#2 Updated by Patrick Donnelly 3 months ago

  • Assignee set to Zhi Zhang
  • Target version set to v14.0.0
  • Source set to Community (dev)
  • Backport set to mimic,luminous

#3 Updated by Patrick Donnelly 3 months ago

  • Status changed from New to Need Review

#4 Updated by Patrick Donnelly 3 months ago

  • Status changed from Need Review to Pending Backport

#5 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #36199: mimic: mds: fix mds damaged due to unexpected journal length added

#6 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #36200: luminous: mds: fix mds damaged due to unexpected journal length added

Also available in: Atom PDF