
Bug #59119

mds: segmentation fault during replay of snaptable updates

Added by Patrick Donnelly 11 months ago. Updated 3 months ago.

Status:
New
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
% Done:
0%

Source:
Q/A
Tags:
Backport:
quincy,reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

For a standby-replay daemon:

2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.log _replay 4199266~121 / 4200313 2023-03-17T00:50:00.527030+0000: ETableServer snaptable prepare reqid 2 mds.0 tid 2 version 2 mutation=57 bytes
2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.journal  ETableServer.replay snaptable prepare event 2 - 1 == table 1
2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.snap prepare v2 create snap(2 0x1 'scheduled-2023-03-17-00_50_00_UTC' 2023-03-17T00:50:00.526971+0000)
2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.log _replay 4199407~886 / 4200313 2023-03-17T00:50:00.529078+0000: EUpdate mksnap [metablob table_tids=[^A,2]]
2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.journal EMetaBlob.replay 0 dirlumps by unknown.0
2023-03-17T00:50:01.288+0000 7f7d86761700 20 mds.0.cache.ino(0x1) decode_snap_blob snaprealm(0x1 seq 2 lc 2 cr 2 cps 1 snaps={2=snap(2 0x1 'scheduled-2023-03-17-00_50_00_UTC' 2023-03-17T00:50:00.526288+0000)} last_modified 2023-03-17T00:50:00.526288+0000 change_attr 1 0x560d0156a480)
2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.journal EMetaBlob.replay  updated root [inode 0x1 [...2,head] / auth v4 snaprealm=0x560d0156a480 f(v0 m2023-03-17T00:49:55.917230+0000 1=0+1) n(v0 rc2023-03-17T00:50:00.526288+0000 rs1 2=0+2)/n(v0 rc2023-03-17T00:28:01.024379+0000 1=0+1) (iversion lock) | dirfrag=1 dirty=1 0x560d01567600]
2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.journal EMetaBlob.replay noting snaptable transaction 2
2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.tableclient(snaptable) got_journaled_agree 2
2023-03-17T00:50:01.288+0000 7f7d86761700 10 mds.0.snapclient notify_commit tid 2
...
2023-03-17T00:50:03.292+0000 7f7d86761700 10 mds.0.log _replay 4200313~64 / 4203576 2023-03-17T00:50:00.531089+0000: ETableServer snaptable commit tid 2 version 3
2023-03-17T00:50:03.292+0000 7f7d86761700 10 mds.0.journal  ETableServer.replay snaptable commit event 3 - 1 == table 2
2023-03-17T00:50:03.292+0000 7f7d86761700  7 mds.0.snap commit 2 create snap(2 0x1 'scheduled-2023-03-17-00_50_00_UTC' 2023-03-17T00:50:00.526971+0000)
...
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 MDSContext::complete: N7MDSRank32C_MDS_StandbyReplayRestartFinishE
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.log standby_trim_segments
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.log  expire_pos=4202684
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.log  segment seq=1 4194304~8380
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.log  removing segment
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.cache.dir(0x10000000000) mark_clean [dir 0x10000000000 /client.0/ [2,head] auth v=1 cv=0/0 state=1610612737|complete f() n() hs=0+0,ss=0+0 | dirty=1 0x560d00601600] version 1
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.cache.dir(0x1) mark_clean [dir 0x1 / [2,head] auth v=5 cv=0/0 dir_auth=0 state=1610612736 f(v0 m2023-03-17T00:49:55.917230+0000 1=0+1) n(v0 rc2023-03-17T00:49:55.922230+0000 1=0+1) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 0x560d00600d00] version 5
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.cache.ino(0x1) mark_clean [inode 0x1 [...3,head] / auth v5 snaprealm=0x560d0156a480 f(v0 m2023-03-17T00:49:55.917230+0000 1=0+1) n(v0 rc2023-03-17T00:50:00.526288+0000 rs1 2=0+2)/n(v0 rc2023-03-17T00:28:01.024379+0000 1=0+1) old_inodes=1 (iversion lock) | dirfrag=1 dirty=1 0x560d01567600]
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.cache.ino(0x10000000000) mark_clean [inode 0x10000000000 [...2,head] /client.0/ auth v4 DIRTYPARENT f() n(v0 rc2023-03-17T00:49:55.922230+0000 1=0+1) (iversion lock) | dirfrag=0 dirtyparent=1 dirty=1 0x560d01568c00]
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.cache.den(0x1 client.0) mark_clean [dentry #0x1/client.0 [2,head] auth (dversion lock) v=4 ino=0x10000000000 state=1610612736 | inodepin=1 dirty=1 0x560d012e8500]
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.cache.ino(0x10000000000) clear_dirty_parent
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.log  segment seq=12 4202684~892
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.log  won't remove, not expired!
2023-03-17T00:50:04.293+0000 7f7d87f64700 20 mds.0.log  calling mdcache->trim!
2023-03-17T00:50:04.293+0000 7f7d87f64700  7 mds.0.cache trim bytes_used=14kB limit=4GB reservation=0.05% count=0
2023-03-17T00:50:04.293+0000 7f7d87f64700  7 mds.0.cache trim_lru trimming 0 items from LRU size=1 mid=0 pintail=0 pinned=0
2023-03-17T00:50:04.293+0000 7f7d87f64700  7 mds.0.cache trim_lru trimmed 0 items
2023-03-17T00:50:04.293+0000 7f7d87f64700  2 mds.0.0 Booting: 2: replaying mds log
2023-03-17T00:50:04.293+0000 7f7d87f64700 10 mds.0.log replay start, from 4203576 to 4203636
2023-03-17T00:50:04.293+0000 7f7d86761700 10 mds.0.log _replay_thread start
2023-03-17T00:50:04.293+0000 7f7d86761700  1 -- [v2:172.21.15.146:6838/2651546611,v1:172.21.15.146:6839/2651546611] --> [v2:172.21.15.146:6824/672977052,v1:172.21.15.146:6825/672977052] -- osd_op(unknown.0.0:2692 2.14 2:2e2fa760:::200.00000001:head [read 9272~60 [fadvise_dontneed]] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e91) v8 -- 0x560d015cb800 con 0x560d01536c00
2023-03-17T00:50:04.294+0000 7f7d91777700  1 -- [v2:172.21.15.146:6838/2651546611,v1:172.21.15.146:6839/2651546611] <== osd.10 v2:172.21.15.146:6824/672977052 2512 ==== osd_op_reply(2692 200.00000001 [read 9272~60 [fadvise_dontneed] out=60b] v0'0 uv22 ondisk = 0) v8 ==== 156+0+60 (crc 0 0 0) 0x560d0156a000 con 0x560d01536c00
2023-03-17T00:50:04.294+0000 7f7d86761700 10 mds.0.log _replay 4203576~40 / 4203636 2023-03-17T00:50:02.359670+0000: ETableClient snaptable ack tid 2
2023-03-17T00:50:04.294+0000 7f7d86761700 10 mds.0.journal  ETableClient.replay snaptable op ack tid 2
2023-03-17T00:50:04.294+0000 7f7d86761700 10 mds.0.tableclient(snaptable) got_journaled_ack 2

From: /teuthology/pdonnell-2023-03-16_23:18:17-fs:workload-wip-pdonnell-testing-20230316.214637-distro-default-smithi/7211196/remote/smithi146/log/1a80245e-c457-11ed-9afb-001a4aab830c/ceph-mds.b.log.gz

The LogSegment storing the pending commit tids was expired and removed. This happened because rank 0 received a "flush journal" request (part of the new snapshot stress testing in fs:workload). The standby-replay daemon wrongly assumes it can continue replay after all of its in-memory segments have been expired.


Related issues

Related to CephFS - Bug #54741: crash: MDSTableClient::got_journaled_ack(unsigned long) New
Related to CephFS - Bug #63806: ffsb.sh workunit failure (MDS: std::out_of_range, damaged) Closed

History

#1 Updated by Patrick Donnelly 11 months ago

  • Pull request ID set to 50615

#2 Updated by Patrick Donnelly 4 months ago

  • Status changed from In Progress to New
  • Assignee deleted (Patrick Donnelly)
  • Target version deleted (v19.0.0)
  • Backport deleted (reef,quincy,pacific)

#3 Updated by Patrick Donnelly 4 months ago

  • Pull request ID deleted (50615)

#4 Updated by Venky Shankar 4 months ago

Patrick, https://github.com/ceph/ceph/pull/50615 doesn't really fix the underlying issue and is just a fail-safe approach to restart the s-r daemon, correct? We do want to backport the change, I guess?

#5 Updated by Patrick Donnelly 4 months ago

Venky Shankar wrote:

Patrick, https://github.com/ceph/ceph/pull/50615 doesn't really fix the underlying issue and is just a fail-safe approach to restart the s-r daemon, correct? We do want to backport the change, I guess?

It doesn't change the behavior at all. It just adds extra debugging.

Yes, the current approach is to just restart the s-r daemon. It's mostly just an annoyance we see in QA because most folks are not sending "flush journal" commands to their MDS routinely.

#6 Updated by Patrick Donnelly 4 months ago

Patrick Donnelly wrote:

Venky Shankar wrote:

We do want to backport the change I guess??

No need. I think it may be helpful for debugging the next time we see it in main QA.

#7 Updated by Patrick Donnelly 3 months ago

  • Related to Bug #54741: crash: MDSTableClient::got_journaled_ack(unsigned long) added

#8 Updated by Venky Shankar 3 months ago

https://tracker.ceph.com/issues/54741 was seen in a cluster where it's unlikely that "flush journal" was run. It's possible that the standby-replay daemon's replay logic isn't handling certain edge cases, which could have led to this.

#9 Updated by Venky Shankar 3 months ago

  • Category set to Correctness/Safety
  • Assignee set to Venky Shankar
  • Target version set to v19.0.0
  • Backport set to quincy,reef

#10 Updated by Patrick Donnelly 2 months ago

  • Related to Bug #63806: ffsb.sh workunit failure (MDS: std::out_of_range, damaged) added
