Bug #21749
PurgeQueue corruption in 12.2.1
Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%
Backport:
luminous
Regression:
No
Severity:
3 - minor
Description
From "[ceph-users] how to debug (in order to repair) damaged MDS (rank)?"
Log snippet during MDS startup:
2017-10-10 13:21:55.421122 7f3f2990d700 1 mds.6.journaler.pq(ro) recover start
2017-10-10 13:21:55.421124 7f3f2990d700 1 mds.6.journaler.pq(ro) read_head
2017-10-10 13:21:55.421231 7f3f2990d700 0 mds.6.cache creating system inode with ino:0x1
2017-10-10 13:21:55.422532 7f3f2a10e700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
2017-10-10 13:21:55.422548 7f3f2a10e700 10 mds.6.cache.ino(0x106) _fetched got 0 and 536
2017-10-10 13:21:55.422556 7f3f2a10e700 10 mds.6.cache.ino(0x106) magic is 'ceph fs volume v011' (expecting 'ceph fs volume v011')
2017-10-10 13:21:55.422584 7f3f2a10e700 10 mds.6.cache.snaprealm(0x106 seq 1 0x55b192f65c00) open_parents [1,head]
2017-10-10 13:21:55.422593 7f3f2a10e700 20 mds.6.cache.ino(0x106) decode_snap_blob snaprealm(0x106 seq 1 lc 0 cr 0 cps 1 snaps={} 0x55b192f65c00)
2017-10-10 13:21:55.422598 7f3f2a10e700 10 mds.6.cache.ino(0x106) _fetched [inode 0x106 [...2,head] ~mds6/ auth v19 snaprealm=0x55b192f65c00 f(v0 10=0+10) n(v3 rc2017-10-03 22:56:32.400835 b6253 88=11+77)/n(v0 11=0+11) (iversion lock) 0x55b193176700]
2017-10-10 13:21:55.831091 7f3f2b110700 1 mds.6.journaler.pq(ro) _finish_read_head loghead(trim 104857600, expire 108687220, write 108868115, stream_format 1). probing for end of log (from 108868115)...
2017-10-10 13:21:55.831107 7f3f2b110700 1 mds.6.journaler.pq(ro) probing for end of the log
2017-10-10 13:21:55.841213 7f3f2b110700 1 mds.6.journaler.pq(ro) _finish_probe_end write_pos = 134217728 (header had 108868115). recovered.
2017-10-10 13:21:55.841234 7f3f2b110700 4 mds.6.purge_queue operator(): open complete
2017-10-10 13:21:55.841236 7f3f2b110700 4 mds.6.purge_queue operator(): recovering write_pos
2017-10-10 13:21:55.841239 7f3f2b110700 10 mds.6.journaler.pq(ro) _prefetch
2017-10-10 13:21:55.841241 7f3f2b110700 10 mds.6.journaler.pq(ro) _prefetch 41943040 requested_pos 108868115 < target 134217728 (150811155), prefetching 25349613
2017-10-10 13:21:55.841246 7f3f2b110700 10 mds.6.journaler.pq(ro) _issue_read reading 108868115~25349613, read pointers 108868115/108868115/134217728
2017-10-10 13:21:55.841564 7f3f2b110700 10 mds.6.journaler.pq(ro) wait_for_readable at 108868115 onreadable 0x55b193232840
2017-10-10 13:21:55.842864 7f3f2b110700 10 mds.6.journaler.pq(ro) _finish_read got 108868115~183789
2017-10-10 13:21:55.842882 7f3f2b110700 10 mds.6.journaler.pq(ro) _assimilate_prefetch 108868115~183789
2017-10-10 13:21:55.842886 7f3f2b110700 10 mds.6.journaler.pq(ro) _assimilate_prefetch read_buf now 108868115~183789, read pointers 108868115/109051904/134217728
2017-10-10 13:21:55.842965 7f3f2b110700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch
2017-10-10 13:21:55.842979 7f3f2b110700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos
2017-10-10 13:21:55.842983 7f3f2b110700 10 mds.beacon.mds9 set_want_state: up:replay -> down:damaged
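The offsets in the log above can be cross-checked with a little arithmetic. The following is a minimal sketch and not part of the original report. It assumes the purge queue journal is striped over 4 MiB objects (the usual CephFS default, not confirmed by this log), and the constant and function names are illustrative only.

```python
# Offsets copied from the log snippet above.
HEADER_WRITE_POS = 108868115   # write position stored in the journal header
PROBED_WRITE_POS = 134217728   # write position found by probing for the end of the log
READ_LEN = 183789              # length of the first completed read (108868115~183789)

# ASSUMPTION: 4 MiB journal objects; the log does not state the object size.
OBJECT_SIZE = 4 * 1024 * 1024


def object_index(offset, object_size=OBJECT_SIZE):
    """Return the index of the journal object that contains `offset`."""
    return offset // object_size


# Gap between the header's write position and the probed end of the log.
prefetch_len = PROBED_WRITE_POS - HEADER_WRITE_POS
# Position reached after the first read completes.
received_pos = HEADER_WRITE_POS + READ_LEN

print(prefetch_len)   # compare with "prefetching 25349613"
print(received_pos)   # compare with the middle read pointer, 109051904
print(object_index(HEADER_WRITE_POS), received_pos % OBJECT_SIZE)
```

Under the 4 MiB assumption, the header's write position falls inside object 25 and the first read ends exactly on the boundary of object 26. The decode error is logged immediately after that chunk is assimilated.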
Related issues
History
#1 Updated by Daniel Baumann over 5 years ago
I saved all the information, logs, and objects; feel free to ask for any of it, or for anything else you need.
Regards,
Daniel
#2 Updated by Zheng Yan over 5 years ago
Likely caused by http://tracker.ceph.com/issues/19593.
Ping 'yanzheng' at ceph@OFTC and I will help you recover the FS.
#3 Updated by Daniel Baumann over 5 years ago
Hi Yan,
Yes, we had 3 MDS daemons running in standby-replay mode (I have now switched them to standby).
Thanks for the offer to help with recovery. I was already able to bring the FS back by removing the objects in the purge queue.
(for reference)
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021386.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021390.html
Regards,
Daniel
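To illustrate which RADOS objects a byte range of the purge queue journal might map to, here is a hedged sketch. It is not the procedure used in this report. It assumes 4 MiB journal objects, a purge queue inode of 0x500 + rank, and object names of the form `<inode hex>.<object index as 8 hex digits>`. None of these is confirmed by the log, and `purge_queue_objects` is a hypothetical helper.

```python
# ASSUMPTIONS (not confirmed by this report): 4 MiB journal objects, a purge
# queue inode of 0x500 + rank, and "<inode hex>.<index, 8 hex digits>" naming.
OBJECT_SIZE = 4 * 1024 * 1024
PURGE_QUEUE_INO_BASE = 0x500


def purge_queue_objects(rank, start, end, object_size=OBJECT_SIZE):
    """List the assumed object names backing journal bytes [start, end)."""
    if end <= start:
        return []
    ino = PURGE_QUEUE_INO_BASE + rank
    first = start // object_size
    last = (end - 1) // object_size
    return [f"{ino:x}.{i:08x}" for i in range(first, last + 1)]


# Range from the log: expire position up to the probed write position, rank 6.
names = purge_queue_objects(rank=6, start=108687220, end=134217728)
print(len(names), names[0], names[-1])
```

Listing the metadata pool and comparing against names computed this way is one way to check whether the assumptions hold on a given cluster before touching any object.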
#5 Updated by Patrick Donnelly over 5 years ago
- Duplicates Bug #19593: purge queue and standby replay mds added