Bug #40409

possible crash when replaying journal with invalid/corrupted ranges

Added by Mykola Golub about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus,mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While testing replay of a journal with intentionally introduced empty regions (pads), the following crash was observed:

2019-06-18T07:06:24.578+0100 7f55c6ffd700 10 ObjectPlayer: 0x55fe07d1a2f0 handle_fetch_complete: journal_data.3.16e6c09c15b.0, r=0, len=65536
2019-06-18T07:06:24.578+0100 7f55c6ffd700 20 ObjectPlayer: 0x55fe07d1a2f0 : Entry[tag_tid=1, entry_tid=0, data size=24] decoded
2019-06-18T07:06:24.578+0100 7f55c6ffd700 -1 ObjectPlayer: 0x55fe07d1a2f0 : detected corrupt journal entry at offset 57
2019-06-18T07:06:24.590+0100 7f55c6ffd700 20 ObjectPlayer: 0x55fe07d1a2f0 : Entry[tag_tid=1, entry_tid=4, data size=4140] decoded
2019-06-18T07:06:24.590+0100 7f55c6ffd700 -1 ObjectPlayer: 0x55fe07d1a2f0 : corruption range [57, 16384)
2019-06-18T07:06:24.590+0100 7f55c6ffd700 -1 ObjectPlayer: 0x55fe07d1a2f0 : detected corrupt journal entry at offset 4230
2019-06-18T07:06:24.598+0100 7f55c6ffd700 20 ObjectPlayer: 0x55fe07d1a2f0 : Entry[tag_tid=1, entry_tid=8, data size=4140] decoded
2019-06-18T07:06:24.598+0100 7f55c6ffd700 -1 ObjectPlayer: 0x55fe07d1a2f0 : corruption range [4230, 16441)
2019-06-18T07:06:24.606+0100 7f55c6ffd700 -1 /home/mgolub/ceph/ceph.ci/src/include/interval_set.h: In function 'void interval_set<T, Map>::insert(T, T, T*, T*) [with T = long unsigned int; Map = std::map<long unsigned int, long unsigned int>]' thread 7f55c6ffd700 time 2019-06-18T07:06:24.601057+0100
/home/mgolub/ceph/ceph.ci/src/include/interval_set.h: 461: ceph_abort_msg("abort() called")

 ceph version v15.0.0-1916-g1ecfa493b8 (1ecfa493b8b6cfb81361396d92a5ff34f54ae70e) octopus (dev)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xf6) [0x7f55f4e0324c]
 2: (interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > >::insert(unsigned long, unsigned long, unsigned long*, unsigned long*)+0x184) [0x55fe05c8bf86]
 3: (journal::ObjectPlayer::handle_fetch_complete(int, ceph::buffer::v14_2_0::list const&, bool*)+0xa44) [0x55fe05c89d44]
 4: (journal::ObjectPlayer::C_Fetch::finish(int)+0x51) [0x55fe05c8b0b7]

 #6  0x00007f55f4e034d0 in ceph::__ceph_abort (file=0x55fe05d18380 "/home/mgolub/ceph/ceph.ci/src/include/interval_set.h", line=461, 
    func=0x55fe05d183b8 "void interval_set<T, Map>::insert(T, T, T*, T*) [with T = long unsigned int; Map = std::map<long unsigned int, long unsigned int>]", msg="abort() called")
    at /home/mgolub/ceph/ceph.ci/src/common/assert.cc:196
#7  0x000055fe05c8bf86 in interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > >::insert (
    this=0x55fe07d1a470, start=4230, len=12211, pstart=0x0, plen=0x0) at /home/mgolub/ceph/ceph.ci/src/include/interval_set.h:461
#8  0x000055fe05c89d44 in journal::ObjectPlayer::handle_fetch_complete (this=0x55fe07d1a2f0, r=0, bl=..., refetch=0x7f55c6ff847f) at /home/mgolub/ceph/ceph.ci/src/journal/ObjectPlayer.cc:172
#9  0x000055fe05c8b0b7 in journal::ObjectPlayer::C_Fetch::finish (this=0x55fe07e00ce0, r=0) at /home/mgolub/ceph/ceph.ci/src/journal/ObjectPlayer.cc:291
#10 0x000055fe05ad9de3 in Context::complete (this=0x55fe07e00ce0, r=0) at /home/mgolub/ceph/ceph.ci/src/include/Context.h:77

The crash occurs because the read block offset was not properly updated after skipping the invalid range.
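
The two corruption ranges in the log, [57, 16384) and [4230, 16441), overlap, and the backtrace shows the abort inside interval_set::insert() with start=4230, len=12211. The following is a minimal sketch of that failure mode, not the Ceph code: StrictIntervalSet and replay_logged_ranges are hypothetical names, and the sketch assumes only that an insert overlapping an existing range is rejected. It returns false where the real code aborts, and it does not merge adjacent ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for a strict interval set (NOT Ceph's interval_set):
// insert() refuses any range that overlaps one already stored. It returns
// false instead of aborting so the behaviour can be checked.
class StrictIntervalSet {
public:
  bool insert(uint64_t start, uint64_t len) {
    assert(len > 0);
    const uint64_t end = start + len;

    // First stored range that begins at or after `start`.
    auto next = ranges_.lower_bound(start);
    if (next != ranges_.end() && next->first < end) {
      return false;  // new range runs into the following range
    }
    if (next != ranges_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second > start) {
        return false;  // preceding range runs into the new range
      }
    }
    ranges_[start] = len;
    return true;
  }

private:
  std::map<uint64_t, uint64_t> ranges_;  // start offset -> length
};

// Inserts the two corruption ranges reported in the log, in order.
// Returns true only if both inserts are accepted.
bool replay_logged_ranges() {
  StrictIntervalSet invalid;
  const bool first = invalid.insert(57, 16384 - 57);       // [57, 16384)
  const bool second = invalid.insert(4230, 16441 - 4230);  // [4230, 16441), len=12211
  return first && second;
}
```

In this sketch the second insert is rejected because offset 4230 lies inside the first range. A range that begins at or after 16384, for example [16384, 16441), would be accepted, which is consistent with the explanation that the read offset was not advanced past the skipped range.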


Related issues

Copied to rbd - Backport #40461: mimic: possible crash when replaying journal with invalid/corrupted ranges Resolved
Copied to rbd - Backport #40462: nautilus: possible crash when replaying journal with invalid/corrupted ranges Resolved
Copied to rbd - Backport #40463: luminous: possible crash when replaying journal with invalid/corrupted ranges Resolved

History

#1 Updated by Mykola Golub about 2 years ago

  • Status changed from In Progress to Fix Under Review

#2 Updated by Jason Dillaman about 2 years ago

  • Status changed from Fix Under Review to Pending Backport

#3 Updated by Nathan Cutler about 2 years ago

  • Copied to Backport #40461: mimic: possible crash when replaying journal with invalid/corrupted ranges added

#4 Updated by Nathan Cutler about 2 years ago

  • Copied to Backport #40462: nautilus: possible crash when replaying journal with invalid/corrupted ranges added

#5 Updated by Nathan Cutler about 2 years ago

  • Copied to Backport #40463: luminous: possible crash when replaying journal with invalid/corrupted ranges added

#6 Updated by Mykola Golub about 2 years ago

  • Pull request ID set to 28627

#7 Updated by Nathan Cutler about 2 years ago

  • Status changed from Pending Backport to Resolved
