Bug #42605

closed

KernelDevice.cc: 688: FAILED assert(off % block_size == 0)

Added by 黄 维 over 4 years ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The OSD failed to start after a server power loss.

ceph version: v12.2.9

Stack trace:
/clove/vm/zstor/ceph/rpmbuild/BUILD/ceph-12.2.9/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::read(uint64_t, uint64_t, ceph::bufferlist*, IOContext*, bool)' thread 7f10ebd5ce40 time 2019-10-19 09:24:23.121071
/clove/vm/zstor/ceph/rpmbuild/BUILD/ceph-12.2.9/src/os/bluestore/KernelDevice.cc: 688: FAILED assert(off % block_size == 0)
ceph version 12.2.9-2-34-g8d920dc (8d920dcaaa949f3a08659d9db6e560ccb1896736) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f10e2612080]
2: (KernelDevice::read(unsigned long, unsigned long, ceph::buffer::list*, IOContext*, bool)+0x6af) [0x5602e540914f]
3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x646) [0x5602e520bad6]
4: (BlueFS::_replay(bool)+0x6cf) [0x5602e522326f]
5: (BlueFS::mount()+0x1e4) [0x5602e52273d4]
6: (open_bluefs(CephContext*, std::string const&, std::vector<std::string, std::allocator<std::string> > const&)+0x3c2) [0x5602e51fad42]
7: (main()+0x1f50) [0x5602e5164410]
8: (__libc_start_main()+0xf5) [0x7f10dfb10c05]
9: (()+0x1c231f) [0x5602e51fa31f]
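
For context, the assertion that fired is an I/O-alignment guard near the top of KernelDevice::read(). The standalone sketch below paraphrases those luminous-era checks (the function name check_read_args and the sample arguments are illustrative, not from the ticket):

#include <cassert>
#include <cstdint>
#include <cstdio>

// Sketch of the invariants KernelDevice::read() asserts in luminous
// (paraphrased, not a verbatim copy of the source): raw-device reads must
// be block-aligned in both offset and length, and must stay in bounds.
// block_size is typically 4096, so the failed assert at KernelDevice.cc:688
// means BlueFS replay computed a read offset that is not 4 KiB-aligned.
static void check_read_args(uint64_t off, uint64_t len,
                            uint64_t block_size, uint64_t dev_size) {
  assert(off % block_size == 0);   // KernelDevice.cc:688, the assert that fired
  assert(len % block_size == 0);
  assert(len > 0);
  assert(off + len <= dev_size);
}

int main() {
  check_read_args(4960256, 4096, 4096, 1ull << 30);  // aligned: passes
  printf("aligned read ok\n");
  // An unaligned offset, as produced by the broken replay state, would
  // abort here:
  // check_read_args(5324801, 4096, 4096, 1ull << 30);
  return 0;
}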

Breakpoint 2, BlueFS::_read (this=this@entry=0x555556684300, h=h@entry=0x5555566ca680, buf=buf@entry=0x5555566ca688, off=off@entry=3883008,
len=len@entry=1077248, outbl=outbl@entry=0x7fffffffb9a0, out=out@entry=0x0) at /usr/src/debug/ceph-12.2.9/src/os/bluestore/BlueFS.cc:935
935 {
(gdb)
Continuing.

Breakpoint 2, BlueFS::_read (this=this@entry=0x555556684300, h=h@entry=0x5555566ca680, buf=buf@entry=0x5555566ca688, off=off@entry=4960256,
len=4096, outbl=outbl@entry=0x7fffffffb950, out=out@entry=0x0) at /usr/src/debug/ceph-12.2.9/src/os/bluestore/BlueFS.cc:935
935 {
(gdb)
Continuing.

Breakpoint 2, BlueFS::_read (this=this@entry=0x555556684300, h=h@entry=0x5555566ca680, buf=buf@entry=0x5555566ca688, off=off@entry=4964352, //4964352 + 720896 = 5685248
len=len@entry=720896, outbl=outbl@entry=0x7fffffffb9a0, out=out@entry=0x0) at /usr/src/debug/ceph-12.2.9/src/os/bluestore/BlueFS.cc:935
935 {
(gdb)
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555727ac3 in BlueFS::_read (this=this@entry=0x555556684300, h=h@entry=0x5555566ca680, buf=buf@entry=0x5555566ca688, off=5324800,
off@entry=4964352, len=360448, len@entry=720896, outbl=outbl@entry=0x7fffffffb9a0, out=out@entry=0x0)
at /usr/src/debug/ceph-12.2.9/src/os/bluestore/BlueFS.cc:975
975 cct->_conf->bluefs_buffered_io);

$3 = std::vector of length 2, capacity 2 = {{<AllocExtent> = {offset = 1618051072, length = 1130496}, bdev = 1 '\001'}, {<AllocExtent> = {
offset = 1409875968, length = 4194304}, bdev = 0 '\000'}} // total allocated: 1130496 + 4194304 = 5324800 < 5685248 (the off + len being replayed)
(gdb) n
90 if ((int64_t) offset >= p->length) { //5324800 > 1130496
(gdb) p offset
$4 = 5324800
(gdb) p p->length
Attempt to take address of value not located in memory.
(gdb) n
91 offset -= p->length;
(gdb) p 5324800 - 1130496
$5 = 4194304
(gdb) n
89 while (p != extents.end()) {
(gdb) n
92 ++p;
(gdb) n
89 while (p != extents.end()) {
(gdb) n
91 offset -= p->length;
(gdb) n
89 while (p != extents.end()) {
(gdb) n
92 ++p;
(gdb) n
89 while (p != extents.end()) {
(gdb) p p
$6 = {<AllocExtent> = {offset = 93825012845280, length = 1449961104}, bdev = 85 'U'} // garbage values (offset ~85 TiB): the iterator has stepped past the end of the extents vector
(gdb) n
97 *x_off = offset;
(gdb) p offset
$7 = 0
(gdb) n
99 }
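
To make the failure mode concrete, here is a minimal, self-contained sketch paraphrasing the luminous-era bluefs_fnode_t::seek() loop stepped through above (the Extent struct is simplified and the code is an illustration, not a verbatim copy of the Ceph source; the extent values come from the $3 dump):

#include <cstdint>
#include <cstdio>
#include <vector>

struct Extent {
  uint64_t offset;  // physical offset on the device
  uint64_t length;  // extent length in bytes
  uint8_t bdev;     // block device index
};

// Paraphrase of bluefs_fnode_t::seek(): map a logical file offset to the
// extent containing it, returning the iterator plus the offset within it.
std::vector<Extent>::iterator seek(std::vector<Extent> &extents,
                                   uint64_t offset, uint64_t *x_off) {
  auto p = extents.begin();
  while (p != extents.end()) {
    if ((int64_t)offset >= (int64_t)p->length) {
      offset -= p->length;
      ++p;
    } else {
      break;
    }
  }
  *x_off = offset;
  return p;  // if offset >= total allocated size, this is extents.end()
}

int main() {
  // The two extents gdb printed for ino 1 ($3 above): 1130496 + 4194304
  // bytes, i.e. 5324800 bytes allocated in total.
  std::vector<Extent> extents = {{1618051072, 1130496, 1},
                                 {1409875968, 4194304, 0}};
  uint64_t x_off = 0;
  // Replay advanced the read offset to exactly 5324800 (the crash frame
  // above), so the loop consumes both extents and returns end() with
  // x_off == 0, matching $7 = 0 in the gdb session. The caller in
  // BlueFS::_read then dereferences the iterator (p->bdev, p->offset)
  // around BlueFS.cc:975, which is the SIGSEGV.
  auto p = seek(extents, 5324800, &x_off);
  if (p == extents.end())
    printf("seek() returned end(); dereferencing it is the crash\n");
  return 0;
}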


Files

ceph-osd.13.log.7z (156 KB), 黄 维, 11/04/2019 02:49 AM
#2

Updated by Igor Fedotov over 4 years ago

Looks like BlueFS replay tries to read an out-of-bounds extent (extent #3, while just 2 are present for the log file, aka ino 1) in an attempt to locate the log tail. Most probably this is an error in the replay logic.

Would you be able to attach binary dumps of the following regions, so we can inspect the bluefs log content and have a repro to verify a fix:

DB device:
0x60718000+114000
WAL device:
0x54090000+400000
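
For anyone with access to a similarly failed OSD, a minimal sketch of how such region dumps could be produced with pread(2); the device path and output filename below are hypothetical, and the numbers are read as hex (0x60718000+0x114000 matches the first log extent in the $3 dump above):

#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  const char *dev = "/dev/sdX2";     // hypothetical DB device path
  const uint64_t off = 0x60718000;   // region offset requested above
  const uint64_t len = 0x114000;     // region length (assuming hex)
  int fd = open(dev, O_RDONLY);
  if (fd < 0) { perror("open"); return 1; }
  std::vector<char> buf(len);
  if (pread(fd, buf.data(), len, off) != (ssize_t)len) {
    perror("pread");
    close(fd);
    return 1;
  }
  FILE *out = fopen("db_region.dump", "wb");  // hypothetical output name
  if (!out) { perror("fopen"); close(fd); return 1; }
  fwrite(buf.data(), 1, len, out);
  fclose(out);
  close(fd);
  return 0;
}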

#3

Updated by 黄 维 over 4 years ago

Igor Fedotov wrote:

Looks like BlueFS replay tries to read an out-of-bounds extent (extent #3, while just 2 are present for the log file, aka ino 1) in an attempt to locate the log tail. Most probably this is an error in the replay logic.

Would you be able to attach binary dumps of the following regions, so we can inspect the bluefs log content and have a repro to verify a fix:

DB device:
0x60718000+114000
WAL device:
0x54090000+400000

Sorry, I can't. The OSD has been accidentally destroyed.

#5

Updated by Igor Fedotov about 1 year ago

  • Status changed from New to Closed

Outdated
