Bug #24639

closed

[segfault] segfault in BlueFS::read

Added by Rowan James almost 6 years ago. Updated about 5 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
% Done: 0%
Source: Community (user)
Regression: No
Severity: 3 - minor

Description

Using ceph-deploy from my admin host, I created two encrypted BlueStore OSDs. After between 4 and 24 hours, both began flapping persistently, and a segfault appeared in the systemd logs. The crash happens immediately on launch, 100% of the time, on both OSDs.

Both ceph-osd and ceph-bluestore-tool crash with the same stack trace.

I then realized that the affected host was behind the rest of the cluster, which had recently been upgraded to Luminous. The host's exact Ceph version is unknown, but it was probably the latest one in the Canonical 16.04 LTS repository. I upgraded the host from the Luminous PPA, hoping the problem was in the experimental BlueStore code.

I have since reproduced the bug on Luminous. I then upgraded the host to 18.04 LTS and further to Mimic, with no change in behavior. This leads me to believe that these OSDs are now in an on-disk state that reliably triggers the crash.

Jun 23 20:12:19 Largo systemd[1]: ceph-osd@0.service: Service hold-off time over, scheduling restart.
Jun 23 20:12:19 Largo systemd[1]: ceph-osd@0.service: Scheduled restart job, restart counter is at 51.
Jun 23 20:12:19 Largo systemd[1]: Stopped Ceph object storage daemon osd.0.
Jun 23 20:12:19 Largo systemd[1]: Starting Ceph object storage daemon osd.0...
Jun 23 20:12:19 Largo systemd[1]: Started Ceph object storage daemon osd.0.
Jun 23 20:12:20 Largo ceph-osd[58404]: 2018-06-23 20:12:20.022 7fdb54cc2280 -1 Public network was set, but cluster network was not set
Jun 23 20:12:20 Largo ceph-osd[58404]: 2018-06-23 20:12:20.022 7fdb54cc2280 -1     Using public network also for cluster network
Jun 23 20:12:20 Largo ceph-osd[58404]: starting osd.0 at - osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Jun 23 20:12:20 Largo ceph-osd[58404]: *** Caught signal (Segmentation fault) **
Jun 23 20:12:20 Largo ceph-osd[58404]:  in thread 7fdb54cc2280 thread_name:ceph-osd
Jun 23 20:12:20 Largo ceph-osd[58404]:  ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
Jun 23 20:12:20 Largo ceph-osd[58404]:  1: (()+0x915140) [0x55ed327c7140]
Jun 23 20:12:20 Largo ceph-osd[58404]:  2: (()+0x12890) [0x7fdb4a5dc890]
Jun 23 20:12:20 Largo ceph-osd[58404]:  3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x367) [
Jun 23 20:12:20 Largo ceph-osd[58404]:  4: (BlueFS::_replay(bool, bool)+0x214) [0x55ed3277c654]
Jun 23 20:12:20 Largo ceph-osd[58404]:  5: (BlueFS::mount()+0x1f1) [0x55ed32780ea1]
Jun 23 20:12:20 Largo ceph-osd[58404]:  6: (BlueStore::_open_db(bool, bool)+0x1840) [0x55ed326abae0]
Jun 23 20:12:20 Largo ceph-osd[58404]:  7: (BlueStore::_mount(bool, bool)+0x4b7) [0x55ed326db407]
Jun 23 20:12:20 Largo ceph-osd[58404]:  8: (OSD::init()+0x295) [0x55ed32286305]
Jun 23 20:12:20 Largo ceph-osd[58404]:  9: (main()+0x268d) [0x55ed3217507d]
Jun 23 20:12:20 Largo ceph-osd[58404]:  10: (__libc_start_main()+0xe7) [0x7fdb49495b97]
Jun 23 20:12:20 Largo ceph-osd[58404]:  11: (_start()+0x2a) [0x55ed3223d38a]
Jun 23 20:12:20 Largo ceph-osd[58404]: 2018-06-23 20:12:20.318 7fdb54cc2280 -1 *** Caught signal (Segmentation fault) **
Jun 23 20:12:20 Largo ceph-osd[58404]:  in thread 7fdb54cc2280 thread_name:ceph-osd
Jun 23 20:12:20 Largo ceph-osd[58404]:  ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
Jun 23 20:12:20 Largo ceph-osd[58404]:  1: (()+0x915140) [0x55ed327c7140]
Jun 23 20:12:20 Largo ceph-osd[58404]:  2: (()+0x12890) [0x7fdb4a5dc890]
Jun 23 20:12:20 Largo ceph-osd[58404]:  3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x367) [
Jun 23 20:12:20 Largo ceph-osd[58404]:  4: (BlueFS::_replay(bool, bool)+0x214) [0x55ed3277c654]
Jun 23 20:12:20 Largo ceph-osd[58404]:  5: (BlueFS::mount()+0x1f1) [0x55ed32780ea1]
Jun 23 20:12:20 Largo ceph-osd[58404]:  6: (BlueStore::_open_db(bool, bool)+0x1840) [0x55ed326abae0]
Jun 23 20:12:20 Largo ceph-osd[58404]:  7: (BlueStore::_mount(bool, bool)+0x4b7) [0x55ed326db407]
Jun 23 20:12:20 Largo ceph-osd[58404]:  8: (OSD::init()+0x295) [0x55ed32286305]
Jun 23 20:12:20 Largo ceph-osd[58404]:  9: (main()+0x268d) [0x55ed3217507d]
Jun 23 20:12:20 Largo ceph-osd[58404]:  10: (__libc_start_main()+0xe7) [0x7fdb49495b97]
Jun 23 20:12:20 Largo ceph-osd[58404]:  11: (_start()+0x2a) [0x55ed3223d38a]
Jun 23 20:12:20 Largo ceph-osd[58404]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jun 23 20:12:20 Largo ceph-osd[58404]:    -39> 2018-06-23 20:12:20.022 7fdb54cc2280 -1 Public network was set, but cluster network was not set
Jun 23 20:12:20 Largo ceph-osd[58404]:    -38> 2018-06-23 20:12:20.022 7fdb54cc2280 -1     Using public network also for cluster network
Jun 23 20:12:20 Largo ceph-osd[58404]:      0> 2018-06-23 20:12:20.318 7fdb54cc2280 -1 *** Caught signal (Segmentation fault) **
Jun 23 20:12:20 Largo ceph-osd[58404]:  in thread 7fdb54cc2280 thread_name:ceph-osd
Jun 23 20:12:20 Largo ceph-osd[58404]:  ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
Jun 23 20:12:20 Largo ceph-osd[58404]:  1: (()+0x915140) [0x55ed327c7140]
Jun 23 20:12:20 Largo ceph-osd[58404]:  2: (()+0x12890) [0x7fdb4a5dc890]
Jun 23 20:12:20 Largo ceph-osd[58404]:  3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x367) [
Jun 23 20:12:20 Largo ceph-osd[58404]:  4: (BlueFS::_replay(bool, bool)+0x214) [0x55ed3277c654]
Jun 23 20:12:20 Largo ceph-osd[58404]:  5: (BlueFS::mount()+0x1f1) [0x55ed32780ea1]
Jun 23 20:12:20 Largo ceph-osd[58404]:  6: (BlueStore::_open_db(bool, bool)+0x1840) [0x55ed326abae0]
Jun 23 20:12:20 Largo ceph-osd[58404]:  7: (BlueStore::_mount(bool, bool)+0x4b7) [0x55ed326db407]
Jun 23 20:12:20 Largo ceph-osd[58404]:  8: (OSD::init()+0x295) [0x55ed32286305]
Jun 23 20:12:20 Largo ceph-osd[58404]:  9: (main()+0x268d) [0x55ed3217507d]
Jun 23 20:12:20 Largo ceph-osd[58404]:  10: (__libc_start_main()+0xe7) [0x7fdb49495b97]
Jun 23 20:12:20 Largo ceph-osd[58404]:  11: (_start()+0x2a) [0x55ed3223d38a]
Jun 23 20:12:20 Largo ceph-osd[58404]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jun 23 20:12:20 Largo systemd[1]: ceph-osd@0.service: Main process exited, code=dumped, status=11/SEGV
Jun 23 20:12:20 Largo systemd[1]: ceph-osd@0.service: Failed with result 'core-dump'.
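
The backtrace NOTE says the executable or objdump output is needed to interpret the frames. The bracketed addresses look like runtime addresses of a position-independent ceph-osd binary, so they must first be turned into offsets within the file. The following is a minimal sketch, not part of the original report. It assumes that frame 1, "(()+0x915140) [0x55ed327c7140]", lies inside ceph-osd itself, which its 0x55ed... address range suggests. Under that assumption, subtracting the module offset from the runtime address gives the load base:

```shell
# Sketch: convert absolute addresses from the ceph-osd backtrace into offsets
# inside the ceph-osd binary, the form that addr2line/objdump work with.
# Assumption: frame 1, "(()+0x915140) [0x55ed327c7140]", lies inside ceph-osd
# itself, so subtracting its module offset from its address gives the load base.
set -eu

frame1_abs=0x55ed327c7140   # runtime address of frame 1
frame1_off=0x915140         # offset of frame 1 within its module
base=$(( $frame1_abs - $frame1_off ))
printf 'ceph-osd load base: 0x%x\n' "$base"

# Frame 4, BlueFS::_replay(bool, bool)+0x214, was printed only as an absolute address.
replay_abs=0x55ed3277c654
printf 'BlueFS::_replay+0x214 file offset: 0x%x\n' "$(( $replay_abs - $base ))"
```

The script prints a load base of 0x55ed31eb2000 and an offset of 0x8ca654 for the _replay frame. Suppose the matching 13.2.0 ceph-osd binary and its debug symbols are installed (an assumption about the reader's setup, not something tried in this report). Such offsets, or frame 1's 0x915140 directly, could then be passed to addr2line -C -f -e /usr/bin/ceph-osd.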
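
Running ceph-bluestore-tool fsck under gdb, with a breakpoint on BlueFS::_read, crashes in the same place:

Starting program: /usr/bin/ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0 --no-mon-config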
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffea952700 (LWP 62958)]
[New Thread 0x7fffe997a700 (LWP 62959)]
[New Thread 0x7fffe9179700 (LWP 62960)]
[New Thread 0x7fffe8978700 (LWP 62961)]
[New Thread 0x7fffe8177700 (LWP 62962)]
[New Thread 0x7fffe7976700 (LWP 62963)]

Thread 1 "ceph-bluestore-" hit Breakpoint 1, BlueFS::_read (this=this@entry=0x55555688c600, h=h@entry=0x555556862e80, buf=buf@entry=0x555556862e88, off=0, 
    len=<optimised out>, outbl=outbl@entry=0x7fffffffaf10, out=0x0) at ./src/os/bluestore/BlueFS.cc:1097
1097    in ./src/os/bluestore/BlueFS.cc
(gdb) bt
#0  BlueFS::_read (this=this@entry=0x55555688c600, h=h@entry=0x555556862e80, buf=buf@entry=0x555556862e88, off=0, len=<optimised out>, outbl=outbl@entry=0x7fffffffaf10, 
    out=0x0) at ./src/os/bluestore/BlueFS.cc:1097
#1  0x00005555557a99c4 in BlueFS::_replay (this=this@entry=0x55555688c600, noop=noop@entry=false, to_stdout=to_stdout@entry=false) at ./src/os/bluestore/BlueFS.cc:596
#2  0x00005555557ae211 in BlueFS::mount (this=0x55555688c600) at ./src/os/bluestore/BlueFS.cc:440
#3  0x0000555555812400 in BlueStore::_open_db (this=this@entry=0x7fffffffc680, create=create@entry=false, to_repair_db=to_repair_db@entry=false)
    at ./src/os/bluestore/BlueStore.cc:4845
#4  0x0000555555836c71 in BlueStore::_fsck (this=0x7fffffffc680, deep=false, repair=<optimised out>) at ./src/os/bluestore/BlueStore.cc:5867
#5  0x00005555556ccd57 in BlueStore::fsck (deep=<optimised out>, this=0x7fffffffc680) at ./src/os/bluestore/BlueStore.h:2171
#6  main (argc=<optimised out>, argv=<optimised out>) at ./src/os/bluestore/bluestore_tool.cc:306
(gdb) s

Thread 1 "ceph-bluestore-" received signal SIGSEGV, Segmentation fault.
BlueFS::_read (this=this@entry=0x55555688c600, h=h@entry=0x555556862e80, buf=buf@entry=0x555556862e88, off=0, len=<optimised out>, outbl=outbl@entry=0x7fffffffaf10, 
    out=0x0) at ./src/os/bluestore/BlueFS.cc:1097
1097    in ./src/os/bluestore/BlueFS.cc
(gdb) 


Related issues 1 (1 open, 0 closed)

Related to bluestore - Bug #45519: OSD asserts during block allocation for BlueFS (New)
