Bug #23599

closed

Segfault while scrubbing Bluestore OSD

Added by Alex Gorbachev about 6 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Core dump at https://drive.google.com/open?id=13MupSyHsY_zA5CM7DOILG8GA9hmQqX24

0> 2018-04-08 21:56:41.911362 7f4d3fdc7700 -1 *** Caught signal (Segmentation fault) **
in thread 7f4d3fdc7700 thread_name:bstore_mempool
ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
1: (()+0xa74234) [0x55f5482f8234]
2: (()+0x11390) [0x7f4d4b940390]
3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x518) [0x55f5481a84b8]
4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x4e4) [0x55f548177594]
5: (BlueStore::MempoolThread::entry()+0x155) [0x55f54817ddd5]
6: (()+0x76ba) [0x7f4d4b9366ba]
7: (clone()+0x6d) [0x7f4d4a9ad41d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

dmesg:
[126029.543698] safe_timer28863: segfault at 8d ip 00007fa9ad4dcccb sp 00007fa9a6629f70 error 4 in libgcc_s.so.1[7fa9ad4ce000+16000]
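
For anyone triaging similar reports, here is a minimal sketch (not part of the original report) of pulling the resolved symbols out of a backtrace like the one above. The regular expression and the names `Frame` and `parse_frames` are illustrative assumptions, not a Ceph tool. Frames printed as `()` stay unresolved and still need the executable or `objdump`, as the NOTE says.

```python
import re
from typing import List, NamedTuple, Optional

# Matches frame lines such as:
#   3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x518) [0x55f5481a84b8]
FRAME_RE = re.compile(
    r"^\s*(\d+): \((.*)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]\s*$"
)


class Frame(NamedTuple):
    index: int
    symbol: Optional[str]  # None when the frame is unresolved, i.e. "()"
    offset: int            # offset from the symbol (or module base)
    address: int           # absolute return address


def parse_frames(text: str) -> List[Frame]:
    """Return the backtrace frames found in `text`, skipping other lines."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue
        symbol = m.group(2)
        frames.append(
            Frame(
                index=int(m.group(1)),
                symbol=None if symbol == "()" else symbol,
                offset=int(m.group(3), 16),
                address=int(m.group(4), 16),
            )
        )
    return frames


# Frames copied from the description above.
BACKTRACE = """\
1: (()+0xa74234) [0x55f5482f8234]
2: (()+0x11390) [0x7f4d4b940390]
3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x518) [0x55f5481a84b8]
4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x4e4) [0x55f548177594]
5: (BlueStore::MempoolThread::entry()+0x155) [0x55f54817ddd5]
6: (()+0x76ba) [0x7f4d4b9366ba]
7: (clone()+0x6d) [0x7f4d4a9ad41d]
"""

frames = parse_frames(BACKTRACE)
for frame in frames:
    if frame.symbol is not None:
        print(f"{frame.index}: {frame.symbol} +{frame.offset:#x}")
```

The resolved frames make it easier to compare this crash with other reports that go through `BlueStore::TwoQCache::_trim`.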

Actions #1

Updated by Igor Fedotov about 6 years ago

This looks similar to https://tracker.ceph.com/issues/21259, which is already marked as resolved in v12.2.5.
I'm not completely sure about that, so could you please give that fix a try and report back when done?

It would also be great if you could collect OSD logs for this case with debug bluestore = 20.

Actions #2

Updated by Igor Fedotov about 6 years ago

  • Status changed from New to Need More Info
Actions #3

Updated by Alex Gorbachev about 6 years ago

I will update to 12.2.5 when available and report back. This is in production, so I am also wondering if the debug settings are going to impact performance - during scrub I get a lot of these segfaults now.

Actions #4

Updated by Igor Fedotov about 6 years ago

Alex Gorbachev wrote:

I will update to 12.2.5 when available and report back. This is in production, so I am also wondering if the debug settings are going to impact performance - during scrub I get a lot of these segfaults now.

Unfortunately yes - they might affect the performance.

Actions #5

Updated by Alex Gorbachev about 6 years ago

OK thanks - this pretty much happens to some of our 342 OSDs on every deep scrub operation. I will try to set the debug briefly and collect logs
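
A rough sketch of how the debug level might be raised briefly on one OSD and then put back, to keep the performance impact short. It only builds the `ceph tell ... injectargs` command lines and does not run them. The OSD id `12` and the restore level `1/5` are assumptions for illustration; check the current value on the OSD before changing it.

```python
import shlex
from typing import List


def injectargs_cmd(osd_id: int, level: str) -> List[str]:
    """Build a `ceph tell` command that changes debug_bluestore at runtime."""
    return [
        "ceph", "tell", f"osd.{osd_id}",
        "injectargs", f"--debug_bluestore {level}",
    ]


# Hypothetical OSD id; 20/20 is the verbose level requested above.
raise_cmd = injectargs_cmd(12, "20/20")
# Assumed previous level; verify the real value on the cluster first.
restore_cmd = injectargs_cmd(12, "1/5")

# Print shell-quoted commands for review before running them by hand.
print(shlex.join(raise_cmd))
print(shlex.join(restore_cmd))
```

Printing the commands rather than executing them leaves the decision of when to raise and restore the level with the operator, which matters on a production cluster.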

Actions #6

Updated by Alex Gorbachev about 6 years ago

Had an OSD crash today without deep scrub, log and core dump available at:

https://drive.google.com/open?id=1MFNdn7uoovQ05SdTE_cCIfmMDigBMlw8

https://drive.google.com/open?id=18PPg4pn4sIc49ENrYSUDas_9Z2wsRAS1

I also keep trying to catch one of these with occasional bluestore debug setting, but no luck so far.

Actions #7

Updated by Alex Gorbachev almost 6 years ago

No longer seeing these after the upgrade to 12.2.5. The load is also spread much better now with proper WAL/DB NVMe devices and enough OSDs.

Actions #9

Updated by Sage Weil almost 3 years ago

  • Status changed from Need More Info to Closed