Bug #23599
Segfault while scrubbing Bluestore OSD
Status: Closed
Description
Core dump at https://drive.google.com/open?id=13MupSyHsY_zA5CM7DOILG8GA9hmQqX24
0> 2018-04-08 21:56:41.911362 7f4d3fdc7700 -1 *** Caught signal (Segmentation fault) **
in thread 7f4d3fdc7700 thread_name:bstore_mempool
ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
1: (()+0xa74234) [0x55f5482f8234]
2: (()+0x11390) [0x7f4d4b940390]
3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x518) [0x55f5481a84b8]
4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x4e4) [0x55f548177594]
5: (BlueStore::MempoolThread::entry()+0x155) [0x55f54817ddd5]
6: (()+0x76ba) [0x7f4d4b9366ba]
7: (clone()+0x6d) [0x7f4d4a9ad41d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
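The anonymous frames above carry only module-relative offsets, so matching them against `objdump -rdS` output takes a little arithmetic. Below is a minimal Python sketch. It assumes glibc's `backtrace_symbols` convention: an anonymous `()+0x…` offset is relative to the module's load bias, and a named frame's offset is relative to its symbol. It also assumes frames 1 and 3 come from the same executable, which the nearby 0x55f5… addresses suggest but do not prove. `parse_frame`, `load_bias` and `link_time_address` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches frames such as "1: (()+0xa74234) [0x55f5482f8234]" and
# "3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x518) [0x55f5481a84b8]".
FRAME_RE = re.compile(
    r"^\s*(?P<idx>\d+):\s*\((?P<sym>.*)\+0x(?P<off>[0-9a-f]+)\)\s*\[0x(?P<addr>[0-9a-f]+)\]"
)


@dataclass
class Frame:
    idx: int
    symbol: Optional[str]  # None for anonymous "()" frames
    offset: int            # value printed after "+"
    addr: int              # absolute runtime address printed in brackets


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace line; return None if it is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    sym = m.group("sym")
    return Frame(
        idx=int(m.group("idx")),
        symbol=None if sym == "()" else sym,
        offset=int(m.group("off"), 16),
        addr=int(m.group("addr"), 16),
    )


def load_bias(frame: Frame) -> int:
    """Assumption: an anonymous frame's offset is relative to its module's load bias."""
    if frame.symbol is not None:
        raise ValueError("derive the load bias from an anonymous '()' frame")
    return frame.addr - frame.offset


def link_time_address(frame: Frame, bias: int) -> int:
    """Address to search for in objdump output, assuming the frame lives in the module with this bias."""
    return frame.addr - bias


if __name__ == "__main__":
    trace = """\
1: (()+0xa74234) [0x55f5482f8234]
3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x518) [0x55f5481a84b8]"""
    f1, f3 = (parse_frame(line) for line in trace.splitlines())
    bias = load_bias(f1)
    print(hex(bias))                         # 0x55f547884000
    print(hex(link_time_address(f3, bias)))  # 0x9244b8
```

With these frames the sketch recovers a load bias of 0x55f547884000 and places the faulting instruction in `TwoQCache::_trim` at link-time address 0x9244b8, which is the address to look for in the disassembly. Frames 2 and 6 both yield a bias of 0x7f4d4b92f000, which is consistent with their sharing one shared library.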
dmesg:
[126029.543698] safe_timer[28863]: segfault at 8d ip 00007fa9ad4dcccb sp 00007fa9a6629f70 error 4 in libgcc_s.so.1[7fa9ad4ce000+16000]
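The dmesg entry can be decoded in a similar way. Below is a minimal Python sketch. It assumes the x86 page-fault error-code layout: bit 0 set for a protection violation rather than a non-present page, bit 1 set for a write, and bit 2 set for user mode. It also assumes the bracketed range is the start and size of the mapping that contains `ip`. Under that assumption, the computed offset matches a file offset in libgcc_s.so.1 only if that mapping begins at file offset 0. `decode_segv` and `SegvReport` are hypothetical names.

```python
import re
from typing import NamedTuple

# Kernel user-space segfault report, e.g. "... segfault at 8d ip 00007fa9ad4dcccb
# sp 00007fa9a6629f70 error 4 in libgcc_s.so.1[7fa9ad4ce000+16000]".
# All numbers, including the error code, are treated as hexadecimal.
SEGV_RE = re.compile(
    r"segfault at (?P<fault>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<mod>[^\s\[]+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


class SegvReport(NamedTuple):
    fault_addr: int   # address whose access faulted
    module: str       # file backing the mapping that contains ip
    ip_offset: int    # ip relative to the start of that mapping
    in_mapping: bool  # True if ip lies inside [start, start + size)
    protection: bool  # error bit 0: protection violation (False = page not present)
    write: bool       # error bit 1: write access (False = read)
    user: bool        # error bit 2: fault raised in user mode


def decode_segv(line: str) -> SegvReport:
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault report")
    ip = int(m.group("ip"), 16)
    start = int(m.group("start"), 16)
    size = int(m.group("size"), 16)
    err = int(m.group("err"), 16)
    return SegvReport(
        fault_addr=int(m.group("fault"), 16),
        module=m.group("mod"),
        ip_offset=ip - start,
        in_mapping=start <= ip < start + size,
        protection=bool(err & 0x1),
        write=bool(err & 0x2),
        user=bool(err & 0x4),
    )


if __name__ == "__main__":
    line = ("[126029.543698] safe_timer[28863]: segfault at 8d ip 00007fa9ad4dcccb "
            "sp 00007fa9a6629f70 error 4 in libgcc_s.so.1[7fa9ad4ce000+16000]")
    r = decode_segv(line)
    print(r.module, hex(r.ip_offset), hex(r.fault_addr))  # libgcc_s.so.1 0xeccb 0x8d
    print(r.protection, r.write, r.user)                  # False False True
```

For this entry the sketch reports a user-mode read of a non-present page at address 0x8d, at offset 0xeccb into the libgcc_s.so.1 mapping. A fault address that small usually means a near-null pointer was dereferenced.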
Updated by Igor Fedotov about 6 years ago
This looks similar to https://tracker.ceph.com/issues/21259, which is already marked as resolved in v12.2.5.
I'm not completely sure about that, though, so could you please give that fix a try and report back?
It would also be great if you could collect OSD logs for this case with debug bluestore = 20.
Updated by Igor Fedotov about 6 years ago
- Status changed from New to Need More Info
Updated by Alex Gorbachev about 6 years ago
I will update to 12.2.5 when it is available and report back. This is in production, so I am also wondering whether the debug settings will impact performance; during scrubs I now get a lot of these segfaults.
Updated by Igor Fedotov about 6 years ago
Unfortunately, yes, they may affect performance.
Updated by Alex Gorbachev about 6 years ago
OK, thanks. This happens to some of our 342 OSDs on pretty much every deep scrub. I will try to enable debug logging briefly and collect logs.
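Raising the BlueStore log level only for a short window can be scripted so that the level is always restored afterward. Below is a minimal Python sketch. It assumes the `ceph` CLI is on the path and that `ceph tell osd.<id> injectargs '--debug_bluestore <level>'` is used to change the level at runtime. The restore value `1/5` is an assumed default, so check the actual value first, for example with `ceph daemon osd.<id> config get debug_bluestore`. By default the sketch only prints the commands. `bluestore_debug`, `dry_run` and `shell_run` are hypothetical helpers.

```python
import shlex
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def dry_run(argv: List[str]) -> None:
    """Print the command instead of executing it."""
    print(shlex.join(argv))


def shell_run(argv: List[str]) -> None:
    """Execute the command; requires the ceph CLI and admin credentials."""
    subprocess.run(argv, check=True)


def injectargs_cmd(osd_id: int, level: str) -> List[str]:
    # Option and value travel as one argument, mirroring the usual quoted shell form.
    return ["ceph", "tell", f"osd.{osd_id}", "injectargs", f"--debug_bluestore {level}"]


@contextmanager
def bluestore_debug(osd_id: int, high: str = "20/20", restore: str = "1/5",
                    run: Runner = dry_run) -> Iterator[None]:
    """Raise debug_bluestore on one OSD for the duration of the with-block."""
    run(injectargs_cmd(osd_id, high))
    try:
        yield
    finally:
        # Restore even if the body raises, so verbose logging never stays on.
        run(injectargs_cmd(osd_id, restore))


if __name__ == "__main__":
    with bluestore_debug(12):
        pass  # e.g. wait for, or trigger, a deep scrub here
    # prints:
    # ceph tell osd.12 injectargs '--debug_bluestore 20/20'
    # ceph tell osd.12 injectargs '--debug_bluestore 1/5'
```

The context manager keeps the verbose window short. This matters on a production cluster because logging at level 20 can hurt performance.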
Updated by Alex Gorbachev about 6 years ago
Had an OSD crash today without a deep scrub running; the log and core dump are available at:
https://drive.google.com/open?id=1MFNdn7uoovQ05SdTE_cCIfmMDigBMlw8
https://drive.google.com/open?id=18PPg4pn4sIc49ENrYSUDas_9Z2wsRAS1
I also keep trying to catch one of these by occasionally enabling the BlueStore debug setting, but no luck so far.
Updated by Alex Gorbachev almost 6 years ago
No longer seeing these after the upgrade to 12.2.5. The load is also spread much better now, with proper WAL/DB NVMe devices and enough OSDs.
Updated by Beom-Seok Park almost 6 years ago
Got a segfault in v12.2.5 too.
Core dumps at:
https://drive.google.com/open?id=1P34QCnhfh7v_3wPqSceHows1yrTp6A0N
https://drive.google.com/open?id=1j_RuNpjymBNwIsQcHzHMFfGEC-oSMMKa
https://drive.google.com/open?id=17MD0ib7LyVGQC4F8njsBh_q4PCrJ5ACc
OSD log at:
https://drive.google.com/open?id=1J3ghhM2bXHgLvRXtOaVc6C0quZ1lF_hY
Updated by Sage Weil almost 3 years ago
- Status changed from Need More Info to Closed