Bug #42683: OSD Segmentation fault
Status: Closed
Description
Dear support,
I have a small Ceph cluster installed with Nautilus 14.2.4, meant as a test for a future larger deployment. The cluster consists of 8 machines:
kernel 3.10.0-693.21.1.el7.x86_64
CentOS Linux release 7
Two of them host the OSD services (20 x 8 TB disks each). The cluster is running fine, but on one of the machines a single OSD is unable to start with this error:
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (()+0xf5e0) [0x7fa7ec45c5e0]
2: (gsignal()+0x37) [0x7fa7eb46f1f7]
3: (abort()+0x148) [0x7fa7eb4708e8]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x56514f412a73]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x56514f412bf2]
6: (BitmapAllocator::init_add_free(unsigned long, unsigned long)+0x740) [0x56514fa43870]
7: (BlueStore::_open_alloc()+0x258) [0x56514f8e9ab8]
8: (BlueStore::_open_db_and_around(bool)+0x146) [0x56514f90b306]
9: (BlueStore::_mount(bool, bool)+0x6a4) [0x56514f949c24]
10: (OSD::init()+0x3aa) [0x56514f4bcefa]
11: (main()+0x14fa) [0x56514f4171da]
12: (__libc_start_main()+0xf5) [0x7fa7eb45bc05]
13: (()+0x4b2695) [0x56514f44c695]
The relevant configuration for the OSDs is:
[osd]
osd_recovery_max_active = 40
osd_max_backfills = 64
osd_memory_target = 2684354560
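For reference, osd_memory_target is specified in bytes; a quick sanity check (not part of the original report) shows the configured value is 2.5 GiB, and gives the byte equivalents of the 2 GB and 1 GB limits mentioned later in the thread:

```shell
# osd_memory_target is given in bytes; the configured value is 2.5 GiB:
echo $(( 2560 * 1024 * 1024 ))        # 2684354560, matching the [osd] section
# Byte values for the lower limits tested later in the thread:
echo $(( 2 * 1024 * 1024 * 1024 ))    # 2 GiB -> 2147483648
echo $(( 1 * 1024 * 1024 * 1024 ))    # 1 GiB -> 1073741824
```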
I searched the issue tracker and it seems the error has already been reported here:
https://tracker.ceph.com/issues/39334
but I cross-checked and I appear to be running a version that includes the bugfix mentioned there.
Could you please help me with this?
Antonio
Files
Updated by Igor Fedotov over 4 years ago
@Antonio - could you please provide the whole log for the failure?
Updated by Antonio Falabella over 4 years ago
- File osd-20.txt osd-20.txt added
Dear Igor,
I attached the log portion you requested.
Thanks
Antonio
Updated by Igor Fedotov over 4 years ago
@Antonio - thanks for the info, but I need more...
First of all, please preserve all the available logs for this specific OSD.
1) Could you please set debug_bluestore to 20, restart the OSD, and collect the resulting log?
2) Please search the existing OSD logs for earlier failures; "__ceph_assert_fail" and/or "ceph version 14.2.4" look like good candidates for grep.
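The two steps above might look roughly like this (the OSD id 20 and the log paths are assumptions based on the attached file names; the cluster commands need a live cluster, so the grep is demonstrated against a tiny sample log):

```shell
# 1) Raise BlueStore debug verbosity for the broken OSD and restart it
#    (shown as comments since they require a running cluster):
#      ceph config set osd.20 debug_bluestore 20
#      systemctl restart ceph-osd@20

# 2) Search the surviving logs for earlier failures; demonstrated on a sample file:
log=$(mktemp)
cat > "$log" <<'EOF'
2019-11-06 13:08:33.400 7f30b5b93700 -1 ... __ceph_assert_fail ...
2019-11-06 13:08:33.500 7f30b5b93700 -1 ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
EOF
grep -c "__ceph_assert_fail" "$log"     # prints 1: one assertion line found
grep -c "ceph version 14.2.4" "$log"    # prints 1: crash banners mark earlier failures
rm -f "$log"
```

In practice the real targets would be the rotated files under /var/log/ceph/ for the affected OSD.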
PS: my general impression is that you're hitting https://tracker.ceph.com/issues/42223 with a slightly different appearance.
Updated by Antonio Falabella over 4 years ago
- File osd-20.log osd-20.log added
I attached the portion of the log produced after the restart with the increased debug level.
Thanks
Antonio
Updated by Igor Fedotov over 4 years ago
So, a brief log analysis:
Freelist init shows some garbage in the DB for its key records:
-9> 2019-11-07 17:08:33.207 7f6438c7edc0 10 freelist init size 0x557164041758 bytes_per_block 0x5571640415d8 blocks 0x557164041818 blocks_per_key 0x5571640417b8
which strengthens my assumption about being duplicate of https://tracker.ceph.com/issues/42223
So any previous assertions in earlier OSD logs?
Updated by Antonio Falabella over 4 years ago
No assertions before this. What we were doing was testing the lower limit at which the OSDs could operate, given that our servers have only 64 GB of RAM. So first we set the limit to 2 GB per OSD and stress-tested CephFS, then 1 GB per OSD and stress-tested again. This last change caused the problem.
Given that this is a test cluster, if we can fix the problem by cleaning up the garbage data you mentioned, we can go for it, but I don't know how to do that except by purging and redeploying the OSD.
Antonio
Updated by Igor Fedotov over 4 years ago
@Antonio - IMO it doesn't make much sense to fix these specific parameters - who knows what else has been broken... Hence I suggest OSD redeployment as the final cure. Maybe it makes sense to postpone this a bit for root cause analysis, if possible.
And I'm still not convinced that there were no previous issues. The first log snippet you shared contains lines from the broken OSD startup and mentions some recovery:
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1573140251439726, "job": 1, "event": "recovery_started", "log_files": [3677]}
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #3677 mode 0
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1573140251439772, "job": 1, "event": "recovery_finished"}
Which presumably means a non-graceful earlier shutdown.
So I'd like to know what happened before the startup procedure shared in osd-20.txt.
Updated by Igor Fedotov over 4 years ago
Well, please disregard my words about the recovery referenced in the log - those lines are present after a regular shutdown as well. But I'd still like to know what happened before the first broken startup.
Was it a graceful shutdown, tuning the memory limit to 1 GB, and an immediate startup failure - or something else, e.g. the OSD worked for a while with the new limit, crashed, and has been unable to start since?
Updated by Antonio Falabella over 4 years ago
After the 1 GB tuning the cluster was all fine. Then we started our stress test, which is fio with an increasing number of threads / block sizes / stripe units. After a few minutes the OSD went down, during the tests with the smallest block size (64k), which we found to be the most stressful.
Updated by Igor Fedotov over 4 years ago
@Antonio, could you please share the log for this first crash?
Updated by Antonio Falabella over 4 years ago
@Igor Gajowiak You can find attached the very first error portion of the log. I'm afraid the debug level was not set to 20 at that time.
Updated by Igor Fedotov over 4 years ago
@Antonio, thanks a lot.
So the same pattern:
-49> 2019-11-06 13:08:33.333 7f30b5b93700 3 rocksdb: [db/db_impl_compaction_flush.cc:2660] Compaction error: Corruption: block checksum mismatch: expected 1564692794, got 2324967102 in db/003602.sst offset 0 size 3880
as in https://tracker.ceph.com/issues/42223
Marking duplicate...
Updated by Igor Fedotov over 4 years ago
- Tracker changed from Support to Bug
- Status changed from New to Duplicate
- Regression set to No
- Severity set to 3 - minor
Duplicate of https://tracker.ceph.com/issues/42223
Updated by Igor Fedotov over 4 years ago
@Antonio - would you be able to export bluefs for the broken OSD to regular file system and then share content of db/003602.sst file?
Updated by Antonio Falabella over 4 years ago
@Igor Gajowiak I don't know exactly how to do this; if you can give me some instructions or point me to some docs, I'll go for it.
Thanks
Updated by Igor Fedotov over 4 years ago
Here it is.
The proper command is:
ceph-bluestore-tool --path <osd-path> --out-dir <destination dir> --command bluefs-export
Please make sure you have enough space in <destination dir>.
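A concrete invocation might look like this (the OSD data path and destination directory are hypothetical; the call is guarded so the sketch only runs the export where the tool, which ships with the ceph-osd package, is actually installed):

```shell
OSD_PATH=/var/lib/ceph/osd/ceph-20    # hypothetical data dir for osd.20
OUT_DIR=/tmp/bluefs-export            # choose a filesystem with enough free space

mkdir -p "$OUT_DIR"
df -h "$OUT_DIR"                      # check available space first; the DB can be large

# Export the BlueFS contents; RocksDB files such as db/003602.sst land under OUT_DIR:
if command -v ceph-bluestore-tool >/dev/null 2>&1; then
    ceph-bluestore-tool --path "$OSD_PATH" --out-dir "$OUT_DIR" --command bluefs-export
fi
```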
Updated by Antonio Falabella over 4 years ago
@Igor Gajowiak here is a link to the file:
https://drive.google.com/open?id=1sd3507O58wyb0a1iGt4fjWjQGkoOgnwH
Thanks again for your highly professional support.
Updated by Antonio Falabella over 4 years ago
@Igor Gajowiak if you don't object, I would scratch the OSD to keep testing the system.
Updated by Igor Fedotov over 4 years ago
- Is duplicate of Bug #42223: ceph-14.2.4/src/os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available >= allocated) added