Bug #42683

OSD Segmentation fault

Added by Antonio Falabella 7 months ago. Updated 7 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Dear support,
I have a small Ceph cluster installed with Nautilus 14.2.4, intended as a test for a future larger deployment. The cluster consists of 8 machines:

kernel 3.10.0-693.21.1.el7.x86_64
CentOS Linux release 7

Two of them host the OSD services (20 8TB disks each). The cluster is running fine, but on one of the machines a single OSD is unable to start, with this error:

ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (()+0xf5e0) [0x7fa7ec45c5e0]
2: (gsignal()+0x37) [0x7fa7eb46f1f7]
3: (abort()+0x148) [0x7fa7eb4708e8]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x56514f412a73]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x56514f412bf2]
6: (BitmapAllocator::init_add_free(unsigned long, unsigned long)+0x740) [0x56514fa43870]
7: (BlueStore::_open_alloc()+0x258) [0x56514f8e9ab8]
8: (BlueStore::_open_db_and_around(bool)+0x146) [0x56514f90b306]
9: (BlueStore::_mount(bool, bool)+0x6a4) [0x56514f949c24]
10: (OSD::init()+0x3aa) [0x56514f4bcefa]
11: (main()+0x14fa) [0x56514f4171da]
12: (__libc_start_main()+0xf5) [0x7fa7eb45bc05]
13: (()+0x4b2695) [0x56514f44c695]

The relevant configuration for the OSDs is:

[osd]
osd_recovery_max_active = 40
osd_max_backfills = 64
osd_memory_target = 2684354560
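As context for this target, a quick back-of-the-envelope check (a sketch using figures from this report: 20 OSDs per host, and the 64 GB of RAM the hosts have) shows how much of the host's memory these cache targets claim:

```python
# osd_memory_target from the [osd] section above, in bytes (2.5 GiB).
osd_memory_target = 2684354560
osds_per_host = 20   # each OSD host has 20 x 8TB disks
host_ram_gib = 64    # the OSD hosts have 64 GB of RAM

total_gib = osds_per_host * osd_memory_target / 2**30
print(f"OSD memory targets claim {total_gib:.0f} GiB of {host_ram_gib} GiB")
# -> 50 GiB of 64 GiB, leaving ~14 GiB for everything else on the host.
```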

I searched around the issue tracker, and it seems the error has already been reported here:

https://tracker.ceph.com/issues/39334

but I crosschecked and I appear to be running a version with the bugfix mentioned there.
Could you please help me with this?

Antonio

osd-20.txt (232 KB) Antonio Falabella, 11/07/2019 03:32 PM

osd-20.log (151 KB) Antonio Falabella, 11/07/2019 04:22 PM

osd-20-first-error.log.tar.gz (246 KB) Antonio Falabella, 11/08/2019 10:15 AM


Related issues

Duplicates bluestore - Bug #42223: ceph-14.2.4/src/os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available >= allocated) Resolved

History

#1 Updated by Igor Fedotov 7 months ago

@Antonio - could you please provide the whole log for the failure?

#2 Updated by Antonio Falabella 7 months ago

Dear Igor,
I attached the log portion you requested.

Thanks
Antonio

#3 Updated by Igor Fedotov 7 months ago

@Antonio - thanks for the info, but I need more...

First of all, please preserve all the available logs for this specific OSD.
1) Could you please set debug_bluestore to 20, restart the OSD, and collect the resulting log?
2) Please search for earlier failures in the existing OSD logs; "__ceph_assert_fail" and/or "ceph version 14.2.4" look like good candidates for grep.
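The two steps above might be sketched like this (the OSD id and log path are assumptions based on the defaults; the grep runs against a tiny inline sample rather than the real log):

```shell
# Step 1 (on the OSD host, against the running daemon):
#   ceph daemon osd.20 config set debug_bluestore 20/20
# then restart the OSD and collect /var/log/ceph/ceph-osd.20.log.

# Step 2: grep existing logs for earlier failures. A tiny sample log
# stands in for /var/log/ceph/ceph-osd.20.log* here.
cat > sample-osd.log <<'EOF'
2019-11-06 13:08:33.333 7f30b5b93700  3 rocksdb: Compaction error: Corruption
2019-11-06 13:08:34.000 ceph version 14.2.4 (75f4de1) nautilus (stable)
2019-11-06 13:08:34.010 ceph_assert(available >= allocated) __ceph_assert_fail
EOF
grep -n -e '__ceph_assert_fail' -e 'ceph version 14.2.4' sample-osd.log
```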

PS: my general impression is that you're hitting https://tracker.ceph.com/issues/42223 with a slightly different appearance

#4 Updated by Antonio Falabella 7 months ago

I attached the portion of the log produced after the restart with the increased debug level.
Thanks
Antonio

#5 Updated by Igor Fedotov 7 months ago

A brief log analysis:
The freelist init shows some garbage in the DB for its key records:

-9> 2019-11-07 17:08:33.207 7f6438c7edc0 10 freelist init size 0x557164041758 bytes_per_block 0x5571640415d8 blocks 0x557164041818 blocks_per_key 0x5571640417b8

which strengthens my assumption that this is a duplicate of https://tracker.ceph.com/issues/42223
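For illustration, here is why those values read as garbage: they all sit in the 0x5571... range typical of 64-bit heap addresses rather than of freelist metadata such as a device size or a 4 KiB block size (a Python sketch; the plausibility reasoning is my own, not BlueStore's):

```python
# Values printed by the "freelist init" line above. For an 8 TB disk,
# sane values would be roughly: size ~ 8e12, bytes_per_block = 4096,
# blocks = size / 4096, blocks_per_key ~ 128.
vals = {
    "size": 0x557164041758,
    "bytes_per_block": 0x5571640415d8,
    "blocks": 0x557164041818,
    "blocks_per_key": 0x5571640417b8,
}
for name, v in vals.items():
    # All four values are ~85 TiB and only a few hundred bytes apart --
    # the signature of heap addresses being logged instead of real values.
    print(f"{name:16s} {v:#x} ({v / 2**40:.1f} TiB)")
```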

So any previous assertions in earlier OSD logs?

#6 Updated by Antonio Falabella 7 months ago

No, no assertions before this one. What we were doing was testing the lower limit at which the OSDs could operate, given that our servers have only 64GB of RAM. So first we set the limit to 2GB per OSD and stress-tested the CephFS, then to 1GB per OSD and stress-tested again. This last change caused the problem.
Given that this is a test cluster, if we can fix the problem by cleaning up the garbage data you mentioned we can go for it, but I don't know how to do that except by purging and redeploying the OSD.

Antonio

#7 Updated by Igor Fedotov 7 months ago

@Antonio - IMO it doesn't make much sense to fix these specific parameters - who knows what else has been broken... Hence I suggest OSD redeployment as the final cure. Maybe it makes sense to postpone this a bit for root-cause analysis, if possible.

And I'm still not convinced that there were no previous issues. The first log snippet you shared contains lines from the broken OSD startup and mentions some recovery:
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1573140251439726, "job": 1, "event": "recovery_started", "log_files": [3677]}
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #3677 mode 0
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1573140251439772, "job": 1, "event": "recovery_finished"}

which presumably means a non-graceful earlier shutdown.

So I'd like to know what happened before the startup procedure shared in osd-20.txt

#8 Updated by Igor Fedotov 7 months ago

Well, please disregard my words about the recovery referenced in the log; those lines are present after a regular shutdown as well. But I'd still like to know what happened before the first broken startup.
Was it a graceful shutdown, tuning the mem limit to 1GB, and an immediate startup failure, or something else, e.g. the OSD worked for a while with the new limit, crashed, and has been unable to start since?

#9 Updated by Antonio Falabella 7 months ago

After the 1GB tuning the cluster was fine. Then we started our stress test, which is fio with an increasing number of threads/block sizes/stripe units. After a few minutes the OSD went down, during the tests with the smallest block size (64k), which we found to be the most stressful.

#10 Updated by Igor Fedotov 7 months ago

@Antonio, could you please share the log for this first crash?

#11 Updated by Antonio Falabella 7 months ago

@Igor You can find attached the very first error portion of the log. I'm afraid the debug level was not set to 20 at that time.

#12 Updated by Igor Fedotov 7 months ago

@Antonio, thanks a lot.

So the same pattern:
-49> 2019-11-06 13:08:33.333 7f30b5b93700 3 rocksdb: [db/db_impl_compaction_flush.cc:2660] Compaction error: Corruption: block checksum mismatch: expected 1564692794, got 2324967102 in db/003602.sst offset 0 size 3880

as in https://tracker.ceph.com/issues/42223

Marking duplicate...

#13 Updated by Igor Fedotov 7 months ago

  • Tracker changed from Support to Bug
  • Status changed from New to Duplicate
  • Regression set to No
  • Severity set to 3 - minor

#14 Updated by Igor Fedotov 7 months ago

@Antonio - would you be able to export bluefs for the broken OSD to a regular file system and then share the content of the db/003602.sst file?

#15 Updated by Antonio Falabella 7 months ago

@Igor I don't know exactly how to do this; if you can give me some instructions or point me to some docs, I will go for it.

Thanks

#16 Updated by Igor Fedotov 7 months ago

Here it is.

The proper command is:
ceph-bluestore-tool --path <osd-path> --out-dir <destination dir> --command bluefs-export

Please make sure you have enough space at <destination dir>
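A possible way to run it with that free-space check first (a sketch; the OSD path and destination directory are placeholders to adjust for the actual deployment, and the exported DB can be large):

```shell
# Placeholder paths -- adjust for the actual deployment.
OSD_PATH=/var/lib/ceph/osd/ceph-20
OUT_DIR=/tmp/bluefs-export
mkdir -p "$OUT_DIR"

# Check free space at the destination before exporting (df reports KiB here).
free_kib=$(df --output=avail -k "$OUT_DIR" | tail -n 1 | tr -d ' ')
echo "free space at $OUT_DIR: ${free_kib} KiB"

# With the OSD stopped and enough headroom, run the export itself:
#   ceph-bluestore-tool --path "$OSD_PATH" --out-dir "$OUT_DIR" --command bluefs-export
```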

#17 Updated by Antonio Falabella 7 months ago

@Igor here is a link to the file:

https://drive.google.com/open?id=1sd3507O58wyb0a1iGt4fjWjQGkoOgnwH

Thanks again for your highly professional support.

#18 Updated by Antonio Falabella 7 months ago

@Igor if you don't object I would scratch the OSD to keep testing the system.

#19 Updated by Igor Fedotov 7 months ago

@Antonio - yes, please go ahead.

#20 Updated by Igor Fedotov 7 months ago

  • Duplicates Bug #42223: ceph-14.2.4/src/os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available >= allocated) added
