Bug #42683: OSD Segmentation fault
Status: Closed
Description
Dear support,
I have a small Ceph cluster installed with Nautilus 14.2.4, meant as a test for a future larger deployment. The cluster consists of 8 machines:
kernel 3.10.0-693.21.1.el7.x86_64
CentOS Linux release 7
Two of them host the OSD services (20 x 8 TB disks each). The cluster is running fine, but on one of the machines a single OSD is unable to start with this error:
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
1: (()+0xf5e0) [0x7fa7ec45c5e0]
2: (gsignal()+0x37) [0x7fa7eb46f1f7]
3: (abort()+0x148) [0x7fa7eb4708e8]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x56514f412a73]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x56514f412bf2]
6: (BitmapAllocator::init_add_free(unsigned long, unsigned long)+0x740) [0x56514fa43870]
7: (BlueStore::_open_alloc()+0x258) [0x56514f8e9ab8]
8: (BlueStore::_open_db_and_around(bool)+0x146) [0x56514f90b306]
9: (BlueStore::_mount(bool, bool)+0x6a4) [0x56514f949c24]
10: (OSD::init()+0x3aa) [0x56514f4bcefa]
11: (main()+0x14fa) [0x56514f4171da]
12: (__libc_start_main()+0xf5) [0x7fa7eb45bc05]
13: (()+0x4b2695) [0x56514f44c695]
The relevant configuration for the OSDs is:
[osd]
osd_recovery_max_active = 40
osd_max_backfills = 64
osd_memory_target = 2684354560
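For reference, osd_memory_target is specified in bytes; a quick sanity check (not part of the original report) shows the configured value is 2.5 GiB, and gives the byte equivalents of the 2 GB and 1 GB limits mentioned later in the thread:

```shell
# osd_memory_target is given in bytes; the configured value is 2.5 GiB:
echo $(( 2560 * 1024 * 1024 ))        # 2684354560, matching the [osd] section
# Byte values for the lower limits tested later in the thread:
echo $(( 2 * 1024 * 1024 * 1024 ))    # 2 GiB -> 2147483648
echo $(( 1 * 1024 * 1024 * 1024 ))    # 1 GiB -> 1073741824
```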
I searched the issue tracker and it seems the error has already been reported here:
https://tracker.ceph.com/issues/39334
but I cross-checked and I appear to be running a version that includes the bugfix mentioned there.
Could you please help me with this?
Antonio
Files
Updated by Igor Fedotov over 4 years ago
@Antonio - could you please provide the whole log for the failure?
Updated by Antonio Falabella over 4 years ago
- File osd-20.txt osd-20.txt added
Dear Igor,
I attached the log portion you requested.
Thanks
Antonio
Updated by Igor Fedotov over 4 years ago
@Antonio - thanks for the info, but I need more...
First of all, please preserve all the available logs for this specific OSD.
1) Could you please set debug_bluestore to 20, restart the OSD, and collect the resulting log?
2) Please search the existing OSD logs for earlier failures; "__ceph_assert_fail" and/or "ceph version 14.2.4" look like good candidates for grep.
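The two steps above might look roughly like this (the OSD id 20 and the log paths are assumptions based on the attached file names; the cluster commands need a live cluster, so the grep is demonstrated against a tiny sample log):

```shell
# 1) Raise BlueStore debug verbosity for the broken OSD and restart it
#    (shown as comments since they require a running cluster):
#      ceph config set osd.20 debug_bluestore 20
#      systemctl restart ceph-osd@20

# 2) Search the surviving logs for earlier failures; demonstrated on a sample file:
log=$(mktemp)
cat > "$log" <<'EOF'
2019-11-06 13:08:33.400 7f30b5b93700 -1 ... __ceph_assert_fail ...
2019-11-06 13:08:33.500 7f30b5b93700 -1 ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
EOF
grep -c "__ceph_assert_fail" "$log"     # prints 1: one assertion line found
grep -c "ceph version 14.2.4" "$log"    # prints 1: crash banners mark earlier failures
rm -f "$log"
```

In practice the real targets would be the rotated files under /var/log/ceph/ for the affected OSD.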
PS: my general impression is that you're hitting https://tracker.ceph.com/issues/42223 with a slightly different appearance.
Updated by Antonio Falabella over 4 years ago
- File osd-20.log osd-20.log added
I attached the portion of the log produced after the restart with the increased debug level.
Thanks
Antonio
Updated by Igor Fedotov over 4 years ago
So, a brief log analysis:
Freelist init shows some garbage in the DB for its key records:
-9> 2019-11-07 17:08:33.207 7f6438c7edc0 10 freelist init size 0x557164041758 bytes_per_block 0x5571640415d8 blocks 0x557164041818 blocks_per_key 0x5571640417b8
which strengthens my assumption about being duplicate of https://tracker.ceph.com/issues/42223
So any previous assertions in earlier OSD logs?
Updated by Antonio Falabella over 4 years ago
No assertions before this. What we were doing was testing the lower limit at which the OSDs could operate, given that our servers have only 64 GB of RAM. So first we set the limit to 2 GB per OSD and stress-tested CephFS, then 1 GB per OSD and stress-tested again. This last change caused the problem.
Given that this is a test cluster, if we can fix the problem by cleaning up the garbage data you mentioned, we can go for it, but I don't know how to do that except by purging and redeploying the OSD.
Antonio
Updated by Igor Fedotov over 4 years ago
@Antonio - IMO it doesn't make much sense to fix these specific parameters - who knows what else has been broken... Hence I suggest OSD redeployment as the final cure. Maybe it makes sense to postpone this a bit for root cause analysis, if possible.
And I'm still not convinced that there were no previous issues. The first log snippet you shared contains lines from the broken OSD startup and mentions some recovery:
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1573140251439726, "job": 1, "event": "recovery_started", "log_files": [3677]}
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #3677 mode 0
2019-11-07 16:24:11.438 7f873c700dc0 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1573140251439772, "job": 1, "event": "recovery_finished"}
Which presumably means a non-graceful earlier shutdown.
So I'd like to know what happened before the startup procedure shared in osd-20.txt.
Updated by Igor Fedotov over 4 years ago
Well, please disregard my words about the recovery referenced in the log - those lines are present after a regular shutdown as well. But I'd still like to know what happened before the first broken startup.
Was it a graceful shutdown, tuning the memory limit to 1 GB, and an immediate startup failure - or something else, e.g. the OSD worked for a while with the new limit, crashed, and has been unable to start since?
Updated by Antonio Falabella over 4 years ago
After the 1 GB tuning the cluster was all fine. Then we started our stress test, which is fio with an increasing number of threads / block sizes / stripe units. After a few minutes the OSD went down, during the tests with the smallest block size (64k), which we found to be the most stressful.
Updated by Igor Fedotov over 4 years ago
@Antonio, could you please share the log for this first crash?
Updated by Antonio Falabella over 4 years ago
@Igor Gajowiak You can find attached the very first error portion of the log. I'm afraid the debug level was not set to 20 at that time.
Updated by Igor Fedotov over 4 years ago
@Antonio, thanks a lot.
So the same pattern:
-49> 2019-11-06 13:08:33.333 7f30b5b93700 3 rocksdb: [db/db_impl_compaction_flush.cc:2660] Compaction error: Corruption: block checksum mismatch: expected 1564692794, got 2324967102 in db/003602.sst offset 0 size 3880
as in https://tracker.ceph.com/issues/42223
Marking duplicate...
Updated by Igor Fedotov over 4 years ago
- Tracker changed from Support to Bug
- Status changed from New to Duplicate
- Regression set to No
- Severity set to 3 - minor
Duplicate of https://tracker.ceph.com/issues/42223
Updated by Igor Fedotov over 4 years ago
@Antonio - would you be able to export bluefs for the broken OSD to regular file system and then share content of db/003602.sst file?
Updated by Antonio Falabella over 4 years ago
@Igor Gajowiak I don't know exactly how to do this; if you can give me some instructions or point me to some docs, I'll go for it.
Thanks
Updated by Igor Fedotov over 4 years ago
Here it is.
The proper command is:
ceph-bluestore-tool --path <osd-path> --out-dir <destination dir> --command bluefs-export
Please make sure you have enough space in <destination dir>.
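A concrete invocation might look like this (the OSD data path and destination directory are hypothetical; the call is guarded so the sketch only runs the export where the tool, which ships with the ceph-osd package, is actually installed):

```shell
OSD_PATH=/var/lib/ceph/osd/ceph-20    # hypothetical data dir for osd.20
OUT_DIR=/tmp/bluefs-export            # choose a filesystem with enough free space

mkdir -p "$OUT_DIR"
df -h "$OUT_DIR"                      # check available space first; the DB can be large

# Export the BlueFS contents; RocksDB files such as db/003602.sst land under OUT_DIR:
if command -v ceph-bluestore-tool >/dev/null 2>&1; then
    ceph-bluestore-tool --path "$OSD_PATH" --out-dir "$OUT_DIR" --command bluefs-export
fi
```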
Updated by Antonio Falabella over 4 years ago
@Igor Gajowiak here is a link to the file:
https://drive.google.com/open?id=1sd3507O58wyb0a1iGt4fjWjQGkoOgnwH
Thanks again for your highly professional support.
Updated by Antonio Falabella over 4 years ago
@Igor Gajowiak if you don't object, I would scratch the OSD to keep testing the system.
Updated by Igor Fedotov over 4 years ago
- Is duplicate of Bug #42223: ceph-14.2.4/src/os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available >= allocated) added