Bug #47271
Closed
ceph version 14.2.10-OSD fails
Description
Hi,
We updated Ceph from version 14.2.9 to version 14.2.10, and since then the OSDs have been crashing and flapping.
I have added the full log. Can you please assist?
root@admin01:/# ceph -s
cluster:
id: b41468a7-45b9-4812-a943-3b531a72ea6d
health: HEALTH_WARN
102 daemons have recently crashed
root@admin01:~# ceph crash info 2020-08-17_10:59:51.381771Z_88860476-55f8-4ab7-8739-72121035b218
{
"os_version_id": "18.04",
"assert_condition": "r == 0",
"utsname_release": "4.15.0-111-generic",
"os_name": "Ubuntu",
"entity_name": "osd.8",
"assert_file": "/build/ceph-14.2.10/src/os/bluestore/BlueStore.cc",
"timestamp": "2020-08-17 10:59:51.381771Z",
"process_name": "ceph-osd",
"utsname_machine": "x86_64",
"assert_line": 11068,
"utsname_sysname": "Linux",
"os_version": "18.04.4 LTS (Bionic Beaver)",
"os_id": "ubuntu",
"assert_thread_name": "bstore_kv_sync",
"utsname_version": "#112-Ubuntu SMP Thu Jul 9 20:32:34 UTC 2020",
"backtrace": [
"(()+0x128a0) [0x7fa1cfa798a0]",
"(gsignal()+0xc7) [0x7fa1ce72bf47]",
"(abort()+0x141) [0x7fa1ce72d8b1]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x55608325cebf]",
"(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55608325d049]",
"(BlueStore::_kv_sync_thread()+0x1144) [0x5560837cd3b4]",
"(BlueStore::KVSyncThread::entry()+0xd) [0x5560837f024d]",
"(()+0x76db) [0x7fa1cfa6e6db]",
"(clone()+0x3f) [0x7fa1ce80ea3f]"
],
"utsname_hostname": "osd105",
"assert_msg": "/build/ceph-14.2.10/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fa1bdd8b700 time 2020-08-17 05:59:51.202525\n/build/ceph-14.2.10/src/os/bluestore/BlueStore.cc: 11068: FAILED ceph_assert(r == 0)\n",
"crash_id": "2020-08-17_10:59:51.381771Z_88860476-55f8-4ab7-8739-72121035b218",
"assert_func": "void BlueStore::_kv_sync_thread()",
"ceph_version": "14.2.10"
}
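With 102 crashes recorded, reading every report by hand is slow. The following is a minimal sketch, assuming the JSON printed by `ceph crash info` has been saved as text; it extracts the fields that identify where the OSD failed. The helper name `summarize_crash` is hypothetical and not part of Ceph, and the sample input is a trimmed copy of the report above.

```python
import json


def summarize_crash(report_text):
    """Return the identifying fields of one `ceph crash info` JSON report."""
    report = json.loads(report_text)
    return {
        "osd": report.get("entity_name"),
        "version": report.get("ceph_version"),
        "thread": report.get("assert_thread_name"),
        "function": report.get("assert_func"),
        # File and line of the failed assertion, joined as "file:line".
        "location": "{}:{}".format(
            report.get("assert_file"), report.get("assert_line")
        ),
    }


# Trimmed copy of the report above, used only as sample input.
sample = """
{
  "entity_name": "osd.8",
  "ceph_version": "14.2.10",
  "assert_thread_name": "bstore_kv_sync",
  "assert_func": "void BlueStore::_kv_sync_thread()",
  "assert_file": "/build/ceph-14.2.10/src/os/bluestore/BlueStore.cc",
  "assert_line": 11068
}
"""

for key, value in summarize_crash(sample).items():
    print("{}: {}".format(key, value))
```

Running the same helper over every saved report would show whether all crashes share the same assertion site, which is what question 1 below asks.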
Updated by Igor Fedotov over 3 years ago
- Project changed from Ceph to bluestore
- Category deleted (OSD)
Updated by Igor Fedotov over 3 years ago
Could you please answer the following questions:
1) Is this happening to multiple OSDs?
2) Are OSDs able to start properly after the crash?
3) Did you run fsck for the failing OSDs?
4) Have you observed any issues with excessive memory usage on these nodes since the upgrade?
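For question 3, here is a minimal sketch of how the fsck invocation could be assembled for a given OSD. It assumes the default data directory `/var/lib/ceph/osd/ceph-<id>` and that the OSD daemon has been stopped first; `fsck_command` is a hypothetical helper, and the command is only printed, not executed.

```python
def fsck_command(osd_id):
    """Build the ceph-bluestore-tool fsck command line for one OSD."""
    # Assumption: the OSD uses the default data path for a cluster named "ceph".
    path = "/var/lib/ceph/osd/ceph-{}".format(osd_id)
    return ["ceph-bluestore-tool", "fsck", "--path", path]


# osd.8 is the OSD named in the crash report above.
print(" ".join(fsck_command(8)))
```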
Updated by cephuser2345 user over 3 years ago
1) Is this happening to multiple OSDs? - Yes. All the OSDs also have the same HDD model.
2) Are OSDs able to start properly after the crash? - Yes, they are flapping and go back into the up state.
3) Did you run fsck for failing OSD? - No, we didn't try it.
4) Haven't you observed any issues with excessive memory usage since the upgrade for these nodes? - I can't say precisely, but yes. We increased the virtual machine's memory to 24 GB (4 GB of OSD memory per disk, 6 disks in total) and changed osd_memory_target to 2.5 GB for each OSD.
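The figures in answer 4 can be checked with simple arithmetic. This is a sketch, assuming "2.5" means 2.5 GB per OSD and that the 24 GB virtual machine hosts all 6 OSDs; `memory_headroom_gb` is a hypothetical helper. The per-OSD memory target is a best-effort goal rather than a hard limit, so actual usage can exceed it.

```python
def memory_headroom_gb(vm_gb, osd_count, target_gb_per_osd):
    """Memory left on the host after every OSD reaches its target, in GB."""
    return vm_gb - osd_count * target_gb_per_osd


# Values reported in answer 4: a 24 GB VM, 6 OSDs, 2.5 GB target each.
print(memory_headroom_gb(24, 6, 2.5))  # 9.0
```

With these values the six targets add up to 15 GB, leaving roughly 9 GB for the operating system and for any OSD that overshoots its target.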
Updated by Igor Fedotov over 3 years ago
- Related to Feature #47718: intoduce means to detect/workaround spurios read errors in bluefs added
Updated by Igor Fedotov about 1 year ago
- Status changed from New to Closed
This looks like a kernel issue; a good summary can be found at: https://tracker.ceph.com/issues/22464#note-72