Bug #47271

closed

ceph version 14.2.10-OSD fails

Added by cephuser2345 user over 3 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (user)
Tags: ceph version 14.2.10-OSD fails
Backport:
Regression: Yes
Severity: 2 - major
Reviewed: 09/02/2020
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,
We updated Ceph from version 14.2.9 to version 14.2.10, and since then the OSDs have been crashing and flapping.
I have attached the full log. Can you please assist?

root@admin01:/# ceph -s
  cluster:
    id:     b41468a7-45b9-4812-a943-3b531a72ea6d
    health: HEALTH_WARN
            102 daemons have recently crashed

root@admin01:~# ceph crash info 2020-08-17_10:59:51.381771Z_88860476-55f8-4ab7-8739-72121035b218
{
"os_version_id": "18.04",
"assert_condition": "r == 0",
"utsname_release": "4.15.0-111-generic",
"os_name": "Ubuntu",
"entity_name": "osd.8",
"assert_file": "/build/ceph-14.2.10/src/os/bluestore/BlueStore.cc",
"timestamp": "2020-08-17 10:59:51.381771Z",
"process_name": "ceph-osd",
"utsname_machine": "x86_64",
"assert_line": 11068,
"utsname_sysname": "Linux",
"os_version": "18.04.4 LTS (Bionic Beaver)",
"os_id": "ubuntu",
"assert_thread_name": "bstore_kv_sync",
"utsname_version": "#112-Ubuntu SMP Thu Jul 9 20:32:34 UTC 2020",
"backtrace": [
"(()+0x128a0) [0x7fa1cfa798a0]",
"(gsignal()+0xc7) [0x7fa1ce72bf47]",
"(abort()+0x141) [0x7fa1ce72d8b1]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x55608325cebf]",
"(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55608325d049]",
"(BlueStore::_kv_sync_thread()+0x1144) [0x5560837cd3b4]",
"(BlueStore::KVSyncThread::entry()+0xd) [0x5560837f024d]",
"(()+0x76db) [0x7fa1cfa6e6db]",
"(clone()+0x3f) [0x7fa1ce80ea3f]"
],
"utsname_hostname": "osd105",
"assert_msg": "/build/ceph-14.2.10/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fa1bdd8b700 time 2020-08-17 05:59:51.202525\n/build/ceph-14.2.10/src/os/bluestore/BlueStore.cc: 11068: FAILED ceph_assert(r == 0)\n",
"crash_id": "2020-08-17_10:59:51.381771Z_88860476-55f8-4ab7-8739-72121035b218",
"assert_func": "void BlueStore::_kv_sync_thread()",
"ceph_version": "14.2.10"
}
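
With 102 recorded crashes, it can help to reduce each report to the few fields that identify where the OSD failed. The following is a minimal sketch, not part of the original report: it assumes the JSON printed by `ceph crash info` has been captured as text, uses an abbreviated copy of the report above as sample input, and `summarize_crash` is a hypothetical helper name.

```python
import json

# Abbreviated copy of the crash report above; only the fields used below are kept.
CRASH_JSON = """
{
  "entity_name": "osd.8",
  "assert_func": "void BlueStore::_kv_sync_thread()",
  "assert_file": "/build/ceph-14.2.10/src/os/bluestore/BlueStore.cc",
  "assert_line": 11068,
  "assert_thread_name": "bstore_kv_sync",
  "utsname_hostname": "osd105",
  "utsname_release": "4.15.0-111-generic",
  "ceph_version": "14.2.10"
}
"""


def summarize_crash(report):
    """Build a one-line summary from a parsed crash report (hypothetical helper)."""
    # Missing keys fall back to "?" so that partial reports still produce a line.
    def field(name):
        return str(report.get(name, "?"))

    return (
        field("entity_name") + " on " + field("utsname_hostname")
        + " (ceph " + field("ceph_version")
        + ", kernel " + field("utsname_release") + "): assert in "
        + field("assert_func") + " at "
        + field("assert_file") + ":" + field("assert_line")
        + ", thread " + field("assert_thread_name")
    )


summary = summarize_crash(json.loads(CRASH_JSON))
print(summary)
```

Running the same helper over every stored report would show whether all crashes share this assert location and kernel release, which is the kind of pattern the follow-up questions in this thread ask about.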


Files

log-1-osd-crash.txt (984 KB) cephuser2345 user, 09/02/2020 12:28 PM

Related issues 1 (0 open, 1 closed)

Related to bluestore - Feature #47718: intoduce means to detect/workaround spurios read errors in bluefs (Resolved)

Actions #1

Updated by Igor Fedotov over 3 years ago

  • Project changed from Ceph to bluestore
  • Category deleted (OSD)
Actions #2

Updated by Igor Fedotov over 3 years ago

Could you please answer the following questions:

1) Is this happening to multiple OSDs?

2) Are OSDs able to start properly after the crash?

3) Did you run fsck for failing OSD?

4) Haven't you observed any issues with excessive memory usage since the upgrade for these nodes?

Actions #3

Updated by cephuser2345 user over 3 years ago

1) Is this happening to multiple OSDs? - Yes. All of the OSDs also use the same HDD model.

2) Are OSDs able to start properly after the crash? - Yes, they are flapping and then go back into the up state.

3) Did you run fsck for failing OSD? - No, we didn't try it.

4) Haven't you observed any issues with excessive memory usage since the upgrade for these nodes? - Can't tell accurately, but yes: we increased the virtual machine to 24 GB of memory, 4 GB per disk (6 disks in total), and then changed osd_memory_target to 2.5 for each OSD.
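
The memory figures in answer 4 can be checked with simple arithmetic. This is a rough sketch under stated assumptions: the "2.5" and "4" values are taken to be GiB per OSD, the per-OSD setting is taken to be Ceph's `osd_memory_target` option (which is expressed in bytes and is a best-effort target rather than a hard limit), and `memory_budget` is a hypothetical helper name.

```python
GIB = 1024 ** 3  # bytes per GiB


def memory_budget(vm_gib, osd_count, target_gib):
    """Compare the summed per-OSD memory targets with the VM's RAM (hypothetical helper)."""
    total_target_gib = osd_count * target_gib
    return {
        "total_target_gib": total_target_gib,
        # RAM left for the OS, page cache, and any overshoot above the target.
        "headroom_gib": vm_gib - total_target_gib,
        # Value in bytes, the unit osd_memory_target is expressed in.
        "target_bytes": int(target_gib * GIB),
    }


# Figures from the answer above: a 24 GB VM with 6 OSDs.
before = memory_budget(vm_gib=24, osd_count=6, target_gib=4)
after = memory_budget(vm_gib=24, osd_count=6, target_gib=2.5)
print(before)
print(after)
```

Under these assumptions, the 4 GiB targets add up to the whole 24 GiB of the VM and leave no headroom, while 2.5 GiB per OSD adds up to 15 GiB and leaves about 9 GiB for everything else.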

Actions #4

Updated by Igor Fedotov over 3 years ago

  • Related to Feature #47718: intoduce means to detect/workaround spurios read errors in bluefs added
Actions #5

Updated by Igor Fedotov about 1 year ago

  • Status changed from New to Closed

Looks like a kernel issue. A good summary can be found at: https://tracker.ceph.com/issues/22464#note-72
