Bug #48163

closed

osd: osd crash due to FAILED ceph_assert(current_best)

Added by wencong wan over 3 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

More than 90 OSDs crashed in my cluster. The crash info is as follows:

"os_version_id": "7", 
"assert_condition": "current_best",
"utsname_release": "4.16.13-1.el7.elrepo.x86_64",
"os_name": "CentOS Linux",
"entity_name": "osd.751",
"assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/msg/async/Stack.cc",
"timestamp": "2020-11-10 10:50:26.949807Z",
"process_name": "ceph-osd",
"utsname_machine": "x86_64",
"assert_line": 167,
"utsname_sysname": "Linux",
"os_version": "7 (Core)",
"os_id": "centos",
"assert_thread_name": "msgr-worker-0",
"utsname_version": "#1 SMP Wed May 30 14:31:51 EDT 2018",
"backtrace": [
"(()+0xf5d0) [0x7fda5ddeb5d0]",
"(gsignal()+0x37) [0x7fda5cbe22c7]",
"(abort()+0x148) [0x7fda5cbe39b8]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x5556cc476a9c]",
"(()+0x4cac15) [0x5556cc476c15]",
"(NetworkStack::get_worker()+0x293) [0x5556ccc9ea53]",
"(Processor::accept()+0x387) [0x5556ccc936f7]",
"(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0xa15) [0x5556ccc99ae5]",
"(()+0xcf2247) [0x5556ccc9e247]",
"(()+0x11c599f) [0x5556cd17199f]",
"(()+0x7dd5) [0x7fda5dde3dd5]",
"(clone()+0x6d) [0x7fda5ccaa02d]"
],
"utsname_hostname": "ceph-63",
"assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/msg/async/Stack.cc: In function 'virtual Worker* NetworkStack::get_worker()' thread 7fda5a6ba700 time 2020-11-10 18:50:26.942773\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/msg/async/Stack.cc: 167: FAILED ceph_assert(current_best)\n",
"crash_id": "2020-11-10_10:50:26.949807Z_f7399fa3-1130-4ac8-a8ed-1e74f049dfce",
"assert_func": "virtual Worker* NetworkStack::get_worker()",
"ceph_version": "14.2.8"

Files

QQ图片20201111141557.png (20.5 KB), added by wencong wan, 11/11/2020 06:16 AM
Actions #1

Updated by Brad Hubbard over 3 years ago

  • Project changed from mgr to RADOS
Actions #2

Updated by wencong wan over 3 years ago

A perf dump shows that the crashed OSDs had too many active connections. We eventually found that some ceph-fuse clients on other hosts (connected to a destroyed Ceph cluster) were trying to communicate with these OSDs.
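
For anyone hitting the same assert, here is a rough sketch of how the per-worker connection counts could be pulled out of a perf dump. It assumes the dump was saved as JSON (for example from `ceph daemon osd.<id> perf dump`) and that the messenger sections are named like `AsyncMessenger::Worker-N` with a `msgr_active_connections` counter. The helper name and the sample values are made up for illustration and are not taken from this cluster.

```python
import json


def active_connections(perf_dump):
    """Map each messenger worker section to its msgr_active_connections value.

    Assumes section names start with "AsyncMessenger::Worker"; other
    sections (osd, bluestore, ...) are ignored.
    """
    return {
        section: counters["msgr_active_connections"]
        for section, counters in perf_dump.items()
        if section.startswith("AsyncMessenger::Worker")
        and isinstance(counters, dict)
        and "msgr_active_connections" in counters
    }


# Hypothetical perf dump fragment; the numbers are illustrative only.
sample = json.loads("""
{
  "AsyncMessenger::Worker-0": {"msgr_active_connections": 1200},
  "AsyncMessenger::Worker-1": {"msgr_active_connections": 1150},
  "AsyncMessenger::Worker-2": {"msgr_active_connections": 1175},
  "osd": {"op": 10}
}
""")

per_worker = active_connections(sample)
total = sum(per_worker.values())

for section, count in sorted(per_worker.items()):
    print(f"{section}: {count}")
print(f"total active connections: {total}")
```

Comparing the total against a healthy OSD in the same cluster should make an abnormal connection count stand out.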

Actions #3

Updated by wencong wan over 3 years ago

Please close this issue.

Actions #4

Updated by Neha Ojha over 3 years ago

  • Status changed from New to Closed