Bug #36250

closed

ceph-osd process crashing

Added by Josh Haft over 5 years ago. Updated almost 5 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The ceph-osd process crashes in a msgr-worker thread. This happens on every OSD in the cluster, at a peak frequency of roughly once per day. It seems to happen more often during evening and overnight hours, when there is more load on the cluster. Originally posted on the ceph-users mailing list: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030040.html

Version 12.2.2

From the log:
Sep 28 00:30:10 sn02 ceph-osd[192103]: 2018-09-28 00:30:10.399237 7fb5031f6700 -1 *** Caught signal (Aborted) **
 in thread 7fb5031f6700 thread_name:msgr-worker-0
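
To check the impression that crashes cluster in the evening and overnight hours, one option is to tally the "Caught signal" lines by hour of day. The following is only a minimal Python sketch, assuming the OSD log lines reach syslog in the shape of the excerpt above. The function name `crashes_by_hour` and the regular expression are illustrative and not part of Ceph.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape: "<ceph timestamp with microseconds> ... Caught signal (<name>)".
# The syslog prefix ("Sep 28 00:30:10 ...") has no four-digit year, so it is skipped.
CRASH_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ .*Caught signal \(([^)]+)\)"
)


def crashes_by_hour(lines):
    """Count 'Caught signal' lines per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            counts[stamp.hour] += 1
    return counts


# The two lines of the excerpt above; only the first one is a crash line.
sample = [
    "Sep 28 00:30:10 sn02 ceph-osd[192103]: 2018-09-28 00:30:10.399237 "
    "7fb5031f6700 -1 *** Caught signal (Aborted) **",
    " in thread 7fb5031f6700 thread_name:msgr-worker-0",
]

print(dict(crashes_by_hour(sample)))  # prints {0: 1}
```

Feeding it the syslog lines from all OSD hosts would give one count per hour, which can then be compared against the cluster's load over the same period.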

Stack:
#0 0x00007f9e738764ab in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1 0x000055925e1edab6 in reraise_fatal (signum=6) at /usr/src/debug/ceph-12.2.2/src/global/signal_handler.cc:74
#2 handle_fatal_signal (signum=6) at /usr/src/debug/ceph-12.2.2/src/global/signal_handler.cc:138
#3 <signal handler called>
#4 0x00007f9e7289f1f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5 0x00007f9e728a08e8 in __GI_abort () at abort.c:90
#6 0x00007f9e731a5ac5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#7 0x00007f9e731a3a36 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
#8 0x00007f9e731a3a63 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#9 0x00007f9e731fa345 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:92
#10 0x00007f9e7386ee25 in start_thread (arg=0x7f9e6ff94700) at pthread_create.c:308
#11 0x00007f9e7296234d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

I've uploaded the log from a process that crashed; debug_ms had been set to 5 long before the crash happened. ID from ceph-post-file: 83aa1468-7dc5-401a-82fd-22c344322efe
