Bug #24023

closed

Segfault on OSD in 12.2.5

Added by Alex Gorbachev almost 6 years ago. Updated almost 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-05-05 06:33:42.383231 7f83289a4700 -1 *** Caught signal (Segmentation fault) **
in thread 7f83289a4700 thread_name:safe_timer

ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
1: (()+0xa7cab4) [0x55cbee7c7ab4]
2: (()+0x11390) [0x7f8330515390]
3: [0x55cc0000005c]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

core dump at https://drive.google.com/open?id=1uWN1QIRY2mFXe52MS7k1XzxhdOOuxYow

full OSD log at https://drive.google.com/open?id=1_ZNR2Y9VV8riKCKFYPXAKDMQQoZsT-C5
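
The frames in the backtrace above are unsymbolized, so each one only gives a module-relative offset in parentheses and an absolute address in brackets. As a minimal sketch (not part of the original report), the Python below pulls those two numbers out of each frame and subtracts them to infer where the module was loaded. The names `parse_frames` and `load_base` are hypothetical helpers. Which binary each frame belongs to is not stated in the trace; it is an assumption that the first frame lies in the `ceph-osd` executable, in which case the offset could be looked up with a tool such as `addr2line` against a matching build with debug symbols.

```python
import re

# Matches frames such as "1: (()+0xa7cab4) [0x55cbee7c7ab4]".
# Frames without a module offset (e.g. "3: [0x55cc0000005c]") are skipped.
FRAME_RE = re.compile(r"^\s*(\d+): \(\(\)\+(0x[0-9a-f]+)\) \[(0x[0-9a-f]+)\]\s*$")


def parse_frames(trace):
    """Return (frame_no, offset, address) for frames that carry a module offset."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16)))
    return frames


def load_base(offset, address):
    """Infer the module load address: absolute address minus module offset."""
    return address - offset


# Frames copied from the description of this report.
trace = """\
1: (()+0xa7cab4) [0x55cbee7c7ab4]
2: (()+0x11390) [0x7f8330515390]
3: [0x55cc0000005c]
"""

for no, off, addr in parse_frames(trace):
    print(f"frame {no}: offset {off:#x}, inferred load base {load_base(off, addr):#x}")
```

Because the load base changes between runs while the offset does not, comparing the offset of frame 1 across reports is a quick way to check whether two crashes hit the same location in the same build.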


Related issues 2 (0 open, 2 closed)

Related to RADOS - Bug #23431: OSD Segmentation fault in thread_name:safe_timer (Duplicate, Brad Hubbard, 03/21/2018)
Is duplicate of RADOS - Bug #23352: osd: segfaults under normal operation (Resolved, Brad Hubbard, 03/14/2018)

Actions #1

Updated by Alex Gorbachev almost 6 years ago

Another one occurred today on a different OSD:

2018-05-06 19:48:33.636221 7f0f55922700 -1 *** Caught signal (Segmentation fault) **
in thread 7f0f55922700 thread_name:safe_timer

ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
1: (()+0xa7cab4) [0x555b892fbab4]
2: (()+0x11390) [0x7f0f5d493390]
3: [0x555c00080000]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-9346> 2018-04-25 05:01:32.221706 7f0f5efafe00 5 asok(0x555b935014a0) register_command perfcounters_dump hook 0x555b934b41b0
-9345> 2018-04-25 05:01:32.221736 7f0f5efafe00 5 asok(0x555b935014a0) register_command 1 hook 0x555b934b41b0
-9344> 2018-04-25 05:01:32.221742 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf dump hook 0x555b934b41b0
-9343> 2018-04-25 05:01:32.221746 7f0f5efafe00 5 asok(0x555b935014a0) register_command perfcounters_schema hook 0x555b934b41b0
-9342> 2018-04-25 05:01:32.221752 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf histogram dump hook 0x555b934b41b0
-9341> 2018-04-25 05:01:32.221756 7f0f5efafe00 5 asok(0x555b935014a0) register_command 2 hook 0x555b934b41b0
-9340> 2018-04-25 05:01:32.221762 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf schema hook 0x555b934b41b0
-9339> 2018-04-25 05:01:32.221765 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf histogram schema hook 0x555b934b41b0
-9338> 2018-04-25 05:01:32.221771 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf reset hook 0x555b934b41b0
-9337> 2018-04-25 05:01:32.221775 7f0f5efafe00 5 asok(0x555b935014a0) register_command config show hook 0x555b934b41b0
-9336> 2018-04-25 05:01:32.221781 7f0f5efafe00 5 asok(0x555b935014a0) register_command config help hook 0x555b934b41b0
-9335> 2018-04-25 05:01:32.221785 7f0f5efafe00 5 asok(0x555b935014a0) register_command config set hook 0x555b934b41b0
-9334> 2018-04-25 05:01:32.221791 7f0f5efafe00 5 asok(0x555b935014a0) register_command config get hook 0x555b934b41b0
-9333> 2018-04-25 05:01:32.221794 7f0f5efafe00 5 asok(0x555b935014a0) register_command config diff hook 0x555b934b41b0
-9332> 2018-04-25 05:01:32.221800 7f0f5efafe00 5 asok(0x555b935014a0) register_command config diff get hook 0x555b934b41b0
-9331> 2018-04-25 05:01:32.221805 7f0f5efafe00 5 asok(0x555b935014a0) register_command log flush hook 0x555b934b41b0
-9330> 2018-04-25 05:01:32.221810 7f0f5efafe00 5 asok(0x555b935014a0) register_command log dump hook 0x555b934b41b0
-9329> 2018-04-25 05:01:32.221815 7f0f5efafe00 5 asok(0x555b935014a0) register_command log reopen hook 0x555b934b41b0
-9328> 2018-04-25 05:01:32.221829 7f0f5efafe00 5 asok(0x555b935014a0) register_command dump_mempools hook 0x555b937729e8
-9327> 2018-04-25 05:01:32.230596 7f0f5efafe00 0 set uid:gid to 64045:64045 (ceph:ceph)
-9326> 2018-04-25 05:01:32.230615 7f0f5efafe00 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 23577
-9325> 2018-04-25 05:01:32.237921 7f0f5efafe00 0 pidfile_write: ignore empty --pid-file
-9324> 2018-04-25 05:01:32.246657 7f0f5efafe00 0 load: jerasure load: lrc load: isa
-9323> 2018-04-25 05:01:32.246729 7f0f5efafe00 1 bdev create path /var/lib/ceph/osd/ceph-123/block type kernel
-9322> 2018-04-25 05:01:32.246741 7f0f5efafe00 1 bdev(0x555b93524d80 /var/lib/ceph/osd/ceph-123/block) open path /var/lib/ceph/osd/ceph-123/block
-9321> 2018-04-25 05:01:32.247000 7f0f5efafe00 1 bdev(0x555b93524d80 /var/lib/ceph/osd/ceph-123/block) open size 4000668520448 (0x3a37a6d1000, 3725 GB) block_size 4096 (4096 B) rotational
-9320> 2018-04-25 05:01:32.247108 7f0f5efafe00 1 bdev(0x555b93524d80 /var/lib/ceph/osd/ceph-123/block) close
-9319> 2018-04-25 05:01:32.556089 7f0f5efafe00 1 bdev create path /var/lib/ceph/osd/ceph-123/block type kernel

Actions #2

Updated by Alex Gorbachev almost 6 years ago

This is happening on a regular basis, 1-2 times per day.

Actions #3

Updated by Greg Farnum almost 6 years ago

  • Project changed from Ceph to RADOS
Actions #4

Updated by Alex Gorbachev almost 6 years ago

This continues to happen every day, usually during scrub

Actions #5

Updated by Alexander Morozov almost 6 years ago

Alex Gorbachev wrote:

This continues to happen every day, usually during scrub

I've run into the same issue:

May 20 10:09:49 kub01 ceph-osd[3924392]: *** Caught signal (Segmentation fault) **
May 20 10:09:49 kub01 ceph-osd[3924392]:  in thread 7f273a6c0700 thread_name:safe_timer
May 20 10:09:49 kub01 ceph-osd[3924392]:  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
May 20 10:09:49 kub01 ceph-osd[3924392]:  1: (()+0xa7cab4) [0x55e81c3e6ab4]
May 20 10:09:49 kub01 ceph-osd[3924392]:  2: (()+0x11390) [0x7f2742231390]
May 20 10:09:49 kub01 ceph-osd[3924392]:  3: [0x55e82d042d80]
May 20 10:09:49 kub01 ceph-osd[3924392]: 2018-05-20 10:09:49.157946 7f273a6c0700 -1 *** Caught signal (Segmentation fault) **
May 20 10:09:49 kub01 ceph-osd[3924392]:  in thread 7f273a6c0700 thread_name:safe_timer
May 20 10:09:49 kub01 ceph-osd[3924392]:  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
May 20 10:09:49 kub01 ceph-osd[3924392]:  1: (()+0xa7cab4) [0x55e81c3e6ab4]
May 20 10:09:49 kub01 ceph-osd[3924392]:  2: (()+0x11390) [0x7f2742231390]
May 20 10:09:49 kub01 ceph-osd[3924392]:  3: [0x55e82d042d80]
May 20 10:09:49 kub01 ceph-osd[3924392]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 20 10:09:49 kub01 ceph-osd[3924392]:      0> 2018-05-20 10:09:49.157946 7f273a6c0700 -1 *** Caught signal (Segmentation fault) **
May 20 10:09:49 kub01 ceph-osd[3924392]:  in thread 7f273a6c0700 thread_name:safe_timer
May 20 10:09:49 kub01 ceph-osd[3924392]:  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
May 20 10:09:49 kub01 ceph-osd[3924392]:  1: (()+0xa7cab4) [0x55e81c3e6ab4]
May 20 10:09:49 kub01 ceph-osd[3924392]:  2: (()+0x11390) [0x7f2742231390]
May 20 10:09:49 kub01 ceph-osd[3924392]:  3: [0x55e82d042d80]
May 20 10:09:49 kub01 ceph-osd[3924392]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
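
The journal excerpt above contains the same crash dump three times for a single event. As a rough sketch (not part of the original comment), the Python below strips the syslog-style prefix and collapses repeated dumps into one signature made of the thread name and the frame offsets. The function name `crash_signatures` and the prefix pattern are assumptions based only on the lines pasted here.

```python
import re

# Assumed prefix shape, taken from the excerpt:
# "May 20 10:09:49 kub01 ceph-osd[3924392]: "
PREFIX_RE = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ ceph-osd\[\d+\]: ")
THREAD_RE = re.compile(r"in thread ([0-9a-f]+) thread_name:(\S+)")
OFFSET_RE = re.compile(r"\(\(\)\+(0x[0-9a-f]+)\)")


def crash_signatures(log):
    """Collapse repeated crash dumps into a set of (thread_name, offsets)."""
    signatures = set()
    thread_name, offsets = None, []
    for raw in log.splitlines():
        line = PREFIX_RE.sub("", raw)
        if "Caught signal" in line:
            # A new dump starts here; store the previous one, if any.
            if thread_name is not None:
                signatures.add((thread_name, tuple(offsets)))
            thread_name, offsets = None, []
            continue
        m = THREAD_RE.search(line)
        if m:
            thread_name = m.group(2)
            continue
        m = OFFSET_RE.search(line)
        if m:
            offsets.append(int(m.group(1), 16))
    if thread_name is not None:
        signatures.add((thread_name, tuple(offsets)))
    return signatures


# Two of the repeated dumps from the excerpt above.
sample = """\
May 20 10:09:49 kub01 ceph-osd[3924392]: *** Caught signal (Segmentation fault) **
May 20 10:09:49 kub01 ceph-osd[3924392]:  in thread 7f273a6c0700 thread_name:safe_timer
May 20 10:09:49 kub01 ceph-osd[3924392]:  1: (()+0xa7cab4) [0x55e81c3e6ab4]
May 20 10:09:49 kub01 ceph-osd[3924392]:  2: (()+0x11390) [0x7f2742231390]
May 20 10:09:49 kub01 ceph-osd[3924392]:  3: [0x55e82d042d80]
May 20 10:09:49 kub01 ceph-osd[3924392]: 2018-05-20 10:09:49.157946 7f273a6c0700 -1 *** Caught signal (Segmentation fault) **
May 20 10:09:49 kub01 ceph-osd[3924392]:  in thread 7f273a6c0700 thread_name:safe_timer
May 20 10:09:49 kub01 ceph-osd[3924392]:  1: (()+0xa7cab4) [0x55e81c3e6ab4]
May 20 10:09:49 kub01 ceph-osd[3924392]:  2: (()+0x11390) [0x7f2742231390]
May 20 10:09:49 kub01 ceph-osd[3924392]:  3: [0x55e82d042d80]
"""

for name, offs in crash_signatures(sample):
    print(name, [hex(o) for o in offs])
```

A signature built this way ignores the absolute addresses, which differ per process, so the same crash on different OSDs of the same build should map to the same entry.
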
Actions #6

Updated by Alexander Morozov almost 6 years ago

Alexander M wrote:

Alex Gorbachev wrote:

This continues to happen every day, usually during scrub

I've run into the same issue:

[...]

Distributor ID:    Ubuntu
Description:    Ubuntu 16.04.4 LTS
Release:    16.04
Codename:    xenial

ii  ceph                                12.2.5-1xenial                             amd64        distributed storage and file system
ii  ceph-base                           12.2.5-1xenial                             amd64        common ceph daemon libraries and management tools
ii  ceph-common                         12.2.5-1xenial                             amd64        common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse                           12.2.5-1xenial                             amd64        FUSE-based client for the Ceph distributed file system
ii  ceph-mds                            12.2.5-1xenial                             amd64        metadata server for the ceph distributed file system
ii  ceph-mgr                            12.2.5-1xenial                             amd64        manager for the ceph distributed storage system
ii  ceph-mon                            12.2.5-1xenial                             amd64        monitor server for the ceph storage system
ii  ceph-osd                            12.2.5-1xenial                             amd64        OSD server for the ceph storage system
ii  libcephfs2                          12.2.5-1xenial                             amd64        Ceph distributed file system client library
ii  python-cephfs                       12.2.5-1xenial                             amd64        Python 2 libraries for the Ceph libcephfs library
Actions #7

Updated by Josh Durgin almost 6 years ago

  • Related to Bug #23352: osd: segfaults under normal operation added
Actions #8

Updated by Josh Durgin almost 6 years ago

  • Related to Bug #23431: OSD Segmentation fault in thread_name:safe_timer added
Actions #9

Updated by Jan Krcmar almost 6 years ago

Hi,

I've noticed a similar (possibly the same) segfault on my deployment. Random segfaults occur on random OSDs, both under load and without load.

   -10> 2018-05-24 16:19:14.940493 7fc08baf7700  1 -- 10.2.4.31:6803/16 <== osd.28 10.2.4.28:0/16 193156 ==== osd_ping(ping e33877 stamp 2018-05-24 16:19:14.939266) v4 ==== 2004+0+0 (672660280 0 0) 0x55b971003a00 con 0x55b96ef8c000
    -9> 2018-05-24 16:19:14.940488 7fc08b2f6700  1 -- 10.2.4.31:6802/16 <== osd.28 10.2.4.28:0/16 193156 ==== osd_ping(ping e33877 stamp 2018-05-24 16:19:14.939266) v4 ==== 2004+0+0 (672660280 0 0) 0x55b964435400 con 0x55b96ed50800
    -8> 2018-05-24 16:19:14.940508 7fc08baf7700  1 -- 10.2.4.31:6803/16 --> 10.2.4.28:0/16 -- osd_ping(ping_reply e33877 stamp 2018-05-24 16:19:14.939266) v4 -- 0x55b97386e800 con 0
    -7> 2018-05-24 16:19:14.940554 7fc08b2f6700  1 -- 10.2.4.31:6802/16 --> 10.2.4.28:0/16 -- osd_ping(ping_reply e33877 stamp 2018-05-24 16:19:14.939266) v4 -- 0x55b96efb8600 con 0
    -6> 2018-05-24 16:19:15.130579 7fc08baf7700  5 -- 10.2.4.31:6802/16 >> 10.2.4.31:0/17 conn(0x55b96ebfe000 :6802 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=18 cs=1 l=1). rx osd.35 seq 192494 0x55b970e77800 osd_ping(ping e33877 stamp 2018-05-24 16:19:15.129723) v4
    -5> 2018-05-24 16:19:15.130613 7fc08baf7700  1 -- 10.2.4.31:6802/16 <== osd.35 10.2.4.31:0/17 192494 ==== osd_ping(ping e33877 stamp 2018-05-24 16:19:15.129723) v4 ==== 2004+0+0 (1201433143 0 0) 0x55b970e77800 con 0x55b96ebfe000
    -4> 2018-05-24 16:19:15.130627 7fc08baf7700  1 -- 10.2.4.31:6802/16 --> 10.2.4.31:0/17 -- osd_ping(ping_reply e33877 stamp 2018-05-24 16:19:15.129723) v4 -- 0x55b97386e800 con 0
    -3> 2018-05-24 16:19:15.130605 7fc08c2f8700  5 -- 10.2.4.31:6803/16 >> 10.2.4.31:0/17 conn(0x55b96ebff800 :6803 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=18 cs=1 l=1). rx osd.35 seq 192494 0x55b970f47800 osd_ping(ping e33877 stamp 2018-05-24 16:19:15.129723) v4
    -2> 2018-05-24 16:19:15.130642 7fc08c2f8700  1 -- 10.2.4.31:6803/16 <== osd.35 10.2.4.31:0/17 192494 ==== osd_ping(ping e33877 stamp 2018-05-24 16:19:15.129723) v4 ==== 2004+0+0 (1201433143 0 0) 0x55b970f47800 con 0x55b96ebff800
    -1> 2018-05-24 16:19:15.130677 7fc08c2f8700  1 -- 10.2.4.31:6803/16 --> 10.2.4.31:0/17 -- osd_ping(ping_reply e33877 stamp 2018-05-24 16:19:15.129723) v4 -- 0x55b9739af000 con 0
     0> 2018-05-24 16:19:15.140522 7fc087b10700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fc087b10700 thread_name:safe_timer

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (()+0xa558bc) [0x55b9583408bc]
 2: (()+0xf8d0) [0x7fc08f19c8d0]
 3: [0x55b96ff49800]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

running kernel 3.16.0-5-amd64 on debian 8 (jessie).

fous

Actions #11

Updated by Brad Hubbard almost 6 years ago

Alex,

Why are we running multiple trackers for the same issue?

Can we close this as a duplicate?

Actions #12

Updated by Brad Hubbard almost 6 years ago

  • Status changed from New to Duplicate

Duplicate of 23352

Actions #13

Updated by Brad Hubbard almost 6 years ago

  • Related to deleted (Bug #23352: osd: segfaults under normal operation)
Actions #14

Updated by Brad Hubbard almost 6 years ago

  • Is duplicate of Bug #23352: osd: segfaults under normal operation added