Bug #24023
Segfault on OSD in 12.2.5
Status: Closed
Description
2018-05-05 06:33:42.383231 7f83289a4700 -1 *** Caught signal (Segmentation fault) **
in thread 7f83289a4700 thread_name:safe_timer
ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
1: (()+0xa7cab4) [0x55cbee7c7ab4]
2: (()+0x11390) [0x7f8330515390]
3: [0x55cc0000005c]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
core dump at https://drive.google.com/open?id=1uWN1QIRY2mFXe52MS7k1XzxhdOOuxYow
full OSD log at https://drive.google.com/open?id=1_ZNR2Y9VV8riKCKFYPXAKDMQQoZsT-C5
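As a rough aid for reading the backtrace above: each frame of the form `(()+OFFSET) [ADDRESS]` gives an offset into a loaded object and the absolute address at crash time, so subtracting the two gives the address at which that object was mapped. The sketch below is only an illustration; the helper name `load_base` and the regular expression are assumptions, not part of Ceph. Frame 3 has no offset, so it cannot be resolved this way.

```python
import re

# Matches frames such as "1: (()+0xa7cab4) [0x55cbee7c7ab4]".
# Frames without an offset (e.g. "3: [0x55cc0000005c]") do not match.
FRAME_RE = re.compile(r"\d+: \(\(\)\+(0x[0-9a-f]+)\) \[(0x[0-9a-f]+)\]")


def load_base(frame: str) -> int:
    """Return absolute address minus offset for one backtrace frame."""
    match = FRAME_RE.search(frame)
    if match is None:
        raise ValueError(f"no offset in frame: {frame!r}")
    offset, address = (int(group, 16) for group in match.groups())
    return address - offset


# Frames copied from the description above.
for frame in ("1: (()+0xa7cab4) [0x55cbee7c7ab4]",
              "2: (()+0x11390) [0x7f8330515390]"):
    print(frame, "-> mapped at", hex(load_base(frame)))
```

The offset (here `0xa7cab4`), not the absolute address, is the value to look up in the `objdump -rdS <executable>` output for the matching build, since the absolute address changes from run to run.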
Updated by Alex Gorbachev almost 6 years ago
Another one occurred today on a different OSD:
2018-05-06 19:48:33.636221 7f0f55922700 -1 *** Caught signal (Segmentation fault) **
in thread 7f0f55922700 thread_name:safe_timer
ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
1: (()+0xa7cab4) [0x555b892fbab4]
2: (()+0x11390) [0x7f0f5d493390]
3: [0x555c00080000]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-9346> 2018-04-25 05:01:32.221706 7f0f5efafe00 5 asok(0x555b935014a0) register_command perfcounters_dump hook 0x555b934b41b0
-9345> 2018-04-25 05:01:32.221736 7f0f5efafe00 5 asok(0x555b935014a0) register_command 1 hook 0x555b934b41b0
-9344> 2018-04-25 05:01:32.221742 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf dump hook 0x555b934b41b0
-9343> 2018-04-25 05:01:32.221746 7f0f5efafe00 5 asok(0x555b935014a0) register_command perfcounters_schema hook 0x555b934b41b0
-9342> 2018-04-25 05:01:32.221752 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf histogram dump hook 0x555b934b41b0
-9341> 2018-04-25 05:01:32.221756 7f0f5efafe00 5 asok(0x555b935014a0) register_command 2 hook 0x555b934b41b0
-9340> 2018-04-25 05:01:32.221762 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf schema hook 0x555b934b41b0
-9339> 2018-04-25 05:01:32.221765 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf histogram schema hook 0x555b934b41b0
-9338> 2018-04-25 05:01:32.221771 7f0f5efafe00 5 asok(0x555b935014a0) register_command perf reset hook 0x555b934b41b0
-9337> 2018-04-25 05:01:32.221775 7f0f5efafe00 5 asok(0x555b935014a0) register_command config show hook 0x555b934b41b0
-9336> 2018-04-25 05:01:32.221781 7f0f5efafe00 5 asok(0x555b935014a0) register_command config help hook 0x555b934b41b0
-9335> 2018-04-25 05:01:32.221785 7f0f5efafe00 5 asok(0x555b935014a0) register_command config set hook 0x555b934b41b0
-9334> 2018-04-25 05:01:32.221791 7f0f5efafe00 5 asok(0x555b935014a0) register_command config get hook 0x555b934b41b0
-9333> 2018-04-25 05:01:32.221794 7f0f5efafe00 5 asok(0x555b935014a0) register_command config diff hook 0x555b934b41b0
-9332> 2018-04-25 05:01:32.221800 7f0f5efafe00 5 asok(0x555b935014a0) register_command config diff get hook 0x555b934b41b0
-9331> 2018-04-25 05:01:32.221805 7f0f5efafe00 5 asok(0x555b935014a0) register_command log flush hook 0x555b934b41b0
-9330> 2018-04-25 05:01:32.221810 7f0f5efafe00 5 asok(0x555b935014a0) register_command log dump hook 0x555b934b41b0
-9329> 2018-04-25 05:01:32.221815 7f0f5efafe00 5 asok(0x555b935014a0) register_command log reopen hook 0x555b934b41b0
-9328> 2018-04-25 05:01:32.221829 7f0f5efafe00 5 asok(0x555b935014a0) register_command dump_mempools hook 0x555b937729e8
-9327> 2018-04-25 05:01:32.230596 7f0f5efafe00 0 set uid:gid to 64045:64045 (ceph:ceph)
-9326> 2018-04-25 05:01:32.230615 7f0f5efafe00 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 23577
-9325> 2018-04-25 05:01:32.237921 7f0f5efafe00 0 pidfile_write: ignore empty --pid-file
-9324> 2018-04-25 05:01:32.246657 7f0f5efafe00 0 load: jerasure load: lrc load: isa
-9323> 2018-04-25 05:01:32.246729 7f0f5efafe00 1 bdev create path /var/lib/ceph/osd/ceph-123/block type kernel
-9322> 2018-04-25 05:01:32.246741 7f0f5efafe00 1 bdev(0x555b93524d80 /var/lib/ceph/osd/ceph-123/block) open path /var/lib/ceph/osd/ceph-123/block
-9321> 2018-04-25 05:01:32.247000 7f0f5efafe00 1 bdev(0x555b93524d80 /var/lib/ceph/osd/ceph-123/block) open size 4000668520448 (0x3a37a6d1000, 3725 GB) block_size 4096 (4096 B) rotational
-9320> 2018-04-25 05:01:32.247108 7f0f5efafe00 1 bdev(0x555b93524d80 /var/lib/ceph/osd/ceph-123/block) close
-9319> 2018-04-25 05:01:32.556089 7f0f5efafe00 1 bdev create path /var/lib/ceph/osd/ceph-123/block type kernel
Updated by Alex Gorbachev almost 6 years ago
This is happening regularly, 1-2 times per day.
Updated by Alex Gorbachev almost 6 years ago
This continues to happen every day, usually during scrub
Updated by Alexander Morozov almost 6 years ago
Alex Gorbachev wrote:
This continues to happen every day, usually during scrub
I'm facing the same issue:
May 20 10:09:49 kub01 ceph-osd[3924392]: *** Caught signal (Segmentation fault) **
May 20 10:09:49 kub01 ceph-osd[3924392]: in thread 7f273a6c0700 thread_name:safe_timer
May 20 10:09:49 kub01 ceph-osd[3924392]: ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
May 20 10:09:49 kub01 ceph-osd[3924392]: 1: (()+0xa7cab4) [0x55e81c3e6ab4]
May 20 10:09:49 kub01 ceph-osd[3924392]: 2: (()+0x11390) [0x7f2742231390]
May 20 10:09:49 kub01 ceph-osd[3924392]: 3: [0x55e82d042d80]
May 20 10:09:49 kub01 ceph-osd[3924392]: 2018-05-20 10:09:49.157946 7f273a6c0700 -1 *** Caught signal (Segmentation fault) **
May 20 10:09:49 kub01 ceph-osd[3924392]: in thread 7f273a6c0700 thread_name:safe_timer
May 20 10:09:49 kub01 ceph-osd[3924392]: ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
May 20 10:09:49 kub01 ceph-osd[3924392]: 1: (()+0xa7cab4) [0x55e81c3e6ab4]
May 20 10:09:49 kub01 ceph-osd[3924392]: 2: (()+0x11390) [0x7f2742231390]
May 20 10:09:49 kub01 ceph-osd[3924392]: 3: [0x55e82d042d80]
May 20 10:09:49 kub01 ceph-osd[3924392]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 20 10:09:49 kub01 ceph-osd[3924392]: 0> 2018-05-20 10:09:49.157946 7f273a6c0700 -1 *** Caught signal (Segmentation fault) **
May 20 10:09:49 kub01 ceph-osd[3924392]: in thread 7f273a6c0700 thread_name:safe_timer
May 20 10:09:49 kub01 ceph-osd[3924392]: ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
May 20 10:09:49 kub01 ceph-osd[3924392]: 1: (()+0xa7cab4) [0x55e81c3e6ab4]
May 20 10:09:49 kub01 ceph-osd[3924392]: 2: (()+0x11390) [0x7f2742231390]
May 20 10:09:49 kub01 ceph-osd[3924392]: 3: [0x55e82d042d80]
May 20 10:09:49 kub01 ceph-osd[3924392]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
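The journal excerpt above carries the same backtrace three times, each line behind a syslog-style prefix. A minimal sketch for stripping that prefix and keeping each distinct message once is shown here; the helper name `unique_messages` and the prefix pattern are assumptions based only on the lines quoted in this comment.

```python
import re

# Syslog-style prefix as it appears above, e.g.
# "May 20 10:09:49 kub01 ceph-osd[3924392]: "
PREFIX_RE = re.compile(
    r"^[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2} \S+ ceph-osd\[\d+\]: "
)


def unique_messages(lines):
    """Strip the journal prefix and drop repeated messages, keeping order."""
    seen = set()
    result = []
    for line in lines:
        message = PREFIX_RE.sub("", line).strip()
        if message and message not in seen:
            seen.add(message)
            result.append(message)
    return result


# A few lines copied from the excerpt above, including one repeat of each.
journal = [
    "May 20 10:09:49 kub01 ceph-osd[3924392]: in thread 7f273a6c0700 thread_name:safe_timer",
    "May 20 10:09:49 kub01 ceph-osd[3924392]: 1: (()+0xa7cab4) [0x55e81c3e6ab4]",
    "May 20 10:09:49 kub01 ceph-osd[3924392]: 2: (()+0x11390) [0x7f2742231390]",
    "May 20 10:09:49 kub01 ceph-osd[3924392]: in thread 7f273a6c0700 thread_name:safe_timer",
    "May 20 10:09:49 kub01 ceph-osd[3924392]: 1: (()+0xa7cab4) [0x55e81c3e6ab4]",
    "May 20 10:09:49 kub01 ceph-osd[3924392]: 2: (()+0x11390) [0x7f2742231390]",
]

for message in unique_messages(journal):
    print(message)
```

The header lines of the three copies differ slightly (the second and third carry a Ceph timestamp and thread id), so exact-match deduplication keeps those variants while collapsing the repeated frame lines.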
Updated by Alexander Morozov almost 6 years ago
Alexander M wrote:
Alex Gorbachev wrote:
This continues to happen every day, usually during scrub
I'm facing the same issue:
[...]
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial

ii ceph           12.2.5-1xenial amd64 distributed storage and file system
ii ceph-base      12.2.5-1xenial amd64 common ceph daemon libraries and management tools
ii ceph-common    12.2.5-1xenial amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-fuse      12.2.5-1xenial amd64 FUSE-based client for the Ceph distributed file system
ii ceph-mds       12.2.5-1xenial amd64 metadata server for the ceph distributed file system
ii ceph-mgr       12.2.5-1xenial amd64 manager for the ceph distributed storage system
ii ceph-mon       12.2.5-1xenial amd64 monitor server for the ceph storage system
ii ceph-osd       12.2.5-1xenial amd64 OSD server for the ceph storage system
ii libcephfs2     12.2.5-1xenial amd64 Ceph distributed file system client library
ii python-cephfs  12.2.5-1xenial amd64 Python 2 libraries for the Ceph libcephfs library
Updated by Josh Durgin almost 6 years ago
- Related to Bug #23352: osd: segfaults under normal operation added
Updated by Josh Durgin almost 6 years ago
- Related to Bug #23431: OSD Segmentation fault in thread_name:safe_timer added
Updated by Jan Krcmar almost 6 years ago
hi,
I've noticed a similar (possibly the same) segfault on my deployment. Random segfaults occur on random OSDs, with or without load.
-10> 2018-05-24 16:19:14.940493 7fc08baf7700 1 -- 10.2.4.31:6803/16 <== osd.28 10.2.4.28:0/16 193156 ==== osd_ping(ping e33877 stamp 2018-05-24 16:19:14.939266) v4 ==== 2004+0+0 (672660280 0 0) 0x55b971003a00 con 0x55b96ef8c000
-9> 2018-05-24 16:19:14.940488 7fc08b2f6700 1 -- 10.2.4.31:6802/16 <== osd.28 10.2.4.28:0/16 193156 ==== osd_ping(ping e33877 stamp 2018-05-24 16:19:14.939266) v4 ==== 2004+0+0 (672660280 0 0) 0x55b964435400 con 0x55b96ed50800
-8> 2018-05-24 16:19:14.940508 7fc08baf7700 1 -- 10.2.4.31:6803/16 --> 10.2.4.28:0/16 -- osd_ping(ping_reply e33877 stamp 2018-05-24 16:19:14.939266) v4 -- 0x55b97386e800 con 0
-7> 2018-05-24 16:19:14.940554 7fc08b2f6700 1 -- 10.2.4.31:6802/16 --> 10.2.4.28:0/16 -- osd_ping(ping_reply e33877 stamp 2018-05-24 16:19:14.939266) v4 -- 0x55b96efb8600 con 0
-6> 2018-05-24 16:19:15.130579 7fc08baf7700 5 -- 10.2.4.31:6802/16 >> 10.2.4.31:0/17 conn(0x55b96ebfe000 :6802 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=18 cs=1 l=1). rx osd.35 seq 192494 0x55b970e77800 osd_ping(ping e33877 stamp 2018-05-24 16:19:15.129723) v4
-5> 2018-05-24 16:19:15.130613 7fc08baf7700 1 -- 10.2.4.31:6802/16 <== osd.35 10.2.4.31:0/17 192494 ==== osd_ping(ping e33877 stamp 2018-05-24 16:19:15.129723) v4 ==== 2004+0+0 (1201433143 0 0) 0x55b970e77800 con 0x55b96ebfe000
-4> 2018-05-24 16:19:15.130627 7fc08baf7700 1 -- 10.2.4.31:6802/16 --> 10.2.4.31:0/17 -- osd_ping(ping_reply e33877 stamp 2018-05-24 16:19:15.129723) v4 -- 0x55b97386e800 con 0
-3> 2018-05-24 16:19:15.130605 7fc08c2f8700 5 -- 10.2.4.31:6803/16 >> 10.2.4.31:0/17 conn(0x55b96ebff800 :6803 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=18 cs=1 l=1). rx osd.35 seq 192494 0x55b970f47800 osd_ping(ping e33877 stamp 2018-05-24 16:19:15.129723) v4
-2> 2018-05-24 16:19:15.130642 7fc08c2f8700 1 -- 10.2.4.31:6803/16 <== osd.35 10.2.4.31:0/17 192494 ==== osd_ping(ping e33877 stamp 2018-05-24 16:19:15.129723) v4 ==== 2004+0+0 (1201433143 0 0) 0x55b970f47800 con 0x55b96ebff800
-1> 2018-05-24 16:19:15.130677 7fc08c2f8700 1 -- 10.2.4.31:6803/16 --> 10.2.4.31:0/17 -- osd_ping(ping_reply e33877 stamp 2018-05-24 16:19:15.129723) v4 -- 0x55b9739af000 con 0
0> 2018-05-24 16:19:15.140522 7fc087b10700 -1 *** Caught signal (Segmentation fault) **
in thread 7fc087b10700 thread_name:safe_timer
ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
1: (()+0xa558bc) [0x55b9583408bc]
2: (()+0xf8d0) [0x7fc08f19c8d0]
3: [0x55b96ff49800]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Running kernel 3.16.0-5-amd64 on Debian 8 (jessie).
fous
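To compare the crashes reported so far in this thread, one option is to reduce each backtrace to a small signature: the thread name plus the offset in frame 1. The sketch below does this for excerpts copied from the comments above; the labels and the `signature` helper are illustrative assumptions. Matching offsets are consistent with the same ceph-osd build crashing at the same place, while a different offset (as in the Debian 8 report) may only reflect a differently built binary, so offsets are not directly comparable across packages.

```python
import re
from collections import defaultdict

# Captures the thread name and the offset of frame 1 from a backtrace.
SIG_RE = re.compile(
    r"thread_name:(\S+).*?\b1: \(\(\)\+(0x[0-9a-f]+)\)", re.S
)


def signature(trace: str):
    """Return (thread_name, frame-1 offset), or None if not found."""
    match = SIG_RE.search(trace)
    return match.groups() if match else None


# Excerpts copied from the reports in this thread; labels are illustrative.
traces = {
    "2018-05-05": "in thread 7f83289a4700 thread_name:safe_timer\n"
                  "1: (()+0xa7cab4) [0x55cbee7c7ab4]",
    "2018-05-06": "in thread 7f0f55922700 thread_name:safe_timer\n"
                  "1: (()+0xa7cab4) [0x555b892fbab4]",
    "2018-05-20 (Ubuntu 16.04)": "in thread 7f273a6c0700 thread_name:safe_timer\n"
                                 "1: (()+0xa7cab4) [0x55e81c3e6ab4]",
    "2018-05-24 (Debian 8)": "in thread 7fc087b10700 thread_name:safe_timer\n"
                             "1: (()+0xa558bc) [0x55b9583408bc]",
}

groups = defaultdict(list)
for label, text in traces.items():
    groups[signature(text)].append(label)

for sig, labels in groups.items():
    print(sig, labels)
```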
Updated by Alex Gorbachev almost 6 years ago
Also posted this in bug http://tracker.ceph.com/issues/23352:
Hi Brad, we had one too just now, core dump and log:
https://drive.google.com/open?id=1t1jfjqwjhUUBzWjxamos3Hr7ghjxRPg6
https://drive.google.com/open?id=1iuu0GQ8yy8UT3d2qLBITyDUseJIVYmh-
Updated by Brad Hubbard almost 6 years ago
Alex,
Why are we running multiple trackers for the same issue?
Can we close this as a duplicate?
Updated by Brad Hubbard almost 6 years ago
- Related to deleted (Bug #23352: osd: segfaults under normal operation)
Updated by Brad Hubbard almost 6 years ago
- Is duplicate of Bug #23352: osd: segfaults under normal operation added