Bug #24935 (closed)

SafeTimer? osd killed by kernel for Segmentation fault

Added by 伟杰 谭 almost 6 years ago. Updated over 5 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My environment:
[root@gz-ceph-52-203 log]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@gz-ceph-52-203 log]# ceph -v
ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)

Two OSDs were killed by the kernel for segmentation faults, but for different reasons and on different nodes.
The messages log shows the first one's info:
Jul 15 21:43:18 gz-ceph-52-203 ceph-osd: *** Caught signal (Segmentation fault) **
Jul 15 21:43:18 gz-ceph-52-203 ceph-osd: in thread 7f838fa53700 thread_name:rocksdb:bg3
Jul 15 21:43:18 gz-ceph-52-203 kernel: rocksdb:bg3[2618122]: segfault at 1d ip 00007f83c1b8bee3 sp 00007f838fa4e198 error 4 in libtcmalloc.so.4.4.5[7f83c1b56000+46000]
Jul 15 21:43:18 gz-ceph-52-203 systemd: ceph-osd@142.service: main process exited, code=killed, status=11/SEGV
Jul 15 21:43:18 gz-ceph-52-203 systemd: Unit ceph-osd@142.service entered failed state.
Jul 15 21:43:18 gz-ceph-52-203 systemd: ceph-osd@142.service failed.
Jul 15 21:43:38 gz-ceph-52-203 systemd: ceph-osd@142.service holdoff time over, scheduling restart.
Jul 15 21:43:38 gz-ceph-52-203 systemd: Starting Ceph object storage daemon osd.142...
Jul 15 21:43:38 gz-ceph-52-203 systemd: Started Ceph object storage daemon osd.142.
Jul 15 21:43:38 gz-ceph-52-203 ceph-osd: starting osd.142 at - osd_data /var/lib/ceph/osd/ceph-142 /var/lib/ceph/osd/ceph-142/journal
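
To see which tcmalloc function faulted, the kernel line already gives what is needed: ip 0x7f83c1b8bee3 inside a mapping based at 0x7f83c1b56000, i.e. offset 0x35ee3 into the library. A minimal sketch, assuming the matching gperftools debuginfo is installed and that the library really lives at /usr/lib64/libtcmalloc.so.4.4.5 (the path is a guess):
[root@gz-ceph-52-203 ~]# printf '0x%x\n' $((0x7f83c1b8bee3 - 0x7f83c1b56000))    # offset of ip inside the mapping
0x35ee3
[root@gz-ceph-52-203 ~]# addr2line -f -C -e /usr/lib64/libtcmalloc.so.4.4.5 0x35ee3
Whatever symbol comes back could then be compared against the crashes collected in #23352.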

And the second one:
Jul 15 22:41:37 gz-ceph-52-204 ceph-osd: *** Caught signal (Segmentation fault) **
Jul 15 22:41:37 gz-ceph-52-204 ceph-osd: in thread 7fe6cec09700 thread_name:safe_timer
Jul 15 22:41:37 gz-ceph-52-204 kernel: safe_timer[88805]: segfault at 7fe600010000 ip 00007fe6d4e33118 sp 00007fe6cec058b0 error 4 in libgcc_s-4.8.5-20150702.so.1[7fe6d4e24000+15000]
Jul 15 22:41:37 gz-ceph-52-204 systemd: ceph-osd@112.service: main process exited, code=killed, status=11/SEGV
Jul 15 22:41:37 gz-ceph-52-204 systemd: Unit ceph-osd@112.service entered failed state.
Jul 15 22:41:37 gz-ceph-52-204 systemd: ceph-osd@112.service failed.
Jul 15 22:41:47 gz-ceph-52-204 ceph-mgr: ::ffff:172.25.52.205 - - [15/Jul/2018:22:41:47] "GET /metrics HTTP/1.1" 200 - "" "Prometheus/2.2.1"
Jul 15 22:41:58 gz-ceph-52-204 systemd: ceph-osd@112.service holdoff time over, scheduling restart.
Jul 15 22:41:58 gz-ceph-52-204 systemd: Starting Ceph object storage daemon osd.112...
Jul 15 22:41:58 gz-ceph-52-204 systemd: Started Ceph object storage daemon osd.112.
Jul 15 22:41:58 gz-ceph-52-204 ceph-osd: starting osd.112 at - osd_data /var/lib/ceph/osd/ceph-112 /var/lib/ceph/osd/ceph-112/journal
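
Since nothing useful lands in the OSD log itself, a core dump is probably the only way to get a full backtrace on the next occurrence. A rough sketch, assuming the stock ceph-osd@.service unit and a writable kernel.core_pattern target (both assumptions, not verified here):
[root@gz-ceph-52-204 ~]# mkdir -p /etc/systemd/system/ceph-osd@.service.d
[root@gz-ceph-52-204 ~]# cat > /etc/systemd/system/ceph-osd@.service.d/coredump.conf <<'EOF'
[Service]
LimitCORE=infinity
EOF
[root@gz-ceph-52-204 ~]# systemctl daemon-reload
[root@gz-ceph-52-204 ~]# systemctl restart ceph-osd@112
With ceph-debuginfo installed, gdb /usr/bin/ceph-osd <core> followed by thread apply all bt should then show what the safe_timer or rocksdb thread was doing when it faulted.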

Unfortunately, the ceph-osd log contains nothing about this crash; it looks as if the OSD simply restarted on its own (a journalctl cross-check sketch follows the log excerpt below):
2018-07-15 21:43:17.111581 7f83b429c700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1531662197111557, "job": 1052, "event": "compaction_started", "files_L0": [8258, 8254, 8250, 8246], "files_L1": [8115, 8121, 8123, 8128, 8133, 8135, 8139, 8144, 8146, 8149, 8151, 8152, 8104, 8114, 8118, 8120, 8125, 8130, 8134, 8137, 8142, 8143, 8147, 8150, 8153, 8105, 8109, 8113, 8116, 8119, 8124, 8138, 8141, 8145, 8108, 8112, 8122, 8126, 8129, 8131, 8140], "score": 1, "input_data_size": 1432481812}
2018-07-15 21:43:38.649032 7f9fe66cdd40 0 set uid:gid to 167:167 (ceph:ceph)
2018-07-15 21:43:38.649049 7f9fe66cdd40 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 2618147
2018-07-15 21:43:38.657973 7f9fe66cdd40 0 pidfile_write: ignore empty --pid-file
2018-07-15 21:43:38.685717 7f9fe66cdd40 0 load: jerasure load: lrc load: isa
2018-07-15 21:43:38.685811 7f9fe66cdd40 1 bdev create path /var/lib/ceph/osd/ceph-142/block type kernel
2018-07-15 21:43:38.685818 7f9fe66cdd40 1 bdev(0x7f9ff161c800 /var/lib/ceph/osd/ceph-142/block) open path /var/lib/ceph/osd/ceph-142/block
2018-07-15 21:43:38.686284 7f9fe66cdd40 1 bdev(0x7f9ff161c800 /var/lib/ceph/osd/ceph-142/block) open size 10000827154432 (0x9187fc00000, 9313 GB) block_size 4096 (4096 B) rotational
2018-07-15 21:43:38.686464 7f9fe66cdd40 1 bluestore(/var/lib/ceph/osd/ceph-142) _set_cache_sizes cache_size 2147483648 meta 0.5 kv 0.5 data 0
2018-07-15 21:43:38.686487 7f9fe66cdd40 1 bdev(0x7f9ff161c800 /var/lib/ceph/osd/ceph-142/block) close
2018-07-15 21:43:38.946414 7f9fe66cdd40 1 bluestore(/var/lib/ceph/osd/ceph-142) _mount path /var/lib/ceph/osd/ceph-142
2018-07-15 21:43:38.946727 7f9fe66cdd40 1 bdev create path /var/lib/ceph/osd/ceph-142/block type kernel
2018-07-15 21:43:38.946737 7f9fe66cdd40 1 bdev(0x7f9ff1541800 /var/lib/ceph/osd/ceph-142/block) open path /var/lib/ceph/osd/ceph-142/block
2018-07-15 21:43:38.947016 7f9fe66cdd40 1 bdev(0x7f9ff1541800 /var/lib/ceph/osd/ceph-142/block) open size 10000827154432 (0x9187fc00000, 9313 GB) block_size 4096 (4096 B) rotational
2018-07-15 21:43:38.947190 7f9fe66cdd40 1 bluestore(/var/lib/ceph/osd/ceph-142) _set_cache_sizes cache_size 2147483648 meta 0.5 kv 0.5 data 0
2018-07-15 21:43:38.947294 7f9fe66cdd40 1 bdev create path /dev/sdf5 type kernel
2018-07-15 21:43:38.947301 7f9fe66cdd40 1 bdev(0x7f9ff161ca00 /dev/sdf5) open path /dev/sdf5
2018-07-15 21:43:38.947604 7f9fe66cdd40 1 bdev(0x7f9ff161ca00 /dev/sdf5) open size 153631064064 (0x23c5200000, 143 GB) block_size 4096 (4096 B) non-rotational
2018-07-15 21:43:38.947618 7f9fe66cdd40 1 bluefs add_block_device bdev 1 path /dev/sdf5 size 143 GB
2018-07-15 21:43:38.947876 7f9fe66cdd40 1 bdev create path /var/lib/ceph/osd/ceph-142/block type kernel
2018-07-15 21:43:38.947885 7f9fe66cdd40 1 bdev(0x7f9ff161de00 /var/lib/ceph/osd/ceph-142/block) open path /var/lib/ceph/osd/ceph-142/block
2018-07-15 21:43:38.948100 7f9fe66cdd40 1 bdev(0x7f9ff161de00 /var/lib/ceph/osd/ceph-142/block) open size 10000827154432 (0x9187fc00000, 9313 GB) block_size 4096 (4096 B) rotational
2018-07-15 21:43:38.948110 7f9fe66cdd40 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-142/block size 9313 GB
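
For cross-checking that window, the journal can be queried per unit; a small sketch, assuming the standard ceph-osd@<id> unit names and that journald still retains that time range:
[root@gz-ceph-52-203 ~]# journalctl -u ceph-osd@142 --since "2018-07-15 21:40" --until "2018-07-15 21:45"
[root@gz-ceph-52-203 ~]# journalctl -k --since "2018-07-15 21:40" --until "2018-07-15 21:45" | grep -i segfault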


Related issues 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #23352: osd: segfaults under normal operation (Resolved, Brad Hubbard, 03/14/2018)

#1

Updated by Greg Farnum almost 6 years ago

  • Project changed from Ceph to RADOS
  • Subject changed from osd killed by kernel for Segmentation fault to SafeTimer? osd killed by kernel for Segmentation fault
  • Category deleted (OSD)
#2

Updated by Josh Durgin over 5 years ago

  • Is duplicate of Bug #23352: osd: segfaults under normal operation added
#3

Updated by Josh Durgin over 5 years ago

  • Status changed from New to Duplicate

This appears to be another instance of #23352.
