Bug #23372
osd: segfault
Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We have a 5-node cluster with 5 mons and 120 OSDs.
One of the OSDs (osd.7) crashed with the following logs:
    -4> 2018-03-14 22:14:01.748116 7f37b586a700  5 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_files.cc:307] [JOB 16] Delete db/006322.sst type=2 #6322 -- OK
    -3> 2018-03-14 22:14:01.748124 7f37b586a700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1521065641748121, "job": 16, "event": "table_file_deletion", "file_number": 6322}
    -2> 2018-03-14 22:14:01.748130 7f37b586a700  5 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_files.cc:307] [JOB 16] Delete db/006276.sst type=2 #6276 -- OK
    -1> 2018-03-14 22:14:01.748134 7f37b586a700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1521065641748133, "job": 16, "event": "table_file_deletion", "file_number": 6276}
     0> 2018-03-14 22:49:29.198238 7f37bf87e700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f37bf87e700 thread_name:safe_timer

 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0xa3c611) [0x5633ee9df611]
 2: (()+0xf5e0) [0x7f37c6b3f5e0]
 3: [0x563400080000]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   0/ 0 mds
   0/ 0 mds_balancer
   0/ 0 mds_locker
   0/ 0 mds_log
   0/ 0 mds_log_expire
   0/ 0 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 0 filer
   0/ 1 striper
   0/ 0 objecter
   0/ 0 rados
   0/ 0 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 0 journaler
   0/ 0 objectcacher
   0/ 0 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   0/ 0 journal
   0/ 0 ms
   1/ 5 mon
   0/ 0 monc
   0/ 0 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   1/ 1 reserver
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   0/ 0 rgw
   1/10 civetweb
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 0 bluestore
   1/ 0 bluefs
   0/ 0 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.7.log
What additional information is required to debug this issue?
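Since the backtrace above carries no symbols (note the `objdump -rdS` hint in the dump), a symbolized trace would be the most useful next artifact. A minimal sketch of how one might collect it, assuming CentOS 7, a core captured by abrt, and a ceph-debuginfo build matching the crashing version; the paths and debug levels below are illustrative, not taken from this crash:

# install symbols matching the crashing build (12.2.4 on el7)
yum install -y ceph-debuginfo-12.2.4
# symbolize the captured core; inside gdb run: thread apply all bt
gdb /usr/bin/ceph-osd /var/spool/abrt/ccpp-<timestamp>-<pid>/coredump
# raise OSD/BlueStore log levels ahead of a recurrence
ceph tell osd.7 injectargs '--debug-osd 20 --debug-bluestore 20'

With symbols loaded, gdb should resolve the three anonymous frames and show which safe_timer callback faulted.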
History
#1 Updated by Nokia ceph-users about 6 years ago
- File ceph-osd.121.txt added
Nokia ceph-users wrote:
We have a 5-node cluster with 5 mons and 120 OSDs.
One of the OSDs (osd.7) crashed with the following logs:
[...] What additional information is required to debug this issue?
The crash reproduced on a similar cluster (340 OSDs) running Luminous 12.2.2:
cn2.chn6us1c1.cdn ~# abrt-cli list --since 1521543718
id ca4e01c701cd3a2e50e4ec1e1176aa14f012aff5
reason:         ceph-osd killed by SIGABRT
time:           Tue 20 Mar 2018 09:10:16 PM UTC
cmdline:        /usr/bin/ceph-osd -f --cluster ceph --id 121 --setuser ceph --setgroup ceph
package:        ceph-osd-12.2.2-0.el7
uid:            167 (ceph)
count:          1
Directory:      /var/spool/abrt/ccpp-2018-03-20-21:10:16-45896
cn2.chn6us1c1.cdn /var/log/ceph# zgrep boot ceph.log-20180321.gz
2018-03-20 21:12:14.915185 mon.cn1 mon.0 10.50.35.71:6789/0 309391 : cluster [INF] osd.121 10.50.35.72:6906/707645 boot
2018-03-20 21:12:14.783543 mon.cn1 mon.0 10.50.35.71:6789/0 309390 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-03-20 21:12:14.915185 mon.cn1 mon.0 10.50.35.71:6789/0 309391 : cluster [INF] osd.121 10.50.35.72:6906/707645 boot
2018-03-20 21:12:15.108860 mon.cn1 mon.0 10.50.35.71:6789/0 309727 : cluster [WRN] Health check update: Degraded data redundancy: 3857957/1306619130 objects degraded (0.295%), 121 pgs unclean, 121 pgs degraded, 121 pgs undersized (PG_DEGRADED)
Attached the ceph-osd.121 log file. Please raise a ticket if needed.
Attaching the OSD logs.
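Since abrt caught this second instance, the backtrace can also be pulled straight from the problem directory it reported. A sketch, assuming the id and directory listed above, and that debuginfo matching ceph-osd-12.2.2-0.el7 is installed (the gdb step is illustrative):

# detailed abrt report for this crash, including its backtrace element
abrt-cli info -d ca4e01c701cd3a2e50e4ec1e1176aa14f012aff5
# or symbolize the stored core directly; inside gdb run: thread apply all bt
gdb /usr/bin/ceph-osd /var/spool/abrt/ccpp-2018-03-20-21:10:16-45896/coredump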
#2 Updated by Patrick Donnelly almost 6 years ago
- Project changed from Ceph to RADOS
- Subject changed from OSD crashed in Luminous 12.2.4 to osd: segfault
- Source set to Community (user)
- Release deleted (luminous)
- Component(RADOS) OSD added
#3 Updated by Josh Durgin almost 6 years ago
- Project changed from RADOS to bluestore
#4 Updated by Sage Weil over 5 years ago
- Status changed from New to Can't reproduce