Bug #24678
ceph-mon segmentation fault after setting pool size to 1 on degraded cluster
% Done: 0%
Tags: ceph-mon Segmentation fault
Regression: No
Severity: 3 - minor
Component(RADOS): Monitor
Description
We cannot start any of our 3 monitors after changing a pool's size from 3 to 1. The cluster was in a degraded state at the time (2 of 3 OSD nodes down), and we were also hitting bug #24423. The first segfault hit the leader right after the command was dispatched:
-7> 2018-06-27 18:45:58.929 7f7637878700 4 mgrc handle_mgr_map Active mgr is now 10.14.88.207:6801/569195
-6> 2018-06-27 18:46:00.245 7f763b880700 0 mon.cdp4@0(leader) e1 handle_command mon_command({"var": "size", "prefix": "osd pool set", "pool": "cephfs_data", "val": "1"} v 0) v1
-5> 2018-06-27 18:46:00.245 7f763b880700 0 log_channel(audit) log [INF] : from='client.114290 -' entity='client.admin' cmd=[{"var": "size", "prefix": "osd pool set", "pool": "cephfs_data", "val": "1"}]: dispatch
-4> 2018-06-27 18:46:00.321 7f7637878700 4 mgrc handle_mgr_map Got map version 92
-3> 2018-06-27 18:46:00.321 7f7637878700 4 mgrc handle_mgr_map Active mgr is now 10.14.88.207:6801/569195
-2> 2018-06-27 18:46:00.325 7f7637878700 0 log_channel(audit) log [INF] : from='client.114290 -' entity='client.admin' cmd='[{"var": "size", "prefix": "osd pool set", "pool": "cephfs_data", "val": "1"}]': finished
-1> 2018-06-27 18:46:00.325 7f7637878700 0 log_channel(cluster) log [DBG] : osdmap e934: 64 total, 28 up, 47 in
0> 2018-06-27 18:46:00.333 7f763d884700 -1 *** Caught signal (Segmentation fault) **
in thread 7f763d884700 thread_name:cpu_tp

ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
1: (()+0x4b22a0) [0x5608f564c2a0]
2: (()+0x11390) [0x7f7646d02390]
3: (OSDMapMapping::_build_rmap(OSDMap const&)+0x114) [0x7f764759f204]
4: (OSDMapMapping::_finish(OSDMap const&)+0x11) [0x7f764759f531]
5: (ParallelPGMapper::Job::finish_one()+0xf5) [0x7f764759f635]
6: (ParallelPGMapper::WQ::_process(ParallelPGMapper::Item*, ThreadPool::TPHandle&)+0x5c) [0x7f764759f6bc]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x8f7) [0x7f76473fdc37]
8: (ThreadPool::WorkThread::entry()+0x10) [0x7f76473feb60]
9: (()+0x76ba) [0x7f7646cf86ba]
10: (clone()+0x6d) [0x7f7645a1c41d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
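To summarize the trigger, a minimal reproduction sketch pieced together from the audit log above and the restart behaviour described below (the pool name is from this cluster; the exact systemd unit is an assumption, with ceph-mon.target used as a stand-in):

# with 2 of 3 OSD nodes down (cluster degraded):
ceph osd pool set cephfs_data size 1    # the active mon segfaults moments later
# any subsequent attempt to start a mon segfaults the same way:
systemctl restart ceph-mon.target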
Stopping and starting the monitor via systemctl (the service or the target) leads to a segfault with the following stack trace:
-20> 2018-06-27 19:58:28.391 7fd0c8edf180 4 rocksdb: [/build/ceph-13.2.0/src/rocksdb/db/version_set.cc:3362] Recovered from manifest file:/var/lib/ceph/mon/ceph-cdp4/store.db/MANIFEST-019582 succeeded, manifest_file_number is 19582, next_file_number is 19585, last_sequence is 7451606, log_number is 0, prev_log_number is 0, max_column_family is 0, deleted_log_number is 19580
-19> 2018-06-27 19:58:28.391 7fd0c8edf180 4 rocksdb: [/build/ceph-13.2.0/src/rocksdb/db/version_set.cc:3370] Column family [default] (ID 0), log number is 19581
-18> 2018-06-27 19:58:28.391 7fd0c8edf180 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1530118708394381, "job": 1, "event": "recovery_started", "log_files": [19583]}
-17> 2018-06-27 19:58:28.391 7fd0c8edf180 4 rocksdb: [/build/ceph-13.2.0/src/rocksdb/db/db_impl_open.cc:551] Recovering log #19583 mode 2
-16> 2018-06-27 19:58:28.391 7fd0c8edf180 4 rocksdb: [/build/ceph-13.2.0/src/rocksdb/db/version_set.cc:2863] Creating manifest 19585
-15> 2018-06-27 19:58:28.395 7fd0c8edf180 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1530118708395928, "job": 1, "event": "recovery_finished"}
-14> 2018-06-27 19:58:28.395 7fd0c8edf180 5 rocksdb: [/build/ceph-13.2.0/src/rocksdb/db/db_impl_files.cc:380] [JOB 2] Delete /var/lib/ceph/mon/ceph-cdp4/store.db//MANIFEST-019582 type=3 #19582 -- OK
-13> 2018-06-27 19:58:28.395 7fd0c8edf180 5 rocksdb: [/build/ceph-13.2.0/src/rocksdb/db/db_impl_files.cc:380] [JOB 2] Delete /var/lib/ceph/mon/ceph-cdp4/store.db//019583.log type=0 #19583 -- OK
-12> 2018-06-27 19:58:28.395 7fd0c8edf180 4 rocksdb: [/build/ceph-13.2.0/src/rocksdb/db/db_impl_open.cc:1218] DB pointer 0x55a24fd08000
-11> 2018-06-27 19:58:28.395 7fd0c8edf180 0 starting mon.cdp4 rank 0 at public addr 10.14.88.204:6789/0 at bind addr 10.14.88.204:6789/0 mon_data /var/lib/ceph/mon/ceph-cdp4 fsid 04176392-32d2-11e8-a537-00259074f012
-10> 2018-06-27 19:58:28.395 7fd0c8edf180 0 starting mon.cdp4 rank 0 at 10.14.88.204:6789/0 mon_data /var/lib/ceph/mon/ceph-cdp4 fsid 04176392-32d2-11e8-a537-00259074f012
-9> 2018-06-27 19:58:28.399 7fd0c8edf180 0 mon.cdp4@-1(probing).mds e14 print_map
e14
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch 14
flags 12
created 2018-06-10 01:03:29.512343
modified 2018-06-27 17:53:21.155983
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 905
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=90956}
failed
damaged
stopped
data_pools [5]
metadata_pool 8
inline_data disabled
balancer
standby_count_wanted 1
90956: 10.14.88.207:6800/1768662562 'cdp7' mds.0.9 up:active seq 11
104121: 10.14.88.208:6800/3625827821 'cdp8' mds.0.0 up:standby-replay seq 1 (standby for rank 0)
-8> 2018-06-27 19:58:28.399 7fd0c8edf180 0 mon.cdp4@-1(probing).osd e934 crush map has features 288514051259236352, adjusting msgr requires
-7> 2018-06-27 19:58:28.399 7fd0c8edf180 0 mon.cdp4@-1(probing).osd e934 crush map has features 288514051259236352, adjusting msgr requires
-6> 2018-06-27 19:58:28.399 7fd0c8edf180 0 mon.cdp4@-1(probing).osd e934 crush map has features 1009089991638532096, adjusting msgr requires
-5> 2018-06-27 19:58:28.399 7fd0c8edf180 0 mon.cdp4@-1(probing).osd e934 crush map has features 288514051259236352, adjusting msgr requires
-4> 2018-06-27 19:58:28.399 7fd0c8edf180 4 mgrc handle_mgr_map Got map version 92
-3> 2018-06-27 19:58:28.399 7fd0c8edf180 4 mgrc handle_mgr_map Active mgr is now 10.14.88.207:6801/569195
-2> 2018-06-27 19:58:28.399 7fd0c8edf180 4 mgrc reconnect Starting new session with 10.14.88.207:6801/569195
-1> 2018-06-27 19:58:28.403 7fd0c8edf180 0 mon.cdp4@-1(probing) e1 my rank is now 0 (was -1)
0> 2018-06-27 19:58:28.415 7fd0b66a7700 -1 *** Caught signal (Segmentation fault) **
in thread 7fd0b66a7700 thread_name:cpu_tp

ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
1: (()+0x4b22a0) [0x55a24e5b92a0]
2: (()+0x11390) [0x7fd0bfb25390]
3: (OSDMapMapping::_build_rmap(OSDMap const&)+0x1d5) [0x7fd0c03c22c5]
4: (OSDMapMapping::_finish(OSDMap const&)+0x11) [0x7fd0c03c2531]
5: (ParallelPGMapper::Job::finish_one()+0xf5) [0x7fd0c03c2635]
6: (ParallelPGMapper::WQ::_process(ParallelPGMapper::Item*, ThreadPool::TPHandle&)+0x5c) [0x7fd0c03c26bc]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x8f7) [0x7fd0c0220c37]
8: (ThreadPool::WorkThread::entry()+0x10) [0x7fd0c0221b60]
9: (()+0x76ba) [0x7fd0bfb1b6ba]
10: (clone()+0x6d) [0x7fd0be83f41d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
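Both traces die in OSDMapMapping::_build_rmap, which inverts the PG-to-OSD mapping into per-OSD PG lists. Purely as an illustration (a minimal hypothetical sketch, not Ceph source; every name below is invented), this is the shape of inversion that segfaults when a mapping entry carries an out-of-range OSD id, which is plausible on a cluster with most OSDs down:

#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical sketch of a pg -> [osd] inversion, analogous in shape to the
// _build_rmap frame in the traces above. The illustrated failure mode: an
// entry holding an out-of-range OSD id (e.g. a "no OSD" placeholder on a
// degraded map) indexes the per-OSD table without a bounds check.
static std::vector<std::vector<uint32_t>>
build_rmap(const std::vector<std::vector<int>>& pg_to_osds, size_t num_osds) {
  std::vector<std::vector<uint32_t>> rmap(num_osds);
  for (size_t pg = 0; pg < pg_to_osds.size(); ++pg) {
    for (int osd : pg_to_osds[pg]) {
      // The bounds check a fix would need; without it, a sentinel id
      // (negative or huge) indexes far past the vector and can segfault.
      if (osd < 0 || static_cast<size_t>(osd) >= num_osds)
        continue;
      rmap[osd].push_back(static_cast<uint32_t>(pg));
    }
  }
  return rmap;
}

int main() {
  // 3 PGs across 2 OSDs; PG 2 carries a bogus id, as a degraded map might.
  std::vector<std::vector<int>> pg_to_osds = {{0, 1}, {1}, {2147483647}};
  auto rmap = build_rmap(pg_to_osds, 2);
  for (size_t osd = 0; osd < rmap.size(); ++osd)
    std::cout << "osd." << osd << " holds " << rmap[osd].size() << " PGs\n";
}

Whether that is the actual mechanism here would need the objdump the NOTE asks for; the sketch only shows why a degraded OSD map plus an unchecked index would match the symptom.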
History
#1 Updated by Patrick Donnelly over 5 years ago
- Project changed from Ceph to RADOS
- Category deleted (Monitor)
- Component(RADOS) Monitor added
#2 Updated by Josh Durgin over 5 years ago
- Priority changed from Normal to High
#3 Updated by Josh Durgin over 4 years ago
- Status changed from New to Can't reproduce