Bug #19980

closed

Stopping two OSD services caused "Caught signal (Aborted)" in thread 7f6bc8225700 thread_name:safe_timer

Added by xw zhang almost 7 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)

configuration:
bluestore + rbd + ec + overwrite + iscsi

steps:
1. Write files.
2. Stop two OSD services.
3. Check the OSD log.

result:
2017-05-17 16:18:33.142772 7f71dff8f700 -1 osd.0 28215 pgid 7.15fs0 has ref count of 2
2017-05-17 16:18:33.149798 7f71dff8f700 -1 ** Caught signal (Aborted) *
in thread 7f71dff8f700 thread_name:signal_handler

ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
1: (()+0xcab9b2) [0x5633bb4e99b2]
2: (()+0x11390) [0x7f71fc494390]
3: (gsignal()+0x38) [0x7f71fb430428]
4: (abort()+0x16a) [0x7f71fb43202a]
5: (OSD::shutdown()+0x195c) [0x5633baf80c4c]
6: (OSD::handle_signal(int)+0x136) [0x5633baf81196]
7: (SignalHandler::entry()+0x3a2) [0x5633bb4eb052]
8: (()+0x76ba) [0x7f71fc48a6ba]
9: (clone()+0x6d) [0x7f71fb50182d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
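As a reading aid for the backtrace above, here is a minimal Python sketch that pulls the signal name, the crashing thread, and the symbolized frames out of a pasted OSD log excerpt. The names `parse_crash`, `symbolized`, and `SAMPLE` are hypothetical helpers written for this illustration and are not part of Ceph; the regular expressions only assume the line layout visible in this report.

```python
import re

# Patterns are assumptions based on the log layout shown in this report.
SIGNAL_RE = re.compile(r"Caught signal \((\w+)\)")
THREAD_RE = re.compile(r"in thread ([0-9a-f]+) thread_name:(\S+)")
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\) \[(0x[0-9a-f]+)\]\s*$")


def parse_crash(text):
    """Collect signal, thread id/name, and backtrace frames from a log excerpt."""
    crash = {"signal": None, "thread": None, "thread_name": None, "frames": []}
    for line in text.splitlines():
        m = SIGNAL_RE.search(line)
        if m:
            crash["signal"] = m.group(1)
        m = THREAD_RE.search(line)
        if m:
            crash["thread"], crash["thread_name"] = m.group(1), m.group(2)
        m = FRAME_RE.match(line)
        if m:
            # (frame number, symbol+offset, return address)
            crash["frames"].append((int(m.group(1)), m.group(2), m.group(3)))
    return crash


def symbolized(frames):
    """Keep only frames that carry a symbol name, i.e. not '()+0x...'."""
    return [sym for _, sym, _ in frames if not sym.startswith("()+")]


# Excerpt copied from the "result" section of this report.
SAMPLE = """\
2017-05-17 16:18:33.149798 7f71dff8f700 -1 ** Caught signal (Aborted) *
in thread 7f71dff8f700 thread_name:signal_handler
1: (()+0xcab9b2) [0x5633bb4e99b2]
2: (()+0x11390) [0x7f71fc494390]
3: (gsignal()+0x38) [0x7f71fb430428]
4: (abort()+0x16a) [0x7f71fb43202a]
5: (OSD::shutdown()+0x195c) [0x5633baf80c4c]
6: (OSD::handle_signal(int)+0x136) [0x5633baf81196]
7: (SignalHandler::entry()+0x3a2) [0x5633bb4eb052]
8: (()+0x76ba) [0x7f71fc48a6ba]
9: (clone()+0x6d) [0x7f71fb50182d]
"""

if __name__ == "__main__":
    crash = parse_crash(SAMPLE)
    print(crash["signal"], crash["thread_name"])
    for sym in symbolized(crash["frames"]):
        print(sym)
```

Filtering out the unsymbolized frames makes it easier to see that the abort is raised from `OSD::shutdown()`, reached through `OSD::handle_signal(int)` on the signal handler thread.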

--- begin dump of recent events ---
-5475> 2017-05-17 15:25:23.551927 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command perfcounters_dump hook 0x5633c66f81b0
-5474> 2017-05-17 15:25:23.551936 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command 1 hook 0x5633c66f81b0
-5473> 2017-05-17 15:25:23.551938 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command perf dump hook 0x5633c66f81b0
-5472> 2017-05-17 15:25:23.551940 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command perfcounters_schema hook 0x5633c66f81b0
-5471> 2017-05-17 15:25:23.551942 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command perf histogram dump hook 0x5633c66f81b0
-5470> 2017-05-17 15:25:23.551943 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command 2 hook 0x5633c66f81b0
-5469> 2017-05-17 15:25:23.551945 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command perf schema hook 0x5633c66f81b0
-5468> 2017-05-17 15:25:23.551947 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command perf histogram schema hook 0x5633c66f81b0
-5467> 2017-05-17 15:25:23.551949 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command perf reset hook 0x5633c66f81b0
-5466> 2017-05-17 15:25:23.551951 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command config show hook 0x5633c66f81b0
-5465> 2017-05-17 15:25:23.551952 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command config set hook 0x5633c66f81b0
-5464> 2017-05-17 15:25:23.551954 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command config get hook 0x5633c66f81b0
-5463> 2017-05-17 15:25:23.551956 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command config diff hook 0x5633c66f81b0
-5462> 2017-05-17 15:25:23.551958 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command log flush hook 0x5633c66f81b0
-5461> 2017-05-17 15:25:23.551959 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command log dump hook 0x5633c66f81b0
-5460> 2017-05-17 15:25:23.551961 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command log reopen hook 0x5633c66f81b0
-5459> 2017-05-17 15:25:23.551968 7f71fdfb0c80 5 asok(0x5633c6748d20) register_command dump_mempools hook 0x5633c6854268
-5458> 2017-05-17 15:25:23.558807 7f71fdfb0c80 0 set uid:gid to 1000:1000 (ceph:ceph)
-5457> 2017-05-17 15:25:23.558820 7f71fdfb0c80 0 ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e), process ceph-osd, pid 26075
-5456> 2017-05-17 15:25:23.597505 7f71fdfb0c80 -1 WARNING: experimental feature 'bluestore' is enabled
Please be aware that this feature is experimental, untested,
unsupported, and may result in data corruption, data loss,
and/or irreparable damage to your cluster. Do not use
feature with important data.

-5455> 2017-05-17 15:25:23.716365 7f71fdfb0c80 0 pidfile_write: ignore empty --pid-file
-5454> 2017-05-17 15:25:23.721924 7f71fdfb0c80 0 load: jerasure load: lrc load: isa
-5453> 2017-05-17 15:25:23.886745 7f71fdfb0c80 0 set rocksdb option compression = kNoCompression
-5452> 2017-05-17 15:25:23.886763 7f71fdfb0c80 0 set rocksdb option max_write_buffer_number = 4
-5451> 2017-05-17 15:25:23.886771 7f71fdfb0c80 0 set rocksdb option min_write_buffer_number_to_merge = 1
-5450> 2017-05-17 15:25:23.886778 7f71fdfb0c80 0 set rocksdb option recycle_log_file_num = 4
-5449> 2017-05-17 15:25:23.886785 7f71fdfb0c80 0 set rocksdb option writable_file_max_buffer_size = 0
-5448> 2017-05-17 15:25:23.886796 7f71fdfb0c80 0 set rocksdb option write_buffer_size = 268435456
-5447> 2017-05-17 15:25:23.886828 7f71fdfb0c80 0 set rocksdb option compression = kNoCompression
-5446> 2017-05-17 15:25:23.886836 7f71fdfb0c80 0 set rocksdb option max_write_buffer_number = 4
-5445> 2017-05-17 15:25:23.886844 7f71fdfb0c80 0 set rocksdb option min_write_buffer_number_to_merge = 1
-5444> 2017-05-17 15:25:23.886851 7f71fdfb0c80 0 set rocksdb option recycle_log_file_num = 4
-5443> 2017-05-17 15:25:23.886857 7f71fdfb0c80 0 set rocksdb option writable_file_max_buffer_size = 0
-5442> 2017-05-17 15:25:23.886864 7f71fdfb0c80 0 set rocksdb option write_buffer_size = 268435456
-5441> 2017-05-17 15:25:31.122667 7f71fdfb0c80 0 <cls> /build/ceph-12.0.2/src/cls/hello/cls_hello.cc:296: loading cls_hello
-5440> 2017-05-17 15:25:31.122838 7f71fdfb0c80 0 <cls> /build/ceph-12.0.2/src/cls/cephfs/cls_cephfs.cc:198: loading cephfs
-5439> 2017-05-17 15:25:31.126087 7f71fdfb0c80 0 _get_class not permitted to load lua
-5438> 2017-05-17 15:25:31.126099 7f71fdfb0c80 0 _get_class not permitted to load kvs
-5437> 2017-05-17 15:25:31.148280 7f71fdfb0c80 0 osd.0 28131 crush map has features 2268850290688, adjusting msgr requires for clients
-5436> 2017-05-17 15:25:31.148310 7f71fdfb0c80 0 osd.0 28131 crush map has features 2543728197632 was 8705, adjusting msgr requires for mons
-5435> 2017-05-17 15:25:31.148316 7f71fdfb0c80 0 osd.0 28131 crush map has features 720578484107493376, adjusting msgr requires for osds
-5434> 2017-05-17 15:25:31.851307 7f71fdfb0c80 0 osd.0 28131 load_pgs
-5433> 2017-05-17 15:25:41.152020 7f71fdfb0c80 0 osd.0 28131 load_pgs opened 267 pgs
-5432> 2017-05-17 15:25:41.152085 7f71fdfb0c80 0 osd.0 28131 using 1 op queue with priority op cut off at 64.
-5431> 2017-05-17 15:25:41.156434 7f71fdfb0c80 -1 osd.0 28131 log_to_monitors {default=true}
-5430> 2017-05-17 15:25:42.272299 7f71fdfb0c80 0 osd.0 28131 done with init, starting boot process
-5429> 2017-05-17 15:25:43.130295 7f71f9116700 0 -- 10.10.133.1:6800/26075 >> 10.10.133.1:6806/26284 conn(0x5633cd34f000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY
-5428> 2017-05-17 15:25:43.130488 7f71f9116700 0 -- 10.10.133.1:6800/26075 >> 10.10.133.1:6819/16301 conn(0x5633cd350800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY
-5427> 2017-05-17 15:52:27.924504 7f71f9116700 0 -- 10.10.133.1:6800/26075 >> 10.10.133.1:6804/32554 conn(0x5633d04eb000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept we reset (peer sent cseq 2), sending RESETSESSION
-5426> 2017-05-17 15:52:31.024392 7f71f9116700 0 -- 10.10.133.1:6800/26075 >> 10.10.133.1:6804/32554 conn(0x5633d04eb000 :6800 s=STATE_OPEN pgs=35 cs=1 l=0).fault with nothing to send, going to standby
-5425> 2017-05-17 15:53:52.318360 7f71f9116700 0 -- 10.10.133.1:6800/26075 >> 10.10.133.1:6804/8661 conn(0x5633df934000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_WAIT_CONNECT_REPLY
-5424> 2017-05-17 16:18:32.162844 7f71dff8f700 -1 received signal: Terminated from PID: 1 task name: /sbin/init UID: 0
-5423> 2017-05-17 16:18:32.162871 7f71dff8f700 -1 osd.0 28215 * Got signal Terminated *
-5422> 2017-05-17 16:18:32.162877 7f71dff8f700 0 osd.0 28215 prepare_to_stop telling mon we are shutting down
-5421> 2017-05-17 16:18:32.899751 7f71f0841700 0 osd.0 28215 got_stop_ack starting shutdown
-5420> 2017-05-17 16:18:32.899829 7f71dff8f700 0 osd.0 28215 prepare_to_stop starting shutdown
-5419> 2017-05-17 16:18:32.899840 7f71dff8f700 -1 osd.0 28215 shutdown
-5418> 2017-05-17 16:18:32.900798 7f71dff8f700 20 osd.0 28215 kicking pg 7.128s2
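To follow the shutdown sequence in the recent-events dump above, it can help to isolate the entries logged by the thread that later aborted (7f71dff8f700). The sketch below is a minimal Python illustration; `parse_events` and `events_for_thread` are hypothetical helper names, and treating the column after the thread id as a log level is an assumption about the dump layout. The leading `-` on the index is optional in the pattern, since some pasted lines have lost it.

```python
import re

# Assumed layout: "-<index>> <date> <time> <thread> <level> <message>"
EVENT_RE = re.compile(
    r"^-?(\d+)> (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"([0-9a-f]+)\s+(-?\d+) (.*)$"
)


def parse_events(text):
    """Parse 'recent events' lines into dicts; lines that do not match are skipped."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            events.append({
                "index": int(m.group(1)),   # distance from the newest event
                "time": m.group(2),
                "thread": m.group(3),
                "level": int(m.group(4)),   # assumed to be the log level
                "message": m.group(5),
            })
    return events


def events_for_thread(events, thread):
    """Return only the events logged by the given thread id."""
    return [e for e in events if e["thread"] == thread]


# A few lines copied from the dump in this report.
SAMPLE_EVENTS = """\
5455> 2017-05-17 15:25:23.716365 7f71fdfb0c80  0 pidfile_write: ignore empty --pid-file
-5424> 2017-05-17 16:18:32.162844 7f71dff8f700 -1 received signal: Terminated from PID: 1 task name: /sbin/init UID: 0
-5421> 2017-05-17 16:18:32.899751 7f71f0841700 0 osd.0 28215 got_stop_ack starting shutdown
-5419> 2017-05-17 16:18:32.899840 7f71dff8f700 -1 osd.0 28215 shutdown
"""

if __name__ == "__main__":
    for e in events_for_thread(parse_events(SAMPLE_EVENTS), "7f71dff8f700"):
        print(e["index"], e["time"], e["message"])
```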
Actions #1

Updated by xw zhang almost 7 years ago

0> 2017-03-28 20:41:17.910308 7f6bc8225700 -1 ** Caught signal (Aborted) *
in thread 7f6bc8225700 thread_name:safe_timer

ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
1: (()+0x923d8a) [0x7f6bdf8cbd8a]
2: (()+0xf100) [0x7f6bdc9f5100]
3: (gsignal()+0x37) [0x7f6bdb8135f7]
4: (abort()+0x148) [0x7f6bdb814ce8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f6bdc1179d5]
6: (()+0x5e946) [0x7f6bdc115946]
7: (()+0x5e973) [0x7f6bdc115973]
8: (()+0x5eb93) [0x7f6bdc115b93]
9: (std::__throw_out_of_range(char const*)+0x77) [0x7f6bdc16aa17]
10: (()+0xaedd30) [0x7f6bdfa95d30]
11: (PerfCountersCollection::with_counters(std::function<void (std::map<std::string, PerfCounters::perf_counter_data_any_d*, std::less<std::string>, std::allocator<std::pair<std::string const, PerfCounters::perf_counter_data_any_d*> > > const&)>) const+0x2d) [0x7f6bdfa328ad]
12: (MgrClient::send_report()+0x3d1) [0x7f6bdfa94421]
13: (FunctionContext::finish(int)+0x2a) [0x7f6bdf5a701a]
14: (Context::complete(int)+0x9) [0x7f6bdf4268b9]
15: (SafeTimer::timer_thread()+0x104) [0x7f6bdfa68fa4]
16: (SafeTimerThread::entry()+0xd) [0x7f6bdfa6a9dd]
17: (()+0x7dc5) [0x7f6bdc9eddc5]
18: (clone()+0x6d) [0x7f6bdb8d428d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #2

Updated by Sage Weil almost 3 years ago

  • Status changed from New to Closed