Bug #13576

The OSD was killed by an unknown thread or software.

Added by ceph zte over 8 years ago. Updated over 8 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/giant
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My Ceph version is 0.87, and I installed OpenStack and Ceph on the same machine.

When I restart Ceph with the service ceph restart command, one OSD, osd.13, fails to come up, while the other OSDs start properly. I do not know what kills osd.13 while it is starting; I did not kill it.

The system message log is below:

Oct 21 09:37:15 ceph242 systemd[1]: Starting /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Oct 21 09:37:15 ceph242 systemd[1]: Started /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Oct 21 09:37:15 ceph242 bash[8259]: libust[8272/8272]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Oct 21 09:37:15 ceph242 bash[8259]: starting osd.13 at :/0 osd_data /var/lib/ceph/osd/ceph-13 /var/lib/ceph/osd/ceph-13/journal   <-- osd.13 starts here
Oct 21 09:37:15 ceph242 systemd[1]: Starting /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 14 --pid-file /var/run/ceph/osd.14.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Oct 21 09:37:15 ceph242 systemd[1]: Started /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 14 --pid-file /var/run/ceph/osd.14.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Oct 21 09:37:15 ceph242 bash[8703]: libust[8715/8715]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Oct 21 09:37:15 ceph242 bash[8703]: starting osd.14 at :/0 osd_data /var/lib/ceph/osd/ceph-14 /var/lib/ceph/osd/ceph-14/journal
Oct 21 09:37:16 ceph242 systemd[1]: Starting /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 18 --pid-file /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Oct 21 09:37:16 ceph242 systemd[1]: Started /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 18 --pid-file /var/run/ceph/osd.18.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Oct 21 09:37:16 ceph242 bash[9114]: libust[9126/9126]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Oct 21 09:37:16 ceph242 bash[9114]: starting osd.18 at :/0 osd_data /var/lib/ceph/osd/ceph-18 /var/lib/ceph/osd/ceph-18/journal
Oct 21 09:37:16 ceph242 systemd[1]: Starting /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 15 --pid-file /var/run/ceph/osd.15.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Oct 21 09:37:16 ceph242 systemd[1]: Started /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 15 --pid-file /var/run/ceph/osd.15.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Oct 21 09:37:16 ceph242 bash[9554]: libust[9565/9565]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Oct 21 09:37:16 ceph242 bash[9554]: starting osd.15 at :/0 osd_data /var/lib/ceph/osd/ceph-15 /var/lib/ceph/osd/ceph-15/journal
Oct 21 09:37:17 ceph242 systemd[1]: Starting /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 16 --pid-file /var/run/ceph/osd.16.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Oct 21 09:37:17 ceph242 systemd[1]: Started /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 16 --pid-file /var/run/ceph/osd.16.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Oct 21 09:37:17 ceph242 bash[9967]: libust[9980/9980]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Oct 21 09:37:17 ceph242 bash[9967]: starting osd.16 at :/0 osd_data /var/lib/ceph/osd/ceph-16 /var/lib/ceph/osd/ceph-16/journal
Oct 21 09:37:17 ceph242 systemd[1]: Starting /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 12 --pid-file /var/run/ceph/osd.12.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Oct 21 09:37:17 ceph242 systemd[1]: Started /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 12 --pid-file /var/run/ceph/osd.12.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Oct 21 09:37:17 ceph242 bash[10547]: libust[10560/10560]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Oct 21 09:37:17 ceph242 bash[10547]: starting osd.12 at :/0 osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal
Oct 21 09:37:18 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:18 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:18 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:18 ceph242 salt-minion[28870]: help
Oct 21 09:37:18 ceph242 salt-minion[28870]: config show
Oct 21 09:37:18 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:18 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:18 ceph242 salt-minion[28870]: perfcounters_schema
Oct 21 09:37:18 ceph242 salt-minion[28870]: 2
Oct 21 09:37:18 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:18 ceph242 salt-minion[28870]: 0
Oct 21 09:37:18 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:18 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:18 ceph242 salt-minion[28870]: version
Oct 21 09:37:18 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:18 ceph242 salt-minion[28870]: help
Oct 21 09:37:18 ceph242 salt-minion[28870]: config show
Oct 21 09:37:18 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:18 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:18 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:18 ceph242 salt-minion[28870]: 2
Oct 21 09:37:18 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:18 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:18 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:18 ceph242 salt-minion[28870]: version
Oct 21 09:37:18 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:18 ceph242 salt-minion[28870]: help
Oct 21 09:37:18 ceph242 salt-minion[28870]: config show
Oct 21 09:37:18 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:18 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:18 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:18 ceph242 salt-minion[28870]: 2
Oct 21 09:37:18 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:18 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:18 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:18 ceph242 salt-minion[28870]: version
Oct 21 09:37:18 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:18 ceph242 salt-minion[28870]: help
Oct 21 09:37:18 ceph242 salt-minion[28870]: config show
Oct 21 09:37:18 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:18 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:18 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:18 ceph242 salt-minion[28870]: 2
Oct 21 09:37:18 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:18 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:18 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:18 ceph242 salt-minion[28870]: version
Oct 21 09:37:18 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:18 ceph242 salt-minion[28870]: help
Oct 21 09:37:18 ceph242 salt-minion[28870]: config show
Oct 21 09:37:18 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:18 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:18 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:18 ceph242 salt-minion[28870]: 2
Oct 21 09:37:18 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:18 ceph242 salt-minion[28870]: {'jid': '20151021093718293023', 'return': None, 'success': True, 'schedule': 'ceph.getrbdmap', 'pid': 10852, 'fun': 'ceph.getrbdmap', 'id': 'ceph242'}
Oct 21 09:37:18 ceph242 systemd[1]: Starting /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 28 --pid-file /var/run/ceph/osd.28.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Oct 21 09:37:18 ceph242 systemd[1]: Started /usr/bin/bash -c ulimit -n 32768; ulimit -c unlimited; /usr/bin/ceph-run /usr/bin/ceph-osd -i 28 --pid-file /var/run/ceph/osd.28.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Oct 21 09:37:18 ceph242 bash[11340]: libust[11353/11353]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
Oct 21 09:37:18 ceph242 bash[11340]: starting osd.28 at :/0 osd_data /var/lib/ceph/osd/ceph-28 /var/lib/ceph/osd/ceph-28/journal
Oct 21 09:37:19 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:19 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:19 ceph242 salt-minion[28870]: version
Oct 21 09:37:19 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:19 ceph242 salt-minion[28870]: help
Oct 21 09:37:19 ceph242 salt-minion[28870]: config show
Oct 21 09:37:19 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:19 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:19 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:19 ceph242 salt-minion[28870]: 2
Oct 21 09:37:19 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:19 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:19 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:19 ceph242 salt-minion[28870]: version
Oct 21 09:37:19 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:19 ceph242 salt-minion[28870]: help
Oct 21 09:37:19 ceph242 salt-minion[28870]: config show
Oct 21 09:37:19 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:19 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:19 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:19 ceph242 salt-minion[28870]: 2
Oct 21 09:37:19 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:19 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:19 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:19 ceph242 salt-minion[28870]: version
Oct 21 09:37:19 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:19 ceph242 salt-minion[28870]: help
Oct 21 09:37:19 ceph242 salt-minion[28870]: config show
Oct 21 09:37:19 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:19 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:19 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:19 ceph242 salt-minion[28870]: 2
Oct 21 09:37:19 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:19 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:19 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:19 ceph242 salt-minion[28870]: version
Oct 21 09:37:19 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:19 ceph242 salt-minion[28870]: help
Oct 21 09:37:19 ceph242 salt-minion[28870]: config show
Oct 21 09:37:19 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:19 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:19 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:19 ceph242 salt-minion[28870]: 2
Oct 21 09:37:19 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:19 ceph242 salt-minion[28870]: no valid command found; 10 closest matches:
Oct 21 09:37:19 ceph242 salt-minion[28870]: config set <var> <val> [<val>...]
Oct 21 09:37:19 ceph242 salt-minion[28870]: version
Oct 21 09:37:19 ceph242 salt-minion[28870]: git_version
Oct 21 09:37:19 ceph242 salt-minion[28870]: help
Oct 21 09:37:19 ceph242 salt-minion[28870]: config show
Oct 21 09:37:19 ceph242 salt-minion[28870]: get_command_descriptions
Oct 21 09:37:19 ceph242 salt-minion[28870]: config get <var>
Oct 21 09:37:19 ceph242 salt-minion[28870]: perfcounters_dump
Oct 21 09:37:19 ceph242 salt-minion[28870]: 2
Oct 21 09:37:19 ceph242 salt-minion[28870]: config diff
Oct 21 09:37:19 ceph242 bash[8259]: /usr/bin/ceph-run: line 62:  8272 Terminated              "$@"   <-- here osd.13 was terminated by another process
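
The wrapper PID in the bash[...] field is what ties the "Terminated" line back to a specific OSD. A minimal sketch of that correlation follows; the helper name terminated_osds and both regular expressions are illustrative assumptions, and the sample lines are copied from the excerpt above with the reporter's inline notes removed.

```python
import re

# Three lines copied from the syslog excerpt above (inline notes removed).
LOG = """\
Oct 21 09:37:15 ceph242 bash[8259]: starting osd.13 at :/0 osd_data /var/lib/ceph/osd/ceph-13 /var/lib/ceph/osd/ceph-13/journal
Oct 21 09:37:15 ceph242 bash[8703]: starting osd.14 at :/0 osd_data /var/lib/ceph/osd/ceph-14 /var/lib/ceph/osd/ceph-14/journal
Oct 21 09:37:19 ceph242 bash[8259]: /usr/bin/ceph-run: line 62:  8272 Terminated              "$@"
"""

# "bash[<wrapper pid>]: starting osd.<id> ..." marks which wrapper launched which OSD.
START = re.compile(r"bash\[(\d+)\]: starting (osd\.\d+) ")
# "bash[<wrapper pid>]: ... <child pid> Terminated" is the shell reporting a SIGTERM'd child.
TERM = re.compile(r"^(\w+ +\d+ [\d:]+) \S+ bash\[(\d+)\]: .*?(\d+) Terminated")


def terminated_osds(log_text):
    """Return (timestamp, osd_name, child_pid) for each wrapper that reported 'Terminated'."""
    wrapper_to_osd = {}
    hits = []
    for line in log_text.splitlines():
        started = START.search(line)
        if started:
            wrapper_to_osd[started.group(1)] = started.group(2)
            continue
        killed = TERM.search(line)
        if killed:
            timestamp, wrapper_pid, child_pid = killed.groups()
            osd = wrapper_to_osd.get(wrapper_pid, "unknown")
            hits.append((timestamp, osd, int(child_pid)))
    return hits


print(terminated_osds(LOG))  # [('Oct 21 09:37:19', 'osd.13', 8272)]
```

This only shows which OSD received the signal and when; the syslog excerpt carries no information about the sender.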

The ceph osd.13 log is:
2015-10-21 09:37:15.279261 7f6156bbd900  0 filestore(/var/lib/ceph/osd/ceph-13) backend xfs (magic 0x58465342)
2015-10-21 09:37:15.317714 7f6156bbd900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-13) detect_features: FIEMAP ioctl is supported and appears to work
2015-10-21 09:37:15.317739 7f6156bbd900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-13) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-10-21 09:37:15.351518 7f6156bbd900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-13) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-10-21 09:37:15.351598 7f6156bbd900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-13) detect_feature: extsize is disabled by conf
2015-10-21 09:37:15.402264 7f6156bbd900  0 filestore(/var/lib/ceph/osd/ceph-13) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-10-21 09:37:15.408128 7f6156bbd900  1 journal _open /var/lib/ceph/osd/ceph-13/journal fd 20: 10736386048 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-10-21 09:37:15.417227 7f6156bbd900  1 journal _open /var/lib/ceph/osd/ceph-13/journal fd 20: 10736386048 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-10-21 09:37:15.418156 7f6156bbd900  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-10-21 09:37:15.423979 7f6156bbd900  0 osd.13 474599 crush map has features 1107558400, adjusting msgr requires for clients
2015-10-21 09:37:15.423989 7f6156bbd900  0 osd.13 474599 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
2015-10-21 09:37:15.423996 7f6156bbd900  0 osd.13 474599 crush map has features 1107558400, adjusting msgr requires for osds
2015-10-21 09:37:15.424012 7f6156bbd900  0 osd.13 474599 load_pgs
2015-10-21 09:37:17.421581 7f6156bbd900  0 osd.13 474599 load_pgs opened 614 pgs
2015-10-21 09:37:17.423115 7f6156bbd900 -1 osd.13 474599 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is  but only the following values are allowed: idle, be or rt
2015-10-21 09:37:17.426471 7f614375b700  0 osd.13 474599 ignoring osdmap until we have initialized
2015-10-21 09:37:17.426531 7f614375b700  0 osd.13 474599 ignoring osdmap until we have initialized
2015-10-21 09:37:17.617044 7f6156bbd900  0 osd.13 474599 done with init, starting boot process
2015-10-21 09:37:18.826390 7f612b5f2700 -1 osd.13 474602 *** Got signal Terminated ***   <-- here osd.13 receives the termination signal
2015-10-21 09:37:18.826445 7f612b5f2700  0 osd.13 474602 prepare_to_stop telling mon we are shutting down

History

#1 Updated by ceph zte over 8 years ago

I now know the reason. I installed Ceph and OpenStack on the same machine.

OpenStack has a guard thread, which uses systemctl to judge whether Ceph is OK.

When the systemctl status of ceph is not active, the OpenStack guard thread restarts Ceph.
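
The guard's real code is not shown in this report. As a rough illustration of the check described above, a watchdog of this kind might look like the sketch below. The function names unit_state and should_restart and the unit name "ceph" are hypothetical; only the systemctl is-active command itself is a real systemd interface.

```python
import subprocess


def unit_state(unit):
    """Return the state word printed by `systemctl is-active <unit>`, e.g. 'active' or 'inactive'."""
    result = subprocess.run(
        ["systemctl", "is-active", unit], capture_output=True, text=True
    )
    return result.stdout.strip()


def should_restart(state):
    """Mirror the rule described in the comment: restart whenever the unit is not 'active'."""
    return state != "active"


def guard_once(unit="ceph"):
    """One pass of the hypothetical guard: check the unit, restart it if it is not active."""
    if should_restart(unit_state(unit)):
        subprocess.run(["systemctl", "restart", unit], check=False)
        return True
    return False
```

Under this rule any state other than active triggers a restart, so the guard can stop and restart daemons on its own without leaving a trace in the OSD log beyond the "Got signal Terminated" line.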

But I think that when Ceph is killed or stopped by another user, we currently get too little information.

As in the line below, we cannot tell who killed Ceph:
2015-10-21 09:37:18.826390 7f612b5f2700 -1 osd.13 474602 Got signal Terminated

We could modify the Ceph SignalHandler code to record who killed Ceph or stopped the Ceph service.

Is that OK?
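
For reference, on Linux the kernel passes the sender's PID and UID along with a signal sent by kill(2), in the siginfo fields si_pid and si_uid. The sketch below illustrates that facility in Python, not Ceph's C++ SignalHandler; the function names are hypothetical, and the /proc lookup is a best-effort assumption that fails if the sender has already exited.

```python
import os
import signal


def wait_for_sigterm_sender():
    """Block until a SIGTERM is pending, then return (sender_pid, sender_uid) from its siginfo."""
    info = signal.sigwaitinfo({signal.SIGTERM})
    return info.si_pid, info.si_uid


def describe_pid(pid):
    """Best-effort process name lookup via /proc (Linux only); None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/comm") as handle:
            return handle.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # SIGTERM must be blocked first so it stays pending instead of killing the process.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    os.kill(os.getpid(), signal.SIGTERM)  # stand-in for an external "kill <pid>"
    pid, uid = wait_for_sigterm_sender()
    print(f"SIGTERM sent by pid {pid} ({describe_pid(pid)}), uid {uid}")
```

One caveat: if the stop request goes through systemctl, the signal is most likely delivered by systemd itself, so the recorded sender would be systemd rather than the guard that asked for the restart.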

#2 Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)

#3 Updated by Samuel Just over 8 years ago

  • Status changed from New to Won't Fix

Doesn't seem like the osd can do much about this.

#4 Updated by ceph zte over 8 years ago

I can try to do it, so that the OSD knows who killed it.
