Project

General

Profile

Actions

Bug #10085

closed

dirty exit ("Illegal instruction") on pthread_rwlock_unlock()

Added by Denis kaganovich over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
giant, firefly, dumpling
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After upgrade to glibc 2.20, "ceph" & "rbd" commands exiting with "Illegal instruction" exit message and !=0 exit code (but job OK). Same firefly & giant. Also "rbd" (only) show next:

  • Caught signal (Illegal instruction)
    in thread 7f21f64d0800
    ceph version 0.80.7-124-g0804dee (0804deeab293e09123d1b58825051ccc4dddbc0e)
    1: rbd() [0x42f27f]
    2: (()+0x33b20) [0x7f21f3f3fb20]
    3: (pthread_rwlock_unlock()+0x12) [0x7f21f4e2c402]
    4: (librados::RadosClient::~RadosClient()+0x125) [0x7f21f5160195]
    5: (librados::RadosClient::~RadosClient()+0x9) [0x7f21f5160319]
    6: (librados::Rados::shutdown()+0x36) [0x7f21f5133136]
    7: (main()+0x392) [0x40eac2]
    8: (__libc_start_main()+0xf0) [0x7f21f3f2bfa0]
    9: rbd() [0x4159ce]
    2014-11-12 21:42:23.254063 7f21f64d0800 -1
    Caught signal (Illegal instruction) *
    in thread 7f21f64d0800

    ceph version 0.80.7-124-g0804dee (0804deeab293e09123d1b58825051ccc4dddbc0e)
    1: rbd() [0x42f27f]
    2: (()+0x33b20) [0x7f21f3f3fb20]
    3: (pthread_rwlock_unlock()+0x12) [0x7f21f4e2c402]
    4: (librados::RadosClient::~RadosClient()+0x125) [0x7f21f5160195]
    5: (librados::RadosClient::~RadosClient()+0x9) [0x7f21f5160319]
    6: (librados::Rados::shutdown()+0x36) [0x7f21f5133136]
    7: (main()+0x392) [0x40eac2]
    8: (__libc_start_main()+0xf0) [0x7f21f3f2bfa0]
    9: rbd() [0x4159ce]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
80> 2014-11-12 21:42:23.236671 7f21f64d0800 5 asok(0x121a800) register_command perfcounters_dump hook 0x121a7e0
-79> 2014-11-12 21:42:23.236698 7f21f64d0800 5 asok(0x121a800) register_command 1 hook 0x121a7e0
-78> 2014-11-12 21:42:23.236706 7f21f64d0800 5 asok(0x121a800) register_command perf dump hook 0x121a7e0
-77> 2014-11-12 21:42:23.236714 7f21f64d0800 5 asok(0x121a800) register_command perfcounters_schema hook 0x121a7e0
-76> 2014-11-12 21:42:23.236726 7f21f64d0800 5 asok(0x121a800) register_command 2 hook 0x121a7e0
-75> 2014-11-12 21:42:23.236731 7f21f64d0800 5 asok(0x121a800) register_command perf schema hook 0x121a7e0
-74> 2014-11-12 21:42:23.236737 7f21f64d0800 5 asok(0x121a800) register_command config show hook 0x121a7e0
-73> 2014-11-12 21:42:23.236742 7f21f64d0800 5 asok(0x121a800) register_command config set hook 0x121a7e0
-72> 2014-11-12 21:42:23.236747 7f21f64d0800 5 asok(0x121a800) register_command config get hook 0x121a7e0
-71> 2014-11-12 21:42:23.236752 7f21f64d0800 5 asok(0x121a800) register_command log flush hook 0x121a7e0
-70> 2014-11-12 21:42:23.236757 7f21f64d0800 5 asok(0x121a800) register_command log dump hook 0x121a7e0
-69> 2014-11-12 21:42:23.236764 7f21f64d0800 5 asok(0x121a800) register_command log reopen hook 0x121a7e0
-68> 2014-11-12 21:42:23.239829 7f21f64d0800 10 monclient(hunting): build_initial_monmap
-67> 2014-11-12 21:42:23.239937 7f21f64d0800 1 librados: starting msgr at :/0
-66> 2014-11-12 21:42:23.239951 7f21f64d0800 1 librados: starting objecter
-65> 2014-11-12 21:42:23.239987 7f21f64d0800 1 -
:/0 messenger.start
64> 2014-11-12 21:42:23.240005 7f21f64d0800 1 librados: setting wanted keys
-63> 2014-11-12 21:42:23.240009 7f21f64d0800 1 librados: calling monclient init
-62> 2014-11-12 21:42:23.240011 7f21f64d0800 10 monclient(hunting): init
-61> 2014-11-12 21:42:23.240016 7f21f64d0800 5 adding auth protocol: cephx
-60> 2014-11-12 21:42:23.240020 7f21f64d0800 10 monclient(hunting): auth_supported 2 method cephx
-59> 2014-11-12 21:42:23.240204 7f21f64d0800 2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.keyring
-58> 2014-11-12 21:42:23.240278 7f21f64d0800 10 monclient(hunting): _reopen_session rank -1 name
-57> 2014-11-12 21:42:23.240336 7f21f64d0800 10 monclient(hunting): picked mon.a con 0x12275e0 addr 10.227.227.101:6789/0
-56> 2014-11-12 21:42:23.240365 7f21f64d0800 10 monclient(hunting): _send_mon_message to mon.a at 10.227.227.101:6789/0
-55> 2014-11-12 21:42:23.240372 7f21f64d0800 1 -
:/1071357 --> 10.227.227.101:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x1227a40 con 0x12275e0
54> 2014-11-12 21:42:23.240388 7f21f64d0800 10 monclient(hunting): renew_subs
-53> 2014-11-12 21:42:23.240400 7f21f64d0800 10 monclient(hunting): authenticate will time out at 2014-11-12 21:47:23.240399
-52> 2014-11-12 21:42:23.240985 7f21f64c8700 1 -
10.227.227.105:0/1071357 learned my addr 10.227.227.105:0/1071357
51> 2014-11-12 21:42:23.241821 7f21f1d56700 1 - 10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 1 ==== mon_map v1 ==== 473+0+0 (505717684 0 0) 0x7f21e0000bd0 con 0x12275e0
-50> 2014-11-12 21:42:23.241865 7f21f1d56700 10 monclient(hunting): handle_monmap mon_map v1
-49> 2014-11-12 21:42:23.241883 7f21f1d56700 10 monclient(hunting): got monmap 1, mon.a is now rank 0
-48> 2014-11-12 21:42:23.241889 7f21f1d56700 10 monclient(hunting): dump:
epoch 1
fsid 4fc73849-f913-4689-b6a6-efcefccae8d1
last_changed 2013-05-28 18:43:12.325687
created 2013-05-28 18:43:12.325687
0: 10.227.227.101:6789/0 mon.a
1: 10.227.227.103:6789/0 mon.b
2: 10.227.227.104:6789/0 mon.c

47> 2014-11-12 21:42:23.241931 7f21f1d56700  1 - 10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3914553092 0 0) 0x7f21e0001080 con 0x12275e0
46> 2014-11-12 21:42:23.241985 7f21f1d56700 10 monclient(hunting): my global_id is 1917150
-45> 2014-11-12 21:42:23.242145 7f21f1d56700 10 monclient(hunting): _send_mon_message to mon.a at 10.227.227.101:6789/0
-44> 2014-11-12 21:42:23.242156 7f21f1d56700 1 -
10.227.227.105:0/1071357 --> 10.227.227.101:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7f21e40017c0 con 0x12275e0
43> 2014-11-12 21:42:23.242881 7f21f1d56700 1 - 10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (2469527312 0 0) 0x7f21e0001080 con 0x12275e0
42> 2014-11-12 21:42:23.242968 7f21f1d56700 10 monclient(hunting): _send_mon_message to mon.a at 10.227.227.101:6789/0
-41> 2014-11-12 21:42:23.242977 7f21f1d56700 1 -
10.227.227.105:0/1071357 --> 10.227.227.101:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7f21e4001e90 con 0x12275e0
40> 2014-11-12 21:42:23.243659 7f21f1d56700 1 - 10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 393+0+0 (1710929821 0 0) 0x7f21e00008c0 con 0x12275e0
39> 2014-11-12 21:42:23.243717 7f21f1d56700 1 monclient(hunting): found mon.a
-38> 2014-11-12 21:42:23.243722 7f21f1d56700 10 monclient: _send_mon_message to mon.a at 10.227.227.101:6789/0
-37> 2014-11-12 21:42:23.243728 7f21f1d56700 1 -
10.227.227.105:0/1071357 --> 10.227.227.101:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x1227f00 con 0x12275e0
36> 2014-11-12 21:42:23.243779 7f21f64d0800 5 monclient: authenticate success, global_id 1917150
-35> 2014-11-12 21:42:23.243821 7f21f64d0800 5 asok(0x121a800) register_command objecter_requests hook 0x1228110
-34> 2014-11-12 21:42:23.243879 7f21f64d0800 10 monclient: renew_subs
-33> 2014-11-12 21:42:23.243885 7f21f64d0800 10 monclient: _send_mon_message to mon.a at 10.227.227.101:6789/0
-32> 2014-11-12 21:42:23.243895 7f21f64d0800 1 -
10.227.227.105:0/1071357 --> 10.227.227.101:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x1227a40 con 0x12275e0
31> 2014-11-12 21:42:23.243910 7f21f64d0800 10 monclient: renew_subs
-30> 2014-11-12 21:42:23.243913 7f21f64d0800 10 monclient: _send_mon_message to mon.a at 10.227.227.101:6789/0
-29> 2014-11-12 21:42:23.243918 7f21f64d0800 1 -
10.227.227.105:0/1071357 --> 10.227.227.101:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x1229160 con 0x12275e0
28> 2014-11-12 21:42:23.243946 7f21f64d0800 1 librados: init done
-27> 2014-11-12 21:42:23.244313 7f21f1d56700 1 -
10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 5 ==== mon_map v1 ==== 473+0+0 (505717684 0 0) 0x7f21e00012f0 con 0x12275e0
-26> 2014-11-12 21:42:23.244330 7f21f1d56700 10 monclient: handle_monmap mon_map v1
-25> 2014-11-12 21:42:23.244345 7f21f1d56700 10 monclient: got monmap 1, mon.a is now rank 0
-24> 2014-11-12 21:42:23.244349 7f21f1d56700 10 monclient: dump:
epoch 1
fsid 4fc73849-f913-4689-b6a6-efcefccae8d1
last_changed 2013-05-28 18:43:12.325687
created 2013-05-28 18:43:12.325687
0: 10.227.227.101:6789/0 mon.a
1: 10.227.227.103:6789/0 mon.b
2: 10.227.227.104:6789/0 mon.c
23> 2014-11-12 21:42:23.244382 7f21f1d56700  1 - 10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2160789602 0 0) 0x7f21e0001590 con 0x12275e0
22> 2014-11-12 21:42:23.244393 7f21f1d56700 10 monclient: handle_subscribe_ack sent 2014-11-12 21:42:23.240391 renew after 2014-11-12 21:44:53.240391
-21> 2014-11-12 21:42:23.247283 7f21f1d56700 1 -
10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 7 ==== osd_map(79561..79561 src has 78135..79561) v3 ==== 94482+0+0 (4079190998 0 0) 0x7f21e00008c0 con 0x12275e0
20> 2014-11-12 21:42:23.248914 7f21f1d56700 1 - 10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2160789602 0 0) 0x7f21e00184e0 con 0x12275e0
19> 2014-11-12 21:42:23.248927 7f21f1d56700 10 monclient: handle_subscribe_ack sent 0.000000, ignoring
-18> 2014-11-12 21:42:23.248932 7f21f1d56700 1 -
10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 9 ==== osd_map(79561..79561 src has 78135..79561) v3 ==== 94482+0+0 (4079190998 0 0) 0x7f21e002f920 con 0x12275e0
17> 2014-11-12 21:42:23.248946 7f21f1d56700 1 - 10.227.227.105:0/1071357 <== mon.0 10.227.227.101:6789/0 10 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2160789602 0 0) 0x7f21e002fdc0 con 0x12275e0
16> 2014-11-12 21:42:23.248954 7f21f1d56700 10 monclient: handle_subscribe_ack sent 0.000000, ignoring
-15> 2014-11-12 21:42:23.249032 7f21f64d0800 1 -
10.227.227.105:0/1071357 --> 10.227.227.101:6810/32749 -- osd_op(client.1917150.0:1 rbd_directory [read 0~0] 2.30a98c1c ack+read e79561) v4 -- ?+0 0x122a790 con 0x1229d70
14> 2014-11-12 21:42:23.251299 7f21f1d56700 1 - 10.227.227.105:0/1071357 <== osd.0 10.227.227.101:6810/32749 1 ==== osd_op_reply(1 rbd_directory [read 0~109] v0'0 uv282471 ondisk = 0) v6 ==== 180+0+109 (1476285171 0 3731283201) 0x7f21dc000c70 con 0x1229d70
13> 2014-11-12 21:42:23.251418 7f21f64d0800 1 - 10.227.227.105:0/1071357 --> 10.227.227.101:6810/32749 -- osd_op(client.1917150.0:2 rbd_directory [call rbd.dir_list] 2.30a98c1c ack+read e79561) v4 -- ?+0 0x122a790 con 0x1229d70
12> 2014-11-12 21:42:23.253222 7f21f1d56700 1 - 10.227.227.105:0/1071357 <== osd.0 10.227.227.101:6810/32749 2 ==== osd_op_reply(2 rbd_directory [call] v0'0 uv282471 ondisk = 0) v6 ==== 180+0+823 (3757076840 0 2169796863) 0x7f21dc000c70 con 0x1229d70
11> 2014-11-12 21:42:23.253436 7f21f64d0800 1 - 10.227.227.105:0/1071357 mark_down 0x1229d70 -- 0x1229ae0
10> 2014-11-12 21:42:23.253505 7f21f64d0800 10 monclient: shutdownshutdown
-9> 2014-11-12 21:42:23.253483 7f21eeb4e700 2 -
10.227.227.105:0/1071357 >> 10.227.227.101:6810/32749 pipe(0x1229ae0 sd=4 :36480 s=4 pgs=54 cs=1 l=1 c=0x1229d70).reader couldn't read tag, (0) Success
8> 2014-11-12 21:42:23.253512 7f21eeb4e700 2 - 10.227.227.105:0/1071357 >> 10.227.227.101:6810/32749 pipe(0x1229ae0 sd=4 :36480 s=4 pgs=54 cs=1 l=1 c=0x1229d70).fault (0) Success
7> 2014-11-12 21:42:23.253535 7f21f64d0800 1 - 10.227.227.105:0/1071357 mark_down 0x12275e0 -- 0x1227350
6> 2014-11-12 21:42:23.253562 7f21f64d0800 5 asok(0x121a800) unregister_command objecter_requests
-5> 2014-11-12 21:42:23.253575 7f21f64d0800 1 -
10.227.227.105:0/1071357 mark_down_all
4> 2014-11-12 21:42:23.253559 7f21efd52700 2 - 10.227.227.105:0/1071357 >> 10.227.227.101:6789/0 pipe(0x1227350 sd=3 :55320 s=4 pgs=12684 cs=1 l=1 c=0x12275e0).reader couldn't read tag, (0) Success
3> 2014-11-12 21:42:23.253587 7f21efd52700 2 - 10.227.227.105:0/1071357 >> 10.227.227.101:6789/0 pipe(0x1227350 sd=3 :55320 s=4 pgs=12684 cs=1 l=1 c=0x12275e0).fault (0) Success
2> 2014-11-12 21:42:23.253659 7f21f64d0800 1 - 10.227.227.105:0/1071357 shutdown complete.
-1> 2014-11-12 21:42:23.253668 7f21f64d0800 1 librados: shutdown
0> 2014-11-12 21:42:23.254063 7f21f64d0800 -1 ** Caught signal (Illegal instruction) *
in thread 7f21f64d0800
ceph version 0.80.7-124-g0804dee (0804deeab293e09123d1b58825051ccc4dddbc0e)
1: rbd() [0x42f27f]
2: (()+0x33b20) [0x7f21f3f3fb20]
3: (pthread_rwlock_unlock()+0x12) [0x7f21f4e2c402]
4: (librados::RadosClient::~RadosClient()+0x125) [0x7f21f5160195]
5: (librados::RadosClient::~RadosClient()+0x9) [0x7f21f5160319]
6: (librados::Rados::shutdown()+0x36) [0x7f21f5133136]
7: (main()+0x392) [0x40eac2]
8: (__libc_start_main()+0xf0) [0x7f21f3f2bfa0]
9: rbd() [0x4159ce]
NOTE: a copy of the executable, or `objdump -rdS &lt;executable&gt;` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
2/-2 (syslog threshold)
99/99 (stderr threshold)
max_recent 500
max_new 1000
log_file
--
end dump of recent events ---
reraise_fatal: default handler for signal 4 didn't terminate the process?

Actions #1

Updated by Samuel Just over 9 years ago

  • Status changed from New to Rejected

whatever the platform is, you'll have to build your own packages, I guess.

Actions #2

Updated by Denis kaganovich over 9 years ago

"Own packages"? It is Gentoo.

Actions #3

Updated by Denis kaganovich over 9 years ago

PS Bug is filled to (https://bugs.gentoo.org/show_bug.cgi?id=529076 ), but I think there was near "vanilla" case.

Actions #5

Updated by Greg Farnum over 9 years ago

  • Subject changed from glibc 2.20: "ceph" & "rbd" dirty exit ("Illegal instruction") to dirty exit ("Illegal instruction") on pthread_rwlock_unlock()
  • Category deleted (ceph cli)
  • Status changed from Rejected to In Progress
  • Priority changed from Normal to High
  • Source changed from other to Community (user)
  • Backport set to giant, firefly, dumpling

It looks to me like this is a result of our naughty rwlock handling: https://github.com/ceph/ceph/pull/2937

There's another report of it on the mailing list in the OSD as well: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044870.html

Actions #6

Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF