Bug #3787
Ceph OSD crashes on ceph tell osd.x (Status: Closed)
Description
I recently set up a small test cluster with two nodes to test the 0.48.3 -> 0.56.1 upgrade. After upgrading one of the nodes to 0.56.1 (OSD, MON, MDS, RadosGW), we noticed that the ceph-osd daemon crashes after issuing:
ceph tell osd.0 (where 0 is the upgraded OSD)
-6> 2013-01-11 08:02:33.338507 7f2a40788700 10 monclient: tick
-5> 2013-01-11 08:02:33.338540 7f2a40788700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2013-01-11 08:02:03.338538)
-4> 2013-01-11 08:02:33.338559 7f2a40788700 10 monclient: renew subs? (now: 2013-01-11 08:02:33.338558; renew after: 2013-01-11 08:04:40.569279) - no
-3> 2013-01-11 08:02:33.341715 7f2a4bf9f700 5 osd.0 34 tick
-2> 2013-01-11 08:02:33.947731 7f2a3ae75700 1 -- 10.251.46.216:6800/8872 >> :/0 pipe(0x2f49d80 sd=34 :6800 pgs=0 cs=0 l=0).accept sd=34 10.251.46.216:35825/0
-1> 2013-01-11 08:02:33.948104 7f2a43f8f700 1 -- 10.251.46.216:6800/8872 <== client.? 10.251.46.216:0/8958 1 ==== command(tid 1: ) v1 ==== 20+0+0 (2743466195 0 0) 0x31f0e00 con 0x2f506e0
0> 2013-01-11 08:02:33.949617 7f2a3d782700 -1 *** Caught signal (Segmentation fault) **
in thread 7f2a3d782700
ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
1: /usr/bin/ceph-osd() [0x79c2e9]
2: (()+0xeff0) [0x7f2a504b1ff0]
3: (std::string::compare(char const*) const+0x16) [0x7f2a4f7aab66]
4: (OSD::do_command(Connection*, unsigned long, std::vector<std::string, std::allocator<std::string> >&, ceph::buffer::list&)+0x311) [0x5fc741]
5: (OSD::CommandWQ::_process(OSD::Command*)+0x37) [0x64af47]
6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x82b) [0x82b47b]
7: (ThreadPool::WorkThread::entry()+0x10) [0x82dc60]
8: (()+0x68ca) [0x7f2a504a98ca]
9: (clone()+0x6d) [0x7f2a4efd8b6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
0/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 100000
max_new 1000
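For reference, the two numbers on each line above are the in-memory and output (log-file) debug levels for that subsystem. They can be set per subsystem in ceph.conf; a small fragment matching a few of the values in the dump (shown for illustration, not as a recommendation):

```ini
[osd]
; format: debug <subsystem> = <memory-level>/<log-level>
; higher levels gather more detail
debug osd = 0/5
debug ms = 0/5
debug monc = 0/10
```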
The setup is running on a standard Debian Squeeze installation with the packages from http://ceph.com/debian-bobtail/
Interestingly, the not-yet-upgraded ceph 0.48.3 OSD also crashes when issuing ceph tell osd.1:
-3> 2013-01-11 08:15:33.524493 7faa4ef46700 1 -- 10.58.214.195:6803/9677 <== client.? 10.251.46.216:0/9448 1 ==== command(tid 1: ) v1 ==== 20+0+0 (2743466195 0 0) 0x318a1c0 con 0x298f3c0
-2> 2013-01-11 08:15:33.524524 7faa4ef46700 5 throttle(osd_client_bytes 0x7fff3498e0b0) put 20 (0xb68cc8 -> 0)
-1> 2013-01-11 08:15:33.524530 7faa4ef46700 5 throttle(msgr_dispatch_throttler-client 0x1f879e0) put 20 (0xb68cc8 -> 0)
0> 2013-01-11 08:15:33.525740 7faa49e3b700 -1 *** Caught signal (Segmentation fault) **
in thread 7faa49e3b700
ceph version 0.48.3argonaut (commit:920f82e805efec2cae05b79c155c07df0f3ed5dd)
1: /usr/bin/ceph-osd() [0x707249]
2: (()+0xeff0) [0x7faa5ca56ff0]
3: (std::string::compare(char const*) const+0x16) [0x7faa5bf64b66]
4: (OSD::do_command(Connection*, unsigned long, std::vector<std::string, std::allocator<std::string> >&, ceph::buffer::list&)+0x383) [0x5b6103]
5: (OSD::CommandWQ::_process(OSD::Command*)+0x35) [0x5f9fd5]
6: (ThreadPool::worker()+0x76f) [0x78f61f]
7: (ThreadPool::WorkThread::entry()+0xd) [0x5e5efd]
8: (()+0x68ca) [0x7faa5ca4e8ca]
9: (clone()+0x6d) [0x7faa5b792b6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---
Other than that the cluster seems to work fine.
Updated by Sage Weil over 11 years ago
- Status changed from New to 12
- Priority changed from Normal to Urgent
Verified this happens on master. Should be an easy fix. Thanks for the report!
Updated by Samuel Just over 11 years ago
- Status changed from Fix Under Review to Resolved
8cf79f252a1bcea5713065390180a36f31d66dfd
Updated by Sage Weil about 11 years ago
Pushed to bobtail, 55687240b2de20185524de07e67f42c3b1ae6592