Support #10486
OSD Keeps Going Down
Description
Hello,
I am encountering an issue where an OSD (OSD.1) keeps going down and out.
Initially I thought it was a drive problem; however, the drive is mounted and in a healthy state.
I tried starting the OSD again, but after around 15 minutes it went down and was dropped.
I cannot figure out why this is occurring and would appreciate help solving this problem.
Here is an excerpt of the log for the event that caused this OSD to be dropped.
0> 2015-01-08 11:20:14.843782 7fb1c4179700 -1 *** Caught signal (Aborted) **
in thread 7fb1c4179700
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
1: /usr/bin/ceph-osd() [0xa88332]
2: (()+0xf130) [0x7fb1e4664130]
3: (gsignal()+0x39) [0x7fb1e307e5c9]
4: (abort()+0x148) [0x7fb1e307fcd8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fb1e39829d5]
6: (()+0x5e946) [0x7fb1e3980946]
7: (()+0x5e973) [0x7fb1e3980973]
8: (()+0x5eb9f) [0x7fb1e3980b9f]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xb7ae4a]
10: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, bool)+0xbd4) [0x8e9bb4]
11: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*)+0x5e9) [0x829949]
12: (ReplicatedBackend::prep_push(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, eversion_t, interval_set<unsigned long>&, std::map<hobject_t, interval_set<unsigned long>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, interval_set<unsigned long> > > >&, PushOp*)+0x40c) [0x82aabc]
13: (ReplicatedBackend::prep_push_to_replica(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, PushOp*)+0x567) [0x82ee57]
14: (ReplicatedBackend::start_pushes(hobject_t const&, std::tr1::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0x1bf) [0x831e8f]
15: (ReplicatedBackend::recover_object(hobject_t const&, eversion_t, std::tr1::shared_ptr<ObjectContext>, std::tr1::shared_ptr<ObjectContext>, PGBackend::RecoveryHandle*)+0xf3) [0x9df463]
16: (ReplicatedPG::prep_object_replica_pushes(hobject_t const&, eversion_t, PGBackend::RecoveryHandle*)+0x86b) [0x84c87b]
17: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0xa68) [0x84dc38]
18: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x5db) [0x87831b]
19: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x2c3) [0x6784e3]
20: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x27) [0x6dadb7]
21: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa66) [0xb6b966]
22: (ThreadPool::WorkThread::entry()+0x10) [0xb6c9f0]
23: (()+0x7df3) [0x7fb1e465cdf3]
24: (clone()+0x6d) [0x7fb1e313f01d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.1.log
--- end dump of recent events ---
Updated by Sage Weil over 9 years ago
If you look further up in the log you should see why it is crashing: a segfault message, a failed assert, or something similar?
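One way to look further up the log without paging through it by hand is to filter for lines that usually announce a crash. The following is only a rough sketch: the marker strings (`FAILED assert`, `Caught signal`, `Segmentation fault`) are assumptions about what the OSD log contains and may need adjusting for a given Ceph version, and `find_crash_hints` and `scan_file` are hypothetical helper names.

```python
import re

# Assumed crash markers; these strings are a guess at what the log contains
# and are not taken from this ticket's log excerpt (except "Caught signal").
CRASH_MARKERS = re.compile(r"FAILED assert|Caught signal|Segmentation fault")


def find_crash_hints(lines, context=2):
    """Return (line_number, text) pairs for every line matching a crash
    marker, together with up to `context` lines preceding each match."""
    lines = list(lines)
    keep = set()
    for i, line in enumerate(lines):
        if CRASH_MARKERS.search(line):
            # Keep the matching line and the lines just before it.
            keep.update(range(max(0, i - context), i + 1))
    return [(i + 1, lines[i].rstrip("\n")) for i in sorted(keep)]


def scan_file(path, context=2):
    """Print crash hints found in the log file at `path`."""
    with open(path, errors="replace") as f:
        for number, text in find_crash_hints(f, context):
            print(f"{number}: {text}")
```

Calling `scan_file("/var/log/ceph/ceph-osd.1.log")` on the affected host would print the matching lines with their line numbers, which can then be pasted into the ticket.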
Updated by Shun Mok Bhark over 9 years ago
Looking further up the log, it has osd_ping stamps.
Isn't this a failed assert?
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xb7ae4a]
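Frame 9 in the backtrace is indeed the assert-failure handler, and the frame directly below it (frame 10, `FileStore::read`) is the function that called it. A minimal sketch of pulling that out of a pasted backtrace follows; `assert_caller` is a hypothetical helper, and the regular expression only assumes the frame layout shown in the excerpt above. It does not recover the text of the failed condition, which would have to come from an earlier log line.

```python
import re

# Matches frames of the form "N: (symbol+0xOFFSET) [0xADDRESS]".
FRAME = re.compile(r"^\s*(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$")


def assert_caller(trace_lines):
    """Return (frame_number, function_name) of the frame that called
    __ceph_assert_fail, or None if no such frame is found."""
    frames = []
    for line in trace_lines:
        m = FRAME.match(line)
        if m:
            # Drop the argument list, keeping only the qualified name.
            frames.append((int(m.group(1)), m.group(2).split("(")[0]))
    for idx, (_, name) in enumerate(frames):
        if name.endswith("__ceph_assert_fail") and idx + 1 < len(frames):
            return frames[idx + 1]
    return None


# Two frames copied from the backtrace in this ticket.
trace = [
    "9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xb7ae4a]",
    "10: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, bool)+0xbd4) [0x8e9bb4]",
]
print(assert_caller(trace))  # (10, 'FileStore::read')
```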
Updated by Shun Mok Bhark over 9 years ago
Shun Mok Bhark wrote:
Looking further up the log, it has osd_ping stamps.