Bug #17553

closed

OSD crashed with signal 6, BackoffThrottle::get(unsigned long)

Added by Greg Farnum over 7 years ago. Updated over 6 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
FileStore
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/gregf-2016-10-06_18:07:58-fs-greg-fs-testing-105---basic-mira/458034/

This was in an FS branch testing run, but it didn't include any OSD changes.

2016-10-06T20:30:21.162 INFO:tasks.ceph.osd.1.mira056.stderr:*** Caught signal (Aborted) **
2016-10-06T20:30:21.162 INFO:tasks.ceph.osd.1.mira056.stderr: in thread 7f77c1c31700 thread_name:tp_osd_tp
2016-10-06T20:30:21.170 INFO:tasks.ceph.osd.1.mira056.stderr: ceph version v11.0.0-3149-g338d8fc (338d8fcd2d131763fecc99041119982db4cd3e65)
2016-10-06T20:30:21.170 INFO:tasks.ceph.osd.1.mira056.stderr: 1: (()+0x87ab52) [0x55dd18da5b52]
2016-10-06T20:30:21.171 INFO:tasks.ceph.osd.1.mira056.stderr: 2: (()+0x10340) [0x7f77dfb3e340]
2016-10-06T20:30:21.171 INFO:tasks.ceph.osd.1.mira056.stderr: 3: (pthread_cond_wait()+0xc4) [0x7f77dfb3a414]
2016-10-06T20:30:21.171 INFO:tasks.ceph.osd.1.mira056.stderr: 4: (std::condition_variable::wait(std::unique_lock<std::mutex>&)+0xc) [0x7f77ded3a4dc]
2016-10-06T20:30:21.171 INFO:tasks.ceph.osd.1.mira056.stderr: 5: (BackoffThrottle::get(unsigned long)+0xed) [0x55dd18f223ed]
2016-10-06T20:30:21.171 INFO:tasks.ceph.osd.1.mira056.stderr: 6: (FileStore::op_queue_reserve_throttle(FileStore::Op*)+0x2b) [0x55dd18bc067b]
2016-10-06T20:30:21.171 INFO:tasks.ceph.osd.1.mira056.stderr: 7: (FileStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x50c) [0x55dd18bd7c7c]
2016-10-06T20:30:21.171 INFO:tasks.ceph.osd.1.mira056.stderr: 8: (ReplicatedPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<OpRequest>)+0x7c) [0x55dd18aae79c]
2016-10-06T20:30:21.171 INFO:tasks.ceph.osd.1.mira056.stderr: 9: (ReplicatedBackend::submit_transaction(hobject_t const&, eversion_t const&, std::unique_ptr<PGBackend::PGTransaction, std::default_delete<PGBackend::PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::shared_ptr<OpRequest>)+0x92b) [0x55dd18b7394b]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 10: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, ReplicatedPG::OpContext*)+0x79c) [0x55dd18a3b5dc]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 11: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1765) [0x55dd18a8f505]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 12: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x27df) [0x55dd18a927bf]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 13: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x73c) [0x55dd18a4e75c]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5) [0x55dd18906d55]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 15: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x5d) [0x55dd18906f7d]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x87c) [0x55dd1892835c]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947) [0x55dd18f2e457]
2016-10-06T20:30:21.172 INFO:tasks.ceph.osd.1.mira056.stderr: 18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55dd18f305b0]
2016-10-06T20:30:21.173 INFO:tasks.ceph.osd.1.mira056.stderr: 19: (()+0x8182) [0x7f77dfb36182]
2016-10-06T20:30:21.173 INFO:tasks.ceph.osd.1.mira056.stderr: 20: (clone()+0x6d) [0x7f77de4a547d]
2016-10-06T20:30:21.173 INFO:tasks.ceph.osd.1.mira056.stderr:2016-10-06 20:30:21.170245 7f77c1c31700 -1 *** Caught signal (Aborted) **
2016-10-06T20:30:21.173 INFO:tasks.ceph.osd.1.mira056.stderr: in thread 7f77c1c31700 thread_name:tp_osd_tp
Actions #1

Updated by Greg Farnum over 7 years ago

I think I remember something about a shutdown race and crash somewhere, so maybe that's this? Not sure though.

Actions #2

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category changed from OSD to Correctness/Safety
  • Component(RADOS) FileStore added
Actions #3

Updated by Sage Weil over 6 years ago

  • Status changed from New to Can't reproduce