Bug #20277

bluestore crashed while performing scrub

Added by Kefu Chai almost 7 years ago. Updated over 6 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): BlueStore
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

ceph version 12.0.3 (f2337d1b42fa49dbb0a93e4048a42762e3dffbbf)
 1: (()+0x9bb95a) [0x562f2224595a]
 2: (()+0x110c0) [0x7f0296ebb0c0]
 3: (gsignal()+0xcf) [0x7f0295cf5fcf]
 4: (abort()+0x16a) [0x7f0295cf73fa]
 5: (()+0x2be37) [0x7f0295ceee37]
 6: (()+0x2bee2) [0x7f0295ceeee2]
 7: (()+0x3b4062) [0x562f21c3e062]
 8: (BlueStore::Blob::get_ref(BlueStore::Collection*, unsigned int, unsigned int)+0) [0x562f220e1150]
 9: (BlueStore::Blob::get_ref(BlueStore::Collection*, unsigned int, unsigned int)+0x242) [0x562f220e1392]
 10: (BlueStore::Blob::decode(BlueStore::Collection*, ceph::buffer::ptr::iterator&, unsigned long, unsigned long*, bool)+0x60d) [0x562f220fdf9d]
 11: (BlueStore::ExtentMap::decode_spanning_blobs(ceph::buffer::ptr::iterator&)+0x23f) [0x562f2210ce1f]
 12: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0xf06) [0x562f2212d366]
 13: (BlueStore::stat(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, stat*, bool)+0xcc) [0x562f2212debc]
 14: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x1ed) [0x562f21f0742d]
 15: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x214) [0x562f21dc3b64]
 16: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x61d) [0x562f21dc448d]
 17: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x772) [0x562f21e78172]
 18: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x22c) [0x562f21d19e0c]
 19: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x562f21d1a227]
 20: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x562f21d45d4c]
 21: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x96e) [0x562f2228c0ae]
 22: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562f2228e2b0]
 23: (()+0x7494) [0x7f0296eb1494]
 24: (clone()+0x3f) [0x7f0295dab93f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Not sure whether this still exists in the latest version, though.
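
As the NOTE in the trace says, the raw addresses only become meaningful against the exact binary. A minimal sketch of how one might resolve a frame, assuming the matching 12.0.3 ceph-osd build with its debug symbols installed (the binary path is illustrative):

 # Disassemble the OSD binary with source interleaved and symbols demangled
 # (needs the exact ceph 12.0.3 build plus its debug symbol package):
 objdump -rdSC /usr/bin/ceph-osd > ceph-osd.objdump

 # Locate a crashing frame by symbol, e.g. frame 10 from the trace,
 # BlueStore::Blob::decode(...)+0x60d, then read offset 0x60d into it:
 grep -n 'BlueStore::Blob::decode' ceph-osd.objdump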

ceph-osd.11.log.bz2 (573 KB) Kefu Chai, 06/13/2017 08:20 AM

History

#1 Updated by Peter Gervai almost 7 years ago

What happened (twice) was:
  • the OSD had a CRC error and an inconsistent PG
  • I set debug-bluestore and debug-osd to 20
  • the OSD crashed
    (so I had not initiated the scrub manually)

After the PG had been 'pg repair'ed, I set the debug options again and started a scrub and a deep-scrub, and the OSD has not crashed again.
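
For reference, the commands typically used for those steps look roughly like this (the OSD id and PG id are placeholders; substitute the actual inconsistent PG):

 # Raise debug logging on the affected OSD:
 ceph tell osd.11 injectargs '--debug-bluestore 20 --debug-osd 20'

 # Repair the inconsistent PG, then scrub it again:
 ceph pg repair 2.1f
 ceph pg scrub 2.1f
 ceph pg deep-scrub 2.1f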

#2 Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (107)
  • Component(RADOS) BlueStore added

#3 Updated by Sage Weil almost 7 years ago

  • Status changed from New to Need More Info

A bug in the spanning blob code was just fixed; see https://github.com/ceph/ceph/pull/15654. Are you able to reproduce the crash, and/or can you retry with the latest master?
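
A quick way to confirm which build a daemon is actually running after an upgrade (the OSD id is a placeholder):

 # Report the version the running daemon was built from:
 ceph tell osd.11 version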

#4 Updated by Sage Weil over 6 years ago

  • Status changed from Need More Info to Can't reproduce
