Bug #20923

ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)

Added by Martin Millnert almost 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Backfill/Recovery
Target version:
Start date:
08/06/2017
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

Running 12.1.1 RC1 OSDs, currently doing an inline migration to BlueStore (the ceph osd destroy procedure). Getting these assert failures quite often now.

Single-host cluster: 1 MON, 1 MGR, 5 OSD (4 FileStore, 1 BlueStore).

     0> 2017-08-06 11:28:50.154046 7f81daa56700 -1 /build/ceph-12.1.1/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::ExtentMap::fault_range(KeyValueDB*, uint32_t, uint32_t)' thread 7f81daa56700 time 2017-08-06 11:28:50.142943
/build/ceph-12.1.1/src/os/bluestore/BlueStore.cc: 2630: FAILED assert(last >= start)

 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563a7c847e32]
 2: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x911) [0x563a7c6bd2c1]
 3: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x23b) [0x563a7c70508b]
 4: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x102) [0x563a7c705ec2]
 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1685) [0x563a7c7148a5]
 6: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x378) [0x563a7c715a48]
 7: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0xa7) [0x563a7c47cb17]
 8: (ECBackend::dispatch_recovery_messages(RecoveryMessages&, int)+0x69a) [0x563a7c593a9a]
 9: (ECBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x255) [0x563a7c5a28f5]
 10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5e7) [0x563a7c417667]
 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x563a7c2b7b07]
 12: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x563a7c5120a7]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x563a7c2e334c]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x93d) [0x563a7c84dadd]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563a7c84fd00]
 16: (()+0x7494) [0x7f81f5e34494]
 17: (clone()+0x3f) [0x7f81f4ebbaff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Following this, some other threads dump stacks as well from the abort signal, but I don't include them here.
It happens during backfill.

History

#1 Updated by Sage Weil almost 2 years ago

  • Status changed from New to Need More Info
  • Priority changed from Normal to Urgent

can you reproduce with debug bluestore = 1/30 and attach the resulting log?
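One way to apply that setting is sketched below; the OSD id is an assumption, not from the ticket, and either form should work on a 12.x cluster:

```shell
# Option 1: inject the debug level into the running OSD at runtime
# (osd.1 here is assumed -- substitute the affected OSD's id)
ceph tell osd.1 injectargs '--debug_bluestore 1/30'

# Option 2: set it persistently in ceph.conf and restart the OSD
#   [osd]
#   debug bluestore = 1/30
```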

#2 Updated by Martin Millnert almost 2 years ago

Sage Weil wrote:

can you reproduce with debug bluestore = 1/30 and attach the resulting log?

Here it comes (object names anonymized). Log from osd boot until crash.
File too big to be attached, put it at https://martin.millnert.se/files/ceph-osd.1.log.sedded.gz

#3 Updated by Sage Weil almost 2 years ago

  • Backport set to luminous

#4 Updated by Sage Weil almost 2 years ago

    -1> 2017-08-07 15:06:23.190485 7f64d9f56700 30 bluestore.extentmap(0x55b38a793270) fault_range 0xfff190a0~2aaf60

This object is larger than 32bits (4gb), which bluestore does not allow/support. Why are your rados objects so big? They should not be more than 10s of MB.

I'll add some sanity checks earlier on that error out when offsets are so large.

FWIW the default setting for osd_max_object_size is 128MB now, although in jewel it was 100GB... I think because we goofed the GB vs MB.
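The arithmetic behind the assert can be illustrated outside of Ceph: extent-map offsets are uint32_t, so an end offset past 2^32 wraps around and lands below the start offset. A small Python sketch, using the exact values from the log line above (the variable names are illustrative, not BlueStore's actual code):

```python
# 32-bit wraparound behind "FAILED assert(last >= start)", using the
# values from the log line "fault_range 0xfff190a0~2aaf60".
MASK32 = 0xFFFFFFFF               # uint32_t value range

offset = 0xfff190a0               # write offset, just under 4 GiB
length = 0x2aaf60                 # write length, ~2.7 MiB

start = offset & MASK32
end = (offset + length) & MASK32  # sum exceeds 2**32, so it wraps

print(hex(start))   # 0xfff190a0
print(hex(end))     # 0x1c4000 -- end < start, so the assert fires
```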

#5 Updated by Martin Millnert almost 2 years ago

Sage Weil wrote:

[...]
This object is larger than 32bits (4gb), which bluestore does not allow/support. Why are your rados objects so big? They should not be more than 10s of MB.

Oh that explains it well. Reason: I wrote some RADOS-interacting code myself and did some very rudimentary benchmarking, from which I concluded it was a bit faster to go for non-chunked objects (EC pool).

Now that you mention it I recall this from the release notes of the luminous RC1.
I'll hack up some python-rados-loop to chunk them up.

I'll add some sanity checks earlier on that error out when offsets are so large.

Based on a sample size of 1, I'd say statistically that'd be helpful for other Ceph cluster operators as well. :)

FWIW the default setting for osd_max_object_size is 128MB now, although in jewel it was 100GB... I think because we goofed the GB vs MB.

Nod, I recall the release note comment... now.

Thanks for the fault finding.
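A chunking loop of the kind mentioned above might look like the following. This is a hypothetical sketch, not code from the ticket: the chunk size, the `name.NNNNNNNN` object-naming scheme, and the `ioctx` handle (an open python-rados I/O context) are all assumptions.

```python
# Hypothetical sketch: split a large blob into sub-object chunks so each
# RADOS object stays well under osd_max_object_size (128 MB by default).
# `ioctx` is assumed to be an open python-rados I/O context, e.g.:
#   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
#   cluster.connect()
#   ioctx = cluster.open_ioctx('mypool')

CHUNK = 64 * 1024 * 1024  # 64 MiB per chunk (assumed; tune to taste)

def chunks(data, size=CHUNK):
    """Yield (index, piece) pairs covering data in order."""
    for off in range(0, len(data), size):
        yield off // size, data[off:off + size]

def put_chunked(ioctx, name, data):
    """Store data as name.00000000, name.00000001, ... full-object writes."""
    for idx, piece in chunks(data):
        ioctx.write_full("%s.%08d" % (name, idx), piece)
```

Reading the object back is the mirror image: iterate the indices in order and concatenate the pieces.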

#6 Updated by Sage Weil almost 2 years ago

  • Status changed from Need More Info to Need Review
  • Priority changed from Urgent to Normal

#7 Updated by Sage Weil almost 2 years ago

  • Status changed from Need Review to Resolved
