Bug #20849

Bluestore crashes with 4KB RBD randwrite over EC pool

Added by Jianjian Huo 23 days ago. Updated 14 days ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
Start date: 07/28/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No
Component(RADOS):

Description

This is a 48-OSD Ceph cluster on 4 nodes.
ceph version 12.1.1-624-g5d6eebc (5d6eebcf269b51b4e8f203caa65247dfb9b0d530) luminous (rc).

Created 48 RBD images with EC overwrites enabled:
ceph osd pool create mypool 1024
ceph osd erasure-code-profile set myprofile k=6 m=2 crush-failure-domain=osd
ceph osd pool create mypool_ec 1024 1024 erasure myprofile
ceph osd pool set mypool_ec allow_ec_overwrites true
for i in {1..48}; do rbd create --size 100G --data-pool mypool_ec mypool/ssd_$i; done
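
Not part of the original report, but assuming the pool and image names above, the setup can be sanity-checked with something like:

# check that overwrites are enabled on the EC data pool
ceph osd pool get mypool_ec allow_ec_overwrites
# confirm an image is actually using the EC data pool
rbd info mypool/ssd_1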

Started 48 fio clients, each client writing into one RBD image (randwrite, 4 KB, numjobs=1, qd=1, runtime=300).
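
The fio job file is not included here; a rough sketch of a single client using fio's rbd ioengine with the parameters above (client name and image name are placeholders) would be:

# one of the 48 clients; rbdname varies per client (ssd_1 .. ssd_48)
fio --name=ec_randwrite --ioengine=rbd --clientname=admin \
    --pool=mypool --rbdname=ssd_1 \
    --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 \
    --runtime=300 --time_based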
After one or two minutes, 4 OSDs were down.
$ ceph -s
cluster:
id: 59721d7c-b3e2-4263-92a4-aaf2b46b3969
health: HEALTH_WARN
4 osds down
Reduced data availability: 144 pgs inactive, 14 pgs down
Degraded data redundancy: 433559/8297835 objects degraded (5.225%), 776 pgs unclean, 652 pgs degraded, 110 pgs incomplete, 652 pgs undersized

services:
mon: 1 daemons, quorum a
mgr: e07e04305.eu6sqa(active)
osd: 48 osds: 44 up, 48 in; 776 remapped pgs
data:
pools: 2 pools, 2048 pgs
objects: 1012k objects, 2750 GB
usage: 4634 GB used, 84785 GB / 89420 GB avail
pgs: 7.031% pgs not active
433559/8297835 objects degraded (5.225%)
1272 active+clean
632 active+undersized+degraded
110 incomplete
20 undersized+degraded+peered
14 down

History

#1 Updated by Jianjian Huo 23 days ago

One OSD has this log:

1: (()+0xa610b1) [0x7f098f3520b1]
2: (()+0xf100) [0x7f098c13d100]
3: (gsignal()+0x37) [0x7f098b1665f7]
4: (abort()+0x148) [0x7f098b167ce8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f098f390914]
6: (BlueStore::ExtentMap::dirty_range(unsigned int, unsigned int)+0x4be) [0x7f098f2028ae]
7: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0x12fd) [0x7f098f2560bd]
8: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x94) [0x7f098f256834]
9: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x18b8) [0x7f098f269468]
10: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7f098f26a000]
11: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f098efdded5]
12: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7f098f0fc751]
13: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7f098f10d4c7]
14: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f098f012e90]
15: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7f098ef7f73e]
16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f098ee18cb9]
17: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f098f07ff77]
18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f098ee43bce]
19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8e9) [0x7f098f396369]
20: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f098f3984f0]
21: (()+0x7dc5) [0x7f098c135dc5]
22: (clone()+0x6d) [0x7f098b227ced]

#2 Updated by Jianjian Huo 23 days ago

Second run: recreated the whole cluster and ran the same workload again, and 4 OSDs went down again.
But this time the failure logs were different.

One OSD has this:
ceph version 12.1.1-624-g5d6eebc (5d6eebcf269b51b4e8f203caa65247dfb9b0d530) luminous (rc)
1: (()+0xa610b1) [0x7fe882dc20b1]
2: (()+0xf100) [0x7fe87fbad100]
3: (gsignal()+0x37) [0x7fe87ebd65f7]
4: (abort()+0x148) [0x7fe87ebd7ce8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fe87f4db9d5]
6: (()+0x5e946) [0x7fe87f4d9946]
7: (()+0x5e973) [0x7fe87f4d9973]
8: (()+0x5eb93) [0x7fe87f4d9b93]
9: (()+0x15ce2) [0x7fe8813f9ce2]
10: (operator new[](unsigned long)+0x1d5) [0x7fe88141c045]
11: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x40) [0x7fe883105f90]
12: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x47) [0x7fe883106047]
13: (rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0xa0) [0x7fe883089340]
14: (()+0xd5b587) [0x7fe8830bc587]
15: (()+0xd5b613) [0x7fe8830bc613]
16: (rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&, bool, rocksdb::MemTablePostProcessInfo*)+0xcf) [0x7fe883085f5f]
17: (rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x3d9) [0x7fe8830b7aa9]
18: (rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x4cd) [0x7fe8830b280d]
19: (rocksdb::WriteBatchInternal::InsertInto(rocksdb::autovector<rocksdb::WriteThread::Writer*, 8ul> const&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool)+0x128) [0x7fe8830b52a8]
20: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x15db) [0x7fe88315545b]
21: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x27) [0x7fe883155787]
22: (RocksDBStore::submit_transaction(std::shared_ptr<KeyValueDB::TransactionImpl>)+0xcf) [0x7fe882d197af]
23: (BlueStore::_kv_sync_thread()+0x1ac6) [0x7fe882cb3336]
24: (BlueStore::KVSyncThread::entry()+0xd) [0x7fe882cf983d]
25: (()+0x7dc5) [0x7fe87fba5dc5]
26: (clone()+0x6d) [0x7fe87ec97ced]

And another down OSD has this error:
ceph version 12.1.1-624-g5d6eebc (5d6eebcf269b51b4e8f203caa65247dfb9b0d530) luminous (rc)
1: (()+0xa610b1) [0x7fd39aa0c0b1]
2: (()+0xf100) [0x7fd3977f7100]
3: (gsignal()+0x37) [0x7fd3968205f7]
4: (abort()+0x148) [0x7fd396821ce8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fd3971259d5]
6: (()+0x5e946) [0x7fd397123946]
7: (()+0x5e973) [0x7fd397123973]
8: (()+0x5eb93) [0x7fd397123b93]
9: (ceph::buffer::create_aligned(unsigned int, unsigned int)+0x28a) [0x7fd39aa12f6a]
10: (BlueRocksWritableFile::Append(rocksdb::Slice const&)+0xc5) [0x7fd39a9e8015]
11: (rocksdb::WritableFileWriter::WriteBuffered(char const*, unsigned long)+0xa3) [0x7fd39ad54b53]
12: (rocksdb::WritableFileWriter::Append(rocksdb::Slice const&)+0x2f0) [0x7fd39ad556b0]
13: (rocksdb::log::Writer::EmitPhysicalRecord(rocksdb::log::RecordType, char const*, unsigned long)+0x7a) [0x7fd39acce33a]
14: (rocksdb::log::Writer::AddRecord(rocksdb::Slice const&)+0xfd) [0x7fd39acce58d]
15: (rocksdb::DBImpl::WriteToWAL(rocksdb::autovector<rocksdb::WriteThread::Writer*, 8ul> const&, rocksdb::log::Writer*, bool, bool, unsigned long)+0x18a) [0x7fd39ad9d9aa]
16: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x92b) [0x7fd39ad9e7ab]
17: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x27) [0x7fd39ad9f787]
18: (RocksDBStore::submit_transaction(std::shared_ptr<KeyValueDB::TransactionImpl>)+0xcf) [0x7fd39a9637af]
19: (BlueStore::_kv_sync_thread()+0x1ac6) [0x7fd39a8fd336]
20: (BlueStore::KVSyncThread::entry()+0xd) [0x7fd39a94383d]
21: (()+0x7dc5) [0x7fd3977efdc5]
22: (clone()+0x6d) [0x7fd3968e1ced]

#3 Updated by Sage Weil 18 days ago

  • Status changed from New to Need More Info
  • Priority changed from Normal to Urgent

Can you reproduce this with debug bluestore = 20? And/or at least show the log snippet for the first OSD crash (an assertion failure) so we can see what line it asserted on?
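
For reference, one way to bump the BlueStore debug level is on a running OSD via injectargs (osd.N is a placeholder), or persistently in ceph.conf before restarting:

ceph tell osd.N injectargs '--debug-bluestore 20/20'
# or in ceph.conf, under [osd]:
#   debug bluestore = 20/20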

Also, do you see this still on current master?

#4 Updated by Sage Weil 16 days ago

  • Project changed from Ceph to RADOS
  • Status changed from Need More Info to Verified
  • Assignee set to Sage Weil
  • Priority changed from Urgent to Immediate

got logs!

/a/joshd-2017-08-04_06:16:52-rados-wip-20904-distro-basic-smithi/1482581

#5 Updated by Sage Weil 16 days ago

  • Status changed from Verified to Testing

#6 Updated by Sage Weil 14 days ago

  • Status changed from Testing to Resolved
