Bug #19800


some osds are down when create a new pool and a new image of the pool (bluestore)

Added by Tang Jin almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Component(RADOS): BlueStore

Description

After heavy write I/O on the cluster, such as snapshot writes and PG splitting, creating a new pool and a new image in that pool causes some OSDs to go down. I see a large amount of read I/O on the down OSDs, whose CPU usage is at 100%.
In this case the objectstore is BlueStore; I have never seen this problem on a cluster with FileStore.
I also captured iostat output, osd.log, and a gdb stack trace while CPU usage was at 100%.
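For reference, numbers like these can be captured with standard tools; a minimal sketch along those lines (the device name and one-OSD-per-host layout are assumptions, not from this report):

# Sample extended device utilization every second while the OSD spins at 100% CPU:
iostat -x 1 /dev/sdb
# Watch per-thread CPU usage of the ceph-osd process (assumes a single ceph-osd on the host):
top -H -p "$(pidof ceph-osd)"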


Files

disk busy.png (23.8 KB), Tang Jin, 04/28/2017 02:22 AM
Actions #1

Updated by Tang Jin almost 7 years ago

(gdb) bt
#0 0x00002b0ff06d4cc3 in pread64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00005591f7cecd35 in pread (_offset=3698483200, __nbytes=4096, __buf=<optimized out>, __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:99
#2 KernelDevice::direct_read_unaligned (this=this@entry=0x2b0ff3705900, off=off@entry=3698485588, len=len@entry=1097, buf=buf@entry=0x2b10429fb470 "") at os/bluestore/KernelDevice.cc:567
#3 0x00005591f7ced882 in KernelDevice::read_random (this=0x2b0ff3705900, off=3698485588, len=1097, buf=0x2b10429fb470 "", buffered=<optimized out>) at os/bluestore/KernelDevice.cc:601
#4 0x00005591f7cbeef5 in BlueFS::_read_random (this=0x2b0ff379ef00, h=0x2b13827ff200, off=10643796, len=1097, out=out@entry=0x2b10429fb470 "") at os/bluestore/BlueFS.cc:860
#5 0x00005591f7ce08f0 in read_random (out=0x2b10429fb470 "", len=<optimized out>, offset=<optimized out>, h=<optimized out>, this=<optimized out>) at os/bluestore/BlueFS.h:415
#6 BlueRocksRandomAccessFile::Read (this=<optimized out>, offset=<optimized out>, n=<optimized out>, result=0x2b10429fb3d0, scratch=0x2b10429fb470 "") at os/bluestore/BlueRocksEnv.cc:94
#7 0x00005591f7e37f15 in rocksdb::RandomAccessFileReader::Read (this=this@entry=0x2b1004427840, offset=10643796, n=n@entry=1097, result=result@entry=0x2b10429fb3d0, scratch=scratch@entry=0x2b10429fb470 "") at util/file_reader_writer.cc:40
#8 0x00005591f7e12588 in ReadBlock (options=..., buf=0x2b10429fb470 "", contents=0x2b10429fb3d0, handle=..., footer=..., file=0x2b1004427840) at table/format.cc:263
#9 rocksdb::ReadBlockContents (file=0x2b1004427840, footer=..., read_options=..., handle=..., contents=contents@entry=0x2b10429fc860, ioptions=..., decompression_requested=decompression_requested@entry=true, compression_dict=..., cache_options=...) at table/format.cc:368
#10 0x00005591f7e03248 in rocksdb::(anonymous namespace)::ReadBlockFromFile (file=<optimized out>, footer=..., options=..., handle=..., result=result@entry=0x2b10429fc9d0, ioptions=..., do_uncompress=true, compression_dict=..., cache_options=..., read_amp_bytes_per_bit=0) at table/block_based_table_reader.cc:75
#11 0x00005591f7e0560e in rocksdb::BlockBasedTable::NewDataBlockIterator (rep=0x2b1004448380, ro=..., index_value=..., input_iter=0x0) at table/block_based_table_reader.cc:1225
#12 0x00005591f7e2d156 in rocksdb::(anonymous namespace)::TwoLevelIterator::InitDataBlock (this=0x2b111fe96000) at table/two_level_iterator.cc:222
#13 0x00005591f7e2d3ec in rocksdb::(anonymous namespace)::TwoLevelIterator::SkipEmptyDataBlocksForward (this=0x2b111fe96000) at table/two_level_iterator.cc:170
#14 0x00005591f7e2d466 in Next (this=0x2b1125ebb600) at ./table/iterator_wrapper.h:61
#15 rocksdb::(anonymous namespace)::TwoLevelIterator::Next (this=0x2b1125ebb5b0) at table/two_level_iterator.cc:149
#16 0x00005591f7e17a2d in Next (this=0x2b1125ebb288) at ./table/iterator_wrapper.h:61
#17 rocksdb::MergingIterator::Next (this=0x2b1125ebb1e8) at table/merger.cc:159
#18 0x00005591f7dab452 in rocksdb::DBIter::FindNextUserEntryInternal (this=this@entry=0x2b1125ebb038, skipping=<optimized out>, skipping@entry=false, prefix_check=prefix_check@entry=false) at db/db_iter.cc:419
#19 0x00005591f7dadff3 in FindNextUserEntry (prefix_check=false, skipping=false, this=0x2b1125ebb038) at db/db_iter.cc:342
#20 rocksdb::DBIter::Seek (this=0x2b1125ebb038, target=...) at db/db_iter.cc:803
#21 0x00005591f7d59725 in RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound (this=0x2b1043814258, prefix=..., to=...) at kv/RocksDBStore.cc:688
#22 0x00005591f7d590e0 in RocksDBStore::RocksDBWholeSpaceIteratorImpl::upper_bound (this=0x2b1043814258, prefix="M", after="\000\000\000\000\000\000\004\002.MAP", '0' <repeats 15 times>, "1_", '0' <repeats 14 times>, "19.151") at kv/RocksDBStore.cc:676
#23 0x00005591f7be9fbd in BlueStore::OmapIteratorImpl::upper_bound (this=0x2b1043812340, after="MAP_", '0' <repeats 15 times>, "1_", '0' <repeats 14 times>, "19.151") at os/bluestore/BlueStore.cc:5673
#24 0x00005591f79ae6b8 in OSDriver::get_next (this=<optimized out>, key="MAP_", '0' <repeats 15 times>, "1_", '0' <repeats 14 times>, "19.151", next=0x2b10429fd0a0) at osd/SnapMapper.cc:43
#25 0x00005591f79b131a in MapCacher::MapCacher<std::string, ceph::buffer::list>::get_next (this=this@entry=0x2b1066a678c8, key="MAP_", '0' <repeats 15 times>, "1_", '0' <repeats 14 times>, "19.151", next=next@entry=0x2b10429fd330) at ./common/map_cacher.hpp:103
#26 0x00005591f79aeaf6 in SnapMapper::get_next_object_to_trim (this=this@entry=0x2b1066a678c8, snap=..., snap@entry=..., hoid=hoid@entry=0x2b1043818148) at osd/SnapMapper.cc:253
#27 0x00005591f7ab16c7 in ReplicatedPG::TrimmingObjects::react (this=this@entry=0x2b1043818100) at osd/ReplicatedPG.cc:13202
#28 0x00005591f7aeb984 in react<ReplicatedPG::TrimmingObjects, boost::statechart::event_base, void const*> (eventType=<synthetic pointer>, evt=..., stt=...) at /usr/local/include/boost/statechart/custom_reaction.hpp:42
#29 local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<ReplicatedPG::SnapTrim>, boost::statechart::transition<ReplicatedPG::Reset, ReplicatedPG::NotTrimming> >, boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (eventType=0x5591f85607f0 <boost::statechart::detail::id_holder<ReplicatedPG::SnapTrim>::idProvider_>, evt=..., stt=...) at /usr/local/include/boost/statechart/simple_state.hpp:816
#30 local_react<boost::mpl::list<boost::statechart::custom_reaction<ReplicatedPG::SnapTrim>, boost::statechart::transition<ReplicatedPG::Reset, ReplicatedPG::NotTrimming> > > (eventType=0x5591f85607f0 <boost::statechart::detail::id_holder<ReplicatedPG::SnapTrim>::idProvider_>, evt=..., this=0x2b1043818100) at /usr/local/include/boost/statechart/simple_state.hpp:851
#31 boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl (this=0x2b1043818100, evt=..., eventType=0x5591f85607f0 <boost::statechart::detail::id_holder<ReplicatedPG::SnapTrim>::idProvider_>) at /usr/local/include/boost/statechart/simple_state.hpp:489
#32 0x00005591f7ad719f in operator() (this=<synthetic pointer>) at /usr/local/include/boost/statechart/state_machine.hpp:87
#33 operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>, boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming>::exception_event_handler> (this=0x2b1066a69b38, action=...) at /usr/local/include/boost/statechart/null_exception_translator.hpp:33
#34 send_event (evt=..., this=0x2b1066a69ae0) at /usr/local/include/boost/statechart/state_machine.hpp:889
#35 boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events (this=0x2b1066a69ae0) at /usr/local/include/boost/statechart/state_machine.hpp:910
#36 0x00005591f7a53bf3 in process_event (evt=..., this=0x2b1066a69ae0) at /usr/local/include/boost/statechart/state_machine.hpp:280
#37 ReplicatedPG::snap_trimmer (this=0x2b1066a67800, queued=<optimized out>) at osd/ReplicatedPG.cc:3695
#38 0x00005591f793e8df in internal_visit<PGSnapTrim> (operand=..., this=<synthetic pointer>) at /usr/local/include/boost/variant/variant.hpp:1046
#39 visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, PGSnapTrim> (storage=0x2b10429fdb20, visitor=<synthetic pointer>) at /usr/local/include/boost/variant/detail/visitation_impl.hpp:114
#40 visitation_impl_invoke<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, PGSnapTrim, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub, PGRecovery>::has_fallback_type_> (t=0x0, storage=0x2b10429fdb20, visitor=<synthetic pointer>, internal_which=<optimized out>) at /usr/local/include/boost/variant/detail/visitation_impl.hpp:157
#41 visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<4l>, std::shared_ptr<OpRequest>, boost::mpl::l_item<mpl_::long_<3l>, PGSnapTrim, boost::mpl::l_item<mpl_::long_<2l>, PGScrub, boost::mpl::l_item<mpl_::long_<1l>, PGRecovery, boost::mpl::l_end> > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub, PGRecovery>::has_fallback_type_> (no_backup_flag=..., storage=0x2b10429fdb20, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>) at /usr/local/include/boost/variant/detail/visitation_impl.hpp:238
#42 internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*> (storage=0x2b10429fdb20, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>) at /usr/local/include/boost/variant/variant.hpp:2389
#43 internal_apply_visitor<boost::detail::variant::invoke_visitor<PGQueueable::RunVis> > (visitor=<synthetic pointer>, this=0x2b10429fdb18) at /usr/local/include/boost/variant/variant.hpp:2400
#44 apply_visitor<PGQueueable::RunVis> (visitor=..., this=0x2b10429fdb18) at /usr/local/include/boost/variant/variant.hpp:2423
#45 apply_visitor<PGQueueable::RunVis, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub, PGRecovery> > (visitable=..., visitor=...) at /usr/local/include/boost/variant/detail/apply_visitor_unary.hpp:70
#46 run (handle=..., pg=..., osd=<optimized out>, this=0x2b10429fdb18) at osd/OSD.h:441
#47 OSD::ShardedOpWQ::_process (this=0x2b0ff3f3cf48, thread_index=<optimized out>, hb=<optimized out>) at osd/OSD.cc:8874
#48 0x00005591f7f9d7a8 in ShardedThreadPool::shardedthreadpool_worker (this=0x2b0ff3f3c718, thread_index=<optimized out>) at common/WorkQueue.cc:356
#49 0x00005591f7f9f8f0 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:685
#50 0x00002b0ff06cd184 in start_thread (arg=0x2b10429ff700) at pthread_create.c:312
#51 0x00002b0ff280537d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)
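A stack like the one above can be captured from the spinning OSD with gdb; a minimal sketch, assuming a single ceph-osd process per host and installed debug symbols:

# Attach non-interactively, dump all thread stacks to a file, then detach:
gdb -p "$(pidof ceph-osd)" -batch -ex 'thread apply all bt' > osd-stacks.txt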

Actions #2

Updated by Tang Jin almost 7 years ago

Actions #3

Updated by Sage Weil almost 7 years ago

  • Subject changed from some osds are down when create a new pool and a new image of the pool to some osds are down when create a new pool and a new image of the pool (bluestore)
  • Priority changed from High to Urgent
Actions #4

Updated by Tang Jin almost 7 years ago

In this case, the bug occurs when a pool is first removed and then a new pool and image are created.
When it occurs, the most visible symptom is that the rbd command (rbd create image) hangs for a few tens of seconds.
Repeating these operations (first remove the pool, then create a pool and an image) reproduces the problem again; a minimal reproduction sketch follows.
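A shell sketch of that sequence (the pool name, PG count, and image size are illustrative assumptions, not values from this report):

# Remove the existing pool (name given twice plus the safety flag, as ceph requires):
ceph osd pool delete testpool testpool --yes-i-really-really-mean-it
# Recreate the pool and an image in it; this rbd create is the command that hangs:
ceph osd pool create testpool 128
time rbd create testpool/testimage --size 10240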

Actions #5

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) BlueStore added
Actions #6

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Resolved

This looks like rocksdb compaction, probably triggered in part by a big deletion. There was a recent fix to do readahead that vastly improved the IO performance when doing compaction. Can you please try the latest master branch and see if the problem is resolved?
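For reference, RocksDB exposes compaction readahead through its compaction_readahead_size option, and later Ceph builds set it by default inside bluestore_rocksdb_options; whether that is the specific fix Sage refers to is not confirmed here. A way to inspect a running OSD (the osd id is an assumption):

# Inspect the RocksDB options string on osd.0 via the admin socket:
ceph daemon osd.0 config get bluestore_rocksdb_options
# If absent, it can be set in ceph.conf, e.g.:
#   [osd]
#   bluestore rocksdb options = compaction_readahead_size=2097152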

Actions #7

Updated by Tang Jin almost 7 years ago

@Sage Weil, could you point me to the PR for the readahead fix, please?
