https://tracker.ceph.com/
2017-07-09T17:49:11Z
Ceph
RADOS - Bug #20545: erasure coding = crashes
https://tracker.ceph.com/issues/20545?journal_id=94779
2017-07-09T17:49:11Z
Bob Bobington
<p>Actually, I tested this with filestore on BTRFS as well, and it fails there in the same way. This seems to be an issue with Ceph rather than BlueStore.</p>
RADOS - Bug #20545: erasure coding = crashes
https://tracker.ceph.com/issues/20545?journal_id=94781
2017-07-09T18:27:02Z
Bob Bobington
<p>I ran rados bench on the same cluster and it works fine, so something about my Python code seems to be triggering the crash. Here it is:</p>
<pre><code class="python">
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
block_size = 1024*1024*4
for root, dirnames, filenames in os.walk('/path/to/stuff'):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        key = os.path.relpath(fullpath, '/path/to/stuff')
        with open(fullpath, 'r') as infile:
            offset = 0
            while True:
                data = infile.read(block_size)
                ioctx.aio_write(key, data, offset=offset)
                if len(data) < block_size:
                    break
                offset += block_size
</code></pre>
RADOS - Bug #20545: erasure coding = crashes
https://tracker.ceph.com/issues/20545?journal_id=94782
2017-07-09T18:34:40Z
Bob Bobington
<p>Sorry, I forgot a line of the code. Here's the exact process I'm using to do this:</p>
<p>Shell:</p>
<pre><code class="text">
# wipe, prepare, and activate four OSD disks (fish-style loop)
for x in k l g h
    sudo wipefs -a /dev/sd$x
    sudo ceph-disk prepare --fs-type btrfs --dmcrypt /dev/sd$x
    sudo ceph-disk activate /dev/sd"$x"1
end
# create a k=2/m=1 erasure-code profile and an EC pool using it
ceph osd erasure-code-profile set myprofile k=2 m=1 ruleset-failure-domain=osd
ceph osd pool create pool 256 256 erasure myprofile
</code></pre>
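As an aside for readers: with this profile (k=2, m=1), each object is split into two data chunks plus one coding chunk, and with a single coding chunk the parity is effectively a bytewise XOR of the data chunks. A minimal sketch of that math (an illustration only, not librados or jerasure code):

```python
from functools import reduce

def encode(data: bytes, k: int = 2):
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    chunk_len = (len(data) + k - 1) // k
    padded = data.ljust(k * chunk_len, b"\x00")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
    return chunks + [parity]

def recover(chunks, lost: int):
    """Rebuild any one lost chunk (data or parity) by XOR-ing the survivors."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
```

This is why m=1 tolerates the loss of exactly one OSD per placement group: any single missing chunk is recoverable from the other two.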
<p>Python:</p>
<pre><code class="python">
import os  # needed for os.walk and os.path below
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('pool')
block_size = 1024*1024*4
for root, dirnames, filenames in os.walk('/path/to/stuff'):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        key = os.path.relpath(fullpath, '/path/to/stuff')
        with open(fullpath, 'r') as infile:
            offset = 0
            while True:
                data = infile.read(block_size)
                # aio_write queues the write; nothing here waits for completion
                ioctx.aio_write(key, data, offset=offset)
                if len(data) < block_size:
                    break
                offset += block_size
</code></pre>
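Worth noting: the loop above queues writes with `aio_write` but never waits on their completions, so the amount of in-flight IO is unbounded. One way to cap it is a semaphore released from the completion callback. A sketch under assumptions: `do_async_write` below is a stand-in for `ioctx.aio_write` (whose completion hook is its `oncomplete` argument in python-rados), simulated here with threads so the pattern is self-contained:

```python
import threading

MAX_IN_FLIGHT = 8            # illustrative cap on queued writes
inflight = threading.Semaphore(MAX_IN_FLIGHT)
pending = []                 # completion threads, so we can drain at the end

def do_async_write(key, data, offset, oncomplete):
    # Stand-in for ioctx.aio_write(key, data, offset=offset,
    # oncomplete=...): fire the completion callback on a background
    # thread, roughly as librados would.
    t = threading.Thread(target=oncomplete)
    t.start()
    pending.append(t)

def write_bounded(key, data, offset):
    inflight.acquire()       # block while MAX_IN_FLIGHT writes are queued
    do_async_write(key, data, offset, oncomplete=inflight.release)

for i in range(100):
    write_bounded("obj", b"x" * 4096, i * 4096)
for t in pending:            # drain, analogous to waiting on completions
    t.join()
```

With the real binding, `oncomplete` receives the completion object as an argument, so the callback would be e.g. `lambda comp: inflight.release()`.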
RADOS - Bug #20545: erasure coding = crashes
https://tracker.ceph.com/issues/20545?journal_id=94990
2017-07-12T15:32:04Z
Josh Durgin
<p>From the log the backtrace is:<br /><pre>
-- Logs begin at Thu 2017-07-06 16:44:09 PDT, end at Fri 2017-07-07 00:10:13 PDT. --
Jul 07 00:03:36 hostname ceph-osd[14090]: *** Caught signal (Aborted) **
Jul 07 00:03:36 hostname ceph-osd[14090]: in thread 7fa734b32700 thread_name:tp_osd_tp
Jul 07 00:03:36 hostname ceph-osd[14090]: ceph version 12.1.0 (262617c9f16c55e863693258061c5b25dea5b086) luminous (dev)
Jul 07 00:03:36 hostname ceph-osd[14090]: 1: (()+0x95cd16) [0x55eeb36d4d16]
Jul 07 00:03:36 hostname ceph-osd[14090]: 2: (()+0x11940) [0x7fa7509ca940]
Jul 07 00:03:36 hostname ceph-osd[14090]: 3: (pthread_cond_wait()+0x1fd) [0x7fa7509c639d]
Jul 07 00:03:36 hostname ceph-osd[14090]: 4: (Throttle::_wait(long)+0x33e) [0x55eeb370f38e]
Jul 07 00:03:36 hostname ceph-osd[14090]: 5: (Throttle::get(long, long)+0x2a2) [0x55eeb3710102]
Jul 07 00:03:36 hostname ceph-osd[14090]: 6: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0xde4) [0x55eeb35dc684]
Jul 07 00:03:36 hostname ceph-osd[14090]: 7: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x68) [0x55eeb332c2c8]
Jul 07 00:03:36 hostname ceph-osd[14090]: 8: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x960) [0x55eeb3455ca0]
Jul 07 00:03:36 hostname ceph-osd[14090]: 9: (ECBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x321) [0x55eeb346dd21]
Jul 07 00:03:36 hostname ceph-osd[14090]: 10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x64a) [0x55eeb32d437a]
Jul 07 00:03:36 hostname ceph-osd[14090]: 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1bc) [0x55eeb317080c]
Jul 07 00:03:36 hostname ceph-osd[14090]: 12: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x55eeb3170c3a]
Jul 07 00:03:36 hostname ceph-osd[14090]: 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1ae7) [0x55eeb3194037]
Jul 07 00:03:36 hostname ceph-osd[14090]: 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x672) [0x55eeb371e3d2]
Jul 07 00:03:36 hostname ceph-osd[14090]: 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55eeb3721b70]
Jul 07 00:03:36 hostname ceph-osd[14090]: 16: (()+0x7297) [0x7fa7509c0297]
Jul 07 00:03:36 hostname ceph-osd[14090]: 17: (clone()+0x3f) [0x7fa74fe501ef]
</pre></p>
RADOS - Bug #20545: erasure coding = crashes
https://tracker.ceph.com/issues/20545?journal_id=94991
2017-07-12T15:33:39Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>RADOS</i></li><li><strong>Subject</strong> changed from <i>Bluestore + erasure coding = crashes</i> to <i>erasure coding = crashes</i></li><li><strong>Category</strong> deleted (<del><i>OSD</i></del>)</li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul><p>So it looks like you're just killing the cluster by flooding it with unbounded IO: the script queues async writes without ever waiting for completions. The crash is distressing, though.</p>
RADOS - Bug #20545: erasure coding = crashes
https://tracker.ceph.com/issues/20545?journal_id=95656
2017-07-20T23:47:05Z
Daniel Oliveira
doliveira@suse.com
<p>Trying to reproduce this issue in my lab.</p>
RADOS - Bug #20545: erasure coding = crashes
https://tracker.ceph.com/issues/20545?journal_id=96350
2017-08-02T15:11:56Z
Sage Weil
sage@newdream.net
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-7 priority-highest closed" href="/issues/20295">Bug #20295</a>: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites</i> added</li></ul>
RADOS - Bug #20545: erasure coding = crashes
https://tracker.ceph.com/issues/20545?journal_id=96352
2017-08-02T15:12:20Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Duplicate</i></li></ul><p>I think this is the same as <a class="issue tracker-1 status-3 priority-7 priority-highest closed" title="Bug: bluestore: Timeout in tp_osd_tp threads when running RBD bench in EC pool w/ overwrites (Resolved)" href="https://tracker.ceph.com/issues/20295">#20295</a>, which we can now reproduce.</p>