<p><strong>Ceph bluestore - Bug #37839: Compression not working, and when applied OSD disks are failing randomly</strong><br /><a class="external" href="https://tracker.ceph.com/issues/37839">https://tracker.ceph.com/issues/37839</a></p>
<p><strong>Igor Fedotov</strong> (2019-01-09T13:01:45Z):</p>
<p>The attached OSD log seems to be broken; could you please upload it (or any other valid one) again?</p>
<p><strong>Greg Smith</strong> (2019-01-09T14:19:51Z):</p>
<ul><li><strong>File</strong> <a href="/attachments/download/3887/ceph-osd.06.log.gz">ceph-osd.06.log.gz</a> added</li></ul>
<p><strong>Igor Fedotov</strong> (2019-01-09T14:38:01Z):</p>
<p>From the mon log I can see an osd.6 connection termination report at 06:42:34:</p>
<pre>2019-01-09 06:42:34.122505 mon.mon01 mon.0 192.168.31.21:6789/0 474393 : cluster [INF] osd.6 failed (root=default,host=osd1202) (connection refused reported by osd.2)</pre>
<p>and a boot at 06:43:06:</p>
<pre>2019-01-09 06:43:06.284262 mon.mon01 mon.0 192.168.31.21:6789/0 474641 : cluster [INF] osd.6 192.168.31.202:6812/3044608 boot</pre>
<p>The latest OSD log starts at 06:41:53 and ends at 06:42:28, so it does not cover the failure period above:</p>
<pre>2019-01-09 06:41:53.009 7fb22a897700 1 -- 192.168.31.202:6813/3038018 &lt;== osd.0 192.168.31.201:6805/3028713 16643 ==== MOSDECSubOpWrite(1.79s1 2348/2338 ECSubWrite(tid=31034, reqid=client.21117692.0:3534428, at_version=2348'29537, trim_to=2152'26500, roll_forward_to=2348'29536)) v2 ==== 2099044+0+0 (2979773302 0 0) 0x561f1f680000 con 0x561f1790b500
...
2019-01-09 06:42:28.935 7fb20c621700 20 osd.6 pg_epoch: 2357 pg[1.1c6s2( v 2357'29190 (2152'26100,2357'29190] local-lis/les=2355/2356 n=4678 ec=78/78 lis/c 2355/2334 les/c/f 2356/2343/0 2355/2355/2332) [10,2,6,14]p10(0) r=2 lpr=2355 pi=[2334,2355)/1 luod=0'0 crt=2357'29190 active mbc={}] rollforward: entry=2357'29190 (0'0) modify 1:63d59ef3:::1000003fced.00000000:head by client.21117692.0:3534807 2019-01-09 06:41:55.797065 0</pre>
<p>Could you please share the OSD log for that "broken state" period (06:42:34 - 06:43:06)?</p>
<p><strong>Igor Fedotov</strong> (2019-01-09T14:41:06Z):</p>
<p>It would be great if you could collect a "broken state" OSD log with debug bluestore set to 20.</p>
<p><strong>Greg Smith</strong> (2019-01-09T15:16:09Z):</p>
<ul><li><strong>File</strong> <a href="/attachments/download/3888/ceph-osd.06.log.bz2">ceph-osd.06.log.bz2</a> added</li></ul>
<p><strong>Igor Fedotov</strong> (2019-01-09T15:32:23Z):</p>
<p>Great, thanks! Do you remember which compression method was configured in this specific case?</p>
<p><strong>Greg Smith</strong> (2019-01-09T15:34:22Z):</p>
<p>Aggressive, on the pool alone: ceph osd pool set cephfs_data compression_mode aggressive</p>
<p><strong>Igor Fedotov</strong> (2019-01-09T15:36:33Z):</p>
<p>Sorry, I meant: which compression algorithm?</p>
<p><strong>Igor Fedotov</strong> (2019-01-09T15:51:10Z):</p>
<p>I bet it's lz4, not snappy.<br />Could you please switch everything to snappy and check whether the issue is still present?</p>
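<p>For reference, the pool-level switch to snappy would look like this (using the cephfs_data pool named earlier; these commands need a live cluster and are shown as an illustration only):</p>

```shell
# switch the pool's compression algorithm from lz4 to snappy
ceph osd pool set cephfs_data compression_algorithm snappy

# confirm the change took effect
ceph osd pool get cephfs_data compression_algorithm
```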
<p><strong>Igor Fedotov</strong> (2019-01-09T21:00:10Z):</p>
<ul><li><strong>Assignee</strong> set to <i>Igor Fedotov</i></li></ul>
<p><strong>Igor Fedotov</strong> (2019-01-09T21:00:43Z):</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>12</i></li></ul>
<p><strong>Greg Smith</strong> (2019-01-10T06:03:53Z):</p>
<p>In this specific case it's lz4, but it also happened with snappy.</p>
<p><strong>Igor Fedotov</strong> (2019-01-10T08:52:22Z):</p>
<p>OSD.6 asserts on a non-zero result returned from the compress method:</p>
<pre>2019-01-09 06:42:33.599 7fb20e625700 -1 /build/ceph-13.2.2/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7fb20e625700 time 2019-01-09 06:42:33.598421
/build/ceph-13.2.2/src/os/bluestore/BlueStore.cc: 10559: FAILED assert(r == 0)</pre>
<p>The Snappy compressor plugin (the compress() method in SnappyCompressor.h) never returns a non-zero error code, while the LZ4 one (LZ4Compressor.h) does.<br />Hence Snappy cannot trigger the above assertion.<br />So either you are observing two different issues, or the earlier comment about broken cases with Snappy is incorrect.</p>
<p>Could you please double-check the Snappy compressor case (perhaps by disabling compression on all OSDs except a specific one with snappy enabled) and collect the corresponding logs for any failures on that OSD?</p>
<p>Meanwhile I'm going to fix the error-code processing of the compress call that triggers the assertion above.</p>
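<p>For illustration, a minimal sketch of the idea behind such a fix: fall back to storing the data uncompressed when the compressor reports an error, instead of asserting. This is not the actual BlueStore code (the real change is tracked in the pull request attached to this ticket); all type and function names below are simplified assumptions.</p>

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for Ceph's compressor plugins (hypothetical names and
// signatures, not the real interface): snappy's compress() always returns 0,
// while the LZ4 plugin can return a non-zero error code.
struct Compressor {
  virtual int compress(const std::string& in, std::string& out) = 0;
  virtual ~Compressor() = default;
};

struct SnappyLike : Compressor {        // never reports failure
  int compress(const std::string& in, std::string& out) override {
    out = in;                           // pretend-compress
    return 0;
  }
};

struct Lz4Like : Compressor {           // may report failure
  int compress(const std::string& in, std::string& out) override {
    if (in.size() < 8)                  // simulated error on tiny input
      return -1;
    out = in;                           // pretend-compress
    return 0;
  }
};

// Before the fix the write path effectively did assert(r == 0) and aborted
// the OSD; the graceful alternative is to store the data uncompressed when
// the compressor reports an error.
bool do_alloc_write(Compressor& c, const std::string& data,
                    std::string& blob, bool& compressed) {
  std::string tmp;
  int r = c.compress(data, tmp);
  if (r == 0) {
    blob = std::move(tmp);
    compressed = true;
  } else {
    blob = data;                        // keep the write going, uncompressed
    compressed = false;
  }
  return true;
}
```

<p>With such a fallback, an input-dependent compressor error degrades a single write to uncompressed storage instead of killing the OSD.</p>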
<p><strong>Greg Smith</strong> (2019-01-10T09:07:22Z):</p>
<p>Hi Igor,<br />I meant that we had the same issue with snappy.<br />We'll try to reproduce it and send logs.<br />Thank you.</p>
<p><strong>Igor Fedotov</strong> (2019-01-10T10:37:14Z):</p>
<ul><li><strong>Pull request ID</strong> set to <i>25891</i></li></ul>
<p><strong>Greg Smith</strong> (2019-01-10T11:15:27Z):</p>
<p>Hi Igor,<br />A question: what is the best implementation of compression, or what is your recommendation, when the algorithm is set to snappy (as you recommended earlier)?<br />ceph osd pool set cephfs_data compression_mode force/aggressive<br />together with<br />bluestore_compression_mode = force/aggressive<br />or only one of them (and if so, which one)?</p>
<p><strong>Igor Fedotov</strong> (2019-01-10T12:24:56Z):</p>
<p>I suggest altering only the OSD-specific compression settings, not the pool ones, i.e. the bluestore_compression_mode and bluestore_compression_algorithm parameters.</p>
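<p>As a sketch, those per-OSD settings would go into the [osd] section of ceph.conf (the values here follow the snappy/aggressive suggestions earlier in this thread; adjust as needed):</p>

```ini
[osd]
bluestore_compression_mode = aggressive
bluestore_compression_algorithm = snappy
```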
<p><strong>Kefu Chai</strong> (2019-01-18T14:42:57Z):</p>
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> set to <i>mimic, luminous</i></li></ul>
<p><strong>Nathan Cutler</strong> (2019-01-21T09:14:54Z):</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/37990">Backport #37990</a>: mimic: Compression not working, and when applied OSD disks are failing randomly</i> added</li></ul>
<p><strong>Nathan Cutler</strong> (2019-01-21T09:15:01Z):</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/37991">Backport #37991</a>: luminous: Compression not working, and when applied OSD disks are failing randomly</i> added</li></ul>
<p><strong>Nathan Cutler</strong> (2019-01-22T12:38:17Z):</p>
<p>@Igor - the command for creating backport issues is src/script/backport-create-issue (see the comments at the top of the script for how to get it working in your environment).</p>
<p>It takes a single argument: the number of the master issue. The master issue must already be in "Pending Backport" status.</p>
<p><strong>Igor Fedotov</strong> (2019-01-22T13:52:51Z):</p>
<ul><li><strong>Private</strong> changed from <i>No</i> to <i>Yes</i></li></ul><p>@Nathan - good to know, thanks.</p>
<p><strong>Igor Fedotov</strong> (2019-01-22T13:53:01Z):</p>
<ul><li><strong>Private</strong> changed from <i>Yes</i> to <i>No</i></li></ul>
<p><strong>Yuri Weinstein</strong> (2019-02-15T22:38:22Z):</p>
<p>merged <a class="external" href="https://github.com/ceph/ceph/pull/26342">https://github.com/ceph/ceph/pull/26342</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/26544">https://github.com/ceph/ceph/pull/26544</a></p>
<p><strong>Igor Fedotov</strong> (2019-04-08T19:23:50Z):</p>
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>