Ceph CephFS - Bug #41204: CephFS pool usage 3x above expected value and sparse journal dumps
https://tracker.ceph.com/issues/41204

Updated by Anonymous on 2019-08-12 13:34:49 UTC
referenced: https://tracker.ceph.com/issues/41026
Updated by Anonymous on 2019-08-12 13:35:57 UTC

I had the same issue before our MDS and mons died. The journal was producing two files a few TB in size and the metadata pool was about 140 GiB.
Updated by Patrick Donnelly (pdonnell@redhat.com) on 2019-08-12 15:42:46 UTC

- Target version set to v15.0.0
- Start date deleted (08/12/2019)
- Source set to Community (user)
- ceph-qa-suite deleted (fs)
- Component(FS) deleted (MDS, libcephfs)

Each file has an object in the default data pool (the data pool used at file system creation time) with an extended attribute (xattr) storing the backtrace information. This xattr is not erasure coded (it's replicated) and may be the cause of your significant pool usage if you're dealing with so many small files. The overhead of replicated xattrs doesn't really account (I think) for the amount you're seeing, though.
See also: https://docs.ceph.com/docs/master/cephfs/createfs/#creating-pools
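For anyone who wants to look at this backtrace xattr directly, a minimal sketch follows (not part of the comment above); the pool name, file path, and temp-file location are assumptions, and it needs the rados and ceph-dencoder tools available on a client with sufficient caps:

```python
#!/usr/bin/env python3
# Sketch: decode the "parent" (backtrace) xattr that CephFS stores on the first
# RADOS object of a file in the default data pool. Pool name and file path are
# placeholders -- adjust for your cluster.
import os
import subprocess

DEFAULT_DATA_POOL = "cephfs.storage.data"    # assumed name of the first/default data pool
path = "/mnt/cephfs/some/small/file"         # hypothetical file on a mounted CephFS

ino = os.stat(path).st_ino                   # CephFS exposes the real inode number via stat
first_obj = f"{ino:x}.00000000"              # data objects are named <hex inode>.<block index>

# Fetch the raw xattr with the rados CLI and decode it with ceph-dencoder.
raw = subprocess.run(
    ["rados", "-p", DEFAULT_DATA_POOL, "getxattr", first_obj, "parent"],
    check=True, capture_output=True).stdout
with open("/tmp/backtrace.bin", "wb") as f:
    f.write(raw)
decoded = subprocess.run(
    ["ceph-dencoder", "type", "inode_backtrace_t",
     "import", "/tmp/backtrace.bin", "decode", "dump_json"],
    check=True, capture_output=True, text=True).stdout
print(decoded)
```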
I'm sorry you're stumbling across this. I'll get back to you with what the Ceph team thinks.
Updated by Janek Bevendorff on 2019-08-13 20:01:07 UTC

I tried again, this time with a replicated pool and just one MDS. I think it's too early to draw definitive conclusions, but I noticed that as soon as I added additional MDS ranks, the metadata pool size exploded from a couple of hundred MB to 5 GB (with large fluctuations in both directions). When I went back to a single MDS, the size dropped to 250-300 MB.
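A rough sketch of how that behaviour can be observed (not taken from the comment above; the file system and pool names are placeholders, and the field names in "ceph df" output vary slightly between releases):

```python
#!/usr/bin/env python3
# Sketch: change the number of active MDS ranks and poll metadata pool usage.
# FS_NAME and META_POOL are assumptions -- substitute your own names.
import json
import subprocess
import time

FS_NAME = "storage"                  # hypothetical file system name
META_POOL = "cephfs.storage.meta"    # hypothetical metadata pool name

def pool_bytes_used(pool_name):
    out = subprocess.run(["ceph", "df", "--format", "json"],
                         check=True, capture_output=True, text=True).stdout
    for pool in json.loads(out)["pools"]:
        if pool["name"] == pool_name:
            stats = pool["stats"]
            # field name differs slightly between releases (bytes_used vs. stored)
            return stats.get("bytes_used", stats.get("stored"))
    raise KeyError(pool_name)

# e.g. go from one to two active ranks, then watch the metadata pool for a while
subprocess.run(["ceph", "fs", "set", FS_NAME, "max_mds", "2"], check=True)
for _ in range(30):
    print(f"{pool_bytes_used(META_POOL) / 2**30:.1f} GiB used in {META_POOL}")
    time.sleep(60)
```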
Updated by Janek Bevendorff on 2019-08-29 09:30:09 UTC

Little status update: our data pool now uses 186 TiB while storing only 53 TiB of actual data at a replication factor of 3. That's about 27 TiB of overhead beyond the expected 53 × 3 = 159 TiB. The metadata pool is at 404 GiB, which also seems massive to me. Meanwhile, the MDS caps out at around 100 ops/s, most likely as a result of the large metadata pool size (it was several thousand in the beginning).
Updated by Igor Fedotov (igor.fedotov@croit.io) on 2019-08-29 11:38:54 UTC

Janek Bevendorff wrote:
> Little status update: our data pool now uses 186 TiB while storing only 53 TiB of actual data at a replication factor of 3. That's about 27 TiB of overhead beyond the expected 159 TiB. The metadata pool is at 404 GiB, which also seems massive to me. Meanwhile, the MDS caps out at around 100 ops/s, most likely as a result of the large metadata pool size (it was several thousand in the beginning).
Given the mentioned "small-sized files", I suspect the wasted space is caused by bluestore's allocation granularity. On spinning drives the default allocation unit (bluestore_min_alloc_size_hdd) is 64K, which means each file/object takes at least 64K of space per replica. So having tons of 4K files can cause massive space waste.
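A back-of-the-envelope sketch of that arithmetic (not from the comment above; the file counts are made up purely for illustration):

```python
#!/usr/bin/env python3
# Sketch: estimate raw space consumption under bluestore's 64K HDD allocation unit.
# Every object -- and every replica of it -- occupies a multiple of min_alloc_size.
# (Files larger than the default 4 MiB CephFS object size split into several
# objects, which only adds further rounding; that detail is ignored here.)
MIN_ALLOC = 64 * 1024      # bluestore_min_alloc_size_hdd default at the time (64 KiB)
REPLICAS = 3

def allocated(file_size, min_alloc=MIN_ALLOC):
    """Bytes consumed per replica, rounded up to the allocation unit."""
    units = max(1, -(-file_size // min_alloc))    # ceiling division
    return units * min_alloc

# hypothetical distribution: (file size in bytes, number of files)
distribution = [(4 * 2**10, 50_000_000), (100 * 2**10, 20_000_000), (10 * 2**20, 1_000_000)]

logical = sum(size * count for size, count in distribution)
on_disk = sum(allocated(size) * count for size, count in distribution) * REPLICAS
print(f"logical data : {logical / 2**40:.2f} TiB")
print(f"raw on disk  : {on_disk / 2**40:.2f} TiB "
      f"(vs {logical * REPLICAS / 2**40:.2f} TiB expected at {REPLICAS}x)")
```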
Some questions to clarify whether this is the case:

1) Is this bluestore?
2) What are the main disk drives behind it, SSDs or spinners?
3) Do you have any idea of the size distribution of these "small-sized" files? E.g. something like:
   10% - less than 1K
   20% - less than 4K
   etc.
4) Can you share performance counter dumps for 2-3 OSDs backing the cephfs.storage.data pool?
Updated by Janek Bevendorff on 2019-08-30 07:47:01 UTC

It's Bluestore on spinning disks. I don't really have an overview of the size distribution; it's very uneven. Perhaps a third of the total size comes from files of a few hundred MB up to a few GB. Then we have millions of smaller files, but I doubt we have many of 10K or below. I would assume most are between 100K and 10M. That's all just a ballpark guess, though; I might be totally wrong about this.
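For what it's worth, a sketch of one way to get a rough size histogram for Igor's question 3 (not part of the original comment); the mount point is a placeholder, and walking hundreds of millions of files this way is slow, so running it on a representative subtree is probably more practical:

```python
#!/usr/bin/env python3
# Sketch: bucket file sizes under a CephFS mount to approximate the distribution.
import os
from collections import Counter

MOUNT = "/mnt/cephfs"    # hypothetical mount point (or a representative subtree)
BUCKETS = [(4 * 2**10, "< 4K"), (64 * 2**10, "< 64K"), (1 * 2**20, "< 1M"),
           (10 * 2**20, "< 10M"), (float("inf"), ">= 10M")]

hist = Counter()
for root, _dirs, files in os.walk(MOUNT):
    for name in files:
        try:
            size = os.lstat(os.path.join(root, name)).st_size
        except OSError:
            continue
        hist[next(label for limit, label in BUCKETS if size < limit)] += 1

total = sum(hist.values()) or 1
for label, count in hist.most_common():
    print(f"{label:>7}: {count} files ({100 * count / total:.1f}%)")
```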
I created two dumps for you:
schema.697: https://pastebin.com/FjBzLG4A
dump.697: https://pastebin.com/HBgX62tP
schema.1061: https://pastebin.com/TnuT3QrK
dump.1061: https://pastebin.com/FaHS3NMr
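For reference, such dumps come from running "ceph daemon osd.<id> perf dump" on the host where that OSD lives. A small sketch (not from the original comment) of pulling the allocation-overhead counters out of one of these JSON files; the counter names are assumed from the bluestore section of a Nautilus-era dump:

```python
#!/usr/bin/env python3
# Sketch: read a "ceph daemon osd.N perf dump" JSON file (e.g. dump.697) and
# compare bluestore_allocated with bluestore_stored to see how much space the
# allocation granularity costs on that OSD.
import json
import sys

with open(sys.argv[1]) as f:             # usage: ./alloc_overhead.py dump.697
    bluestore = json.load(f).get("bluestore", {})

alloc = bluestore.get("bluestore_allocated", 0)
stored = bluestore.get("bluestore_stored", 0)
if stored:
    print(f"allocated: {alloc / 2**30:.1f} GiB, stored: {stored / 2**30:.1f} GiB, "
          f"overhead: {(alloc - stored) / 2**30:.1f} GiB ({alloc / stored:.2f}x)")
else:
    print("no bluestore_stored counter found -- is this a bluestore OSD perf dump?")
```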
Updated by Patrick Donnelly (pdonnell@redhat.com) on 2020-01-17 22:53:51 UTC

- Target version deleted (v15.0.0)