Activity

From 01/10/2018 to 02/08/2018

02/08/2018

03:50 PM Bug #22616: bluestore_cache_data uses too much memory
Ok, I think the thing to do here is make the bluestore trimming a bit more frequent, and have this as a known caveat ... Sage Weil
07:52 AM Bug #22957 (Duplicate): [bluestore]bstore_kv_final thread seems deadlock
ceph 12.2.1
ec overwrite
cephfs performance test
_pool 2 'fs_data' erasure size 3 min_size 3 crush_rule 1 obj...
zhou yang

02/07/2018

07:38 AM Bug #22285 (Resolved): _read_bdev_label unable to decode label at offset
Nathan Cutler
07:38 AM Backport #22892 (Resolved): luminous: _read_bdev_label unable to decode label at offset
Nathan Cutler

02/06/2018

10:42 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Update: Now at least one other host starts giving me these crc errors, too...
So I have now at least two out of th...
Martin Preuss
10:19 PM Backport #22892: luminous: _read_bdev_label unable to decode label at offset
merged https://github.com/ceph/ceph/pull/20326 Yuri Weinstein

02/05/2018

08:41 PM Backport #22892 (In Progress): luminous: _read_bdev_label unable to decode label at offset
Nathan Cutler
06:26 PM Backport #22892: luminous: _read_bdev_label unable to decode label at offset
http://tracker.ceph.com/issues/22892 Abhishek Lekshmanan

02/03/2018

07:24 AM Bug #22535 (Resolved): OSD crushes with FAILED assert(used_blocks.size() > count) during the firs...
Nathan Cutler
07:24 AM Backport #22633 (Resolved): luminous: OSD crushes with FAILED assert(used_blocks.size() > count) ...
Nathan Cutler
07:14 AM Backport #22698 (Resolved): luminous: bluestore: New OSD - Caught signal - bstore_kv_sync
Nathan Cutler
01:08 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I just re-created all 3 OSDs on ceph1 (the host which had the read errors).
Now the errors occur less often, but t...
Martin Preuss

02/02/2018

10:37 PM Backport #22633: luminous: OSD crushes with FAILED assert(used_blocks.size() > count) during the ...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/19888
merged
Yuri Weinstein
10:32 PM Backport #22698: luminous: bluestore: New OSD - Caught signal - bstore_kv_sync
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/19995
merged
Yuri Weinstein
01:29 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
It doesn't seem to happen on all servers, it's only 5 out of 15.
But there is nothing special about the affected ser...
Paul Emmerich
03:08 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I'm also seeing this on one cluster. Bluestore and CephFS, replicated pools, no compression, HDDs.
It happens random...
Paul Emmerich
06:03 AM Bug #21312: occaionsal ObjectStore/StoreTestSpecificAUSize.Many4KWritesTest/2 failure
Nathan Cutler

02/01/2018

11:44 PM Backport #22892 (Resolved): luminous: _read_bdev_label unable to decode label at offset
https://github.com/ceph/ceph/pull/20326 Nathan Cutler
11:31 PM Bug #22161 (Resolved): bluestore: do not crash on over-large objects
Nathan Cutler
11:30 PM Backport #22507 (Resolved): luminous: bluestore: do not crash on over-large objects
Nathan Cutler
11:07 PM Backport #22507: luminous: bluestore: do not crash on over-large objects
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/19630
merged
Yuri Weinstein
04:11 PM Bug #22285: _read_bdev_label unable to decode label at offset
master pr: https://github.com/ceph/ceph/pull/20090 Abhishek Lekshmanan
02:51 PM Bug #22285 (Pending Backport): _read_bdev_label unable to decode label at offset
Sage Weil
12:01 PM Bug #21312: occaionsal ObjectStore/StoreTestSpecificAUSize.Many4KWritesTest/2 failure
https://github.com/ceph/ceph/pull/20230 Igor Fedotov
07:54 AM Bug #20557: segmentation fault with rocksdb|BlueStore and jemalloc
Hi, just wanted to report I'm hitting the same issue on centos 7 with jemalloc-3.6.0-1.el7 and ceph 12.2.2 Nikola Ciprich

01/31/2018

08:23 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I have the same problem on my cluster. Periodically I got pg inconsistent only on bluestore osd with this type of mes... Nicolas Drufin

01/29/2018

10:43 AM Bug #21312: occaionsal ObjectStore/StoreTestSpecificAUSize.Many4KWritesTest/2 failure
actual_allocated_size - expected_allocated_size = 4259840 - 4194304 = 0x10000 Kefu Chai
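The subtraction in the comment above can be checked with a minimal sketch. The two sizes are copied from the comment; the variable `diff` and the KiB/MiB conversions are added here purely for illustration and are not part of the test code.

```python
# Sizes copied from the comment above (in bytes).
actual_allocated_size = 4259840
expected_allocated_size = 4194304

# Over-allocation seen by the test.
diff = actual_allocated_size - expected_allocated_size

print(hex(diff))                                      # difference in hex
print(diff // 1024, "KiB over")                       # same value in KiB
print(expected_allocated_size // (1024 * 1024), "MiB expected")
```

Read this way, the test allocated one extra 64 KiB chunk on top of the expected 4 MiB.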
04:25 AM Bug #21312: occaionsal ObjectStore/StoreTestSpecificAUSize.Many4KWritesTest/2 failure
http://pulpito.ceph.com/yuriw-2018-01-26_18:13:44-rados-wip_yuri_master_1.26.18-distro-basic-smithi/2112995/... Kefu Chai
02:27 AM Bug #22796: bluestore gets to ENOSPC with small devices
David Turner wrote:
> I was able to resolve this issue by using the ceph-objectstore-tool to remove copies of PGs so...
Brad Hubbard

01/28/2018

03:54 PM Bug #22796: bluestore gets to ENOSPC with small devices
I was able to resolve this issue by using the ceph-objectstore-tool to remove copies of PGs so the osds could start. ... David Turner

01/27/2018

05:55 PM Bug #22102 (In Progress): BlueStore crashed on rocksdb checksum mismatch
full logs at 5e38cf1e-532a-4aa4-8289-5b9e9c59632a Sage Weil

01/26/2018

01:27 PM Bug #22796: bluestore gets to ENOSPC with small devices
This might be a red herring. I think Nick Fisk on the ML found the problem. Originally the output of `ceph osd df` s... David Turner
01:24 PM Bug #22796: bluestore gets to ENOSPC with small devices
debug bluestore = 20 log for the same OSD as before.
ceph-post-file: 06b467b7-4a91-4263-85e0-c89268b694e3
David Turner
01:16 PM Bug #22796: bluestore gets to ENOSPC with small devices
Please use ceph-post-file to upload the full logs. Greg Farnum

01/25/2018

02:35 PM Bug #20557: segmentation fault with rocksdb|BlueStore and jemalloc
The arch is x86_64. Ceph was installed from eu.ceph.com deb repo. This issue isn't current for me anymore as the clus... Mikko Tanner
02:24 PM Bug #20557: segmentation fault with rocksdb|BlueStore and jemalloc
Hi Mikko,
What architecture are you running on?
I tried to match your callstacks with binaries for x86_64 for "ceph...
Adam Kupczyk
02:10 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Martin,
For "device location [0x6d76b40000~1000]" it would be:
dd bs=4096 if=/var/lib/ceph/osd/ceph-1/block skip=...
Adam Kupczyk
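One way to turn a logged device location into `dd` arguments is sketched below. It does not reconstruct the truncated command above. The helper name `extent_to_dd` is hypothetical, and the sketch assumes the extent is printed as `0x<offset>~<length>` with both numbers in hexadecimal and both aligned to the 4096-byte block size used in the comment.

```python
import re


def extent_to_dd(extent: str, device: str, block_size: int = 4096) -> str:
    """Build a dd command line that reads one logged extent from a device.

    Assumption: `extent` looks like "[0x6d76b40000~1000]", i.e. a hex byte
    offset, a tilde, and a hex byte length (brackets optional).
    """
    m = re.fullmatch(r"\[?0x([0-9a-fA-F]+)~([0-9a-fA-F]+)\]?", extent.strip())
    if m is None:
        raise ValueError(f"unrecognised extent: {extent!r}")

    offset = int(m.group(1), 16)  # byte offset on the device
    length = int(m.group(2), 16)  # byte length of the extent

    # dd's skip= and count= are expressed in units of bs, so both values
    # must be whole multiples of the block size.
    if offset % block_size or length % block_size:
        raise ValueError("offset and length must be block-aligned")

    skip = offset // block_size
    count = length // block_size
    return f"dd if={device} bs={block_size} skip={skip} count={count}"


if __name__ == "__main__":
    print(extent_to_dd("[0x6d76b40000~1000]", "/var/lib/ceph/osd/ceph-1/block"))
```

The printed command writes the raw block to standard output, so it would normally be piped into a hex viewer or redirected to a file.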
11:11 AM Bug #22796: bluestore gets to ENOSPC with small devices
David Turner wrote:
> Here's a log with `debug bluestore = 5`.
David Turner
11:10 AM Bug #22796: bluestore gets to ENOSPC with small devices
Here's a log with `debug bluestore = 5`. David Turner
11:00 AM Bug #22796: bluestore gets to ENOSPC with small devices
Can you attach logs with lower debug level? E.g. debug bluestore = 5 Igor Fedotov
10:51 AM Bug #22796 (Resolved): bluestore gets to ENOSPC with small devices
I have a 3 node cluster with mon, mds, mgr, and osds all running on each. The steps I've recently performed on my cl... David Turner

01/23/2018

10:24 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Hi,
how do I translate the given location, e.g. to a "dd" argument?
Meanwhile I found out that only the first m...
Martin Preuss
12:23 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Martin, your logs show places where data is located, for example: "device location [0x6d76b40000~1000]".
Is it possi...
Adam Kupczyk
04:05 PM Bug #22285: _read_bdev_label unable to decode label at offset
Alfredo Deza
10:21 AM Backport #22698: luminous: bluestore: New OSD - Caught signal - bstore_kv_sync
@Prashant Please fix the cherry-pick conflict resolution as suggested by Igor in the PR. Nathan Cutler
12:16 AM Bug #22427 (Resolved): osd_fsid does not exist, fsid is generated instead
Sage Weil

01/22/2018

08:17 PM Bug #22427 (Fix Under Review): osd_fsid does not exist, fsid is generated instead
PR at https://github.com/ceph/ceph/pull/20059 Alfredo Deza
03:51 PM Bug #22427 (Triaged): osd_fsid does not exist, fsid is generated instead
Sage Weil
03:53 PM Bug #22510: osd: BlueStore.cc: BlueStore::_balance_bluefs_freespace: assert(0 == "allocate failed...
Sage Weil
03:51 PM Bug #22245 (Need More Info): [segfault] ceph-bluestore-tool bluefs-log-dump
Can you still reproduce this? Do you have (or can you generate) a core file? The log doesn't tell us where it faile... Sage Weil
03:45 PM Bug #22115 (Duplicate): OSD SIGABRT on bluestore_prefer_deferred_size = 104857600: assert(_buffer...
see #21932 Sage Weil
03:43 PM Bug #22543 (Can't reproduce): OSDs can not start after shutdown, killed by OOM killer during PGs ...
Sage Weil
03:40 PM Bug #22066 (Duplicate): bluestore osd asserts repeatedly with ceph-12.2.1/src/include/buffer.h: 8...
see #21932, pending backport, should be in 12.2.3 Sage Weil
03:34 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Martin, can you check your dmesg/kernel log and see if there are any media errors? The crc value is for a single blo... Sage Weil
03:16 PM Backport #22264 (Resolved): luminous: bluestore: db.slow used when db is not full
Igor Fedotov
03:02 PM Backport #22264: luminous: bluestore: db.slow used when db is not full
luminous cherry-pick is merged. Sage Weil
06:00 AM Bug #22616: bluestore_cache_data uses too much memory
I ran some tests with bluestore_default_buffered_read = false
The bluestore_cache_data now only uses around a fe...
frank lin

01/19/2018

07:12 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Just an update:
"ceph pg repair x.yz"
changes the ceph status from HEALTH_ERR to HEALTH_OK (I have to do that e...
Martin Preuss
03:28 PM Bug #22616: bluestore_cache_data uses too much memory
Sage Weil wrote:
> Two things to try:
>
> - bluestore_default_buffered_read = false should make the problem go aw...
frank lin
03:03 PM Bug #22616: bluestore_cache_data uses too much memory
Two things to try:
- bluestore_default_buffered_read = false should make the problem go away, but is more of a wor...
Sage Weil
03:12 PM Bug #22534: Debian's bluestore *rocksdb* does not support neither fast CRC nor compression
1. Characteristics of build machine MUST NOT affect builds. i.e. we should strictly override ./configure options whic... Марк Коренберг
03:07 PM Bug #22534 (Need More Info): Debian's bluestore *rocksdb* does not support neither fast CRC nor c...
My guess is that the build machine or VM that debian used for the package was old and didn't have sse instructions? Sage Weil
01:04 AM Bug #22678: block checksum mismatch from rocksdb
I thought part of the issue might be the old firmware on the 3 x LSI SAS9201-8i controller cards. So I upgraded them ... Mike O'Connor

01/18/2018

11:03 PM Bug #22467 (Can't reproduce): osd boot has stuck for 10min because of clear_temp_object
25,000 is way too many PGs for one osd. I suspect the problem is that the cache for leveldb or rocksdb is way too sma... Sage Weil
11:01 PM Bug #21556 (Can't reproduce): luminous bluestore OSDs do not recover after out of memory
closing this, please reopen if you have more info! Sage Weil
03:26 PM Bug #22061 (Resolved): Bluestore: OSD killed due to high RAM usage
fixed in 12.2.2 Sage Weil
03:26 PM Bug #22540 (Won't Fix): bluestore crush when deleting pool
This is the jewel bluestore, which is experimental and very different from the luminous version. Sage Weil
03:24 PM Bug #22044 (Need More Info): rocksdb log replay - corruption: missing start of fragmented record
Can you share a bit about how you reproduced this?
Our test suite is doing failure injection at the block layer th...
Sage Weil
07:07 AM Bug #22115 (Need More Info): OSD SIGABRT on bluestore_prefer_deferred_size = 104857600: assert(_b...
Shinobu Kinjo
03:14 AM Backport #22698 (In Progress): luminous: bluestore: New OSD - Caught signal - bstore_kv_sync
https://github.com/ceph/ceph/pull/19995 Prashant D

01/17/2018

08:16 PM Bug #22678: block checksum mismatch from rocksdb
I'm currently backing up all the data on both CephFS and RBD so that if needed I can wipe the configuration and start... Mike O'Connor
08:13 PM Bug #22678: block checksum mismatch from rocksdb
I'm able to create a crash easily by just copying files into CephFS, but I was able to cause the crash with just RBD... Mike O'Connor
02:51 PM Bug #22678 (Duplicate): block checksum mismatch from rocksdb
Sage Weil
02:51 PM Bug #22678: block checksum mismatch from rocksdb
oh, see #22102.
was the workload cephfs?
Sage Weil
02:50 PM Bug #22678 (Need More Info): block checksum mismatch from rocksdb
Could it be that the device has an actual media error? Can you check dmesg for errors?
Sage Weil

01/16/2018

08:17 AM Backport #22698 (Resolved): luminous: bluestore: New OSD - Caught signal - bstore_kv_sync
https://github.com/ceph/ceph/pull/19995 Nathan Cutler
07:30 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Sage Weil wrote:
> Have you seen any other instances of this? this is the first time i've heard of this particular ...
Mike O'Connor

01/15/2018

07:21 AM Bug #22678 (Duplicate): block checksum mismatch from rocksdb
Hi
There seems to be a crash bug in the Luminous OSD code which causes OSDs to crash....
Mike O'Connor
02:50 AM Bug #20236: bluestore: ObjectStore/StoreTestSpecificAUSize.Many4KWritesNoCSumTest/2 failure
/a//kchai-2018-01-11_06:11:31-rados-wip-kefu-testing-2018-01-11-1036-distro-basic-mira/2058374/teuthology.log
<pre...
Kefu Chai

01/11/2018

07:35 AM Bug #22616: bluestore_cache_data uses too much memory
One more fact of my test to add.
I have 48 osd for the test and there were only a few of the osd's bluestore_cache_d...
frank lin
12:52 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
I seem to be getting something like this also; it mostly happens when the system is under write load. I have created the ... Mike O'Connor

01/10/2018

09:38 PM Bug #22609: thrash-eio + bluestore fails with "reached maximum tries (3650) after waiting for 219...
Is this bluestore not handling out of space conditions well? Josh Durgin
04:02 PM Bug #22616: bluestore_cache_data uses too much memory
Sage Weil wrote:
> Writes that are in flight to disk show up under bluestore_cache_data, so even if it is not *cachi...
frank lin
02:53 PM Bug #22616 (Need More Info): bluestore_cache_data uses too much memory
Writes that are in flight to disk show up under bluestore_cache_data, so even if it is not *caching* anything you'll ... Sage Weil
03:03 AM Bug #22616: bluestore_cache_data uses too much memory

The workload for the read throughput test is 6 fio servers with the following parameters:
[4m-seq]
description="4m-seq...
frank lin
02:24 AM Backport #22633 (In Progress): luminous: OSD crushes with FAILED assert(used_blocks.size() > coun...
https://github.com/ceph/ceph/pull/19888 Prashant D
 
