Activity

From 02/04/2018 to 03/05/2018

03/05/2018

09:47 PM Bug #22534 (In Progress): Debian's bluestore *rocksdb* supports neither fast CRC nor comp...
The build args are all coming from ceph.spec.in or debian/rules, and should match up with the builds you see in shama... Sage Weil
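For readers following along, a rough way to cross-check this (commands illustrative; run from a Ceph source tree):

    # see which flags the packaging passes to the rocksdb build
    grep -n -i rocksdb debian/rules ceph.spec.in
    # fast CRC32 in RocksDB needs SSE4.2; check the host CPU supports it
    grep -m1 -o sse4_2 /proc/cpuinfo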
08:15 PM Bug #22534: Debian's bluestore *rocksdb* supports neither fast CRC nor compression
It really looks like our official packages don't provide the FastCRC32 support in RocksDB. The report from verification is... Radoslaw Zarzynski
06:29 AM Bug #22534: Debian's bluestore *rocksdb* supports neither fast CRC nor compression
Yes, downloaded from download.ceph.com Марк Коренберг
06:37 PM Bug #22977: High CPU load caused by operations on onode_map
Thank you!
I'll run this on one OSD and report back tomorrow as it takes some time for the problem to appear.
B...
Paul Emmerich
09:35 AM Bug #22977: High CPU load caused by operations on onode_map
Hi Paul,
I made a development build, based on 12.2.2.
It will dump stats from onode_map, a container we suspect t...
Adam Kupczyk
05:03 PM Backport #23226 (Resolved): luminous: bluestore_cache_data uses too much memory
https://github.com/ceph/ceph/pull/21059 Nathan Cutler
03:51 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I think I'm having a similar problem in my 3-node Ceph cluster.
It's installed on Proxmox nodes, each with 3x1TB HDD ...
Marco Baldini

03/04/2018

04:15 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Used the wrong OSDs for the previous post (these servers mostly had the "head candidate had a read error" scrub error... Paul Emmerich

03/03/2018

08:14 PM Bug #23206: ceph-osd daemon crashes - *** Caught signal (Aborted) **
This is a repeating/ongoing issue; please tell me what I can do to help investigate it. Anonymous
08:13 PM Bug #23206 (Rejected): ceph-osd daemon crashes - *** Caught signal (Aborted) **
Upfront: sorry for the title, I don't know a better one.
One of our OSDs on a machine is constantly up/down due to crashin...
Anonymous
02:55 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
TL;DR: retrying the read works.
I've also been running one server with that patch for a few days and got a log for...
Paul Emmerich
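A minimal sketch of the retry idea confirmed above (this is not Adam's actual debug patch; read_block() and crc32c_of() are hypothetical stand-ins):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> read_block(uint64_t off, size_t len);   // hypothetical raw read
    uint32_t crc32c_of(const std::vector<uint8_t>& buf);         // hypothetical checksum

    // the crc32c reported in every one of these errors; matches an all-zero block
    constexpr uint32_t ZERO_BLOCK_CRC = 0x6706be76;

    bool read_with_retry(uint64_t off, size_t len, uint32_t expected_crc,
                         std::vector<uint8_t>& out, int max_retries = 3) {
      for (int attempt = 0; attempt <= max_retries; ++attempt) {
        out = read_block(off, len);
        uint32_t crc = crc32c_of(out);
        if (crc == expected_crc)
          return true;               // a re-read returned the correct data
        if (crc != ZERO_BLOCK_CRC)
          break;                     // a genuine mismatch, not the zero-read pattern
        // got zeros where data was expected: retry, which per this report succeeds
      }
      return false;
    }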

03/02/2018

04:34 AM Backport #23173 (In Progress): luminous: BlueFS reports rotational journals if BDEV_WAL is not set
Nathan Cutler
02:45 AM Bug #22957: [bluestore] bstore_kv_final thread seems to deadlock
Adam Kupczyk wrote:
> Hi Zhou,
> 1) Could you next time attach with gdb and "bt" of threads bstore_kv_final and fi...
zhou yang
02:28 AM Bug #22957: [bluestore] bstore_kv_final thread seems to deadlock
Sage Weil wrote:
> I'm pretty sure this is #21470, fixed in 12.2.2. Please upgrade!
Thanks a lot, I will upgrade...
zhou yang
12:22 AM Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim
And another slightly different log:... Paul Emmerich
12:13 AM Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim
I'm also seeing this on a completely unrelated 12.2.4 cluster. Only thing they have in common is: they use erasure co... Paul Emmerich

03/01/2018

05:48 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
I seem to be getting something like this as well. Knocked out 8 OSDs in our cluster across multiple hosts. We were ... rory shcramm
03:42 PM Bug #22534: Debian's bluestore *rocksdb* supports neither fast CRC nor compression
Марк Коренберг wrote:
> I use official .deb packages, and 12.2.1 exactly. (maybe I tested on 12.2.2, I'm not sure)
...
Sage Weil

02/28/2018

07:37 PM Backport #23173: luminous: BlueFS reports rotational journals if BDEV_WAL is not set
https://github.com/ceph/ceph/pull/20651 Greg Farnum
11:20 AM Backport #23173 (Resolved): luminous: BlueFS reports rotational journals if BDEV_WAL is not set
https://github.com/ceph/ceph/pull/20651 Nathan Cutler
03:13 PM Bug #23141: BlueFS reports rotational journals if BDEV_WAL is not set
https://github.com/ceph/ceph/pull/20602 Greg Farnum
01:35 AM Bug #23141 (Pending Backport): BlueFS reports rotational journals if BDEV_WAL is not set
Kefu Chai
02:00 PM Bug #22616 (Pending Backport): bluestore_cache_data uses too much memory
Kefu Chai
08:56 AM Bug #23165: OSD used for Metadata / MDS storage constantly entering heartbeat timeout
After a night of waiting and a few million files written and deleted, the situation has stabilized.
The space-usage...
Oliver Freyermuth

02/27/2018

10:33 PM Bug #23165: OSD used for Metadata / MDS storage constantly entering heartbeat timeout
Ah, and this is Luminous 12.2.4, just upgraded from 12.2.3 a few hours ago. Oliver Freyermuth
10:31 PM Bug #23165: OSD used for Metadata / MDS storage constantly entering heartbeat timeout
It also seems:... Oliver Freyermuth
10:29 PM Bug #23165: OSD used for Metadata / MDS storage constantly entering heartbeat timeout
To clarify, since this was maybe not clear from the ticket text:
Those 4 OSDs were only used for the metadata pool. ...
Oliver Freyermuth
10:28 PM Bug #23165 (Resolved): OSD used for Metadata / MDS storage constantly entering heartbeat timeout
After our stress test creating 100,000,000 small files on cephfs, and finally deleting all those files, now 2 of ... Oliver Freyermuth
08:50 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I recently upgraded one of my test nodes; now two of my three nodes have 16 GB RAM with 4 OSDs (4 TB HDD each), the 3rd... Martin Preuss
05:08 PM Bug #22977: High CPU load caused by operations on onode_map
Sure, thanks for looking into it :) Paul Emmerich

02/26/2018

11:22 PM Bug #23120: OSDs continuously crash during recovery
Actually, looks like your crashing ops are different from mine. I'll just open a new bug. Peter Woodman
11:21 PM Bug #23120: OSDs continuously crash during recovery
Yeah, I've got some of that. Problem is, I'm not seeing debug log messages that should be there based on the failure,... Peter Woodman
09:33 PM Bug #23120: OSDs continuously crash during recovery
The bad news (for the ticket) is that the problem vanished after restarting all crashing OSDs often enough,
and temp...
Oliver Freyermuth
09:03 PM Bug #23120 (Need More Info): OSDs continuously crash during recovery
Can you reproduce the crash on one or more OSDs with 'debug osd = 20' and 'debug bluestore = 20'?
Also, can you ch...
Sage Weil
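For reference, Sage's requested debug levels can be applied like this (OSD id illustrative):

    # raise the log levels on a running OSD without a restart
    ceph tell osd.7 injectargs '--debug_osd 20 --debug_bluestore 20'

    # or set them persistently in ceph.conf before restarting the daemon
    [osd]
    debug osd = 20
    debug bluestore = 20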
08:56 AM Bug #23120: OSDs continuously crash during recovery
@Peter Woodman: Since the system recovered after many OSD restarts (see my previous comment) and I did not think abou... Oliver Freyermuth
02:47 AM Bug #23120: OSDs continuously crash during recovery
Hey, I might be seeing the same bug. Can you paste in the operation dump that shows up right before that crash, and m... Peter Woodman
09:41 PM Bug #23141 (Fix Under Review): BlueFS reports rotational journals if BDEV_WAL is not set
Greg Farnum
09:33 PM Bug #23141: BlueFS reports rotational journals if BDEV_WAL is not set
BlueFS::wal_is_rotational() returns true if (!bdev[BDEV_WAL]). Updating this to fall back through BDEV_DB and BDEV_SLOW. Greg Farnum
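A self-contained sketch of the fallback Greg describes (the types here are simplified stand-ins for the BlueFS internals, not the actual patch):

    #include <array>
    #include <memory>

    struct BlockDevice {
      bool rotational = true;
      bool is_rotational() const { return rotational; }
    };

    enum { BDEV_WAL = 0, BDEV_DB = 1, BDEV_SLOW = 2, MAX_BDEV = 3 };

    struct BlueFSSketch {
      std::array<std::unique_ptr<BlockDevice>, MAX_BDEV> bdev;

      // The bug: returning true whenever bdev[BDEV_WAL] is unset. With no
      // separate WAL device the journal actually lives on the DB device
      // (or the slow device), so report that device's flag instead.
      bool wal_is_rotational() const {
        if (bdev[BDEV_WAL])  return bdev[BDEV_WAL]->is_rotational();
        if (bdev[BDEV_DB])   return bdev[BDEV_DB]->is_rotational();
        if (bdev[BDEV_SLOW]) return bdev[BDEV_SLOW]->is_rotational();
        return true;
      }
    };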
09:32 PM Bug #23141 (Resolved): BlueFS reports rotational journals if BDEV_WAL is not set
This came in from two different users on the mailing list (https://www.spinics.net/lists/ceph-users/msg42873.html, ht... Greg Farnum
09:00 PM Bug #22977 (Need More Info): High CPU load caused by operations on onode_map
Sage Weil
09:00 PM Bug #22977: High CPU load caused by operations on onode_map
Hi Paul,
I'm having trouble viewing the perf report files (crashes on my box; doesn't crash but shows nothing on M...
Sage Weil

02/25/2018

11:55 PM Bug #23120: OSDs continuously crash during recovery
All HDD-OSDs have 4 TB, while the SSDs used for the metadata pool have 240 GB. Oliver Freyermuth
11:51 PM Bug #23120: OSDs continuously crash during recovery
Here's a ceph osd tree, by popular request:... Oliver Freyermuth
07:27 PM Bug #23120: OSDs continuously crash during recovery
Cluster has mostly recovered, looks good.
Still, hopefully the stacktrace and logs can help to track down the under...
Oliver Freyermuth
05:26 PM Bug #23120: OSDs continuously crash during recovery
After many restarts of all OSDs, and temporarily lowering min_size, they now stay up. I'll watch and see if the clust... Oliver Freyermuth
05:09 PM Bug #23120: OSDs continuously crash during recovery
Here's another log of another OSD:
7de1dddf-27d4-4b6b-9128-0138bfaf85cf
backtrace looks similar.
Oliver Freyermuth
05:00 PM Bug #23120: OSDs continuously crash during recovery
It might be that this OSD was subject to OOM at some point in the last 24 hours.
It seems OSDs are using 2-3 times ...
Oliver Freyermuth
04:23 PM Bug #23120 (Can't reproduce): OSDs continuously crash during recovery
I have several OSDs continuously crashing during recovery. This is Luminous 12.2.3. ... Oliver Freyermuth

02/23/2018

05:16 AM Backport #23074 (In Progress): luminous: bluestore: statfs available can go negative
https://github.com/ceph/ceph/pull/20554 Prashant D

02/22/2018

11:15 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Hi Martin,
I am not sure yet what causes the problem with the 0x6706be76 crc.
To pinpoint it, I added debug code to close i...
Adam Kupczyk

02/21/2018

09:25 PM Backport #23074 (Resolved): luminous: bluestore: statfs available can go negative
https://github.com/ceph/ceph/pull/20554 Nathan Cutler
04:34 PM Bug #23040 (Pending Backport): bluestore: statfs available can go negative
Sage Weil
11:09 AM Backport #23063 (Resolved): luminous: osd: BlueStore.cc: BlueStore::_balance_bluefs_freespace: as...
https://github.com/ceph/ceph/pull/21394 Nathan Cutler

02/20/2018

06:44 PM Bug #22534: Debian's bluestore *rocksdb* supports neither fast CRC nor compression
I use official .deb packages, and 12.2.1 exactly. (maybe I tested on 12.2.2, I'm not sure) Марк Коренберг
02:33 PM Bug #22534: Debian's bluestore *rocksdb* supports neither fast CRC nor compression
Yeah, I agree. Is there a build log from the Debian build farm?
The packages we build upstream *do* appear to h...
Sage Weil
03:34 PM Bug #21781 (Can't reproduce): bluestore: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixNoCsu...
Sage Weil
03:33 PM Feature #21741 (In Progress): os/bluestore: multi-tier support in BlueStore
Sage Weil
03:32 PM Bug #22044 (Can't reproduce): rocksdb log replay - corruption: missing start of fragmented record
Please let us know and we can reopen if this is still an issue with the latest code. Sage Weil
03:32 PM Bug #21550 (Can't reproduce): PG errors reappearing after OSD node rebooted on Luminous
Sage Weil
03:01 PM Bug #21550: PG errors reappearing after OSD node rebooted on Luminous
Hi Sage,
No, I have not seen the problem on this test cluster since rebuilding it with 12.2.2, and the system has ...
Eric Eastman
03:26 PM Bug #22510 (Pending Backport): osd: BlueStore.cc: BlueStore::_balance_bluefs_freespace: assert(0 ...
https://github.com/ceph/ceph/pull/18494 is the fix in master; should be backported to luminous Sage Weil
03:24 PM Bug #22796: bluestore gets to ENOSPC with small devices
The full checks rely on a (slow) feedback loop. For small devices, it's easy to go faster than the "set the full fla... Sage Weil
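A back-of-the-envelope illustration of that race; all rates here are assumptions, not measurements from this ticket:

    // Illustrative arithmetic only: why the "set the full flag" feedback
    // loop can lose the race on a small device.
    #include <iostream>

    int main() {
      const double device_gib = 22.0;    // small SSD OSD, as reported below
      const double write_mib_s = 400.0;  // assumed sustained client write rate
      const double interval_s = 5.0;     // assumed OSD->mon stats reporting period
      const double per_interval_gib = write_mib_s * interval_s / 1024.0;
      std::cout << per_interval_gib << " GiB written per reporting interval = "
                << 100.0 * per_interval_gib / device_gib << "% of the device\n";
      // ~1.95 GiB, ~8.9%: a few intervals of lag is enough to hit ENOSPC
      return 0;
    }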
03:21 PM Bug #21809 (Can't reproduce): Raw Used space is 70x higher than actually used space (maybe orphan...
Sage Weil
03:12 PM Bug #22957: [bluestore] bstore_kv_final thread seems to deadlock
Hi Zhou,
1) Could you next time attach with gdb and "bt" of threads bstore_kv_final and finisher.
2) Are you worki...
Adam Kupczyk
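For anyone hitting the same hang, the backtraces Adam asks for can be captured roughly like this:

    # assumes a single ceph-osd process on the host; otherwise substitute the pid.
    # Dumps every thread's stack; bstore_kv_final and the finisher threads
    # will be among them.
    gdb -p $(pidof ceph-osd) -batch -ex 'thread apply all bt' > osd-threads.txt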
02:18 PM Bug #22957 (Duplicate): [bluestore] bstore_kv_final thread seems to deadlock
I'm pretty sure this is #21470, fixed in 12.2.2. Please upgrade! Sage Weil
02:35 PM Bug #22616 (Fix Under Review): bluestore_cache_data uses too much memory
https://github.com/ceph/ceph/pull/20498 Sage Weil
02:29 PM Feature #20801 (Rejected): ability to rebuild BlueStore WAL journals is missing
The WAL or journal is an integral part of the store. The data store cannot be reconstructed without it. Sage Weil
02:28 PM Bug #21068 (Won't Fix): ceph-disk deploy bluestore fails to create correct block symlink for mult...
We are focusing on ceph-volume instead of ceph-disk for bluestore support. Sage Weil
02:26 PM Bug #22245 (Can't reproduce): [segfault] ceph-bluestore-tool bluefs-log-dump
Sage Weil
02:26 PM Bug #22609 (Can't reproduce): thrash-eio + bluestore fails with "reached maximum tries (3650) aft...
Haven't seen this failure in quite a while... I think it may be resolved! Reopen if it reappears. Sage Weil
02:24 PM Feature #21306 (Rejected): Reduce RBD filestore/bluestore fragmentation throught fallocate
We already have a function like this for filestore: librbd issues a hint and filestore calls the xfs ioctl to set the d... Sage Weil
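The truncated sentence refers to that hint path; as a hedged sketch (the generic Linux fsxattr ioctl, not filestore's actual code), setting an XFS extent-size hint on a file looks roughly like:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/fs.h>
    #include <cstdio>

    int set_extsize_hint(const char* path, unsigned int bytes) {
      int fd = open(path, O_RDWR);
      if (fd < 0) { perror("open"); return -1; }
      struct fsxattr fsx;
      if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
        perror("FSGETXATTR"); close(fd); return -1;
      }
      fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;  // enable the per-file extent size hint
      fsx.fsx_extsize = bytes;             // e.g. the RBD object size
      int r = ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
      if (r < 0) perror("FSSETXATTR");
      close(fd);
      return r;
    }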
02:21 PM Feature #22159 (In Progress): allow tracking of bluestore compression ratio by pool
see https://github.com/ceph/ceph/pull/19454 Sage Weil
02:20 PM Bug #21040 (Resolved): bluestore: multiple objects (clones?) referencing same blocks (on all repl...
The original bug here is fixed. Meanwhile, Igor is working on a repair function for ceph-bluestore-tool that will co... Sage Weil
02:16 PM Bug #20870 (In Progress): OSD compression: incorrect display of the used disk space
The problem is that currently the RAW USED stat is just USED * (replication or EC factor).
Igor is working on ...
Sage Weil
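A small worked example of the current calculation (numbers assumed):

    // RAW USED is currently just USED times the replication (or EC) factor,
    // so per-OSD effects such as compression are not reflected in it.
    #include <iostream>

    int main() {
      const double used_gib = 100.0;  // logical USED of a pool (assumed)
      std::cout << "size=3 replicated pool: RAW USED = "
                << used_gib * 3.0 << " GiB\n";        // 300 GiB
      std::cout << "4+2 erasure-coded pool: RAW USED = "
                << used_gib * 6.0 / 4.0 << " GiB\n";  // 150 GiB
      return 0;
    }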
01:06 PM Bug #20385 (Won't Fix): jemalloc+Bluestore+BlueFS causes unexpected RSS Memory usage growth
We don't care about jemalloc at this point. Sage Weil

02/19/2018

11:22 PM Bug #20997 (Can't reproduce): bluestore_types.h: 739: FAILED assert(p != extents.end())
Marek, if you still see this (or a related issue) please ping me. I dropped the ball on this bug but we're newly foc... Sage Weil
05:49 PM Bug #21550 (Need More Info): PG errors reappearing after OSD node rebooted on Luminous
Hi Eric,
Do you still see this problem? I haven't seen anything like it so I'm hoping this is an artifact of 12.2.0
Sage Weil
05:47 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Sage Weil
05:47 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Martin Preuss wrote:
> I would like to try with filestore, but since the introduction of this ceph-volume stuff crea...
Sage Weil
04:24 PM Bug #22796: bluestore gets to ENOSPC with small devices
Yes. The 22GB is correct, the 16PB is not. I created a quick set of SSD OSDs to test new crush rules from what the OSD... David Turner
04:10 PM Bug #22796 (Need More Info): bluestore gets to ENOSPC with small devices
... Sage Weil
04:16 PM Bug #23040 (Fix Under Review): bluestore: statfs available can go negative
https://github.com/ceph/ceph/pull/20487 Sage Weil
04:14 PM Bug #23040 (Resolved): bluestore: statfs available can go negative
see https://tracker.ceph.com/issues/22796 Sage Weil
03:41 PM Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim
Sage Weil
09:53 AM Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim
There is another reproduction of this issue at https://tracker.ceph.com/issues/22977 Igor Fedotov
09:52 AM Bug #22977: High CPU load caused by operations on onode_map
The last backtrace seems similar to the one from
https://tracker.ceph.com/issues/21259#change-107555
and not sur...
Igor Fedotov

02/16/2018

11:58 PM Bug #21259: bluestore: segv in BlueStore::TwoQCache::_trim
Having this issue multiple times on ceph version 12.2.2 with this patch https://github.com/ceph/ceph/pull/18805 alrea... Petr Ilyin
05:32 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Hi Paul,
Paul Emmerich wrote:
> Martin, let's compare hardware. The only cluster I'm seeing this on is HP servers...
Martin Preuss
05:24 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Hi,
now I see an inconsistent PG for which both OSDs report that HEAD has a read error, but in this case no read e...
Martin Preuss

02/14/2018

06:20 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Sage Weil wrote:
> If this is something you can reproduce that would be extremely helpful.
We can't reproduce bug...
Artemy Kapitula
06:06 AM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Sage Weil wrote:
> Have you seen this bug occur since you filed the bug? One other user was seeing it but we've bee...
Artemy Kapitula

02/13/2018

06:16 PM Bug #22977: High CPU load caused by operations on onode_map
I've just observed an OSD crashing with the following log.
Feb 13 09:36:03 ceph-XXX-osd-a10 ceph-osd[24106]: *...
Paul Emmerich

02/12/2018

09:25 PM Bug #22977: High CPU load caused by operations on onode_map
32 OSDs, so ~650k objects/EC chunks per OSD. Paul Emmerich
11:51 AM Bug #22977: High CPU load caused by operations on onode_map
The cluster is running with default settings except for osd_max_backfills.
Sorry, I completely forgot about the me...
Paul Emmerich
10:35 AM Bug #22977: High CPU load caused by operations on onode_map
Hi Paul.
Could you please share your Ceph settings?
Also please collect mempool statistics on a saturated OSD using...
Igor Fedotov
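The truncated request most likely means the OSD's admin socket; as an assumption about the elided command, the usual form is:

    # dump mempool statistics (onode counts, cache bytes, ...) from a live OSD
    ceph daemon osd.7 dump_mempools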
07:10 PM Bug #22102 (Need More Info): BlueStore crashed on rocksdb checksum mismatch
Sage Weil
07:10 PM Bug #22102: BlueStore crashed on rocksdb checksum mismatch
Have you seen this bug occur since you filed the bug? One other user was seeing it but we've been able to generate l... Sage Weil
09:25 AM Bug #22978 (Duplicate): assert in Throttle.cc on primary OSD with Bluestore and an erasure coded ...
Looks like a duplicate of http://tracker.ceph.com/issues/22539. Should be fixed in v12.2.3 Igor Fedotov
02:58 AM Bug #21312: occasional ObjectStore/StoreTestSpecificAUSize.Many4KWritesTest/2 failure
... Kefu Chai

02/11/2018

05:27 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Martin, let's compare hardware. The only cluster I'm seeing this on is HP servers with Smart Array controllers and lot... Paul Emmerich

02/10/2018

10:17 PM Bug #22977: High CPU load caused by operations on onode_map
Here's a "perf dump" from an OSD suffering from this.
Potentially relevant onode data that looks similar on all OS...
Paul Emmerich
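For context, a "perf dump" like the one attached comes from the OSD admin socket (OSD id illustrative):

    # collect the OSD's performance counters (includes bluestore onode stats)
    ceph daemon osd.7 perf dump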
12:04 AM Bug #22978 (Duplicate): assert in Throttle.cc on primary OSD with Bluestore and an erasure coded ...
An assert is being hit in Throttle.cc with BlueStore, when a large object is being written on the primary OSD in an e... Subhachandra Chandra

02/09/2018

09:14 PM Bug #22977 (Resolved): High CPU load caused by operations on onode_map
I'm investigating performance on a cluster that shows an unusually high CPU load.
The setup is Bluestore OSDs running m...
Paul Emmerich

02/08/2018

03:50 PM Bug #22616: bluestore_cache_data uses too much memory
OK, I think the thing to do here is to make the bluestore trimming a bit more frequent, and have this as a known caveat ... Sage Weil
07:52 AM Bug #22957 (Duplicate): [bluestore] bstore_kv_final thread seems to deadlock
ceph 12.2.1
ec overwrite
cephfs performance test
pool 2 'fs_data' erasure size 3 min_size 3 crush_rule 1 obj...
zhou yang

02/07/2018

07:38 AM Bug #22285 (Resolved): _read_bdev_label unable to decode label at offset
Nathan Cutler
07:38 AM Backport #22892 (Resolved): luminous: _read_bdev_label unable to decode label at offset
Nathan Cutler

02/06/2018

10:42 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Update: Now at least one other host starts giving me these crc errors, too...
So I have now at least two out of th...
Martin Preuss
10:19 PM Backport #22892: luminous: _read_bdev_label unable to decode label at offset
merged https://github.com/ceph/ceph/pull/20326 Yuri Weinstein

02/05/2018

08:41 PM Backport #22892 (In Progress): luminous: _read_bdev_label unable to decode label at offset
Nathan Cutler
06:26 PM Backport #22892: luminous: _read_bdev_label unable to decode label at offset
http://tracker.ceph.com/issues/22892 Abhishek Lekshmanan
 
