Activity

From 02/27/2019 to 03/28/2019

03/28/2019

03:35 PM Bug #38489: bluestore_prefer_deferred_size_hdd units are not clear
But wait! The documentation is still not fixed! Марк Коренберг
02:08 PM Bug #38489 (Resolved): bluestore_prefer_deferred_size_hdd units are not clear
Neha Ojha
02:26 PM Bug #38176 (Won't Fix): Unable to recover from ENOSPC in BlueFS, WAL
We decided to not fix this. Neha Ojha
02:22 PM Bug #38559 (Fix Under Review): 50-100% iops lost due to bluefs_preextend_wal_files = false
https://github.com/ceph/ceph/pull/26909 Neha Ojha
02:20 PM Bug #38637 (Need More Info): BlueStore::ExtentMap::fault_range() assert
Neha Ojha
02:18 PM Bug #38738 (In Progress): ceph ssd osd latency increase over time, until restart
Neha Ojha
02:17 PM Feature #38816 (Need More Info): Deferred writes do not work for random writes
Neha Ojha
02:13 PM Bug #37282: rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2
Keeping "needs more info" state. Radoslaw Zarzynski
11:08 AM Backport #38586 (Resolved): luminous: OSD crashes in get_str_map while creating with ceph-volume
Nathan Cutler
02:04 AM Bug #38745: spillover that doesn't make sense
256 MB + 2.56 GB + 25.6 GB ≈ 28-29 GB - for default Luminous options. Konstantin Shalygin
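For reference, a rough sketch of where that sum comes from, assuming the RocksDB defaults max_bytes_for_level_base = 256 MB and max_bytes_for_level_multiplier = 10 (assumed defaults, not values taken from this ticket):

  awk 'BEGIN { base = 256; for (l = 1; l <= 3; l++) { printf "L%d target: %d MB\n", l, base; total += base; base *= 10 }; printf "L1-L3 total: ~%.1f GB\n", total / 1000 }'

A DB volume smaller than that total is the situation described above where RocksDB ends up placing a whole level on the slow device.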

03/27/2019

01:27 PM Bug #38745: spillover that doesn't make sense
Konstantin, okay, but the documentation lists the default settings.
We have...
Rafal Wadolowski
01:00 PM Bug #38745: spillover that doesn't make sense
??Why is 30 GB a problem in my case???
Because of compaction levels: https://github.com/facebook/rocksdb/wiki/Leve...
Konstantin Shalygin
12:13 PM Bug #38745: spillover that doesn't make sense
Konstantin Shalygin wrote:
> Rafal, this is not your case! Your spillover is because your db is smaller than 30 GB. Ple...
Rafal Wadolowski
08:58 AM Bug #38745: spillover that doesn't make sense
Rafal, this is not your case! Your spillover is because your db is smaller than 30 GB. Please consult http://lists.... Konstantin Shalygin
08:55 AM Bug #38745: spillover that doesn't make sense
The slow bytes used is the problem we've been seeing for a year.
One of the servers has a 20 GB db.wal for an 8 TB RAW device....
Rafal Wadolowski
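For anyone checking the same thing on their own cluster, the BlueFS perf counters show how much data sits on each device; a minimal sketch using the admin socket, with osd.0 as a placeholder (counter names may differ slightly between releases):

  ceph daemon osd.0 perf dump bluefs | grep -E '(db|wal|slow)_(total|used)_bytes'

A non-zero slow_used_bytes while a separate DB device is present is the spillover being discussed in this ticket.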
12:48 AM Bug #38250: assert failure crash prevents ceph-osd from running
I'm not sure how to get the errno value. I don't see it anywhere in the logs. However SMART started complaining abo... Adam DC949

03/26/2019

04:48 PM Backport #38586: luminous: OSD crashes in get_str_map while creating with ceph-volume
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26900
merged
Yuri Weinstein
07:44 AM Bug #38363: Failure in assert when calling: ceph-volume lvm prepare --bluestore --data /dev/sdg
I think I found the problem, or perhaps the bug in ceph. At least I found a way to work around it....
Trying to cre...
Rainer Krienke
02:35 AM Bug #38745: spillover that doesn't make sense
??@Sage, I observed up to 2x space utilization increase during compaction.??
This is normal behavior for first com...
Konstantin Shalygin

03/25/2019

04:20 PM Bug #38745: spillover that doesn't make sense
The curious thing is that one cannot tell what space is occupied on the slow device due to fallbacks from neither "k... Igor Fedotov
03:56 PM Bug #38745: spillover that doesn't make sense
ceph-post-file: a6ef2d24-56c0-486d-bb1e-f82080c0da9e
Sage Weil
03:27 PM Bug #38745: spillover that doesn't make sense
@Sage, I observed up to 2x space utilization increase during compaction. You can inspect l_bluefs_max_bytes_wal, l_bl... Igor Fedotov
03:15 PM Bug #38745: spillover that doesn't make sense
I tried a compaction on osd.50. Before,... Sage Weil
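For reference, a manual compaction like the one above can be triggered via the admin socket or (on newer releases) ceph tell; a minimal sketch with osd.50 as in the comment, availability depending on the release:

  ceph daemon osd.50 compact
  ceph tell osd.50 compact

Comparing the bluefs perf counters before and after shows whether spilled-over data moved back to the fast device.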
02:12 PM Bug #38745: spillover that doesn't make sense
Generally I suppose this is a valid state - RocksDB puts next-level data on the slow device when it expects it wouldn't fi... Igor Fedotov
11:10 AM Bug #38745: spillover that doesn't make sense
Chris Callegari wrote:
> Also my cluster did not display the 'osd.X spilled over 123 GiB metadata from 'blah' device...
Igor Fedotov
12:17 AM Bug #38745: spillover that doesn't make sense
Also my cluster did not display the 'osd.X spilled over 123 GiB metadata from 'blah' device (20 GiB used of 31 GiB) t... Chris Callegari
12:15 AM Bug #38745: spillover that doesn't make sense
I recently upgraded from latest mimic to nautilus. My cluster displayed 'BLUEFS_SPILLOVER BlueFS spillover detected ... Chris Callegari
01:55 PM Feature #38816: Deferred writes do not work for random writes
I tried a similar thing when Mark asked me. In summary, you *can* enlarge your deferred queue a bit, but you can't make... Vitaliy Filippov
11:02 AM Feature #38816: Deferred writes do not work for random writes
Mark, I'm not sure your root cause analysis is 100% valid. And to avoid any speculations I'd prefer to arrange the be... Igor Fedotov

03/23/2019

09:15 PM Backport #38915 (Resolved): nautilus: BlueFS might request more space from slow device than is ac...
https://github.com/ceph/ceph/pull/27139 Nathan Cutler
09:15 PM Backport #38914 (Rejected): luminous: BlueFS might request more space from slow device than is ac...
Nathan Cutler
09:14 PM Backport #38913 (Rejected): mimic: BlueFS might request more space from slow device than is actua...
Nathan Cutler
09:14 PM Backport #38912 (Resolved): nautilus: Bitmap allocator might fail to return contiguous chunk desp...
https://github.com/ceph/ceph/pull/27139 Nathan Cutler
09:14 PM Backport #38911 (Resolved): luminous: Bitmap allocator might fail to return contiguous chunk desp...
https://github.com/ceph/ceph/pull/27312 Nathan Cutler
09:14 PM Backport #38910 (Resolved): mimic: Bitmap allocator might fail to return contiguous chunk despite...
https://github.com/ceph/ceph/pull/27298 Nathan Cutler

03/22/2019

09:31 PM Bug #38761 (Pending Backport): Bitmap allocator might fail to return contiguous chunk despite hav...
https://github.com/ceph/ceph/pull/26939 Sage Weil
09:31 PM Bug #38760 (Pending Backport): BlueFS might request more space from slow device than is actually ...
https://github.com/ceph/ceph/pull/26939 Sage Weil

03/21/2019

12:18 AM Feature #38816: Deferred writes do not work for random writes
I want bluestore to be able to buffer (defer), say, 30 seconds of random writes in RocksDB at SSD speed. I expect back... Марк Коренберг

03/20/2019

10:55 AM Bug #38738: ceph ssd osd latency increase over time, until restart
hoan nv wrote:
> Do you have a temporary solution for this issue?
>
> I tried moving the device class from ssd to hdd bu...
Igor Fedotov
08:48 AM Bug #38738: ceph ssd osd latency increase over time, until restart
Do you have a temporary solution for this issue?
I tried moving the device class from ssd to hdd but no luck.
My clust...
hoan nv

03/19/2019

07:16 PM Feature #38816 (In Progress): Deferred writes do not work for random writes
Well, how to reproduce:
osd.11 is a bluestore OSD with RocksDB on SSD, and main data on HDD.
ceph osd pool cr...
Марк Коренберг
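The report above is truncated; a purely hypothetical reproduction along the same lines might look like the following (pool name, sizes, runtime and the deferred-related counter names are placeholders/assumptions, not taken from the original):

  ceph osd pool create test-deferred 64 64
  rados bench -p test-deferred 30 write -b 4096 -t 16
  ceph daemon osd.11 perf dump | grep -i deferred

A true random-write workload (for example fio against an RBD image) would match the feature request more closely; rados bench is only the simplest smoke test.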
05:51 PM Bug #38795: fsck on mkfs breaks ObjectStore/StoreTestSpecificAUSize.BlobReuseOnOverwrite
The issue persists until the second cache rebalance occurs after fsck completion. So a workaround for the UT might be to wa... Igor Fedotov

03/18/2019

03:20 PM Bug #38738: ceph ssd osd latency increase over time, until restart
hoan nv wrote:
> I have the same issue.
The Ceph version is 13.2.2.
hoan nv
03:19 PM Bug #38738: ceph ssd osd latency increase over time, until restart
I have the same issue.
hoan nv
12:20 PM Bug #38795 (Resolved): fsck on mkfs breaks ObjectStore/StoreTestSpecificAUSize.BlobReuseOnOverwrite
If bluestore_fsck_on_mkfs is enabled, the test case fails in Mimic and Luminous:
[ RUN ] ObjectStore/StoreTestSp...
Igor Fedotov

03/15/2019

03:16 PM Backport #38779 (In Progress): mimic: ceph_test_objecstore: bluefs mount fail with overlapping op...
Nathan Cutler
03:14 PM Backport #38779 (Resolved): mimic: ceph_test_objecstore: bluefs mount fail with overlapping op_al...
https://github.com/ceph/ceph/pull/26983 Nathan Cutler
03:15 PM Backport #38778 (In Progress): luminous: ceph_test_objecstore: bluefs mount fail with overlapping...
Nathan Cutler
03:14 PM Backport #38778 (Resolved): luminous: ceph_test_objecstore: bluefs mount fail with overlapping op...
https://github.com/ceph/ceph/pull/26979 Nathan Cutler
03:13 PM Bug #24598 (Pending Backport): ceph_test_objecstore: bluefs mount fail with overlapping op_alloc_add
Nathan Cutler
12:52 PM Bug #38761 (Fix Under Review): Bitmap allocator might fail to return contiguous chunk despite hav...
Igor Fedotov
11:16 AM Bug #38761 (Resolved): Bitmap allocator might fail to return contiguous chunk despite having enou...
This happens when the allocator has only contiguous 4GB-aligned chunks to allocate from. The internal logic searching for fre... Igor Fedotov
12:51 PM Bug #38760 (Fix Under Review): BlueFS might request more space from slow device than is actually ...
Igor Fedotov
11:09 AM Bug #38760 (Resolved): BlueFS might request more space from slow device than is actually needed
When expanding the slow device, BlueFS has two sizes - one that it actually needs for the current action and one that is a ... Igor Fedotov

03/14/2019

10:08 PM Bug #38745 (In Progress): spillover that doesn't make sense
... Sage Weil
11:52 AM Bug #38738: ceph ssd osd latency increase over time, until restart
Anton,
there is a thread named "ceph osd commit latency increase over time, until
restart" at ceph-users mail li...
Igor Fedotov
10:38 AM Bug #38738 (Resolved): ceph ssd osd latency increase over time, until restart
We see disk latency for VMs on the SSD pool increase over time.
The VM disk latency is normally 0.5-3 ms.
The VM ...
Anton Usanov
09:55 AM Bug #38363: Failure in assert when calling: ceph-volume lvm prepare --bluestore --data /dev/sdg
I tested more with exactly the same hardware (PowerEdge R730xd). I tried to set up ceph luminous on Ubuntu 16.04 and ... Rainer Krienke

03/13/2019

09:16 AM Support #38707 (Closed): Ceph OSD Down & Out - can't bring back up - Caught signal (Segmentation ...
I noticed that in my 3-node, 12-osd cluster (3 OSD per Node), one node has all 3 of its OSDs marked "Down" and "Out".... Liam Retrams

03/12/2019

03:39 PM Bug #38559: 50-100% iops lost due to bluefs_preextend_wal_files = false
Yes, I've thought of that but I haven't tested it... However this is rather strange then. Who does the fsync if BlueF... Vitaliy Filippov
03:00 PM Bug #38559: 50-100% iops lost due to bluefs_preextend_wal_files = false
Sage Weil
02:59 PM Bug #38559: 50-100% iops lost due to bluefs_preextend_wal_files = false
This goes away after you write more metadata into rocksdb and it starts overwriting previous wal files. The purpose o... Sage Weil
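A quick way to inspect and flip the option under discussion on a test OSD (osd.0 is a placeholder; whether the change takes effect without a restart is not verified here):

  ceph daemon osd.0 config get bluefs_preextend_wal_files
  ceph tell osd.0 injectargs '--bluefs_preextend_wal_files=true'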
12:13 PM Bug #38574 (Resolved): mimic: Unable to recover from ENOSPC in BlueFS
Nathan Cutler
02:31 AM Backport #38586 (In Progress): luminous: OSD crashes in get_str_map while creating with ceph-volume
https://github.com/ceph/ceph/pull/26900 Prashant D

03/11/2019

08:18 PM Bug #38272 (Fix Under Review): "no available blob id" assertion might occur
Igor Fedotov
07:54 PM Bug #38395 (Resolved): luminous: write following remove might access previous onode
Igor Fedotov
07:41 PM Bug #38395: luminous: write following remove might access previous onode
https://github.com/ceph/ceph/pull/26540 merged Yuri Weinstein
04:57 PM Backport #38663 (Resolved): luminous: mimic: Unable to recover from ENOSPC in BlueFS
https://github.com/ceph/ceph/pull/26866 Neha Ojha
01:45 PM Backport #38663 (In Progress): luminous: mimic: Unable to recover from ENOSPC in BlueFS
Nathan Cutler
01:41 PM Backport #38663 (Resolved): luminous: mimic: Unable to recover from ENOSPC in BlueFS
https://github.com/ceph/ceph/pull/26866 Nathan Cutler

03/08/2019

08:41 PM Bug #38574 (Pending Backport): mimic: Unable to recover from ENOSPC in BlueFS
Neha Ojha
08:39 PM Bug #38574: mimic: Unable to recover from ENOSPC in BlueFS
https://github.com/ceph/ceph/pull/26735 merged Yuri Weinstein
01:32 PM Bug #38329: OSD crashes in get_str_map while creating with ceph-volume
FYI and FWIW, Boris Ranto put 14.0.1 into F30/rawhide. It's sort of Standard Operating Procedure (SOP) to put early r... Kaleb KEITHLEY
10:34 AM Bug #38637: BlueStore::ExtentMap::fault_range() assert
Can you make sure the underlying device is OK as a first step? This error might indicate corruption. It may also be b... Brad Hubbard
09:17 AM Bug #38637 (Won't Fix): BlueStore::ExtentMap::fault_range() assert
Hi,
I have rook with ceph-12.2.4,
3 mons, 5 OSDs.
For the last few hours one of my OSDs has been in a crash loop.
<...
Karol Chrapek

03/07/2019

01:49 PM Bug #38557 (Closed): pkg dependency issues upgrading from 12.2.y to 14.x.y
Nathan Cutler
07:14 AM Backport #38587 (In Progress): mimic: OSD crashes in get_str_map while creating with ceph-volume
Ashish Singh

03/06/2019

09:55 PM Bug #38489: bluestore_prefer_deferred_size_hdd units are not clear
So that's why write_big operations may also be deferred, just like write_small ones. OK, thank you very much, it's clear now. Vitaliy Filippov
08:29 PM Bug #38489: bluestore_prefer_deferred_size_hdd units are not clear
It's not deferring because at the layer that deferring happens, we're talking about blobs (not writes), and the blobs... Sage Weil
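Since the deferral decision is made per blob, the two knobs worth comparing on a running OSD are the HDD blob size limit and the deferred-size threshold; a minimal check with osd.0 as a placeholder (the 524288 cut-off observed by the reporter matches the usual bluestore_max_blob_size_hdd default, though that default is an assumption here, not something stated in the ticket):

  ceph daemon osd.0 config get bluestore_max_blob_size_hdd
  ceph daemon osd.0 config get bluestore_prefer_deferred_size_hdd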
04:08 PM Bug #38489: bluestore_prefer_deferred_size_hdd units are not clear
Forgot to mention, this was Ceph 14.1.0 Vitaliy Filippov
09:25 AM Bug #38489: bluestore_prefer_deferred_size_hdd units are not clear
I've just tried to set
[osd]
bluestore_prefer_deferred_size_hdd = 4194304
on a test HDD plugged into my laptop. ...
Vitaliy Filippov
06:54 PM Bug #38557: pkg dependency issues upgrading from 12.2.y to 14.x.y
Accidentally opened against bluestore. You may close it.
See https://tracker.ceph.com/issues/38612 instead.
Kaleb KEITHLEY

03/05/2019

05:45 PM Backport #38587 (Resolved): mimic: OSD crashes in get_str_map while creating with ceph-volume
https://github.com/ceph/ceph/pull/26810 Nathan Cutler
05:45 PM Backport #38586 (Resolved): luminous: OSD crashes in get_str_map while creating with ceph-volume
https://github.com/ceph/ceph/pull/26900 Nathan Cutler
03:18 PM Bug #38489: bluestore_prefer_deferred_size_hdd units are not clear
I've just verified deferred write behavior for 4M writes using the objectstore FIO plugin.
Indeed bluestore splits writ...
Igor Fedotov
08:10 AM Bug #38489: bluestore_prefer_deferred_size_hdd units are not clear
Sage Weil wrote:
> > all writes of size 4MB with bluestore_prefer_deferred_size_hdd < 524288 go HDD directly. >= 524...
Марк Коренберг
11:46 AM Bug #38363: Failure in assert when calling: ceph-volume lvm prepare --bluestore --data /dev/sdg
I finally found the extended debug log in /var/log/ceph/ceph-osd.0.log. I attached the log output file (44k) to this ... Rainer Krienke

03/04/2019

11:23 PM Bug #38489: bluestore_prefer_deferred_size_hdd units are not clear
> all writes of size 4MB with bluestore_prefer_deferred_size_hdd < 524288 go HDD directly. >= 524288 through SSD (I m... Sage Weil
05:55 PM Bug #38574 (Resolved): mimic: Unable to recover from ENOSPC in BlueFS
This is the same issue as https://tracker.ceph.com/issues/36268.
We have an alternate fix for mimic, which will be backpor...
Neha Ojha
03:36 PM Bug #38329 (Pending Backport): OSD crashes in get_str_map while creating with ceph-volume
Sage Weil
03:23 PM Bug #36268 (Resolved): Unable to recover from ENOSPC in BlueFS
Alternative fix for mimic and luminous: https://github.com/ceph/ceph/pull/26735 Sage Weil

03/03/2019

08:07 PM Bug #38559 (Resolved): 50-100% iops lost due to bluefs_preextend_wal_files = false
Hi.
I was investigating why RocksDB performance is so bad considering random 4K iops. I was looking at strace and ...
Vitaliy Filippov
01:30 PM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
We have upgraded to 12.2.11. During reboots, the following messages would appear:
[16:20:59] @ bitrot: osd.17 [ERR] 7...
Stefan Kooman
11:55 AM Bug #38557 (Closed): pkg dependency issues upgrading from 12.2.y to 14.x.y
Description of problem:
With respect to https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
Kaleb KEITHLEY

03/02/2019

02:28 PM Bug #38554 (Duplicate): ObjectStore/StoreTestSpecificAUSize.TooManyBlobsTest/2 fail, Expected: (r...
... Sage Weil

03/01/2019

04:44 PM Bug #36482: High amount of Read I/O on BlueFS/DB when listing omap keys
FYI, I think I hit another case of this in the last two weeks.
An RGW-only case where if you would list...
Wido den Hollander
03:27 PM Bug #36455 (Resolved): BlueStore: ENODATA not fully handled
Nathan Cutler
03:27 PM Backport #37825 (Resolved): luminous: BlueStore: ENODATA not fully handled
Nathan Cutler
03:10 PM Backport #36641 (New): mimic: Unable to recover from ENOSPC in BlueFS
Nathan Cutler
03:10 PM Backport #36640 (New): luminous: Unable to recover from ENOSPC in BlueFS
Nathan Cutler
03:09 PM Bug #36268 (Pending Backport): Unable to recover from ENOSPC in BlueFS
Sage, did you mean to cancel the mimic and luminous backports when you changed the status to Resolved? Nathan Cutler
10:46 AM Bug #25077 (New): Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
Igor Fedotov
10:46 AM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
Stefan Kooman wrote:
> @Igor Fedotov:
>
> We are using the ceph balancer to get PGs balanced across the cluster. The...
Igor Fedotov
07:56 AM Bug #38363: Failure in assert when calling: ceph-volume lvm prepare --bluestore --data /dev/sdg
I tried but the output of ceph-volume remains the same....
I added this to /etc/ceph/ceph.conf on my testing nod...
Rainer Krienke

02/28/2019

07:30 PM Backport #37825: luminous: BlueStore: ENODATA not fully handled
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25855
merged
Yuri Weinstein
05:15 PM Bug #37282: rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2
We're not sure how to proceed without being able to reproduce the crash, and we have never seen this.
1. Would it...
Sage Weil
04:41 PM Bug #38329 (Fix Under Review): OSD crashes in get_str_map while creating with ceph-volume
Reproduced this and got a core.
I think the problem is an empty string passed to trim() in str_map.cc. Fix here: h...
Sage Weil
03:37 PM Bug #23206 (Rejected): ceph-osd daemon crashes - *** Caught signal (Aborted) **
not enough info Sage Weil
03:35 PM Bug #24639 (Can't reproduce): [segfault] segfault in BlueFS::read
sounds like a hardware problem then! Sage Weil
03:34 PM Bug #25098: Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos <= end)`
Current status:
We want a more concrete source of truth for whether the db and/or wal partitions should exist--som...
Sage Weil
03:31 PM Bug #34526 (Duplicate): OSD crash in KernelDevice::direct_read_unaligned while scrubbing
Sage Weil
09:55 AM Bug #34526: OSD crash in KernelDevice::direct_read_unaligned while scrubbing
IMO this is BlueStore (or more precisely BlueFS and/or RocksDB) related.
And I think it's a duplicate of #36482.
O...
Igor Fedotov
03:30 PM Bug #36268 (Resolved): Unable to recover from ENOSPC in BlueFS
Sage Weil
03:30 PM Bug #36331 (Need More Info): FAILED ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixNoCsum/2 ...
This was an Ubuntu 18.04 kernel. Maybe this was the pread vs swap zeroed pages kernel bug?
I think we need anothe...
Sage Weil
03:27 PM Bug #36364 (Can't reproduce): Bluestore OSD IO Hangs near Flush (flush in 90.330556)
Sage Weil
03:27 PM Bug #38049 (Resolved): random osds failing in thread_name:bstore_kv_final
Sage Weil
03:23 PM Bug #38250 (Need More Info): assert failure crash prevents ceph-osd from running
Is the errno EIO in this case?
On read error we do crash and fail the OSD. There is generally no recovery path fo...
Sage Weil
03:18 PM Bug #38272 (In Progress): "no available blob id" assertion might occur
Sage Weil
03:16 PM Bug #38363 (Need More Info): Failure in assert when calling: ceph-volume lvm prepare --bluestore ...
Can you reproduce this with debug_bluestore=20, debug_bluefs=20, debug_bdev=20?
Thanks!
Sage Weil
03:14 PM Bug #36482: High amount of Read I/O on BlueFS/DB when listing omap keys
- it looks like implementing readahead in bluefs would help
- we think newer rocksdb does its own readahead
Sage Weil
02:41 PM Bug #36482: High amount of Read I/O on BlueFS/DB when listing omap keys
We've got another occurrence of this issue too.
Omap listing for a specific onode consistently takes ~2 mins while d...
Igor Fedotov
02:19 PM Bug #36482: High amount of Read I/O on BlueFS/DB when listing omap keys
I think this is the same issue:
https://marc.info/?l=ceph-devel&m=155134206210976&w=2
Igor Fedotov
03:04 PM Bug #37914 (Can't reproduce): bluestore: segmentation fault
No logs or core. Hoping it was the hypercombined bufferlist memory corruption issue. Sage Weil
09:57 AM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
@Igor Fedotov:
We are using the ceph balancer to get PGs balanced across the cluster. The day after the crashes, the ...
Stefan Kooman

02/27/2019

11:18 AM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
Check, I'll collect the needed information. Note: during the restarts of the storage servers the *same* OSDs crashed ... Stefan Kooman
07:09 AM Feature #38494: Bluestore: issue discards on everything non-discarded during deep-scrubs
The included link is just a related PR. Марк Коренберг
07:07 AM Feature #38494: Bluestore: issue discards on everything non-discarded during deep-scrubs
The text formatting of the previous message is wrong. I did not want to strike out the text. Марк Коренберг
07:07 AM Feature #38494 (New): Bluestore: issue discards on everything non-discarded during deep-scrubs
Yes, we have bdev_enable_discard and bdev_async_discard, but they are not documented.
Ubuntu issues ...
Марк Коренберг
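For completeness, a minimal sketch of enabling the two options, assuming they can simply be set in ceph.conf and that OSDs are restarted afterwards (behaviour and performance impact not verified here):

  # /etc/ceph/ceph.conf
  [osd]
  bdev_enable_discard = true
  bdev_async_discard = true

  # confirm what a running OSD actually uses (osd.0 is a placeholder)
  ceph daemon osd.0 config get bdev_enable_discard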
 
