Activity

From 01/23/2019 to 02/21/2019

02/21/2019

09:52 PM Backport #38142: luminous: os/bluestore: fixup access a destroy cond cause deadlock or undefine b...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26261
merged
Yuri Weinstein
03:49 PM Bug #38329: OSD crashes in get_str_map while creating with ceph-volume
(original reporter here)
I have the following customisation in ceph.conf:...
Tomasz Torcz
01:00 PM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
We *think* we have hit this same issue "in the field" on a Luminous 12.2.8 cluster:
2019-02-20 18:42:45.261357 7fd...
Stefan Kooman

02/20/2019

11:15 AM Bug #38395 (Fix Under Review): luminous: write following remove might access previous onode
Igor Fedotov
10:35 AM Bug #38395: luminous: write following remove might access previous onode
Igor Fedotov
10:25 AM Bug #38395 (Resolved): luminous: write following remove might access previous onode
So the sequence is as follows:
T1:
remove A
T2:
touch A
write A
In Luminous there is a chance that A is rem...
Igor Fedotov

02/18/2019

01:41 PM Bug #38363 (Need More Info): Failure in assert when calling: ceph-volume lvm prepare --bluestore ...
I run Ubuntu 18.04 and ceph version 13.2.4-1bionic from this repo: https://download.ceph.com/debian-mimic.
Whe...
Rainer Krienke

02/16/2019

11:00 AM Backport #37990 (Resolved): mimic: Compression not working, and when applied OSD disks are failin...
Nathan Cutler

02/15/2019

10:38 PM Bug #37839: Compression not working, and when applied OSD disks are failing randomly
merged https://github.com/ceph/ceph/pull/26342
https://github.com/ceph/ceph/pull/26544
Yuri Weinstein
08:29 PM Bug #38329: OSD crashes in get_str_map while creating with ceph-volume
- have any options been customized?
- what version is this? 14.0.1-2.fc30 is a random dev checkpoint commit from ma...
Sage Weil
08:12 PM Bug #38329: OSD crashes in get_str_map while creating with ceph-volume
Changing back to the Ceph tracker; this is not a crash in ceph-volume or specific to ceph-volume that I can see.
Alfredo Deza
12:57 PM Bug #38329 (Resolved): OSD crashes in get_str_map while creating with ceph-volume
see https://bugzilla.redhat.com/show_bug.cgi?id=1661583...
Kaleb KEITHLEY
03:10 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
We have not experienced any further crashes in over a week (compared to multiple crashes per hour before), so it look...
Lawrence Smith

02/13/2019

12:39 PM Bug #38230 (Resolved): segv in onode lookup
https://github.com/ceph/ceph/pull/26391
Sage Weil

02/12/2019

03:35 PM Bug #38272: "no available blob id" assertion might occur
onode dump shortly before the assertion:
2019-02-12 18:23:47.546 7fca6fab1b40 0 bluestore(bluestore.test_temp_dir) ...
Igor Fedotov
03:29 PM Bug #38272: "no available blob id" assertion might occur
Backtrace from UT:
-1> 2019-02-12 18:23:48.346 7fca6fab1b40 -1 /home/if/ceph/src/os/bluestore/BlueStore.cc: In fun...
Igor Fedotov
03:26 PM Bug #38272: "no available blob id" assertion might occur
Stack trace from the customer log:
2019-02-06 00:04:25.934977 7ff3e3bca700 -1 /home/abuild/rpmbuild/BUILD/ceph-12.2....
Igor Fedotov
03:25 PM Bug #38272 (Resolved): "no available blob id" assertion might occur
We observed this on-site, but unfortunately the OSDs were removed and are unavailable for inspection.
However I managed to...
Igor Fedotov
03:01 PM Backport #38143 (Resolved): mimic: os/bluestore: fixup access a destroy cond cause deadlock or un...
Nathan Cutler
12:00 AM Backport #38143: mimic: os/bluestore: fixup access a destroy cond cause deadlock or undefine beha...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26260
merged
Yuri Weinstein
02:53 PM Backport #38188 (In Progress): luminous: deep fsck fails on inspecting very large onodes
Nathan Cutler
02:50 PM Backport #38187 (Resolved): mimic: deep fsck fails on inspecting very large onodes
Nathan Cutler

02/11/2019

09:08 PM Backport #38187: mimic: deep fsck fails on inspecting very large onodes
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26291
merged
Yuri Weinstein
04:34 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
From the first log, this looks like #36541. My guess is the crashes you were seeing after were continued problems fr...
Sage Weil
10:06 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I still seem to be experiencing these errors, albeit at a much reduced rate since upgrading to 13.2.3. I could wake u...
Nick Fisk

02/09/2019

05:14 PM Bug #38250 (Rejected): assert failure crash prevents ceph-osd from running
One of my OSDs keeps crashing shortly after startup, which is preventing it from joining the cluster. The core issue...
Adam DC949

02/08/2019

03:40 PM Bug #38230: segv in onode lookup
...
Sage Weil

02/07/2019

10:41 PM Bug #38230: segv in onode lookup
I'm guessing this is the same heap corruption we've been seeing, but logging it anyway.
Sage Weil
10:40 PM Bug #38230 (Resolved): segv in onode lookup
...
Sage Weil

02/06/2019

03:56 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
After two days of running fine I had set the bluestore and bluefs log level back to default, so I don't know how helpf...
Lawrence Smith
03:27 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
Lawrence, would you share the log for current crashes please?
Existing failures with fsck are expected as the patc...
Igor Fedotov
01:24 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
We have since patched our ceph with https://github.com/ceph/ceph/pull/24686 which fixes Issue #36541. Since then the ...
Lawrence Smith
02:16 PM Bug #38176: Unable to recover from ENOSPC in BlueFS, WAL
Fixed link to bug replication script.
https://drive.google.com/file/d/10Lvcf6_Lj2c2sydcfU170lbb-IQClvH-
Adam Kupczyk
09:05 AM Bug #37360 (Resolved): bluefs-bdev-expand aborts
Igor Fedotov
08:54 AM Backport #38188: luminous: deep fsck fails on inspecting very large onodes
No need for that additional cherry-pick; just add the new option using the method applicable to luminous.
Igor Fedotov

02/05/2019

11:56 PM Backport #38188 (Need More Info): luminous: deep fsck fails on inspecting very large onodes
We need to cherry-pick additional commits to get this backport PR, Option::TYPE_SIZE and Option::FLAG_RUNTIME not de...
Prashant D
05:02 PM Backport #38188 (Resolved): luminous: deep fsck fails on inspecting very large onodes
https://github.com/ceph/ceph/pull/26387
Nathan Cutler
11:31 PM Backport #38187 (In Progress): mimic: deep fsck fails on inspecting very large onodes
https://github.com/ceph/ceph/pull/26291
Prashant D
05:01 PM Backport #38187 (Resolved): mimic: deep fsck fails on inspecting very large onodes
https://github.com/ceph/ceph/pull/26291
Nathan Cutler
09:39 PM Backport #37494 (Resolved): mimic: bluefs-bdev-expand aborts
Igor Fedotov
09:17 PM Backport #37494: mimic: bluefs-bdev-expand aborts
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25348
merged
Yuri Weinstein
11:43 AM Bug #38176 (Fix Under Review): Unable to recover from ENOSPC in BlueFS, WAL
Kefu Chai
11:28 AM Bug #38176 (Won't Fix): Unable to recover from ENOSPC in BlueFS, WAL
It is possible to insert so much OMAP data into objects that it will overflow storage and cause ENOSPC when rocksdb t...
Adam Kupczyk

02/04/2019

08:55 PM Bug #38065 (Pending Backport): deep fsck fails on inspecting very large onodes
Neha Ojha
08:51 PM Bug #38065: deep fsck fails on inspecting very large onodes
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/26170
merged
Yuri Weinstein
11:21 AM Backport #38161 (Rejected): mimic: KernelDevice exclusive lock broken
Nathan Cutler
11:21 AM Backport #38160 (Rejected): luminous: KernelDevice exclusive lock broken
https://github.com/ceph/ceph/pull/34514
Nathan Cutler
03:32 AM Backport #38142 (In Progress): luminous: os/bluestore: fixup access a destroy cond cause deadlock...
https://github.com/ceph/ceph/pull/26261
Prashant D
03:30 AM Backport #38143 (In Progress): mimic: os/bluestore: fixup access a destroy cond cause deadlock or...
https://github.com/ceph/ceph/pull/26260
Prashant D

02/03/2019

05:21 PM Bug #38150 (Pending Backport): KernelDevice exclusive lock broken
Kefu Chai

02/02/2019

10:59 AM Bug #38049: random osds failing in thread_name:bstore_kv_final
The fsck on a failed osd gives the following:...
Lawrence Smith

02/01/2019

05:39 PM Bug #38150: KernelDevice exclusive lock broken
https://github.com/ceph/ceph/pull/26245
Sage Weil
05:37 PM Bug #38150 (Resolved): KernelDevice exclusive lock broken
fcntl locks go away when *any* fd on the file is closed.
This can cause corruption when running ceph-osd in contai...
Sage Weil
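The fcntl behaviour described in the report above can be reproduced outside Ceph. The following is a minimal sketch, assuming Linux and POSIX record locks taken through Python's fcntl.lockf; it is not Ceph's KernelDevice code, and the helper names child_can_lock and demo are hypothetical. A process locks a file through one descriptor, then opens and closes a second descriptor on the same file, and a forked child checks whether the lock is still held.

```python
import fcntl
import os
import tempfile


def child_can_lock(path):
    """Fork a child that attempts a non-blocking exclusive POSIX lock.

    Returns True if the child obtained the lock, meaning nobody else holds it.
    (Hypothetical helper for this illustration only.)
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 0  # lock acquired: the parent no longer holds it
            except OSError:
                status = 1  # lock is held by another process
        finally:
            os._exit(status)
    _, wstatus = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wstatus) == 0


def demo():
    """Return (held_before, held_after) around closing an unrelated fd."""
    fd1, path = tempfile.mkstemp()
    try:
        # Take an exclusive POSIX record lock through fd1.
        fcntl.lockf(fd1, fcntl.LOCK_EX | fcntl.LOCK_NB)
        held_before = not child_can_lock(path)

        # Open and close a *different* descriptor on the same file.
        fd2 = os.open(path, os.O_RDWR)
        os.close(fd2)

        # fd1 is still open, yet the process's lock on the file is gone.
        held_after = not child_can_lock(path)
        return held_before, held_after
    finally:
        os.close(fd1)
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Because the lock belongs to the process rather than to the descriptor, any code path in the same process that opens and closes the device file silently drops the exclusive lock, which is the hazard the report describes.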
02:12 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
@Sage - could you please confirm this is a duplicate of #36541?
Igor Fedotov
02:10 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
Log extract for competing transactions (T1 - rename A->B, T2 - write A, T3 - remove B):...
Igor Fedotov
02:05 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
Looks like a duplicate of #36541.
Igor Fedotov
01:54 PM Bug #37282: rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2
I might have been bitten by the same issue. The OSD in question has its main data on a spinning drive and its data...
David Sieger
09:18 AM Backport #38143 (Resolved): mimic: os/bluestore: fixup access a destroy cond cause deadlock or un...
https://github.com/ceph/ceph/pull/26260
Nathan Cutler
09:18 AM Backport #38142 (Resolved): luminous: os/bluestore: fixup access a destroy cond cause deadlock or...
https://github.com/ceph/ceph/pull/26261
Nathan Cutler

01/31/2019

03:28 PM Bug #20557 (Closed): segmentation fault with rocksdb|BlueStore and jemalloc
Neha Ojha
03:26 PM Bug #25098: Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos <= end)`
This is about the OSD's behavior in a malfunctioning environment (disappearing block.db symlink). We need to terminate OSD b...
Radoslaw Zarzynski
03:26 PM Bug #24561 (Resolved): if disableWAL is set, submit_transacton_sync will met error.
Neha Ojha
03:24 PM Bug #24906 (Closed): fio with bluestore crushed
Closing this for now based on Adam's theory. Please feel free to reopen if the issue persists.
Neha Ojha
03:19 PM Bug #27222 (Can't reproduce): FAILED assert(available >= allocated) in void AllocatorLevel02<T>::...
Neha Ojha
03:10 PM Bug #37733 (Pending Backport): os/bluestore: fixup access a destroy cond cause deadlock or undefi...
Neha Ojha
02:58 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
Just to record intermediate analysis results:
the failure sequence looks like the following - compressed blob was ov...
Igor Fedotov
02:21 PM Bug #38049: random osds failing in thread_name:bstore_kv_final
Lawrence, can you please run fsck on some of the already failed OSDs?
Igor Fedotov

01/30/2019

06:20 PM Bug #36541 (Resolved): rename does not old ref to replacement onode at old name
Patrick Donnelly
06:19 PM Backport #36639 (Resolved): mimic: rename does not old ref to replacement onode at old name
Patrick Donnelly
05:06 PM Backport #36639: mimic: rename does not old ref to replacement onode at old name
Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/25313
merged
Yuri Weinstein

01/28/2019

03:28 PM Bug #38065 (Fix Under Review): deep fsck fails on inspecting very large onodes
Igor Fedotov
03:27 PM Bug #38065: deep fsck fails on inspecting very large onodes
https://github.com/ceph/ceph/pull/26170
Igor Fedotov
03:16 PM Bug #38065: deep fsck fails on inspecting very large onodes
Looks like aio_queue_t::submit_batch times out due to a long list of blocks to read. We need to cap this for fsck and, p...
Igor Fedotov
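As a rough illustration of the capping idea mentioned in the comment above, a long list of pending reads can be split into bounded batches before submission. This is only a sketch under assumptions: capped_batches and max_batch are hypothetical names, and it does not reflect the actual change in the linked pull request or the real aio_queue_t interface.

```python
def capped_batches(items, max_batch):
    """Yield consecutive slices of `items`, each at most `max_batch` long.

    Hypothetical helper: the idea is that each slice would be submitted and
    completed before the next one is issued, so no single submission grows
    with the size of the object being checked.
    """
    if max_batch <= 0:
        raise ValueError("max_batch must be positive")
    for start in range(0, len(items), max_batch):
        yield items[start:start + max_batch]


if __name__ == "__main__":
    # Ten pending block reads, submitted at most four at a time.
    for batch in capped_batches(list(range(10)), 4):
        print(batch)
```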
02:21 PM Bug #38065 (Resolved): deep fsck fails on inspecting very large onodes
Steps to reproduce (100%):
Put a 3GB object into a replicated pool via rados, then do a deep fsck either on mount or via cep...
Igor Fedotov
02:50 PM Backport #37824 (Resolved): mimic: BlueStore: ENODATA not fully handled
Nathan Cutler

01/25/2019

04:03 PM Backport #37824: mimic: BlueStore: ENODATA not fully handled
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25854
merged
Yuri Weinstein
03:24 PM Bug #38049 (Resolved): random osds failing in thread_name:bstore_kv_final
Since upgrading to mimic 13.2.4, single osds are randomly failing with the following assert:...
Lawrence Smith
 
