Activity

From 07/11/2018 to 08/09/2018

08/09/2018

12:11 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Oh, yeah, looks like I fail at copy & pasting URLs. Correct link is https://github.com/ceph/ceph/pull/23273 Paul Emmerich
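The checksum in the title can be checked by hand. The sketch below computes CRC-32C (Castagnoli) over a block of zero bytes. It rests on two assumptions that this thread does not state. First, BlueStore's crc32c checksum starts from a seed of -1 (0xFFFFFFFF) and skips the final inversion. Second, the checksum chunk is 4 KiB. If both hold, the final comparison should print True. That is what "matches a zero block" means here: the buffer that failed verification was all zeros.

```python
# Table-driven CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
POLY = 0x82F63B78


def _make_table() -> list:
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = _make_table()


def crc32c_raw(data: bytes, seed: int = 0xFFFFFFFF) -> int:
    """CRC-32C update with NO final inversion (assumed to mirror ceph_crc32c(seed, ...))."""
    crc = seed & 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]
    return crc


def crc32c_standard(data: bytes) -> int:
    """Textbook CRC-32C: init 0xFFFFFFFF plus a final XOR with 0xFFFFFFFF."""
    return crc32c_raw(data) ^ 0xFFFFFFFF


zero_block = bytes(4096)  # assumption: 4 KiB checksum chunk
csum = crc32c_raw(zero_block)
print(f"{csum:#010x}", csum == 0x6706BE76)
```

The textbook variant is included only as a sanity check. It must reproduce the well-known CRC-32C check value for "123456789".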

08/08/2018

10:01 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Paul Emmerich wrote:
> I've prototyped a work-around here: https://github.com/ceph/ceph/pull/2327
>
> Is there a ...
Honggang Yang
02:08 AM Bug #25207: ceph-volume lvm create gives segmentation fault
This looks a bit like the error we see when jemalloc is enabled in /etc/{default,sysconfig}/ceph. Can you see if it ... Sage Weil
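`/etc/{default,sysconfig}/ceph` is shell brace notation for two files: `/etc/default/ceph` on Debian/Ubuntu and `/etc/sysconfig/ceph` on RHEL/CentOS. Below is a minimal check for whether jemalloc is active there. It assumes jemalloc is switched on by an uncommented line that mentions it, typically an `LD_PRELOAD` entry. The exact line in a given package may differ, and the helper names are hypothetical.

```python
from pathlib import Path

# The two paths that /etc/{default,sysconfig}/ceph expands to.
CANDIDATES = [Path("/etc/default/ceph"), Path("/etc/sysconfig/ceph")]


def active_jemalloc_lines(text: str) -> list:
    """Return uncommented lines that mention jemalloc (e.g. an LD_PRELOAD setting)."""
    hits = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "jemalloc" in stripped:
            hits.append(stripped)
    return hits


def check(paths=CANDIDATES) -> dict:
    """Map each existing file to its active jemalloc lines; files without any are omitted."""
    found = {}
    for path in paths:
        if path.is_file():
            hits = active_jemalloc_lines(path.read_text())
            if hits:
                found[str(path)] = hits
    return found


if __name__ == "__main__":
    for path, lines in check().items():
        print(f"{path}: jemalloc appears enabled: {lines}")
```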

08/06/2018

04:38 PM Bug #21480 (Resolved): bluestore: flush_commit is racy
Igor Fedotov
04:37 PM Bug #23540 (Resolved): FAILED assert(0 == "can't mark unloaded shard dirty") with compression ena...
Igor Fedotov
04:36 PM Backport #24798 (Resolved): luminous: FAILED assert(0 == "can't mark unloaded shard dirty") with ...
Igor Fedotov
03:16 AM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
Troy Ablan wrote:
> Before I took the crashy ones down, and ran a bluestore fsck. fsck came back with similar err...
Troy Ablan
03:15 AM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
Wanted to update that I:
1. Have gotten 100% of the data out of the cluster, and as far as I can tell, everything is...
Troy Ablan

08/05/2018

09:19 PM Backport #24260 (Resolved): luminous: bluestore: flush_commit is racy
Igor Fedotov

08/03/2018

03:20 PM Backport #24770: luminous: set correctly shard for existed Collection.
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22860
merged
Yuri Weinstein
03:13 PM Backport #24260: luminous: bluestore: flush_commit is racy
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22904
merged
Yuri Weinstein
03:12 PM Backport #24260: luminous: bluestore: flush_commit is racy
https://github.com/ceph/ceph/pull/22909
merged
Yuri Weinstein

08/01/2018

08:21 PM Bug #25207: ceph-volume lvm create gives segmentation fault
Alfredo Deza wrote:
> [...]
>
> The above is considered normal for the first time the OSD is created.
>
> Not ...
Pavan Kumar Linga
10:55 AM Bug #25207: ceph-volume lvm create gives segmentation fault
... Alfredo Deza
12:37 AM Bug #25207 (Can't reproduce): ceph-volume lvm create gives segmentation fault
Hi,
I am trying to install the osd on nvme drive. It is not linked to any volume groups.
When I tried to use the...
Pavan Kumar Linga

07/31/2018

06:25 PM Bug #25006: bad csum during upgrade test
Nathan Cutler wrote:
> Also, I noticed this in the test yaml:
>
> [...]
>
> The only thing within @parallel@ i...
Sage Weil

07/30/2018

08:13 PM Bug #25180 (Resolved): ObjectStore/StoreTest.CompressionTest/2 fail
... Sage Weil
06:51 PM Bug #24859 (Resolved): Multiple races related to destruction of SharedBlob and BlueStore::split_c...
Nathan Cutler
06:51 PM Backport #24886 (Resolved): luminous: Multiple races related to destruction of SharedBlob and Blu...
Nathan Cutler
04:42 PM Backport #24886: luminous: Multiple races related to destruction of SharedBlob and BlueStore::spl...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23064
merged
Yuri Weinstein

07/28/2018

08:07 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
It looks like I have quite a few more crashing OSDs than just this one. They all fsck successfully, but repair will... Troy Ablan

07/27/2018

10:01 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
repair log and core respectively
ceph-post-file: 189d2970-662a-49c0-ac6c-6ee6d124d523
ceph-post-file: 8675da71-1b...
Troy Ablan
09:53 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
I have since updated to 13.2.1 and the cluster appears to be allowing me to get at least some of my data out. Howeve... Troy Ablan

07/26/2018

10:02 PM Bug #25077 (Fix Under Review): Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
https://github.com/ceph/ceph/pull/23257 Igor Fedotov
12:50 PM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
https://github.com/ceph/ceph/pull/23257 Igor Fedotov
07:37 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I've prototyped a work-around here: https://github.com/ceph/ceph/pull/2327
Is there a good reason to not retry rea...
Paul Emmerich
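The prototype's code is not shown in this thread, so the sketch below is NOT that PR. It is a generic illustration of the idea being proposed: retry a read when its checksum does not match. `read_fn` is a hypothetical callable that performs one read of the same extent. `zlib.crc32` stands in for BlueStore's crc32c.

```python
import zlib
from typing import Callable


class ChecksumError(IOError):
    """Raised when every attempt returns data with a bad checksum."""


def read_verified(read_fn: Callable[[], bytes], expected_csum: int,
                  max_retries: int = 2) -> bytes:
    """Call read_fn until its data matches expected_csum, retrying up to max_retries times.

    zlib.crc32 is a stand-in checksum; read_fn is a hypothetical single-read callable.
    """
    got = None
    for _ in range(max_retries + 1):
        data = read_fn()
        got = zlib.crc32(data)
        if got == expected_csum:
            return data
    raise ChecksumError(
        f"bad checksum after {max_retries + 1} reads: "
        f"got {got:#010x}, expected {expected_csum:#010x}")
```

A retry only helps when the bad data is transient, such as a buffer that came back zeroed once. Data that is corrupt on disk fails every attempt, so the error still surfaces after `max_retries + 1` reads.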
03:17 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
I agree that it is clearly a kernel bug in 4.9+, but I disagree with won't fix as a conclusion. Also, it happens ... Paul Emmerich
04:39 PM Bug #22102 (Won't Fix): BlueStore crashed on rocksdb checksum mismatch
This appears to be a kernel bug related to swapping.
So far no indication it affects distro kernels.
Sage Weil
09:41 AM Bug #25098: Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos <= end)`
first 1 MB of the device `dd if=/dev/sdc of=/tmp/foo bs=1024K skip=1 count=1`... benoit hudzia
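`dd` measures `skip` and `count` in blocks of `bs` bytes, unless `iflag=skip_bytes` or `iflag=count_bytes` is given. GNU dd reads the `K` suffix as 1024, so `bs=1024K` is one MiB. The command above therefore copies bytes 1 MiB through 2 MiB of `/dev/sdc`, which is the second mebibyte, not the first. Reading the first MiB would need `skip=0`. A small helper (hypothetical name) makes the arithmetic explicit:

```python
def dd_byte_range(bs: int, skip: int, count: int) -> tuple:
    """Half-open byte range [start, end) that `dd bs=BS skip=N count=M` reads.

    Assumes no iflag=skip_bytes/count_bytes, so skip and count are in units of bs.
    """
    start = bs * skip
    return start, start + bs * count


MiB = 1024 * 1024  # GNU dd reads the "1024K" suffix as 1024 * 1024 bytes
start, end = dd_byte_range(bs=MiB, skip=1, count=1)
print(f"[{start:#x}, {end:#x})")  # prints [0x100000, 0x200000)
```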
09:32 AM Bug #25098: Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos <= end)`
Dump of the bluefs super block : ... benoit hudzia
09:27 AM Bug #25098: Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos <= end)`
Output of ceph-volume.log; seems OK, nothing strange:... benoit hudzia

07/25/2018

11:12 PM Bug #25077 (In Progress): Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
Igor Fedotov
09:30 PM Bug #22464 (Won't Fix): Bluestore: many checksum errors, always 0x6706be76 (which matches a zero ...
I'm going to close this given that all of the evidence seems to point to a kernel bug with swap. Sage Weil
09:20 PM Bug #24903 (Resolved): Update 12.2.5 -> 12.2.6: block.db symlink exists but target unusable
Sage Weil
11:57 AM Bug #25098 (Resolved): Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos ...
This occurs sometimes and is hard to catch:
1. we zap the device
2. do ceph lvm active with a cache and DB on SSD ...
benoit hudzia

07/24/2018

01:41 PM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
Looks like a race between object and collection removals. Igor Fedotov
01:39 PM Bug #25077 (Can't reproduce): Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
The issue occurs after running the following command for a while:
../bin/ceph_test_objectstore --gtest_filter=Object...
Igor Fedotov
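"Running the command for a while" usually means rerunning the test until it fails. Below is a minimal sketch of such a loop. The `ceph_test_objectstore` filter above is truncated, so the helper takes the command from the caller rather than guessing it. `run_until_failure` is a hypothetical name.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_until_failure(cmd: List[str], max_runs: int = 1000) -> Optional[Tuple[int, str]]:
    """Run cmd repeatedly; return (failing run number, its stderr), or None if all pass.

    A failed assert that aborts the process yields a negative returncode
    (killed by a signal), which also counts as a failure here.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return run, result.stderr
    return None


if __name__ == "__main__":
    # Placeholder: pass the real test command as arguments.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    outcome = run_until_failure(cmd, max_runs=10)
    print("no failure" if outcome is None else f"failed on run {outcome[0]}")
```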
12:35 PM Bug #20236 (In Progress): bluestore: ObjectStore/StoreTestSpecificAUSize.Many4KWritesNoCSumTest/2...
It looks like I have some insight into the root cause; I hit it much more frequently once some blobs were at DB only ... Igor Fedotov
03:31 AM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
It is already back in production without compression. I am not sure it is related to the compression type; we had sim... Yohay Azulay

07/23/2018

06:24 PM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
Mind switching to a different compression method and trying again? Igor Fedotov
06:22 PM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
Correct. The log is with LZ4 compression enabled, and the OSD fails to start.
When I disabled LZ4 it went up...
Yohay Azulay
05:12 PM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
Yohay,
it looks like the LZ4 compressor is still enabled and failing. Haven't you enabled compression on a per-OSD basis?
Igor Fedotov
05:49 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
Here's the fsck commandline and its output. Log file has been uploaded to https://mooinglemur.com/2018-07-ceph/
B...
Troy Ablan
11:03 AM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
The root cause for the abort is duplicate pextent(0x31a67390000~10000) that is present at both
"6.3es4_head 4#6:7e1...
Igor Fedotov
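The extent above uses the `offset~length` notation. The sketch assumes, as Ceph's pextent printer does, that both fields are hex with a single leading `0x`. Under that assumption, `0x31a67390000~10000` is a 64 KiB (0x10000-byte) extent. "Duplicate pextent" means two objects claim that same extent. Below is a minimal, hypothetical check for exact duplicates across objects. It does not detect partial overlaps, and the object names in the usage are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (offset, length) in bytes


def parse_pextent(text: str) -> Extent:
    """Parse '0xOFFSET~LENGTH'; both fields are treated as hex (assumption)."""
    offset, length = text.split("~")
    return int(offset, 16), int(length, 16)


def find_shared_extents(owners: Dict[str, List[str]]) -> Dict[Extent, List[str]]:
    """Return every extent that is claimed by more than one object."""
    claims: Dict[Extent, List[str]] = defaultdict(list)
    for obj, extents in owners.items():
        for ext in extents:
            claims[parse_pextent(ext)].append(obj)
    return {ext: objs for ext, objs in claims.items() if len(objs) > 1}


offset, length = parse_pextent("0x31a67390000~10000")
print(f"offset {offset:#x}, length {length} bytes")
```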
09:26 AM Bug #24968 (Closed): Compaction error: Corruption: block checksum mismatch
Hardware issues were the root cause hence closing. Igor Fedotov

07/22/2018

06:46 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
Did some further tests. The iomemory-vsl driver seems to scramble data in some conditions. So no need to search for i... Markus Stockhausen
12:13 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
Hi Igor,
checking the hardware was a good clue. I guess the reason is identified. Bluestore WAL/DB does not play n...
Markus Stockhausen
12:07 AM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
It appears that I have access to all of the pools at this point now that one of the crashing OSDs is staying down, bu... Troy Ablan

07/21/2018

11:33 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
Markus,
does it refuse to start from scratch or after some load?
Is this just a single OSD or multiple ones are...
Igor Fedotov
07:30 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
Sorry Igor, I started rebuilding the cluster shortly before your answer.
Nevertheless I was able to reproduce the e...
Markus Stockhausen
08:58 AM Bug #25006: bad csum during upgrade test
Also, I noticed this in the test yaml:... Nathan Cutler
08:49 AM Bug #25006: bad csum during upgrade test
The "missing primary copy" message happens on a mixed luminous/mimic cluster (MONs, MGR, and half of OSDs running mim... Nathan Cutler
08:29 AM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
I meant "and the OSD is now up." Yohay Azulay
08:28 AM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
I disabled the LZ4 compression on both cephfs data and metadata pools and the osd is not up.
Similar symptom we ha...
Yohay Azulay
07:16 AM Bug #25050 (Duplicate): osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
OSD crashed and fails to start.
The debug log can be downloaded here: http://77.247.180.45/download/ceph-osd.22.debu...
Yohay Azulay

07/19/2018

11:02 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
Attaching thread dump. Brad Hubbard
10:48 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
Looks like we are passing a bad bluestore_pextent_t into txc->released.insert.... Brad Hubbard
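This assumes, as the comment suggests, that `txc->released` behaves like Ceph's `interval_set`, which rejects overlapping inserts. Under that assumption, releasing the same physical extent twice in one transaction trips the check. The class below is a simplified model of that behaviour, not Ceph's code. Unlike the real container, it does not merge adjacent intervals.

```python
import bisect
from typing import Dict, List


class ReleasedSet:
    """Toy set of non-overlapping [offset, offset + length) intervals.

    insert() raises AssertionError on any overlap, standing in for the assert
    assumed to fire in the real container. Adjacent intervals are allowed.
    """

    def __init__(self) -> None:
        self._starts: List[int] = []    # sorted interval starts
        self._ends: Dict[int, int] = {}  # start -> exclusive end

    def insert(self, offset: int, length: int) -> None:
        end = offset + length
        i = bisect.bisect_right(self._starts, offset)
        # Predecessor overlaps if it starts at/before offset and ends after it.
        if i > 0 and self._ends[self._starts[i - 1]] > offset:
            raise AssertionError(f"overlapping release at {offset:#x}~{length:#x}")
        # Successor overlaps if it starts before our end.
        if i < len(self._starts) and self._starts[i] < end:
            raise AssertionError(f"overlapping release at {offset:#x}~{length:#x}")
        self._starts.insert(i, offset)
        self._ends[offset] = end
```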
03:25 PM Bug #25001 (Can't reproduce): Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
This bug has been opened following on from http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/028232.html
...
Troy Ablan
08:43 PM Bug #25006 (Can't reproduce): bad csum during upgrade test
Run: http://pulpito.ceph.com/yuriw-2018-07-18_21:43:34-upgrade:luminous-x-mimic-distro-basic-smithi/
Job: 2796264
L...
Yuri Weinstein
08:14 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
Markus,
before redeploying the OSDs, can you monitor current memory usage with top and/or free tools for a while?
Ju...
Igor Fedotov
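Sampling `/proc/meminfo` at a fixed interval gives a scriptable version of watching memory with `top` or `free`. `MemAvailable` and `SwapFree` are standard Linux fields. The interval, sample count, and helper names below are arbitrary, hypothetical choices. The sketch is Linux-only.

```python
import time
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def sample(count: int = 12, interval_s: float = 5.0, path: str = "/proc/meminfo") -> None:
    """Print available memory and free swap every interval_s seconds."""
    for _ in range(count):
        with open(path) as f:
            info = parse_meminfo(f.read())
        avail_mib = info.get("MemAvailable", 0) // 1024
        swap_free_mib = info.get("SwapFree", 0) // 1024
        print(f"{time.strftime('%H:%M:%S')} MemAvailable={avail_mib} MiB SwapFree={swap_free_mib} MiB")
        time.sleep(interval_s)
```

Calling `sample()` prints one line every five seconds for one minute.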
03:22 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
Hi Igor,
thanks for your feedback. The ceph servers have 128GB RAM with 10 1.8TB HDDs and 3 1.92TB SSDs plus a 640G...
Markus Stockhausen
09:51 AM Bug #24968: Compaction error: Corruption: block checksum mismatch
Markus,
first of all - I think 'improper' device size reporting is unrelated to this issue. This report contains jus...
Igor Fedotov
12:05 PM Backport #24799 (Resolved): mimic: FAILED assert(0 == "can't mark unloaded shard dirty") with com...
Igor Fedotov
09:28 AM Bug #24639: [segfault] segfault in BlueFS::read
Rowan,
do you remember what the BlueFS volume sizes were for those failing OSDs?
Igor Fedotov
09:15 AM Bug #24560 (Resolved): BitmapAllocator::_mark_allocated parameter overflow.
Igor Fedotov

07/18/2018

07:31 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
One strange thing from the detail log (out) is a size mismatch for bdev 2:
At the beginning of the log we see:
...
Markus Stockhausen
02:56 PM Backport #24887 (Resolved): mimic: Multiple races related to destruction of SharedBlob and BlueSt...
Nathan Cutler
02:17 PM Backport #24887: mimic: Multiple races related to destruction of SharedBlob and BlueStore::split_...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23065
merged
Yuri Weinstein

07/17/2018

07:35 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
Ceph.conf... Markus Stockhausen
07:31 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
System is CentOS 7.5 with longterm kernel 4.14.52 (kernel-ml spinoff from ELRepo) Markus Stockhausen
07:23 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
Log of
CEPH_ARGS="--debug-bluestore 20 --debug-bluefs 20 --err-to-stderr --log-file out" ceph-bluestore-tool fsck ...
Markus Stockhausen
07:17 PM Bug #24968 (Closed): Compaction error: Corruption: block checksum mismatch
I've been running ceph luminous 12.2.5 for a few weeks now. Until now, only very light usage. Today we started our first ceph... Markus Stockhausen

07/16/2018

03:19 AM Backport #24887 (In Progress): mimic: Multiple races related to destruction of SharedBlob and Blu...
https://github.com/ceph/ceph/pull/23065 Prashant D
02:35 AM Backport #24886 (In Progress): luminous: Multiple races related to destruction of SharedBlob and ...
https://github.com/ceph/ceph/pull/23064 Prashant D

07/13/2018

04:51 PM Bug #24903 (Fix Under Review): Update 12.2.5 -> 12.2.6: block.db symlink exists but target unusable
https://github.com/ceph/ceph/pull/23031 Sage Weil
02:59 PM Bug #24903: Update 12.2.5 -> 12.2.6: block.db symlink exists but target unusable
What is happening here is that ceph-volume was relying on bluestore to set these links, and then when bluestore chang... Alfredo Deza
10:07 AM Bug #24903 (Resolved): Update 12.2.5 -> 12.2.6: block.db symlink exists but target unusable
After updating from 12.2.5 to 12.2.6, BlueStore OSDs with a separate block.db device will not restart:
2018-07-13 ...
Robert Sander
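The title describes a `block.db` symlink that exists while its target is unusable. A first check is whether the link resolves at all. The sketch below classifies the link in a given OSD directory. The caller supplies the directory, typically something like `/var/lib/ceph/osd/ceph-<id>`. `check_link` is a hypothetical helper. It catches only a missing or dangling link, not a target with wrong ownership or permissions.

```python
import os
from pathlib import Path


def check_link(osd_dir: str, name: str = "block.db") -> str:
    """Classify osd_dir/name as missing, not a symlink, dangling, or ok."""
    link = Path(osd_dir) / name
    if link.is_symlink():
        if link.exists():  # exists() follows the symlink to its target
            return f"ok -> {link.resolve()}"
        return f"dangling -> {os.readlink(link)}"
    if link.exists():
        return "not a symlink"
    return "missing"
```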
10:54 AM Bug #24906 (Closed): fio with bluestore crashed
... Honggang Yang
08:03 AM Bug #24901 (Resolved): Client reads fail due to bad CRC under high memory pressure on OSDs
I've seen problems with read failures due to CRC mismatches on two completely independent clusters with different har... Paul Emmerich

07/12/2018

10:49 AM Bug #22977 (Resolved): High CPU load caused by operations on onode_map
Nathan Cutler
10:49 AM Backport #24720 (Resolved): mimic: High CPU load caused by operations on onode_map
Nathan Cutler
12:05 AM Backport #24720: mimic: High CPU load caused by operations on onode_map
Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/22777
merged
Yuri Weinstein
10:36 AM Backport #24769 (Resolved): mimic: set correctly shard for existed Collection.
Nathan Cutler
10:17 AM Backport #24887 (Resolved): mimic: Multiple races related to destruction of SharedBlob and BlueSt...
https://github.com/ceph/ceph/pull/23065 Nathan Cutler
10:17 AM Backport #24886 (Resolved): luminous: Multiple races related to destruction of SharedBlob and Blu...
https://github.com/ceph/ceph/pull/23064 Nathan Cutler
03:02 AM Bug #24859: Multiple races related to destruction of SharedBlob and BlueStore::split_cache()
https://github.com/ceph/ceph/pull/22972 Sage Weil
03:02 AM Bug #24859 (Pending Backport): Multiple races related to destruction of SharedBlob and BlueStore:...
Sage Weil
12:00 AM Backport #24799: mimic: FAILED assert(0 == "can't mark unloaded shard dirty") with compression en...
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/22910
merged
Yuri Weinstein

07/11/2018

08:11 PM Backport #24769: mimic: set correctly shard for existed Collection.
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22859
merged
Yuri Weinstein