Activity
From 07/11/2018 to 08/09/2018
08/09/2018
- 12:11 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Oh, yeah, looks like I fail at copy & pasting URLs. Correct link is https://github.com/ceph/ceph/pull/23273
08/08/2018
- 10:01 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- Paul Emmerich wrote:
> I've prototyped a work-around here: https://github.com/ceph/ceph/pull/2327
>
> Is there a ...
- 02:08 AM Bug #25207: ceph-volume lvm create gives segmentation fault
- This looks a bit like the error we see when jemalloc is enabled in /etc/{default,sysconfig}/ceph. Can you see if it ...
08/06/2018
- 04:38 PM Bug #21480 (Resolved): bluestore: flush_commit is racy
- 04:37 PM Bug #23540 (Resolved): FAILED assert(0 == "can't mark unloaded shard dirty") with compression ena...
- 04:36 PM Backport #24798 (Resolved): luminous: FAILED assert(0 == "can't mark unloaded shard dirty") with ...
- 03:16 AM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- Troy Ablan wrote:
> Before I took the crashy ones down, and ran a bluestore fsck. fsck came back with similar err...
- 03:15 AM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- Wanted to update that I
# Have gotten 100% of the data out of the cluster, and as far as I can tell, everything is...
08/05/2018
08/03/2018
- 03:20 PM Backport #24770: luminous: set correctly shard for existed Collection.
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22860
merged
- 03:13 PM Backport #24260: luminous: bluestore: flush_commit is racy
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22904
merged
- 03:12 PM Backport #24260: luminous: bluestore: flush_commit is racy
- https://github.com/ceph/ceph/pull/22909
merged
08/01/2018
- 08:21 PM Bug #25207: ceph-volume lvm create gives segmentation fault
- Alfredo Deza wrote:
> [...]
>
> The above is considered normal for the first time the OSD is created.
>
> Not ...
- 10:55 AM Bug #25207: ceph-volume lvm create gives segmentation fault
- ...
- 12:37 AM Bug #25207 (Can't reproduce): ceph-volume lvm create gives segmentation fault
- Hi,
I am trying to install the OSD on an NVMe drive. It is not linked to any volume groups.
When I tried to use the...
07/31/2018
- 06:25 PM Bug #25006: bad csum during upgrade test
- Nathan Cutler wrote:
> Also, I noticed this in the test yaml:
>
> [...]
>
> The only thing within @parallel@ i...
07/30/2018
- 08:13 PM Bug #25180 (Resolved): ObjectStore/StoreTest.CompressionTest/2 fail
- ...
- 06:51 PM Bug #24859 (Resolved): Multiple races related to destruction of SharedBlob and BlueStore::split_c...
- 06:51 PM Backport #24886 (Resolved): luminous: Multiple races related to destruction of SharedBlob and Blu...
- 04:42 PM Backport #24886: luminous: Multiple races related to destruction of SharedBlob and BlueStore::spl...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23064
merged
07/28/2018
- 08:07 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- It looks like I have quite a few more than this one OSD that's crashing. They all fsck successfully, but repair will...
07/27/2018
- 10:01 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- repair log and core respectively
ceph-post-file: 189d2970-662a-49c0-ac6c-6ee6d124d523
ceph-post-file: 8675da71-1b...
- 09:53 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- I have since updated to 13.2.1 and the cluster appears to be allowing me to get at least some of my data out. Howeve...
07/26/2018
- 10:02 PM Bug #25077 (Fix Under Review): Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
- https://github.com/ceph/ceph/pull/23257
- 12:50 PM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
- https://github.com/ceph/ceph/pull/23257
- 07:37 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- I've prototyped a work-around here: https://github.com/ceph/ceph/pull/2327
Is there a good reason to not retry rea...
- 03:17 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
- I agree that it is clearly a kernel bug in 4.9+, but I disagree with "Won't Fix" as a conclusion. It also happens ...
- 04:39 PM Bug #22102 (Won't Fix): BlueStore crashed on rocksdb checksum mismatch
- This appears to be a kernel bug related to swapping.
So far no indication it affects distro kernels.
- 09:41 AM Bug #25098: Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos <= end)`
- first 1 MB of the device `dd if=/dev/sdc of=/tmp/foo bs=1024K skip=1 count=1`...
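As a hedged illustration of which byte range the `dd` invocation above selects: `dd` skips `skip` input blocks of `bs` bytes before copying `count` blocks, so with `bs=1024K skip=1 count=1` the copy starts 1 MiB into the device. The sketch below mirrors that arithmetic in Python; `read_blocks` is a hypothetical helper name, not part of any Ceph tool.

```python
def read_blocks(path: str, bs: int, skip: int, count: int) -> bytes:
    """Return `count` blocks of `bs` bytes, starting `skip` blocks into the file.

    Mirrors the offset arithmetic of `dd bs=... skip=... count=...`.
    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        f.seek(skip * bs)          # skip whole input blocks from the start
        return f.read(count * bs)  # may be shorter if the file ends early
```

With `bs = 1024 * 1024`, calling the helper with `skip=0` returns the first MiB of the file, while `skip=1` returns the second MiB.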
- 09:32 AM Bug #25098: Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos <= end)`
- Dump of the bluefs super block : ...
- 09:27 AM Bug #25098: Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos <= end)`
- output of ceph-volume.log , seems ok, nothing strange:...
07/25/2018
- 11:12 PM Bug #25077 (In Progress): Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
- 09:30 PM Bug #22464 (Won't Fix): Bluestore: many checksum errors, always 0x6706be76 (which matches a zero ...
- I'm going to close this given that all of the evidence seems to point to a kernel bug with swap.
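The title of this bug notes that the recurring checksum value matches that of a zero block. A minimal sketch of how such a comparison could be made, assuming CRC-32C (Castagnoli) with an initial value of 0xFFFFFFFF and no final XOR, and a 4096-byte block. These conventions and the helper names are assumptions for illustration and are not taken from the bug report.

```python
def crc32c(data: bytes, crc: int = 0xFFFFFFFF, final_xor: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli) using the reflected polynomial 0x82F63B78.

    The default seed and the absence of a final XOR are assumptions; pass
    final_xor=0xFFFFFFFF for the standard CRC-32C convention.
    """
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return (crc ^ final_xor) & 0xFFFFFFFF


def matches_zero_block(csum: int, block_size: int = 4096) -> bool:
    """True if `csum` equals the checksum of `block_size` zero bytes (assumed size)."""
    return csum == crc32c(bytes(block_size))


if __name__ == "__main__":
    # Print the checksum of an all-zero block under the assumed conventions.
    print(hex(crc32c(bytes(4096))))
```

Whether the printed value equals 0x6706be76 depends on the block size and seed conventions actually in use, which this feed does not state.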
- 09:20 PM Bug #24903 (Resolved): Update 12.2.5 -> 12.2.6: block.db symlink exists but target unusable
- 11:57 AM Bug #25098 (Resolved): Bluestore OSD failed to start with `bluefs_types.h: 54: FAILED assert(pos ...
- This occurs sometimes... hard to catch :
1. we zap the device
2. do ceph lvm active with a cache and DB on SSD ...
07/24/2018
- 01:41 PM Bug #25077: Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
- Looks like a race between object and collection removals.
- 01:39 PM Bug #25077 (Can't reproduce): Occasional assertion in ObjectStore/StoreTest.HashCollisionTest/2
- The issue occurs after running the following command for a while:
../bin/ceph_test_objectstore --gtest_filter=Object...
- 12:35 PM Bug #20236 (In Progress): bluestore: ObjectStore/StoreTestSpecificAUSize.Many4KWritesNoCSumTest/2...
- It looks like I have some insight on the root cause, got that much more frequently once having some blobs at DB only ...
- 03:31 AM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
- It is already back to production without compression, I am not sure it is related to the compression type, we had sim...
07/23/2018
- 06:24 PM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
- Mind switching to a different compression method and trying again?
- 06:22 PM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
- Correct. The log is from when LZ4 compression is enabled and the OSD fails to start.
When I disabled the lz4 it went up...
- 05:12 PM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
- Yohay,
it looks like the LZ4 compressor is still enabled and failing. Haven't you enabled compression on a per-OSD basis?
- 05:49 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- Here's the fsck commandline and its output. Log file has been uploaded to https://mooinglemur.com/2018-07-ceph/
B...
- 11:03 AM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- The root cause for the abort is duplicate pextent(0x31a67390000~10000) that is present at both
"6.3es4_head 4#6:7e1...
- 09:26 AM Bug #24968 (Closed): Compaction error: Corruption: block checksum mismatch
- Hardware issues were the root cause hence closing.
07/22/2018
- 06:46 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Did some further tests. The iomemory-vsl driver seems to scramble data in some conditions. So no need to search for i...
- 12:13 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Hi Igor,
checking the hardware was a good clue. I guess the reason is identified. Bluestore WAL/DB does not play n...
- 12:07 AM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- It appears that I have access to all of the pools at this point now that one of the crashing OSDs is staying down, bu...
07/21/2018
- 11:33 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Markus,
does it refuse to start from scratch or after some load?
Is this just a single OSD or multiple ones are...
- 07:30 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Sorry Igor I started rebuilding the cluster shortly before your answer.
Nevertheless I was able to reproduce the e...
- 08:58 AM Bug #25006: bad csum during upgrade test
- Also, I noticed this in the test yaml:...
- 08:49 AM Bug #25006: bad csum during upgrade test
- The "missing primary copy" message happens on a mixed luminous/mimic cluster (MONs, MGR, and half of OSDs running mim...
- 08:29 AM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
- I meant: the OSD is now up.
- 08:28 AM Bug #25050: osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
- I disabled the LZ4 compression on both cephfs data and metadata pools and the osd is not up.
Similar symptom we ha...
- 07:16 AM Bug #25050 (Duplicate): osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
- OSD Crashed and it fails to start,
Debug Log can be download here: http://77.247.180.45/download/ceph-osd.22.debu...
07/19/2018
- 11:02 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- Attaching thread dump.
- 10:48 PM Bug #25001: Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- Looks like we are passing a bad bluestore_pextent_t into txc->released.insert....
- 03:25 PM Bug #25001 (Can't reproduce): Crashing OSDs after going from 12.2.5 -> 12.2.6 -> 13.2.0
- This bug has been opened following on from http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/028232.html
...
- 08:43 PM Bug #25006 (Can't reproduce): bad csum during upgrade test
- Run: http://pulpito.ceph.com/yuriw-2018-07-18_21:43:34-upgrade:luminous-x-mimic-distro-basic-smithi/
Job: 2796264
L...
- 08:14 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Markus,
before redeploying the OSDs can you monitor current memory usage with top and/or free tools for a while?
Ju...
- 03:22 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Hi Igor,
thanks for your feedback. The ceph servers have 128GB RAM with 10 1.8TB HDD and 3 1.92TB SSD plus an 640G...
- 09:51 AM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Markus,
first of all - I think 'improper' device size reporting is unrelated to this issue. This report contains jus...
- 12:05 PM Backport #24799 (Resolved): mimic: FAILED assert(0 == "can't mark unloaded shard dirty") with com...
- 09:28 AM Bug #24639: [segfault] segfault in BlueFS::read
- Rowan,
do you remember what the BlueFS volume sizes were for the breaking OSDs?
- 09:15 AM Bug #24560 (Resolved): BitmapAllocator::_mark_allocated parameter overflow.
07/18/2018
- 07:31 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- One strange thing from the detail log (out) is a size mismatch for bdev 2:
At the beginning of the log we see:
<p...
- 02:56 PM Backport #24887 (Resolved): mimic: Multiple races related to destruction of SharedBlob and BlueSt...
- 02:17 PM Backport #24887: mimic: Multiple races related to destruction of SharedBlob and BlueStore::split_...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23065
merged
07/17/2018
- 07:35 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Ceph.conf...
- 07:31 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- System is Centos 7.5 with longterm kernel 4.14.52 (kernel-ml spinoff from ELRepo)
- 07:23 PM Bug #24968: Compaction error: Corruption: block checksum mismatch
- Log of
CEPH_ARGS="--debug-bluestore 20 --debug-bluefs 20 --err-to-stderr --log-file out" ceph-bluestore-tool fsck ...
- 07:17 PM Bug #24968 (Closed): Compaction error: Corruption: block checksum mismatch
- I've been running Ceph Luminous 12.2.5 for a few weeks now, until now with only very light usage. Today we started our first ceph...
07/16/2018
- 03:19 AM Backport #24887 (In Progress): mimic: Multiple races related to destruction of SharedBlob and Blu...
- https://github.com/ceph/ceph/pull/23065
- 02:35 AM Backport #24886 (In Progress): luminous: Multiple races related to destruction of SharedBlob and ...
- https://github.com/ceph/ceph/pull/23064
07/13/2018
- 04:51 PM Bug #24903 (Fix Under Review): Update 12.2.5 -> 12.2.6: block.db symlink exists but target unusable
- https://github.com/ceph/ceph/pull/23031
- 02:59 PM Bug #24903: Update 12.2.5 -> 12.2.6: block.db symlink exists but target unusable
- What is happening here is that ceph-volume was relying on bluestore to set these links, and then when bluestore chang...
- 10:07 AM Bug #24903 (Resolved): Update 12.2.5 -> 12.2.6: block.db symlink exists but target unusable
- After updating from 12.2.5 to 12.2.6, BlueStore OSDs with a separate block.db device will not restart:
2018-07-13 ...
- 10:54 AM Bug #24906 (Closed): fio with bluestore crushed
- ...
- 08:03 AM Bug #24901 (Resolved): Client reads fail due to bad CRC under high memory pressure on OSDs
- I've seen problems with read failures due to CRC mismatches on two completely independent clusters with different har...
07/12/2018
- 10:49 AM Bug #22977 (Resolved): High CPU load caused by operations on onode_map
- 10:49 AM Backport #24720 (Resolved): mimic: High CPU load caused by operations on onode_map
- 12:05 AM Backport #24720: mimic: High CPU load caused by operations on onode_map
- Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/22777
merged
- 10:36 AM Backport #24769 (Resolved): mimic: set correctly shard for existed Collection.
- 10:17 AM Backport #24887 (Resolved): mimic: Multiple races related to destruction of SharedBlob and BlueSt...
- https://github.com/ceph/ceph/pull/23065
- 10:17 AM Backport #24886 (Resolved): luminous: Multiple races related to destruction of SharedBlob and Blu...
- https://github.com/ceph/ceph/pull/23064
- 03:02 AM Bug #24859: Multiple races related to destruction of SharedBlob and BlueStore::split_cache()
- https://github.com/ceph/ceph/pull/22972
- 03:02 AM Bug #24859 (Pending Backport): Multiple races related to destruction of SharedBlob and BlueStore:...
- 12:00 AM Backport #24799: mimic: FAILED assert(0 == "can't mark unloaded shard dirty") with compression en...
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/22910
merged
07/11/2018
- 08:11 PM Backport #24769: mimic: set correctly shard for existed Collection.
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22859
merged