Activity

From 02/10/2023 to 03/11/2023

03/11/2023

12:35 PM Feature #58421 (Pending Backport): OSD metadata should show the min_alloc_size that each OSD was ...
Igor Fedotov

03/10/2023

04:47 PM Backport #58952 (In Progress): reef: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/out...
https://github.com/ceph/ceph/pull/50475 Radoslaw Zarzynski
04:38 PM Backport #58952 (Resolved): reef: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output...
When upgrading to rook 1.8.3 (ceph 16.2.7) we experience issues with the OSD initialization; basically only +/- 50% ... Radoslaw Zarzynski
04:33 PM Backport #58633 (In Progress): quincy: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/o...
Radoslaw Zarzynski
04:32 PM Backport #58633: quincy: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
https://github.com/ceph/ceph/pull/50474 Radoslaw Zarzynski
03:40 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
ah, ok. Chances this ticket is related are much higher then. So my recommendation would be to upgrade to Quincy once the... Igor Fedotov
02:48 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
> And as far as I understand that's a single-shot issue for you so far, right?
yes
> Anyway haven't you observe...
Anonymous
02:22 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
Hi Sven,
thanks a lot for all the info. Unfortunately it looks like the actual corruption happened during the first ...
Igor Fedotov

03/09/2023

06:33 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Hi Ilya,
I've managed to make a much simpler (and smaller) reproducer, without any VM involvement. If I create an ...
Roel van Meer

03/07/2023

11:25 AM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
the affected osd started fine after bluestore repair.
anything else you need?
Anonymous
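For reference, the offline repair mentioned above is done with ceph-bluestore-tool. A minimal sketch, assuming OSD id 74 (the id appearing in the attached log name) and the default data path; adjust both for the affected OSD:

```shell
# Sketch only -- OSD id 74 and the default data path are assumptions.
# The OSD daemon must be stopped before running offline BlueStore tools.
systemctl stop ceph-osd@74

# Consistency check first, then repair, as tried in this ticket.
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-74
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-74

systemctl start ceph-osd@74
```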
01:10 AM Backport #58675 (Resolved): quincy: ONode ref counting is broken
Konstantin Shalygin

03/06/2023

11:21 PM Backport #58675: quincy: ONode ref counting is broken
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/50048
merged
Yuri Weinstein

03/01/2023

10:44 AM Backport #57604 (In Progress): quincy: Log message is little confusing
https://github.com/ceph/ceph/pull/50323 Igor Fedotov
10:41 AM Backport #57603 (In Progress): pacific: Log message is little confusing
https://github.com/ceph/ceph/pull/50322 Igor Fedotov
10:30 AM Bug #53466 (Resolved): OSD is unable to allocate free space for BlueFS
Igor Fedotov
10:28 AM Backport #58589 (Rejected): pacific: OSD is unable to allocate free space for BlueFS
After considering the amount of effort required (=dependent commits to be backported) we decided to omit this fix in... Igor Fedotov
10:23 AM Backport #58849 (In Progress): pacific: AvlAllocator::init_rm|add_free methods perform assertion ...
https://github.com/ceph/ceph/pull/50321 Igor Fedotov
09:42 AM Backport #58848 (In Progress): quincy: AvlAllocator::init_rm|add_free methods perform assertion c...
https://github.com/ceph/ceph/pull/50319 Igor Fedotov
12:53 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Roel van Meer wrote:
> > - Did you attempt to correlate the time it takes for the corruption to occur (sometimes les...
Ilya Dryomov

02/27/2023

05:13 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hi Igor,
Yes, that is what I observe in Nautilus - far fewer updates to the file metadata.
It does appear that ...
Joshua Baergen
04:40 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hi Joshua,
likely your root cause analysis is valid and rocksdb's recycle_log_file_num option works differently at ...
Igor Fedotov
11:45 AM Bug #56210 (Fix Under Review): crash: int BlueFS::_replay(bool, bool): assert(r == q->second->fil...
Igor Fedotov
07:26 AM Bug #51370: StoreTestSpecificAUSize.SyntheticMatrixCsumAlgorithm/2 timed out
Local env:
[ RUN ] ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixNoCsum/2
also did hang.
1. Subtest:
...
Adam Kupczyk

02/26/2023

09:40 AM Backport #58849 (Resolved): pacific: AvlAllocator::init_rm|add_free methods perform assertion che...
Backport Bot
09:39 AM Backport #58848 (Resolved): quincy: AvlAllocator::init_rm|add_free methods perform assertion chec...
https://github.com/ceph/ceph/pull/50319 Backport Bot
09:35 AM Bug #54579 (Pending Backport): AvlAllocator::init_rm|add_free methods perform assertion check bef...
Igor Fedotov

02/23/2023

10:17 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Coming back to the nosnaptrim result, is it possible to increase debug logging in such a way that we can know what is... Roel van Meer
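A hedged sketch of what such a test run could look like; the nosnaptrim flag is the standard cluster-wide OSD flag, while the debug subsystem and level shown here are assumptions, not values taken from the ticket:

```shell
# Pause snapshot trimming cluster-wide while observing the corruption window.
ceph osd set nosnaptrim

# Assumed debug subsystem/level for capturing snaptrim-related OSD activity.
ceph tell osd.\* config set debug_osd 20

# Re-enable trimming (and restore logging) once the test is done.
ceph osd unset nosnaptrim
```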
10:06 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Ilya Dryomov wrote:
> Let's try to home in on the VM. You mentioned that the workload is "some logging". Can you...
Roel van Meer
09:49 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Ilya Dryomov wrote:
> Can you attach the kernel log ("journalctl -k") from the hypervisor?
Done
Roel van Meer
09:52 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
I also did two more tests with different settings on the disk in Proxmox:
9720: Disk configured without discard op...
Roel van Meer
09:42 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Ilya Dryomov wrote:
> You also said you were going to run a set of tests with nosnaptrim to confirm or deny that h...
Roel van Meer
08:25 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hey Igor,
Using the new metrics from https://github.com/ceph/ceph/pull/50245 in our staging environment, it became...
Joshua Baergen

02/21/2023

10:30 AM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
I did spot these lines in dmesg, but don't know if they are related, as the date does not match the osd crash.
I i...
Anonymous
10:11 AM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
I now started the osd again, and at least it did not crash - yet:... Anonymous

02/20/2023

02:36 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Hi Roel,
So far I see only one variable that contributes to the reproducer: whether the VM is running. Given the ...
Ilya Dryomov
11:00 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Hi Ilya,
I hope you had a good weekend!
Are there any other things I can test in order to help determine the root...
Roel van Meer

02/17/2023

04:33 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
I tried "repair" instead of fsck, here are the results:... Anonymous
12:09 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
If you need anything else, I'm happy to assist in debugging. Anonymous
12:32 PM Fix #58759 (New): BlueFS log runway space exhausted
In BlueFS::_flush_and_sync_log_core we have following data integrity check:
ceph_assert(bl.length() <= runway);
I...
Adam Kupczyk
07:22 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
> If so, can you amend the image creation step to disable the object-map image feature and repeat any of reproducers,... Roel van Meer

02/16/2023

03:31 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
deep fsck returned the same:... Anonymous
02:04 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
fsck:... Anonymous
02:00 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
Here is the ceph-osd log: https://drive.google.com/file/d/1VoNbGab9U6qTBPK0tWfr9AWR1RZLmUng/view?usp=share_link
an...
Anonymous
01:44 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
Could you please share a log for a failing osd startup attempt too?
And fsck report too, please.
You might want...
Igor Fedotov
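The deep fsck that was subsequently run can be requested like this; a sketch, again assuming OSD id 74 (from the attached log name) with the OSD stopped:

```shell
# Sketch -- deep fsck reads and validates object data, not just metadata.
# --deep takes a boolean value in ceph-bluestore-tool.
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-74 --deep 1
```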
01:38 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
I also have the complete ceph-osd-74.log file, but it is 14MB compressed as tar.gz so I can't upload it directly.
...
Anonymous
01:28 PM Bug #58747 (Need More Info): /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_...
Hi,
I have a crashing OSD with a crash message I have never seen before. Also this OSD Unit did not recover from t...
Anonymous
03:07 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hey Igor,
Let me attach some updated pictures from the production system in question. They all include 14.2.18, 16...
Joshua Baergen
01:36 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Joshua Baergen wrote:
> Hi Igor, we have our first datapoint regarding 16.2.11: While it appears to return write _by...
Igor Fedotov
02:53 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Ilya Dryomov wrote:
> I'm guessing that the "VM not started: No checksum change after 10h and counting" test conti...
Roel van Meer
12:21 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Roel van Meer wrote:
> Also, the VMs are configured with VirtIO, writeback cache, and discard on.
> Would it be use...
Ilya Dryomov
11:46 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
> For completeness, can you repeat this one with snapshot being mapped just once?
9712: rbd device read with dd, 4...
Roel van Meer
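The dd-based read over a mapped snapshot described in these tests could look roughly like this; the pool, image, and snapshot names are assumptions, since the ticket excerpts do not show them:

```shell
# Assumed names: pool "rbd", image "vm-disk", snapshot "snap1".
# rbd map prints the device path (e.g. /dev/rbd0) on stdout.
DEV=$(rbd map rbd/vm-disk@snap1 --read-only)

# Read the whole snapshot with 4 MiB direct reads and checksum it,
# then compare the sum across repeated runs to detect corruption.
dd if="$DEV" bs=4M iflag=direct status=none | md5sum

rbd unmap "$DEV"
```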
07:37 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Also, the VMs are configured with VirtIO, writeback cache, and discard on.
Would it be useful to see if any of these...
Roel van Meer
07:33 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Ilya Dryomov wrote:
> For completeness, can you repeat this one with snapshot being mapped just once?
Yes, will...
Roel van Meer
01:04 PM Bug #52095: OSD container can't start: _read_bdev_label unable to decode label at offset 102: buf...
Richard Hesse wrote:
> Here's a truncated version of the ceph-volume.log. It was 1.7MB compressed so I only included...
Igor Fedotov
08:51 AM Backport #58588 (Resolved): quincy: OSD is unable to allocate free space for BlueFS
Igor Fedotov

02/15/2023

10:35 PM Bug #56788: crash: void KernelDevice::_aio_thread(): abort
/a/yuriw-2023-02-13_21:53:12-rados-wip-yuri-testing-2023-02-06-1155-quincy-distro-default-smithi/7172130 Laura Flores
07:58 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Roel van Meer wrote:
> 9711: rbd device read with dd, 4MB blocks and direct io (with map/unmap): checksum changed af...
Ilya Dryomov
07:54 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
I did four new tests today, with four VMs, all started at approx the same time.
9708: Our standard reproduction: c...
Roel van Meer

02/14/2023

11:03 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
> What is the kernel version on the node(s) where RBD devices are mapped?
Linux pve08-dlf 5.15.83-1-pve #1 SMP PVE...
Roel van Meer
10:58 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Ilya Dryomov wrote:
> I doubt that the pool IDs matter but erroneously removed SharedBlob entries (https://tracker...
Roel van Meer
10:47 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Roel van Meer wrote:
> That would be krbd.
What is the kernel version on the node(s) where RBD devices are mapped...
Ilya Dryomov
10:44 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Ilya Dryomov wrote:
> - What driver does the VM use to access the image
That would be krbd.
As for the other...
Roel van Meer
10:38 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Roel van Meer wrote:
> Something that might or might not be relevant to mention is the history of this cluster. IIRC...
Ilya Dryomov
10:23 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Roel van Meer wrote:
> Good morning Ilya,
>
> There's not much going on in this pool with regards to snapshots. T...
Ilya Dryomov
08:58 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Something that might or might not be relevant to mention is the history of this cluster. IIRC it was installed with N... Roel van Meer
07:07 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Good morning Ilya,
There's not much going on in this pool with regards to snapshots. There are a few snapshots tha...
Roel van Meer
09:48 PM Bug #52095: OSD container can't start: _read_bdev_label unable to decode label at offset 102: buf...
Here's a truncated version of the ceph-volume.log. It was 1.7MB compressed so I only included a minute or two of outp... Richard Hesse
04:35 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hi Igor, we have our first datapoint regarding 16.2.11: While it appears to return write _byte_ amplification to Naut... Joshua Baergen

02/13/2023

03:31 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
The attached "ceph status" output doesn't show any PGs in snaptrim state. "snaptrim" is short for snapshot trimming.... Ilya Dryomov
03:11 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
Hi Ilya,
this is a plain, freshly installed Debian 11, that does nothing. So, the workload is some logging, that's...
Roel van Meer
03:02 PM Bug #58707 (Need More Info): rbd snapshot corruption, likely due to snaptrim
Hi Roel,
After starting the VM, what sort of workload is running there? What is the rate of change for the backin...
Ilya Dryomov
02:12 PM Bug #58707 (Need More Info): rbd snapshot corruption, likely due to snaptrim
Dear maintainers,
We have one Ceph pool where rbd snapshots are being corrupted. This happens within hours of the ...
Roel van Meer
12:18 PM Bug #58646 (Fix Under Review): Data format for persisting alloc map needs redesign
Igor Fedotov
12:17 PM Backport #58676 (In Progress): pacific: ONode ref counting is broken
https://github.com/ceph/ceph/pull/50072 Igor Fedotov
12:16 PM Bug #57855 (Fix Under Review): cannot enable level_compaction_dynamic_level_bytes
Igor Fedotov
12:10 PM Bug #57855 (New): cannot enable level_compaction_dynamic_level_bytes
Well, indeed level_compaction_dynamic_level_bytes mode can't be enabled with fit_to_fast selector if bluestore is equ... Igor Fedotov
10:36 AM Bug #52095: OSD container can't start: _read_bdev_label unable to decode label at offset 102: buf...
Richard Hesse wrote:
> I'm also seeing the same issue on Pacific 16.2.11 (midway through upgrading from 16.2.10). Th...
Igor Fedotov

02/12/2023

05:15 PM Bug #54019: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
Is there a nightly container build for Pacific that includes this fix? This issue has crippled my previously working ... Richard Hesse

02/10/2023

11:11 PM Bug #52095: OSD container can't start: _read_bdev_label unable to decode label at offset 102: buf...
I'm also seeing the same issue on Pacific 16.2.11 (midway through upgrading from 16.2.10). The non-LVM containerized ... Richard Hesse
 
