Activity
From 02/10/2023 to 03/11/2023
03/11/2023
- 12:35 PM Feature #58421 (Pending Backport): OSD metadata should show the min_alloc_size that each OSD was ...
03/10/2023
- 04:47 PM Backport #58952 (In Progress): reef: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/out...
- https://github.com/ceph/ceph/pull/50475
- 04:38 PM Backport #58952 (Resolved): reef: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output...
- When upgrading to rook 1.8.3 (ceph 16.2.7) we experience issues with the OSD initialization; basically only +/- 50% ...
- 04:33 PM Backport #58633 (In Progress): quincy: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/o...
- 04:32 PM Backport #58633: quincy: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
- https://github.com/ceph/ceph/pull/50474
- 03:40 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- ah, ok. Chances that this ticket is related are much higher, then. So my recommendation would be to upgrade to Quincy once the...
- 02:48 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- > And as far as I understand that's a single-shot issue for you so far, right?
yes
> Anyway haven't you observe...
- 02:22 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- Hi Sven,
thanks a lot for all the info. Unfortunately it looks like the actual corruption happened during the first ...
03/09/2023
- 06:33 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Hi Ilya,
I've managed to make a much simpler (and smaller) reproducer, without any VM involvement. If I create an ...
03/07/2023
- 11:25 AM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- the affected osd started fine after bluestore repair.
anything else you need?
- 01:10 AM Backport #58675 (Resolved): quincy: ONode ref counting is broken
03/06/2023
- 11:21 PM Backport #58675: quincy: ONode ref counting is broken
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/50048
merged
03/01/2023
- 10:44 AM Backport #57604 (In Progress): quincy: Log message is little confusing
- https://github.com/ceph/ceph/pull/50323
- 10:41 AM Backport #57603 (In Progress): pacific: Log message is little confusing
- https://github.com/ceph/ceph/pull/50322
- 10:30 AM Bug #53466 (Resolved): OSD is unable to allocate free space for BlueFS
- 10:28 AM Backport #58589 (Rejected): pacific: OSD is unable to allocate free space for BlueFS
- After considering the amount of required efforts (=dependent commits to be backported) we decided to omit this fix in...
- 10:23 AM Backport #58849 (In Progress): pacific: AvlAllocator::init_rm|add_free methods perform assertion ...
- https://github.com/ceph/ceph/pull/50321
- 09:42 AM Backport #58848 (In Progress): quincy: AvlAllocator::init_rm|add_free methods perform assertion c...
- https://github.com/ceph/ceph/pull/50319
- 12:53 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Roel van Meer wrote:
> > - Did you attempt to correlate the time it takes for the corruption to occur (sometimes les...
02/27/2023
- 05:13 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
- Hi Igor,
Yes, that is what I observe in Nautilus - far fewer updates to the file metadata.
It does appear that ...
- 04:40 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
- Hi Joshua,
likely your root cause analysis is valid and rocksdb's recycle_log_file_num option works differently at ...
- 11:45 AM Bug #56210 (Fix Under Review): crash: int BlueFS::_replay(bool, bool): assert(r == q->second->fil...
- 07:26 AM Bug #51370: StoreTestSpecificAUSize.SyntheticMatrixCsumAlgorithm/2 timed out
- Local env:
[ RUN ] ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixNoCsum/2
also did hang.
1. Subtest:
...
02/26/2023
- 09:40 AM Backport #58849 (In Progress): pacific: AvlAllocator::init_rm|add_free methods perform assertion ...
- 09:39 AM Backport #58848 (Resolved): quincy: AvlAllocator::init_rm|add_free methods perform assertion chec...
- 09:35 AM Bug #54579 (Pending Backport): AvlAllocator::init_rm|add_free methods perform assertion check bef...
02/23/2023
- 10:17 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Coming back to the nosnaptrim result, is it possible to increase debug logging in such a way that we can know what is...
- 10:06 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Ilya Dryomov wrote:
> Let's try to hone in on the VM. You mentioned that the workload is "some logging". Can you...
- 09:49 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Ilya Dryomov wrote:
> Can you attach the kernel log ("journalctl -k") from the hypervisor?
Done
- 09:52 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- I also did two more tests with different settings on the disk in Proxmox:
9720: Disk configured without discard op...
- 09:42 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Ilya Dryomov wrote:
> You also said you were going to run a set of tests with nosnaptrim to confirm or deny that h...
- 08:25 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
- Hey Igor,
Using the new metrics from https://github.com/ceph/ceph/pull/50245 in our staging environment, it became...
02/21/2023
- 10:30 AM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- I did spot these lines in dmesg, but don't know if they are related, as the date does not match the osd crash.
I i...
- 10:11 AM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- I now started the osd again, and at least it did not crash - yet:...
02/20/2023
- 02:36 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Hi Roel,
So far I see only one variable that contributes to the reproducer: whether the VM is running. Given the ...
- 11:00 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Hi Ilya,
I hope you had a good weekend!
Are there any other things I can test in order to help determine the root...
02/17/2023
- 04:33 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- I tried "repair" instead of fsck, here are the results:...
- 12:09 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- If you need anything else, I'm happy to assist in debugging.
- 12:32 PM Fix #58759 (New): BlueFS log runway space exhausted
- In BlueFS::_flush_and_sync_log_core we have the following data integrity check:
ceph_assert(bl.length() <= runway);
I...
- 07:22 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- > If so, can you amend the image creation step to disable the object-map image feature and repeat any of reproducers,...
02/16/2023
- 03:31 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- deep fsck returned the same:...
- 02:04 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- fsck:...
- 02:00 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- Here is the ceph-osd log: https://drive.google.com/file/d/1VoNbGab9U6qTBPK0tWfr9AWR1RZLmUng/view?usp=share_link
an...
- 01:44 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- Could you please share a log for a failing osd startup attempt too?
And fsck report too, please.
You might want...
- 01:38 PM Bug #58747: /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_msg("uh oh, missi...
- I also have the complete ceph-osd-74.log file, but it is 14MB compressed as tar.gz so I can't upload it directly.
...
- 01:28 PM Bug #58747 (Need More Info): /build/ceph-15.2.17/src/os/bluestore/BlueStore.cc: 3945: ceph_abort_...
- Hi,
I have a crashing OSD with a crash message I have never seen before. Also this OSD Unit did not recover from t...
- 03:07 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
- Hey Igor,
Let me attach some updated pictures from the production system in question. They all include 14.2.18, 16...
- 01:36 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
- Joshua Baergen wrote:
> Hi Igor, we have our first datapoint regarding 16.2.11: While it appears to return write _by...
- 02:53 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Ilya Dryomov wrote:
> I'm guessing that the "VM not started: No checksum change after 10h and counting" test conti...
- 12:21 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Roel van Meer wrote:
> Also, the VMs are configured with VirtIO, writeback cache, and discard on.
> Would it be use...
- 11:46 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- > For completeness, can you repeat this one with snapshot being mapped just once?
9712: rbd device read with dd, 4...
- 07:37 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Also, the VMs are configured with VirtIO, writeback cache, and discard on.
Would it be useful to see if any of these...
- 07:33 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Ilya Dryomov wrote:
> For completeness, can you repeat this one with snapshot being mapped just once?
Yes, will...
- 01:04 PM Bug #52095: OSD container can't start: _read_bdev_label unable to decode label at offset 102: buf...
- Richard Hesse wrote:
> Here's a truncated version of the ceph-volume.log. It was 1.7MB compressed so I only included...
- 08:51 AM Backport #58588 (Resolved): quincy: OSD is unable to allocate free space for BlueFS
02/15/2023
- 10:35 PM Bug #56788: crash: void KernelDevice::_aio_thread(): abort
- /a/yuriw-2023-02-13_21:53:12-rados-wip-yuri-testing-2023-02-06-1155-quincy-distro-default-smithi/7172130
- 07:58 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Roel van Meer wrote:
> 9711: rbd device read with dd, 4MB blocks and direct io (with map/unmap): checksum changed af...
- 07:54 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- I did four new tests today, with four VMs, all started at approx the same time.
9708: Our standard reproduction: c...
02/14/2023
- 11:03 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- > What is the kernel version on the node(s) where RBD devices are mapped?
Linux pve08-dlf 5.15.83-1-pve #1 SMP PVE...
- 10:58 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Ilya Dryomov wrote:
> I doubt that the pool IDs matter but erroneously removed SharedBlob entries (https://tracker...
- 10:47 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Roel van Meer wrote:
> That would be krbd.
What is the kernel version on the node(s) where RBD devices are mapped...
- 10:44 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Ilya Dryomov wrote:
> - What driver does the VM use to access the image
That would be krbd.
As for the other...
- 10:38 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Roel van Meer wrote:
> Something that might or might not be relevant to mention is the history of this cluster. IIRC...
- 10:23 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Roel van Meer wrote:
> Good morning Ilya,
>
> There's not much going on in this pool with regards to snapshots. T...
- 08:58 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Something that might or might not be relevant to mention is the history of this cluster. IIRC it was installed with N...
- 07:07 AM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Good morning Ilya,
There's not much going on in this pool with regards to snapshots. There are a few snapshots tha...
- 09:48 PM Bug #52095: OSD container can't start: _read_bdev_label unable to decode label at offset 102: buf...
- Here's a truncated version of the ceph-volume.log. It was 1.7MB compressed so I only included a minute or two of outp...
- 04:35 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
- Hi Igor, we have our first datapoint regarding 16.2.11: While it appears to return write _byte_ amplification to Naut...
02/13/2023
- 03:31 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- The attached "ceph status" output doesn't show any PGs in snaptrim state. "snaptrim" is short for snapshot trimming....
- 03:11 PM Bug #58707: rbd snapshot corruption, likely due to snaptrim
- Hi Ilya,
this is a plain, freshly installed Debian 11, that does nothing. So, the workload is some logging, that's...
- 03:02 PM Bug #58707 (Need More Info): rbd snapshot corruption, likely due to snaptrim
- Hi Roel,
After starting the VM, what sort of workload is running there? What is the rate of change for the backin...
- 02:12 PM Bug #58707 (Need More Info): rbd snapshot corruption, likely due to snaptrim
- Dear maintainers,
We have one Ceph pool where rbd snapshots are being corrupted. This happens within hours of the ... - 12:18 PM Bug #58646 (Fix Under Review): Data format for persisting alloc map needs redesign
- 12:17 PM Backport #58676 (In Progress): pacific: ONode ref counting is broken
- https://github.com/ceph/ceph/pull/50072
- 12:16 PM Bug #57855 (Fix Under Review): cannot enable level_compaction_dynamic_level_bytes
- 12:10 PM Bug #57855 (New): cannot enable level_compaction_dynamic_level_bytes
- Well, indeed level_compaction_dynamic_level_bytes mode can't be enabled with fit_to_fast selector if bluestore is equ...
- 10:36 AM Bug #52095: OSD container can't start: _read_bdev_label unable to decode label at offset 102: buf...
- Richard Hesse wrote:
> I'm also seeing the same issue on Pacific 16.2.11 (midway through upgrading from 16.2.10). Th...
02/12/2023
- 05:15 PM Bug #54019: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
- Is there a nightly container build for Pacific that includes this fix? This issue has crippled my previously working ...
02/10/2023
- 11:11 PM Bug #52095: OSD container can't start: _read_bdev_label unable to decode label at offset 102: buf...
- I'm also seeing the same issue on Pacific 16.2.11 (midway through upgrading from 16.2.10). The non-LVM containerized ...