Activity

From 01/03/2023 to 02/01/2023

02/01/2023

04:35 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hi Igor,
Thanks for continuing to dig into this! Some answers to your questions:
> The first question would be ...
Joshua Baergen
11:21 AM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
And one more note:
The latest Pacific release (16.2.11) might show much better behavior in terms of log compaction a...
Igor Fedotov
11:12 AM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Joshua,
thanks for the log. Something interesting, indeed.
The first question would be - do you have any custom b...
Igor Fedotov
10:21 AM Feature #58421 (Fix Under Review): OSD metadata should show the min_alloc_size that each OSD was ...
Igor Fedotov
12:46 AM Bug #58022: Fragmentation score rising by seemingly stuck thread
We got some monitoring on a 3rd cluster. We're seeing it there too, though slower than on the other cluster.
I was se...
Kevin Fox

01/31/2023

04:16 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hi Igor,
> Now I'm wondering whether that high compaction rate persists permanently?
Ah, sorry, I should have m...
Joshua Baergen
03:34 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hi Joshua,
good catch.
Now I'm wondering whether that high compaction rate persists permanently?
And if so - cou...
Igor Fedotov

01/30/2023

01:21 PM Bug #54019: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
Christian Rohmann wrote:
> Thanks for fixing this ... but somehow the link to the ML (https://lists.ceph.io/hyperkit...
Igor Fedotov

01/29/2023

09:42 PM Bug #54019: OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error
Thanks for fixing this ... but somehow the link to the ML (https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/C... Christian Rohmann

01/27/2023

05:56 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
Just looking at lengths, there are lots of pretty small ones?:
[kfox@zathras tmp]$ jq '.extents[].length' osd.3 | s...
Kevin Fox
05:52 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
Please find dumps attached. Kevin Fox
10:21 AM Bug #58022: Fragmentation score rising by seemingly stuck thread
Igor Fedotov wrote:
> One potential explanation can be pretty trivial: allocator keeps tracking a sort of history(hi...
Adam Kupczyk
02:51 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hey Igor, based on the discussion at the perf meeting yesterday we've added some exports for bluefs log stats. Here's... Joshua Baergen

01/26/2023

05:43 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
The patch implies that the calculation may be wrong? But why would behavior change on restart?
Thanks,
Kevin
Kevin Fox
04:18 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
https://github.com/ceph/ceph/pull/49885 Neha Ojha
12:09 PM Bug #57507: rocksdb crushed due to checksum mismatch
Deepika Upadhyay wrote:
> Hey Igor, did this bug get resolved in 16.2.11, can you share the tracker which might be a...
Igor Fedotov
07:30 AM Bug #57507: rocksdb crushed due to checksum mismatch
Hey Igor, did this bug get resolved in 16.2.11? Can you share the tracker which might be addressing that? Thanks! Deepika Upadhyay
11:40 AM Backport #58588 (In Progress): quincy: OSD is unable to allocate free space for BlueFS
https://github.com/ceph/ceph/pull/49884 Igor Fedotov
01:01 AM Backport #58588 (Resolved): quincy: OSD is unable to allocate free space for BlueFS
Backport Bot
01:02 AM Backport #58589 (Rejected): pacific: OSD is unable to allocate free space for BlueFS
Backport Bot
12:35 AM Bug #53466 (Pending Backport): OSD is unable to allocate free space for BlueFS
Neha Ojha

01/25/2023

11:23 PM Bug #53466: OSD is unable to allocate free space for BlueFS
https://github.com/ceph/ceph/pull/48854 merged Yuri Weinstein
06:27 PM Feature #57785: fragmentation score in metrics
Ok. Thanks. Kevin Fox
02:00 PM Feature #57785: fragmentation score in metrics
Hi Kevin,
We will implement the aligned fragmentation score after we merge https://github.com/ceph/ceph/pull/48854.
Yaarit Hatuka
05:59 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
For question 1, here's a couple of screenshots with consumed space and fragmentation added to both. utilization is pr... Kevin Fox
05:24 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
Hi Kevin,
I have two questions:
1) Is rising fragmentation score related to change of free space?
If no other...
Adam Kupczyk

01/24/2023

11:58 PM Feature #57785: fragmentation score in metrics
Any updates on this?
Thanks,
Kevin
Kevin Fox
05:43 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
After restart:
[root@rcs3 ~]# journalctl -u ceph-b15015c8-af07-4973-b35d-28c3bfd2af22@osd.4.service | grep probe
Ja...
Kevin Fox
05:23 PM Bug #58256 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2: Expected: (logger->ge...
Igor Fedotov
05:21 PM Bug #58256: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2: Expected: (logger->get(l_bluefs_...
https://github.com/ceph/ceph/pull/49392 merged Yuri Weinstein

01/23/2023

05:00 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
Here's osd4 that was still running away this morning. I just restarted it. Here's the right before metrics. Will get ... Kevin Fox
04:39 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Hey Igor, it just so happens that we've collected some perf dumps from a cluster that we upgraded this weekend. We ha... Joshua Baergen
02:43 PM Bug #58530: Pacific: Significant write amplification as compared to Nautilus
Joshua, please share perf counter dumps for a couple of OSDs of each MAS.
Igor Fedotov

01/20/2023

10:05 PM Bug #58530 (Triaged): Pacific: Significant write amplification as compared to Nautilus
After upgrading multiple RBD clusters from 14.2.18 to 16.2.9, we've found that OSDs write significantly more to the u... Joshua Baergen
06:40 PM Feature #58113 (Fix Under Review): BLK/Kernel: Improve protection against running one OSD twice
Neha Ojha
05:38 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
Here is a runaway one I restarted 2 days ago.
ELAPSED CMD
2-00:09:13 /usr/bin/ceph-osd -n osd.3 -f --setuser...
Kevin Fox
05:06 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
I can get some more, but here's an initial bit.
osd.4 has been running away for a long time (at least 2 weeks. bas...
Kevin Fox
02:10 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
As a first step, I'd like to see allocation stats probes from OSD logs. Here is an example:
2023-01-20T16:28:41.4...
Igor Fedotov
01:59 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
Vikhyat Umrao wrote:
> Igor/Adam - "But the behavior stops immediately on restart. So feels like some thread in the ...
Igor Fedotov

01/19/2023

10:52 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
If I strace a runaway OSD, it shows up with 59 threads. If I do it to one that is not a runaway, it shows up with 59 ... Kevin Fox

01/18/2023

05:16 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
Here's a picture. Kevin Fox
05:13 PM Bug #58022: Fragmentation score rising by seemingly stuck thread
We ended up slowly reformatting all of our OSDs and re-adding them. Things settled out to a fragmentation score of < ... Kevin Fox

01/17/2023

12:40 AM Bug #58463 (Fix Under Review): RocksDBTransactionImpl::rm_range_keys doesn't use bound iterator
Igor Fedotov

01/15/2023

11:24 PM Bug #58463 (Pending Backport): RocksDBTransactionImpl::rm_range_keys doesn't use bound iterator
Hence this might cause slow omap enumeration when RocksDB holds a large number of tombstones.
Igor Fedotov

01/13/2023

04:35 PM Feature #58421: OSD metadata should show the min_alloc_size that each OSD was built with
Ideally this will be available both via `ceph osd metadata` and the admin socket so as to dovetail into common metric... Anthony D'Atri
12:14 PM Bug #53002 (Fix Under Review): crash BlueStore::Onode::put from BlueStore::TransContext::~TransCo...
Igor Fedotov
12:14 PM Bug #58439 (Duplicate): octopus osd crash
Igor Fedotov
09:57 AM Bug #58439 (Duplicate): octopus osd crash
Hi,
I was not able to find another bug which looks exactly like this (I found https://tracker.ceph.com/issues/2497...
Anonymous
11:59 AM Bug #58441 (New): ceph-bluestore-tool fsck crash with "FAILED ceph_assert(v.length() == p->shard_...
After OSD crashed with "FAILED ceph_assert(v.length() == p->shard_info->bytes)" (crash report here https://gist.githu... Changyuan Yu
10:46 AM Bug #58440 (Resolved): BlueFS spillover alert is broken
Apparently this has been removed by https://github.com/ceph/ceph/commit/d17cd6604b4031ca997deddc5440248aff451269#diff... Igor Fedotov

01/11/2023

10:53 PM Feature #58421 (Pending Backport): OSD metadata should show the min_alloc_size that each OSD was ...

To be very clear, the value the OSD was built with, *not* the prevailing value in `ceph.conf` or the central db.
...
Anthony D'Atri
06:42 AM Bug #53184: failed to start new osd due to SIGSEGV in BlueStore::read()
Hi Igor
I'm working with Satoru and Yuma, and I was trying to reproduce the problem with Ceph v17.2.5 and Rook v1....
Shinya Hayashi
02:01 AM Bug #58418 (New): unittest mempool always fail on Arm64 CI node
57: /root/ceph/src/test/test_mempool.cc:433: Failure
57: Expected: (missed) < (mempool::num_shards / 2), actual: 28 ...
Kevin Zhao

01/10/2023

11:40 AM Bug #56382: ONode ref counting is broken
Joshua Baergen wrote:
> All of the tickets related to each other for this problem are marked Duplicate. Which should...
Igor Fedotov
11:38 AM Bug #56382 (Fix Under Review): ONode ref counting is broken
Igor Fedotov

01/09/2023

02:40 PM Bug #56382: ONode ref counting is broken
All of the tickets related to each other for this problem are marked Duplicate. Which should be the main tracker for ... Joshua Baergen

01/03/2023

08:01 AM Bug #58274: BlueStore::collection_list becomes extremely slow due to unbounded rocksdb iteration
yixing hao wrote:
> Also observed from our HDD bluestore cluster with tens of billions of objects, the stack is like...
yixing hao
07:51 AM Bug #58274: BlueStore::collection_list becomes extremely slow due to unbounded rocksdb iteration
Also observed from our HDD bluestore cluster with tens of billions of objects, the stack is like the above.
7ffad9...
yixing hao
 
