Activity

From 10/11/2022 to 11/09/2022

11/09/2022

10:56 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Managed to reproduce this on the Gibba cluster and produce a coredump!
The core file is located on gibba001 under ...
Laura Flores
08:18 PM Backport #57704 (Resolved): quincy: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reduc...
https://github.com/ceph/ceph/pull/48321 Kamoltat (Junior) Sirivadhna
08:17 PM Backport #57705 (Resolved): pacific: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when redu...
https://github.com/ceph/ceph/pull/48320 Kamoltat (Junior) Sirivadhna
08:17 PM Bug #50089 (Resolved): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of...
Kamoltat (Junior) Sirivadhna
04:34 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks again for looking at this.
I haven't looked further, but I suspect the issue will come down to the variable...
Chris Durham

11/08/2022

09:23 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
pacific backport: https://github.com/ceph/ceph/pull/48803 Kamoltat (Junior) Sirivadhna
08:59 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
quincy backport: https://github.com/ceph/ceph/pull/48802 Kamoltat (Junior) Sirivadhna
07:23 PM Bug #51729: Upmap verification fails for multi-level crush rule
I believe I've reproduced the issue using the osdmaps that Chris provided.
First, I used the osdmaptool to run the...
Laura Flores
02:08 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
After rechecking the logs, it looks like we are taking 2 different versions of smithi01231941-9:head
All chunks with ...
Nitzan Mordechai
05:44 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Laura, thanks for confirming that in the coredump. Yes, shard0 also shows that when it gets the chunk from bluestore:
...
Nitzan Mordechai
12:07 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Brad and I did some more debugging today.
Here is the end of the log associated with the coredump:...
Laura Flores

11/07/2022

09:27 PM Bug #57977: osd:tick checking mon for new map
Radoslaw Zarzynski wrote:
> Octopus is EOL. Does it happen on a supported release?
>
> Regardless of that, could ...
yite gu
06:13 PM Bug #57977 (Need More Info): osd:tick checking mon for new map
Octopus is EOL. Does it happen on a supported release?
Regardless of that, could you please provide logs from this...
Radoslaw Zarzynski
07:30 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Also to note, we can see information about argument `to_read` here:... Laura Flores
07:27 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Nitzan, what do you think about this analysis? Or are there any other frames/locals you'd like me to check? Laura Flores
07:12 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Looking at frame 12, I can see that the incorrect length (262144) for shard 0 is evident in the local variable "from"... Laura Flores
06:02 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Got it to detect the right symbols with the new build!
I will attempt to analyze this coredump at a deeper level, ...
Laura Flores
03:16 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
According to Brad, the build needs to be as close to the test branch that originally experienced the crash as possibl... Laura Flores
07:18 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks Chris! @Radek I have been taking some time to analyze this scenario, and will post updates soon. Laura Flores
06:36 PM Bug #51729: Upmap verification fails for multi-level crush rule
Thanks for the info! Laura, would you mind retaking a look? Radoslaw Zarzynski
06:36 PM Bug #51729 (New): Upmap verification fails for multi-level crush rule
Radoslaw Zarzynski
06:43 PM Bug #50219 (Closed): qa/standalone/erasure-code/test-erasure-eio.sh fails since pg is not in reco...
The original issue was caused by a commit in a wip branch being tested, so it's highly improbable it's a reoccurrence.... Radoslaw Zarzynski
06:42 PM Bug #57989 (New): test-erasure-eio.sh fails since pg is not in unfound
/a/lflores-2022-10-17_18:19:55-rados:standalone-main-distro-default-smithi/7071287... Radoslaw Zarzynski
06:35 PM Bug #57845: MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS...
Likely it's even a duplicate of https://tracker.ceph.com/issues/52657. Radoslaw Zarzynski
06:28 PM Bug #52136 (Fix Under Review): Valgrind reports memory "Leak_DefinitelyLost" errors.
Neha Ojha
06:26 PM Bug #57940 (Duplicate): ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when...
Looks like a duplicate of 56772. Radoslaw Zarzynski
06:24 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
Nitzan Mordechai wrote:
> Radoslaw Zarzynski wrote:
> > Well, just found a new occurrence.
> Where can I find it?
...
Radoslaw Zarzynski
06:12 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Brad and I ran a reproducer on the gibba cluster (restarting OSDs with `for osd in $(systemctl -l |grep osd|gawk '{pr... Laura Flores
06:01 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Is there any news on that? Radoslaw Zarzynski
05:59 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Updated the PR link. Radoslaw Zarzynski
01:08 AM Bug #57937: pg autoscaler of rgw pools doesn't work after creating otp pool
Are there any updates? Please let me know if I can do something. Satoru Takeuchi

11/06/2022

05:47 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@brad, maybe it's a good candidate for another blog for upstream core dump analysis that you talked about (ubuntu 20.04) Nitzan Mordechai

11/04/2022

07:21 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Brad do you have any tips on how to load the correct debug symbols for the above coredump? After running the `ceph-d... Laura Flores
05:48 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
No luck yet, but I'm trying to set up the right debug environment. So far, gdb is only giving me question marks, but ... Laura Flores
06:10 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Laura, are you able to use GDB with debuginfo on that coredump file? Nitzan Mordechai
04:18 PM Bug #57977 (Pending Backport): osd:tick checking mon for new map
ceph version: 15.2.7
My cluster has an OSD down, and it is unable to join the osdmap....
yite gu
09:17 AM Feature #48392: ceph ignores --keyring?
This issue is still present in Pacific. Is there any way to work around it except for moving the keys to /etc/ceph?
...
Janek Bevendorff

11/03/2022

08:56 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
By the way, I have the coredump saved on the teuthology node under /home/lflores/tracker_57757. Laura Flores
03:31 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
The output Nitzan pasted is from printing ECBackend::read_result_t:
src/osd/ECBackend.cc...
Laura Flores
03:23 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Perhaps there is somewhere that the length should be getting updated, but it is not? Laura Flores
02:27 PM Bug #57969 (New): monitor: ceph -s shows all monitors out of quorum for < 1s
The ceph -s UI shows all monitors out of quorum for a very short time (< 1s).
The issue is likely to have no real effect on the ...
Kamoltat (Junior) Sirivadhna
02:42 AM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Hi Brad, thanks for all the pointers on the tracker!
I went through the code with Josh and Radek after looking at yo...
Aishwarya Mathuria

11/02/2022

03:53 PM Fix #57963 (Fix Under Review): osd: Misleading information displayed for the running configuratio...
With the fix, the following is shown for an OSD with ssd as the underlying device type:... Sridhar Seshasayee
03:26 PM Fix #57963: osd: Misleading information displayed for the running configuration of osd_mclock_max...
See BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2111282 for additional information. Sridhar Seshasayee
03:25 PM Fix #57963 (Resolved): osd: Misleading information displayed for the running configuration of osd...
For the inactive device type(hdd/ssd) of an OSD, the running configuration option osd_mclock_max_capacity_iops_[hdd|s... Sridhar Seshasayee
06:40 AM Bug #57533 (Fix Under Review): Able to modify the mclock reservation, weight and limit parameters...
Sridhar Seshasayee

10/31/2022

01:58 PM Bug #53729 (Resolved): ceph-osd takes all memory before oom on boot
Konstantin Shalygin
01:58 PM Backport #55633 (Rejected): octopus: ceph-osd takes all memory before oom on boot
Octopus is EOL Konstantin Shalygin
12:41 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
... Nitzan Mordechai
03:57 AM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Still trying to run a test with added debugging due to the ongoing infra issues but I noticed that Coverity CID 15096... Brad Hubbard

10/29/2022

10:06 PM Documentation #46126: RGW docs lack an explanation of how permissions management works, especiall...
You thought that copying this rude exchange verbatim was essential to motivate improving the docs?
Matt
Zac Dover

10/27/2022

06:07 PM Bug #57940 (Duplicate): ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when...
Hi, I have this current crash:
I've experienced a disk failure in my ceph cluster.
I've replaced the disk, but no...
Thomas Le Gentil
04:50 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Laura, thanks for that! I'll try first with main as you suggested. Nitzan Mordechai
03:32 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Nitzan, here is the branch if you'd like to rebuild it on ci: https://github.com/ljflores/ceph/commits/wip-lflores-t... Laura Flores
10:36 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
The coredump is from branch wip-lflores-testing; I was not able to create a docker image since this branch is no longer av... Nitzan Mordechai
12:17 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
Radoslaw Zarzynski wrote:
> Well, just found a new occurrence.
Where can I find it?
Nitzan Mordechai
12:13 PM Bug #50042 (In Progress): rados/test.sh: api_watch_notify failures
Nitzan Mordechai
12:12 PM Bug #52136 (In Progress): Valgrind reports memory "Leak_DefinitelyLost" errors.
Nitzan Mordechai
11:47 AM Bug #57751 (In Progress): LibRadosAio.SimpleWritePP hang and pkill
Nitzan Mordechai
10:55 AM Bug #57751: LibRadosAio.SimpleWritePP hang and pkill
This is not an issue with the test; not all the OSDs are up, and we are waiting (valgrind reports a memory leak from rock... Nitzan Mordechai
04:26 AM Bug #57937 (Rejected): pg autoscaler of rgw pools doesn't work after creating otp pool
It's about my following post to the ceph-users ML.
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/threa...
Satoru Takeuchi

10/26/2022

11:25 PM Bug #57017 (Pending Backport): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
Neha Ojha
09:18 PM Bug #52129: LibRadosWatchNotify.AioWatchDelete failed
/a/yuriw-2022-10-19_18:35:19-rados-wip-yuri10-testing-2022-10-19-0810-distro-default-smithi/7074802 Laura Flores
02:52 PM Bug #57883 (Resolved): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get:...
Laura Flores
01:45 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Nitzan Mordechai
04:58 AM Bug #50042: rados/test.sh: api_watch_notify failures
I checked all the list_watchers failures (checking the size of the watch list); it looks like the watcher timed out and that ... Nitzan Mordechai
06:09 AM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
I was able to gather a coredump and set up a binary compatible environment to debug it from this run Laura started in... Brad Hubbard
04:58 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
I wrote up a working explanation of PastIntervals in https://github.com/athanatos/ceph/tree/sjust/wip-49689-past-int... Samuel Just
12:07 AM Bug #57845 (New): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_O...
Notes from rados team meeting:
Seems like the same class of bugs we hit in https://tracker.ceph.com/issues/52657 a...
Neha Ojha

10/25/2022

11:14 PM Bug #51729: Upmap verification fails for multi-level crush rule
I put together the following contrived example to
illustrate the problem. Again, this is pacific 16.2.9 on rocky8 li...
Chris Durham
05:19 PM Bug #50219 (New): qa/standalone/erasure-code/test-erasure-eio.sh fails since pg is not in recover...
The failure actually reproduced here:
/a/lflores-2022-10-17_18:19:55-rados:standalone-main-distro-default-smithi/7...
Laura Flores
05:06 PM Bug #57883 (Fix Under Review): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_...
Laura Flores
02:21 PM Bug #57883 (In Progress): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_g...
Laura Flores
02:19 PM Bug #57900 (In Progress): mon/crush_ops.sh: mons out of quorum
Laura Flores
02:17 PM Bug #57900: mon/crush_ops.sh: mons out of quorum
@Radek so the suggestion is to give the mons more time to reboot?
This is the workunit:
https://github.com/ceph/c...
Laura Flores

10/24/2022

06:18 PM Bug #57852: osd: unhealthy osd cannot be marked down in time
Not something we introduced recently, but still worth taking a look at if nothing urgent is on the plate. Radoslaw Zarzynski
06:17 PM Bug #57852 (New): osd: unhealthy osd cannot be marked down in time
Thanks for the detailed explanation! Radoslaw Zarzynski
06:10 PM Bug #57845: MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS...
Just before the crash time-outs were seen:... Radoslaw Zarzynski
06:05 PM Bug #57915: LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
Yes, this is one of the Notify bugs that I hit during my tests. Nitzan Mordechai
05:14 PM Bug #57915: LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
Nitzan, I recall you mentioned some watch-related tests in today's stand-up. Is this one of them? Radoslaw Zarzynski
05:57 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
As this is about EC: can acting's items be duplicated? Radoslaw Zarzynski
05:55 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
If https://github.com/ceph/ceph/pull/47901/commits/0d07b406dc2f854363f7ae9b970e980400f4f03e is the actual culprit, th... Radoslaw Zarzynski
05:42 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
It looks like we asked for taking osd.5 down, got a confirmation the command was handled by mon and then @get_osd@ said %5... Radoslaw Zarzynski
05:25 PM Bug #57900: mon/crush_ops.sh: mons out of quorum
Just a **suggestion** from the bug scrub: this is a mon thrashing test. None of the mon logs seems to have a trace of crash... Radoslaw Zarzynski
05:18 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
Well, just found a new occurrence. Radoslaw Zarzynski
05:11 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
Lowering the priority as we haven't seen a reoccurrence last time. Radoslaw Zarzynski
05:17 PM Bug #57913 (Duplicate): Thrashosd: timeout 120 ceph --cluster ceph osd pool rm unique_pool_2 uniq...
In the teuthology log:... Radoslaw Zarzynski
05:10 PM Bug #57529 (Fix Under Review): mclock backfill is getting higher priority than WPQ
Radoslaw Zarzynski
04:06 AM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Laura Flores wrote:
> Notes from the rados suite review:
>
> We may need to check if we're shutting down while se...
Brad Hubbard

10/23/2022

11:45 AM Bug #57915 (New): LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
/a//nmordech-2022-10-23_05:26:13-rados:verify-wip-nm-51282-distro-default-smithi/7077932... Nitzan Mordechai
05:19 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
Sridhar, yes, those trackers look the same; valgrind makes the osd start slower, maybe that's the reason we are seeing... Nitzan Mordechai

10/21/2022

04:19 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
/a/yuriw-2022-10-12_16:24:50-rados-wip-yuri8-testing-2022-10-12-0718-quincy-distro-default-smithi/7063948/ Kamoltat (Junior) Sirivadhna
04:16 PM Bug #57913 (Duplicate): Thrashosd: timeout 120 ceph --cluster ceph osd pool rm unique_pool_2 uniq...
/a/yuriw-2022-10-12_16:24:50-rados-wip-yuri8-testing-2022-10-12-0718-quincy-distro-default-smithi/7063868/
rados/t...
Kamoltat (Junior) Sirivadhna
08:41 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
@Nitzan Mordechai this is probably similar to,
https://tracker.ceph.com/issues/52948 and https://tracker.ceph.com/is...
Sridhar Seshasayee
07:47 AM Fix #57040 (Resolved): osd: Update osd's IOPS capacity using async Context completion instead of ...
Sridhar Seshasayee
07:46 AM Backport #57443 (Resolved): quincy: osd: Update osd's IOPS capacity using async Context completio...
Sridhar Seshasayee

10/20/2022

11:33 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Notes from the rados suite review:
We may need to check if we're shutting down while sending pg stats; if so, we d...
Laura Flores
03:07 PM Bug #57152 (Resolved): segfault in librados via libcephsqlite
Matan Breizman
03:06 PM Backport #57373 (Resolved): pacific: segfault in librados via libcephsqlite
Matan Breizman
02:56 PM Backport #57373: pacific: segfault in librados via libcephsqlite
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48187
merged
Yuri Weinstein

10/19/2022

09:21 PM Backport #52747 (In Progress): pacific: MON_DOWN during mon_join process
Laura Flores
09:09 PM Backport #52746 (Rejected): octopus: MON_DOWN during mon_join process
Octopus is EOL. Laura Flores
08:59 PM Bug #43584: MON_DOWN during mon_join process
/a/yuriw-2022-10-05_20:44:57-rados-wip-yuri4-testing-2022-10-05-0917-pacific-distro-default-smithi/7055594 Laura Flores
08:46 PM Bug #57900 (In Progress): mon/crush_ops.sh: mons out of quorum
/a/teuthology-2022-10-09_07:01:03-rados-quincy-distro-default-smithi/7059463... Laura Flores
03:20 PM Bug #57698 (Pending Backport): osd/scrub: "scrub a chunk" requests are sent to the wrong set of r...
Ronen Friedman
10:29 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
The issue is that we have a deadlock under a specific condition. When we are trying to update the mClockScheduler config c... Nitzan Mordechai
05:31 AM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
I was able to reproduce this using the test Laura mentioned above - http://pulpito.front.sepia.ceph.com/amathuri-2022... Aishwarya Mathuria

10/18/2022

04:31 PM Bug #51729: Upmap verification fails for multi-level crush rule
Chris, can you please provide your osdmap binary? Neha Ojha
09:03 AM Bug #57845: MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS...
Hi Neha,
the logs from the crash instance that I reported initially are already rotated out on the particular node...
Andreas Teuchert
02:48 AM Bug #57852: osd: unhealthy osd cannot be marked down in time
Radoslaw Zarzynski wrote:
> Could you please clarify a bit? Do you mean there are some extra, unnecessary (from the POV ...
wencong wan

10/17/2022

06:27 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
Link to the discussion on ceph-users: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AZHAIGY3BIM4SGB... Radoslaw Zarzynski
06:20 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
Let's first see if it's easily reproducible:
http://pulpito.front.sepia.ceph.com/lflores-2022-10-17_18:19:55-rados:s...
Laura Flores
06:03 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
The failed function:
qa/standalone/erasure-code/test-erasure-code.sh...
Laura Flores
05:52 PM Bug #57883 (Resolved): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get:...
/a/yuriw-2022-10-13_17:24:48-rados-main-distro-default-smithi/7065580... Laura Flores
06:16 PM Bug #57845 (Need More Info): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(feature...
These reports in telemetry look similar: http://telemetry.front.sepia.ceph.com:4000/d/Nvj6XTaMk/spec-search?orgId=1&v... Neha Ojha
06:08 PM Bug #57852 (Need More Info): osd: unhealthy osd cannot be marked down in time
Could you please clarify a bit? Do you mean there are some extra, unnecessary (from the POV of judging whether an OSD is ... Radoslaw Zarzynski
05:48 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
NOT A FIX (extra debugs): https://github.com/ceph/ceph/pull/48513 Radoslaw Zarzynski
05:45 PM Bug #57698 (Fix Under Review): osd/scrub: "scrub a chunk" requests are sent to the wrong set of r...
Neha Ojha
05:43 PM Bug #51729: Upmap verification fails for multi-level crush rule
A note from bug scrub: this is going to be assigned tomorrow. Radoslaw Zarzynski

10/14/2022

09:13 PM Bug #51729: Upmap verification fails for multi-level crush rule
Andras,
Thanks for the extra info. This needs to be addressed. Anyone?
Chris Durham
08:48 PM Bug #51729: Upmap verification fails for multi-level crush rule
Just to clarify - the error "verify_upmap number of buckets X exceeds desired Y" comes from the C++ code in ceph-mon ... Andras Pataki
06:47 PM Bug #51729: Upmap verification fails for multi-level crush rule
I am now seeing this issue on pacific, 16.2.10 on rocky8 linux.
If I have a >2 level rule on an ec pool (6+2), suc...
Chris Durham
04:15 PM Bug #57698: osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
Following some discussions: here are excerpts from a run demonstrating this issue.
Test run rfriedma-2022-09-28_15:5...
Ronen Friedman

10/13/2022

07:39 AM Bug #57859 (Fix Under Review): bail from handle_command() if _generate_command_map() fails
Ilya Dryomov
03:51 AM Bug #57859 (Resolved): bail from handle_command() if _generate_command_map() fails
https://tracker.ceph.com/issues/54558 catches an exception from handle_command() to avoid mon termination due to a po... nikhil kshirsagar
04:03 AM Bug #54558: malformed json in a Ceph RESTful API call can stop all ceph-mon services
nikhil kshirsagar wrote:
> Ilya Dryomov wrote:
> > I don't think https://github.com/ceph/ceph/pull/45547 is a compl...
nikhil kshirsagar

10/12/2022

05:08 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
Hey Radek,
makes sense, I created a debug branch https://github.com/ceph/ceph-ci/pull/new/wip-crush-debug and migh...
Deepika Upadhyay
02:39 AM Bug #57852 (Need More Info): osd: unhealthy osd cannot be marked down in time
Before an unhealthy osd is marked down by the mon, other osds may choose it as a
heartbeat peer and then report an incorrec...
wencong wan

10/11/2022

10:13 AM Bug #57845 (New): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_O...
... Andreas Teuchert
 
