Activity

From 09/28/2022 to 10/27/2022

10/27/2022

06:07 PM Bug #57940 (Duplicate): ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when...
Hi, I'm currently hitting this crash:
I've experienced a disk failure in my ceph cluster.
I've replaced the disk, but no...
Thomas Le Gentil
04:50 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Laura, thanks for that! I'll try first with main, as you suggested. Nitzan Mordechai
03:32 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
@Nitzan, here is the branch if you'd like to rebuild it on ci: https://github.com/ljflores/ceph/commits/wip-lflores-t... Laura Flores
10:36 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
The coredump is from branch wip-lflores-testing; I was not able to create a docker image since this branch is no longer av... Nitzan Mordechai
12:17 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
Radoslaw Zarzynski wrote:
> Well, just found a new occurrence.
Where can I find it?
Nitzan Mordechai
12:13 PM Bug #50042 (In Progress): rados/test.sh: api_watch_notify failures
Nitzan Mordechai
12:12 PM Bug #52136 (In Progress): Valgrind reports memory "Leak_DefinitelyLost" errors.
Nitzan Mordechai
11:47 AM Bug #57751 (In Progress): LibRadosAio.SimpleWritePP hang and pkill
Nitzan Mordechai
10:55 AM Bug #57751: LibRadosAio.SimpleWritePP hang and pkill
This is not an issue with the test; not all the OSDs are up, and we are waiting (valgrind reports a memory leak from rock... Nitzan Mordechai
04:26 AM Bug #57937 (Rejected): pg autoscaler of rgw pools doesn't work after creating otp pool
This is about my following post to the ceph-users ML.
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/threa...
Satoru Takeuchi

10/26/2022

11:25 PM Bug #57017 (Pending Backport): mon-stretched_cluster: degraded stretched mode leads to Monitor crash
Neha Ojha
09:18 PM Bug #52129: LibRadosWatchNotify.AioWatchDelete failed
/a/yuriw-2022-10-19_18:35:19-rados-wip-yuri10-testing-2022-10-19-0810-distro-default-smithi/7074802 Laura Flores
02:52 PM Bug #57883 (Resolved): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get:...
Laura Flores
01:45 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Nitzan Mordechai
04:58 AM Bug #50042: rados/test.sh: api_watch_notify failures
I checked all the list_watchers failures (checking the size of the watch list). It looks like the watcher timed out and that ... Nitzan Mordechai
06:09 AM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
I was able to gather a coredump and set up a binary compatible environment to debug it from this run Laura started in... Brad Hubbard
04:58 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
I wrote up a working explanation of PastIntervals in https://github.com/athanatos/ceph/tree/sjust/wip-49689-past-int... Samuel Just
12:07 AM Bug #57845 (New): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_O...
Notes from rados team meeting:
Seems like the same class of bugs we hit in https://tracker.ceph.com/issues/52657 a...
Neha Ojha

10/25/2022

11:14 PM Bug #51729: Upmap verification fails for multi-level crush rule
I put together the following contrived example to
illustrate the problem. Again, this is pacific 16.2.9 on rocky8 li...
Chris Durham
05:19 PM Bug #50219 (New): qa/standalone/erasure-code/test-erasure-eio.sh fails since pg is not in recover...
The failure actually reproduced here:
/a/lflores-2022-10-17_18:19:55-rados:standalone-main-distro-default-smithi/7...
Laura Flores
05:06 PM Bug #57883 (Fix Under Review): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_...
Laura Flores
02:21 PM Bug #57883 (In Progress): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_g...
Laura Flores
02:19 PM Bug #57900 (In Progress): mon/crush_ops.sh: mons out of quorum
Laura Flores
02:17 PM Bug #57900: mon/crush_ops.sh: mons out of quorum
@Radek, so the suggestion is to give the mons more time to reboot?
This is the workunit:
https://github.com/ceph/c...
Laura Flores

10/24/2022

06:18 PM Bug #57852: osd: unhealthy osd cannot be marked down in time
Not something we introduced recently, but still worth taking a look at if nothing urgent is on the plate. Radoslaw Zarzynski
06:17 PM Bug #57852 (New): osd: unhealthy osd cannot be marked down in time
Thanks for the detailed explanation! Radoslaw Zarzynski
06:10 PM Bug #57845: MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS...
Just before the crash, time-outs were seen:... Radoslaw Zarzynski
06:05 PM Bug #57915: LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
Yes, this is one of the Notify bugs that I hit during my tests. Nitzan Mordechai
05:14 PM Bug #57915: LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
Nitzan, I recall you mentioned some watch-related tests on today's stand-up. Is this one of them? Radoslaw Zarzynski
05:57 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
As this is about EC: can the acting set's items be duplicated? Radoslaw Zarzynski
05:55 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
If https://github.com/ceph/ceph/pull/47901/commits/0d07b406dc2f854363f7ae9b970e980400f4f03e is the actual culprit, th... Radoslaw Zarzynski
05:42 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
It looks like we asked to take osd.5 down, got a confirmation that the command was handled by the mon, and then @get_osd@ said %5... Radoslaw Zarzynski
05:25 PM Bug #57900: mon/crush_ops.sh: mons out of quorum
Just a **suggestion** from the bug scrub: this is a mon thrashing test. None of the mon logs seems to have a trace of crash... Radoslaw Zarzynski
05:18 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
Well, just found a new occurrence. Radoslaw Zarzynski
05:11 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
Lowering the priority as we haven't seen a recurrence recently. Radoslaw Zarzynski
05:17 PM Bug #57913 (Duplicate): Thrashosd: timeout 120 ceph --cluster ceph osd pool rm unique_pool_2 uniq...
In the teuthology log:... Radoslaw Zarzynski
05:10 PM Bug #57529 (Fix Under Review): mclock backfill is getting higher priority than WPQ
Radoslaw Zarzynski
04:06 AM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Laura Flores wrote:
> Notes from the rados suite review:
>
> We may need to check if we're shutting down while se...
Brad Hubbard

10/23/2022

11:45 AM Bug #57915 (New): LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
/a//nmordech-2022-10-23_05:26:13-rados:verify-wip-nm-51282-distro-default-smithi/7077932... Nitzan Mordechai
05:19 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
Sridhar, yes, those trackers look the same; valgrind makes the osd start slower, maybe that's the reason we are seeing... Nitzan Mordechai

10/21/2022

04:19 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
/a/yuriw-2022-10-12_16:24:50-rados-wip-yuri8-testing-2022-10-12-0718-quincy-distro-default-smithi/7063948/ Kamoltat (Junior) Sirivadhna
04:16 PM Bug #57913 (Duplicate): Thrashosd: timeout 120 ceph --cluster ceph osd pool rm unique_pool_2 uniq...
/a/yuriw-2022-10-12_16:24:50-rados-wip-yuri8-testing-2022-10-12-0718-quincy-distro-default-smithi/7063868/
rados/t...
Kamoltat (Junior) Sirivadhna
08:41 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
@Nitzan Mordechai this is probably similar to
https://tracker.ceph.com/issues/52948 and https://tracker.ceph.com/is...
Sridhar Seshasayee
07:47 AM Fix #57040 (Resolved): osd: Update osd's IOPS capacity using async Context completion instead of ...
Sridhar Seshasayee
07:46 AM Backport #57443 (Resolved): quincy: osd: Update osd's IOPS capacity using async Context completio...
Sridhar Seshasayee

10/20/2022

11:33 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Notes from the rados suite review:
We may need to check if we're shutting down while sending pg stats; if so, we d...
Laura Flores
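
For illustration, a minimal hypothetical sketch of the kind of guard being discussed (an atomic shutdown flag checked before sending pg stats); StatsSender and stopping are illustrative names, not the actual OSD code:

    #include <atomic>

    struct StatsSender {
      std::atomic<bool> stopping{false};

      void send_pg_stats() {
        if (stopping.load()) {
          return;  // mid-shutdown: don't touch timers or messengers
        }
        // ... build and send the pg stats message ...
      }

      void shutdown() {
        stopping.store(true);  // set before tearing down timers
        // ... cancel timers, then destroy resources ...
      }
    };

    int main() {
      StatsSender s;
      s.send_pg_stats();  // proceeds normally
      s.shutdown();
      s.send_pg_stats();  // no-op after shutdown
    }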
03:07 PM Bug #57152 (Resolved): segfault in librados via libcephsqlite
Matan Breizman
03:06 PM Backport #57373 (Resolved): pacific: segfault in librados via libcephsqlite
Matan Breizman
02:56 PM Backport #57373: pacific: segfault in librados via libcephsqlite
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48187
merged
Yuri Weinstein

10/19/2022

09:21 PM Backport #52747 (In Progress): pacific: MON_DOWN during mon_join process
Laura Flores
09:09 PM Backport #52746 (Rejected): octopus: MON_DOWN during mon_join process
Octopus is EOL. Laura Flores
08:59 PM Bug #43584: MON_DOWN during mon_join process
/a/yuriw-2022-10-05_20:44:57-rados-wip-yuri4-testing-2022-10-05-0917-pacific-distro-default-smithi/7055594 Laura Flores
08:46 PM Bug #57900 (In Progress): mon/crush_ops.sh: mons out of quorum
/a/teuthology-2022-10-09_07:01:03-rados-quincy-distro-default-smithi/7059463... Laura Flores
03:20 PM Bug #57698 (Pending Backport): osd/scrub: "scrub a chunk" requests are sent to the wrong set of r...
Ronen Friedman
10:29 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
The issue is that we are hitting a deadlock under a specific condition. When we are trying to update the mClockScheduler config c... Nitzan Mordechai
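
For illustration, the classic shape of such a deadlock is a config-change handler re-acquiring a non-recursive mutex that the updating thread already holds. A minimal hypothetical sketch (Scheduler and the method names are illustrative, not the actual mClockScheduler code):

    #include <mutex>

    struct Scheduler {
      std::mutex lock;

      void handle_conf_change() {
        std::lock_guard<std::mutex> g(lock);  // lock is already held below
        // ... recompute scheduling parameters ...
      }

      void update_config() {
        std::lock_guard<std::mutex> g(lock);
        handle_conf_change();  // self-deadlock: std::mutex is non-recursive
      }
    };

    int main() {
      Scheduler s;
      s.update_config();  // never returns
    }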
05:31 AM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
I was able to reproduce this using the test Laura mentioned above - http://pulpito.front.sepia.ceph.com/amathuri-2022... Aishwarya Mathuria

10/18/2022

04:31 PM Bug #51729: Upmap verification fails for multi-level crush rule
Chris, can you please provide your osdmap binary? Neha Ojha
09:03 AM Bug #57845: MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS...
Hi Neha,
the logs from the crash instance that I reported initially are already rotated out on the particular node...
Andreas Teuchert
02:48 AM Bug #57852: osd: unhealthy osd cannot be marked down in time
Radoslaw Zarzynski wrote:
> Could you please clarify a bit? Do you mean there are some extra, unnecessary (from the POV ...
wencong wan

10/17/2022

06:27 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
Link to the discussion on ceph-users: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AZHAIGY3BIM4SGB... Radoslaw Zarzynski
06:20 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
Let's first see if it's easily reproducible:
http://pulpito.front.sepia.ceph.com/lflores-2022-10-17_18:19:55-rados:s...
Laura Flores
06:03 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
The failed function:
qa/standalone/erasure-code/test-erasure-code.sh...
Laura Flores
05:52 PM Bug #57883 (Resolved): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get:...
/a/yuriw-2022-10-13_17:24:48-rados-main-distro-default-smithi/7065580... Laura Flores
06:16 PM Bug #57845 (Need More Info): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(feature...
These reports in telemetry look similar: http://telemetry.front.sepia.ceph.com:4000/d/Nvj6XTaMk/spec-search?orgId=1&v... Neha Ojha
06:08 PM Bug #57852 (Need More Info): osd: unhealthy osd cannot be marked down in time
Could you please clarify a bit? Do you mean there are some extra, unnecessary (from the POV of judging whether an OSD is ... Radoslaw Zarzynski
05:48 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
NOT A FIX (extra debugs): https://github.com/ceph/ceph/pull/48513 Radoslaw Zarzynski
05:45 PM Bug #57698 (Fix Under Review): osd/scrub: "scrub a chunk" requests are sent to the wrong set of r...
Neha Ojha
05:43 PM Bug #51729: Upmap verification fails for multi-level crush rule
A note from bug scrub: this is going to be assigned tomorrow. Radoslaw Zarzynski

10/14/2022

09:13 PM Bug #51729: Upmap verification fails for multi-level crush rule
Andras,
Thanks for the extra info. This needs to be addressed. Anyone?
Chris Durham
08:48 PM Bug #51729: Upmap verification fails for multi-level crush rule
Just to clarify - the error "verify_upmap number of buckets X exceeds desired Y" comes from the C++ code in ceph-mon ... Andras Pataki
06:47 PM Bug #51729: Upmap verification fails for multi-level crush rule
I am now seeing this issue on pacific, 16.2.10 on rocky8 linux.
If I have a >2 level rule on an ec pool (6+2), suc...
Chris Durham
04:15 PM Bug #57698: osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
Following some discussions: here are excerpts from a run demonstrating this issue.
Test run rfriedma-2022-09-28_15:5...
Ronen Friedman

10/13/2022

07:39 AM Bug #57859 (Fix Under Review): bail from handle_command() if _generate_command_map() fails
Ilya Dryomov
03:51 AM Bug #57859 (Resolved): bail from handle_command() if _generate_command_map() fails
https://tracker.ceph.com/issues/54558 catches an exception from handle_command() to avoid mon termination due to a po... nikhil kshirsagar
04:03 AM Bug #54558: malformed json in a Ceph RESTful API call can stop all ceph-mon services
nikhil kshirsagar wrote:
> Ilya Dryomov wrote:
> > I don't think https://github.com/ceph/ceph/pull/45547 is a compl...
nikhil kshirsagar

10/12/2022

05:08 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
Hey Radek,
makes sense, I created a debug branch https://github.com/ceph/ceph-ci/pull/new/wip-crush-debug and migh...
Deepika Upadhyay
02:39 AM Bug #57852 (Need More Info): osd: unhealthy osd cannot be marked down in time
Before an unhealthy osd is marked down by the mon, other osds may choose it as
a heartbeat peer and then report an incorrec...
wencong wan

10/11/2022

10:13 AM Bug #57845 (New): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_O...
... Andreas Teuchert

10/10/2022

06:33 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log

Radoslaw,
Yes, I saw that piece of code too. But I *think* I figured it out just a short time ago. I had the cru...
Chris Durham
06:05 PM Bug #57796 (Need More Info): after rebalance of pool via pgupmap balancer, continuous issues in m...
Thanks for the report! The log comes from here:... Radoslaw Zarzynski
06:23 PM Bug #57782 (Need More Info): [mon] high cpu usage by fn_monstore thread
It looks like we're burning CPU in @close(2)@. The single call site I can spot is in @write_data_set_to_csv@. Let's analyz... Radoslaw Zarzynski
06:08 AM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Laura Flores wrote:
> I contacted some Telemetry users. I will report back here with any information.
>
I am on...
Jimmy Spets

10/07/2022

08:32 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
I removed the hosts holding the osds reported by verify_upmap from the default root rule that no one uses, and the lo... Chris Durham
05:56 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
Note that the balancer balanced a replicated pool, using its own custom crush root too. The hosts in that pool (not i... Chris Durham
05:46 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
preformatting the crush info so it shows up properly ...... Chris Durham
05:43 PM Bug #57796 (Need More Info): after rebalance of pool via pgupmap balancer, continuous issues in m...

The pgupmap balancer was not balancing well, and after setting mgr/balancer/upmap_max_deviation to 1 (ceph config-k...
Chris Durham
04:46 PM Backport #57795 (In Progress): quincy: intrusive_lru leaking memory when
https://github.com/ceph/ceph/pull/54557 Backport Bot
04:46 PM Backport #57794 (Resolved): pacific: intrusive_lru leaking memory when
https://github.com/ceph/ceph/pull/54558 Backport Bot
04:29 PM Bug #57573 (Pending Backport): intrusive_lru leaking memory when
Casey Bodley
12:36 PM Bug #54773: crash: void MonMap::add(const mon_info_t&): assert(addr_mons.count(a) == 0)
See bug 54744. Gabriel Mainberger
12:35 PM Bug #54744: crash: void MonMap::add(const mon_info_t&): assert(addr_mons.count(a) == 0)
Rook v1.6.5 / Ceph v12.2.9 running on the host network and not inside the Kubernetes SDN caused creating a mon canary... Gabriel Mainberger

10/06/2022

08:38 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
I contacted some Telemetry users. I will report back here with any information.
Something to note: The large maj...
Laura Flores
05:08 PM Backport #57545: quincy: CommandFailedError: Command failed (workunit test rados/test_python.sh) ...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48113
merged
Yuri Weinstein
05:05 PM Backport #57496: quincy: Invalid read of size 8 in handle_recovery_delete()
Nitzan Mordechai wrote:
> https://github.com/ceph/ceph/pull/48039
merged
Yuri Weinstein
05:04 PM Backport #57443: quincy: osd: Update osd's IOPS capacity using async Context completion instead o...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47983
merged
Yuri Weinstein
05:03 PM Backport #57346: quincy: expected valgrind issues and found none
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47933
merged
Yuri Weinstein
05:01 PM Backport #56602: quincy: ceph report missing osdmap_clean_epochs if answered by peon
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47928
merged
Yuri Weinstein
05:00 PM Backport #55282: quincy: osd: add scrub duration for scrubs after recovery
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47926
merged
Yuri Weinstein
04:47 PM Backport #57544: pacific: CommandFailedError: Command failed (workunit test rados/test_python.sh)...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48112
merged
Yuri Weinstein
02:08 PM Bug #57782 (Fix Under Review): [mon] high cpu usage by fn_monstore thread
We observed high CPU usage by the ms_dispatch and fn_monstore threads (amounting to 99-100% in top). The Ceph deployment was ... Deepika Upadhyay

10/05/2022

06:49 PM Bug #57699 (Fix Under Review): slow osd boot with valgrind (reached maximum tries (50) after wait...
Radoslaw Zarzynski
06:48 PM Bug #57049 (Duplicate): cluster logging does not adhere to mon_cluster_log_file_level
Radoslaw Zarzynski
06:46 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
Hi Laura. Any luck with verifying the hypothesis from comment #17? Radoslaw Zarzynski
06:43 PM Bug #57532 (Duplicate): Notice discrepancies in the performance of mclock built-in profiles
Marked as duplicate per comment #4. Radoslaw Zarzynski
06:25 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
There is a coredump on the teuthology node (@/ceph/teuthology-archive/yuriw-2022-09-29_16:44:24-rados-wip-lflores-tes... Radoslaw Zarzynski
06:19 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
I think a fix for this got reverted in quincy (https://tracker.ceph.com/issues/53806) but it's still in @main@. ... Radoslaw Zarzynski
06:12 PM Bug #50042: rados/test.sh: api_watch_notify failures
Assigning to Nitzan just for the sake of testing the hypothesis from https://tracker.ceph.com/issues/50042#note-35. Radoslaw Zarzynski
06:06 PM Cleanup #57587 (Resolved): mon: fix Elector warnings
Resolved by https://github.com/ceph/ceph/pull/48289. Laura Flores
06:05 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
This won't be easy to reproduce but there are still some options like:
* contacting owners of the external cluster...
Radoslaw Zarzynski

10/04/2022

05:25 PM Bug #50042: rados/test.sh: api_watch_notify failures
/a/yuriw-2022-09-29_16:40:30-rados-wip-all-kickoff-r-distro-default-smithi/7047940... Laura Flores

10/03/2022

10:21 PM Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
Found a similar instance here:
/a/lflores-2022-09-30_21:47:41-rados-wip-lflores-testing-distro-default-smithi/7050...
Laura Flores
10:07 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
/a/yuriw-2022-09-29_16:44:24-rados-wip-lflores-testing-distro-default-smithi/7048304
/a/lflores-2022-09-30_21:47:41-...
Laura Flores
10:01 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
Put affected version as "14.2.9" since there is no option for "14.2.19". Laura Flores
09:59 PM Bug #57757 (Fix Under Review): ECUtil: terminate called after throwing an instance of 'ceph::buff...
/a/yuriw-2022-09-29_16:44:24-rados-wip-lflores-testing-distro-default-smithi/7048173/remote/smithi133/crash/posted/20... Laura Flores
12:59 PM Bug #57751 (Resolved): LibRadosAio.SimpleWritePP hang and pkill
/a/nmordech-2022-10-02_08:27:55-rados:verify-wip-nm-51282-distro-default-smithi/7051967/... Nitzan Mordechai

09/30/2022

07:13 PM Bug #17170 (Fix Under Review): mon/monclient: update "unable to obtain rotating service keys when...
Greg Farnum
04:49 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
Looks like in both cases something is being subtracted from a zero-valued unsigned int64 and underflowing.
2^64 − ...
Brian Woods
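
For reference, unsigned 64-bit arithmetic wraps around rather than going negative, so subtracting 1 from 0 yields 2^64 − 1 = 18446744073709551615. A minimal sketch (illustrative variable names, not the pool-size code itself):

    #include <cstdint>
    #include <iostream>

    int main() {
      uint64_t count = 0;           // a value that is already zero
      uint64_t result = count - 1;  // wraps around to 2^64 - 1
      std::cout << result << "\n";  // prints 18446744073709551615
    }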
03:37 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
Setting the size (from 3) to 2, then setting it to 1 works...... Brian Woods
03:38 AM Bug #57105: quincy: ceph osd pool set <pool> size math error
I created a new cluster today to do a very specific test and ran into this (or something like it) again. In th... Brian Woods
10:40 AM Bug #49777 (Resolved): test_pool_min_size: 'check for active or peered' reached maximum tries (5)...
Konstantin Shalygin
10:39 AM Backport #57022 (Resolved): pacific: test_pool_min_size: 'check for active or peered' reached max...
Konstantin Shalygin
09:28 AM Bug #50192 (Resolved): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_...
Konstantin Shalygin
09:27 AM Backport #50274 (Resolved): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get...
Konstantin Shalygin
09:27 AM Bug #53516 (Resolved): Disable health warning when autoscaler is on
Konstantin Shalygin
09:27 AM Backport #53644 (Resolved): pacific: Disable health warning when autoscaler is on
Konstantin Shalygin
09:27 AM Bug #51942 (Resolved): src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
Konstantin Shalygin
09:26 AM Backport #53339 (Resolved): pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<cons...
Konstantin Shalygin
09:26 AM Bug #55001 (Resolved): rados/test.sh: Early exit right after LibRados global tests complete
Konstantin Shalygin
09:26 AM Backport #57029 (Resolved): pacific: rados/test.sh: Early exit right after LibRados global tests ...
Konstantin Shalygin
09:26 AM Bug #57119 (Resolved): Heap command prints with "ceph tell", but not with "ceph daemon"
Konstantin Shalygin
09:25 AM Backport #57313 (Resolved): pacific: Heap command prints with "ceph tell", but not with "ceph dae...
Konstantin Shalygin
05:18 AM Backport #57372 (Resolved): quincy: segfault in librados via libcephsqlite
Konstantin Shalygin
04:23 AM Bug #57532: Notice discrepancies in the performance of mclock built-in profiles
As Sridhar has mentioned in the BZ, the Case 2 results are due to the max limit setting for best effort clients. This... Aishwarya Mathuria
02:19 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
/a/yuriw-2022-09-27_23:37:28-rados-wip-yuri2-testing-2022-09-27-1455-distro-default-smithi/7046230/ Kamoltat (Junior) Sirivadhna

09/29/2022

08:37 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- This was visible again in the LRC upgrade today.... Vikhyat Umrao
07:31 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
yuriw-2022-09-27_23:37:28-rados-wip-yuri2-testing-2022-09-27-1455-distro-default-smithi/7046253 Kamoltat (Junior) Sirivadhna
07:21 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
yuriw-2022-09-27_23:37:28-rados-wip-yuri2-testing-2022-09-27-1455-distro-default-smithi/7046234 Kamoltat (Junior) Sirivadhna
06:02 PM Bug #55435 (Resolved): mon/Elector: notify_ranked_removed() does not properly erase dead_ping in ...
Konstantin Shalygin
06:01 PM Backport #56550 (Resolved): pacific: mon/Elector: notify_ranked_removed() does not properly erase...
Konstantin Shalygin
03:55 PM Bug #54611 (Resolved): prometheus metrics shows incorrect ceph version for upgraded ceph daemon
Konstantin Shalygin
03:54 PM Backport #55309 (Resolved): pacific: prometheus metrics shows incorrect ceph version for upgraded...
Konstantin Shalygin
02:52 PM Bug #57727: mon_cluster_log_file_level option doesn't take effect
Yes. I was trying to close it as a duplicate after editing my comment. Thank you for closing it. Prashant D
02:50 PM Bug #57727 (Duplicate): mon_cluster_log_file_level option doesn't take effect
Ah, you edited your comment to say "Closing this tracker as a duplicate of 57049". Ilya Dryomov
02:48 PM Bug #57727 (Fix Under Review): mon_cluster_log_file_level option doesn't take effect
Ilya Dryomov
02:41 PM Bug #57727: mon_cluster_log_file_level option doesn't take effect
Hi Ilya,
I had PR#47480 opened for this issue but closed it in favor of PR#47502. We have an old tracker 57049 fo...
Prashant D
02:00 PM Bug #57727 (Duplicate): mon_cluster_log_file_level option doesn't take effect
This appears to be a regression introduced in quincy in https://github.com/ceph/ceph/pull/42014:... Ilya Dryomov
02:44 PM Bug #57049: cluster logging does not adhere to mon_cluster_log_file_level
I had PR#47480 opened for this issue but closed it in favor of PR#47502, which addresses this issue along wi... Prashant D
02:15 PM Backport #56735 (Resolved): octopus: unnecessarily long laggy PG state
Konstantin Shalygin
02:14 PM Bug #50806 (Resolved): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_lo...
Konstantin Shalygin
02:13 PM Backport #50893 (Resolved): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_s...
Konstantin Shalygin
02:07 PM Bug #55158 (Resolved): mon/OSDMonitor: properly set last_force_op_resend in stretch mode
Konstantin Shalygin
02:07 PM Backport #55281 (Resolved): pacific: mon/OSDMonitor: properly set last_force_op_resend in stretch...
Konstantin Shalygin
11:58 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
I was not able to reproduce it with more debug messages. I created a PR with the debug messages and will wait for re... Nitzan Mordechai
07:28 AM Bug #56289 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
Matan Breizman
07:28 AM Bug #54710 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
Matan Breizman
07:28 AM Bug #54709 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
Matan Breizman
07:21 AM Bug #54708 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
Matan Breizman
07:02 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Radoslaw Zarzynski wrote:
> A note from the bug scrub: work in progress.
WIP: https://gist.github.com/Matan-B/ca5...
Matan Breizman
02:47 AM Bug #57532: Notice discrepancies in the performance of mclock built-in profiles
Hi Bharath, could you also add the mClock configuration values from the osd config show command here?
Aishwarya Mathuria

09/28/2022

06:03 PM Bug #53806 (New): unnecessarily long laggy PG state
Reopening b/c the original fix had to be reverted: https://github.com/ceph/ceph/pull/44499#issuecomment-1247315820. Radoslaw Zarzynski
05:54 PM Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
Note from a scrub: might be worth talking about. Radoslaw Zarzynski
05:51 PM Bug #57650 (In Progress): mon-stretch: reweighting an osd to a big number, then back to original ...
Radoslaw Zarzynski
05:51 PM Bug #57678 (Fix Under Review): Mon fail to send pending metadata through MMgrUpdate after an upgr...
Radoslaw Zarzynski
05:50 PM Bug #57698: osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
What are the symptoms? How bad is it? A hang, maybe? I'm asking to understand the impact. Radoslaw Zarzynski
05:48 PM Bug #57698 (In Progress): osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
IIRC Ronen has mentioned the scrub code interchanges @get_acting_set()@ and @get_acting_recovery_backfill()@. Radoslaw Zarzynski
01:40 PM Bug #57698 (Resolved): osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
The Primary registers its intent to scrub with 'get_actingset()', as it should.
But the actual chunk requests ar...
Ronen Friedman
05:45 PM Bug #57699 (In Progress): slow osd boot with valgrind (reached maximum tries (50) after waiting f...
Marking WIP per our morning talk. Radoslaw Zarzynski
01:58 PM Bug #57699 (Resolved): slow osd boot with valgrind (reached maximum tries (50) after waiting for ...
/a/yuriw-2022-09-23_20:38:59-rados-wip-yuri6-testing-2022-09-23-1008-quincy-distro-default-smithi/7042504 ... Nitzan Mordechai
05:44 PM Backport #57705 (Resolved): pacific: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when redu...
Backport Bot
05:44 PM Backport #57704 (Resolved): quincy: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reduc...
Backport Bot
05:43 PM Bug #57529 (In Progress): mclock backfill is getting higher priority than WPQ
Marking as WIP as IIRC Sridhar was talking about this issue during core standups. Radoslaw Zarzynski
05:42 PM Bug #57573 (In Progress): intrusive_lru leaking memory when
As I understood:
1. @evict()@ intends to not free too much (which makes sense).
2. The dtor reuses @evict()@ for c...
Radoslaw Zarzynski
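
For illustration, a minimal hypothetical sketch of the leak pattern described: @evict()@ keeps a floor of entries alive by design, so a destructor that merely reuses it never frees that remainder (illustrative code, not the actual intrusive_lru):

    #include <cstddef>
    #include <list>

    struct Lru {
      std::list<int*> entries;
      std::size_t target_size = 64;

      // By design, evict() keeps up to target_size entries cached.
      void evict() {
        while (entries.size() > target_size) {
          delete entries.back();
          entries.pop_back();
        }
      }

      // Bug shape: reusing evict() here leaves up to target_size
      // entries allocated forever. Fix sketch: set target_size = 0
      // (or free unconditionally) before evicting in the dtor.
      ~Lru() { evict(); }
    };

    int main() {
      Lru lru;
      for (int i = 0; i < 100; ++i) lru.entries.push_front(new int(i));
    }  // ~Lru() frees only 36 of the 100 ints; 64 leak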
05:39 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
A note from the bug scrub: work in progress. Radoslaw Zarzynski
05:35 PM Bug #50089 (Pending Backport): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing n...
Neha Ojha
11:06 AM Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors i...
... Gaurav Sitlani
11:03 AM Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors i...
I am seeing the same crash in ceph version 16.2.10 and just noticed that the PR linked in this thread is merged... Gaurav Sitlani
01:10 PM Backport #57696 (Resolved): quincy: ceph log last command fail to log by verbosity level
https://github.com/ceph/ceph/pull/50407 Backport Bot
01:04 PM Feature #52424 (Resolved): [RFE] Limit slow request details to mgr log
Prashant D
01:03 PM Bug #57340 (Pending Backport): ceph log last command fail to log by verbosity level
Prashant D
 
