Activity
From 09/28/2022 to 10/27/2022
10/27/2022
- 06:07 PM Bug #57940 (Duplicate): ceph osd crashes with FAILED ceph_assert(clone_overlap.count(clone)) when...
- Hi, I have this current crash:
I've experienced a disk failure in my ceph cluster.
I've replaced the disk, but no...
- 04:50 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
- @Laura, thanks for that! I'll try first with main as you suggested.
- 03:32 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
- @Nitzan, here is the branch if you'd like to rebuild it on ci: https://github.com/ljflores/ceph/commits/wip-lflores-t...
- 10:36 AM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
- The coredump is from branch wip-lflores-testing; I was not able to create a docker image since this branch is no longer av...
- 12:17 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
- Radoslaw Zarzynski wrote:
> Well, just found a new occurrence.
Where can I find it?
- 12:13 PM Bug #50042 (In Progress): rados/test.sh: api_watch_notify failures
- 12:12 PM Bug #52136 (In Progress): Valgrind reports memory "Leak_DefinitelyLost" errors.
- 11:47 AM Bug #57751 (In Progress): LibRadosAio.SimpleWritePP hang and pkill
- 10:55 AM Bug #57751: LibRadosAio.SimpleWritePP hang and pkill
- This is not an issue with the test; not all the OSDs are up, and we are waiting (valgrind reports a memory leak from rock...
- 04:26 AM Bug #57937 (Rejected): pg autoscaler of rgw pools doesn't work after creating otp pool
- It's about the following post of mine to the ceph-users ML:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/threa...
10/26/2022
- 11:25 PM Bug #57017 (Pending Backport): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
- 09:18 PM Bug #52129: LibRadosWatchNotify.AioWatchDelete failed
- /a/yuriw-2022-10-19_18:35:19-rados-wip-yuri10-testing-2022-10-19-0810-distro-default-smithi/7074802
- 02:52 PM Bug #57883 (Resolved): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get:...
- 01:45 PM Bug #50042: rados/test.sh: api_watch_notify failures
- ...
- 04:58 AM Bug #50042: rados/test.sh: api_watch_notify failures
- I checked all the list_watchers failures (checking the size of the watch list). It looks like the watcher timed out and that ...
- 06:09 AM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- I was able to gather a coredump and set up a binary compatible environment to debug it from this run Laura started in...
- 04:58 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- I wrote up a working explanation of PastIntervals in https://github.com/athanatos/ceph/tree/sjust/wip-49689-past-int...
- 12:07 AM Bug #57845 (New): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_O...
- Notes from rados team meeting:
Seems like the same class of bugs we hit in https://tracker.ceph.com/issues/52657 a...
10/25/2022
- 11:14 PM Bug #51729: Upmap verification fails for multi-level crush rule
- I put together the following contrived example to
illustrate the problem. Again, this is pacific 16.2.9 on rocky8 li...
- 05:19 PM Bug #50219 (New): qa/standalone/erasure-code/test-erasure-eio.sh fails since pg is not in recover...
- The failure actually reproduced here:
/a/lflores-2022-10-17_18:19:55-rados:standalone-main-distro-default-smithi/7...
- 05:06 PM Bug #57883 (Fix Under Review): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_...
- 02:21 PM Bug #57883 (In Progress): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_g...
- 02:19 PM Bug #57900 (In Progress): mon/crush_ops.sh: mons out of quorum
- 02:17 PM Bug #57900: mon/crush_ops.sh: mons out of quorum
- @Radek so the suggestion is to give the mons more time to reboot?
This is the workunit:
https://github.com/ceph/c...
10/24/2022
- 06:18 PM Bug #57852: osd: unhealthy osd cannot be marked down in time
- Not something we introduced recently, but still worth taking a look at if nothing urgent is on the plate.
- 06:17 PM Bug #57852 (New): osd: unhealthy osd cannot be marked down in time
- Thanks for the detailed explanation!
- 06:10 PM Bug #57845: MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS...
- Just before the crash time-outs were seen:...
- 06:05 PM Bug #57915: LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
- Yes, this is one of the Notify bugs that I hit during my tests.
- 05:14 PM Bug #57915: LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
- Nitzan, I recall you mentioned some watch-related tests during today's stand-up. Is this one of them?
- 05:57 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- As this is about EC: can the acting set's items be duplicated?
- 05:55 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- If https://github.com/ceph/ceph/pull/47901/commits/0d07b406dc2f854363f7ae9b970e980400f4f03e is the actual culprit, th...
- 05:42 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
- It looks like we asked to take osd.5 down, got a confirmation that the command was handled by the mon, and then @get_osd@ said %5...
- 05:25 PM Bug #57900: mon/crush_ops.sh: mons out of quorum
- Just a **suggestion** from the bug scrub: this is a mon thrashing test. None of the mon logs seems to have a trace of a crash...
- 05:18 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
- Well, just found a new occurrence.
- 05:11 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
- Lowering the priority as we haven't seen a recurrence since last time.
- 05:17 PM Bug #57913 (Duplicate): Thrashosd: timeout 120 ceph --cluster ceph osd pool rm unique_pool_2 uniq...
- In the teuthology log:...
- 05:10 PM Bug #57529 (Fix Under Review): mclock backfill is getting higher priority than WPQ
- 04:06 AM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- Laura Flores wrote:
> Notes from the rados suite review:
>
> We may need to check if we're shutting down while se...
10/23/2022
- 11:45 AM Bug #57915 (New): LibRadosWatchNotify.AioNotify - error callback ceph_assert(ref > 0)
- /a//nmordech-2022-10-23_05:26:13-rados:verify-wip-nm-51282-distro-default-smithi/7077932...
- 05:19 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
- Sridhar, yes, those trackers look the same; valgrind makes the OSD start slower, maybe that's the reason we are seeing...
10/21/2022
- 04:19 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
- /a/yuriw-2022-10-12_16:24:50-rados-wip-yuri8-testing-2022-10-12-0718-quincy-distro-default-smithi/7063948/
- 04:16 PM Bug #57913 (Duplicate): Thrashosd: timeout 120 ceph --cluster ceph osd pool rm unique_pool_2 uniq...
- /a/yuriw-2022-10-12_16:24:50-rados-wip-yuri8-testing-2022-10-12-0718-quincy-distro-default-smithi/7063868/
rados/t...
- 08:41 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
- @Nitzan Mordechai this is probably similar to,
https://tracker.ceph.com/issues/52948 and https://tracker.ceph.com/is...
- 07:47 AM Fix #57040 (Resolved): osd: Update osd's IOPS capacity using async Context completion instead of ...
- 07:46 AM Backport #57443 (Resolved): quincy: osd: Update osd's IOPS capacity using async Context completio...
10/20/2022
- 11:33 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- Notes from the rados suite review:
We may need to check if we're shutting down while sending pg stats; if so, we d...
- 03:07 PM Bug #57152 (Resolved): segfault in librados via libcephsqlite
- 03:06 PM Backport #57373 (Resolved): pacific: segfault in librados via libcephsqlite
- 02:56 PM Backport #57373: pacific: segfault in librados via libcephsqlite
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48187
merged
10/19/2022
- 09:21 PM Backport #52747 (In Progress): pacific: MON_DOWN during mon_join process
- 09:09 PM Backport #52746 (Rejected): octopus: MON_DOWN during mon_join process
- Octopus is EOL.
- 08:59 PM Bug #43584: MON_DOWN during mon_join process
- /a/yuriw-2022-10-05_20:44:57-rados-wip-yuri4-testing-2022-10-05-0917-pacific-distro-default-smithi/7055594
- 08:46 PM Bug #57900 (In Progress): mon/crush_ops.sh: mons out of quorum
- /a/teuthology-2022-10-09_07:01:03-rados-quincy-distro-default-smithi/7059463...
- 03:20 PM Bug #57698 (Pending Backport): osd/scrub: "scrub a chunk" requests are sent to the wrong set of r...
- 10:29 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
- The issue is that we are hitting a deadlock under a specific condition. When we are trying to update the mClockScheduler config c...
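For context, a minimal sketch of the deadlock class being described, assuming a non-recursive mutex taken twice on the same thread (hypothetical code, not the actual mClockScheduler implementation):
<pre>
#include <mutex>

std::mutex sched_lock;

// Helper that recomputes scheduler state and takes the lock itself.
void recalc_scheduler_state() {
  std::lock_guard<std::mutex> g(sched_lock);  // second lock attempt on the same thread
  // ... recompute queues/limits ...
}

// Config-change path: already holds the lock, then calls the helper.
void handle_conf_change() {
  std::lock_guard<std::mutex> g(sched_lock);  // first lock
  recalc_scheduler_state();                   // blocks forever on a non-recursive mutex
}
</pre>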
- 05:31 AM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- I was able to reproduce this using the test Laura mentioned above - http://pulpito.front.sepia.ceph.com/amathuri-2022...
10/18/2022
- 04:31 PM Bug #51729: Upmap verification fails for multi-level crush rule
- Chris, can you please provide your osdmap binary?
- 09:03 AM Bug #57845: MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_OCTOPUS...
- Hi Neha,
the logs from the crash instance that I reported initially are already rotated out on the particular node...
- 02:48 AM Bug #57852: osd: unhealthy osd cannot be marked down in time
- Radoslaw Zarzynski wrote:
> Could you please clarify a bit? Do you mean there are some extra, unnecessary (from the POV ...
10/17/2022
- 06:27 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
- Link to the discussion on ceph-users: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AZHAIGY3BIM4SGB...
- 06:20 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
- Let's first see if it's easily reproducible:
http://pulpito.front.sepia.ceph.com/lflores-2022-10-17_18:19:55-rados:s...
- 06:03 PM Bug #57883: test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get: grep '\<5...
- The failed function:
qa/standalone/erasure-code/test-erasure-code.sh...
- 05:52 PM Bug #57883 (Resolved): test-erasure-code.sh: TEST_rados_put_get_jerasure fails on "rados_put_get:...
- /a/yuriw-2022-10-13_17:24:48-rados-main-distro-default-smithi/7065580...
- 06:16 PM Bug #57845 (Need More Info): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(feature...
- These reports in telemetry look similar: http://telemetry.front.sepia.ceph.com:4000/d/Nvj6XTaMk/spec-search?orgId=1&v...
- 06:08 PM Bug #57852 (Need More Info): osd: unhealthy osd cannot be marked down in time
- Could you please clarify a bit? Do you mean there are some extra, unnecessary (from the POV of judging whether an OSD is ...
- 05:48 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
- NOT A FIX (extra debugs): https://github.com/ceph/ceph/pull/48513
- 05:45 PM Bug #57698 (Fix Under Review): osd/scrub: "scrub a chunk" requests are sent to the wrong set of r...
- 05:43 PM Bug #51729: Upmap verification fails for multi-level crush rule
- A note from bug scrub: this is going to be assigned tomorrow.
10/14/2022
- 09:13 PM Bug #51729: Upmap verification fails for multi-level crush rule
- Andras,
Thanks for the extra info. This needs to be addressed. Anyone?
- 08:48 PM Bug #51729: Upmap verification fails for multi-level crush rule
- Just to clarify - the error "verify_upmap number of buckets X exceeds desired Y" comes from the C++ code in ceph-mon ...
- 06:47 PM Bug #51729: Upmap verification fails for multi-level crush rule
- I am now seeing this issue on pacific, 16.2.10 on rocky8 linux.
If I have a >2 level rule on an ec pool (6+2), suc...
- 04:15 PM Bug #57698: osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
- Following some discussions: here are excerpts from a run demonstrating this issue.
Test run rfriedma-2022-09-28_15:5...
10/13/2022
- 07:39 AM Bug #57859 (Fix Under Review): bail from handle_command() if _generate_command_map() fails
- 03:51 AM Bug #57859 (Resolved): bail from handle_command() if _generate_command_map() fails
- https://tracker.ceph.com/issues/54558 catches an exception from handle_command() to avoid mon termination due to a po...
- 04:03 AM Bug #54558: malformed json in a Ceph RESTful API call can stop all ceph-mon services
- nikhil kshirsagar wrote:
> Ilya Dryomov wrote:
> > I don't think https://github.com/ceph/ceph/pull/45547 is a compl...
10/12/2022
- 05:08 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
- Hey Radek,
makes sense, I created a debug branch https://github.com/ceph/ceph-ci/pull/new/wip-crush-debug and migh...
- 02:39 AM Bug #57852 (Need More Info): osd: unhealthy osd cannot be marked down in time
- Before an unhealthy osd is marked down by the mon, other osds may choose it as a
heartbeat peer and then report an incorrec...
10/11/2022
- 10:13 AM Bug #57845 (New): MOSDRepOp::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_O...
- ...
10/10/2022
- 06:33 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
- Radoslaw,
Yes, I saw that piece of code too. But I *think* I figured it out just a short time ago. I had the cru...
- 06:05 PM Bug #57796 (Need More Info): after rebalance of pool via pgupmap balancer, continuous issues in m...
- Thanks for the report! The log comes from here:...
- 06:23 PM Bug #57782 (Need More Info): [mon] high cpu usage by fn_monstore thread
- It looks like we're burning CPU in @close(2)@. The single call site I can spot is in @write_data_set_to_csv@. Let's analyz...
- 06:08 AM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- Laura Flores wrote:
> I contacted some Telemetry users. I will report back here with any information.
>
I am on...
10/07/2022
- 08:32 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
- I removed the hosts holding the osds reported by verify_upmap from the default root rule that no one uses, and the lo...
- 05:56 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
- Note that the balancer balanced a replicated pool, using its own custom crush root too. The hosts in that pool (not i...
- 05:46 PM Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log
- preformatting the crush info so it shows up properly ......
- 05:43 PM Bug #57796 (Need More Info): after rebalance of pool via pgupmap balancer, continuous issues in m...
- The pgupmap balancer was not balancing well, and after setting mgr/balancer/upmap_max_deviation to 1 (ceph config-k...
- 04:46 PM Backport #57795 (New): quincy: intrusive_lru leaking memory when
- 04:46 PM Backport #57794 (New): pacific: intrusive_lru leaking memory when
- 04:29 PM Bug #57573 (Pending Backport): intrusive_lru leaking memory when
- 12:36 PM Bug #54773: crash: void MonMap::add(const mon_info_t&): assert(addr_mons.count(a) == 0)
- See bug 54744.
- 12:35 PM Bug #54744: crash: void MonMap::add(const mon_info_t&): assert(addr_mons.count(a) == 0)
- Rook v1.6.5 / Ceph v12.2.9 running on the host network and not inside the Kubernetes SDN caused creating a mon canary...
10/06/2022
- 08:38 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- I contacted some Telemetry users. I will report back here with any information.
Something to note: The large maj...
- 05:08 PM Backport #57545: quincy: CommandFailedError: Command failed (workunit test rados/test_python.sh) ...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48113
merged
- 05:05 PM Backport #57496: quincy: Invalid read of size 8 in handle_recovery_delete()
- Nitzan Mordechai wrote:
> https://github.com/ceph/ceph/pull/48039
merged
- 05:04 PM Backport #57443: quincy: osd: Update osd's IOPS capacity using async Context completion instead o...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47983
merged
- 05:03 PM Backport #57346: quincy: expected valgrind issues and found none
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47933
merged
- 05:01 PM Backport #56602: quincy: ceph report missing osdmap_clean_epochs if answered by peon
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47928
merged
- 05:00 PM Backport #55282: quincy: osd: add scrub duration for scrubs after recovery
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47926
merged
- 04:47 PM Backport #57544: pacific: CommandFailedError: Command failed (workunit test rados/test_python.sh)...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/48112
merged
- 02:08 PM Bug #57782 (Need More Info): [mon] high cpu usage by fn_monstore thread
- We observed high cpu usage by the ms_dispatch and fn_monstore threads (amounting to 99-100% in top). Ceph [deployment was ...
10/05/2022
- 06:49 PM Bug #57699 (Fix Under Review): slow osd boot with valgrind (reached maximum tries (50) after wait...
- 06:48 PM Bug #57049 (Duplicate): cluster logging does not adhere to mon_cluster_log_file_level
- 06:46 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- Hi Laura. Any luck with verification of the hypothesis from comment #17?
- 06:43 PM Bug #57532 (Duplicate): Notice discrepancies in the performance of mclock built-in profiles
- Marked as duplicate per comment #4.
- 06:25 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
- There is a coredump on the teuthology node (@/ceph/teuthology-archive/yuriw-2022-09-29_16:44:24-rados-wip-lflores-tes...
- 06:19 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- I think there is a fix for this that got reverted in quincy (https://tracker.ceph.com/issues/53806), but it's still in @main@. ...
- 06:12 PM Bug #50042: rados/test.sh: api_watch_notify failures
- Assigning to Nitzan just for the sake of testing the hypothesis from https://tracker.ceph.com/issues/50042#note-35.
- 06:06 PM Cleanup #57587 (Resolved): mon: fix Elector warnings
- Resolved by https://github.com/ceph/ceph/pull/48289.
- 06:05 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- This won't be easy to reproduce but there are still some options like:
* contacting owners of the external cluster...
10/04/2022
- 05:25 PM Bug #50042: rados/test.sh: api_watch_notify failures
- /a/yuriw-2022-09-29_16:40:30-rados-wip-all-kickoff-r-distro-default-smithi/7047940...
10/03/2022
- 10:21 PM Bug #53575: Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
- Found a similar instance here:
/a/lflores-2022-09-30_21:47:41-rados-wip-lflores-testing-distro-default-smithi/7050...
- 10:07 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- /a/yuriw-2022-09-29_16:44:24-rados-wip-lflores-testing-distro-default-smithi/7048304
/a/lflores-2022-09-30_21:47:41-...
- 10:01 PM Bug #57757: ECUtil: terminate called after throwing an instance of 'ceph::buffer::v15_2_0::end_of...
- Put affected version as "14.2.9" since there is no option for "14.2.19".
- 09:59 PM Bug #57757 (Fix Under Review): ECUtil: terminate called after throwing an instance of 'ceph::buff...
- /a/yuriw-2022-09-29_16:44:24-rados-wip-lflores-testing-distro-default-smithi/7048173/remote/smithi133/crash/posted/20...
- 12:59 PM Bug #57751 (Resolved): LibRadosAio.SimpleWritePP hang and pkill
- /a/nmordech-2022-10-02_08:27:55-rados:verify-wip-nm-51282-distro-default-smithi/7051967/...
09/30/2022
- 07:13 PM Bug #17170 (Fix Under Review): mon/monclient: update "unable to obtain rotating service keys when...
- 04:49 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
- Looks like in both cases something is being subtracted from a zero-value unsigned int64 and overflowing.
2^64 − ...
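For illustration, a minimal sketch of the suspected wrap-around (generic C++, not the actual Ceph code): subtracting from a zero-valued unsigned 64-bit integer wraps modulo 2^64.
<pre>
#include <cstdint>
#include <iostream>

int main() {
  uint64_t value = 0;
  uint64_t delta = 1;
  uint64_t result = value - delta;   // unsigned arithmetic wraps around: 2^64 - 1
  std::cout << result << std::endl;  // prints 18446744073709551615
  return 0;
}
</pre>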
- 03:37 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
- Setting the size (from 3) to 2, then setting it to 1 works......
- 03:38 AM Bug #57105: quincy: ceph osd pool set <pool> size math error
- I created a new cluster today to do a very specific test and ran into this (or something like it) again today. In th...
- 10:40 AM Bug #49777 (Resolved): test_pool_min_size: 'check for active or peered' reached maximum tries (5)...
- 10:39 AM Backport #57022 (Resolved): pacific: test_pool_min_size: 'check for active or peered' reached max...
- 09:28 AM Bug #50192 (Resolved): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_...
- 09:27 AM Backport #50274 (Resolved): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get...
- 09:27 AM Bug #53516 (Resolved): Disable health warning when autoscaler is on
- 09:27 AM Backport #53644 (Resolved): pacific: Disable health warning when autoscaler is on
- 09:27 AM Bug #51942 (Resolved): src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- 09:26 AM Backport #53339 (Resolved): pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<cons...
- 09:26 AM Bug #55001 (Resolved): rados/test.sh: Early exit right after LibRados global tests complete
- 09:26 AM Backport #57029 (Resolved): pacific: rados/test.sh: Early exit right after LibRados global tests ...
- 09:26 AM Bug #57119 (Resolved): Heap command prints with "ceph tell", but not with "ceph daemon"
- 09:25 AM Backport #57313 (Resolved): pacific: Heap command prints with "ceph tell", but not with "ceph dae...
- 05:18 AM Backport #57372 (Resolved): quincy: segfault in librados via libcephsqlite
- 04:23 AM Bug #57532: Notice discrepancies in the performance of mclock built-in profiles
- As Sridhar has mentioned in the BZ, the Case 2 results are due to the max limit setting for best effort clients. This...
- 02:19 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2022-09-27_23:37:28-rados-wip-yuri2-testing-2022-09-27-1455-distro-default-smithi/7046230/
09/29/2022
- 08:37 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- - This was visible again in LRC upgrade today....
- 07:31 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- yuriw-2022-09-27_23:37:28-rados-wip-yuri2-testing-2022-09-27-1455-distro-default-smithi/7046253
- 06:02 PM Bug #55435 (Resolved): mon/Elector: notify_ranked_removed() does not properly erase dead_ping in ...
- 06:01 PM Backport #56550 (Resolved): pacific: mon/Elector: notify_ranked_removed() does not properly erase...
- 03:55 PM Bug #54611 (Resolved): prometheus metrics shows incorrect ceph version for upgraded ceph daemon
- 03:54 PM Backport #55309 (Resolved): pacific: prometheus metrics shows incorrect ceph version for upgraded...
- 02:52 PM Bug #57727: mon_cluster_log_file_level option doesn't take effect
- Yes. I was trying to close it as a duplicate after editing my comment. Thank you for closing it.
- 02:50 PM Bug #57727 (Duplicate): mon_cluster_log_file_level option doesn't take effect
- Ah, you edited your comment to say "Closing this tracker as a duplicate of 57049".
- 02:48 PM Bug #57727 (Fix Under Review): mon_cluster_log_file_level option doesn't take effect
- 02:41 PM Bug #57727: mon_cluster_log_file_level option doesn't take effect
- Hi Ilya,
I had PR#47480 opened for this issue but closed it in favor of PR#47502. We have an old tracker 57049 fo...
- 02:00 PM Bug #57727 (Duplicate): mon_cluster_log_file_level option doesn't take effect
- This appears to be a regression introduced in quincy in https://github.com/ceph/ceph/pull/42014:...
- 02:44 PM Bug #57049: cluster logging does not adhere to mon_cluster_log_file_level
- I had PR#47480 opened for this issue but closed it in favor of PR#47502. PR#47502 addresses this issue along wi...
- 02:15 PM Backport #56735 (Resolved): octopus: unessesarily long laggy PG state
- 02:14 PM Bug #50806 (Resolved): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_lo...
- 02:13 PM Backport #50893 (Resolved): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_s...
- 02:07 PM Bug #55158 (Resolved): mon/OSDMonitor: properly set last_force_op_resend in stretch mode
- 02:07 PM Backport #55281 (Resolved): pacific: mon/OSDMonitor: properly set last_force_op_resend in stretch...
- 11:58 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
- I was not able to reproduce it with more debug messages; I created a PR with the debug messages and will wait for re...
- 07:28 AM Bug #56289 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 07:28 AM Bug #54710 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 07:28 AM Bug #54709 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 07:21 AM Bug #54708 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 07:02 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- Radoslaw Zarzynski wrote:
> A note from the bug scrub: work in progress.
WIP: https://gist.github.com/Matan-B/ca5...
- 02:47 AM Bug #57532: Notice discrepancies in the performance of mclock built-in profiles
- Hi Bharath, could you also add the mClock configuration values from the @osd config show@ command here?
09/28/2022
- 06:03 PM Bug #53806 (New): unessesarily long laggy PG state
- Reopening b/c the original fix had to be reverted: https://github.com/ceph/ceph/pull/44499#issuecomment-1247315820.
- 05:54 PM Bug #57618: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
- Note from a scrub: might be worth talking about.
- 05:51 PM Bug #57650 (In Progress): mon-stretch: reweighting an osd to a big number, then back to original ...
- 05:51 PM Bug #57678 (Fix Under Review): Mon fail to send pending metadata through MMgrUpdate after an upgr...
- 05:50 PM Bug #57698: osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
- What are symptoms? How bad is it? A hang maybe? I'm asking to understand the impact.
- 05:48 PM Bug #57698 (In Progress): osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
- IIRC Ronen has mentioned the scrub code interchanges @get_acting_set()@ and @get_acting_recovery_backfill()@.
- 01:40 PM Bug #57698 (Resolved): osd/scrub: "scrub a chunk" requests are sent to the wrong set of replicas
- The Primary registers its intent to scrub with the 'get_actingset()', as it should.
But the actual chunk requests ar...
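A minimal sketch of the mismatch being described (hypothetical sets and values, illustration only, not the actual scrubber code): reservations are made against the acting set, while the chunk requests iterate a different, larger set that can also contain backfill targets.
<pre>
#include <iostream>
#include <set>
#include <string>

using shard_t = std::string;  // stand-in for pg_shard_t

int main() {
  // Assumed example: acting_recovery_backfill additionally holds a backfill target (osd.7).
  std::set<shard_t> actingset = {"osd.1", "osd.2", "osd.3"};
  std::set<shard_t> acting_recovery_backfill = {"osd.1", "osd.2", "osd.3", "osd.7"};

  for (const auto& s : actingset)
    std::cout << "reserve scrub on " << s << "\n";

  // Iterating the other set reaches osd.7, which never took part in the reservation.
  for (const auto& s : acting_recovery_backfill)
    std::cout << "send scrub-chunk request to " << s << "\n";
}
</pre>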
- 05:45 PM Bug #57699 (In Progress): slow osd boot with valgrind (reached maximum tries (50) after waiting f...
- Marking WIP per our morning talk.
- 01:58 PM Bug #57699 (Pending Backport): slow osd boot with valgrind (reached maximum tries (50) after wait...
- /a/yuriw-2022-09-23_20:38:59-rados-wip-yuri6-testing-2022-09-23-1008-quincy-distro-default-smithi/7042504 ...
- 05:44 PM Backport #57705 (Resolved): pacific: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when redu...
- 05:44 PM Backport #57704 (Resolved): quincy: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reduc...
- 05:43 PM Bug #57529 (In Progress): mclock backfill is getting higher priority than WPQ
- Marking as WIP as IIRC Sridhar was talking about this issue during core standups.
- 05:42 PM Bug #57573 (In Progress): intrusive_lru leaking memory when
- As I understood:
1. @evict()@ intends to not free too much (which makes sense).
2. The dtor reuses @evict()@ for c...
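A generic sketch of the leak pattern described above (hypothetical class and names, not Ceph's actual intrusive_lru): if evict() deliberately keeps up to a target number of unreferenced entries cached, a destructor that only calls evict() leaves those entries allocated.
<pre>
#include <cstddef>
#include <list>

struct Node { /* payload */ };

class TinyLru {
  std::list<Node*> unreferenced;      // evictable entries still owned by the LRU
  std::size_t lru_target_size = 16;
public:
  void evict() {
    // Intentionally stops at the target size -- fine during normal operation.
    while (unreferenced.size() > lru_target_size) {
      delete unreferenced.back();
      unreferenced.pop_back();
    }
  }
  ~TinyLru() {
    evict();  // bug class: up to lru_target_size cached nodes are never freed;
              // teardown needs to clear everything (i.e., evict down to zero).
  }
};
</pre>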
- 05:39 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- A note from the bug scrub: work in progress.
- 05:35 PM Bug #50089 (Pending Backport): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing n...
- 11:06 AM Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors i...
- ...
- 11:03 AM Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors i...
- I am seeing the same crash in ceph version 16.2.10 and just noticed that the PR linked in this thread is merged...
- 01:10 PM Backport #57696 (In Progress): quincy: ceph log last command fail to log by verbosity level
- https://github.com/ceph/ceph/pull/50407
- 01:04 PM Feature #52424 (Resolved): [RFE] Limit slow request details to mgr log
- 01:03 PM Bug #57340 (Pending Backport): ceph log last command fail to log by verbosity level