Activity
From 05/16/2023 to 06/14/2023
06/14/2023
- 01:59 PM Bug #59478: osd/scrub: verify SnapMapper consistency not backported
- Is there any update on this backport? This seems to be causing corruption in some of our customers' rbd images.
- 11:47 AM Bug #59531: quincy: "OSD bench result of 228617.361065 IOPS exceeded the threshold limit of 500.0...
- quincy:
http://pulpito.front.sepia.ceph.com/yuriw-2023-06-13_23:20:02-fs-wip-yuri3-testing-2023-06-13-1204-quincy-di...
- 11:47 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- quincy:
http://pulpito.front.sepia.ceph.com/yuriw-2023-06-13_23:20:02-fs-wip-yuri3-testing-2023-06-13-1204-quincy-di...
- 10:51 AM Bug #57310 (Resolved): StriperTest: The futex facility returned an unexpected error code
- 10:44 AM Backport #58315 (Resolved): quincy: Valgrind reports memory "Leak_DefinitelyLost" errors.
- 10:41 AM Bug #21592 (Resolved): LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- 10:40 AM Backport #58614 (Resolved): quincy: pglog growing unbounded on EC with copy by ref
- 09:49 AM Backport #61445 (Resolved): reef: slow osd boot with valgrind (reached maximum tries (50) after w...
- Already in reef
- 09:47 AM Backport #61446 (Resolved): quincy: slow osd boot with valgrind (reached maximum tries (50) after...
- 09:45 AM Bug #57165 (Resolved): expected valgrind issues and found none
- 09:43 AM Backport #58869 (Resolved): quincy: rados/test.sh: api_watch_notify failures
- merged
- 09:43 AM Backport #58612 (Resolved): quincy: api_watch_notify_pp: LibRadosWatchNotifyPPTests/LibRadosWatch...
- 09:41 AM Bug #58130 (In Progress): LibRadosAio.SimpleWrite hang and pkill
- 09:41 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- After checking the logs, it looks like pg 33.6 on osd.7 was in peering state and we didn't have a chance (until Alarm po...
- 09:30 AM Feature #56153 (Resolved): add option to dump pg log to pg command
06/13/2023
- 08:46 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
- Hello,
Reporting same issue with a ceph-16.2.10 cluster deployed on top of kubernetes (rook-1.9.12) when adding a ...
- 08:27 PM Backport #61447 (Resolved): reef: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileT...
- 02:19 AM Bug #61650: pg can't recover from LAGGY state if the pool size is 1
- Our clusters' version is v17.2.6
- 02:16 AM Bug #61650 (New): pg can't recover from LAGGY state if the pool size is 1
- In one of our online clusters, some pool is configured to have no replicas.
There are times that some PGs keep bei...
06/12/2023
- 05:21 PM Backport #61647 (New): reef: TEST_recovery_scrub_2: TEST FAILED WITH 1 ERRORS
- 05:18 PM Bug #61386 (Pending Backport): TEST_recovery_scrub_2: TEST FAILED WITH 1 ERRORS
- 05:17 PM Bug #61585 (Need More Info): OSD segfault in PG::put()
- 05:16 PM Bug #61594: recovery_ops_reserved leak -- pg stuck in state recovering
- bump this up for the next scrub
06/09/2023
- 06:48 AM Backport #61624 (In Progress): quincy: Doc: Add mclock release note regarding design and usabilit...
- 06:37 AM Backport #61624 (Resolved): quincy: Doc: Add mclock release note regarding design and usability e...
- https://github.com/ceph/ceph/pull/51978
- 06:44 AM Backport #61625 (In Progress): reef: Doc: Add mclock release note regarding design and usability ...
- 06:37 AM Backport #61625 (Resolved): reef: Doc: Add mclock release note regarding design and usability enh...
- https://github.com/ceph/ceph/pull/51977
- 06:30 AM Bug #61623 (Resolved): Doc: Add mclock release note regarding design and usability enhancements.
- This is related to the fixes made as part of the following trackers:
1. https://tracker.ceph.com/issues/58529
2. ...
06/08/2023
- 03:25 PM Bug #59291: pg_pool_t version compatibility issue
- The proposal we discussed:...
- 02:50 PM Bug #59291: pg_pool_t version compatibility issue
- As discussed in the call with Radek and Nitzan,
Nitzan -> work on the actual fix
Junior -> Unit test -> Integrati...
- 02:27 PM Bug #59291: pg_pool_t version compatibility issue
- > we can bump compact to 6 (currently 5) and check for version 29.6 and 31.6 that way we can easily find the new vers...
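To make the versioned-encoding idea above concrete, here is a small self-contained sketch (a toy struct, not pg_pool_t, and hand-rolled rather than Ceph's real ENCODE_START/DECODE_START macros) of a decoder gating newer fields on the encoded struct version; the 29/31 and 5/6 numbers only echo the comment above, and the field name is made up.

    // Illustrative only: a toy struct mimicking Ceph's versioned-encoding
    // pattern, not the real pg_pool_t code.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct ToyPool {
      uint32_t size = 3;
      uint32_t stretch_bucket_count = 0;  // stand-in for a newer, stretch-related field

      void encode(std::vector<uint8_t>& bl, uint8_t struct_v, uint8_t compat) const {
        bl.push_back(struct_v);           // version this encoding was produced with
        bl.push_back(compat);             // oldest decoder version that can read it
        auto put32 = [&bl](uint32_t v) {
          for (int i = 0; i < 4; ++i) bl.push_back((v >> (8 * i)) & 0xff);
        };
        put32(size);
        if (struct_v >= 30)               // only newer encodings carry the field
          put32(stretch_bucket_count);
      }

      void decode(const std::vector<uint8_t>& bl) {
        std::size_t off = 0;
        uint8_t struct_v = bl[off++];
        uint8_t compat = bl[off++];
        (void)compat;                     // a real decoder would refuse a compat it doesn't know
        auto get32 = [&bl, &off]() {
          uint32_t v = 0;
          for (int i = 0; i < 4; ++i) v |= uint32_t(bl[off++]) << (8 * i);
          return v;
        };
        size = get32();
        if (struct_v >= 30)               // gate the newer field on the encoded version
          stretch_bucket_count = get32();
      }
    };

    int main() {
      ToyPool p;
      p.stretch_bucket_count = 2;
      std::vector<uint8_t> old_bl, new_bl;
      p.encode(old_bl, 29, 5);            // pre-stretch style encoding
      p.encode(new_bl, 31, 6);            // newer encoding with bumped compat

      ToyPool q;
      q.decode(old_bl);
      std::cout << "old encoding -> stretch field stays " << q.stretch_bucket_count << "\n";
      q.decode(new_bl);
      std::cout << "new encoding -> stretch field = " << q.stretch_bucket_count << "\n";
    }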
- 05:04 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- it looks like we have something wrong with backoff with peering in some cases.
from osd.7 - we added backoff after s...
06/07/2023
- 03:12 PM Backport #61447: reef: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reach...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51802
merged
- 03:03 PM Bug #61547: Logs produce non-human-readable timestamps after monitor upgrade
- https://github.com/ceph/ceph/pull/51892 merged
- 03:01 PM Backport #61579: reef: Logs produce non-human-readable timestamps after monitor upgrade
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51893
merged
- 08:05 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- quincy:
http://pulpito.front.sepia.ceph.com/yuriw-2023-05-31_21:56:15-fs-wip-yuri6-testing-2023-05-31-0933-quincy-di...
- 07:02 AM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- quincy:
http://pulpito.front.sepia.ceph.com/yuriw-2023-05-31_21:56:15-fs-wip-yuri6-testing-2023-05-31-0933-quincy-di...
- 07:00 AM Bug #59531: quincy: "OSD bench result of 228617.361065 IOPS exceeded the threshold limit of 500.0...
- quincy:
http://pulpito.front.sepia.ceph.com/yuriw-2023-05-31_21:56:15-fs-wip-yuri6-testing-2023-05-31-0933-quincy-di...
06/06/2023
- 04:23 PM Backport #61603 (Resolved): reef: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have...
- https://github.com/ceph/ceph/pull/52136
- 04:23 PM Backport #61602 (In Progress): pacific: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do no...
- https://github.com/ceph/ceph/pull/52138
- 04:22 PM Backport #61601 (In Progress): quincy: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not...
- https://github.com/ceph/ceph/pull/52137
- 04:14 PM Bug #59192 (Pending Backport): cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an...
- 01:08 PM Bug #61600 (New): dmClock client_map entries exceeding the erase age are not removed by the perio...
- The issue was originally observed in https://tracker.ceph.com/issues/61594. It was reproduced locally.
The test invo...
- 10:59 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Still digging through the logs, but after we send aio_write, osd.7 shows added backoff on PG 33.6...
- 12:03 AM Bug #61594 (New): recovery_ops_reserved leak -- pg stuck in state recovering
- Original state:...
06/05/2023
- 06:24 PM Bug #59192 (Fix Under Review): cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an...
- 06:07 PM Bug #59192: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enable...
- Hey Radek, yes. Looking into it, it should be a quick whitelist fix. Trying out a fix now.
- 05:33 PM Bug #59192: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enable...
- Hi Laura! Do you have the bandwidth to take a deeper look?
- 05:33 PM Bug #61585: OSD segfault in PG::put()
- This was a vanilla main branch as far as Ceph bits are concerned (https://github.com/ceph/ceph/commit/4d3e9642f733c6e...
- 05:24 PM Bug #61585: OSD segfault in PG::put()
- Does it happen on main or solely in the testing branch?
- 08:40 AM Bug #61585: OSD segfault in PG::put()
- Ilya Dryomov wrote:
> The coredump binary collection sub-task is still broken, I'm going to file a separate ticket f...
- 08:31 AM Bug #61585 (Need More Info): OSD segfault in PG::put()
- http://qa-proxy.ceph.com/teuthology/dis-2023-06-04_22:59:23-krbd-main-wip-exclusive-lock-snapc-default-smithi/7295992...
- 05:26 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Would you mind continuing?
- 07:07 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Laura Flores wrote:
> /a/yuriw-2023-03-14_20:10:47-rados-wip-yuri-testing-2023-03-14-0714-reef-distro-default-smithi...
- 05:10 PM Bug #58894 (Resolved): [pg-autoscaler][mgr] does not throw warn to increase PG count on pools wit...
- all backports merged
- 05:08 PM Backport #59180 (Resolved): quincy: [pg-autoscaler][mgr] does not throw warn to increase PG count...
- merged
- 03:26 PM Backport #61446: quincy: slow osd boot with valgrind (reached maximum tries (50) after waiting fo...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51807
merged
- 03:25 PM Backport #61449: quincy: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: rea...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51804
merged
- 03:24 PM Backport #61456: quincy: test_dedup_tool.sh: test_dedup_object fails when pool 'dedup_chunk_pool'...
- https://github.com/ceph/ceph/pull/51780 merged
06/04/2023
- 12:38 PM Bug #61386 (Fix Under Review): TEST_recovery_scrub_2: TEST FAILED WITH 1 ERRORS
- 08:33 AM Bug #56034 (Fix Under Review): qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent...
- 08:21 AM Bug #56034 (In Progress): qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
- Since TEST_divergent_3 has pg_autoscale_mode on, not all the pgs are clean+active when we pick the divergent osd whic...
- 05:02 AM Bug #59291: pg_pool_t version compatibility issue
- Junior, I know it's a bit of an ugly fix, but how about using the compact version as well?
we can bump compact to 6 (curren...
06/03/2023
- 07:53 PM Bug #61386: TEST_recovery_scrub_2: TEST FAILED WITH 1 ERRORS
- It's just a test issue per the conversation with Ronen on 3 Jun.
06/02/2023
- 08:29 PM Bug #59192: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application enable...
- /a/yuriw-2023-05-30_21:40:46-rados-wip-yuri10-testing-2023-05-30-1244-distro-default-smithi/7290995
- 04:15 PM Bug #59192 (New): cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an application ...
- Hmm, found another instance that looks like this tracker in main:
/a/yuriw-2023-06-01_19:33:38-rados-wip-yuri-testin...
- 08:24 PM Bug #56034: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
- /a/yuriw-2023-05-30_21:40:46-rados-wip-yuri10-testing-2023-05-30-1244-distro-default-smithi/7291436...
- 05:06 PM Backport #61579 (In Progress): reef: Logs produce non-human-readable timestamps after monitor upg...
- 04:49 PM Backport #61579 (Resolved): reef: Logs produce non-human-readable timestamps after monitor upgrade
- https://github.com/ceph/ceph/pull/51893
- 04:48 PM Bug #61547 (Pending Backport): Logs produce non-human-readable timestamps after monitor upgrade
- 02:43 PM Bug #61547 (Fix Under Review): Logs produce non-human-readable timestamps after monitor upgrade
- 09:00 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- reef:
https://pulpito.ceph.com/yuriw-2023-05-28_14:46:14-fs-reef-release-distro-default-smithi/7288896
- 07:37 AM Feature #61573 (New): Erasure Code: processing speed down when cacheline not aligned
- Right now, Erasure Code SIMD_ALIGN variable has been set to 32, since function create_aligned() and rebuild_aligned()...
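As a small self-contained illustration of the alignment point (not the Ceph buffer code; the constants only echo the numbers above): a 32-byte-aligned allocation is not guaranteed to also be 64-byte (cacheline) aligned, so SIMD-aligned buffers can still start mid-cacheline.

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    constexpr std::size_t SIMD_ALIGN = 32;   // alignment mentioned in the report
    constexpr std::size_t CACHELINE  = 64;   // typical x86-64 cacheline size

    int main() {
      int also_cacheline_aligned = 0;
      for (int i = 0; i < 1000; ++i) {
        // 32-byte aligned, but nothing guarantees 64-byte alignment
        void* p = std::aligned_alloc(SIMD_ALIGN, 4096);
        if (reinterpret_cast<std::uintptr_t>(p) % CACHELINE == 0)
          ++also_cacheline_aligned;
        std::free(p);
      }
      std::printf("%d of 1000 32-byte-aligned buffers were also cacheline aligned\n",
                  also_cacheline_aligned);
    }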
- 06:09 AM Bug #49962: 'sudo ceph --cluster ceph osd crush tunables default' fails due to valgrind: Unknown ...
- Ignore the previous comments. There is a new tracker https://tracker.ceph.com/issues/61400 which tracks the issue.
06/01/2023
- 07:05 PM Bug #61547: Logs produce non-human-readable timestamps after monitor upgrade
- Git bisect ended with:...
- 06:05 PM Bug #61547 (In Progress): Logs produce non-human-readable timestamps after monitor upgrade
- 05:14 PM Bug #61547: Logs produce non-human-readable timestamps after monitor upgrade
- Adding a note that this bug is not restricted to upgrades. We discovered it upon upgrading the monitor to Reef, but f...
- 03:56 PM Bug #61547: Logs produce non-human-readable timestamps after monitor upgrade
- I verified that this problem does not exist on a v17.2.6 cluster. This indicates one of the commits since v17.2.6 is ...
- 02:29 PM Bug #61547: Logs produce non-human-readable timestamps after monitor upgrade
- On freshly built and deployed Reef RC vstart.sh cluster the timestamps in monitor's log look human readable:...
- 01:20 AM Bug #61547: Logs produce non-human-readable timestamps after monitor upgrade
- Also happened in the ceph.audit.log (on gibba001 under /root/tracker_61547).
- 01:08 AM Bug #61547: Logs produce non-human-readable timestamps after monitor upgrade
- A good place to look for a regression might be under src/log.
- 12:49 AM Bug #61547: Logs produce non-human-readable timestamps after monitor upgrade
- Mgr log is on gibba006 under /root/tracker_61547, and ceph.log and mon log are on gibba001 under /root/tracker_61547....
- 06:18 PM Bug #51688: "stuck peering for" warning is misleading
- Hi Prashant! How about turning this reproducer into a workunit and appending to the PR?
- 06:16 PM Bug #59599 (Resolved): osd: cls_refcount unit test failures during upgrade sequence
- Backporting has been done manually, without tracker tickets.
- 05:49 PM Backport #61569 (Resolved): quincy: the mgr, osd version information missing in "ceph versions" c...
- https://github.com/ceph/ceph/pull/52161
- 05:47 PM Bug #61453 (Pending Backport): the mgr, osd version information missing in "ceph versions" comman...
- 02:24 PM Bug #59291: pg_pool_t version compatibility issue
- How big are the stretch-related fields after encoding?
- 02:16 PM Bug #59291: pg_pool_t version compatibility issue
- I have an impression since @30@ we've changed the responsibility of the version bits: now they're indicating whether ...
- 10:37 AM Bug #56192 (Fix Under Review): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.em...
- 09:23 AM Bug #56192 (In Progress): crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- 09:22 AM Bug #56192: crash: virtual Monitor::~Monitor(): assert(session_map.sessions.empty())
- New session connections are not checked before adding a new session when the monitor is shutting down.
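A generic sketch of the guard described above, with made-up names rather than the real Monitor/SessionMap code: once shutdown starts, new connections must be rejected, otherwise the destructor's sessions-empty assertion can race with a late client.

    #include <atomic>
    #include <cassert>
    #include <memory>
    #include <mutex>
    #include <set>

    struct Session {};

    class MiniMon {
      std::mutex lock;
      std::set<std::shared_ptr<Session>> sessions;
      std::atomic<bool> shutting_down{false};

    public:
      // Called from the connection-accept path when a client connects.
      bool add_session(std::shared_ptr<Session> s) {
        std::lock_guard<std::mutex> l(lock);
        if (shutting_down)          // the missing check: reject late connections
          return false;
        sessions.insert(std::move(s));
        return true;
      }

      void shutdown() {
        std::lock_guard<std::mutex> l(lock);
        shutting_down = true;
        sessions.clear();           // drop everything that was registered
      }

      ~MiniMon() {
        // Mirrors the tracker's assert(session_map.sessions.empty()).
        assert(sessions.empty());
      }
    };

    int main() {
      MiniMon mon;
      mon.add_session(std::make_shared<Session>());
      mon.shutdown();
      // Without the shutting_down check this late add would repopulate the
      // map and the destructor assert would trip.
      mon.add_session(std::make_shared<Session>());
    }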
05/31/2023
- 10:55 PM Bug #61547 (Resolved): Logs produce non-human-readable timestamps after monitor upgrade
- During the gibba upgrade to the Reef RC (4a02f3f496d9039326c49bf1fbe140388cd2f619), some of the logs produced non-hum...
- 07:20 AM Bug #38900: EC pools don't self repair on client read error
- I also found this problem in ceph-15.2.8. In the case of ec, a shard was damaged and could be read and returned to th...
05/30/2023
- 08:41 PM Backport #58979 (Resolved): reef: rocksdb "Leak_StillReachable" memory leak in mons
- Resolved by https://github.com/ceph/ceph/pull/50424
- 08:40 PM Bug #58925 (Resolved): rocksdb "Leak_StillReachable" memory leak in mons
- 08:37 PM Bug #59192 (Duplicate): cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an applic...
- Solved by https://github.com/ceph/ceph/pull/51494 in https://tracker.ceph.com/issues/61168.
- 07:23 PM Bug #61453: the mgr, osd version information missing in "ceph versions" command during cluster up...
- https://github.com/ceph/ceph/pull/51788
https://github.com/ceph/ceph/pull/51765
merged
- 07:13 PM Bug #59333: PgScrubber: timeout on reserving replicas
- /a/yuriw-2023-05-28_14:41:12-rados-reef-release-distro-default-smithi/7288683
- 06:17 PM Bug #61386: TEST_recovery_scrub_2: TEST FAILED WITH 1 ERRORS
- /a/yuriw-2023-05-27_01:14:40-rados-wip-yuri-testing-2023-05-26-1204-distro-default-smithi/7287753$
- 06:01 PM Backport #59701 (Resolved): quincy: mon: FAILED ceph_assert(osdmon()->is_writeable())
- Backport PR: https://github.com/ceph/ceph/pull/51413
MERGED!
05/29/2023
- 12:26 PM Backport #61488 (In Progress): pacific: ceph: osd blocklist does not accept v2/v1: prefix for addr
- 11:14 AM Backport #61488 (Resolved): pacific: ceph: osd blocklist does not accept v2/v1: prefix for addr
- https://github.com/ceph/ceph/pull/51812
- 12:24 PM Backport #61487 (In Progress): quincy: ceph: osd blocklist does not accept v2/v1: prefix for addr
- 11:14 AM Backport #61487 (Resolved): quincy: ceph: osd blocklist does not accept v2/v1: prefix for addr
- https://github.com/ceph/ceph/pull/51811
- 11:08 AM Bug #58884 (Pending Backport): ceph: osd blocklist does not accept v2/v1: prefix for addr
- 11:06 AM Bug #59599: osd: cls_refcount unit test failures during upgrade sequence
- I think that I got it right - it's pretty weird (for me) but that's what I found -
All the tests that failed in that ...
- 08:06 AM Bug #59599 (In Progress): osd: cls_refcount unit test failures during upgrade sequence
- 07:40 AM Bug #59599: osd: cls_refcount unit test failures during upgrade sequence
- That behavior only happens with upgrade; I'm looking into it. But that error only occurs when the code that I added i...
- 08:04 AM Backport #61446 (In Progress): quincy: slow osd boot with valgrind (reached maximum tries (50) af...
- 08:01 AM Backport #61445 (In Progress): reef: slow osd boot with valgrind (reached maximum tries (50) afte...
- 07:54 AM Backport #61449 (In Progress): quincy: rados/singleton: radosbench.py: teuthology.exceptions.MaxW...
- 07:53 AM Backport #61448 (In Progress): pacific: rados/singleton: radosbench.py: teuthology.exceptions.Max...
- 07:52 AM Backport #61447 (In Progress): reef: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhi...
- 07:49 AM Backport #61451 (In Progress): quincy: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['q...
- 07:49 AM Backport #61452 (In Progress): reef: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quo...
- 07:42 AM Backport #61450 (In Progress): pacific: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['...
05/27/2023
- 01:34 AM Backport #61475: pacific: common: Use double instead of long double to improve performance
- https://github.com/ceph/ceph/pull/51316
- 01:29 AM Backport #61475 (Resolved): pacific: common: Use double instead of long double to improve perform...
- To convert nanoseconds to seconds, the required precision is 10 digits,
and the precision of double is 15, which is enough ...
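A quick standalone check of that precision argument (not the Ceph code itself): a value that needs about 10 significant digits fits comfortably within the ~15-16 digits a 64-bit double provides.

    #include <cstdint>
    #include <cstdio>

    int main() {
      uint64_t ns = 9876543210ULL;        // ~9.88 s expressed in nanoseconds

      double d      = ns / 1e9;           // proposed: plain double
      long double l = ns / 1e9L;          // previous: long double

      std::printf("double:      %.9f\n", d);   // prints 9.876543210
      std::printf("long double: %.9Lf\n", l);  // prints 9.876543210
      // All 10 significant digits survive the double conversion intact; the
      // extra long double precision buys nothing here.
    }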
05/26/2023
- 09:26 PM Backport #58338 (Resolved): quincy: mon-stretched_cluster: degraded stretched mode lead to Monito...
- backport PR: https://github.com/ceph/ceph/pull/51413 merged!
- 08:17 PM Backport #61474: reef: the mgr, osd version information missing in "ceph versions" command during...
- https://github.com/ceph/ceph/pull/51788
- 06:56 PM Backport #61474 (Resolved): reef: the mgr, osd version information missing in "ceph versions" com...
- We were doing gibba cluster upgrade from quincy to reef and observed that ceph versions output was missing the mgr an...
- 01:50 PM Backport #61335: quincy: Able to modify the mclock reservation, weight and limit parameters when ...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51664
merged
- 01:49 PM Bug #59271: mon: FAILED ceph_assert(osdmon()->is_writeable())
- Kamoltat (Junior) Sirivadhna wrote:
> quincy: https://github.com/ceph/ceph/pull/51413
merged
- 01:49 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
- Kamoltat (Junior) Sirivadhna wrote:
> quincy backport: https://github.com/ceph/ceph/pull/51413
merged
- 01:47 PM Backport #58612: quincy: api_watch_notify_pp: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.Wa...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/49938
merged
- 06:37 AM Bug #51688: "stuck peering for" warning is misleading
- @Shreyansh This script does reproduce this issue but peering is intermediate. We can work on it to reproduce it consi...
- 12:35 AM Bug #61453 (Fix Under Review): the mgr, osd version information missing in "ceph versions" comman...
05/25/2023
- 11:27 PM Bug #61453: the mgr, osd version information missing in "ceph versions" command during cluster up...
- I see something similar on a new Reef cluster (18.0.0-3449-gb3947a04)
in my case, ceph versions only shows the mon...
- 09:19 PM Bug #61453: the mgr, osd version information missing in "ceph versions" command during cluster up...
- PR - https://github.com/ceph/ceph/pull/48136 that might have caused the regression.
- 08:48 PM Bug #61453 (Resolved): the mgr, osd version information missing in "ceph versions" command during...
- We were doing gibba cluster upgrade from quincy to reef and observed that ceph versions output was missing the mgr an...
- 11:03 PM Bug #61457: PgScrubber: shard blocked on an object for too long
- The failure did not reproduce in 15 reruns: http://pulpito.front.sepia.ceph.com/lflores-2023-05-25_22:14:25-rados-wip...
- 11:02 PM Bug #61457: PgScrubber: shard blocked on an object for too long
- Ronen, could this be related to any recent scrub changes?
- 10:47 PM Bug #61457 (New): PgScrubber: shard blocked on an object for too long
- /a/yuriw-2023-05-25_14:52:58-rados-wip-yuri3-testing-2023-05-24-1136-quincy-distro-default-smithi/7286563...
- 10:40 PM Backport #61456 (New): quincy: test_dedup_tool.sh: test_dedup_object fails when pool 'dedup_chunk...
- 10:39 PM Bug #59599: osd: cls_refcount unit test failures during upgrade sequence
- /a/yuriw-2023-05-25_14:52:58-rados-wip-yuri3-testing-2023-05-24-1136-quincy-distro-default-smithi/7286576
- 05:49 PM Bug #59599: osd: cls_refcount unit test failures during upgrade sequence
- Hello Nitzan! Could it be related to https://github.com/ceph/ceph/pull/47332?
- 03:02 PM Bug #59599: osd: cls_refcount unit test failures during upgrade sequence
- I don't see any significant changes to this refcount object class in a long time. The @test_implicit_ec@ test case do...
- 10:37 PM Bug #58587: test_dedup_tool.sh: test_dedup_object fails when pool 'dedup_chunk_pool' does not exist
- 10:37 PM Bug #58587: test_dedup_tool.sh: test_dedup_object fails when pool 'dedup_chunk_pool' does not exist
- /a/yuriw-2023-05-25_14:52:58-rados-wip-yuri3-testing-2023-05-24-1136-quincy-distro-default-smithi/7286538
- 10:36 PM Bug #58587 (Pending Backport): test_dedup_tool.sh: test_dedup_object fails when pool 'dedup_chunk...
- 06:42 PM Bug #59504: 17.2.6: build fails with fmt 9.1.0
- From quincy's @Journal.cc:142@:...
- 06:31 PM Bug #53575 (Resolved): Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
- Closing this as the last-mentioned PR got duplicated by https://github.com/ceph/ceph/pull/51341 and the duplicate has been ...
- 06:27 PM Backport #61452 (Resolved): reef: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum...
- https://github.com/ceph/ceph/pull/51800
- 06:27 PM Backport #61451 (Resolved): quincy: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quor...
- https://github.com/ceph/ceph/pull/51801
- 06:27 PM Backport #61450 (Resolved): pacific: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quo...
- https://github.com/ceph/ceph/pull/51799
- 06:24 PM Bug #59656 (Need More Info): pg_upmap_primary timeout
- 06:23 PM Bug #52316 (Pending Backport): qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum'])...
- 06:23 PM Bug #59333: PgScrubber: timeout on reserving replicas
- Ronen, PTAL.
- 06:19 PM Bug #61350 (Rejected): In the readonly mode of cache tier, the object isn't be promoted, which is...
- Nautilus is EOL.
Please also note that cache tiering will be deprecated in Reef.
- 06:18 PM Bug #61358: qa: osd - cluster [WRN] 1 slow requests found in cluster log
- Bump up. Let's observe whether there are further occurrences.
- 06:15 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Aishwarya, it started showing again. Could you please take a look?
- 06:09 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- Brad, let's sync up and talk about that in the DS meeting.
- 06:05 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- The RocksDB upgrade PR has been merged on 1st March.
https://github.com/ceph/ceph/pull/49006
- 05:57 PM Backport #61449 (Resolved): quincy: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhil...
- https://github.com/ceph/ceph/pull/51804
- 05:57 PM Backport #61448 (Resolved): pacific: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhi...
- https://github.com/ceph/ceph/pull/51803
- 05:57 PM Backport #61447 (Resolved): reef: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileT...
- https://github.com/ceph/ceph/pull/51802
- 05:57 PM Backport #61446 (Resolved): quincy: slow osd boot with valgrind (reached maximum tries (50) after...
- https://github.com/ceph/ceph/pull/51807
- 05:57 PM Backport #61445 (Resolved): reef: slow osd boot with valgrind (reached maximum tries (50) after w...
- https://github.com/ceph/ceph/pull/51805
- 05:54 PM Bug #49888 (Pending Backport): rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTrie...
- 05:49 PM Bug #57699 (Pending Backport): slow osd boot with valgrind (reached maximum tries (50) after wait...
- 05:42 PM Bug #55009 (Fix Under Review): Scrubbing exits due to error reading object head
- Ronen, is it fine to treat 51669 as a fix for this?
- 05:38 PM Bug #59291: pg_pool_t version compatibility issue
- Hi Junior! What's the current status?
05/24/2023
- 08:42 PM Bug #61228 (Resolved): Tests failing with slow scrubs with new mClock default profile
- 08:41 PM Backport #61232 (Resolved): reef: Tests failing with slow scrubs with new mClock default profile
- 08:41 PM Backport #61231 (Resolved): quincy: Tests failing with slow scrubs with new mClock default profile
- 08:06 PM Bug #61226 (Duplicate): event duration is overflow
- 07:35 PM Bug #61388 (Duplicate): osd/TrackedOp: TrackedOp event order error
- Closing this in favor of the fresh duplicate. Apologies that we missed the initial fix!
- 04:16 PM Bug #61388: osd/TrackedOp: TrackedOp event order error
- duplicate issue: https://tracker.ceph.com/issues/58012
- 10:55 AM Bug #61388 (Pending Backport): osd/TrackedOp: TrackedOp event order error
- 01:53 AM Bug #61388 (Duplicate): osd/TrackedOp: TrackedOp event order error
- Header_read time is recv_stamp, throttled time is throttle_stamp. The throttled event is in front of the header_read event cur...
- 05:58 PM Backport #61404 (Resolved): reef: Scrubs are too slow with new mClock profile changes
- 05:41 PM Backport #61404 (Resolved): reef: Scrubs are too slow with new mClock profile changes
- https://github.com/ceph/ceph/pull/51712
- 05:55 PM Backport #61403 (In Progress): quincy: Scrubs are too slow with new mClock profile changes
- 05:41 PM Backport #61403 (Resolved): quincy: Scrubs are too slow with new mClock profile changes
- https://github.com/ceph/ceph/pull/51728
- 05:51 PM Backport #61345 (Resolved): reef: WaitReplicas::react(const DigestUpdate&): Unexpected DigestUpda...
- 05:51 PM Backport #61345: reef: WaitReplicas::react(const DigestUpdate&): Unexpected DigestUpdate event
- https://github.com/ceph/ceph/pull/51683
- 05:39 PM Bug #61313 (Pending Backport): Scrubs are too slow with new mClock profile changes
- 04:13 PM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2023-05-23_22:39:17-rados-wip-yuri6-testing-2023-05-23-0757-reef-distro-default-smithi/7284941
- 03:47 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- /a/yuriw-2023-05-24_14:33:21-rados-wip-yuri6-testing-2023-05-23-0757-reef-distro-default-smithi/7285192
- 02:23 PM Bug #57650: mon-stretch: reweighting an osd to a big number, then back to original causes uneven ...
- I think this has to do with how we are subtracting/adding weights to each crush bucket; I think a better way is to alwa...
- 12:45 PM Bug #57310 (Fix Under Review): StriperTest: The futex facility returned an unexpected error code
- 12:22 PM Bug #57310 (In Progress): StriperTest: The futex facility returned an unexpected error code
- This looks like we are sending a notification to the semaphore after it was destroyed. We are missing some waits if we ...
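A generic, self-contained illustration of that pattern (plain POSIX semaphores, not the actual StriperTest code): the owner has to wait once per expected notification, and join the notifiers, before destroying the semaphore; a missing wait lets sem_destroy() run while a callback can still post.

    #include <pthread.h>
    #include <semaphore.h>
    #include <cstdio>

    static sem_t done_sem;
    static const int kOps = 4;

    void* completion_cb(void*) {
      // ... pretend an async operation finished here ...
      sem_post(&done_sem);          // notify the waiter
      return nullptr;
    }

    int main() {
      sem_init(&done_sem, 0, 0);

      pthread_t t[kOps];
      for (int i = 0; i < kOps; ++i)
        pthread_create(&t[i], nullptr, completion_cb, nullptr);

      // One wait per expected notification. Skipping any of these waits (the
      // "missing waits" mentioned above) lets sem_destroy() run while a
      // callback may still be about to post, which is undefined behaviour.
      for (int i = 0; i < kOps; ++i)
        sem_wait(&done_sem);

      for (int i = 0; i < kOps; ++i)
        pthread_join(t[i], nullptr);

      sem_destroy(&done_sem);       // safe only after all posts have happened
      std::printf("all %d completions observed before destroying the semaphore\n", kOps);
      return 0;
    }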
- 10:59 AM Bug #59531: quincy: "OSD bench result of 228617.361065 IOPS exceeded the threshold limit of 500.0...
- quincy:
https://pulpito.ceph.com/yuriw-2023-05-23_15:23:11-fs-wip-yuri10-testing-2023-05-18-0815-quincy-distro-defau...
- 10:58 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- quincy:
https://pulpito.ceph.com/yuriw-2023-05-23_15:23:11-fs-wip-yuri10-testing-2023-05-18-0815-quincy-distro-defau...
- 10:57 AM Backport #61396 (New): quincy: osd/TrackedOp: TrackedOp event order error
- 10:57 AM Backport #61395 (New): reef: osd/TrackedOp: TrackedOp event order error
- 06:57 AM Documentation #58590: osd_op_thread_suicide_timeout is not documented
- > options that are marked `advanced` don't necessarily warrant detailed documentation
Yes that may be true. But th...
- 01:07 AM Documentation #58590: osd_op_thread_suicide_timeout is not documented
- My sense is that options that are marked `advanced` don't necessarily warrant detailed documentation. There are near...
- 05:06 AM Bug #24990 (Resolved): api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- 05:06 AM Backport #53166 (Resolved): pacific: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
05/23/2023
- 10:59 PM Backport #61336: reef: Able to modify the mclock reservation, weight and limit parameters when bu...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51663
merged
- 10:57 PM Backport #61303: reef: src/osd/PrimaryLogPG.cc: 4284: ceph_abort_msg("out of order op")
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51666
merged
- 10:38 PM Bug #61386 (Pending Backport): TEST_recovery_scrub_2: TEST FAILED WITH 1 ERRORS
- /a/lflores-2023-05-23_18:17:13-rados-wip-yuri-testing-2023-05-22-0845-reef-distro-default-smithi/7284160...
- 10:17 PM Bug #61385 (New): TEST_dump_scrub_schedule fails from "key is query_active: negation:0 # expected...
- /a/yuriw-2023-05-22_23:22:00-rados-wip-yuri-testing-2023-05-22-0845-reef-distro-default-smithi/7282843...
- 09:59 PM Bug #57650: mon-stretch: reweighting an osd to a big number, then back to original causes uneven ...
- This is somehow only reproducible by reweighting an osd from 0.0900 to 0.7000 and back to 0.0900.
This PR https://gi...
- 07:37 PM Backport #53166: pacific: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51261
merged
- 03:43 PM Bug #47273 (Resolved): ceph report missing osdmap_clean_epochs if answered by peon
- 03:42 PM Backport #56604 (Resolved): pacific: ceph report missing osdmap_clean_epochs if answered by peon
- 03:26 PM Backport #56604: pacific: ceph report missing osdmap_clean_epochs if answered by peon
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51258
merged
- 03:28 PM Backport #59628: pacific: rados/test.sh hang and pkilled (LibRadosWatchNotifyEC.WatchNotify)
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51341
merged
- 03:21 PM Backport #61150: quincy: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismat...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51512
merged
- 08:57 AM Bug #61358 (New): qa: osd - cluster [WRN] 1 slow requests found in cluster log
- Description: fs/32bits/{begin/{0-install 1-ceph 2-logrotate} clusters/fixed-2-ucephfs conf/{client mds mon osd} distr...
- 08:51 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- reef
https://pulpito.ceph.com/yuriw-2023-05-22_14:44:12-fs-wip-yuri3-testing-2023-05-21-0740-reef-distro-default-smi...
- 06:52 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- reef
https://pulpito.ceph.com/yuriw-2023-05-10_18:53:39-fs-wip-yuri3-testing-2023-05-10-0851-reef-distro-default-smi...
- 07:17 AM Bug #61226: event duration is overflow
- pr is here https://github.com/ceph/ceph/pull/51545
- 06:34 AM Feature #43910: Utilize new Linux kernel v5.6 prctl PR_SET_IO_FLUSHER option
Rook gives a warning to not use XFS with hyperconverged settings (see https://github.com/rook/rook/blob/v1.11.6/Docum...
- 02:28 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- 02:28 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- Still specific to Jammy.
- 01:59 AM Bug #61350 (Rejected): In the readonly mode of cache tier, the object isn't be promoted, which is...
- Steps to Reproduce:
1. create a cache tier for the cephfs data pool.
2. copy a file to the mounted cephfs directory.
3. ...
05/22/2023
- 11:25 PM Bug #58894: [pg-autoscaler][mgr] does not throw warn to increase PG count on pools with autoscale...
- https://github.com/ceph/ceph/pull/50693 merged
- 09:53 PM Bug #61349: ObjectWriteOperation::mtime2() works with IoCtx::operate() but not aio_operate()
- for background, https://github.com/ceph/ceph/pull/50206 changes some of rgw's librados operations to aio_operate(), a...
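For reference, a minimal sketch of the two call paths being compared (assuming the public librados C++ API; the pool and object names are made up and error handling is trimmed), not the rgw code from the PR above:

    #include <rados/librados.hpp>
    #include <cstdio>
    #include <ctime>
    #include <string>

    // Write the same ObjectWriteOperation once via operate() and once via
    // aio_operate(), then read the stored mtime back; the report is that the
    // mtime2() hint survives the first path but not the second.
    static void write_with_mtime(librados::Rados& cluster, librados::IoCtx& ioctx,
                                 const std::string& oid, bool use_aio) {
      struct timespec ts = {1234567890, 0};      // arbitrary mtime to request

      librados::ObjectWriteOperation op;
      op.mtime2(&ts);                            // ask for a specific mtime
      librados::bufferlist bl;
      bl.append("payload");
      op.write_full(bl);

      if (!use_aio) {
        ioctx.operate(oid, &op);                 // synchronous path
      } else {
        librados::AioCompletion* c = cluster.aio_create_completion();
        ioctx.aio_operate(oid, c, &op);          // asynchronous path
        c->wait_for_complete();
        c->release();
      }

      uint64_t size;
      struct timespec mtime;
      ioctx.stat2(oid, &size, &mtime);           // read back what was stored
      std::printf("%-11s stored mtime=%lld\n",
                  use_aio ? "aio_operate" : "operate", (long long)mtime.tv_sec);
    }

    int main() {
      librados::Rados cluster;
      cluster.init("admin");                     // assumes a reachable test cluster
      cluster.conf_read_file(nullptr);
      if (cluster.connect() < 0) return 1;

      librados::IoCtx ioctx;
      if (cluster.ioctx_create("test-pool", ioctx) < 0) return 1;  // made-up pool

      write_with_mtime(cluster, ioctx, "mtime2-probe", false);
      write_with_mtime(cluster, ioctx, "mtime2-probe", true);
      cluster.shutdown();
    }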
- 09:51 PM Bug #61349 (Fix Under Review): ObjectWriteOperation::mtime2() works with IoCtx::operate() but not...
- 09:36 PM Bug #61349 (Resolved): ObjectWriteOperation::mtime2() works with IoCtx::operate() but not aio_ope...
- @librados::IoCtxImpl::operate()@ takes an optional @ceph::real_time*@ and uses it when given
but @librados::IoCtxI...
- 09:04 PM Bug #59599: osd: cls_refcount unit test failures during upgrade sequence
- /a/yuriw-2023-05-22_15:26:04-rados-wip-yuri10-testing-2023-05-18-0815-quincy-distro-default-smithi/7282680
- 07:40 PM Backport #61232: reef: Tests failing with slow scrubs with new mClock default profile
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/51569
merged
- 07:32 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- /a/lflores-2023-05-22_16:08:13-rados-wip-yuri6-testing-2023-05-19-1351-reef-distro-default-smithi/7282703
Was alre...
- 06:17 PM Bug #61313 (Fix Under Review): Scrubs are too slow with new mClock profile changes
- 06:46 AM Bug #61313 (Resolved): Scrubs are too slow with new mClock profile changes
- Scrubs are being reported to be very slow in multiple teuthology tests causing them to fail.
An example: https://pu...
- 04:50 PM Backport #61345 (Resolved): reef: WaitReplicas::react(const DigestUpdate&): Unexpected DigestUpda...
- 04:48 PM Bug #59049 (Pending Backport): WaitReplicas::react(const DigestUpdate&): Unexpected DigestUpdate ...
- 12:21 PM Backport #59538: pacific: osd/scrub: verify SnapMapper consistency not backported
- Hi @Ronen
Is there something we can do to prepare or help with the backport?
- 11:57 AM Backport #61303 (In Progress): reef: src/osd/PrimaryLogPG.cc: 4284: ceph_abort_msg("out of order ...
- 11:40 AM Backport #61335 (In Progress): quincy: Able to modify the mclock reservation, weight and limit pa...
- 11:16 AM Backport #61335 (Resolved): quincy: Able to modify the mclock reservation, weight and limit param...
- https://github.com/ceph/ceph/pull/51664
- 11:37 AM Backport #61336 (In Progress): reef: Able to modify the mclock reservation, weight and limit para...
- 11:16 AM Backport #61336 (Resolved): reef: Able to modify the mclock reservation, weight and limit paramet...
- https://github.com/ceph/ceph/pull/51663
- 11:08 AM Bug #61155 (Pending Backport): Able to modify the mclock reservation, weight and limit parameters...
05/19/2023
- 09:40 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
- /a/yuriw-2023-05-10_14:47:51-rados-wip-yuri5-testing-2023-05-09-1324-pacific-distro-default-smithi/7269818/smithi043/...
- 06:27 PM Backport #61303 (In Progress): reef: src/osd/PrimaryLogPG.cc: 4284: ceph_abort_msg("out of order ...
- https://github.com/ceph/ceph/pull/51666
- 06:24 PM Bug #58940 (Pending Backport): src/osd/PrimaryLogPG.cc: 4284: ceph_abort_msg("out of order op")
- 09:17 AM Bug #54682: crash: void ReplicatedBackend::_do_push(OpRequestRef): abort
- I encountered the same problem in v15.2.8,
-7> 2023-05-19T15:16:16.593+0800 7f0c1beed700 -1 /SDS-CICD/rpmbuild...
05/18/2023
- 09:47 PM Bug #46877: mon_clock_skew_check: expected MON_CLOCK_SKEW but got none
- /a/yuriw-2023-05-18_14:38:49-rados-wip-yuri-testing-2023-05-10-0917-distro-default-smithi/7277648
- 09:25 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- Laura Flores wrote:
> /a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/727...
- 06:11 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- /a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271184
So far, no Reef...
- 06:36 PM Bug #59333: PgScrubber: timeout on reserving replicas
- /a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271192
- 01:42 PM Bug #52316 (Fix Under Review): qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum'])...
- 01:33 PM Bug #52316 (In Progress): qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == l...
- Since we are failing at the first assert of the quorum check, and we had a few iterations of the thrashing, it loo...
- 12:41 PM Backport #61232 (In Progress): reef: Tests failing with slow scrubs with new mClock default profile
- 07:29 AM Backport #61232 (Resolved): reef: Tests failing with slow scrubs with new mClock default profile
- https://github.com/ceph/ceph/pull/51569
- 12:36 PM Backport #61231 (In Progress): quincy: Tests failing with slow scrubs with new mClock default pro...
- 07:29 AM Backport #61231 (Resolved): quincy: Tests failing with slow scrubs with new mClock default profile
- https://github.com/ceph/ceph/pull/51568
- 07:28 AM Bug #61228 (Pending Backport): Tests failing with slow scrubs with new mClock default profile
- 07:26 AM Bug #61228 (Fix Under Review): Tests failing with slow scrubs with new mClock default profile
- 04:11 AM Bug #61228 (Resolved): Tests failing with slow scrubs with new mClock default profile
- After the changes made in https://github.com/ceph/ceph/pull/49975, teuthology tests are failing due to slow scrubs wi...
- 02:32 AM Bug #61226 (Duplicate): event duration is overflow
- ...
05/17/2023
- 10:55 PM Bug #59656: pg_upmap_primary timeout
- Hello Flaura, thanks for your answer.
Indeed your example explains a lot of things; I will try to understand more ...
- 09:26 PM Bug #59656: pg_upmap_primary timeout
- Hi Kevin,
Yes, the read balancer does take primary affinity into account.
I will walk through an example on a v...
- 09:43 PM Bug #51729: Upmap verification fails for multi-level crush rule
- Hi Chris,
If possible, can you try this change with your crush rule?
https://github.com/ceph/ceph/compare/main......
- 07:51 PM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2023-05-16_23:44:06-rados-wip-yuri10-testing-2023-05-16-1243-distro-default-smithi/7276255
Hey Nitzan, ye...
- 07:49 AM Bug #49888 (Fix Under Review): rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTrie...
- 05:39 AM Bug #49888 (In Progress): rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: re...
- If my above comment is correct, teuthology also has that incorrect configuration as a default; in placeholder.py we th...
- 05:32 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- Looks like all the failures are related to thrash-eio. I checked all the archived runs that we have (that are still out there) and ...
- 07:43 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- pacific:
https://pulpito.ceph.com/yuriw-2023-05-15_21:56:33-fs-wip-yuri2-testing-2023-05-15-0810-pacific_2-distro-de...
- 07:42 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- pacific - https://pulpito.ceph.com/yuriw-2023-05-15_21:56:33-fs-wip-yuri2-testing-2023-05-15-0810-pacific_2-distro-de...
- 07:24 AM Bug #59779 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 07:24 AM Bug #59778 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
- 07:24 AM Bug #59777 (Duplicate): crash: void PeeringState::check_past_interval_bounds() const: abort
05/16/2023
- 07:10 PM Bug #59656: pg_upmap_primary timeout
- Hello Flaura (my shortcut for Flores + Laura), I experienced another malfunction (or not ?) of the read balancer.
Ba...
- 04:28 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- reef - https://pulpito.ceph.com/yuriw-2023-05-15_15:22:39-fs-wip-yuri6-testing-2023-04-26-1247-reef-distro-default-sm...
- 11:17 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- reef:
https://pulpito.ceph.com/yuriw-2023-05-09_19:37:41-fs-wip-yuri10-testing-2023-05-08-0849-reef-distro-default-s...
- 04:08 PM Bug #49962: 'sudo ceph --cluster ceph osd crush tunables default' fails due to valgrind: Unknown ...
- reef -
https://pulpito.ceph.com/yuriw-2023-05-15_15:22:39-fs-wip-yuri6-testing-2023-04-26-1247-reef-distro-default-s...
- 11:18 AM Bug #49962: 'sudo ceph --cluster ceph osd crush tunables default' fails due to valgrind: Unknown ...
- Seen again reef qa run:
https://pulpito.ceph.com/yuriw-2023-05-09_19:37:41-fs-wip-yuri10-testing-2023-05-08-0849-ree...
- 10:33 AM Backport #61150 (In Progress): quincy: osd/PeeringState.cc: ceph_abort_msg("past_interval start i...
- 09:52 AM Backport #61149 (In Progress): pacific: osd/PeeringState.cc: ceph_abort_msg("past_interval start ...