Activity
From 08/10/2022 to 09/08/2022
09/08/2022
- 04:38 PM Backport #57209: quincy: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47932
merged
- 04:37 PM Bug #57467 (Fix Under Review): EncodingException.Macros fails on make check on quincy
- 11:18 AM Bug #57467: EncodingException.Macros fails on make check on quincy
- Since this has probably fallen off of Kefu's radar, I went ahead and opened https://github.com/ceph/ceph/pull/48016.
- 03:26 PM Documentation #57448 (Resolved): Doc: Update release notes on the fix for high CPU usage during r...
- 03:26 PM Backport #57461 (Resolved): quincy: Doc: Update release notes on the fix for high CPU usage durin...
09/07/2022
- 10:09 PM Bug #57467: EncodingException.Macros fails on make check on quincy
- There was an attempt to fix this issue here: https://github.com/ceph/ceph/pull/47938
- 10:07 PM Bug #57467 (Resolved): EncodingException.Macros fails on make check on quincy
- irvingi07: https://jenkins.ceph.com/job/ceph-pull-requests/103416...
- 06:12 PM Bug #54558: malformed json in a Ceph RESTful API call can stop all ceph-mon services
- I don't think https://github.com/ceph/ceph/pull/45547 is a complete fix; see my comment in the PR.
- 03:01 PM Bug #55233: librados C++ API requires C++17 to build
- https://github.com/ceph/ceph/pull/46005 merged
- 02:57 PM Backport #56736: quincy: unessesarily long laggy PG state
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47901
merged
- 02:55 PM Backport #55297: quincy: malformed json in a Ceph RESTful API call can stop all ceph-mon services
- nikhil kshirsagar wrote:
> please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/...
- 02:12 PM Backport #57461 (In Progress): quincy: Doc: Update release notes on the fix for high CPU usage du...
- 02:03 PM Backport #57461 (Resolved): quincy: Doc: Update release notes on the fix for high CPU usage durin...
- https://github.com/ceph/ceph/pull/48004
- 12:47 PM Bug #46847: Loss of placement information on OSD reboot
- The PR https://github.com/ceph/ceph/pull/40849 for adding the test was marked stale. I left a comment and it would be...
- 12:10 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Took a look at why peering was happening in the first place. Looking at PG 7.16 logs below, we can see that the balan...
- 05:25 AM Backport #57346: quincy: expected valgrind issues and found none
- /a/yuriw-2022-09-03_14:52:22-rados-wip-yuri-testing-2022-09-02-0945-quincy-distro-default-smithi/7009611
- 02:57 AM Bug #42884: OSDMapTest.CleanPGUpmaps failure
- adami03: https://jenkins.ceph.com/job/ceph-pull-requests/103395/console
09/06/2022
- 08:45 PM Backport #55309: pacific: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47693
merged
- 08:42 PM Backport #55305: quincy: Manager is failing to keep updated metadata in daemon_state for upgraded...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46559
merged
- 05:57 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- /a/yuriw-2022-08-22_16:21:19-rados-wip-yuri8-testing-2022-08-22-0646-distro-default-smithi/6985175
- 03:19 PM Documentation #57448 (Resolved): Doc: Update release notes on the fix for high CPU usage during r...
- 02:15 PM Backport #57312 (Resolved): quincy: Heap command prints with "ceph tell", but not with "ceph daemon"
- 12:59 PM Backport #55156 (Resolved): pacific: mon: config commands do not accept whitespace style config name
- 12:29 PM Backport #55308 (Resolved): pacific: Manager is failing to keep updated metadata in daemon_state ...
- 05:56 AM Backport #57443 (In Progress): quincy: osd: Update osd's IOPS capacity using async Context comple...
- 05:09 AM Backport #57443 (Resolved): quincy: osd: Update osd's IOPS capacity using async Context completio...
- https://github.com/ceph/ceph/pull/47983
- 04:42 AM Fix #57040 (Pending Backport): osd: Update osd's IOPS capacity using async Context completion ins...
09/05/2022
- 02:10 PM Backport #56641: quincy: Log at 1 when Throttle::get_or_fail() fails
- Radoslaw Zarzynski wrote:
> https://github.com/ceph/ceph/pull/47765
merged
- 02:02 PM Backport #57372: quincy: segfault in librados via libcephsqlite
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47909
merged
09/04/2022
- 02:22 PM Bug #53000: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_d...
- ...
- 08:08 AM Backport #57346: quincy: expected valgrind issues and found none
- /a/yuriw-2022-09-02_15:23:14-rados-wip-yuri6-testing-2022-09-01-1034-quincy-distro-default-smithi/7008140/
/a/yuri...
09/02/2022
- 05:38 PM Backport #57117: quincy: mon: race condition between `mgr fail` and MgrMonitor::prepare_beacon()
- Should be backported with https://github.com/ceph/ceph/pull/47834.
- 05:08 PM Backport #57346 (In Progress): quincy: expected valgrind issues and found none
- 05:06 PM Backport #57209 (In Progress): quincy: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- 04:59 PM Backport #55972 (Resolved): quincy: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10...
- 04:59 PM Backport #55972: quincy: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e...
- Already in quincy. See https://github.com/ceph/ceph/pull/46498.
- 04:49 PM Backport #57257 (In Progress): quincy: Assert in Ceph messenger
- 04:48 PM Backport #56723 (In Progress): quincy: osd thread deadlock
- 04:47 PM Backport #56655 (In Progress): quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierF...
- 04:46 PM Backport #56602 (In Progress): quincy: ceph report missing osdmap_clean_epochs if answered by peon
- 04:34 PM Backport #55543 (In Progress): quincy: should use TCMalloc for better performance
- 04:34 PM Backport #55282 (In Progress): quincy: osd: add scrub duration for scrubs after recovery
- 04:31 PM Backport #56648 (In Progress): quincy: [Progress] Do not show NEW PG_NUM value for pool if autosc...
- 04:18 PM Backport #57312: quincy: Heap command prints with "ceph tell", but not with "ceph daemon"
- Laura Flores wrote:
> https://github.com/ceph/ceph/pull/47825
merged
- 02:20 PM Bug #54172 (Resolved): ceph version 16.2.7 PG scrubs not progressing
- 02:19 PM Backport #56409 (Resolved): pacific: ceph version 16.2.7 PG scrubs not progressing
- 02:12 PM Feature #54600 (Resolved): Add scrub_duration to pg dump json format
- 02:12 PM Backport #54602 (Duplicate): quincy: Add scrub_duration to pg dump json format
- 02:10 PM Backport #54601 (Resolved): quincy: Add scrub_duration to pg dump json format
- 02:10 PM Backport #55065 (Rejected): quincy: osd_fast_shutdown_notify_mon option should be true by default
- 01:59 PM Backport #56551 (Resolved): quincy: mon/Elector: notify_ranked_removed() does not properly erase ...
- 01:53 PM Backport #57030 (Resolved): quincy: rados/test.sh: Early exit right after LibRados global tests c...
- 01:45 PM Backport #57289 (Rejected): quincy: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Parallel...
- This backport ticket is a result of a thinko. Rejecting.
- 01:44 PM Backport #57288 (Rejected): pacific: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Paralle...
- https://github.com/ceph/ceph/pull/45582 is NOT the fix. This backport ticket is a result of a thinko. Rejecting.
- 01:40 PM Bug #53000 (New): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_...
- Sorry, moving back to @New@.
- 01:31 PM Bug #53740 (Resolved): mon: all mon daemon always crash after rm pool
- No need for backporting to quincy – the fix is already there (see the comment in the backport ticket). Resolving.
- 01:30 PM Backport #53977 (Rejected): quincy: mon: all mon daemon always crash after rm pool
- 01:29 PM Backport #53977: quincy: mon: all mon daemon always crash after rm pool
- The fix is already in quincy:...
- 01:02 PM Backport #56408 (Resolved): quincy: ceph version 16.2.7 PG scrubs not progressing
- 01:01 PM Backport #55157 (Resolved): quincy: mon: config commands do not accept whitespace style config name
- 12:55 PM Backport #55632 (Resolved): quincy: ceph-osd takes all memory before oom on boot
- The last missing part (the online dups trimming) is merged.
- 02:47 AM Bug #57119 (Pending Backport): Heap command prints with "ceph tell", but not with "ceph daemon"
- 02:39 AM Bug #57165: expected valgrind issues and found none
- Quincy runs:
https://pulpito.ceph.com/yuriw-2022-09-01_16:26:28-rados-wip-lflores-testing-2-2022-08-26-2240-quincy...
09/01/2022
- 11:03 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- https://github.com/ceph/ceph/pull/47650 merged
- 04:13 PM Backport #57372 (In Progress): quincy: segfault in librados via libcephsqlite
- 04:05 PM Backport #57372 (Resolved): quincy: segfault in librados via libcephsqlite
- https://github.com/ceph/ceph/pull/47909
- 04:05 PM Backport #57373 (Resolved): pacific: segfault in librados via libcephsqlite
- https://github.com/ceph/ceph/pull/48187
- 04:00 PM Bug #57152 (Pending Backport): segfault in librados via libcephsqlite
- 01:39 PM Backport #56736 (In Progress): quincy: unessesarily long laggy PG state
- 01:35 PM Bug #57163 (Fix Under Review): free(): invalid pointer
- > Many thanks to Josh for suggesting we may be dealing with a compiler mismatch here and sorry if you were working on ...
- 12:30 PM Bug #57163: free(): invalid pointer
- /a/yuriw-2022-09-01_00:21:36-rados-wip-yuri7-testing-2022-08-31-0841-distro-default-smithi/7003413
- 01:24 PM Backport #56734 (In Progress): pacific: unessesarily long laggy PG state
- 10:36 AM Bug #49231: MONs unresponsive over extended periods of time
- OK, I did some more work and it looks like I can trigger the issue with some certainty by failing an MDS that was up ...
08/31/2022
- 10:52 PM Bug #57163: free(): invalid pointer
- Many thanks to Josh for suggesting we may be dealing with a compiler mismatch here and sorry if you were working on t...
- 10:21 AM Bug #51194: PG recovery_unfound after scrub repair failed on primary
- Hi,
We suffered exactly the same problem at IJCLab: a flappy OSD (with unmonitored smartd preventive errors) cause...
- 09:55 AM Backport #57346 (Resolved): quincy: expected valgrind issues and found none
- https://github.com/ceph/ceph/pull/47933
- 09:55 AM Bug #57165 (Pending Backport): expected valgrind issues and found none
08/30/2022
- 12:56 PM Bug #57340 (Fix Under Review): ceph log last command fail to log by verbosity level
- 12:34 PM Bug #57340 (Resolved): ceph log last command fail to log by verbosity level
- We see debug logs even when we intend to get the cluster log at log level WARN....
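A minimal sketch of the level filtering this report implies (hypothetical code, not the actual LogMonitor implementation; the real option is @mon_cluster_log_file_level@): entries below the configured threshold should be dropped before they reach the cluster log file.
```cpp
#include <iostream>
#include <string>

// Hypothetical severity scale standing in for the cluster log levels.
enum Level { DEBUG = 0, INFO, WARN, ERROR };

// Drop entries below the configured threshold before writing them out.
void log_to_cluster_file(Level configured, Level entry, const std::string& msg) {
  if (entry < configured)
    return;  // e.g. DEBUG entries are skipped when the threshold is WARN
  std::cout << msg << "\n";  // stand-in for appending to the cluster log file
}

int main() {
  log_to_cluster_file(WARN, DEBUG, "dbg: noisy detail");  // suppressed
  log_to_cluster_file(WARN, ERROR, "err: real problem");  // written
  return 0;
}
```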
- 12:27 PM Bug #57310: StriperTest: The futex facility returned an unexpected error code
- Looks like https://github.com/ceph/ceph/pull/47841 will fix it
- 10:38 AM Backport #55633: octopus: ceph-osd takes all memory before oom on boot
The original patch was reverted by https://github.com/ceph/ceph/pull/46611
Hence the issue shouldn't be in Res...
- 07:02 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- I have been going through the failure logs mentioned above and I see that the health check does pass eventually:
...
08/29/2022
08/28/2022
- 06:30 AM Backport #57316 (In Progress): quincy: add an asok command for pg log investigations
- 06:27 AM Backport #57315 (In Progress): pacific: add an asok command for pg log investigations
08/27/2022
- 04:03 PM Bug #56847 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
- #56850
- 04:02 PM Bug #56848 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
- #56850
- 04:02 PM Bug #56849 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
- #56850
- 04:00 PM Bug #56850 (Fix Under Review): crash: void PaxosService::propose_pending(): assert(have_pending)
- 06:44 AM Backport #57316 (In Progress): quincy: add an asok command for pg log investigations
- https://github.com/ceph/ceph/pull/47840
- 06:44 AM Backport #57315 (In Progress): pacific: add an asok command for pg log investigations
- https://github.com/ceph/ceph/pull/47839
- 06:39 AM Bug #55836 (Pending Backport): add an asok command for pg log investigations
08/26/2022
- 10:27 PM Backport #57312 (In Progress): quincy: Heap command prints with "ceph tell", but not with "ceph d...
- https://github.com/ceph/ceph/pull/47825
- 10:18 PM Backport #57312 (Resolved): quincy: Heap command prints with "ceph tell", but not with "ceph daemon"
- 10:19 PM Bug #57119 (Fix Under Review): Heap command prints with "ceph tell", but not with "ceph daemon"
- 10:13 PM Bug #57119 (Pending Backport): Heap command prints with "ceph tell", but not with "ceph daemon"
- Setting to "Pending backport" briefly so I can create backport trackers.
- 10:18 PM Backport #57313 (Resolved): pacific: Heap command prints with "ceph tell", but not with "ceph dae...
- https://github.com/ceph/ceph/pull/48106
- 06:54 PM Bug #57310: StriperTest: The futex facility returned an unexpected error code
- Laura Flores wrote:
> /a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/69...
- 06:54 PM Bug #57310 (Resolved): StriperTest: The futex facility returned an unexpected error code
- /a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986262
/a/yuriw-2022-08...
- 04:45 PM Backport #57076 (Resolved): pacific: Invalid read of size 8 in handle_recovery_delete()
- 05:18 AM Bug #57163: free(): invalid pointer
- I did an interactive rerun of /a/lflores-2022-08-17_21:04:23-rados:singleton-nomsgr-wip-yuri4-testing-2022-08-15-0951...
08/25/2022
- 10:27 PM Bug #57163: free(): invalid pointer
- /a/lflores-2022-08-17_21:04:23-rados:singleton-nomsgr-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6977853...
- 07:41 PM Bug #57165: expected valgrind issues and found none
- /a/yuriw-2022-08-17_19:34:54-rados-wip-yuri7-testing-2022-08-17-0943-quincy-distro-default-smithi/6977767
- 11:25 AM Bug #57165 (Fix Under Review): expected valgrind issues and found none
- 11:16 AM Bug #57165: expected valgrind issues and found none
- This is a memory optimization "fault": the new gcc optimizes the code so that the memory we are deliberately trying to leak is never actually leaked.
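For illustration, a minimal sketch (hypothetical, not the real "ceph tell mon.a leak_some_memory" handler) of how an optimizer may legally remove an allocation that never escapes, leaving valgrind nothing to report:
```cpp
#include <cstdlib>

// An allocation whose pointer never escapes may be removed entirely by the
// optimizer, so valgrind sees no leak at all.
static void leak_elidable() {
  void* p = std::malloc(1024);  // gcc may legally optimize this away
  (void)p;
}

// Storing the pointer where the compiler cannot prove it is unused keeps
// the allocation (and therefore the deliberate leak) observable.
static void* volatile g_escape;
static void leak_observable() {
  g_escape = std::malloc(1024);  // escapes; valgrind will report this leak
}

int main() {
  leak_elidable();
  leak_observable();
  return 0;
}
```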
- 10:12 AM Bug #57165 (In Progress): expected valgrind issues and found none
- 10:12 AM Bug #57165: expected valgrind issues and found none
- We are leaking memory with "ceph tell mon.a leak_some_memory", but for some reason we are not seeing any memory leak in v...
- 06:48 AM Bug #53000: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_d...
- I don't understand why the PR ID for this bug was set to 41323. I mentioned PR 41323 not as the fix but as an example...
08/24/2022
- 06:31 PM Backport #57288 (Resolved): pacific: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Paralle...
- https://github.com/ceph/ceph/pull/45582
- 06:30 PM Backport #57288 (Rejected): pacific: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Paralle...
- 06:30 PM Backport #57289 (Rejected): quincy: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Parallel...
- 06:29 PM Bug #53000 (Pending Backport): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMap...
- 06:27 PM Bug #53000 (Fix Under Review): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMap...
- 06:24 PM Bug #51168 (New): ceph-osd state machine crash during peering process
- I plan to work on this one and combine it with implementing the backfill cancellation in crimson. However, not a terribl...
- 06:16 PM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
- I think the recurrences are about failure in a different place – at least in the latest one the @LibRadosServicePP...
- 06:06 PM Bug #57165: expected valgrind issues and found none
- Bumped the priority up, as I'm afraid that the longer we wait to ensure valgrind is fully operational, the greater is t...
- 12:19 AM Bug #57165: expected valgrind issues and found none
- What I'm seeing is that the jobs in question were told to expect valgrind errors via the @expect_valgrind_errors: tru...
- 05:50 PM Bug #57163: free(): invalid pointer
- How about having it as _High_?
- 04:15 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
- Downstream ceph-ansible BZ - https://bugzilla.redhat.com/show_bug.cgi?id=2121097
- 03:50 PM Feature #57180 (Rejected): option for pg_autoscaler to retain same state of existing pools when u...
- 12:22 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alexandre Marangone wrote:
> Attached the debug_osd 20 logs for one of the OSDs. I turned off (deep)scrub because the logs...
08/23/2022
- 11:27 PM Bug #57163: free(): invalid pointer
- Maybe "urgent" is too dramatic, but this seems to be affecting a lot of tests in main.
- 10:50 PM Bug #57163: free(): invalid pointer
- /a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986255...
- 04:19 PM Bug #57163: free(): invalid pointer
- A local run with -fsanitize=address warns about a data race at the same stage; may be relevant....
- 03:53 PM Bug #57163: free(): invalid pointer
- Kefu Chai wrote:
> /a/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883...
- 03:33 PM Bug #57163: free(): invalid pointer
- This failure (as of right now) only occurs on Ubuntu 20.04. See https://github.com/ceph/ceph/pull/47642 for some exam...
- 03:11 PM Bug #57163: free(): invalid pointer
- /a/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883/teuthology.log
- 10:13 PM Bug #57165: expected valgrind issues and found none
- To me, this seems like a Teuthology failure. Perhaps Zack Cerza can rule this theory in/out.
In any case, it look...
- 10:09 PM Bug #57165: expected valgrind issues and found none
- /a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197
- 07:48 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
- We discussed this one and the issue is in the following rolling upgrade playbook -
https://github.com/ceph/ceph-a...
- 07:03 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
- Update:
Since all the upgrade suites in Pacific turn the autoscaler off by default, I had to write a new upgrade t...
- 07:00 PM Bug #57267 (New): Valgrind reports memory "Leak_IndirectlyLost" errors on ceph-mon in "KeyServerD...
- /a/yuriw-2022-08-19_20:57:42-rados-wip-yuri6-testing-2022-08-19-0940-pacific-distro-default-smithi/6981517/remote/smi...
- 05:30 PM Backport #56641 (In Progress): quincy: Log at 1 when Throttle::get_or_fail() fails
- https://github.com/ceph/ceph/pull/47765
- 05:17 PM Backport #56642 (In Progress): pacific: Log at 1 when Throttle::get_or_fail() fails
- https://github.com/ceph/ceph/pull/47764
- 03:32 PM Bug #57122 (Resolved): test failure: rados:singleton-nomsgr librados_hello_world
- 03:30 PM Backport #57258 (Resolved): pacific: Assert in Ceph messenger
- https://github.com/ceph/ceph/pull/48255
- 03:30 PM Backport #57257 (Resolved): quincy: Assert in Ceph messenger
- https://github.com/ceph/ceph/pull/47931
- 03:28 PM Bug #55851 (Pending Backport): Assert in Ceph messenger
- 03:12 PM Bug #56147 (Resolved): snapshots will not be deleted after upgrade from nautilus to pacific
- 03:11 PM Backport #56579 (Resolved): pacific: snapshots will not be deleted after upgrade from nautilus to...
- 03:09 PM Backport #56579: pacific: snapshots will not be deleted after upgrade from nautilus to pacific
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47134
merged
- 09:29 AM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
- /a/yuriw-2022-08-22_21:19:34-rados-wip-yuri4-testing-2022-08-18-1020-pacific-distro-default-smithi/6986471
- 07:54 AM Backport #54386 (Resolved): octopus: [RFE] Limit slow request details to mgr log
08/22/2022
- 08:52 PM Backport #57029: pacific: rados/test.sh: Early exit right after LibRados global tests complete
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47451
merged
- 06:24 PM Bug #51168: ceph-osd state machine crash during peering process
- Radoslaw Zarzynski wrote:
> The PG was in @ReplicaActive@ so we shouldn't see any backfill activity. A delayed event...
- 03:50 PM Bug #53000: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_d...
- Quincy PR:
https://jenkins.ceph.com/job/ceph-pull-requests/102036/consoleFull
- 03:31 PM Bug #43268: Restrict admin socket commands more from the Ceph tool
- Radek, I think this was misunderstood. It's a security issue that resulted from exposing all admin socket commands vi...
- 01:24 PM Bug #57152: segfault in librados via libcephsqlite
- Matan Breizman wrote:
> I have managed to reproduce similar segfault.
> The relevant code:
> https://github.com/ce...
- 08:44 AM Bug #57152: segfault in librados via libcephsqlite
- I have managed to reproduce similar segfault.
The relevant code:
https://github.com/ceph/ceph/blob/main/src/SimpleR...
- 09:45 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in these recent pacific runs:
1. https://pulpito.ceph.com/yuriw-2022-08-18_23:16:33-fs-wip-yuri10-testing-202...
08/21/2022
- 06:39 AM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
- Stefan Kooman wrote:
> Is this bug also affecting rbd snapshots / clones?
Yes
08/19/2022
- 11:24 PM Backport #55157: quincy: mon: config commands do not accept whitespace style config name
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47381
merged
- 09:27 PM Backport #57209 (Resolved): quincy: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- https://github.com/ceph/ceph/pull/47932
- 09:27 PM Backport #57208 (In Progress): pacific: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- 09:18 PM Bug #49727 (Pending Backport): lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- /a/yuriw-2022-08-11_16:46:00-rados-wip-yuri3-testing-2022-08-11-0809-pacific-distro-default-smithi/6968195...
- 04:21 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
- Is this bug also affecting rbd snapshots / clones?
- 11:34 AM Backport #55631: pacific: ceph-osd takes all memory before oom on boot
- Off-line fix: https://github.com/ceph/ceph/pull/46252
Online fix: https://github.com/ceph/ceph/pull/47701
- 10:26 AM Backport #55631 (In Progress): pacific: ceph-osd takes all memory before oom on boot
- Unresolving, as the ultimate fix consists of 2 PRs (off-line + on-line trimming).
- 10:28 AM Backport #55632: quincy: ceph-osd takes all memory before oom on boot
- Radoslaw Zarzynski wrote:
> Unresolving, as the ultimate fix consists of 2 PRs, while the 2nd one is under review....
- 10:24 AM Backport #55632 (In Progress): quincy: ceph-osd takes all memory before oom on boot
- Unresolving, as the ultimate fix consists of 2 PRs, while the 2nd one is under review.
- 10:06 AM Backport #55632: quincy: ceph-osd takes all memory before oom on boot
- The on-line fix (performed by the OSD, in contrast to the COT-based off-line one) is here: https://github.com/ceph/ceph/pull/47688.
- 07:48 AM Bug #57190 (New): pg shard status inconsistency in one pg
- ...
- 05:57 AM Backport #55309 (In Progress): pacific: prometheus metrics shows incorrect ceph version for upgra...
- 05:56 AM Backport #55309 (New): pacific: prometheus metrics shows incorrect ceph version for upgraded ceph...
- 04:17 AM Backport #55309: pacific: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
- Reverted backport PR#46429 from pacific (revert PR https://github.com/ceph/ceph/pull/46921) due to tracker https://tr...
- 05:54 AM Backport #55308 (In Progress): pacific: Manager is failing to keep updated metadata in daemon_sta...
- 05:52 AM Backport #55308 (New): pacific: Manager is failing to keep updated metadata in daemon_state for u...
- 04:16 AM Backport #55308: pacific: Manager is failing to keep updated metadata in daemon_state for upgrade...
- We had to revert backport PR#46427 from pacific (revert PR https://github.com/ceph/ceph/pull/46920) due to https://tr...
08/18/2022
- 07:51 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
- 07:50 PM Bug #23117 (Duplicate): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been ex...
- 07:50 PM Bug #57185 (Duplicate): EC 4+2 PG stuck in activating+degraded+remapped
- 07:49 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- This should have been easily caught if we had this implemented:
https://tracker.ceph.com/issues/23117
https://git...
- 07:48 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- Tim - two workarounds. As you can see with your script, sometimes the balancer module keeps changing the PGs' state, so that is not a valid tes...
- 07:43 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- - Here we were testing OSD failure - recovery/backfill for the mClock scheduler, and for this we bring, in phases, one OSD n...
- 07:41 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- We did capture the debug logs (debug_osd = 20 and debug_ms = 1) and they are here - f28-h28-000-r630.rdu2.scalelab.red...
- 07:25 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- From Cluster logs:...
- 05:31 PM Bug #57185 (Duplicate): EC 4+2 PG stuck in activating+degraded+remapped
- - PG Query...
- 04:27 PM Feature #57180 (Rejected): option for pg_autoscaler to retain same state of existing pools when u...
- Currently, any version of Ceph that is >= Pacific will have autoscaler enabled by default even for existing pools.
W... - 03:20 PM Bug #57152: segfault in librados via libcephsqlite
- Patrick Donnelly wrote:
> Matan Breizman wrote:
> > > So the problem is that gcc 8.5, which compiles successfully, ...
- 01:19 PM Bug #57152: segfault in librados via libcephsqlite
- Matan Breizman wrote:
> > So the problem is that gcc 8.5, which compiles successfully, generates code which causes t...
- 01:11 PM Bug #57152: segfault in librados via libcephsqlite
- > So the problem is that gcc 8.5, which compiles successfully, generates code which causes the segfault?
From the ...
- 12:54 PM Bug #57152: segfault in librados via libcephsqlite
- Matan Breizman wrote:
> This PR seems to resolve the compilation errors mentioned above.
> Please let me know your ...
- 12:44 PM Bug #57152: segfault in librados via libcephsqlite
- This PR seems to resolve the compilation errors mentioned above.
Please let me know your thoughts.
- 11:59 AM Bug #57152: segfault in librados via libcephsqlite
- There seems to be an issue with the gcc version used to compile; I noticed a similar issue when compiling `examples/lib...
- 03:18 PM Backport #57030: quincy: rados/test.sh: Early exit right after LibRados global tests complete
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47452
merged
- 03:16 PM Backport #56578 (Resolved): quincy: snapshots will not be deleted after upgrade from nautilus to ...
- 03:15 PM Backport #56578: quincy: snapshots will not be deleted after upgrade from nautilus to pacific
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47133
merged
- 11:44 AM Bug #45721 (Fix Under Review): CommandFailedError: Command failed (workunit test rados/test_pytho...
- 07:48 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973760
- 07:22 AM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
- OSError: Socket is closed
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-defau...
- 06:40 AM Bug #57165: expected valgrind issues and found none
- /a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
/a/yuriw-2...
08/17/2022
- 07:05 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Although there was a report from Telemetry, we still need more logs (read: a recurrence at Sepia) which, hopefully, ...
- 07:00 PM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- It looks like this must be some inconsistency between the head_obc->blocked state, and its presence in objects_blocke...
- 06:56 PM Bug #56661 (Need More Info): Quincy: OSD crashing one after another with data loss with ceph_asse...
- Moving into _Need More Info_ :-( per Myoungwon Oh's comment.
- 06:41 PM Bug #57017 (Fix Under Review): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
- 06:34 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- We've been poking at @read_log_and_missing()@ pretty recently (the dups issue). Does it ring a bell?
- 06:32 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- @Ronen: @debug_verify_stored_missing@ is an input parameter with a default value. Unfortunately, it doesn't look like a ...
- 06:24 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- Even if we don't want to deep dive into it right now, we should refactor the assertion:...
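A minimal sketch of what such a refactor could look like (hypothetical shape, not the actual PGLog code; plain @assert()@ stands in for @ceph_assert@): split the @A || B@ disjunction into named conditions so a failure pinpoints the violated clause.
```cpp
#include <cassert>

// The original asserts `miter == end || entry_matches` as one expression,
// so a failure report does not say which clause was violated.
void check_missing_entry(bool miter_at_end, bool entry_matches_missing) {
  if (miter_at_end)
    return;  // no missing-set entry for this object; nothing to verify
  // Only the second clause remains; asserting it alone gives a precise
  // failure location instead of one opaque disjunction.
  assert(entry_matches_missing);
}

int main() {
  check_missing_entry(true, false);  // fine: iterator at end, nothing checked
  check_missing_entry(false, true);  // fine: the entry matches
  return 0;
}
```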
- 06:19 PM Bug #57147: qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
- Neha Ojha wrote:
> How reproducible is this? Following logs indicate that we ran out of space.
>
We have seen t...
- 02:25 PM Bug #57147: qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
- How reproducible is this? Following logs indicate that we ran out of space....
- 06:04 PM Bug #57152: segfault in librados via libcephsqlite
- The reporter on the ML shared the .mgr pool in question:...
- 01:59 PM Bug #57165 (Resolved): expected valgrind issues and found none
- ...
- 01:42 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- a/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975387
- 01:35 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- ...
- 01:24 PM Bug #57119 (Fix Under Review): Heap command prints with "ceph tell", but not with "ceph daemon"
- 01:12 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- There is another thing we should unify: @daemon@ and @tell@ treat @ss@ (the error stream) differently when the error ...
- 12:48 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- A similar problem may affect e.g. @flush_store_cache@:...
- 12:27 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- Oops, this is the case:...
- 12:25 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- The interface between `outbl` and the formatter is described at the beginning of the function:...
- 12:27 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- Appending the stringstream to the "outbl" bufferlist here makes the output print with the "daemon" version, but it is...
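A hedged sketch of the asymmetry under discussion (hypothetical types, not the actual Ceph admin-socket code): the handler writes the heap stats only to the status stream, so a front end that prints just @outbl@ shows nothing.
```cpp
#include <iostream>
#include <sstream>
#include <string>

// A handler can fill two channels: an output buffer and a status stream.
struct Result {
  std::string outbl;  // stand-in for ceph::bufferlist
  std::string ss;     // stand-in for the status/error stringstream
};

Result handle_heap_command() {
  std::ostringstream ss;
  ss << "tcmalloc heap stats: ...";  // stats land in the status stream only
  return Result{"", ss.str()};
}

int main() {
  Result r = handle_heap_command();
  // A tell-like front end prints both channels, so the stats show up:
  std::cout << "tell:   " << r.outbl << r.ss << "\n";
  // A daemon-like front end prints only outbl, so the stats are lost:
  std::cout << "daemon: " << r.outbl << "\n";
  // Appending ss to outbl in the handler makes both front ends agree.
  return 0;
}
```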
- 12:11 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- ...
- 12:08 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- ...
- 01:21 PM Bug #57163 (Resolved): free(): invalid pointer
- ...
08/16/2022
- 07:28 PM Bug #57152 (Resolved): segfault in librados via libcephsqlite
- We have a post on the ML about a segfault in the mgr:
"[ceph-users] Quincy: Corrupted devicehealth sqlite3 databas... - 06:07 PM Bug #57122 (Fix Under Review): test failure: rados:singleton-nomsgr librados_hello_world
- 06:00 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- The fix is just to change any reference from "master" to "main":...
- 05:51 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- A couple present in this run:
/a/yuriw-2022-08-15_18:43:38-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-sm...
- 07:30 AM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- What surprised me is that we still use GNU Make there.
- 07:30 AM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- Huh, this doesn't look like a compiler failure. It happens earlier:...
- 01:36 PM Bug #57147 (New): qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
- The teuthology link https://pulpito.ceph.com/yuriw-2022-08-11_16:57:01-fs-wip-yuri3-testing-2022-08-11-0809-pacific-d...
- 07:53 AM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- How about adding it to the suppression file, per comment #1?
- 07:50 AM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- Would you like to have a look? Neha confirms the logs mentioned in #9 are still available.
- 07:40 AM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
- NCB was added in Quincy, so I'm breaking the relationship.
- 07:28 AM Bug #57136: ecpool pg stay active+clean+remapped
- Radoslaw Zarzynski wrote:
> It looks like @osd.18@ isn't up. Could you please share the @ceph -s@ and @ceph osd tree@?
<...
- 07:20 AM Bug #57136 (Need More Info): ecpool pg stay active+clean+remapped
- It looks like @osd.18@ isn't up. Could you please share the @ceph -s@ and @ceph osd tree@?
- 05:29 AM Bug #57136 (Need More Info): ecpool pg stay active+clean+remapped
- I created an EC pool; the erasure code profile is:...
- 07:12 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> Wow, I'm quite surprised to see this is taking so much time to be resolved.
>
> Can... - 07:09 AM Bug #53729 (Fix Under Review): ceph-osd takes all memory before oom on boot
- 04:23 AM Backport #56134 (In Progress): quincy: scrub starts message missing in cluster log
08/15/2022
- 03:29 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Attached the debug_osd 20 logs for one of the OSDs. I turned off (deep)scrub because the logs were spammed with scrub error...
- 11:22 AM Backport #51173 (Rejected): nautilus: regression in ceph daemonperf command output, osd columns a...
- Nautilus is EOL
- 11:21 AM Bug #51115 (Resolved): When read failed, ret can not take as data len, in FillInVerifyExtent
- 11:21 AM Backport #51151 (Rejected): nautilus: When read failed, ret can not take as data len, in FillInVe...
- Nautilus is EOL
- 11:21 AM Bug #50978 (Resolved): unaligned access to member variables of crush_work_bucket
- 11:21 AM Backport #50985 (Rejected): nautilus: unaligned access to member variables of crush_work_bucket
- Nautilus is EOL
- 11:19 AM Bug #50763 (Resolved): osd: write_trunc omitted to clear data digest
- 11:19 AM Backport #50789 (Rejected): nautilus: osd: write_trunc omitted to clear data digest
- Nautilus is EOL
- 11:17 AM Bug #50745 (Resolved): max_misplaced was replaced by target_max_misplaced_ratio
- 11:17 AM Backport #50749 (Rejected): nautilus: max_misplaced was replaced by target_max_misplaced_ratio
- Nautilus is EOL
- 08:19 AM Bug #56386 (Can't reproduce): Writes to a cephfs after metadata pool snapshot causes inconsistent...
08/14/2022
- 08:59 AM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- Possibly related to the move to C++20? The makefile specifies C++11.
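A minimal illustration of that theory (hypothetical snippet, not the librados hello_world example itself): once a public header adopts a C++20 construct, a consumer whose Makefile still passes @-std=c++11@ fails to build it.
```cpp
// A build with "-std=c++11" (as an old Makefile might specify) fails
// here, while a "-std=c++20" build compiles cleanly.
#if __cplusplus < 202002L
#  error "this code now requires -std=c++20"
#endif

// e.g. a C++20 concept appearing in a public header:
template <typename T>
concept Encodable = requires(T t) { t.encode(); };

int main() { return 0; }
```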
- 08:58 AM Bug #57122 (Resolved): test failure: rados:singleton-nomsgr librados_hello_world
- e.g. rfriedma-2022-08-13_08:48:40-rados:singleton-nomsgr-wip-rf-like-main-distro-default-smithi/6970569
(the branc...
08/12/2022
- 08:18 PM Bug #57119 (Resolved): Heap command prints with "ceph tell", but not with "ceph daemon"
- *How to reproduce:*
# Start a vstart cluster, or access any working cluster
# Run `ceph tell <daemon>.<id> heap <he... - 02:21 PM Backport #57117 (Resolved): quincy: mon: race condition between `mgr fail` and MgrMonitor::prepar...
- https://github.com/ceph/ceph/pull/50979
- 02:17 PM Bug #55711 (Pending Backport): mon: race condition between `mgr fail` and MgrMonitor::prepare_bea...
- 01:55 PM Cleanup #56581 (Resolved): mon: fix ElectionLogic warnings
08/11/2022
- 11:54 PM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
- @Adam maybe you'd have an idea of what's going on here?
- 11:54 PM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
- This one went dead after a while:
/a/yuriw-2022-08-04_20:43:31-rados-wip-yuri6-testing-2022-08-04-0617-pacific-distro...
- 06:57 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
- I thought this may have been because I re-used the name, so I went to create a pool with a different name to conti...
- 06:54 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
- Looks like one of the placement groups is "inactive":...
- 06:46 PM Bug #57105 (Resolved): quincy: ceph osd pool set <pool> size math error
- Context: I created a pool with a block device and intentionally filled a set of OSDs.
This of course broke things,...
- 04:07 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- I won't be able to rerun the patched branch until Monday. Haven't you been able to reproduce it? Feels trivial to so ...
- 05:01 AM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alex, can you share logs of the OSD that caused the 500s? My theory is a peering pglog mismatch, since one OSD (with the...
- 03:56 PM Bug #57074 (Duplicate): common: Latest version of main experiences build failures
- 02:16 PM Bug #57097 (Fix Under Review): ceph status does not report an application is not enabled on the p...
- 12:36 PM Bug #57097 (Pending Backport): ceph status does not report an application is not enabled on the p...
- If a pool has 0 objects in it, then ceph status (and ceph health detail) does not report that an application is not enabled for ...
- 12:30 PM Bug #57049 (Fix Under Review): cluster logging does not adhere to mon_cluster_log_file_level
- 10:25 AM Bug #52807 (Resolved): ceph-erasure-code-tool: new tool to encode/decode files
- 10:24 AM Backport #52808 (Rejected): nautilus: ceph-erasure-code-tool: new tool to encode/decode files
- Nautilus is EOL
- 10:24 AM Bug #52448 (Resolved): osd: pg may get stuck in backfill_toofull after backfill is interrupted du...
- 10:24 AM Backport #52832 (Rejected): nautilus: osd: pg may get stuck in backfill_toofull after backfill is...
- 10:24 AM Backport #52832 (Resolved): nautilus: osd: pg may get stuck in backfill_toofull after backfill is...
- Nautilus is EOL
- 10:23 AM Backport #52938 (Rejected): nautilus: Primary OSD crash caused corrupted object and further crash...
- Nautilus is EOL
- 10:21 AM Backport #52771 (Rejected): nautilus: pg scrub stat mismatch with special objects that have hash ...
- Nautilus is EOL
- 10:20 AM Bug #42742 (Resolved): "failing miserably..." in Infiniband.cc
- 10:20 AM Backport #42848 (Rejected): nautilus: "failing miserably..." in Infiniband.cc
- Nautilus is EOL
- 10:20 AM Bug #43656 (Resolved): AssertionError: not all PGs are active or peered 15 seconds after marking ...
- 10:20 AM Backport #43776 (Rejected): nautilus: AssertionError: not all PGs are active or peered 15 seconds...
- 10:19 AM Backport #43776 (Resolved): nautilus: AssertionError: not all PGs are active or peered 15 seconds...
- Nautilus is EOL
08/10/2022
- 02:35 PM Bug #57074: common: Latest version of main experiences build failures
- @Kefu the main issue seems to be that install-deps is broken for CentOS 8 Stream; currently, it halts when trying to ...
- 06:13 AM Bug #57074: common: Latest version of main experiences build failures
- Probably another thing I can do is to have CMake error out if a C++ compiler not compliant with the C++20 standard i...
- 06:11 AM Bug #57074: common: Latest version of main experiences build failures
- Laura, I am sorry for the inconvenience.
This is expected if the stock gcc compiler on an aged (even not ancient) ...
- 05:17 AM Bug #57074: common: Latest version of main experiences build failures
- I have encountered these compilation errors on Ubuntu 20.04. Basically, you need gcc version >= 10.1. Using install-de...
- 02:03 PM Fix #56709 (Resolved): test/osd/TestPGLog: Fix confusing description between log and olog.
- 10:44 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in these pacific runs
1. https://pulpito.ceph.com/yuriw-2022-08-04_20:54:08-fs-wip-yuri6-testing-2022-08-04-061...
- 07:15 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in https://pulpito.ceph.com/yuriw-2022-08-09_15:36:21-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default...
- 07:11 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in https://pulpito.ceph.com/yuriw-2022-08-04_11:54:20-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default...
- 08:48 AM Bug #43813 (Resolved): objecter doesn't send osd_op
- 08:48 AM Backport #43992 (Rejected): nautilus: objecter doesn't send osd_op
- Nautilus is EOL
- 08:48 AM Bug #52486 (Closed): test tracker: please ignore
- 08:47 AM Backport #52498 (Rejected): nautilus: test tracker: please ignore
- 08:47 AM Backport #52497 (Rejected): octopus: test tracker: please ignore
- 08:47 AM Backport #52495 (Rejected): pacific: test tracker: please ignore
- 07:54 AM Bug #52509 (Can't reproduce): PG merge: PG stuck in premerge+peered state
- We have never experienced this problem again.
- 05:56 AM Backport #57076 (In Progress): pacific: Invalid read of size 8 in handle_recovery_delete()