Activity

From 08/16/2022 to 09/14/2022

09/14/2022

09:18 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
Quincy revert PR https://github.com/ceph/ceph/pull/48104
Not sure if we want to put this as a "fix".
Laura Flores
09:03 PM Bug #57546 (Fix Under Review): rados/thrash-erasure-code: wait_for_recovery timeout due to "activ...
When testing the Quincy RC for 17.2.4, we discovered this failure:
Description: rados/thrash-erasure-code/{ceph cl...
Laura Flores
07:24 PM Backport #57545 (Resolved): quincy: CommandFailedError: Command failed (workunit test rados/test_...
https://github.com/ceph/ceph/pull/48113 Backport Bot
07:24 PM Backport #57544 (Resolved): pacific: CommandFailedError: Command failed (workunit test rados/test...
https://github.com/ceph/ceph/pull/48112 Backport Bot
07:16 PM Bug #45721 (Pending Backport): CommandFailedError: Command failed (workunit test rados/test_pytho...
Neha Ojha
03:01 PM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
/a/yuriw-2022-09-10_14:05:53-rados-quincy-release-distro-default-smithi/7024401... Laura Flores
09:44 AM Bug #49524 (In Progress): ceph_test_rados_delete_pools_parallel didn't start
Nitzan Mordechai
09:44 AM Bug #49524: ceph_test_rados_delete_pools_parallel didn't start
My theory is that fork failed, which caused all the tests not to run; this is the only place we won't get any printing... Nitzan Mordechai
09:35 AM Bug #45702 (Fix Under Review): PGLog::read_log_and_missing: ceph_assert(miter == missing.get_item...
Nitzan Mordechai
07:10 AM Bug #57533 (Resolved): Able to modify the mclock reservation, weight and limit parameters when bu...

[ceph: root@magna086 /]# ceph config get osd osd_mclock_scheduler_client_res
1
[ceph: root@magna086 /]# ceph conf...
Srinivasa Bharath Kanta
06:27 AM Bug #57532: Notice discrepancies in the performance of mclock built-in profiles
From the following data, I noticed that -
1. In the case-1, for all profiles the IO reservations for high_clien...
Srinivasa Bharath Kanta
06:23 AM Bug #57532 (Duplicate): Notice discrepancies in the performance of mclock built-in profiles
Downstream BZ- https://bugzilla.redhat.com/show_bug.cgi?id=2126274 Srinivasa Bharath Kanta

09/13/2022

09:16 PM Bug #57529 (Resolved): mclock backfill is getting higher priority than WPQ
Downstream BZ - https://bugzilla.redhat.com/show_bug.cgi?id=2126559
Version - 17.2.1
Vikhyat Umrao

09/12/2022

11:39 PM Bug #48840 (Closed): Octopus: Assert failure: test_ceph_osd_pool_create_utf8
Closing, as this was only reported in Octopus, which is EOL. Laura Flores
08:15 PM Bug #57310: StriperTest: The futex facility returned an unexpected error code
@Radek yes, thanks, this issue is still ongoing. Laura Flores
07:13 PM Bug #57310: StriperTest: The futex facility returned an unexpected error code
According to "Patrick's comment in PR #47841":https://github.com/ceph/ceph/pull/47841 it doesn't address the prob... Radoslaw Zarzynski
06:57 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Let's move back to it next week. Radoslaw Zarzynski
06:54 PM Bug #57467 (Resolved): EncodingException.Macros fails on make check on quincy
Radoslaw Zarzynski
06:51 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
Would be great to have a replicator for this. Let's see whether we can make a standalone test exercising the sequence... Radoslaw Zarzynski
06:33 PM Bug #43268: Restrict admin socket commands more from the Ceph tool
Tagging as medium-hanging-fruit as, IIUC, we would need to:
0. (only if necessary): introduce a config variable to...
Radoslaw Zarzynski
02:34 PM Feature #53050 (Resolved): Support blocklisting a CIDR range
Greg Farnum
02:34 PM Backport #55747 (Resolved): pacific: Support blocklisting a CIDR range
Greg Farnum
02:14 PM Bug #53729 (Pending Backport): ceph-osd takes all memory before oom on boot
Konstantin Shalygin
02:14 PM Backport #55631: pacific: ceph-osd takes all memory before oom on boot
https://github.com/ceph/ceph/pull/47701 Konstantin Shalygin
11:07 AM Bug #49524: ceph_test_rados_delete_pools_parallel didn't start
The output will be flushed only after the process completes; in the case of ceph_test_rados_delete_pools_parallel, ... Nitzan Mordechai
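As a general illustration of the buffering behaviour this comment describes (not the actual ceph_test_rados_delete_pools_parallel code), the sketch below runs a hypothetical child process whose stdout is a pipe. The child prints one line and then exits through `os._exit()`, which skips the normal stdio flush, much as a crash or kill would. The `CHILD` script and the `run_child` helper are assumptions made for this example.

```python
import os
import subprocess
import sys

# Hypothetical child program: prints one line, then exits via os._exit(),
# which skips the normal stdio flush (similar in effect to a crash or kill).
CHILD = """
import os, sys
print("started", flush={flush})
os._exit(1)
"""


def run_child(flush: bool) -> str:
    """Run the child with stdout attached to a pipe and return what arrived."""
    # Drop PYTHONUNBUFFERED so the child uses its default (buffered) stdout.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(flush=flush)],
        stdout=subprocess.PIPE,
        text=True,
        env=env,
    )
    return result.stdout


if __name__ == "__main__":
    print(repr(run_child(flush=False)))  # buffered line never reaches the pipe
    print(repr(run_child(flush=True)))   # explicit flush makes it visible
```

When the output is not flushed explicitly, a process that ends abnormally can leave no trace in the log, which is consistent with a test that appears not to have started.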
06:53 AM Bug #54558: malformed json in a Ceph RESTful API call can stop all ceph-mon services
Ilya Dryomov wrote:
> I don't think https://github.com/ceph/ceph/pull/45547 is a complete fix, see my comment in the...
nikhil kshirsagar

09/11/2022

03:42 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
Just hit this in a v15.2.15 cluster too. Michel, which version does your cluster run? Dan van der Ster
05:05 AM Backport #57496 (In Progress): quincy: Invalid read of size 8 in handle_recovery_delete()
Nitzan Mordechai
05:03 AM Backport #57496 (Resolved): quincy: Invalid read of size 8 in handle_recovery_delete()
https://github.com/ceph/ceph/pull/48039 Nitzan Mordechai

09/09/2022

06:50 PM Backport #57257: quincy: Assert in Ceph messenger
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47931
merged
Yuri Weinstein
04:59 PM Bug #49524: ceph_test_rados_delete_pools_parallel didn't start
Nitzan, can you please take a look at this issue? It seems intermittent, but still exists. Neha Ojha
04:44 PM Bug #49524: ceph_test_rados_delete_pools_parallel didn't start
These failures are diagnosed by noting the failed pid (in this case 59576), and backtracking to see which test it was... Laura Flores
04:09 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-09-05_13:59:13-rados-wip-yuri10-testing-2022-09-04-0811-quincy-distro-default-smithi/7012481
Needs a...
Laura Flores

09/08/2022

04:38 PM Backport #57209: quincy: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47932
merged
Yuri Weinstein
04:37 PM Bug #57467 (Fix Under Review): EncodingException.Macros fails on make check on quincy
Laura Flores
11:18 AM Bug #57467: EncodingException.Macros fails on make check on quincy
Since this has probably fallen off of Kefu's radar, I went ahead and opened https://github.com/ceph/ceph/pull/48016. Ilya Dryomov
03:26 PM Documentation #57448 (Resolved): Doc: Update release notes on the fix for high CPU usage during r...
Neha Ojha
03:26 PM Backport #57461 (Resolved): quincy: Doc: Update release notes on the fix for high CPU usage durin...
Neha Ojha

09/07/2022

10:09 PM Bug #57467: EncodingException.Macros fails on make check on quincy
There was an attempt to fix this issue here: https://github.com/ceph/ceph/pull/47938 Laura Flores
10:07 PM Bug #57467 (Resolved): EncodingException.Macros fails on make check on quincy
irvingi07: https://jenkins.ceph.com/job/ceph-pull-requests/103416... Laura Flores
06:12 PM Bug #54558: malformed json in a Ceph RESTful API call can stop all ceph-mon services
I don't think https://github.com/ceph/ceph/pull/45547 is a complete fix, see my comment in the PR. Ilya Dryomov
03:01 PM Bug #55233: librados C++ API requires C++17 to build
https://github.com/ceph/ceph/pull/46005 merged Yuri Weinstein
02:57 PM Backport #56736: quincy: unessesarily long laggy PG state
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47901
merged
Yuri Weinstein
02:55 PM Backport #55297: quincy: malformed json in a Ceph RESTful API call can stop all ceph-mon services
nikhil kshirsagar wrote:
> please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/...
Yuri Weinstein
02:12 PM Backport #57461 (In Progress): quincy: Doc: Update release notes on the fix for high CPU usage du...
Sridhar Seshasayee
02:03 PM Backport #57461 (Resolved): quincy: Doc: Update release notes on the fix for high CPU usage durin...
https://github.com/ceph/ceph/pull/48004 Backport Bot
12:47 PM Bug #46847: Loss of placement information on OSD reboot
The PR https://github.com/ceph/ceph/pull/40849 for adding the test was marked stale. I left a comment and it would be... Frank Schilder
12:10 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Took a look at why peering was happening in the first place. Looking at PG 7.16 logs below, we can see that the balan... Aishwarya Mathuria
05:25 AM Backport #57346: quincy: expected valgrind issues and found none
/a/yuriw-2022-09-03_14:52:22-rados-wip-yuri-testing-2022-09-02-0945-quincy-distro-default-smithi/7009611 Nitzan Mordechai
02:57 AM Bug #42884: OSDMapTest.CleanPGUpmaps failure
adami03: https://jenkins.ceph.com/job/ceph-pull-requests/103395/console Laura Flores

09/06/2022

08:45 PM Backport #55309: pacific: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47693
merged
Yuri Weinstein
08:42 PM Backport #55305: quincy: Manager is failing to keep updated metadata in daemon_state for upgraded...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46559
merged
Yuri Weinstein
05:57 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
/a/yuriw-2022-08-22_16:21:19-rados-wip-yuri8-testing-2022-08-22-0646-distro-default-smithi/6985175 Laura Flores
03:19 PM Documentation #57448 (Resolved): Doc: Update release notes on the fix for high CPU usage during r...
Sridhar Seshasayee
02:15 PM Backport #57312 (Resolved): quincy: Heap command prints with "ceph tell", but not with "ceph daemon"
Laura Flores
12:59 PM Backport #55156 (Resolved): pacific: mon: config commands do not accept whitespace style config name
Kefu Chai
12:29 PM Backport #55308 (Resolved): pacific: Manager is failing to keep updated metadata in daemon_state ...
Kefu Chai
05:56 AM Backport #57443 (In Progress): quincy: osd: Update osd's IOPS capacity using async Context comple...
Sridhar Seshasayee
05:09 AM Backport #57443 (Resolved): quincy: osd: Update osd's IOPS capacity using async Context completio...
https://github.com/ceph/ceph/pull/47983 Backport Bot
04:42 AM Fix #57040 (Pending Backport): osd: Update osd's IOPS capacity using async Context completion ins...
Sridhar Seshasayee

09/05/2022

02:10 PM Backport #56641: quincy: Log at 1 when Throttle::get_or_fail() fails
Radoslaw Zarzynski wrote:
> https://github.com/ceph/ceph/pull/47765
merged
Yuri Weinstein
02:02 PM Backport #57372: quincy: segfault in librados via libcephsqlite
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47909
merged
Yuri Weinstein

09/04/2022

02:22 PM Bug #53000: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_d...
... Kefu Chai
08:08 AM Backport #57346: quincy: expected valgrind issues and found none
/a/yuriw-2022-09-02_15:23:14-rados-wip-yuri6-testing-2022-09-01-1034-quincy-distro-default-smithi/7008140/
/a/yuri...
Matan Breizman

09/02/2022

05:38 PM Backport #57117: quincy: mon: race condition between `mgr fail` and MgrMonitor::prepare_beacon()
Should be backported with https://github.com/ceph/ceph/pull/47834. Radoslaw Zarzynski
05:08 PM Backport #57346 (In Progress): quincy: expected valgrind issues and found none
Radoslaw Zarzynski
05:06 PM Backport #57209 (In Progress): quincy: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
Radoslaw Zarzynski
04:59 PM Backport #55972 (Resolved): quincy: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10...
Radoslaw Zarzynski
04:59 PM Backport #55972: quincy: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e...
Already in quincy. See https://github.com/ceph/ceph/pull/46498. Radoslaw Zarzynski
04:49 PM Backport #57257 (In Progress): quincy: Assert in Ceph messenger
Radoslaw Zarzynski
04:48 PM Backport #56723 (In Progress): quincy: osd thread deadlock
Radoslaw Zarzynski
04:47 PM Backport #56655 (In Progress): quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierF...
Radoslaw Zarzynski
04:46 PM Backport #56602 (In Progress): quincy: ceph report missing osdmap_clean_epochs if answered by peon
Radoslaw Zarzynski
04:34 PM Backport #55543 (In Progress): quincy: should use TCMalloc for better performance
Radoslaw Zarzynski
04:34 PM Backport #55282 (In Progress): quincy: osd: add scrub duration for scrubs after recovery
Radoslaw Zarzynski
04:31 PM Backport #56648 (In Progress): quincy: [Progress] Do not show NEW PG_NUM value for pool if autosc...
Radoslaw Zarzynski
04:18 PM Backport #57312: quincy: Heap command prints with "ceph tell", but not with "ceph daemon"
Laura Flores wrote:
> https://github.com/ceph/ceph/pull/47825
merged
Yuri Weinstein
02:20 PM Bug #54172 (Resolved): ceph version 16.2.7 PG scrubs not progressing
Konstantin Shalygin
02:19 PM Backport #56409 (Resolved): pacific: ceph version 16.2.7 PG scrubs not progressing
Konstantin Shalygin
02:12 PM Feature #54600 (Resolved): Add scrub_duration to pg dump json format
Radoslaw Zarzynski
02:12 PM Backport #54602 (Duplicate): quincy: Add scrub_duration to pg dump json format
Radoslaw Zarzynski
02:10 PM Backport #54601 (Resolved): quincy: Add scrub_duration to pg dump json format
Radoslaw Zarzynski
02:10 PM Backport #55065 (Rejected): quincy: osd_fast_shutdown_notify_mon option should be true by default
Radoslaw Zarzynski
01:59 PM Backport #56551 (Resolved): quincy: mon/Elector: notify_ranked_removed() does not properly erase ...
Radoslaw Zarzynski
01:53 PM Backport #57030 (Resolved): quincy: rados/test.sh: Early exit right after LibRados global tests c...
Radoslaw Zarzynski
01:45 PM Backport #57289 (Rejected): quincy: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Parallel...
This backport ticket is a result of a thinko. Rejecting. Radoslaw Zarzynski
01:44 PM Backport #57288 (Rejected): pacific: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Paralle...
https://github.com/ceph/ceph/pull/45582 is NOT the fix. This backport ticket is a result of a thinko. Rejecting. Radoslaw Zarzynski
01:40 PM Bug #53000 (New): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_...
Sorry, moving back to @New@. Radoslaw Zarzynski
01:31 PM Bug #53740 (Resolved): mon: all mon daemon always crash after rm pool
No need for backporting to quincy – the fix is already there (see the comment in the backport ticket). Resolving. Radoslaw Zarzynski
01:30 PM Backport #53977 (Rejected): quincy: mon: all mon daemon always crash after rm pool
Radoslaw Zarzynski
01:29 PM Backport #53977: quincy: mon: all mon daemon always crash after rm pool
The fix is already in quincy:... Radoslaw Zarzynski
01:02 PM Backport #56408 (Resolved): quincy: ceph version 16.2.7 PG scrubs not progressing
Radoslaw Zarzynski
01:01 PM Backport #55157 (Resolved): quincy: mon: config commands do not accept whitespace style config name
Radoslaw Zarzynski
12:55 PM Backport #55632 (Resolved): quincy: ceph-osd takes all memory before oom on boot
The last missing part (the online dups trimming) is merged. Radoslaw Zarzynski
02:47 AM Bug #57119 (Pending Backport): Heap command prints with "ceph tell", but not with "ceph daemon"
Yaarit Hatuka
02:39 AM Bug #57165: expected valgrind issues and found none
Quincy runs:
https://pulpito.ceph.com/yuriw-2022-09-01_16:26:28-rados-wip-lflores-testing-2-2022-08-26-2240-quincy...
Yaarit Hatuka

09/01/2022

11:03 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
https://github.com/ceph/ceph/pull/47650 merged Yuri Weinstein
04:13 PM Backport #57372 (In Progress): quincy: segfault in librados via libcephsqlite
Matan Breizman
04:05 PM Backport #57372 (Resolved): quincy: segfault in librados via libcephsqlite
https://github.com/ceph/ceph/pull/47909 Backport Bot
04:05 PM Backport #57373 (Resolved): pacific: segfault in librados via libcephsqlite
https://github.com/ceph/ceph/pull/48187 Backport Bot
04:00 PM Bug #57152 (Pending Backport): segfault in librados via libcephsqlite
Matan Breizman
01:39 PM Backport #56736 (In Progress): quincy: unessesarily long laggy PG state
Aishwarya Mathuria
01:35 PM Bug #57163 (Fix Under Review): free(): invalid pointer
>Many thanks to Josh for suggesting we may be dealing with a compiler mismatch here and sorry if you were working on ... Matan Breizman
12:30 PM Bug #57163: free(): invalid pointer
/a/yuriw-2022-09-01_00:21:36-rados-wip-yuri7-testing-2022-08-31-0841-distro-default-smithi/7003413 Nitzan Mordechai
01:24 PM Backport #56734 (In Progress): pacific: unessesarily long laggy PG state
Aishwarya Mathuria
10:36 AM Bug #49231: MONs unresponsive over extended periods of time
OK, I did some more work and it looks like I can trigger the issue with some certainty by failing an MDS that was up ... Frank Schilder

08/31/2022

10:52 PM Bug #57163: free(): invalid pointer
Many thanks to Josh for suggesting we may be dealing with a compiler mismatch here and sorry if you were working on t... Brad Hubbard
10:21 AM Bug #51194: PG recovery_unfound after scrub repair failed on primary
Hi,
We suffered exactly the same problem at IJCLab: a flappy OSD (with unmonitored smartd preventive errors) cause...
Michel Jouvin
09:55 AM Backport #57346 (Resolved): quincy: expected valgrind issues and found none
https://github.com/ceph/ceph/pull/47933 Backport Bot
09:55 AM Bug #57165 (Pending Backport): expected valgrind issues and found none
Kefu Chai

08/30/2022

12:56 PM Bug #57340 (Fix Under Review): ceph log last command fail to log by verbosity level
Prashant D
12:34 PM Bug #57340 (Resolved): ceph log last command fail to log by verbosity level
We see debug logs even if we intend to get cluster log at log level WARN.... Prashant D
12:27 PM Bug #57310: StriperTest: The futex facility returned an unexpected error code
Looks like https://github.com/ceph/ceph/pull/47841 will fix it Nitzan Mordechai
10:38 AM Backport #55633: octopus: ceph-osd takes all memory before oom on boot
The original patch has been reverted by https://github.com/ceph/ceph/pull/46611
Hence the issue shouldn't be in Res...
Igor Fedotov
07:02 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
I have been going through the failure logs mentioned above and I see that the health check does pass eventually:
...
Aishwarya Mathuria

08/29/2022

04:00 PM Bug #56850 (Resolved): crash: void PaxosService::propose_pending(): assert(have_pending)
Kefu Chai

08/28/2022

06:30 AM Backport #57316 (In Progress): quincy: add an asok command for pg log investigations
Nitzan Mordechai
06:27 AM Backport #57315 (In Progress): pacific: add an asok command for pg log investigations
Nitzan Mordechai

08/27/2022

04:03 PM Bug #56847 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
#56850 Kefu Chai
04:02 PM Bug #56848 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
#56850 Kefu Chai
04:02 PM Bug #56849 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
#56850 Kefu Chai
04:00 PM Bug #56850 (Fix Under Review): crash: void PaxosService::propose_pending(): assert(have_pending)
Kefu Chai
06:44 AM Backport #57316 (Resolved): quincy: add an asok command for pg log investigations
https://github.com/ceph/ceph/pull/47840 Backport Bot
06:44 AM Backport #57315 (Rejected): pacific: add an asok command for pg log investigations
https://github.com/ceph/ceph/pull/47839 Backport Bot
06:39 AM Bug #55836 (Pending Backport): add an asok command for pg log investigations
Konstantin Shalygin

08/26/2022

10:27 PM Backport #57312 (In Progress): quincy: Heap command prints with "ceph tell", but not with "ceph d...
https://github.com/ceph/ceph/pull/47825 Laura Flores
10:18 PM Backport #57312 (Resolved): quincy: Heap command prints with "ceph tell", but not with "ceph daemon"
Laura Flores
10:19 PM Bug #57119 (Fix Under Review): Heap command prints with "ceph tell", but not with "ceph daemon"
Laura Flores
10:13 PM Bug #57119 (Pending Backport): Heap command prints with "ceph tell", but not with "ceph daemon"
Setting to "Pending backport" briefly so I can create backport trackers. Laura Flores
10:18 PM Backport #57313 (Resolved): pacific: Heap command prints with "ceph tell", but not with "ceph dae...
https://github.com/ceph/ceph/pull/48106 Laura Flores
06:54 PM Bug #57310: StriperTest: The futex facility returned an unexpected error code
Laura Flores wrote:
> /a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/69...
Laura Flores
06:54 PM Bug #57310 (Resolved): StriperTest: The futex facility returned an unexpected error code
/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986262
/a/yuriw-2022-08...
Laura Flores
04:45 PM Backport #57076 (Resolved): pacific: Invalid read of size 8 in handle_recovery_delete()
Kefu Chai
05:18 AM Bug #57163: free(): invalid pointer
I did an interactive rerun of /a/lflores-2022-08-17_21:04:23-rados:singleton-nomsgr-wip-yuri4-testing-2022-08-15-0951... Brad Hubbard

08/25/2022

10:27 PM Bug #57163: free(): invalid pointer
/a/lflores-2022-08-17_21:04:23-rados:singleton-nomsgr-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6977853... Laura Flores
07:41 PM Bug #57165: expected valgrind issues and found none
/a/yuriw-2022-08-17_19:34:54-rados-wip-yuri7-testing-2022-08-17-0943-quincy-distro-default-smithi/6977767 Laura Flores
11:25 AM Bug #57165 (Fix Under Review): expected valgrind issues and found none
Nitzan Mordechai
11:16 AM Bug #57165: expected valgrind issues and found none
This is a memory optimization "fault": the new gcc no longer leaks the memory that we are trying to leak. Nitzan Mordechai
10:12 AM Bug #57165 (In Progress): expected valgrind issues and found none
Nitzan Mordechai
10:12 AM Bug #57165: expected valgrind issues and found none
We are leaking memory with "ceph tell mon.a leak_some_memory", but for some reason we are not seeing any memory leak in v... Nitzan Mordechai
06:48 AM Bug #53000: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_d...
I don't understand why PR ID for this bug was set to 41323. I mentioned the PR 41323 not as the fix but as an example... Mykola Golub

08/24/2022

06:31 PM Backport #57288 (Resolved): pacific: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Paralle...
https://github.com/ceph/ceph/pull/45582 Radoslaw Zarzynski
06:30 PM Backport #57288 (Rejected): pacific: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Paralle...
Backport Bot
06:30 PM Backport #57289 (Rejected): quincy: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Parallel...
Backport Bot
06:29 PM Bug #53000 (Pending Backport): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMap...
Radoslaw Zarzynski
06:27 PM Bug #53000 (Fix Under Review): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMap...
Radoslaw Zarzynski
06:24 PM Bug #51168 (New): ceph-osd state machine crash during peering process
I plan to work on this one and combine with implementing the backfill cancellation in crimson. However, not a terribl... Radoslaw Zarzynski
06:16 PM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
I think the recurrences are about a failure in a different place – at least in the latest one the @LibRadosServicePP... Radoslaw Zarzynski
06:06 PM Bug #57165: expected valgrind issues and found none
Bumped the priority up as I'm afraid the longer we wait with ensuring valgrind is fully operational, the greater is t... Radoslaw Zarzynski
12:19 AM Bug #57165: expected valgrind issues and found none
What I'm seeing is that the jobs in question were told to expect valgrind errors via the @expect_valgrind_errors: tru... Zack Cerza
05:50 PM Bug #57163: free(): invalid pointer
How about having it as _High_? Radoslaw Zarzynski
04:15 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
Downstream ceph-ansible BZ - https://bugzilla.redhat.com/show_bug.cgi?id=2121097 Vikhyat Umrao
03:50 PM Feature #57180 (Rejected): option for pg_autoscaler to retain same state of existing pools when u...
Vikhyat Umrao
12:22 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Alexandre Marangone wrote:
> Attached the debug_osd 20 logs for one of the OSDs. I turned off (deep)scrub because the logs...
Nitzan Mordechai

08/23/2022

11:27 PM Bug #57163: free(): invalid pointer
Maybe "urgent" is too dramatic, but this seems to be affecting a lot of tests in main. Laura Flores
10:50 PM Bug #57163: free(): invalid pointer
/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986255... Laura Flores
04:19 PM Bug #57163: free(): invalid pointer
Local Run with -fsanitize=address warns about a data race at the same stage, may be relevant.... Matan Breizman
03:53 PM Bug #57163: free(): invalid pointer
Kefu Chai wrote:
> /a/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883...
Laura Flores
03:33 PM Bug #57163: free(): invalid pointer
This failure (as of right now) only occurs on Ubuntu 20.04. See https://github.com/ceph/ceph/pull/47642 for some exam... Laura Flores
03:11 PM Bug #57163: free(): invalid pointer
/a/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883/teuthology.log Kefu Chai
10:13 PM Bug #57165: expected valgrind issues and found none
To me, this seems like a Teuthology failure. Perhaps Zack Cerza can rule this theory in/out.
In any case, it look...
Laura Flores
10:09 PM Bug #57165: expected valgrind issues and found none
/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197 Laura Flores
07:48 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
We discussed this one and the issue is in the following rolling upgrade playbook -
https://github.com/ceph/ceph-a...
Vikhyat Umrao
07:03 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
Update:
Since all the upgrade suites in Pacific turn the autoscaler off by default, I had to write a new upgrade t...
Kamoltat (Junior) Sirivadhna
07:00 PM Bug #57267 (New): Valgrind reports memory "Leak_IndirectlyLost" errors on ceph-mon in "KeyServerD...
/a/yuriw-2022-08-19_20:57:42-rados-wip-yuri6-testing-2022-08-19-0940-pacific-distro-default-smithi/6981517/remote/smi... Laura Flores
05:30 PM Backport #56641 (In Progress): quincy: Log at 1 when Throttle::get_or_fail() fails
https://github.com/ceph/ceph/pull/47765 Radoslaw Zarzynski
05:17 PM Backport #56642 (In Progress): pacific: Log at 1 when Throttle::get_or_fail() fails
https://github.com/ceph/ceph/pull/47764 Radoslaw Zarzynski
03:32 PM Bug #57122 (Resolved): test failure: rados:singleton-nomsgr librados_hello_world
Laura Flores
03:30 PM Backport #57258 (Resolved): pacific: Assert in Ceph messenger
https://github.com/ceph/ceph/pull/48255 Backport Bot
03:30 PM Backport #57257 (Resolved): quincy: Assert in Ceph messenger
https://github.com/ceph/ceph/pull/47931 Backport Bot
03:28 PM Bug #55851 (Pending Backport): Assert in Ceph messenger
Kefu Chai
03:12 PM Bug #56147 (Resolved): snapshots will not be deleted after upgrade from nautilus to pacific
Matan Breizman
03:11 PM Backport #56579 (Resolved): pacific: snapshots will not be deleted after upgrade from nautilus to...
Matan Breizman
03:09 PM Backport #56579: pacific: snapshots will not be deleted after upgrade from nautilus to pacific
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47134
merged
Yuri Weinstein
09:29 AM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
/a/yuriw-2022-08-22_21:19:34-rados-wip-yuri4-testing-2022-08-18-1020-pacific-distro-default-smithi/6986471
Matan Breizman
07:54 AM Backport #54386 (Resolved): octopus: [RFE] Limit slow request details to mgr log
Prashant D

08/22/2022

08:52 PM Backport #57029: pacific: rados/test.sh: Early exit right after LibRados global tests complete
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47451
merged
Yuri Weinstein
06:24 PM Bug #51168: ceph-osd state machine crash during peering process
Radoslaw Zarzynski wrote:
> The PG was in @ReplicaActive@ so we shouldn't see any backfill activity. A delayed event...
Yao Ning
03:50 PM Bug #53000: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_d...
Quincy PR:
https://jenkins.ceph.com/job/ceph-pull-requests/102036/consoleFull
Laura Flores
03:31 PM Bug #43268: Restrict admin socket commands more from the Ceph tool
Radek, I think this was misunderstood. It's a security issue that resulted from exposing all admin socket commands vi... Greg Farnum
01:24 PM Bug #57152: segfault in librados via libcephsqlite
Matan Breizman wrote:
> I have managed to reproduce similar segfault.
> The relevant code:
> https://github.com/ce...
Patrick Donnelly
08:44 AM Bug #57152: segfault in librados via libcephsqlite
I have managed to reproduce similar segfault.
The relevant code:
https://github.com/ceph/ceph/blob/main/src/SimpleR...
Matan Breizman
09:45 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Seen in these recent pacific runs:
1. https://pulpito.ceph.com/yuriw-2022-08-18_23:16:33-fs-wip-yuri10-testing-202...
Kotresh Hiremath Ravishankar

08/21/2022

06:39 AM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
Stefan Kooman wrote:
> Is this bug also affecting rbd snapshots / clones?
Yes
Matan Breizman

08/19/2022

11:24 PM Backport #55157: quincy: mon: config commands do not accept whitespace style config name
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47381
merged
Yuri Weinstein
09:27 PM Backport #57209 (Resolved): quincy: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
https://github.com/ceph/ceph/pull/47932 Backport Bot
09:27 PM Backport #57208 (Resolved): pacific: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
https://github.com/ceph/ceph/pull/50518 Backport Bot
09:18 PM Bug #49727 (Pending Backport): lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
/a/yuriw-2022-08-11_16:46:00-rados-wip-yuri3-testing-2022-08-11-0809-pacific-distro-default-smithi/6968195... Laura Flores
04:21 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
Is this bug also affecting rbd snapshots / clones? Stefan Kooman
11:34 AM Backport #55631: pacific: ceph-osd takes all memory before oom on boot
Off-line fix: https://github.com/ceph/ceph/pull/46252
Online fix: https://github.com/ceph/ceph/pull/47701
Radoslaw Zarzynski
10:26 AM Backport #55631 (In Progress): pacific: ceph-osd takes all memory before oom on boot
Unresolving as the ultimate fix consists of 2 PRs (off-line + on-line trimming). Radoslaw Zarzynski
10:28 AM Backport #55632: quincy: ceph-osd takes all memory before oom on boot
Radoslaw Zarzynski wrote:
> Unresolving as the ultimate fix consists of 2 PRs while the 2nd one is under review....
Neha Ojha
10:24 AM Backport #55632 (In Progress): quincy: ceph-osd takes all memory before oom on boot
Unresolving as the ultimate fix consists of 2 PRs while the 2nd one is under review. Radoslaw Zarzynski
10:06 AM Backport #55632: quincy: ceph-osd takes all memory before oom on boot
The online (by OSD in contrast to the COT-based off-line one) is here: https://github.com/ceph/ceph/pull/47688. Radoslaw Zarzynski
07:48 AM Bug #57190 (New): pg shard status inconsistency in one pg
... yite gu
05:57 AM Backport #55309 (In Progress): pacific: prometheus metrics shows incorrect ceph version for upgra...
Prashant D
05:56 AM Backport #55309 (New): pacific: prometheus metrics shows incorrect ceph version for upgraded ceph...
Prashant D
04:17 AM Backport #55309: pacific: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
Reverted backport PR#46429 from pacific (revert PR https://github.com/ceph/ceph/pull/46921) due to tracker https://tr... Prashant D
05:54 AM Backport #55308 (In Progress): pacific: Manager is failing to keep updated metadata in daemon_sta...
Prashant D
05:52 AM Backport #55308 (New): pacific: Manager is failing to keep updated metadata in daemon_state for u...
Prashant D
04:16 AM Backport #55308: pacific: Manager is failing to keep updated metadata in daemon_state for upgrade...
We had to revert backport PR#46427 from pacific (revert PR https://github.com/ceph/ceph/pull/46920) due to https://tr... Prashant D

08/18/2022

07:51 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
Vikhyat Umrao
07:50 PM Bug #23117 (Duplicate): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been ex...
Vikhyat Umrao
07:50 PM Bug #57185 (Duplicate): EC 4+2 PG stuck in activating+degraded+remapped
Vikhyat Umrao
07:49 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
This would have been easily caught if we had this implemented:
https://tracker.ceph.com/issues/23117
https://git...
Vikhyat Umrao
07:48 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
Tim - Two workaround as you see your script sometimes balancer module keeps changing PGs stat that is not a valid tes... Vikhyat Umrao
07:43 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- Here we were testing OSD failure - recovery/backfill for mClock Scheduler and for this we bring in phases one OSD n... Vikhyat Umrao
07:41 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
We did capture the debug logs (debug_osd = 20 and debug_ms = 1) and they are here - f28-h28-000-r630.rdu2.scalelab.red... Vikhyat Umrao
07:25 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
From Cluster logs:... Vikhyat Umrao
05:31 PM Bug #57185 (Duplicate): EC 4+2 PG stuck in activating+degraded+remapped
- PG Query... Vikhyat Umrao
04:27 PM Feature #57180 (Rejected): option for pg_autoscaler to retain same state of existing pools when u...
Currently, any version of Ceph that is >= Pacific will have autoscaler enabled by default even for existing pools.
W...
Kamoltat (Junior) Sirivadhna
03:20 PM Bug #57152: segfault in librados via libcephsqlite
Patrick Donnelly wrote:
> Matan Breizman wrote:
> > > So the problem is that gcc 8.5, which compiles successfully, ...
Patrick Donnelly
01:19 PM Bug #57152: segfault in librados via libcephsqlite
Matan Breizman wrote:
> > So the problem is that gcc 8.5, which compiles successfully, generates code which causes t...
Patrick Donnelly
01:11 PM Bug #57152: segfault in librados via libcephsqlite
> So the problem is that gcc 8.5, which compiles successfully, generates code which causes the segfault?
From the ...
Matan Breizman
12:54 PM Bug #57152: segfault in librados via libcephsqlite
Matan Breizman wrote:
> This PR seems to resolve the compilation errors mentioned above.
> Please let me know your ...
Patrick Donnelly
12:44 PM Bug #57152: segfault in librados via libcephsqlite
This PR seems to resolve the compilation errors mentioned above.
Please let me know your thoughts.
Matan Breizman
11:59 AM Bug #57152: segfault in librados via libcephsqlite
There seems to be an issue with the gcc version used to compile; I noticed a similar issue when compiling `examples/lib... Matan Breizman
03:18 PM Backport #57030: quincy: rados/test.sh: Early exit right after LibRados global tests complete
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47452
merged
Yuri Weinstein
03:16 PM Backport #56578 (Resolved): quincy: snapshots will not be deleted after upgrade from nautilus to ...
Matan Breizman
03:15 PM Backport #56578: quincy: snapshots will not be deleted after upgrade from nautilus to pacific
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47133
merged
Yuri Weinstein
11:44 AM Bug #45721 (Fix Under Review): CommandFailedError: Command failed (workunit test rados/test_pytho...
Nitzan Mordechai
07:48 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973760 Matan Breizman
07:22 AM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
OSError: Socket is closed
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-defau...
Matan Breizman
06:40 AM Bug #57165: expected valgrind issues and found none
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
/a/yuriw-2...
Matan Breizman

08/17/2022

07:05 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
Although there was a report from Telemetry, we still need more logs (read: a recurrence at Sepia) which, hopefully, ... Radoslaw Zarzynski
07:00 PM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
It looks like this must be some inconsistency between the head_obc->blocked state, and its presence in objects_blocke... Josh Durgin
06:56 PM Bug #56661 (Need More Info): Quincy: OSD crashing one after another with data loss with ceph_asse...
Moving into _Need More Info_ :-( per Myoungwon Oh's comment. Radoslaw Zarzynski
06:41 PM Bug #57017 (Fix Under Review): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
Radoslaw Zarzynski
06:34 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
We've been poking at @read_log_and_missing()@ pretty recently (the dups issue). Does it ring a bell? Radoslaw Zarzynski
06:32 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
@Ronen: @debug_verify_stored_missing@ is an input parameter with a default value. Unfortunately, it doesn't look like a ... Radoslaw Zarzynski
06:24 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
Even if we don't want to dive deep into it right now, we should refactor the assertion:... Radoslaw Zarzynski
06:19 PM Bug #57147: qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
Neha Ojha wrote:
> How reproducible is this? The following logs indicate that we ran out of space.
>
We have seen t...
Kotresh Hiremath Ravishankar
02:25 PM Bug #57147: qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
How reproducible is this? The following logs indicate that we ran out of space.... Neha Ojha
06:04 PM Bug #57152: segfault in librados via libcephsqlite
The reporter on the ML shared the .mgr pool in question:... Patrick Donnelly
01:59 PM Bug #57165 (Resolved): expected valgrind issues and found none
... Nitzan Mordechai
01:42 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
a/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975387 Nitzan Mordechai
01:35 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
... Radoslaw Zarzynski
01:24 PM Bug #57119 (Fix Under Review): Heap command prints with "ceph tell", but not with "ceph daemon"
Radoslaw Zarzynski
01:12 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
There is another thing we should unify: @daemon@ and @tell@ treat @ss@ (the error stream) differently when the error ... Radoslaw Zarzynski
12:48 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
A similar problem may affect e.g. @flush_store_cache@:... Radoslaw Zarzynski
12:27 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
Oops, this is the case:... Radoslaw Zarzynski
12:25 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
The interface between `outbl` and the formatter is described at the beginning of the function:... Radoslaw Zarzynski
12:27 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
Appending the stringstream to the "outbl" bufferlist here makes the output print with the "daemon" version, but it is... Laura Flores
12:11 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
... Vikhyat Umrao
12:08 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
... Vikhyat Umrao
01:21 PM Bug #57163 (Resolved): free(): invalid pointer
... Nitzan Mordechai

08/16/2022

07:28 PM Bug #57152 (Resolved): segfault in librados via libcephsqlite
We have a post on the ML about a segfault in the mgr:
"[ceph-users] Quincy: Corrupted devicehealth sqlite3 databas...
Patrick Donnelly
06:07 PM Bug #57122 (Fix Under Review): test failure: rados:singleton-nomsgr librados_hello_world
Laura Flores
06:00 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
The fix is just to change any reference from "master" to "main":... Laura Flores
05:51 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
A couple present in this run:
/a/yuriw-2022-08-15_18:43:38-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-sm...
Laura Flores
07:30 AM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
What surprised me is we still use GNU Make there. Radoslaw Zarzynski
07:30 AM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
Huh, this doesn't look like a compiler failure. It happens earlier:... Radoslaw Zarzynski
01:36 PM Bug #57147 (New): qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
The teuthology link https://pulpito.ceph.com/yuriw-2022-08-11_16:57:01-fs-wip-yuri3-testing-2022-08-11-0809-pacific-d... Kotresh Hiremath Ravishankar
07:53 AM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
How about adding it to the suppression file, as per comment #1? Radoslaw Zarzynski
07:50 AM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
Would you like to have a look? Neha confirms the logs mentioned in #9 are still available. Radoslaw Zarzynski
07:40 AM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
NCB was added in Quincy, so breaking the relationship. Neha Ojha
07:28 AM Bug #57136: ecpool pg stay active+clean+remapped
Radoslaw Zarzynski wrote:
> It looks like @osd.18@ isn't up. Could you please share the @ceph -s@ and @ceph osd tree@?
<...
yite gu
07:20 AM Bug #57136 (Need More Info): ecpool pg stay active+clean+remapped
It looks like @osd.18@ isn't up. Could you please share the @ceph -s@ and @ceph osd tree@? Radoslaw Zarzynski
05:29 AM Bug #57136 (Need More Info): ecpool pg stay active+clean+remapped
I created an EC pool; the erasure code profile is:... yite gu
07:12 AM Bug #53729: ceph-osd takes all memory before oom on boot
Gonzalo Aguilar Delgado wrote:
> Wow, I'm quite surprised to see this is taking so much time to be resolved.
> Can...
Neha Ojha
07:09 AM Bug #53729 (Fix Under Review): ceph-osd takes all memory before oom on boot
Neha Ojha
04:23 AM Backport #56134 (In Progress): quincy: scrub starts message missing in cluster log
Prashant D
 
