Activity

From 07/14/2022 to 08/12/2022

08/12/2022

08:18 PM Bug #57119 (Resolved): Heap command prints with "ceph tell", but not with "ceph daemon"
*How to reproduce:*
# Start a vstart cluster, or access any working cluster
# Run `ceph tell <daemon>.<id> heap <he...
Laura Flores
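The reproduction above is truncated; the following is a minimal sketch of the two invocations being compared, assuming a vstart cluster and osd.0 as the target daemon (the daemon id is arbitrary):

```
# via the messenger path (the command is forwarded to the daemon) -- per this report, the heap stats print here
ceph tell osd.0 heap stats

# via the local admin socket on the OSD host -- per this report, this path does not print them
ceph daemon osd.0 heap stats
```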
02:21 PM Backport #57117 (Resolved): quincy: mon: race condition between `mgr fail` and MgrMonitor::prepar...
https://github.com/ceph/ceph/pull/50979 Backport Bot
02:17 PM Bug #55711 (Pending Backport): mon: race condition between `mgr fail` and MgrMonitor::prepare_bea...
Kefu Chai
01:55 PM Cleanup #56581 (Resolved): mon: fix ElectionLogic warnings
Kefu Chai

08/11/2022

11:54 PM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
@Adam maybe you'd have an idea of what's going on here? Laura Flores
11:54 PM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
This one went dead after a while:
/a/yuriw-2022-08-04_20:43:31-rados-wip-yuri6-testing-2022-08-04-0617-pacific-distro...
Laura Flores
06:57 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
So I thought this may have been because I re-used the name, so I went to create a pool with a different name to conti... Brian Woods
06:54 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
Looks like one of the placement groups is "inactive":... Brian Woods
06:46 PM Bug #57105 (Resolved): quincy: ceph osd pool set <pool> size math error
Context: I created a pool with a block device and intentionally filled a set of OSDs.
This of course broke things,...
Brian Woods
04:07 PM Bug #56707: pglog growing unbounded on EC with copy by ref
I won't be able to rerun the patched branch until Monday. Haven't you been able to reproduce it? Feels trivial to so ... Alexandre Marangone
05:01 AM Bug #56707: pglog growing unbounded on EC with copy by ref
Alex, can you share logs of the OSD that caused the 500s? My theory is that peering mismatched the pglog since one OSD (with the... Nitzan Mordechai
03:56 PM Bug #57074 (Duplicate): common: Latest version of main experiences build failures
Laura Flores
02:16 PM Bug #57097 (Fix Under Review): ceph status does not report an application is not enabled on the p...
Prashant D
12:36 PM Bug #57097 (Pending Backport): ceph status does not report an application is not enabled on the p...
If a pool has 0 objects in it, then ceph status (and ceph health detail) does not report that an application is not enabled for ... Prashant D
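A minimal sketch of the scenario described above, assuming a test cluster and an arbitrary pool name:

```
# create a pool but do not enable an application on it
ceph osd pool create testpool 8

# per this report, no warning is raised while the pool holds 0 objects
ceph health detail

# write one object; the POOL_APP_NOT_ENABLED warning then shows up
rados -p testpool put obj1 /etc/hosts
ceph health detail
```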
12:30 PM Bug #57049 (Fix Under Review): cluster logging does not adhere to mon_cluster_log_file_level
Prashant D
10:25 AM Bug #52807 (Resolved): ceph-erasure-code-tool: new tool to encode/decode files
Konstantin Shalygin
10:24 AM Backport #52808 (Rejected): nautilus: ceph-erasure-code-tool: new tool to encode/decode files
Nautilus is EOL Konstantin Shalygin
10:24 AM Bug #52448 (Resolved): osd: pg may get stuck in backfill_toofull after backfill is interrupted du...
Konstantin Shalygin
10:24 AM Backport #52832 (Rejected): nautilus: osd: pg may get stuck in backfill_toofull after backfill is...
Konstantin Shalygin
10:24 AM Backport #52832 (Resolved): nautilus: osd: pg may get stuck in backfill_toofull after backfill is...
Nautilus is EOL Konstantin Shalygin
10:23 AM Backport #52938 (Rejected): nautilus: Primary OSD crash caused corrupted object and further crash...
Nautilus is EOL Konstantin Shalygin
10:21 AM Backport #52771 (Rejected): nautilus: pg scrub stat mismatch with special objects that have hash ...
Nautilus is EOL Konstantin Shalygin
10:20 AM Bug #42742 (Resolved): "failing miserably..." in Infiniband.cc
Konstantin Shalygin
10:20 AM Backport #42848 (Rejected): nautilus: "failing miserably..." in Infiniband.cc
Nautilus is EOL Konstantin Shalygin
10:20 AM Bug #43656 (Resolved): AssertionError: not all PGs are active or peered 15 seconds after marking ...
Konstantin Shalygin
10:20 AM Backport #43776 (Rejected): nautilus: AssertionError: not all PGs are active or peered 15 seconds...
Konstantin Shalygin
10:19 AM Backport #43776 (Resolved): nautilus: AssertionError: not all PGs are active or peered 15 seconds...
Nautilus is EOL Konstantin Shalygin

08/10/2022

02:35 PM Bug #57074: common: Latest version of main experiences build failures
@Kefu the main issue seems to be that install-deps is broken for CentOS 8 Stream; currently, it halts when trying to ... Laura Flores
06:13 AM Bug #57074: common: Latest version of main experiences build failures
Probably another thing I can do is to have cmake error out if a C++ compiler not compliant with the C++20 standard i... Kefu Chai
06:11 AM Bug #57074: common: Latest version of main experiences build failures
Laura, I am sorry for the inconvenience.
This is expected if the stock gcc compiler on an aged (even not ancient) ...
Kefu Chai
05:17 AM Bug #57074: common: Latest version of main experiences build failures
I have encountered these compilation errors on Ubuntu 20.04. Basically you need gcc version >= 10.1. Using install-de... Prashant D
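A quick way to check the local toolchain against that requirement (a minimal sketch; the gcc-10/g++-10 binary names are assumptions and vary by distro):

```
# building main needs a C++20-capable compiler (gcc >= 10.1 per the comment above)
gcc --version

# install build dependencies, then point CMake at a newer compiler if the stock one is too old
./install-deps.sh
cmake -DCMAKE_C_COMPILER=gcc-10 -DCMAKE_CXX_COMPILER=g++-10 ..
```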
02:03 PM Fix #56709 (Resolved): test/osd/TestPGLog: Fix confusing description between log and olog.
Kefu Chai
10:44 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Seen in these pacific runs
1. https://pulpito.ceph.com/yuriw-2022-08-04_20:54:08-fs-wip-yuri6-testing-2022-08-04-061...
Kotresh Hiremath Ravishankar
07:15 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Seen in https://pulpito.ceph.com/yuriw-2022-08-09_15:36:21-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default... Kotresh Hiremath Ravishankar
07:11 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Seen in https://pulpito.ceph.com/yuriw-2022-08-04_11:54:20-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default... Kotresh Hiremath Ravishankar
08:48 AM Bug #43813 (Resolved): objecter doesn't send osd_op
Konstantin Shalygin
08:48 AM Backport #43992 (Rejected): nautilus: objecter doesn't send osd_op
Nautilus is EOL Konstantin Shalygin
08:48 AM Bug #52486 (Closed): test tracker: please ignore
Konstantin Shalygin
08:47 AM Backport #52498 (Rejected): nautilus: test tracker: please ignore
Konstantin Shalygin
08:47 AM Backport #52497 (Rejected): octopus: test tracker: please ignore
Konstantin Shalygin
08:47 AM Backport #52495 (Rejected): pacific: test tracker: please ignore
Konstantin Shalygin
07:54 AM Bug #52509 (Can't reproduce): PG merge: PG stuck in premerge+peered state
We have never experienced this problem again Konstantin Shalygin
05:56 AM Backport #57076 (In Progress): pacific: Invalid read of size 8 in handle_recovery_delete()
Nitzan Mordechai

08/09/2022

07:49 PM Bug #57074: common: Latest version of main experiences build failures
Per https://en.cppreference.com/w/cpp/compiler_support/20 (found by Mark Nelson), only some features were enabled in ... Laura Flores
05:23 PM Bug #57074: common: Latest version of main experiences build failures
'-std=c++2a' seems to be the way that gcc versions < 9 add support for C++20, per https://gcc.gnu.org/projects/cxx-st... Laura Flores
05:08 PM Bug #57074: common: Latest version of main experiences build failures
The gcc version running here is 8.5.0:
$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-15)
For the out...
Laura Flores
04:52 PM Bug #57074: common: Latest version of main experiences build failures
One thing that stands out from the command line is '-std=c++2a' instead of '-std=c++20'. What compiler version is run... Casey Bodley
03:07 PM Bug #57074 (Duplicate): common: Latest version of main experiences build failures
Built on:... Laura Flores
07:22 PM Backport #56663 (Resolved): pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should...
Kamoltat (Junior) Sirivadhna
04:52 PM Backport #56663: pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= ...
Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/47211
merged
Yuri Weinstein
07:22 PM Backport #56664 (Resolved): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should ...
Kamoltat (Junior) Sirivadhna
04:43 PM Backport #56664: quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= m...
Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/47210
merged
Yuri Weinstein
05:59 PM Backport #57025 (Resolved): quincy: test_pool_min_size:AssertionError:wait_for_clean:failed befor...
Kamoltat (Junior) Sirivadhna
05:57 PM Backport #57024 (Resolved): quincy: test_pool_min_size: 'check for active or peered' reached maxi...
Kamoltat (Junior) Sirivadhna
05:56 PM Backport #57019 (Resolved): quincy: test_pool_min_size: AssertionError: not clean before minsize ...
Kamoltat (Junior) Sirivadhna
05:43 PM Backport #57076 (Resolved): pacific: Invalid read of size 8 in handle_recovery_delete()
https://github.com/ceph/ceph/pull/47525 Backport Bot
04:51 PM Backport #56099: pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
Laura Flores wrote:
> https://github.com/ceph/ceph/pull/46748
merged
Yuri Weinstein
04:44 PM Bug #55153 (Resolved): Make the mClock config options related to [res, wgt, lim] modifiable durin...
Sridhar Seshasayee
04:44 PM Backport #56498 (Resolved): quincy: Make the mClock config options related to [res, wgt, lim] mod...
Sridhar Seshasayee
04:40 PM Backport #56498: quincy: Make the mClock config options related to [res, wgt, lim] modifiable dur...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47020
merged
Yuri Weinstein
04:41 PM Bug #55435: mon/Elector: notify_ranked_removed() does not properly erase dead_ping in the case of...
https://github.com/ceph/ceph/pull/47086 merged Yuri Weinstein
04:06 PM Bug #52124 (Pending Backport): Invalid read of size 8 in handle_recovery_delete()
Kefu Chai
02:36 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958376 Kamoltat (Junior) Sirivadhna
03:32 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
/a/yuriw-2022-08-08_22:19:17-rados-wip-yuri-testing-2022-08-08-1230-quincy-distro-default-smithi/6962388/
Kamoltat (Junior) Sirivadhna
01:53 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958138 Kamoltat (Junior) Sirivadhna
12:35 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
Hi,
Another longer one.
OSD.25, data on sdh, db on sdb
Gilles Mocellin
11:13 AM Bug #56530 (Resolved): Quincy: High CPU and slow progress during backfill
Neha Ojha
11:13 AM Backport #57052 (Resolved): quincy: Quincy: High CPU and slow progress during backfill
Neha Ojha
09:11 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Seen in recent quincy run https://pulpito.ceph.com/yuriw-2022-08-02_21:20:37-fs-wip-yuri7-testing-2022-07-27-0808-qui... Kotresh Hiremath Ravishankar
08:08 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
Seeing this in a Quincy run:
/a/yuriw-2022-08-08_22:19:32-rados-wip-yuri4-testing-2022-08-08-1009-quincy-distro-defa...
Sridhar Seshasayee
06:58 AM Bug #47589: radosbench times out "reached maximum tries (800) after waiting for 4800 seconds"
Seeing this on a Quincy run:
/a/yuriw-2022-08-08_22:19:32-rados-wip-yuri4-testing-2022-08-08-1009-quincy-distro-defa...
Sridhar Seshasayee
06:34 AM Backport #49775 (Rejected): nautilus: Get more parallel scrubs within osd_max_scrubs limits
Nautilus is EOL Konstantin Shalygin

08/08/2022

03:53 PM Bug #57061 (Fix Under Review): Use single cluster log level (mon_cluster_log_level) config to con...
Prashant D
02:56 PM Bug #57061 (Fix Under Review): Use single cluster log level (mon_cluster_log_level) config to con...
We do not control the verbosity of the cluster logs that get logged to stderr, graylog, and journald. Each Log... Prashant D
12:19 PM Bug #49231: MONs unresponsive over extended periods of time
We are planning to upgrade to Octopus. However, I do not believe we can reproduce the issue here. The above config ha... Frank Schilder
08:51 AM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
Dan van der Ster wrote:
> Good point. In fact it is sufficient to just create some files in the cephfs after taking ...
Matan Breizman
08:43 AM Backport #51497 (Rejected): nautilus: mgr spamming with repeated set pgp_num_actual while merging
Nautilus is EOL Konstantin Shalygin
08:42 AM Bug #48212 (Resolved): pool last_epoch_clean floor is stuck after pg merging
Nautilus is EOL Konstantin Shalygin
08:41 AM Backport #52644 (Rejected): nautilus: pool last_epoch_clean floor is stuck after pg merging
Nautilus EOL Konstantin Shalygin
04:50 AM Backport #57052 (In Progress): quincy: Quincy: High CPU and slow progress during backfill
Sridhar Seshasayee
04:30 AM Backport #57052 (Resolved): quincy: Quincy: High CPU and slow progress during backfill
https://github.com/ceph/ceph/pull/47490 Backport Bot
04:27 AM Bug #56530 (Pending Backport): Quincy: High CPU and slow progress during backfill
Sridhar Seshasayee

08/07/2022

09:38 AM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
http://pulpito.front.sepia.ceph.com/rfriedma-2022-08-06_12:17:03-rados-wip-rf-snprefix-distro-default-smithi/6960416/... Ronen Friedman

08/06/2022

10:06 PM Backport #55631: pacific: ceph-osd takes all memory before oom on boot
Who is in charge of this one? Is there any progress? Gonzalo Aguilar Delgado
10:03 PM Bug #53729: ceph-osd takes all memory before oom on boot
Wow, I'm quite surprised to see this is taking so much time to be resolved.
Can someone do a small recap on what's ...
Gonzalo Aguilar Delgado

08/05/2022

04:13 PM Bug #57049 (Duplicate): cluster logging does not adhere to mon_cluster_log_file_level
Even after setting mon_cluster_log_file_level to info or a less verbose level, we still see that debug logs are getti... Prashant D
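For reference, a minimal sketch of how the option named above is set and checked (the log path is the default and assumed here):

```
# lower the cluster log file verbosity to info
ceph config set mon mon_cluster_log_file_level info

# per this report, debug-level entries still appear in the cluster log afterwards
tail -f /var/log/ceph/ceph.log
```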
01:26 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
Hello,
We don't have as many stalled moments these last days, only ~5 min.
I've taken some logs but really at the ...
Gilles Mocellin

08/04/2022

03:29 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
/a/yuriw-2022-08-03_20:33:43-rados-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default-smithi/6957591 Laura Flores
11:36 AM Fix #57040 (Fix Under Review): osd: Update osd's IOPS capacity using async Context completion ins...
Sridhar Seshasayee
10:47 AM Fix #57040 (Resolved): osd: Update osd's IOPS capacity using async Context completion instead of ...
The method, OSD::mon_cmd_set_config(), sets a config option related to
mClock during OSD boot-up. The method waits o...
Sridhar Seshasayee
05:12 AM Backport #57030 (In Progress): quincy: rados/test.sh: Early exit right after LibRados global test...
Nitzan Mordechai
05:10 AM Backport #57029 (In Progress): pacific: rados/test.sh: Early exit right after LibRados global tes...
Nitzan Mordechai

08/03/2022

07:26 PM Backport #57020: pacific: test_pool_min_size: AssertionError: not clean before minsize thrashing ...
https://github.com/ceph/ceph/pull/47446 Kamoltat (Junior) Sirivadhna
03:10 PM Backport #57020 (Resolved): pacific: test_pool_min_size: AssertionError: not clean before minsize...
Backport Bot
07:25 PM Backport #57022: pacific: test_pool_min_size: 'check for active or peered' reached maximum tries ...
https://github.com/ceph/ceph/pull/47446 Kamoltat (Junior) Sirivadhna
03:12 PM Backport #57022 (Resolved): pacific: test_pool_min_size: 'check for active or peered' reached max...
Backport Bot
07:23 PM Backport #57019: quincy: test_pool_min_size: AssertionError: not clean before minsize thrashing s...
https://github.com/ceph/ceph/pull/47445 Kamoltat (Junior) Sirivadhna
03:10 PM Backport #57019 (Resolved): quincy: test_pool_min_size: AssertionError: not clean before minsize ...
Backport Bot
07:22 PM Backport #57024: quincy: test_pool_min_size: 'check for active or peered' reached maximum tries (...
https://github.com/ceph/ceph/pull/47445 Kamoltat (Junior) Sirivadhna
03:12 PM Backport #57024 (Resolved): quincy: test_pool_min_size: 'check for active or peered' reached maxi...
Backport Bot
07:22 PM Backport #57023 (Rejected): octopus: test_pool_min_size: 'check for active or peered' reached max...
Kamoltat (Junior) Sirivadhna
03:12 PM Backport #57023 (Rejected): octopus: test_pool_min_size: 'check for active or peered' reached max...
Backport Bot
07:13 PM Backport #57026: pacific: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout ...
https://github.com/ceph/ceph/pull/47446/ Kamoltat (Junior) Sirivadhna
03:16 PM Backport #57026 (Resolved): pacific: test_pool_min_size:AssertionError:wait_for_clean:failed befo...
Backport Bot
07:04 PM Backport #57025: quincy: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout e...
https://github.com/ceph/ceph/pull/47445 Kamoltat (Junior) Sirivadhna
03:16 PM Backport #57025 (Resolved): quincy: test_pool_min_size:AssertionError:wait_for_clean:failed befor...
Backport Bot
05:30 PM Backport #57030 (Resolved): quincy: rados/test.sh: Early exit right after LibRados global tests c...
https://github.com/ceph/ceph/pull/47452 Backport Bot
05:30 PM Backport #57029 (Resolved): pacific: rados/test.sh: Early exit right after LibRados global tests ...
https://github.com/ceph/ceph/pull/47451 Backport Bot
05:27 PM Bug #55001 (Pending Backport): rados/test.sh: Early exit right after LibRados global tests complete
Laura Flores
03:05 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
https://github.com/ceph/ceph/pull/47165 merged Yuri Weinstein
03:10 PM Bug #51904 (Pending Backport): test_pool_min_size:AssertionError:wait_for_clean:failed before tim...
Kamoltat (Junior) Sirivadhna
03:09 PM Bug #54511 (Pending Backport): test_pool_min_size: AssertionError: not clean before minsize thras...
Kamoltat (Junior) Sirivadhna
03:09 PM Bug #49777 (Pending Backport): test_pool_min_size: 'check for active or peered' reached maximum t...
Kamoltat (Junior) Sirivadhna
02:58 PM Bug #57017 (Pending Backport): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
There are certain scenarios in a degraded
stretched cluster where it will try to
go into the
function Monitor::go_recov...
Kamoltat (Junior) Sirivadhna
02:33 PM Feature #23493 (Resolved): config: strip/escape single-quotes in values when setting them via con...
Laura Flores
11:50 AM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
To get some more insight on the issue, I would suggest doing the following once the issue is faced again:
1) For OSD-...
Igor Fedotov
06:02 AM Bug #55773 (Resolved): Assertion failure (ceph_assert(have_pending)) when creating new OSDs durin...
Sridhar Seshasayee
06:01 AM Backport #56060 (Resolved): quincy: Assertion failure (ceph_assert(have_pending)) when creating n...
Sridhar Seshasayee

08/01/2022

11:32 PM Bug #37808: osd: osdmap cache weak_refs assert during shutdown
/a/yuriw-2022-07-27_22:35:53-rados-wip-yuri8-testing-2022-07-27-1303-pacific-distro-default-smithi/6950918 Laura Flores
08:13 PM Tasks #56952 (In Progress): Set mgr_pool to true for a handful of tests in the rados qa suite
17.2.2 had the libcephsqlite failure. I am scheduling some rados/thrash tests here to see the current results. Since ... Laura Flores
05:54 PM Bug #56707: pglog growing unbounded on EC with copy by ref
I was able to try the patch on Pacific this morning. Running one OSD with the patch, getting 500s from RGW when I pre... Alexandre Marangone
03:05 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
I've just had a latency plateau. No scrub/deep-scrub on the impacted OSD during that time...
At least, no message in...
Gilles Mocellin
12:42 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
Zero occurrence of "timed out" in all my ceph-osds logs for 2 days. But, as I have increased bluestore_prefer_deferre... Gilles Mocellin
12:10 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
Gilles Mocellin wrote:
> This morning, I have :
> PG_NOT_DEEP_SCRUBBED: 11 pgs not deep-scrubbed in time
> Never h...
Igor Fedotov
08:36 AM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
This morning, I have:
PG_NOT_DEEP_SCRUBBED: 11 pgs not deep-scrubbed in time
Never had this before Pacific.
Could it...
Gilles Mocellin
06:29 AM Backport #55157 (In Progress): quincy: mon: config commands do not accept whitespace style config...
Nitzan Mordechai
06:28 AM Backport #55156 (In Progress): pacific: mon: config commands do not accept whitespace style confi...
Nitzan Mordechai
06:03 AM Bug #52124 (Fix Under Review): Invalid read of size 8 in handle_recovery_delete()
Nitzan Mordechai

07/29/2022

02:31 PM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
... Radoslaw Zarzynski
10:52 AM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
The wrong return code is just an echo of a failure with an auth entity deletion:... Radoslaw Zarzynski
05:57 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
As I said, I don't have any more logs, as I had to bring the cluster back into a working state.
As this issue is comi...
Chris Kul
04:24 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
Hm... at first glance, OSD calls stop_block() on a head object, which is already stopped, in kick_object_context_bloc... Myoungwon Oh

07/28/2022

09:54 PM Feature #56956 (Fix Under Review): osdc: Add objecter fastfail
Vikhyat Umrao
09:54 PM Feature #56956 (Fix Under Review): osdc: Add objecter fastfail
There is no point in waiting indefinitely when the PG of an object is inactive. It is appropriate to cancel the op in suc... Vikhyat Umrao
07:12 PM Tasks #56952 (Closed): Set mgr_pool to true for a handful of tests in the rados qa suite
In most places in the rados suite we use `sudo ceph config set mgr mgr_pool false --force` (see https://github.com/ce... Laura Flores
01:37 PM Bug #56707: pglog growing unbounded on EC with copy by ref
That is very strange. I've been able to reproduce 100% of the time with this:... Alexandre Marangone
12:41 PM Bug #56707 (Fix Under Review): pglog growing unbounded on EC with copy by ref
Nitzan Mordechai
12:41 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Alex, thanks for the information. Unfortunately, I couldn't recreate the issue, but I did find some issue with refco... Nitzan Mordechai
02:23 AM Bug #56926 (New): crash: int BlueFS::_flush_range_F(BlueFS::FileWriter*, uint64_t, uint64_t): abort

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=97c9a15c7262222fd841813a...
Telemetry Bot
02:22 AM Bug #56903 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8749e9b5d1fac718fbbb96fb...
Telemetry Bot
02:22 AM Bug #56901 (New): crash: LogMonitor::log_external_backlog()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=64ca4b6b04c168da450a852a...
Telemetry Bot
02:22 AM Bug #56896 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=50bf2266e28cc1764b47775b...
Telemetry Bot
02:22 AM Bug #56895 (New): crash: void MissingLoc::add_active_missing(const pg_missing_t&): assert(0 == "u...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f96348a2ae0d2c754de01fc7...
Telemetry Bot
02:22 AM Bug #56892 (New): crash: StackStringBuf<4096ul>::xsputn(char const*, long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3a3287f5eaa9fbb99295b2b7...
Telemetry Bot
02:22 AM Bug #56890 (New): crash: MOSDRepOp::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9be8aeab4dd246c5baf1f1c7...
Telemetry Bot
02:22 AM Bug #56889 (New): crash: MOSDRepOp::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=fce79f2ea6c1a34825a23dd9...
Telemetry Bot
02:22 AM Bug #56888 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8df8f5fbb1ef85f0956e0f78...
Telemetry Bot
02:22 AM Bug #56887 (New): crash: void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::Col...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=625223857a28a74eae75273a...
Telemetry Bot
02:22 AM Bug #56883 (New): crash: rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Sli...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ae08527e7a8d310b5740fbf6...
Telemetry Bot
02:21 AM Bug #56878 (New): crash: MonitorDBStore::get_synchronizer(std::pair<std::basic_string<char, std::...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5cacc7785f8a352e3cd86982...
Telemetry Bot
02:21 AM Bug #56873 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=210d418989a6bc9fdb60989c...
Telemetry Bot
02:21 AM Bug #56872 (New): crash: __cxa_rethrow()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5ce84c33423abe42eac8cc98...
Telemetry Bot
02:21 AM Bug #56871 (New): crash: __cxa_rethrow()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3c6c9906c46f7979e39f2a3d...
Telemetry Bot
02:21 AM Bug #56867 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e151a6a9ae5a0a079dad1ca4...
Telemetry Bot
02:21 AM Bug #56863 (New): crash: void RDMAConnectedSocketImpl::handle_connection(): assert(!r)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d1c8198db9a116b38c161a79...
Telemetry Bot
02:21 AM Bug #56856 (New): crash: ceph::buffer::list::iterator_impl<true>::copy(unsigned int, std::basic_s...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=03d7803d6cda8b31445b5fa2...
Telemetry Bot
02:21 AM Bug #56855 (New): crash: rocksdb::CompactionJob::Run()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=b79a082186434ab8becebddb...
Telemetry Bot
02:21 AM Bug #56850 (Resolved): crash: void PaxosService::propose_pending(): assert(have_pending)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=106ff764dfe8a5f766a511a1...
Telemetry Bot
02:21 AM Bug #56849 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5ff0cd923e0b4beb646ae133...
Telemetry Bot
02:20 AM Bug #56848 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=0dcd9dfbff0c25591d64a41a...
Telemetry Bot
02:20 AM Bug #56847 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7a53cbc0bcdeffa2f26d71d0...
Telemetry Bot
02:20 AM Bug #56843 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=339539062c280c5c4e5e605c...
Telemetry Bot
02:20 AM Bug #56837 (New): crash: __assert_perror_fail()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8b423fcbfb14f36724d15462...
Telemetry Bot
02:20 AM Bug #56835 (New): crash: ceph::logging::detail::JournaldClient::JournaldClient(): assert(fd > 0)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e226e4ce8be4c94d64dd6104...
Telemetry Bot
02:20 AM Bug #56833 (New): crash: __assert_perror_fail()

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e0d06d29c57064910751db9d...
Telemetry Bot
02:20 AM Bug #56826 (New): crash: MOSDPGLog::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ee3ed1408924d926185a65e3...
Telemetry Bot
02:20 AM Bug #56821 (New): crash: MOSDRepOp::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=6d21b2c78bcc5092dac5bcc9...
Telemetry Bot
02:19 AM Bug #56816 (New): crash: unsigned long const md_config_t::get_val<unsigned long>(ConfigValues con...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=66ff3f43b85f15283932865d...
Telemetry Bot
02:19 AM Bug #56814 (New): crash: rocksdb::MemTableIterator::key() const

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7329bea2aaafb66aa5060938...
Telemetry Bot
02:19 AM Bug #56813 (New): crash: MOSDPGLog::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e4eeb1a3b34df8062d7d1788...
Telemetry Bot
02:19 AM Bug #56809 (New): crash: MOSDPGScan::encode_payload(unsigned long)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=2fe9b06ce88dccd8c9fe8f41...
Telemetry Bot
02:18 AM Bug #56797 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3c3fa597eda743682305f64b...
Telemetry Bot
02:18 AM Bug #56796 (New): crash: void ECBackend::handle_recovery_push(const PushOp&, RecoveryMessages*, b...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=dbf2120428a133c3689fa508...
Telemetry Bot
02:18 AM Bug #56794 (New): crash: void LogMonitor::_create_sub_incremental(MLog*, int, version_t): assert(...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=1f3b5497ed0df042120d8ff7...
Telemetry Bot
02:17 AM Bug #56793 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=54876dfe5b7062de7d1d3ee5...
Telemetry Bot
02:17 AM Bug #56789 (New): crash: void RDMAConnectedSocketImpl::handle_connection(): assert(!r)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a87f94f67786787071927f90...
Telemetry Bot
02:17 AM Bug #56787 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f58b099fd24ce33032cf74bd...
Telemetry Bot
02:17 AM Bug #56785 (New): crash: void OSDShard::register_and_wake_split_child(PG*): assert(!slot->waiting...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d44ea277d2ae53e186d6b488...
Telemetry Bot
02:16 AM Bug #56781 (New): crash: virtual void OSDMonitor::update_from_paxos(bool*): assert(version > osdm...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=4aed07fd08164fe65fe7c6e0...
Telemetry Bot
02:16 AM Bug #56780 (New): crash: virtual void AuthMonitor::update_from_paxos(bool*): assert(version > key...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=11756492895a3349dfb227aa...
Telemetry Bot
02:16 AM Bug #56779 (New): crash: void MissingLoc::add_active_missing(const pg_missing_t&): assert(0 == "u...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=83d5be7b2d08c79f23a10dba...
Telemetry Bot
02:16 AM Bug #56778 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=803b4a91fd84c3d26353cb47...
Telemetry Bot
02:16 AM Bug #56776 (New): crash: std::string MonMap::get_name(unsigned int) const: assert(n < ranks.size())

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7464294c2c2ac69856297e37...
Telemetry Bot
02:16 AM Bug #56773 (New): crash: int64_t BlueFS::_read_random(BlueFS::FileReader*, uint64_t, uint64_t, ch...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ba26d388e9213afb18b683ee...
Telemetry Bot
02:16 AM Bug #56772 (New): crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_overlap....

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=62b8a9e7f0bb7fc1fc81b2dc...
Telemetry Bot
02:16 AM Bug #56770 (New): crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots....

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d9289f1067de7f0cc0e374ff...
Telemetry Bot
02:15 AM Bug #56764 (New): crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_size.cou...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3969752632dfdff2c710083a...
Telemetry Bot
02:14 AM Bug #56756 (New): crash: long const md_config_t::get_val<long>(ConfigValues const&, std::basic_st...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a4792692d74b82c4590d9b51...
Telemetry Bot
02:14 AM Bug #56755 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

*New crash events were reported via Telemetry with newer versions (['16.2.6', '16.2.7', '16.2.9']) than encountered...
Telemetry Bot
02:14 AM Bug #56754 (New): crash: DeviceList::DeviceList(ceph::common::CephContext*): assert(num)

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=17b0ccd87cab46177149698e...
Telemetry Bot
02:14 AM Bug #56752 (New): crash: void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_t) [wi...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=34f05776defb000d033885b3...
Telemetry Bot
02:14 AM Bug #56750 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=eb1729ae63d80bd79b6ea92b...
Telemetry Bot
02:14 AM Bug #56749 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9bfe9728f3e90e92bcab42f9...
Telemetry Bot
02:13 AM Bug #56748 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...

*New crash events were reported via Telemetry with newer versions (['16.2.0', '16.2.1', '16.2.2', '16.2.5', '16.2.6...
Telemetry Bot
02:13 AM Bug #56747 (New): crash: std::__cxx11::string MonMap::get_name(unsigned int) const: assert(n < ra...

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=0846d215ecad4c78633623e5...
Telemetry Bot

07/27/2022

11:46 PM Backport #56736 (In Progress): quincy: unessesarily long laggy PG state
https://github.com/ceph/ceph/pull/47901 Backport Bot
11:46 PM Backport #56735 (Resolved): octopus: unessesarily long laggy PG state
Backport Bot
11:46 PM Backport #56734 (In Progress): pacific: unessesarily long laggy PG state
https://github.com/ceph/ceph/pull/47899 Backport Bot
11:40 PM Bug #53806 (Pending Backport): unessesarily long laggy PG state
Kefu Chai
06:23 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943721/remote/smithi042/l... Kamoltat (Junior) Sirivadhna
05:58 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
Moving to next week's bug scrub. Radoslaw Zarzynski
05:59 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
Tried that a few times for different PGs on different OSDs, but it doesn't help Pascal Ehlert
05:47 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
Pascal Ehlert wrote:
> This indeed happened during an upgrade from Octopus to Pacific.
> I had forgotten to reduce ...
Neha Ojha
12:24 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
This indeed happened during an upgrade from Octopus to Pacific.
I had forgotten to reduce the number of ranks in Cep...
Pascal Ehlert
05:54 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
Nitzan, could it be a different issue? Radoslaw Zarzynski
04:27 PM Bug #56733 (New): Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
Hello,
Since our upgrade to Pacific, we suffer from sporadic latencies on disks, not always the same.
The cluster...
Gilles Mocellin
02:09 PM Bug #55851 (Fix Under Review): Assert in Ceph messenger
Radoslaw Zarzynski
01:37 PM Bug #56707: pglog growing unbounded on EC with copy by ref
>1. "dumping the refcount" - how did you dump the refcount?
I extracted it with rados getxattr refcont and used the...
Alexandre Marangone
10:50 AM Bug #56707: pglog growing unbounded on EC with copy by ref
Alex,
A few more questions, so I'll be able to recreate the scenario as you got it:
1. "dumping the refcount" - how did ...
Nitzan Mordechai
07:20 AM Backport #56723 (Resolved): quincy: osd thread deadlock
https://github.com/ceph/ceph/pull/47930 Backport Bot
07:20 AM Backport #56722 (Resolved): pacific: osd thread deadlock
https://github.com/ceph/ceph/pull/48254 Backport Bot
07:16 AM Bug #55355 (Pending Backport): osd thread deadlock
Radoslaw Zarzynski

07/26/2022

03:14 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
All the tests that this has failed on involve thrashing. Specifically, they all use thrashosds-health.yaml (https://g... Laura Flores
03:09 PM Bug #56707: pglog growing unbounded on EC with copy by ref
That was faster than I thought. Attached the massif outfile (let me know if that's what you expect; I'm not super familiar wit... Alexandre Marangone
02:41 PM Bug #56707: pglog growing unbounded on EC with copy by ref
I don't have one handy; everything is in Prometheus, and sharing a screenshot of all the mempools isn't very legible. Valgr... Alexandre Marangone
02:04 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Alexandre, can you please send us the dump_mempools output, and if you can, also run valgrind massif? Nitzan Mordechai
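A minimal sketch of how those two pieces of data are usually gathered (the osd id and binary path are placeholders; the OSD service must be stopped before re-running it under valgrind):

```
# per-daemon memory pool accounting via the admin socket
ceph daemon osd.0 dump_mempools

# re-run the OSD in the foreground under valgrind's heap profiler
valgrind --tool=massif /usr/bin/ceph-osd -f -i 0
# the profile is written to massif.out.<pid> and can be read with ms_print
```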
02:57 PM Backport #51287 (Resolved): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry...
Laura Flores
02:32 PM Bug #55851: Assert in Ceph messenger
Perhaps we should move it into the @deactivate_existing@ part of @reuse_connection()@ where we hold both locks at the same time. Radoslaw Zarzynski
02:28 PM Bug #55851 (In Progress): Assert in Ceph messenger
Radoslaw Zarzynski
02:27 PM Bug #55851: Assert in Ceph messenger
It looks like @reuse_connection()@ holds the ... Radoslaw Zarzynski
02:18 PM Bug #55851: Assert in Ceph messenger
The number of elements in @FrameAssembler::m_desc@ can be altered only by:
1. ...
Radoslaw Zarzynski
03:17 AM Fix #56709 (Resolved): test/osd/TestPGLog: Fix confusing description between log and olog.
https://github.com/ceph/ceph/pull/47272
test/osd/TestPGLog.cc has a confusing description between log and olog in ...
dheart joe

07/25/2022

11:21 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Attached a pglog at the peak of one prod issue. I had to redact the object names since it's prod but let me know if y... Alexandre Marangone
11:07 PM Bug #56707: pglog growing unbounded on EC with copy by ref
Can you dump the pg log using the ceph-objectstore-tool when the OSD is consuming high memory and share it with us? Neha Ojha
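A minimal sketch of that dump (osd id, pgid, and data path are placeholders; the OSD has to be stopped so the objectstore can be opened offline):

```
systemctl stop ceph-osd@0
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 2.7 --op log > pg_2.7_log.json
```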
10:51 PM Bug #56707 (Pending Backport): pglog growing unbounded on EC with copy by ref

*How to reproduce*
- create a 10GB object in bucket1 using multipart upload
- copy object 200x via s3:Objec...
Alexandre Marangone
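The steps above are truncated; a hedged sketch of the upload and copy steps with the AWS CLI (endpoint, bucket, and object names are placeholders):

```
# multipart upload of a 10 GB object (aws s3 cp switches to multipart automatically for large files)
dd if=/dev/urandom of=big.obj bs=1M count=10240
aws --endpoint-url http://rgw.example:8080 s3 mb s3://bucket1
aws --endpoint-url http://rgw.example:8080 s3 cp big.obj s3://bucket1/big.obj

# 200 server-side copies of the same object
for i in $(seq 1 200); do
  aws --endpoint-url http://rgw.example:8080 s3 cp s3://bucket1/big.obj s3://bucket1/big.obj.copy.$i
done
```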
10:46 PM Bug #56700: MGR pod on CLBO on rook deployment
I am hitting a bunch of these failures on a recent teuthology run I scheduled. The ceph version is 17.2.0:
http://...
Laura Flores
05:34 PM Bug #56700: MGR pod on CLBO on rook deployment
Quoting from a chat group:
@Travis Nielsen I think the issue you are seeing was first seen in https://tracker.ceph...
Neha Ojha
04:59 PM Bug #56700 (Duplicate): MGR pod on CLBO on rook deployment
Vikhyat Umrao
04:51 PM Bug #56700: MGR pod on CLBO on rook deployment
Parth Arora wrote:
> MGR pod is failing for the new ceph version v17.2.2, till v17.2.1 it was working fine.
>
> ...
Parth Arora
04:48 PM Bug #56700 (Duplicate): MGR pod on CLBO on rook deployment
MGR pod is failing for the new ceph version v17.2.2; till v17.2.1 it was working fine.
```
29: PyObject_Call()
3...
Parth Arora
10:19 PM Bug #53768 (New): timed out waiting for admin_socket to appear after osd.2 restart in thrasher/de...
Joseph Sawaya
08:55 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
Job died after hitting the max timeout, but the traceback suggests:... Kamoltat (Junior) Sirivadhna
08:32 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6944338/ Kamoltat (Junior) Sirivadhna
07:30 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
Hey Joseph, what's the status on this? Kamoltat (Junior) Sirivadhna
07:28 PM Bug #53768: timed out waiting for admin_socket to appear after osd.2 restart in thrasher/defaults...
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943791/ Kamoltat (Junior) Sirivadhna
07:03 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943763/ Kamoltat (Junior) Sirivadhna
02:04 PM Bug #55435: mon/Elector: notify_ranked_removed() does not properly erase dead_ping in the case of...
https://github.com/ceph/ceph/pull/47087 merged Yuri Weinstein

07/24/2022

08:09 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
Myoungwon Oh, can you please take a look? Nitzan Mordechai
07:36 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
Sadly I don't have any logs anymore, as I had to destroy the Ceph cluster; getting it back in working order was top prio... Chris Kul
05:28 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
@Chris Kul, I'm trying to understand the sequence of failing OSDs; can you please upload the logs of the OSDs that failed?
...
Nitzan Mordechai

07/21/2022

08:29 PM Bug #55836: add an asok command for pg log investigations
https://github.com/ceph/ceph/pull/46561 merged Yuri Weinstein
07:19 PM Bug #56530 (Fix Under Review): Quincy: High CPU and slow progress during backfill
Sridhar Seshasayee
06:58 PM Bug #56530: Quincy: High CPU and slow progress during backfill
The issue is addressed currently in Ceph's main branch. Please see the linked PR. This will be back-ported to Quincy ... Sridhar Seshasayee
02:59 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
Just a note, I was able to recreate it with vstart, without error injection but with valgrind.
As soon as we step in...
Nitzan Mordechai
02:00 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
Ah, thanks Sridhar. I will compare the two Trackers and mark this one as a duplicate if needed. Laura Flores
02:57 AM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
This looks similar to https://tracker.ceph.com/issues/52948. See comment https://tracker.ceph.com/issues/52948#note-5... Sridhar Seshasayee
02:57 PM Backport #56664 (In Progress): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change shou...
https://github.com/ceph/ceph/pull/47210 Kamoltat (Junior) Sirivadhna
02:45 PM Backport #56664 (Resolved): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should ...
Backport Bot
02:49 PM Backport #56663: pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= ...
https://github.com/ceph/ceph/pull/47211 Kamoltat (Junior) Sirivadhna
02:45 PM Backport #56663 (Resolved): pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should...
Backport Bot
02:40 PM Bug #56151 (Pending Backport): mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be ga...
Kamoltat (Junior) Sirivadhna
01:34 PM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
BTW the initial version was 17.2.0; we tried to update to 17.2.1 in the hope that this bug was fixed, sadly without luck. Chris Kul
01:33 PM Bug #56661 (Need More Info): Quincy: OSD crashing one after another with data loss with ceph_asse...
Two weeks after an upgrade to Quincy from an Octopus setup, the SSD pool reported one OSD down in the middle of ... Chris Kul
09:05 AM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
Looks like a race condition. Does a @Context@ make a dependency on @RefCountedObj@ (e.g. @TrackedOp@) but forget... Radoslaw Zarzynski

07/20/2022

11:33 PM Bug #44089 (New): mon: --format=json does not work for config get or show
This would be a good issue for Open Source Day if someone would be willing to take over the closed PR: https://github... Laura Flores
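For context, a minimal sketch of the invocations in question (daemon and option names are placeholders):

```
# per this report, the JSON formatter flag is not honored by these commands
ceph config get osd.0 osd_max_backfills --format=json
ceph config show osd.0 --format=json
```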
09:40 PM Bug #56530: Quincy: High CPU and slow progress during backfill
ceph-users discussion - https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Z7AILAXZDBIT6IIF2E6M3BLUE6B7L... Vikhyat Umrao
07:45 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
Found another occurrence here: /a/yuriw-2022-07-18_18:20:02-rados-wip-yuri8-testing-2022-07-18-0918-distro-default-sm... Laura Flores
06:11 PM Bug #56574 (Need More Info): rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down...
Watching for more reoccurances. Radoslaw Zarzynski
10:25 AM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
osd.0 is still down.
The valgrind output for osd.0 shows:...
Nitzan Mordechai
06:25 PM Bug #51168: ceph-osd state machine crash during peering process
Yao Ning wrote:
> Radoslaw Zarzynski wrote:
> > The PG was in @ReplicaActive@ so we shouldn't see any backfill acti...
Neha Ojha
06:06 PM Backport #56656 (New): pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDur...
Backport Bot
06:06 PM Backport #56655 (Resolved): quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlus...
https://github.com/ceph/ceph/pull/47929 Backport Bot
06:03 PM Bug #53294 (Pending Backport): rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuri...
Neha Ojha
03:20 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939431... Laura Flores
06:02 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
Notes from the scrub:
1. It looks like this happens mostly (only?) on pacific.
2. In at least two of the replications, Valg...
Radoslaw Zarzynski
05:56 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
... Radoslaw Zarzynski
03:58 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939660 Laura Flores
04:42 PM Backport #56408: quincy: ceph version 16.2.7 PG scrubs not progressing
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46844
merged
Yuri Weinstein
04:40 PM Backport #56060: quincy: Assertion failure (ceph_assert(have_pending)) when creating new OSDs dur...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46689
merged
Yuri Weinstein
04:40 PM Bug #49525: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e4 snaps missi...
https://github.com/ceph/ceph/pull/46498 merged Yuri Weinstein
04:08 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939513 Laura Flores
04:07 PM Bug #53767 (Duplicate): qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing c...
Same failure on test_cls_2pc_queue.sh, but this one came with remote logs. I suspect this is a duplicate of #55809.
...
Laura Flores
03:43 PM Bug #43584: MON_DOWN during mon_join process
/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939512 Laura Flores
02:50 PM Bug #56650: ceph df reports invalid MAX AVAIL value for stretch mode crush rule
Before applying PR#47189, MAX AVAIL for stretch_rule pools is incorrect:... Prashant D
02:07 PM Bug #56650 (Fix Under Review): ceph df reports invalid MAX AVAIL value for stretch mode crush rule
Prashant D
01:26 PM Bug #56650 (Fix Under Review): ceph df reports invalid MAX AVAIL value for stretch mode crush rule
If we define a crush rule for a stretch mode cluster with multiple take steps, then MAX AVAIL for pools associated with the crush ru... Prashant D
01:15 PM Backport #56649 (Resolved): pacific: [Progress] Do not show NEW PG_NUM value for pool if autoscal...
https://github.com/ceph/ceph/pull/53464 Backport Bot
01:15 PM Backport #56648 (Resolved): quincy: [Progress] Do not show NEW PG_NUM value for pool if autoscale...
https://github.com/ceph/ceph/pull/47925 Backport Bot
01:14 PM Bug #56136 (Pending Backport): [Progress] Do not show NEW PG_NUM value for pool if autoscaler is ...
Prashant D

07/19/2022

09:20 PM Backport #56642 (Resolved): pacific: Log at 1 when Throttle::get_or_fail() fails
Backport Bot
09:20 PM Backport #56641 (Resolved): quincy: Log at 1 when Throttle::get_or_fail() fails
Backport Bot
09:18 PM Bug #56495 (Pending Backport): Log at 1 when Throttle::get_or_fail() fails
Brad Hubbard
02:07 PM Bug #56495: Log at 1 when Throttle::get_or_fail() fails
https://github.com/ceph/ceph/pull/47019 merged Yuri Weinstein
04:24 PM Bug #50222 (In Progress): osd: 5.2s0 deep-scrub : stat mismatch
Thanks Rishabh, I am having a look into this. Laura Flores
04:11 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
This error showed up in QA runs -
http://pulpito.front.sepia.ceph.com/rishabh-2022-07-08_23:53:34-fs-wip-rishabh-tes...
Rishabh Dave
10:25 AM Bug #55001 (Fix Under Review): rados/test.sh: Early exit right after LibRados global tests complete
Nitzan Mordechai
08:28 AM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
The core dump shows:... Nitzan Mordechai
08:28 AM Bug #49689 (Fix Under Review): osd/PeeringState.cc: ceph_abort_msg("past_interval start interval ...
PR is marked as draft for now. Matan Breizman
08:26 AM Backport #56580 (Resolved): octopus: snapshots will not be deleted after upgrade from nautilus to...
Matan Breizman
12:48 AM Bug #50853 (Can't reproduce): libcephsqlite: Core dump while running test_libcephsqlite.sh.
Patrick Donnelly

07/18/2022

08:43 PM Backport #56580: octopus: snapshots will not be deleted after upgrade from nautilus to pacific
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47108
merged
Yuri Weinstein
01:52 PM Bug #49777: test_pool_min_size: 'check for active or peered' reached maximum tries (5) after wait...
I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931... Kamoltat (Junior) Sirivadhna
12:44 PM Bug #49777 (Fix Under Review): test_pool_min_size: 'check for active or peered' reached maximum t...
Kamoltat (Junior) Sirivadhna
01:50 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-07-13_19:41:18-rados-wip-yuri7-testing-2022-07-11-1631-distro-default-smithi/6929396/remote/smithi204/... Aishwarya Mathuria
01:47 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
We have a coredump and the console_log showing:
smithi042.log:[ 852.382596] ceph_test_rados[110223]: segfault at 0 ip...
Nitzan Mordechai
01:42 PM Backport #56604 (Resolved): pacific: ceph report missing osdmap_clean_epochs if answered by peon
https://github.com/ceph/ceph/pull/51258 Backport Bot
01:42 PM Backport #56603 (Rejected): octopus: ceph report missing osdmap_clean_epochs if answered by peon
Backport Bot
01:42 PM Backport #56602 (Resolved): quincy: ceph report missing osdmap_clean_epochs if answered by peon
https://github.com/ceph/ceph/pull/47928 Backport Bot
01:37 PM Bug #47273 (Pending Backport): ceph report missing osdmap_clean_epochs if answered by peon
Dan van der Ster
01:34 PM Bug #54511: test_pool_min_size: AssertionError: not clean before minsize thrashing starts
I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931... Kamoltat (Junior) Sirivadhna
12:44 PM Bug #54511 (Fix Under Review): test_pool_min_size: AssertionError: not clean before minsize thras...
Kamoltat (Junior) Sirivadhna
01:16 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931... Kamoltat (Junior) Sirivadhna
12:44 PM Bug #51904 (Fix Under Review): test_pool_min_size:AssertionError:wait_for_clean:failed before tim...
Kamoltat (Junior) Sirivadhna
10:18 AM Bug #56575 (Fix Under Review): test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fai...
Nitzan Mordechai

07/17/2022

01:16 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
/a/yuriw-2022-07-15_19:06:53-rados-wip-yuri-testing-2022-07-15-0950-octopus-distro-default-smithi/6932690 Matan Breizman
01:04 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
/a/yuriw-2022-07-15_19:06:53-rados-wip-yuri-testing-2022-07-15-0950-octopus-distro-default-smithi/6932687 Matan Breizman
09:03 AM Backport #56579 (In Progress): pacific: snapshots will not be deleted after upgrade from nautilus...
Matan Breizman
09:02 AM Backport #56578 (In Progress): quincy: snapshots will not be deleted after upgrade from nautilus ...
Matan Breizman
06:51 AM Bug #56575: test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fails from "method loc...
The lock expired, so the next ioctx.stat won't return -2 (-ENOENT); we need to change that as well based on r1 that re... Nitzan Mordechai

07/16/2022

03:18 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
This issue is fixed (including a unit test) and will be backported in order to prevent future cluster upgrades from ... Matan Breizman

07/15/2022

09:17 PM Cleanup #56581 (Fix Under Review): mon: fix ElectionLogic warnings
Laura Flores
09:06 PM Cleanup #56581 (Resolved): mon: fix ElectionLogic warnings
h3. Problem: compilation warnings in the ElectionLogic code... Laura Flores
08:58 PM Backport #56580 (In Progress): octopus: snapshots will not be deleted after upgrade from nautilus...
Neha Ojha
08:55 PM Backport #56580 (Resolved): octopus: snapshots will not be deleted after upgrade from nautilus to...
https://github.com/ceph/ceph/pull/47108 Backport Bot
08:55 PM Backport #56579 (Resolved): pacific: snapshots will not be deleted after upgrade from nautilus to...
https://github.com/ceph/ceph/pull/47134 Backport Bot
08:55 PM Backport #56578 (Resolved): quincy: snapshots will not be deleted after upgrade from nautilus to ...
https://github.com/ceph/ceph/pull/47133 Backport Bot
08:51 PM Bug #56147 (Pending Backport): snapshots will not be deleted after upgrade from nautilus to pacific
Neha Ojha
07:31 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
/a/nojha-2022-07-15_14:45:04-rados-snapshot_key_conversion-distro-default-smithi/6932156 Laura Flores
07:23 PM Bug #56574 (Need More Info): rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down...
Description: rados/valgrind-leaks/{1-start 2-inject-leak/osd centos_latest}
/a/nojha-2022-07-14_20:32:09-rados-sn...
Laura Flores
07:29 PM Bug #56575 (Pending Backport): test_cls_lock.sh: ClsLock.TestExclusiveEphemeralStealEphemeral fai...
/a/nojha-2022-07-14_20:32:09-rados-snapshot_key_conversion-distro-default-smithi/6930848... Laura Flores
12:09 PM Bug #56565 (Won't Fix): Not upgraded nautilus mons crash if upgraded pacific mon updates fsmap
I was just told there is a step in the upgrade documentation to set mon_mds_skip_sanity param before upgrade [1], whi... Mykola Golub
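A hedged sketch of that documented step (the exact mechanism, config database vs. ceph.conf, is whatever the referenced upgrade notes specify):

```
# before upgrading the mons, skip the fsmap sanity check
ceph config set mon mon_mds_skip_sanity true

# remove the override once all mons run the upgraded release
ceph config rm mon mon_mds_skip_sanity
```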
10:07 AM Bug #51168: ceph-osd state machine crash during peering process
Radoslaw Zarzynski wrote:
> The PG was in @ReplicaActive@ so we shouldn't see any backfill activity. A delayed event...
Yao Ning

07/14/2022

12:19 PM Bug #56565 (Won't Fix): Not upgraded nautilus mons crash if upgraded pacific mon updates fsmap
I have no idea if this needs to be fixed but at least the case looks worth reporting.
We faced the issue when upgr...
Mykola Golub
 
