Activity

From 12/27/2022 to 01/25/2023

01/25/2023

11:53 PM Bug #58587: test_dedup_tool.sh: test_dedup_object fails when pool 'dedup_chunk_pool' does not exist
Hey Myoungwon, would you be able to take a look at this? Laura Flores
11:47 PM Bug #58587 (Pending Backport): test_dedup_tool.sh: test_dedup_object fails when pool 'dedup_chunk...
/a/yuriw-2023-01-21_17:58:46-rados-wip-yuri6-testing-2023-01-20-0728-distro-default-smithi/7132613... Laura Flores
11:25 PM Backport #58586 (Resolved): quincy: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in funct...
https://github.com/ceph/ceph/pull/49881 Backport Bot
11:21 PM Bug #56101 (Pending Backport): Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function s...
Laura Flores
04:01 PM Bug #58239 (In Progress): pacific: src/mon/Monitor.cc: FAILED ceph_assert(osdmon()->is_writeable())
This bug is not yet resolved. Also removing the PR number, since 49412 is a revert PR.
Kamoltat (Junior) Sirivadhna
02:26 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
I think I know where the bug is. Will update. Ronen Friedman

01/24/2023

05:20 PM Bug #58141: mon/MonCommands: Support dump_historic_slow_ops
https://github.com/ceph/ceph/pull/48972 merged Yuri Weinstein
05:16 PM Bug #56707: pglog growing unbounded on EC with copy by ref
https://github.com/ceph/ceph/pull/47332 merged Yuri Weinstein
07:55 AM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
/a/yuriw-2023-01-23_17:16:25-rados-wip-yuri6-testing-2023-01-22-0750-distro-default-smithi/7134021 Nitzan Mordechai

01/22/2023

07:58 AM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
/a/yuriw-2023-01-21_17:58:46-rados-wip-yuri6-testing-2023-01-20-0728-distro-default-smithi/7132857 Nitzan Mordechai

01/20/2023

08:48 PM Bug #58529: osd: very slow recovery due to delayed push reply messages
I've opened this bug to track the slow backfill behavior from https://tracker.ceph.com/issues/58498, which appears to... Samuel Just
08:47 PM Bug #58529 (Resolved): osd: very slow recovery due to delayed push reply messages
I took a look at the logs for pg114.d6 attached to this tracker. The cost for the push replies is calculated to over
...
Samuel Just
04:46 AM Bug #58505: Wrong calculate free space OSD and PG used bytes
Not quite sure, but it looks like it's calculated here (./src/osd/OSD.cc #1070):... Andrey Groshev

01/19/2023

11:04 AM Bug #58505 (Need More Info): Wrong calculate free space OSD and PG used bytes
I added a new node with OSD to the cluster. Now I'm adding several disks each. After a short balancing time, the fol... Andrey Groshev
09:06 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
It is recommended to adjust the upload file size limit to 10M :) yite gu
09:04 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
osd.12 log file with debug_ms=5 yite gu
08:49 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
This problem happened again, but this time the affected OSD is osd.12. Other OSDs report heartbeat no-reply, as below:
osd.15
...
yite gu
02:58 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
Radoslaw Zarzynski wrote:
> This is what struck me at first glance:
>
> [...]
>
> So @osd.9@ is seeing slow op...
yite gu
05:41 AM Bug #58379 (Fix Under Review): no active mgr after ~1 hour
Nitzan Mordechai
03:09 AM Bug #58370: OSD crash
Radoslaw Zarzynski wrote:
> OK, then it's susceptible to the nonce issue. Would a @debug_ms=5@ log.
Ok, but I'm...
yite gu

01/18/2023

09:16 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
Didn't mean to change those fields. Laura Flores
09:15 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
... Laura Flores
07:53 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
... Neha Ojha
07:09 PM Bug #58496 (Pending Backport): osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.emp...
/a/yuriw-2023-01-12_20:11:41-rados-main-distro-default-smithi/7138659... Laura Flores
07:34 PM Bug #58370: OSD crash
OK, then it's susceptible to the nonce issue. Would a @debug_ms=5@ log. Radoslaw Zarzynski
07:32 PM Bug #58467 (Need More Info): osd: Only have one osd daemon no reply heartbeat on one node
Radoslaw Zarzynski
07:32 PM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
This is what struck me at first glance:... Radoslaw Zarzynski
07:19 PM Bug #50637: OSD slow ops warning stuck after OSD fail
> Prashant, would you mind taking a look at time?
Sure Radoslaw. I will have a look at this.
Radoslaw Zarzynski
07:17 PM Bug #50637: OSD slow ops warning stuck after OSD fail
I think the problem is that we lack machinery for cleaning the slow-ops status when a monitor marks an OSD down. Radoslaw Zarzynski
07:03 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
bump up Radoslaw Zarzynski
07:01 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
bump up Radoslaw Zarzynski
01:00 PM Bug #56028: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
This looks like a cache tier issue; it is causing the version to be incorrect. Nitzan Mordechai
09:11 AM Bug #45615 (Fix Under Review): api_watch_notify_pp: LibRadosWatchNotifyPPTests/LibRadosWatchNotif...
Nitzan Mordechai
08:10 AM Bug #44400 (Fix Under Review): Marking OSD out causes primary-affinity 0 to be ignored when up_se...
Nitzan Mordechai

01/17/2023

10:33 PM Bug #57632 (Closed): test_envlibrados_for_rocksdb: free(): invalid pointer
I'm going to "close" this since my PR was more of a workaround rather than a true solution. Laura Flores
05:30 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Bumping this up, since it's still occurring in main:
/a/yuriw-2023-01-12_20:11:41-rados-main-distro-default-smithi...
Laura Flores
11:23 AM Bug #44400 (In Progress): Marking OSD out causes primary-affinity 0 to be ignored when up_set has...
Our function OSDMap::_apply_primary_affinity will set an OSD as primary even if its primary affinity is set to 0; we are ... Nitzan Mordechai
09:22 AM Documentation #58469: "ceph config set mgr" command -- how to set it in ceph.conf
<bl___> zdover, I don't know if this is about the same config: https://docs.ceph.com/en/quincy/dev/config-key/ I've s... Zac Dover
09:21 AM Documentation #58469: "ceph config set mgr" command -- how to set it in ceph.conf
<zdover> bl___, your question about how to set options in ceph.conf that can be set with "ceph config set mgr" comman... Zac Dover

01/16/2023

10:53 AM Documentation #58469 (In Progress): "ceph config set mgr" command -- how to set it in ceph.conf
<bl___> confusing. if I have configuration command like `ceph config set mgr mgr/cephadm/daemon_cache_timeout` how co... Zac Dover
10:46 AM Documentation #58354 (Resolved): doc/ceph-volume/lvm/encryption.rst is inaccurate -- LUKS version...
Zac Dover
10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
root@RX570:~# ceph health detail
HEALTH_WARN failed to probe daemons or devices; OSD count 0 < osd_pool_default_size...
Zac Dover
10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
root@RX570:~# ceph orch daemon add osd RX570:/dev/sdl
Error EINVAL: Traceback (most recent call last):
File "/usr...
Zac Dover
10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
Ubuntu Jammy | Purged all docker/ceph packages and files from system. Starting from scratch.
Following: https://do...
Zac Dover
10:25 AM Documentation #58468 (New): cephadm installation guide -- refine and correct
<trevorksmith> zdover, I am following these instructions. - https://docs.ceph.com/en/quincy/cephadm/install/ These a... Zac Dover
09:07 AM Bug #50637: OSD slow ops warning stuck after OSD fail
We just observed this exact behavior with a dying server and its OSDs down:... Christian Rohmann
08:26 AM Bug #58467 (Closed): osd: Only have one osd daemon no reply heartbeat on one node
osd.9 log file:... yite gu

01/15/2023

09:54 PM Documentation #58462 (New): Installation Documentation - indicate which strings are specified by ...
<IcePic> Also, if we can wake up zdover, it would be nice if the installation docs could have a different color or so... Zac Dover
09:52 PM Documentation #58354 (Fix Under Review): doc/ceph-volume/lvm/encryption.rst is inaccurate -- LUKS...
doc/ceph-volume/lvm/encryption.rst is currently written informally. At some future time, the English in that file sho... Zac Dover

01/14/2023

08:27 AM Bug #58461 (Fix Under Review): osd/scrub: replica-response timeout is handled without locking the PG
Ronen Friedman
08:25 AM Bug #58461 (Fix Under Review): osd/scrub: replica-response timeout is handled without locking the PG
In ReplicaReservations::no_reply_t, a callback calls handle_no_reply_timeout()
without first locking the PG.
Intr...
Ronen Friedman

01/13/2023

09:41 AM Bug #58370: OSD crash
Radoslaw Zarzynski wrote:
> The PG was 2.50:
>
> [...]
>
> The PG was in the @Deleting@ substate:
>
> [...]
>...
yite gu

01/12/2023

08:48 PM Bug #58436 (Fix Under Review): ceph cluster log reporting log level in numeric format for the clo...
Prashant D
08:43 PM Bug #58436 (Fix Under Review): ceph cluster log reporting log level in numeric format for the clo...
The cluster log is now reporting the log level as an integer value instead of the human-readable log level (e.g. DBG, INF, etc.)
16735...
Prashant D
01:20 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
We had another occurrence of this on Pacific v16.2.9 Enrico Bocchi
11:28 AM Backport #58040 (In Progress): quincy: osd: add created_at and ceph_version_when_created metadata
Igor Fedotov

01/10/2023

02:12 PM Bug #58410: Set single compression algorithm as a default value in ms_osd_compression_algorithm i...
BZ link: https://bugzilla.redhat.com/show_bug.cgi?id=2155380 Shreyansh Sancheti
02:10 PM Bug #58410 (Pending Backport): Set single compression algorithm as a default value in ms_osd_comp...
Description of problem:
The default value for the compression parameter "ms_osd_compression_algorithm" is assigned t...
Shreyansh Sancheti
01:38 AM Bug #57977: osd:tick checking mon for new map
Radoslaw Zarzynski wrote:
> Per the comment #11 I'm redirecting Prashant's questions from comment #9 to the reporter...
yite gu
12:50 AM Documentation #58401 (Resolved): cephadm's "Replacing an OSD" instructions work better than RADOS...
Zac Dover

01/09/2023

06:50 PM Bug #58370: OSD crash
The PG was 2.50:... Radoslaw Zarzynski
06:36 PM Bug #57852 (In Progress): osd: unhealthy osd cannot be marked down in time
Radoslaw Zarzynski
06:35 PM Bug #57977: osd:tick checking mon for new map
Per the comment #11 I'm redirecting Prashant's questions from comment #9 to the reporter.
@yite gu: is the deploym...
Radoslaw Zarzynski
02:24 PM Bug #57977: osd:tick checking mon for new map
@Prashant, I was thinking about this further. Although it is a containerized env, hostpid=true so the PIDs should be ... Joshua Baergen
06:29 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Label assigned but blocked due to the lab issue. Bump up. Radoslaw Zarzynski
06:27 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Still blocked due to the lab issue. Bump up. Radoslaw Zarzynski
06:12 PM Documentation #58401: cephadm's "Replacing an OSD" instructions work better than RADOS's "Replaci...
https://github.com/ceph/ceph/pull/49677 Zac Dover
05:43 PM Documentation #58401 (Resolved): cephadm's "Replacing an OSD" instructions work better than RADOS...
<Infinoid> For posterity, https://docs.ceph.com/en/quincy/cephadm/services/osd/#replacing-an-osd seems to be working ... Zac Dover
02:40 PM Bug #58379 (In Progress): no active mgr after ~1 hour
Nitzan Mordechai

01/06/2023

11:06 PM Bug #44400: Marking OSD out causes primary-affinity 0 to be ignored when up_set has no common OSD...
Just confirming this is still present in pacific:... Dan van der Ster
05:55 PM Documentation #58374 (Resolved): crushtool flags remain undocumented in the crushtool manpage
Zac Dover
05:55 PM Documentation #58374: crushtool flags remain undocumented in the crushtool manpage
https://github.com/ceph/ceph/pull/49653 Zac Dover
05:37 PM Bug #57977: osd:tick checking mon for new map
@Prashant - thanks! Yes, this is containerized, so that's certainly possible in our case. Joshua Baergen
03:20 AM Bug #57977: osd:tick checking mon for new map
Radoslaw Zarzynski wrote:
> The issue during the upgrade looks awfully similar to a downstream Prashant has working ...
Prashant D
03:27 AM Bug #57852: osd: unhealthy osd cannot be marked down in time
Sure Radek. Let me have a look at this. Prashant D
01:50 AM Bug #58370: OSD crash
Radoslaw Zarzynski wrote:
> Is there the related log available by any chance?
yite gu

01/05/2023

04:00 PM Feature #58389 (New): CRUSH algorithm should support 1 copy on SSD/NVME and 2 copies on HDD (and ...
Brad Fitzpatrick makes the following request to Zac Dover in private correspondence on 05 Jan 2023:
"I'm kinda dis...
Zac Dover

01/04/2023

08:02 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
This has been added to a testing batch. The holdup is that main builds are failing from a dependency. See https://tra... Laura Flores
06:52 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
bumping this up (fix waits for testing) Radoslaw Zarzynski
07:34 PM Bug #58355 (Need More Info): OSD: segmentation fault in tc_newarray
A coredump would be really useful here.
BTW: it's better to paste the backtraces as plain text to let people searc...
Radoslaw Zarzynski
07:31 PM Bug #58356 (Need More Info): osd:segmentation fault in tcmalloc's ReleaseToCentralCache
Thanks for the report. Do you still have the coredump, maybe? Radoslaw Zarzynski
07:27 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
... Radoslaw Zarzynski
07:18 PM Bug #58370 (Need More Info): OSD crash
Is there the related log available by any chance? Radoslaw Zarzynski
07:17 PM Bug #58052: Empty Pool (zero objects) shows usage.
Downloading manually. Neha is testing ceph-post-file. Radoslaw Zarzynski
06:57 PM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
bumping up (fix awaits qa) Radoslaw Zarzynski
06:54 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
bumping up (fix awaits qa) Radoslaw Zarzynski
06:52 PM Backport #58381 (Resolved): quincy: mon:stretch-cluster: mishandled removed_ranks -> inconsistent...
Backport Bot
06:51 PM Backport #58380 (Resolved): pacific: mon:stretch-cluster: mishandled removed_ranks -> inconsisten...
Backport Bot
06:51 PM Bug #58049 (Pending Backport): mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer...
Radoslaw Zarzynski
06:29 PM Bug #58379: no active mgr after ~1 hour
When the message :... Nitzan Mordechai
06:21 PM Bug #58379 (Resolved): no active mgr after ~1 hour
After checking the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2106031
I was able to recreate the issue on main ...
Nitzan Mordechai

01/03/2023

09:12 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
Justin Mammarella wrote:
> We are seeing this bug in Nautilus 14.2.15 to 14.2.22 replicated pool.
>
> Two of our...
hoan nv
08:43 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
Sergii Kuzko wrote:
> Hi
> Can you update the bug status
> Or transfer to the group of the current version 17.2.6
Sergii Kuzko
08:41 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
Hi
Can you update the bug status
Or transfer to the group of the current version 17.2.5
Sergii Kuzko
06:01 AM Documentation #58374 (Resolved): crushtool flags remain undocumented in the crushtool manpage
>2023-01-01: brad@danga.com: https://docs.ceph.com/en/quincy/man/8/crushtool/ seems out of date. I'm running the quin... Zac Dover

12/31/2022

08:56 AM Bug #58052: Empty Pool (zero objects) shows usage.
Any thoughts on this? Brian Woods

12/29/2022

08:55 AM Bug #58370 (Need More Info): OSD crash
... yite gu

12/28/2022

11:26 AM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
/a/ksirivad-2022-12-22_17:58:01-rados-wip-ksirivad-testing-pacific-distro-default-smithi/7125137/ Nitzan Mordechai
11:26 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Kamoltat (Junior) Sirivadhna wrote:
> /a/ksirivad-2022-12-22_17:58:01-rados-wip-ksirivad-testing-pacific-distro-defa...
Nitzan Mordechai
 
