Activity

From 12/21/2022 to 01/19/2023

01/19/2023

11:04 AM Bug #58505 (Need More Info): Wrong calculate free space OSD and PG used bytes
I added a new node with OSD to the cluster. Now I'm adding several disks each. After a short balancing time, the fol... Andrey Groshev
09:06 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
It is recommended to adjust the upload file size limit to 10M :) yite gu
09:04 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
osd.12 log file with debug_ms=5 yite gu
08:49 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
This problem happened again, but this time the faulty OSD is osd.12. Other OSDs report heartbeat no reply, as below:
osd.15
...
yite gu
02:58 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
Radoslaw Zarzynski wrote:
> This is what struck me at first glance:
>
> [...]
>
> So @osd.9@ is seeing slow op...
yite gu
05:41 AM Bug #58379 (Fix Under Review): no active mgr after ~1 hour
Nitzan Mordechai
03:09 AM Bug #58370: OSD crash
Radoslaw Zarzynski wrote:
> OK, then it's susceptible to the nonce issue. Would a @debug_ms=5@ log.
OK, but I'm...
yite gu

01/18/2023

09:16 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
Didn't mean to change those fields. Laura Flores
09:15 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
... Laura Flores
07:53 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
... Neha Ojha
07:09 PM Bug #58496 (Pending Backport): osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.emp...
/a/yuriw-2023-01-12_20:11:41-rados-main-distro-default-smithi/7138659... Laura Flores
07:34 PM Bug #58370: OSD crash
OK, then it's susceptible to the nonce issue. Would a @debug_ms=5@ log. Radoslaw Zarzynski
07:32 PM Bug #58467 (Need More Info): osd: Only have one osd daemon no reply heartbeat on one node
Radoslaw Zarzynski
07:32 PM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
This is what struck me at first glance:... Radoslaw Zarzynski
07:19 PM Bug #50637: OSD slow ops warning stuck after OSD fail
> Prashant, would you mind taking a look at time?
Sure Radoslaw. I will have a look at this.
Radoslaw Zarzynski
07:17 PM Bug #50637: OSD slow ops warning stuck after OSD fail
I think the problem is that we lack machinery for clearing the slow-ops status when a monitor marks an OSD down. Radoslaw Zarzynski
07:03 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
bump up Radoslaw Zarzynski
07:01 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
bump up Radoslaw Zarzynski
01:00 PM Bug #56028: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
This looks like a cache tier issue; it is causing the version to be incorrect. Nitzan Mordechai
09:11 AM Bug #45615 (Fix Under Review): api_watch_notify_pp: LibRadosWatchNotifyPPTests/LibRadosWatchNotif...
Nitzan Mordechai
08:10 AM Bug #44400 (Fix Under Review): Marking OSD out causes primary-affinity 0 to be ignored when up_se...
Nitzan Mordechai

01/17/2023

10:33 PM Bug #57632 (Closed): test_envlibrados_for_rocksdb: free(): invalid pointer
I'm going to "close" this since my PR was more of a workaround than a true solution. Laura Flores
05:30 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
Bumping this up, since it's still occurring in main:
/a/yuriw-2023-01-12_20:11:41-rados-main-distro-default-smithi...
Laura Flores
11:23 AM Bug #44400 (In Progress): Marking OSD out causes primary-affinity 0 to be ignored when up_set has...
Our function OSDMap::_apply_primary_affinity will set an OSD as primary even if its primary affinity is set to 0; we are ... Nitzan Mordechai
09:22 AM Documentation #58469: "ceph config set mgr" command -- how to set it in ceph.conf
<bl___> zdover, I don't know if this is about the same config: https://docs.ceph.com/en/quincy/dev/config-key/ I've s... Zac Dover
09:21 AM Documentation #58469: "ceph config set mgr" command -- how to set it in ceph.conf
<zdover> bl___, your question about how to set options in ceph.conf that can be set with "ceph config set mgr" comman... Zac Dover

01/16/2023

10:53 AM Documentation #58469 (In Progress): "ceph config set mgr" command -- how to set it in ceph.conf
<bl___> confusing. if I have configuration command like `ceph config set mgr mgr/cephadm/daemon_cache_timeout` how co... Zac Dover
10:46 AM Documentation #58354 (Resolved): doc/ceph-volume/lvm/encryption.rst is inaccurate -- LUKS version...
Zac Dover
10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
root@RX570:~# ceph health detail
HEALTH_WARN failed to probe daemons or devices; OSD count 0 < osd_pool_default_size...
Zac Dover
10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
root@RX570:~# ceph orch daemon add osd RX570:/dev/sdl
Error EINVAL: Traceback (most recent call last):
File "/usr...
Zac Dover
10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
Ubuntu Jammy | Purged all docker/ceph packages and files from system. Starting from scratch.
Following: https://do...
Zac Dover
10:25 AM Documentation #58468 (New): cephadm installation guide -- refine and correct
<trevorksmith> zdover, I am following these instructions. - https://docs.ceph.com/en/quincy/cephadm/install/ These a... Zac Dover
09:07 AM Bug #50637: OSD slow ops warning stuck after OSD fail
We just observed this exact behavior with a dying server and its OSDs down:... Christian Rohmann
08:26 AM Bug #58467 (Closed): osd: Only have one osd daemon no reply heartbeat on one node
osd.9 log file:... yite gu

01/15/2023

09:54 PM Documentation #58462 (New): Installation Documentation - indicate which strings are specified by ...
<IcePic> Also, if we can wake up zdover, it would be nice if the installation docs could have a different color or so... Zac Dover
09:52 PM Documentation #58354 (Fix Under Review): doc/ceph-volume/lvm/encryption.rst is inaccurate -- LUKS...
doc/ceph-volume/lvm/encryption.rst is currently written informally. At some future time, the English in that file sho... Zac Dover

01/14/2023

08:27 AM Bug #58461 (Fix Under Review): osd/scrub: replica-response timeout is handled without locking the PG
Ronen Friedman
08:25 AM Bug #58461 (Fix Under Review): osd/scrub: replica-response timeout is handled without locking the PG
In ReplicaReservations::no_reply_t, a callback calls handle_no_reply_timeout()
without first locking the PG.
Intr...
Ronen Friedman
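A minimal sketch of the general pattern this report is about: a timer callback should take the lock that protects shared state before running the handler that mutates it. The class `PG`, its `lock` attribute, and `on_timer_fired` are hypothetical stand-ins for illustration and are not taken from the Ceph source; only the handler name mirrors the one mentioned above.

```python
import threading


class PG:
    """Hypothetical stand-in for a placement group guarded by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.timed_out_replicas = []


def handle_no_reply_timeout(pg, replica):
    # The caller is expected to hold pg.lock, because this mutates shared state.
    assert pg.lock.locked(), "PG must be locked before handling the timeout"
    pg.timed_out_replicas.append(replica)


def on_timer_fired(pg, replica):
    # Take the PG lock first, then run the handler under it.
    with pg.lock:
        handle_no_reply_timeout(pg, replica)
```

Calling `handle_no_reply_timeout` directly from the timer, without the `with pg.lock:` step, is the kind of unlocked access the report describes.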

01/13/2023

09:41 AM Bug #58370: OSD crash
Radoslaw Zarzynski wrote:
> The PG was 2.50:
>
> [...]
>
> The PG was in the @Deleting@ substate:
>
> [...]
>...
yite gu

01/12/2023

08:48 PM Bug #58436 (Fix Under Review): ceph cluster log reporting log level in numeric format for the clo...
Prashant D
08:43 PM Bug #58436 (Fix Under Review): ceph cluster log reporting log level in numeric format for the clo...
The cluster log now reports the log level as an integer value instead of a human-readable level, e.g. DBG, INF, etc.
16735...
Prashant D
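The difference between the two formats can be illustrated with a small sketch. It assumes the cluster-log severities debug, info, security, warning, and error are numbered 0 through 4 and shown as DBG, INF, SEC, WRN, ERR; that numbering, the table `CLOG_LABELS`, and the function `format_clog_level` are assumptions made for illustration, not code from Ceph.

```python
# Hypothetical mapping from numeric cluster-log severity to a short label.
# The numbering here is an assumption for illustration.
CLOG_LABELS = {0: "DBG", 1: "INF", 2: "SEC", 3: "WRN", 4: "ERR"}


def format_clog_level(level):
    """Return a bracketed label, falling back to the raw number if unknown."""
    label = CLOG_LABELS.get(level)
    return f"[{label}]" if label is not None else f"[{level}]"
```

Under these assumptions, a log line that carries the bare integer corresponds to skipping this lookup and printing the number directly.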
01:20 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
We had another occurrence of this on Pacific v16.2.9 Enrico Bocchi
11:28 AM Backport #58040 (In Progress): quincy: osd: add created_at and ceph_version_when_created metadata
Igor Fedotov

01/10/2023

02:12 PM Bug #58410: Set single compression algorithm as a default value in ms_osd_compression_algorithm i...
BZ link: https://bugzilla.redhat.com/show_bug.cgi?id=2155380 Shreyansh Sancheti
02:10 PM Bug #58410 (Pending Backport): Set single compression algorithm as a default value in ms_osd_comp...
Description of problem:
The default value for the compression parameter "ms_osd_compression_algorithm" is assigned t...
Shreyansh Sancheti
01:38 AM Bug #57977: osd:tick checking mon for new map
Radoslaw Zarzynski wrote:
> Per the comment #11 I'm redirecting Prashant's questions from comment #9 to the reporter...
yite gu
12:50 AM Documentation #58401 (Resolved): cephadm's "Replacing an OSD" instructions work better than RADOS...
Zac Dover

01/09/2023

06:50 PM Bug #58370: OSD crash
The PG was 2.50:... Radoslaw Zarzynski
06:36 PM Bug #57852 (In Progress): osd: unhealthy osd cannot be marked down in time
Radoslaw Zarzynski
06:35 PM Bug #57977: osd:tick checking mon for new map
Per the comment #11 I'm redirecting Prashant's questions from comment #9 to the reporter.
@yite gu: is the deploym...
Radoslaw Zarzynski
02:24 PM Bug #57977: osd:tick checking mon for new map
@Prashant, I was thinking about this further. Although it is a containerized env, hostpid=true so the PIDs should be ... Joshua Baergen
06:29 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Label assigned but blocked due to the lab issue. Bump up. Radoslaw Zarzynski
06:27 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
Still blocked due to the lab issue. Bump up. Radoslaw Zarzynski
06:12 PM Documentation #58401: cephadm's "Replacing an OSD" instructions work better than RADOS's "Replaci...
https://github.com/ceph/ceph/pull/49677 Zac Dover
05:43 PM Documentation #58401 (Resolved): cephadm's "Replacing an OSD" instructions work better than RADOS...
<Infinoid> For posterity, https://docs.ceph.com/en/quincy/cephadm/services/osd/#replacing-an-osd seems to be working ... Zac Dover
02:40 PM Bug #58379 (In Progress): no active mgr after ~1 hour
Nitzan Mordechai

01/06/2023

11:06 PM Bug #44400: Marking OSD out causes primary-affinity 0 to be ignored when up_set has no common OSD...
Just confirming this is still present in pacific:... Dan van der Ster
05:55 PM Documentation #58374 (Resolved): crushtool flags remain undocumented in the crushtool manpage
Zac Dover
05:55 PM Documentation #58374: crushtool flags remain undocumented in the crushtool manpage
https://github.com/ceph/ceph/pull/49653 Zac Dover
05:37 PM Bug #57977: osd:tick checking mon for new map
@Prashant - thanks! Yes, this is containerized, so that's certainly possible in our case. Joshua Baergen
03:20 AM Bug #57977: osd:tick checking mon for new map
Radoslaw Zarzynski wrote:
> The issue during the upgrade looks awfully similar to a downstream Prashant has working ...
Prashant D
03:27 AM Bug #57852: osd: unhealthy osd cannot be marked down in time
Sure Radek. Let me have a look at this. Prashant D
01:50 AM Bug #58370: OSD crash
Radoslaw Zarzynski wrote:
> Is the related log available by any chance?
yite gu

01/05/2023

04:00 PM Feature #58389 (New): CRUSH algorithm should support 1 copy on SSD/NVME and 2 copies on HDD (and ...
Brad Fitzpatrick makes the following request to Zac Dover in private correspondence on 05 Jan 2023:
"I'm kinda dis...
Zac Dover

01/04/2023

08:02 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
This has been added to a testing batch. The holdup is that main builds are failing from a dependency. See https://tra... Laura Flores
06:52 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
bumping this up (fix waits for testing) Radoslaw Zarzynski
07:34 PM Bug #58355 (Need More Info): OSD: segmentation fault in tc_newarray
A coredump would be really useful here.
BTW: it's better to paste the backtraces as plain text to let people searc...
Radoslaw Zarzynski
07:31 PM Bug #58356 (Need More Info): osd:segmentation fault in tcmalloc's ReleaseToCentralCache
Thanks for the report. Do you still have the coredump, maybe? Radoslaw Zarzynski
07:27 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
... Radoslaw Zarzynski
07:18 PM Bug #58370 (Need More Info): OSD crash
Is the related log available by any chance? Radoslaw Zarzynski
07:17 PM Bug #58052: Empty Pool (zero objects) shows usage.
Downloading manually. Neha is testing ceph-post-file. Radoslaw Zarzynski
06:57 PM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
bumping up (fix awaits qa) Radoslaw Zarzynski
06:54 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
bumping up (fix awaits qa) Radoslaw Zarzynski
06:52 PM Backport #58381 (Resolved): quincy: mon:stretch-cluster: mishandled removed_ranks -> inconsistent...
Backport Bot
06:51 PM Backport #58380 (Resolved): pacific: mon:stretch-cluster: mishandled removed_ranks -> inconsisten...
Backport Bot
06:51 PM Bug #58049 (Pending Backport): mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer...
Radoslaw Zarzynski
06:29 PM Bug #58379: no active mgr after ~1 hour
When the message :... Nitzan Mordechai
06:21 PM Bug #58379 (Pending Backport): no active mgr after ~1 hour
After checking the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2106031
I was able to recreate the issue on main ...
Nitzan Mordechai

01/03/2023

09:12 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
Justin Mammarella wrote:
> We are seeing this bug in Nautilus 14.2.15 to 14.2.22 replicated pool.
>
> Two of our...
hoan nv
08:43 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
Sergii Kuzko wrote:
> Hi
> Can you update the bug status
> Or transfer to the group of the current version 17.2.6
Sergii Kuzko
08:41 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
Hi
Can you update the bug status
Or transfer to the group of the current version 17.2.5
Sergii Kuzko
06:01 AM Documentation #58374 (Resolved): crushtool flags remain undocumented in the crushtool manpage
>2023-01-01: brad@danga.com: https://docs.ceph.com/en/quincy/man/8/crushtool/ seems out of date. I'm running the quin... Zac Dover

12/31/2022

08:56 AM Bug #58052: Empty Pool (zero objects) shows usage.
Any thoughts on this? Brian Woods

12/29/2022

08:55 AM Bug #58370 (Need More Info): OSD crash
... yite gu

12/28/2022

11:26 AM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
/a/ksirivad-2022-12-22_17:58:01-rados-wip-ksirivad-testing-pacific-distro-default-smithi/7125137/ Nitzan Mordechai
11:26 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Kamoltat (Junior) Sirivadhna wrote:
> /a/ksirivad-2022-12-22_17:58:01-rados-wip-ksirivad-testing-pacific-distro-defa...
Nitzan Mordechai

12/26/2022

07:46 AM Bug #58305: src/mon/AuthMonitor.cc: FAILED ceph_assert(version > keys_ver)
Radoslaw Zarzynski wrote:
> Thanks for the log file! Would you be able to try to replicate with higher debug levels...
yite gu
04:54 AM Bug #58356 (Need More Info): osd:segmentation fault in tcmalloc's ReleaseToCentralCache
OSD crash. Program terminated with signal SIGSEGV, Segmentation fault. Detailed stack information is in the attachme... 王子敬 wang
04:52 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
/a/ksirivad-2022-12-22_17:58:01-rados-wip-ksirivad-testing-pacific-distro-default-smithi/7125137/ Kamoltat (Junior) Sirivadhna
03:50 AM Bug #58355 (Need More Info): OSD: segmentation fault in tc_newarray
The osd core appears when cosbench uploads data. Detailed stack information is in the attachment, in Bluestore::_write. 王子敬 wang

12/24/2022

10:10 AM Documentation #58354 (Resolved): doc/ceph-volume/lvm/encryption.rst is inaccurate -- LUKS version...
Stefan Kooman's email of 20 Dec 2022 to dev@ceph.io, bearing the subject line "ceph-volume questions / enhancements",... Zac Dover

12/21/2022

10:16 PM Bug #57546 (Fix Under Review): rados/thrash-erasure-code: wait_for_recovery timeout due to "activ...
Radoslaw Zarzynski
06:57 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
I guess in main we should revert the opposite commit as both are there. Radoslaw Zarzynski
09:27 PM Backport #58117: quincy: qa/workunits/rados/test_librados_build.sh: specify redirect in curl command
I know the backport is in progress but dumping this here just for reference.
/a/ksirivad-2022-12-21_15:23:02-rados...
Kamoltat (Junior) Sirivadhna
08:09 PM Backport #58337 (In Progress): pacific: mon-stretched_cluster: degraded stretched mode lead to Mo...
Neha Ojha
08:06 PM Backport #58337 (In Progress): pacific: mon-stretched_cluster: degraded stretched mode lead to Mo...
Original backport https://github.com/ceph/ceph/pull/48803 was reverted in https://github.com/ceph/ceph/pull/49412 due... Neha Ojha
08:07 PM Backport #58338 (In Progress): quincy: mon-stretched_cluster: degraded stretched mode lead to Mon...
Neha Ojha
08:06 PM Backport #58338 (Resolved): quincy: mon-stretched_cluster: degraded stretched mode lead to Monito...
https://github.com/ceph/ceph/pull/48802 Neha Ojha
07:50 PM Bug #58052: Empty Pool (zero objects) shows usage.
Radoslaw Zarzynski wrote:
> Glad you've found it! Would you mind uploading via the @ceph-post-file@ (https://docs.ceph.c...
Brian Woods
07:10 PM Bug #58052: Empty Pool (zero objects) shows usage.
Glad you've found it! Would you mind uploading via the @ceph-post-file@ (https://docs.ceph.com/en/quincy/man/8/ceph-post-... Radoslaw Zarzynski
07:33 PM Bug #58155 (Fix Under Review): mon:ceph_assert(m < ranks.size()) `different code path than tracke...
Radoslaw Zarzynski
07:32 PM Bug #58305: src/mon/AuthMonitor.cc: FAILED ceph_assert(version > keys_ver)
Thanks for the log file! Would you be able to try to replicate with higher debug levels?
Perhaps something like: ...
Radoslaw Zarzynski
07:26 PM Bug #58106: when a large number of error ops appear in the OSDs,pglog does not trim.
Well, values around 600-900 kitems aren't looking very large to me. Definitely they are much, much smaller than anyth... Radoslaw Zarzynski
07:23 PM Backport #58336 (Resolved): pacific: qa/standalone/mon: --mon-initial-members setting causes us t...
Backport Bot
07:23 PM Backport #58335 (Resolved): quincy: qa/standalone/mon: --mon-initial-members setting causes us to...
Backport Bot
07:20 PM Bug #57937 (Rejected): pg autoscaler of rgw pools doesn't work after creating otp pool
Not a Ceph issue per the last comment. Radoslaw Zarzynski
07:19 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Based on comment #14, we have a "fix candidate" that might also help with this issue.
If that's correct, we may wait for merging ...
Radoslaw Zarzynski
07:16 PM Bug #58132 (Pending Backport): qa/standalone/mon: --mon-initial-members setting causes us to popu...
Radoslaw Zarzynski
07:16 PM Backport #58334 (Resolved): quincy: mon/monclient: update "unable to obtain rotating service keys...
https://github.com/ceph/ceph/pull/50405 Backport Bot
07:16 PM Backport #58333 (Rejected): pacific: mon/monclient: update "unable to obtain rotating service key...
https://github.com/ceph/ceph/pull/54556 Backport Bot
07:14 PM Bug #17170 (Pending Backport): mon/monclient: update "unable to obtain rotating service keys when...
Radoslaw Zarzynski
07:13 PM Bug #48896: osd/OSDMap.cc: FAILED ceph_assert(osd_weight.count(i.first))
Low due to the low occurrence frequency. Radoslaw Zarzynski
06:58 PM Bug #58240 (Fix Under Review): osd/scrub: modifying osd_deep_scrub_stride while pg is doing deep ...
Radoslaw Zarzynski
06:51 PM Bug #58239 (Resolved): pacific: src/mon/Monitor.cc: FAILED ceph_assert(osdmon()->is_writeable())
Radoslaw Zarzynski
06:50 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
The pacific backport got reverted in https://github.com/ceph/ceph/pull/49412. Radoslaw Zarzynski
06:44 PM Bug #51729 (In Progress): Upmap verification fails for multi-level crush rule
Radoslaw Zarzynski
03:25 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
This was fixed in main https://github.com/ceph/ceph/pull/44430 but was not backported to Q.
Instead of backporting t...
Matan Breizman
10:50 AM Bug #57105 (Fix Under Review): quincy: ceph osd pool set <pool> size math error
This PR was proposed after a BZ reported the same issue.
Matan Breizman
11:22 AM Bug #58288: quincy: mon: pg_num_check() according to crush rule
After the revert is merged (https://github.com/ceph/ceph/pull/49465),
pg_num_check() will return to not taking the c...
Matan Breizman
10:53 AM Bug #54188: Setting too many PGs leads error handling overflow
Setting this tracker as a duplicate. It seems like the same issue, and the PR proposed for 57105 should address this one as well. Matan Breizman
 
