Activity
From 12/21/2022 to 01/19/2023
01/19/2023
- 11:04 AM Bug #58505 (Need More Info): Wrong calculate free space OSD and PG used bytes
- I added a new node with OSDs to the cluster. Now I'm adding several disks each. After a short balancing time, the fol...
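For reference, the accounting in question is usually inspected with commands like these (a sketch; the output of course depends on the cluster):
  ceph df detail     # pool-level STORED vs. USED bytes
  ceph osd df tree   # per-OSD utilization, weights, and PG counts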
- 09:06 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
- It is recommended to adjust the upload file size limit to 10M :)
- 09:04 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
- osd.12 log file with debug_ms=5
- 08:49 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
- This problem happened again, but this time the problematic osd is 12. Other osds report heartbeat no reply, as below
osd.15
...
- 02:58 AM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
- Radoslaw Zarzynski wrote:
> This is what struck me at first glance:
>
> [...]
>
> So @osd.9@ is seeing slow op...
- 05:41 AM Bug #58379 (Fix Under Review): no active mgr after ~1 hour
- 03:09 AM Bug #58370: OSD crash
- Radoslaw Zarzynski wrote:
> OK, then it's susceptible to the nonce issue. Would a @debug_ms=5@ log be possible?
Ok, but I'm...
01/18/2023
- 09:16 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
- Didn't mean to change those fields.
- 09:15 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
- ...
- 07:53 PM Bug #58496: osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.empty())
- ...
- 07:09 PM Bug #58496 (Pending Backport): osd/PeeringState: FAILED ceph_assert(!acting_recovery_backfill.emp...
- /a/yuriw-2023-01-12_20:11:41-rados-main-distro-default-smithi/7138659...
- 07:34 PM Bug #58370: OSD crash
- OK, then it's susceptible to the nonce issue. Would a @debug_ms=5@ log be possible?
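For anyone reproducing this, capturing a @debug_ms=5@ log from a single OSD typically looks like the following (osd.3 is illustrative; substitute the affected daemon):
  ceph config set osd.3 debug_ms 5   # raise messenger verbosity on the suspect OSD
  # reproduce the crash, then collect /var/log/ceph/ceph-osd.3.log
  ceph config rm osd.3 debug_ms      # restore the default afterwards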
- 07:32 PM Bug #58467 (Need More Info): osd: Only have one osd daemon no reply heartbeat on one node
- 07:32 PM Bug #58467: osd: Only have one osd daemon no reply heartbeat on one node
- This is what struck me at first glance:...
- 07:19 PM Bug #50637: OSD slow ops warning stuck after OSD fail
- > Prashant, would you mind taking a look at time?
Sure Radoslaw. I will have a look at this.
- 07:17 PM Bug #50637: OSD slow ops warning stuck after OSD fail
- I think the problem is that we lack machinery for cleaning the slow-ops status when a monitor marks an OSD down.
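A rough way to see the stale state described above (the commands are standard; the outcome sketched here is the bug):
  ceph health detail       # still lists slow ops attributed to the failed OSD
  ceph osd tree down       # confirms that same OSD is down
  ceph daemon osd.<id> ops # on the OSD host; fails since the daemon is gone,
                           # yet the monitor keeps reporting the old slow-ops status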
- 07:03 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- bump up
- 07:01 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- bump up
- 01:00 PM Bug #56028: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- This looks like a cache tier issue; it is causing the version to be incorrect.
- 09:11 AM Bug #45615 (Fix Under Review): api_watch_notify_pp: LibRadosWatchNotifyPPTests/LibRadosWatchNotif...
- 08:10 AM Bug #44400 (Fix Under Review): Marking OSD out causes primary-affinity 0 to be ignored when up_se...
01/17/2023
- 10:33 PM Bug #57632 (Closed): test_envlibrados_for_rocksdb: free(): invalid pointer
- I'm going to "close" this since my PR was more of a workaround rather than a true solution.
- 05:30 PM Bug #58098: qa/workunits/rados/test_crash.sh: crashes are never posted
- Bumping this up, since it's still occurring in main:
/a/yuriw-2023-01-12_20:11:41-rados-main-distro-default-smithi...
- 11:23 AM Bug #44400 (In Progress): Marking OSD out causes primary-affinity 0 to be ignored when up_set has...
- Our function OSDMap::_apply_primary_affinity will set an osd as primary even if its primary affinity is set to 0; we are ...
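For context, the behavior under discussion can be observed with something like this (osd ids and pgid are purely illustrative):
  ceph osd primary-affinity osd.3 0   # osd.3 should no longer be chosen as primary
  ceph osd out osd.5                  # mark some other OSD out
  ceph pg map 2.50                    # despite affinity 0, osd.3 can still appear as primary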
- 09:22 AM Documentation #58469: "ceph config set mgr" command -- how to set it in ceph.conf
- <bl___> zdover, I don't know if this is about the same config: https://docs.ceph.com/en/quincy/dev/config-key/ I've s...
- 09:21 AM Documentation #58469: "ceph config set mgr" command -- how to set it in ceph.conf
- <zdover> bl___, your question about how to set options in ceph.conf that can be set with "ceph config set mgr" comman...
01/16/2023
- 10:53 AM Documentation #58469 (In Progress): "ceph config set mgr" command -- how to set it in ceph.conf
- <bl___> confusing. if I have configuration command like `ceph config set mgr mgr/cephadm/daemon_cache_timeout` how co...
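The two forms the ticket asks to document side by side are roughly these (the option name comes from the question; whether the file form is honored for mgr module options is exactly the open question here):
  # runtime, via the central config store:
  ceph config set mgr mgr/cephadm/daemon_cache_timeout 600
  ceph config get mgr mgr/cephadm/daemon_cache_timeout   # verify
  # possible ceph.conf equivalent (unverified):
  #   [mgr]
  #   mgr/cephadm/daemon_cache_timeout = 600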
- 10:46 AM Documentation #58354 (Resolved): doc/ceph-volume/lvm/encryption.rst is inaccurate -- LUKS version...
- 10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
- root@RX570:~# ceph health detail
HEALTH_WARN failed to probe daemons or devices; OSD count 0 < osd_pool_default_size...
- 10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
- root@RX570:~# ceph orch daemon add osd RX570:/dev/sdl
Error EINVAL: Traceback (most recent call last):
File "/usr...
- 10:26 AM Documentation #58468: cephadm installation guide -- refine and correct
- Ubuntu Jammy | Purged all docker/ceph packages and files from system. Starting from scratch.
Following: https://do...
- 10:25 AM Documentation #58468 (New): cephadm installation guide -- refine and correct
- <trevorksmith> zdover, I am following these instructions. - https://docs.ceph.com/en/quincy/cephadm/install/ These a...
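For readers hitting the same traceback, a common pre-check before @ceph orch daemon add osd@ is to confirm the device is actually usable (host and device names follow the report above):
  ceph orch device ls RX570                     # the device must show AVAILABLE = Yes
  ceph orch device zap RX570 /dev/sdl --force   # wipe leftover partitions/LVM if it does not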
- 09:07 AM Bug #50637: OSD slow ops warning stuck after OSD fail
- We just observed this exact behavior with a dying server and its OSDs down:...
- 08:26 AM Bug #58467 (Closed): osd: Only have one osd daemon no reply heartbeat on one node
- osd.9 log file:...
01/15/2023
- 09:54 PM Documentation #58462 (New): Installation Documentation - indicate which strings are specified by ...
- <IcePic> Also, if we can wake up zdover, it would be nice if the installation docs could have a different color or so...
- 09:52 PM Documentation #58354 (Fix Under Review): doc/ceph-volume/lvm/encryption.rst is inaccurate -- LUKS...
- doc/ceph-volume/lvm/encryption.rst is currently written informally. At some future time, the English in that file sho...
01/14/2023
- 08:27 AM Bug #58461 (Fix Under Review): osd/scrub: replica-response timeout is handled without locking the PG
- 08:25 AM Bug #58461 (Fix Under Review): osd/scrub: replica-response timeout is handled without locking the PG
- In ReplicaReservations::no_reply_t, a callback calls handle_no_reply_timeout()
without first locking the PG.
Intr...
01/13/2023
- 09:41 AM Bug #58370: OSD crash
- Radoslaw Zarzynski wrote:
> The PG was 2.50:
>
> [...]
>
> The PG was in the @Deleting@ substate:
>
> [...]
>...
01/12/2023
- 08:48 PM Bug #58436 (Fix Under Review): ceph cluster log reporting log level in numeric format for the clo...
- 08:43 PM Bug #58436 (Fix Under Review): ceph cluster log reporting log level in numeric format for the clo...
- The cluster log now reports the log level as an integer value instead of a human-readable log level, e.g. DBG, INF, etc.
16735...
- 01:20 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
- We had another occurrence of this on Pacific v16.2.9
- 11:28 AM Backport #58040 (In Progress): quincy: osd: add created_at and ceph_version_when_created metadata
01/10/2023
- 02:12 PM Bug #58410: Set single compression algorithm as a default value in ms_osd_compression_algorithm i...
- BZ link: https://bugzilla.redhat.com/show_bug.cgi?id=2155380
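For context, pinning the option to a single algorithm, as the fix proposes for the default, looks like this (the value is illustrative):
  ceph config set osd ms_osd_compression_algorithm snappy
  ceph config get osd ms_osd_compression_algorithm   # verify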
- 02:10 PM Bug #58410 (Pending Backport): Set single compression algorithm as a default value in ms_osd_comp...
- Description of problem:
The default value for the compression parameter "ms_osd_compression_algorithm" is assigned t...
- 01:38 AM Bug #57977: osd:tick checking mon for new map
- Radoslaw Zarzynski wrote:
> Per comment #11, I'm redirecting Prashant's questions from comment #9 to the reporter...
- 12:50 AM Documentation #58401 (Resolved): cephadm's "Replacing an OSD" instructions work better than RADOS...
01/09/2023
- 06:50 PM Bug #58370: OSD crash
- The PG was 2.50:...
- 06:36 PM Bug #57852 (In Progress): osd: unhealthy osd cannot be marked down in time
- 06:35 PM Bug #57977: osd:tick checking mon for new map
- Per comment #11, I'm redirecting Prashant's questions from comment #9 to the reporter.
@yite gu: is the deploym...
- 02:24 PM Bug #57977: osd:tick checking mon for new map
- @Prashant, I was thinking about this further. Although it is a containerized env, hostpid=true so the PIDs should be ...
- 06:29 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- Label assigned but blocked due to the lab issue. Bump up.
- 06:27 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- Still blocked due to the lab issue. Bump up.
- 06:12 PM Documentation #58401: cephadm's "Replacing an OSD" instructions work better than RADOS's "Replaci...
- https://github.com/ceph/ceph/pull/49677
- 05:43 PM Documentation #58401 (Resolved): cephadm's "Replacing an OSD" instructions work better than RADOS...
- <Infinoid> For posterity, https://docs.ceph.com/en/quincy/cephadm/services/osd/#replacing-an-osd seems to be working ...
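The cephadm flow referenced there boils down to roughly this (osd id, host, and device are illustrative):
  ceph orch osd rm 7 --replace              # drain osd.7 and mark it destroyed, keeping its id
  # swap the physical disk, then (unless a drivegroup spec recreates it automatically):
  ceph orch daemon add osd host1:/dev/sdX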
- 02:40 PM Bug #58379 (In Progress): no active mgr after ~1 hour
01/06/2023
- 11:06 PM Bug #44400: Marking OSD out causes primary-affinity 0 to be ignored when up_set has no common OSD...
- Just confirming this is still present in pacific:...
- 05:55 PM Documentation #58374 (Resolved): crushtool flags remain undocumented in the crushtool manpage
- 05:55 PM Documentation #58374: crushtool flags remain undocumented in the crushtool manpage
- https://github.com/ceph/ceph/pull/49653
- 05:37 PM Bug #57977: osd:tick checking mon for new map
- @Prashant - thanks! Yes, this is containerized, so that's certainly possible in our case.
- 03:20 AM Bug #57977: osd:tick checking mon for new map
- Radoslaw Zarzynski wrote:
> The issue during the upgrade looks awfully similar to a downstream Prashant has working ...
- 03:27 AM Bug #57852: osd: unhealthy osd cannot be marked down in time
- Sure Radek. Let me have a look at this.
- 01:50 AM Bug #58370: OSD crash
- Radoslaw Zarzynski wrote:
> Is the related log available by any chance?
01/05/2023
- 04:00 PM Feature #58389 (New): CRUSH algorithm should support 1 copy on SSD/NVME and 2 copies on HDD (and ...
- Brad Fitzpatrick made the following request to Zac Dover in private correspondence on 05 Jan 2023:
"I'm kinda dis...
01/04/2023
- 08:02 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- This has been added to a testing batch. The holdup is that main builds are failing from a dependency. See https://tra...
- 06:52 PM Bug #56101: Gibba Cluster: 17.2.0 to 17.2.1 RC upgrade OSD crash in function safe_timer
- bumping this up (fix waits for testing)
- 07:34 PM Bug #58355 (Need More Info): OSD: segmentation fault in tc_newarray
- A coredump would be really useful here.
BTW: it's better to paste the backtraces as plain text to let people searc...
- 07:31 PM Bug #58356 (Need More Info): osd:segmentation fault in tcmalloc's ReleaseToCentralCache
- Thanks for the report. Do you still have the coredump, maybe?
- 07:27 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- ...
- 07:18 PM Bug #58370 (Need More Info): OSD crash
- Is the related log available by any chance?
- 07:17 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Downloading manually. Neha is testing ceph-post-file.
- 06:57 PM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
- bumping up (fix awaits qa)
- 06:54 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- bumping up (fix awaits qa)
- 06:52 PM Backport #58381 (Resolved): quincy: mon:stretch-cluster: mishandled removed_ranks -> inconsistent...
- 06:51 PM Backport #58380 (Resolved): pacific: mon:stretch-cluster: mishandled removed_ranks -> inconsisten...
- 06:51 PM Bug #58049 (Pending Backport): mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer...
- 06:29 PM Bug #58379: no active mgr after ~1 hour
- When the message:...
- 06:21 PM Bug #58379 (Pending Backport): no active mgr after ~1 hour
- After checking the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2106031
I was able to recreate the issue on main ...
01/03/2023
- 09:12 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
- Justin Mammarella wrote:
> We are seeing this bug in Nautilus 14.2.15 to 14.2.22 replicated pool.
>
> Two of our... - 08:43 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
- Sergii Kuzko wrote:
> Hi
> Can you update the bug status
> Or transfer to the group of the current version 17.2.6
- 08:41 AM Bug #57699: slow osd boot with valgrind (reached maximum tries (50) after waiting for 300 seconds)
- Hi
Can you update the bug status
Or transfer to the group of the current version 17.2.5
- 06:01 AM Documentation #58374 (Resolved): crushtool flags remain undocumented in the crushtool manpage
- >2023-01-01: brad@danga.com: https://docs.ceph.com/en/quincy/man/8/crushtool/ seems out of date. I'm running the quin...
12/31/2022
- 08:56 AM Bug #58052: Empty Pool (zero objects) shows usage.
- Any thoughts on this?
12/29/2022
- 08:55 AM Bug #58370 (Need More Info): OSD crash
- ...
12/28/2022
- 11:26 AM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- /a/ksirivad-2022-12-22_17:58:01-rados-wip-ksirivad-testing-pacific-distro-default-smithi/7125137/
- 11:26 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Kamoltat (Junior) Sirivadhna wrote:
> /a/ksirivad-2022-12-22_17:58:01-rados-wip-ksirivad-testing-pacific-distro-defa...
12/26/2022
- 07:46 AM Bug #58305: src/mon/AuthMonitor.cc: FAILED ceph_assert(version > keys_ver)
- Radoslaw Zarzynski wrote:
> Thanks for the log file! Would you be able to try to replicate with higher debug levels...
- 04:54 AM Bug #58356 (Need More Info): osd:segmentation fault in tcmalloc's ReleaseToCentralCache
- osd crash. Program terminated with signal SIGSEGV, Segmentation fault. Detailed stack information is in the attachme...
- 04:52 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- /a/ksirivad-2022-12-22_17:58:01-rados-wip-ksirivad-testing-pacific-distro-default-smithi/7125137/
- 03:50 AM Bug #58355 (Need More Info): OSD: segmentation fault in tc_newarray
- The osd core dump appears when cosbench uploads data. Detailed stack information is in the attachment, in Bluestore::_write.
12/24/2022
- 10:10 AM Documentation #58354 (Resolved): doc/ceph-volume/lvm/encryption.rst is inaccurate -- LUKS version...
- Stefan Kooman's email of 20 Dec 2022 to dev@ceph.io, bearing the subject line "ceph-volume questions / enhancements",...
12/21/2022
- 10:16 PM Bug #57546 (Fix Under Review): rados/thrash-erasure-code: wait_for_recovery timeout due to "activ...
- 06:57 PM Bug #57546: rados/thrash-erasure-code: wait_for_recovery timeout due to "active+clean+remapped+la...
- I guess in main we should revert the opposite commit as both are there.
- 09:27 PM Backport #58117: quincy: qa/workunits/rados/test_librados_build.sh: specify redirect in curl command
- I know the backport is in progress but dumping this here just for reference.
/a/ksirivad-2022-12-21_15:23:02-rados...
- 08:09 PM Backport #58337 (In Progress): pacific: mon-stretched_cluster: degraded stretched mode lead to Mo...
- 08:06 PM Backport #58337 (In Progress): pacific: mon-stretched_cluster: degraded stretched mode lead to Mo...
- Original backport https://github.com/ceph/ceph/pull/48803 was reverted in https://github.com/ceph/ceph/pull/49412 due...
- 08:07 PM Backport #58338 (In Progress): quincy: mon-stretched_cluster: degraded stretched mode lead to Mon...
- 08:06 PM Backport #58338 (Resolved): quincy: mon-stretched_cluster: degraded stretched mode lead to Monito...
- https://github.com/ceph/ceph/pull/48802
- 07:50 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Radoslaw Zarzynski wrote:
> Glad you've found it! Would you mind uploading via the @ceph-post-file@ (https://docs.ceph.c...
- 07:10 PM Bug #58052: Empty Pool (zero objects) shows usage.
- Glad you've found it! Would you mind uploading via the @ceph-post-file@ (https://docs.ceph.com/en/quincy/man/8/ceph-post-...
- 07:33 PM Bug #58155 (Fix Under Review): mon:ceph_assert(m < ranks.size()) `different code path than tracke...
- 07:32 PM Bug #58305: src/mon/AuthMonitor.cc: FAILED ceph_assert(version > keys_ver)
- Thanks for the log file! Would you be able to try to replicate with higher debug levels?
Perhaps something like: ...
- 07:26 PM Bug #58106: when a large number of error ops appear in the OSDs, pglog does not trim.
- Well, values around 600-900 kitems don't look very large to me. They are definitely much, much smaller than anyth...
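For anyone sizing this up on their own cluster, the relevant knobs and the usual inspection point are (osd.N is illustrative; the option names exist upstream):
  ceph config get osd osd_min_pg_log_entries
  ceph config get osd osd_max_pg_log_entries
  ceph daemon osd.N dump_mempools   # the osd_pglog section shows items/bytes held by pg logs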
- 07:23 PM Backport #58336 (Resolved): pacific: qa/standalone/mon: --mon-initial-members setting causes us t...
- 07:23 PM Backport #58335 (Resolved): quincy: qa/standalone/mon: --mon-initial-members setting causes us to...
- 07:20 PM Bug #57937 (Rejected): pg autoscaler of rgw pools doesn't work after creating otp pool
- Not a Ceph issue per the last comment.
- 07:19 PM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Based on comment #14, we have a "fix candidate" that might also help with this issue.
If that's correct, we may wait for merging ...
- 07:16 PM Bug #58132 (Pending Backport): qa/standalone/mon: --mon-initial-members setting causes us to popu...
- 07:16 PM Backport #58334 (Resolved): quincy: mon/monclient: update "unable to obtain rotating service keys...
- https://github.com/ceph/ceph/pull/50405
- 07:16 PM Backport #58333 (Rejected): pacific: mon/monclient: update "unable to obtain rotating service key...
- https://github.com/ceph/ceph/pull/54556
- 07:14 PM Bug #17170 (Pending Backport): mon/monclient: update "unable to obtain rotating service keys when...
- 07:13 PM Bug #48896: osd/OSDMap.cc: FAILED ceph_assert(osd_weight.count(i.first))
- Low due to the low occurrence frequency.
- 06:58 PM Bug #58240 (Fix Under Review): osd/scrub: modifying osd_deep_scrub_stride while pg is doing deep ...
- 06:51 PM Bug #58239 (Resolved): pacific: src/mon/Monitor.cc: FAILED ceph_assert(osdmon()->is_writeable())
- 06:50 PM Bug #57017: mon-stretched_cluster: degraded stretched mode lead to Monitor crash
- The pacific backport got reverted in https://github.com/ceph/ceph/pull/49412.
- 06:44 PM Bug #51729 (In Progress): Upmap verification fails for multi-level crush rule
- 03:25 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
- This was fixed in main https://github.com/ceph/ceph/pull/44430 but was not backported to Q.
Instead of backporting t...
- 10:50 AM Bug #57105 (Fix Under Review): quincy: ceph osd pool set <pool> size math error
- This PR was proposed after a BZ reported the same issue.
- 11:22 AM Bug #58288: quincy: mon: pg_num_check() according to crush rule
- After the revert is merged (https://github.com/ceph/ceph/pull/49465),
pg_num_check() will return to not taking the c...
- 10:53 AM Bug #54188: Setting too many PGs leads error handling overflow
- Setting this tracker as a duplicate. Seems like the same issue, and the PR proposed in #57105 should address this one as well.