Activity
From 01/16/2022 to 02/14/2022
02/14/2022
- 11:46 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-08_17:00:23-rados-wip-yuri5-testing-2022-02-08-0733-pacific-distro-default-smithi/6670360
- 11:29 PM Bug #51234: LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: 0 vs 0
- Pacific:
/a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672177
- 08:21 PM Feature #54280 (Resolved): support truncation sequences in sparse reads
- I've been working on sparse read support in the kclient, and got something working today, only to notice that after t...
- 03:39 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-02-11_22:59:19-rados-wip-yuri4-testing-2022-02-11-0858-distro-default-smithi/6677733
Last pg map bef...
- 10:06 AM Bug #46847: Loss of placement information on OSD reboot
- Could somebody please set the status back to open and Affected Versions to all?
02/11/2022
- 11:01 PM Backport #52769 (Resolved): octopus: pg scrub stat mismatch with special objects that have hash '...
- 10:41 PM Backport #52769: octopus: pg scrub stat mismatch with special objects that have hash 'ffffffff'
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/44978
merged
- 10:48 PM Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for ceph...
- The following path has MGR logs, Mon logs, Cluster logs, audit logs, and system logs....
- 10:39 PM Bug #54263 (Resolved): cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 327...
- Pacific version - 16.2.7-34.el8cp
Quincy version - 17.0.0-10315-ga00e8b31
After doing some analysis it looks like...
- 09:23 PM Bug #54262 (Closed): ERROR: test_cluster_info (tasks.cephfs.test_nfs.TestNFS)
- Since the PR has not merged yet, no need to create a tracker https://github.com/ceph/ceph/pull/44911#issuecomment-103...
- 08:48 PM Bug #54262 (Closed): ERROR: test_cluster_info (tasks.cephfs.test_nfs.TestNFS)
- /a/yuriw-2022-02-11_18:38:05-rados-wip-yuri-testing-2022-02-09-1607-distro-default-smithi/6677099/...
- 09:17 PM Backport #53769 (Resolved): pacific: [ceph osd set noautoscale] Global on/off flag for PG autosca...
- 08:52 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- 08:37 PM Bug #50089 (Fix Under Review): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing n...
- https://github.com/ceph/ceph/pull/44993
- 12:35 PM Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- I'm also encountering this issue on Pacific (16.2.7):...
- 05:42 AM Bug #54255 (New): utc time is used when ceph crash ls
- ceph crash id currently uses utc time but not local time
it is a little confusing when debugging issues.
- 01:23 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Hmm, I've just tried to get rid of...
- 12:18 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Added the logs for OSD 12,23,24 part of pg 4.6b. I don't think the logs are from the beginning when the osd booted an...
02/10/2022
- 06:52 PM Bug #54238: cephadm upgrade pacific to quincy -> causing osd's FULL/cascading failure
- ...
- 01:26 AM Bug #54238: cephadm upgrade pacific to quincy -> causing osd's FULL/cascading failure
- The node e24-h01-000-r640 has a file - upgrade.txt from the following command:...
- 12:58 AM Bug #54238 (New): cephadm upgrade pacific to quincy -> causing osd's FULL/cascading failure
- - Upgrade was started at 2022-02-08T01:54:28...
- 05:15 PM Backport #52771 (In Progress): nautilus: pg scrub stat mismatch with special objects that have ha...
- https://github.com/ceph/ceph/pull/44981
- 04:19 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- There was (I'll have to check in which Ceph versions) a bug, where setting noscrub or nodeepscrub at the "wrong"
tim...
- 02:57 PM Backport #52769 (In Progress): octopus: pg scrub stat mismatch with special objects that have has...
- https://github.com/ceph/ceph/pull/44978
- 02:46 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Christian Rohmann wrote:
> Dieter, could you maybe describe your test setup a little more? How many instances of R...
- 01:34 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Christian Rohmann wrote:
> Dieter Roels wrote:
> > All inconsistencies were on non-primary shards, so we repaired t...
- 01:17 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Dieter Roels wrote:
> Not sure if this helps or not, but we are experiencing very similar issues in our clusters the...
- 01:12 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Christian Rohmann wrote:
> yite gu wrote:
> > This is inconsistent pg 7.2 from your upload files. It looks like m...
- 12:17 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- yite gu wrote:
> This is inconsistent pg 7.2 from your upload files. It looks like the mismatch osd is 10. So, you can...
- 11:20 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- yite gu wrote:
> "shards": [
> {
> "osd": 10,
> "primary": false,
> "error... - 11:15 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- "shards": [
{
"osd": 10,
"primary": false,
"errors": [
...
- 10:36 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Not sure if this helps or not, but we are experiencing very similar issues in our clusters the last few days.
We a...
- 10:06 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- yite gu wrote:
> Can you show me the primary osd log report from when the deep-scrub error happens?
> I hope to know which o...
- 09:25 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Can you show me the primary osd log report from when the deep-scrub error happens?
I hope to know which osd shard the error happened on.
- 12:00 PM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- I installed the cluster using the "Manual Deployment" method (https://docs.ceph.com/en/pacific/install/manual-deploym...
- 08:50 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- Matan Breizman wrote:
> Shu Yu wrote:
> >
> > Missing a message, PG 8.243 status
> > # ceph pg ls 8 | grep -w 8....
02/09/2022
- 08:58 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
- 06:44 PM Backport #54233: octopus: devices: mon devices appear empty when scraping SMART metrics
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/44960
ceph-backport.sh versi...
- 02:36 PM Backport #54233 (Resolved): octopus: devices: mon devices appear empty when scraping SMART metrics
- https://github.com/ceph/ceph/pull/44960
- 06:41 PM Backport #54232: pacific: devices: mon devices appear empty when scraping SMART metrics
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/44959
ceph-backport.sh versi...
- 02:36 PM Backport #54232 (Resolved): pacific: devices: mon devices appear empty when scraping SMART metrics
- https://github.com/ceph/ceph/pull/44959
- 06:39 PM Bug #52416: devices: mon devices appear empty when scraping SMART metrics
- Ah, indeed! I don't think I would have been able to change the status myself though, so thanks for doing it!
- 02:32 PM Bug #52416 (Pending Backport): devices: mon devices appear empty when scraping SMART metrics
- Thanks, Benoît,
Once the status is changed to "Pending Backport" the bot should find it.
- 10:09 AM Bug #52416: devices: mon devices appear empty when scraping SMART metrics
- I'd like to backport this to Pacific and Octopus, but the Backport Bot didn't create the corresponding tickets; what ...
- 04:17 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
- Laura Flores wrote:
> [...]
>
> Also seen in pacific:
> /a/yuriw-2022-02-05_22:51:11-rados-wip-yuri2-testing-202...
02/08/2022
- 08:13 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- /a/yuriw-2022-02-05_22:51:11-rados-wip-yuri2-testing-2022-02-04-1646-pacific-distro-default-smithi/6663906
last pg...
- 07:21 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
- Junior, maybe you have an idea of what's going on?
- 07:20 PM Bug #54210 (Resolved): pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get ...
- ...
- 03:25 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- I did run a manual deep-scrub on another inconsistent PG as well, you'll find the logs of all OSDs handling this PG i...
- 03:05 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Neha - I did upload the logs of a deep-scrub via ceph-post-file: 1e5ff0f8-9b76-4489-8529-ee5e6f246093
There is a lit...
- 01:49 PM Bug #45457: CEPH Graylog Logging Missing "host" Field
- So is this going to be backported to Pacific?
- 03:08 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Some more information on the non-default config settings (excluding MGR):...
- 03:02 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- One more thing to add is that when I set the noscrub, nodeep-scrub flags the pgs actually don't stop scrubbing either...
- 12:16 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Correction. osd_recovery_sleep_hdd was set to 0.0 from the original 0.1. osd_scrub_sleep has been untouched.
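For anyone cross-checking those values, a minimal sketch (not from the report; it assumes the ceph CLI and an admin keyring are available, and the option names are the ones discussed above) that reads the current settings from the central config:
```python
import subprocess

# Options discussed in this thread: the recovery sleep that was lowered to
# 0.0 and the scrub sleep that was left at its default.
OPTIONS = ["osd_recovery_sleep_hdd", "osd_recovery_sleep", "osd_scrub_sleep"]

def config_get(who: str, option: str) -> str:
    """Return the value the central config would apply for `who`."""
    out = subprocess.run(
        ["ceph", "config", "get", who, option],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()

for opt in OPTIONS:
    # Per-daemon overrides can also be checked on the OSD host with
    # `ceph daemon osd.<id> config get <option>`.
    print(f"{opt} = {config_get('osd', opt)}")
```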
- 12:12 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- This is set to 0.0. I think we did this to speed up recovery after we did some CRUSH tuning....
02/07/2022
- 11:33 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- PG 7.dc4 - all osd logs.
- 11:14 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- - PG query:...
- 09:31 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Vikhyat Umrao wrote:
> Vikhyat Umrao wrote:
> > - This was reproduced again today
>
> As this issue is random we...
- 08:30 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Vikhyat Umrao wrote:
> - This was reproduced again today
As this issue is random we did not have debug logs from ...
- 08:27 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- - This was reproduced again today...
- 11:17 PM Bug #54188 (Resolved): Setting too many PGs leads to error handling overflow
- This happened on gibba001:...
- 11:15 PM Bug #54166: ceph version 15.2.15, osd configuration osd_op_num_shards_ssd or osd_op_num_threads_p...
- Sridhar, can you please take a look?
- 11:12 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- What is osd_scrub_sleep set to?
Ronen, this sounds similar to one of the issues you were looking into, here it is ...
- 02:46 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Additional information:
I tried disabling all client I/O and after that there's zero I/O on the devices hosting th...
- 02:45 AM Bug #54172 (Resolved): ceph version 16.2.7 PG scrubs not progressing
- A week ago I've upgraded a 16.2.4 cluster (3 nodes, 33 osds) to 16.2.7 using cephadm and since then we're experiencin...
- 10:57 PM Bug #53751 (Need More Info): "N monitors have not enabled msgr2" is always shown for new clusters
- Can you share the output of "ceph mon dump"? And how did you install this cluster? We are not seeing this issue in 16...
- 10:39 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Massive thanks for your reply Neha, I greatly appreciate it!
Neha Ojha wrote:
> Is it possible for you to trigg...
- 10:31 PM Bug #53663 (Need More Info): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadat...
- Is it possible for you to trigger a deep-scrub on one PG (with debug_osd=20,debug_ms=1), let it go into inconsistent ...
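A rough outline of that procedure as a script (a sketch only, not from the ticket: the PG id and acting set are placeholders, and the deep scrub must actually complete before the inconsistency report is meaningful):
```python
import json
import subprocess

PG_ID = "7.2"              # placeholder: an inconsistent PG
ACTING_OSDS = [10, 3, 7]   # placeholder: acting set of that PG

def run(*cmd: str) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1. Raise debug levels on the acting OSDs before scrubbing.
for osd in ACTING_OSDS:
    run("ceph", "tell", f"osd.{osd}", "config", "set", "debug_osd", "20")
    run("ceph", "tell", f"osd.{osd}", "config", "set", "debug_ms", "1")

# 2. Trigger a deep scrub of the PG; wait for it to finish (e.g. watch
#    `ceph pg <pgid> query` or cluster health) before inspecting results.
run("ceph", "pg", "deep-scrub", PG_ID)

# 3. Once the PG is flagged inconsistent, list the shards carrying errors.
report = json.loads(run("rados", "list-inconsistent-obj", PG_ID, "--format=json"))
for obj in report.get("inconsistents", []):
    for shard in obj.get("shards", []):
        if shard.get("errors"):
            print(obj["object"]["name"], "osd", shard["osd"], shard["errors"])
```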
- 03:52 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- The issue is still happening:
1) Find all pools with scrub errors via...
- 10:23 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- We can include clear_shards_repaired in master and backport it.
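If that lands, clearing the counter could look roughly like the following (hypothetical sketch: clear_shards_repaired is only being proposed in the comment above, and the OSD id is illustrative; verify the command exists in your release before relying on it):
```python
import subprocess

OSD_ID = 5  # illustrative: an OSD currently flagged by OSD_TOO_MANY_REPAIRS

# Reset the repaired-shards counter so the health warning can clear.
subprocess.run(["ceph", "tell", f"osd.{OSD_ID}", "clear_shards_repaired"], check=True)
```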
- 04:47 PM Bug #54182 (New): OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- The newly added warning OSD_TOO_MANY_REPAIRS (https://tracker.ceph.com/issues/41564) is raised on a certain count of ...
- 10:14 PM Bug #46847: Loss of placement information on OSD reboot
- Can you share your ec profile and the output of "ceph osd pool ls detail"?
- 12:35 AM Bug #46847: Loss of placement information on OSD reboot
- So in even more fun news, I created the EC pool according to the instructions provided in the documentation.
It's...
- 12:17 AM Bug #46847: Loss of placement information on OSD reboot
- Sorry I should add some context/data
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stab...
- 06:58 PM Bug #54180 (In Progress): In some cases osdmaptool takes forever to complete
- 02:44 PM Bug #54180 (Resolved): In some cases osdmaptool takes forever to complete
- with the attached file run the command:
osdmaptool osdmap.GD.bin --upmap out-file --upmap-deviation 1 --upmap-pool d...
- 05:52 PM Bug #53806 (Fix Under Review): unnecessarily long laggy PG state
02/06/2022
- 11:57 PM Bug #46847: Loss of placement information on OSD reboot
- Neha Ojha wrote:
> Is this issue reproducible in Octopus or later?
Yes. I hit it last night. It's minced one of m...
- 12:28 PM Bug #54166 (New): ceph version 15.2.15, osd configuration osd_op_num_shards_ssd or osd_op_num_thr...
- Configure osd_op_num_shards_ssd=8 or osd_op_num_threads_per_shard_ssd=8 in ceph.conf, use ceph daemon osd.x config ...
02/04/2022
- 06:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- http://pulpito.front.sepia.ceph.com/lflores-2022-01-31_19:11:11-rados:thrash-erasure-code-big-master-distro-default-s...
- 02:03 PM Backport #53480 (In Progress): pacific: Segmentation fault under Pacific 16.2.1 when using a cust...
- 12:00 AM Bug #53757 (Fix Under Review): I have a rados object that data size is 0, and this object have a ...
02/03/2022
- 09:26 PM Backport #53974 (Resolved): quincy: BufferList.rebuild_aligned_size_and_memory failure
- 07:08 PM Backport #53974 (In Progress): quincy: BufferList.rebuild_aligned_size_and_memory failure
- https://github.com/ceph/ceph/pull/44891
- 07:46 PM Bug #23117 (In Progress): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been ...
- 07:50 AM Bug #54122 (Fix Under Review): Validate monitor ID provided with ok-to-stop similar to ok-to-rm
- 07:49 AM Bug #54122 (Resolved): Validate monitor ID provided with ok-to-stop similar to ok-to-rm
- ceph mon ok-to-stop doesn't validate the monitor ID provided. Thus returns that "quorum should be preserved " without...
02/02/2022
- 05:25 PM Backport #53551: pacific: [RFE] Provide warning when the 'require-osd-release' flag does not matc...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44259
merged
- 02:04 PM Feature #54115 (In Progress): Log pglog entry size in OSD log if it exceeds certain size limit
- Even after all PGs are active+clean, we see some OSDs consuming high amount memory. From dump_mempools, osd_pglog con...
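For reference, a small sketch of pulling the osd_pglog figure mentioned here out of dump_mempools (assumes local admin-socket access to an OSD; the id and the mempool/by_pool field layout should be verified against your release):
```python
import json
import subprocess

OSD_ID = 0  # illustrative: run this on the host where that OSD lives

# Query the OSD's admin socket for its mempool statistics.
raw = subprocess.run(
    ["ceph", "daemon", f"osd.{OSD_ID}", "dump_mempools"],
    check=True, capture_output=True, text=True,
).stdout

pglog = json.loads(raw)["mempool"]["by_pool"]["osd_pglog"]
print(f"osd.{OSD_ID} osd_pglog: {pglog['items']} items, "
      f"{pglog['bytes'] / 1024 ** 2:.1f} MiB")
```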
- 10:32 AM Bug #51002 (Resolved): regression in ceph daemonperf command output, osd columns aren't visible a...
- 10:32 AM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
- 12:02 AM Backport #51172: pacific: regression in ceph daemonperf command output, osd columns aren't visibl...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44175
merged
- 12:04 AM Backport #53702: pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44387
merged
02/01/2022
- 10:12 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/44181
merged
- 08:42 PM Backport #53486: pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44202
merged
- 08:40 PM Backport #51150: pacific: When read failed, ret can not take as data len, in FillInVerifyExtent
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44173
merged
- 08:40 PM Backport #53388: pacific: pg-temp entries are not cleared for PGs that no longer exist
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44096
merged
- 05:23 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- I will collect these logs as you've requested. As an update: I am now seeing snaptrim occurring automatically without...
- 04:27 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Yes that is the case.
Can you collect the log starting at the manual repeer? The intent was to capture the logs s...
- 03:23 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Christopher Hoffman wrote:
> 1. A single OSD will be fine, just ensure it is one exhibiting the issue.
> 2. Can you...
- 02:41 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- David Prude wrote:
> Christopher Hoffman wrote:
> > Can you collect and share OSD logs (with debug_osd=20 and debug...
- 12:01 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Christopher Hoffman wrote:
> Can you collect and share OSD logs (with debug_osd=20 and debug_ms=1) when you are enco...
- 11:38 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Neha Ojha wrote:
> Maybe you are missing the square brackets when specifying the mon_host like in https://docs.ceph....
- 12:12 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Maybe you are missing the square brackets when specifying the mon_host like in https://docs.ceph.com/en/pacific/rados...
- 08:54 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Neha Ojha wrote:
> Are you using filestore or bluestore?
On bluestore
- 12:20 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Are you using filestore or bluestore?
- 02:14 AM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> So on both occasions this crash was a side effect of new pool creation? Can you provide the outpu...
- 12:28 AM Bug #53667 (Fix Under Review): osd cannot be started after being set to stop
01/31/2022
- 11:59 PM Bug #44184: Slow / Hanging Ops after pool creation
- Ist Gab wrote:
> Neha Ojha wrote:
> > Are you still seeing this problem? Will you be able to provide debug data aro...
- 11:53 PM Bug #54005 (Duplicate): Why can wrong parameters be specified when creating erasure-code-profile,...
- 11:38 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- /a/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642148
- 11:35 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
- /a/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642122
- 10:56 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
- /a/yuriw-2022-01-27_15:09:25-rados-wip-yuri6-testing-2022-01-26-1547-distro-default-smithi/6644093...
- 11:09 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- /a/yuriw-2022-01-27_15:09:25-rados-wip-yuri6-testing-2022-01-26-1547-distro-default-smithi/6644223
- 11:07 PM Bug #53326 (Resolved): pgs wait for read lease after osd start
- 10:21 PM Bug #51433 (Resolved): mgr spamming with repeated set pgp_num_actual while merging
- nautilus is EOL
- 10:20 PM Backport #53876 (Resolved): pacific: pgs wait for read lease after osd start
- 10:12 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-01-27_14:57:16-rados-wip-yuri-testing-2022-01-26-1810-pacific-distro-default-smithi/6643449
- 09:27 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Can you collect and share OSD logs (with debug_osd=20 and debug_ms=1) when you are encountering this issue?
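For anyone following along, a minimal sketch of turning those debug levels on and off for specific OSDs (the OSD ids are placeholders; it assumes the ceph CLI and an admin keyring are available):
```python
import subprocess

OSDS = [1, 2, 3]  # placeholder ids of the OSDs showing the snaptrim behaviour

def ceph(*args: str) -> None:
    subprocess.run(("ceph",) + args, check=True)

# Persist the requested debug levels for just these OSDs, reproduce the
# issue, then drop the overrides again so the logs stay manageable.
for osd in OSDS:
    ceph("config", "set", f"osd.{osd}", "debug_osd", "20")
    ceph("config", "set", f"osd.{osd}", "debug_ms", "1")

input("Reproduce the issue now, then press Enter to revert debug levels...")

for osd in OSDS:
    ceph("config", "rm", f"osd.{osd}", "debug_osd")
    ceph("config", "rm", f"osd.{osd}", "debug_ms")
```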
- 03:14 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- We are also seeing this issue on *16.2.5*. We schedule cephfs snapshots via cron in a 24h7d2w rotation schedule. Over...
- 07:52 PM Backport #54082: pacific: mon: osd pool create <pool-name> with --bulk flag
- pull request: https://github.com/ceph/ceph/pull/44847
- 07:51 PM Backport #54082 (Resolved): pacific: mon: osd pool create <pool-name> with --bulk flag
- Backporting https://github.com/ceph/ceph/pull/44241 to pacific
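For context, usage of the flag being backported here, as a sketch (the pool name is illustrative; the create syntax follows the feature title, and the set/get forms should be verified against your release):
```python
import subprocess

POOL = "bulk-demo"  # illustrative pool name

def ceph(*args: str) -> str:
    return subprocess.run(("ceph",) + args, check=True,
                          capture_output=True, text=True).stdout.strip()

# Create the pool with the bulk flag so the autoscaler starts it with a
# full complement of PGs instead of growing it gradually.
ceph("osd", "pool", "create", POOL, "--bulk")

# The flag can also be inspected or toggled after creation.
print(ceph("osd", "pool", "get", POOL, "bulk"))
ceph("osd", "pool", "set", POOL, "bulk", "false")
```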
- 07:32 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
- Happening in Pacific too:
/a/yuriw-2022-01-27_14:57:16-rados-wip-yuri-testing-2022-01-26-1810-pacific-distro-defau...
- 02:38 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- Shu Yu wrote:
>
> Missing a message, PG 8.243 status
> # ceph pg ls 8 | grep -w 8.243
> 8.243 0 0 ... - 12:18 PM Backport #53660 (Resolved): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44544
m...
- 12:18 PM Backport #53943 (Resolved): octopus: mon: all mon daemon always crash after rm pool
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44700
m...
- 12:18 PM Backport #53534 (Resolved): octopus: mon: mgrstatmonitor spams mgr with service_map
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44722
m...
- 12:17 PM Backport #53877 (Resolved): octopus: pgs wait for read lease after osd start
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44585
m...
- 12:17 PM Backport #53701 (Resolved): octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in bac...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43438
m...
- 12:17 PM Backport #52833 (Resolved): octopus: osd: pg may get stuck in backfill_toofull after backfill is ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43438
m...
01/28/2022
- 05:17 PM Backport #54048 (Rejected): octopus: [RFE] Add health warning in ceph status for filestore OSDs
- Original tracker was accidentally marked for pending backport. We are not backporting related PR to pre-quincy. Marki...
- 05:16 PM Backport #54047 (Rejected): nautilus: [RFE] Add health warning in ceph status for filestore OSDs
- Original tracker was accidentally marked for pending backport. We are not backporting related PR to pre-quincy. Marki...
- 05:14 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
- It was accidentally marked as pending backport. We are not backporting PR for this tracker to pre-quincy. Marking it ...
- 01:41 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
- @Dan, I think maybe this is a copy-paste issue?
- 01:24 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
- Why is this being backported to N and O?!
Filestore is deprecated since quincy, so we should only warn in quincy a...
- 05:01 PM Backport #53978: quincy: [RFE] Limit slow request details to mgr log
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44764
merged
- 10:08 AM Bug #54050: OSD: move message to cluster log when osd hitting the pg hard limit
- PR: https://github.com/ceph/ceph/pull/44821
- 10:02 AM Bug #54050 (Closed): OSD: move message to cluster log when osd hitting the pg hard limit
- The OSD will print the below message if a pg creation hits the hard limit on the max number of pgs per osd.
---
202...
- 05:50 AM Bug #52657 (In Progress): MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, ...
01/27/2022
- 06:36 PM Backport #54048 (Rejected): octopus: [RFE] Add health warning in ceph status for filestore OSDs
- 06:36 PM Backport #54047 (Rejected): nautilus: [RFE] Add health warning in ceph status for filestore OSDs
- 06:33 PM Feature #49275 (Pending Backport): [RFE] Add health warning in ceph status for filestore OSDs
- 06:33 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
- 03:07 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> Hi,
>
> Nothing a script can't do:
>
> > ceph osd pool ls | xargs -n1 -istr ...
- 09:33 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Mark Nelson wrote:
> In the mean time, Neha mentioned that you might be able to prevent the pgs from splitting by tu...
- 09:24 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Mark Nelson wrote:
> Hi Gonzalo,
>
> I'm not an expert regarding this code so please take my reply here with a gr...
- 02:28 PM Bug #53327 (Fix Under Review): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_f...
- 12:05 AM Backport #53660: octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44544
merged
- 12:04 AM Backport #53943: octopus: mon: all mon daemon always crash after rm pool
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44700
merged
01/26/2022
- 11:54 PM Backport #53534: octopus: mon: mgrstatmonitor spams mgr with service_map
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44722
merged
- 11:39 PM Backport #53769: pacific: [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/44540
merged
- 08:53 PM Bug #53729: ceph-osd takes all memory before oom on boot
- In the mean time, Neha mentioned that you might be able to prevent the pgs from splitting by turning off the autoscal...
- 08:34 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Hi Gonzalo,
I'm not an expert regarding this code so please take my reply here with a grain of salt (and others pl...
- 05:26 PM Bug #53729: ceph-osd takes all memory before oom on boot
- How can I help to accelerate a bugfix or workaround?
If you comment your investigations I can build a docker image t...
- 04:16 PM Bug #53326: pgs wait for read lease after osd start
- https://github.com/ceph/ceph/pull/44585 merged
- 12:27 AM Bug #53326: pgs wait for read lease after osd start
- https://github.com/ceph/ceph/pull/44584 merged
- 04:14 PM Backport #53701: octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
- Mykola Golub wrote:
> PR: https://github.com/ceph/ceph/pull/43438
merged
- 04:14 PM Backport #52833: octopus: osd: pg may get stuck in backfill_toofull after backfill is interrupted...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43438
merged
- 12:06 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- > Igor Fedotov wrote:
> I doubt anyone can say what setup would be good for you without experiments in the field. M...
- 12:04 PM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> Are you still seeing this problem? Will you be able to provide debug data around this issue?
H...
- 12:47 AM Bug #45318 (New): Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log r...
- Octopus still has this issue /a/yuriw-2022-01-24_18:01:47-rados-wip-yuri10-testing-2022-01-24-0810-octopus-distro-def...
01/25/2022
- 05:40 PM Bug #50608 (Need More Info): ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- 05:36 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Here is a representative run (wip-dis-testing is essentially master):
https://pulpito.ceph.com/dis-2022-01-25_16:1...
- 12:50 AM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Ilya Dryomov wrote:
> This has been bugging the rbd suite for a while. I don't think messenger failure injection is...
- 01:34 PM Bug #53327 (In Progress): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_s...
- 10:39 AM Backport #53944 (In Progress): pacific: [RFE] Limit slow request details to mgr log
- 09:01 AM Backport #53978 (In Progress): quincy: [RFE] Limit slow request details to mgr log
- 08:51 AM Bug #54005 (Duplicate): Why can wrong parameters be specified when creating erasure-code-profile,...
- My osd tree is like below:
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-7 0....
- 08:45 AM Bug #54004 (Rejected): When creating erasure-code-profile incorrectly set parameters, it can be c...
- My osd tree is like below:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-7 0.19498 root mytest...
01/24/2022
- 11:19 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- This has been bugging the rbd suite for a while. I don't think messenger failure injection is the problem because th...
- 11:12 PM Bug #53327 (New): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_...
- 10:59 PM Bug #53940 (Rejected): EC pool creation is setting min_size to K+1 instead of K
- As discussed offline, we should revisit our recovery test coverage for various EC profiles, but closing this issue.
- 10:56 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
- 10:44 PM Bug #44184: Slow / Hanging Ops after pool creation
- Ist Gab wrote:
> Neha Ojha wrote:
>
> > Which version are you using?
>
> Octopus 15.2.14
Are you still seei...
- 10:39 PM Bug #52535 (Need More Info): monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ...
- 10:37 PM Bug #48997 (Can't reproduce): rados/singleton/all/recovery-preemption: defer backfill|defer recov...
- 10:36 PM Bug #50106 (Can't reproduce): scrub/osd-scrub-repair.sh: corrupt_scrub_erasure: return 1
- 10:36 PM Bug #50245 (Can't reproduce): TEST_recovery_scrub_2: Not enough recovery started simultaneously
- 10:35 PM Bug #49961 (Can't reproduce): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub_1 failed
- 10:35 PM Bug #46847 (Need More Info): Loss of placement information on OSD reboot
- Is this issue reproducible in Octopus or later?
- 10:32 PM Bug #50462 (Won't Fix - EOL): OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.co...
- Please feel free to reopen if you see the issue in a recent version of Ceph.
- 10:31 PM Bug #49688 (Can't reproduce): FAILED ceph_assert(is_primary()) in submit_log_entries during Promo...
- 10:30 PM Bug #48028 (Won't Fix - EOL): ceph-mon always suffer lots of slow ops from v14.2.9
- Please feel free to reopen if you see the issue in a recent version of Ceph.
- 10:29 PM Bug #50512 (Won't Fix - EOL): upgrade:nautilus-p2p-nautilus: unhandled event in ToDelete
- 10:29 PM Bug #50473 (Can't reproduce): ceph_test_rados_api_lock_pp segfault in librados::v14_2_0::RadosCli...
- 10:28 PM Bug #50242 (Can't reproduce): test_repair_corrupted_obj fails with assert not inconsistent
- 10:28 PM Bug #50119 (Can't reproduce): Invalid read of size 4 in ceph::logging::Log::dump_recent()
- 10:26 PM Bug #47153 (Won't Fix - EOL): monitor crash during upgrade due to LogSummary encoding changes bet...
- 10:26 PM Bug #49523: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- Haven't seen this in recent runs.
- 10:24 PM Bug #49463 (Can't reproduce): qa/standalone/misc/rados-striper.sh: Caught signal in thread_name:r...
- 10:14 PM Bug #53910 (Closed): client: client session state stuck in opening and hang all the time
- 10:43 AM Bug #47273: ceph report missing osdmap_clean_epochs if answered by peon
- > Is it possible that this is related?
I'm not sure, but I guess not.
I think this bug is rather about not forwa...
- 06:05 AM Bug #52486 (Pending Backport): test tracker: please ignore
01/22/2022
- 12:06 AM Backport #53978 (Resolved): quincy: [RFE] Limit slow request details to mgr log
- https://github.com/ceph/ceph/pull/44764
- 12:05 AM Backport #53977 (Rejected): quincy: mon: all mon daemon always crash after rm pool
- 12:05 AM Backport #53974 (Resolved): quincy: BufferList.rebuild_aligned_size_and_memory failure
01/21/2022
- 07:30 PM Backport #53972 (Resolved): pacific: BufferList.rebuild_aligned_size_and_memory failure
- 07:25 PM Backport #53971 (Resolved): octopus: BufferList.rebuild_aligned_size_and_memory failure
- 07:22 PM Bug #53969 (Pending Backport): BufferList.rebuild_aligned_size_and_memory failure
- 07:15 PM Bug #53969 (Fix Under Review): BufferList.rebuild_aligned_size_and_memory failure
- 07:14 PM Bug #53969 (Resolved): BufferList.rebuild_aligned_size_and_memory failure
- ...
- 06:59 PM Bug #45345 (Can't reproduce): tasks/rados.py fails with "psutil.NoSuchProcess: psutil.NoSuchProce...
- 06:58 PM Bug #45318 (Can't reproduce): Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in c...
- 06:56 PM Bug #38375: OSD segmentation fault on rbd create
- I do not have the files to reupload so might be worth closing this out as I have moved on to another release and this...
- 06:53 PM Bug #43553 (Can't reproduce): mon: client mon_status fails
- 06:49 PM Bug #43048 (Won't Fix - EOL): nautilus: upgrade/mimic-x/stress-split: failed to recover before ti...
- 06:48 PM Bug #42102 (Can't reproduce): use-after-free in Objecter timer handing
- 06:43 PM Bug #40521 (Can't reproduce): cli timeout (e.g., ceph pg dump)
- 06:38 PM Bug #23911 (Won't Fix - EOL): ceph:luminous: osd out/down when setup with ubuntu/bluestore
- 06:37 PM Bug #20952 (Can't reproduce): Glitchy monitor quorum causes spurious test failure
- 06:36 PM Bug #14115 (Can't reproduce): crypto: race in nss init
- 06:36 PM Bug #13385 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: NSS ...
- 06:35 PM Bug #11235 (Can't reproduce): test_rados.py test_aio_read is racy
- 05:24 PM Backport #53534 (In Progress): octopus: mon: mgrstatmonitor spams mgr with service_map
- 05:22 PM Backport #53535 (In Progress): pacific: mon: mgrstatmonitor spams mgr with service_map
- 03:55 PM Bug #47273: ceph report missing osdmap_clean_epochs if answered by peon
- I am also seeing this behavior on the latest Octopus and Pacific releases.
The reason I'm looking is that I'm seei...
01/20/2022
- 10:24 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
- Laura Flores wrote:
> Thanks for this info, Dan. We have held off on making a change to min_size, and we're currentl...
- 08:16 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
- Thanks for this info, Dan. We have held off on making a change to min_size, and we're currently discussing ways to en...
- 07:27 PM Backport #53943 (In Progress): octopus: mon: all mon daemon always crash after rm pool
- 06:44 PM Backport #53942 (In Progress): pacific: mon: all mon daemon always crash after rm pool
- 06:29 AM Bug #53910: client: client session state stuck in opening and hang all the time
- Sorry, close this issue please.
- 02:00 AM Backport #53944 (Resolved): pacific: [RFE] Limit slow request details to mgr log
- https://github.com/ceph/ceph/pull/44771
- 01:21 AM Feature #52424 (Pending Backport): [RFE] Limit slow request details to mgr log
- 01:13 AM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- We have marked the primary OSD.33 down [1] and it has helped the stuck recovery_unfound pg to get unstuck and recover...
01/19/2022
- 11:18 PM Bug #53855: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- Myoungwon Oh: any ideas on this bug?
- 11:15 PM Bug #53875 (Duplicate): AssertionError: wait_for_recovery: failed before timeout expired due to d...
- 11:15 PM Backport #53943 (Resolved): octopus: mon: all mon daemon always crash after rm pool
- https://github.com/ceph/ceph/pull/44700
- 11:10 PM Backport #53942 (Resolved): pacific: mon: all mon daemon always crash after rm pool
- https://github.com/ceph/ceph/pull/44698
- 11:09 PM Bug #53910 (Need More Info): client: client session state stuck in opening and hang all the time
- Can you provide more details about this bug?
- 11:05 PM Bug #53740 (Pending Backport): mon: all mon daemon always crash after rm pool
- 09:00 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Looks like the last time the PG was active was at "2022-01-18T17:38:23.338"...
- 07:26 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
- For history, here's where the default was set to k+1.
https://github.com/ceph/ceph/pull/8008/commits/48e40fcde7b19...
- 06:53 PM Bug #53940 (Rejected): EC pool creation is setting min_size to K+1 instead of K
- For more information please check the RHCS bug - https://bugzilla.redhat.com/show_bug.cgi?id=2039585.
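For context on the k+1 default debated in the Bug #53940 entries above, a small worked example of the resulting pool parameters (defaults only; deployments can still override min_size):
```python
def ec_pool_defaults(k: int, m: int) -> dict:
    """Default sizing for an EC pool: size = k + m, min_size = k + 1.

    With min_size = k, a pool that has already lost m shards would keep
    accepting writes with no redundancy left; the k + 1 default keeps one
    shard of safety margin at the cost of pausing I/O one failure earlier.
    """
    return {"size": k + m, "min_size": k + 1}

# Example: a 4+2 profile yields size 6 and min_size 5.
print(ec_pool_defaults(4, 2))
```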
- 03:33 PM Bug #53923 (In Progress): [Upgrade] mgr FAILED to decode MSG_PGSTATS
- 02:07 PM Bug #44092 (Fix Under Review): mon: config commands do not accept whitespace style config name
- 01:55 PM Backport #53933 (In Progress): pacific: Stretch mode: peering can livelock with acting set change...
- 01:50 PM Backport #53933 (Resolved): pacific: Stretch mode: peering can livelock with acting set changes s...
- https://github.com/ceph/ceph/pull/44664
- 01:46 PM Bug #53824 (Pending Backport): Stretch mode: peering can livelock with acting set changes swappin...
01/18/2022
- 09:20 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Ceph OSD 33 Logs with grep unfound!
- 09:14 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Ceph PG query!
- 09:11 PM Bug #53924 (Need More Info): EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- ...
- 08:36 PM Bug #53923 (Resolved): [Upgrade] mgr FAILED to decode MSG_PGSTATS
- ...
- 05:42 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-01-15_05:47:18-rados-wip-yuri8-testing-2022-01-14-1551-distro-default-smithi/6619577
/a/yuriw-2022-01-...
- 04:23 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-01-14_23:22:09-rados-wip-yuri6-testing-2022-01-14-1207-distro-default-smithi/6617813
- 08:26 AM Bug #53910 (Closed): client: client session state stuck in opening and hang all the time
01/16/2022
- 08:40 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Do you need something else to find a workaround or the full solution?
Is there anything I can do?