Activity

From 01/16/2022 to 02/14/2022

02/14/2022

11:46 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2022-02-08_17:00:23-rados-wip-yuri5-testing-2022-02-08-0733-pacific-distro-default-smithi/6670360 Laura Flores
11:29 PM Bug #51234: LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: 0 vs 0
Pacific:
/a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672177
Laura Flores
08:21 PM Feature #54280 (Resolved): support truncation sequences in sparse reads
I've been working on sparse read support in the kclient, and got something working today, only to notice that after t... Jeff Layton
03:39 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
/a/yuriw-2022-02-11_22:59:19-rados-wip-yuri4-testing-2022-02-11-0858-distro-default-smithi/6677733
Last pg map bef...
Laura Flores
10:06 AM Bug #46847: Loss of placement information on OSD reboot
Could somebody please set the status back to open and Affected Versions to all? Frank Schilder

02/11/2022

11:01 PM Backport #52769 (Resolved): octopus: pg scrub stat mismatch with special objects that have hash '...
Igor Fedotov
10:41 PM Backport #52769: octopus: pg scrub stat mismatch with special objects that have hash 'ffffffff'
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/44978
merged
Yuri Weinstein
10:48 PM Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for ceph...
The following path has MGR logs, Mon logs, Cluster logs, audit logs, and system logs.... Vikhyat Umrao
10:39 PM Bug #54263 (Resolved): cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 327...
Pacific version - 16.2.7-34.el8cp
Quincy version - 17.0.0-10315-ga00e8b31
After doing some analysis it looks like...
Vikhyat Umrao
09:23 PM Bug #54262 (Closed): ERROR: test_cluster_info (tasks.cephfs.test_nfs.TestNFS)
Since the PR has not merged yet, no need to create a tracker https://github.com/ceph/ceph/pull/44911#issuecomment-103... Neha Ojha
08:48 PM Bug #54262 (Closed): ERROR: test_cluster_info (tasks.cephfs.test_nfs.TestNFS)
/a/yuriw-2022-02-11_18:38:05-rados-wip-yuri-testing-2022-02-09-1607-distro-default-smithi/6677099/... Kamoltat (Junior) Sirivadhna
09:17 PM Backport #53769 (Resolved): pacific: [ceph osd set noautoscale] Global on/off flag for PG autosca...
Kamoltat (Junior) Sirivadhna
08:52 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
Kamoltat (Junior) Sirivadhna
08:37 PM Bug #50089 (Fix Under Review): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing n...
https://github.com/ceph/ceph/pull/44993 Kamoltat (Junior) Sirivadhna
12:35 PM Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
I'm also encountering this issue on Pacific (16.2.7):... André Cruz
05:42 AM Bug #54255 (New): utc time is used when ceph crash ls
ceph crash id currently uses UTC time rather than local time,
which is a little confusing when debugging issues.
liqun zhang
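For anyone hitting this, the UTC timestamp embedded in a crash id can be converted to local time with standard tools; a minimal sketch, assuming GNU date and the usual <ISO-8601-UTC>_<uuid> crash id format (the timestamp below is only illustrative):
    # List recorded crashes; each id starts with a UTC timestamp, e.g. 2022-02-11T05:42:12.345678Z_<uuid>
    ceph crash ls
    # Convert the timestamp portion of an id to local time with GNU date
    date -d "2022-02-11T05:42:12Z"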
01:23 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
Hmm, I've just tried to get rid of... Niklas Hambuechen
12:18 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
Added the logs for OSD 12,23,24 part of pg 4.6b. I don't think the logs are from the beginning when the osd booted an... Daan van Gorkum

02/10/2022

06:52 PM Bug #54238: cephadm upgrade pacific to quincy -> causing osd's FULL/cascading failure
... Vikhyat Umrao
01:26 AM Bug #54238: cephadm upgrade pacific to quincy -> causing osd's FULL/cascading failure
The node e24-h01-000-r640 has a file - upgrade.txt from the following command:... Vikhyat Umrao
12:58 AM Bug #54238 (New): cephadm upgrade pacific to quincy -> causing osd's FULL/cascading failure
- Upgrade was started at 2022-02-08T01:54:28... Vikhyat Umrao
05:15 PM Backport #52771 (In Progress): nautilus: pg scrub stat mismatch with special objects that have ha...
https://github.com/ceph/ceph/pull/44981 Igor Fedotov
04:19 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
There was (I'll have to check in which Ceph versions) a bug, where setting noscrub or nodeepscrub at the "wrong"
tim...
Ronen Friedman
02:57 PM Backport #52769 (In Progress): octopus: pg scrub stat mismatch with special objects that have has...
https://github.com/ceph/ceph/pull/44978 Igor Fedotov
02:46 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Christian Rohmann wrote:
> Dieter, could you maybe describe your test setup a little more? How many instances of R...
Dieter Roels
01:34 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Christian Rohmann wrote:
> Dieter Roels wrote:
> > All inconsistencies were on non-primary shards, so we repaired t...
Christian Rohmann
01:17 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Dieter Roels wrote:
> Not sure if this helps or not, but we are experiencing very similar issues in our clusters the...
Christian Rohmann
01:12 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Christian Rohmann wrote:
> yite gu wrote:
> > This is inconsistent pg 7.2 from your upload files. It is look like m...
yite gu
12:17 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
yite gu wrote:
> This is inconsistent pg 7.2 from your upload files. It is look like mismatch osd is 10. So, you can...
Christian Rohmann
11:20 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
yite gu wrote:
> "shards": [
> {
> "osd": 10,
> "primary": false,
> "error...
yite gu
11:15 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
"shards": [
{
"osd": 10,
"primary": false,
"errors": [

...
yite gu
10:36 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Not sure if this helps or not, but we are experiencing very similar issues in our clusters the last few days.
We a...
Dieter Roels
10:06 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
yite gu wrote:
> Can you show me that primary osd log report when happen deep-scrub error?
> I hope to know which o...
Christian Rohmann
09:25 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Can you show me the primary osd log report from when the deep-scrub error happened?
I hope to know which osd shard had the error
yite gu
12:00 PM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
I installed the cluster using the "Manual Deployment" method (https://docs.ceph.com/en/pacific/install/manual-deploym... Niklas Hambuechen
08:50 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Matan Breizman wrote:
> Shu Yu wrote:
> >
> > Missing a message, PG 8.243 status
> > # ceph pg ls 8 | grep -w 8....
Shu Yu

02/09/2022

08:58 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
Prashant D
06:44 PM Backport #54233: octopus: devices: mon devices appear empty when scraping SMART metrics
please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/44960
ceph-backport.sh versi...
Benoît Knecht
02:36 PM Backport #54233 (Resolved): octopus: devices: mon devices appear empty when scraping SMART metrics
https://github.com/ceph/ceph/pull/44960 Backport Bot
06:41 PM Backport #54232: pacific: devices: mon devices appear empty when scraping SMART metrics
please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/44959
ceph-backport.sh versi...
Benoît Knecht
02:36 PM Backport #54232 (Resolved): pacific: devices: mon devices appear empty when scraping SMART metrics
https://github.com/ceph/ceph/pull/44959 Backport Bot
06:39 PM Bug #52416: devices: mon devices appear empty when scraping SMART metrics
Ah, indeed! I don't think I would have been able to change the status myself though, so thanks for doing it! Benoît Knecht
02:32 PM Bug #52416 (Pending Backport): devices: mon devices appear empty when scraping SMART metrics
Thanks, Benoît,
Once the status is changed to "Pending Backport" the bot should find it.
Yaarit Hatuka
10:09 AM Bug #52416: devices: mon devices appear empty when scraping SMART metrics
I'd like to backport this to Pacific and Octopus, but the Backport Bot didn't create the corresponding tickets; what ... Benoît Knecht
04:17 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
Laura Flores wrote:
> [...]
>
> Also seen in pacific:
> /a/yuriw-2022-02-05_22:51:11-rados-wip-yuri2-testing-202...
Laura Flores

02/08/2022

08:13 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
/a/yuriw-2022-02-05_22:51:11-rados-wip-yuri2-testing-2022-02-04-1646-pacific-distro-default-smithi/6663906
last pg...
Laura Flores
07:21 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
Junior, maybe you have an idea of what's going on? Laura Flores
07:20 PM Bug #54210 (Resolved): pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get ...
... Laura Flores
03:25 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
I did run a manual deep-scrub on another inconsistent PG as well, you'll find the logs of all OSDs handling this PG i... Christian Rohmann
03:05 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Neha - I did upload the logs of a deep-scrub via ceph-post-file: 1e5ff0f8-9b76-4489-8529-ee5e6f246093
There is a lit...
Christian Rohmann
01:49 PM Bug #45457: CEPH Graylog Logging Missing "host" Field
So is this going to be backported to Pacific? Jan-Philipp Litza
03:08 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
Some more information on the non-default config settings (excluding MGR):... Daan van Gorkum
03:02 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
One more thing to add is that when I set the noscrub, nodeep-scrub flags the pgs actually don't stop scrubbing either... Daan van Gorkum
12:16 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
Correction. osd_recovery_sleep_hdd was set to 0.0 from the original 0.1. osd_scrub_sleep has been untouched. Daan van Gorkum
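For reference, a minimal sketch of how the settings discussed in this thread can be inspected and restored at runtime; osd.12 is only an example daemon id, and 0.1 is the original value mentioned above:
    # Current values, cluster-wide and on one example daemon (run the daemon command on its host)
    ceph config get osd osd_recovery_sleep_hdd
    ceph config get osd osd_scrub_sleep
    ceph daemon osd.12 config get osd_scrub_sleep
    # Restore the previous HDD recovery sleep if desired
    ceph config set osd osd_recovery_sleep_hdd 0.1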
12:12 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
This is set to 0.0. I think we did this to speed up recovery after we did some CRUSH tuning.... Daan van Gorkum

02/07/2022

11:33 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
PG 7.dc4 - all osd logs. Vikhyat Umrao
11:14 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- PG query:... Vikhyat Umrao
09:31 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
Vikhyat Umrao wrote:
> Vikhyat Umrao wrote:
> > - This was reproduced again today
>
> As this issue is random we...
Vikhyat Umrao
08:30 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
Vikhyat Umrao wrote:
> - This was reproduced again today
As this issue is random we did not have debug logs from ...
Vikhyat Umrao
08:27 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- This was reproduced again today... Vikhyat Umrao
11:17 PM Bug #54188 (Resolved): Setting too many PGs leads to error handling overflow
This happened on gibba001:... Mark Nelson
11:15 PM Bug #54166: ceph version 15.2.15, osd configuration osd_op_num_shards_ssd or osd_op_num_threads_p...
Sridhar, can you please take a look? Neha Ojha
11:12 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
What is osd_scrub_sleep set to?
Ronen, this sounds similar to one of the issues you were looking into, here it is ...
Neha Ojha
02:46 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
Additional information:
I tried disabling all client I/O and after that there's zero I/O on the devices hosting th...
Daan van Gorkum
02:45 AM Bug #54172 (Resolved): ceph version 16.2.7 PG scrubs not progressing
A week ago I upgraded a 16.2.4 cluster (3 nodes, 33 osds) to 16.2.7 using cephadm, and since then we're experiencin... Daan van Gorkum
10:57 PM Bug #53751 (Need More Info): "N monitors have not enabled msgr2" is always shown for new clusters
Can you share the output of "ceph mon dump"? And how did you install this cluster? We are not seeing this issue in 16... Neha Ojha
10:39 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Massive thanks for your reply Neha, I greatly appreciate it!
Neha Ojha wrote:
> Is it possible for you to trigg...
Christian Rohmann
10:31 PM Bug #53663 (Need More Info): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadat...
Is it possible for you to trigger a deep-scrub on one PG (with debug_osd=20,debug_ms=1), let it go into inconsistent ... Neha Ojha
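A minimal sketch of what such a capture could look like, using pg 7.2 and osd.10 from the earlier comments purely as placeholders (the exact procedure Neha has in mind may differ):
    # Raise logging on the OSDs hosting the PG, trigger a deep scrub, then revert to defaults
    ceph tell osd.10 config set debug_osd 20
    ceph tell osd.10 config set debug_ms 1
    ceph pg deep-scrub 7.2
    ceph tell osd.10 config set debug_osd 1/5
    ceph tell osd.10 config set debug_ms 0/5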
03:52 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
The issue is still happening:
1) Find all pools with scrub errors via...
Christian Rohmann
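A minimal sketch of one way to enumerate the affected PGs and objects, with <pool> and <pgid> as placeholders:
    # Pools/PGs currently flagged inconsistent
    ceph health detail | grep -i inconsistent
    rados list-inconsistent-pg <pool>
    # Per-object detail for one inconsistent PG
    rados list-inconsistent-obj <pgid> --format=json-pretty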
10:23 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
We can include clear_shards_repaired in master and backport it. Neha Ojha
04:47 PM Bug #54182 (New): OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
The newly added warning OSD_TOO_MANY_REPAIRS (https://tracker.ceph.com/issues/41564) is raised on a certain count of ... Christian Rohmann
10:14 PM Bug #46847: Loss of placement information on OSD reboot
Can you share your ec profile and the output of "ceph osd pool ls detail"? Neha Ojha
12:35 AM Bug #46847: Loss of placement information on OSD reboot
So in even more fun news, I created the EC pool according to the instructions provided in the documentation.
It's...
Malcolm Haak
12:17 AM Bug #46847: Loss of placement information on OSD reboot
Sorry I should add some context/data
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stab...
Malcolm Haak
06:58 PM Bug #54180 (In Progress): In some cases osdmaptool takes forever to complete
Neha Ojha
02:44 PM Bug #54180 (Resolved): In some cases osdmaptool takes forever to complete
With the attached file, run the command:
osdmaptool osdmap.GD.bin --upmap out-file --upmap-deviation 1 --upmap-pool d...
Josh Salomon
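For context, a hedged sketch of the workflow around that command; <pool> is a placeholder, and out-file ends up containing ceph osd pg-upmap-items commands that can be applied with source:
    # Export the current osdmap, compute upmaps for one pool, then apply the generated commands
    ceph osd getmap -o osdmap.GD.bin
    osdmaptool osdmap.GD.bin --upmap out-file --upmap-deviation 1 --upmap-pool <pool>
    source out-file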
05:52 PM Bug #53806 (Fix Under Review): unnecessarily long laggy PG state
Josh Durgin

02/06/2022

11:57 PM Bug #46847: Loss of placement information on OSD reboot
Neha Ojha wrote:
> Is this issue reproducible in Octopus or later?
Yes. I hit it last night. It's minced one of m...
Malcolm Haak
12:28 PM Bug #54166 (New): ceph version 15.2.15, osd configuration osd_op_num_shards_ssd or osd_op_num_thr...
Configure osd_op_num_shards_ssd=8 or osd_op_num_threads_per_shard_ssd=8 in ceph.conf, use ceph daemon osd.x config ... chao zhang

02/04/2022

06:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
http://pulpito.front.sepia.ceph.com/lflores-2022-01-31_19:11:11-rados:thrash-erasure-code-big-master-distro-default-s... Laura Flores
02:03 PM Backport #53480 (In Progress): pacific: Segmentation fault under Pacific 16.2.1 when using a cust...
Adam Kupczyk
12:00 AM Bug #53757 (Fix Under Review): I have a rados object whose data size is 0, and this object has a ...
Neha Ojha

02/03/2022

09:26 PM Backport #53974 (Resolved): quincy: BufferList.rebuild_aligned_size_and_memory failure
Ilya Dryomov
07:08 PM Backport #53974 (In Progress): quincy: BufferList.rebuild_aligned_size_and_memory failure
https://github.com/ceph/ceph/pull/44891 Casey Bodley
07:46 PM Bug #23117 (In Progress): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been ...
Prashant D
07:50 AM Bug #54122 (Fix Under Review): Validate monitor ID provided with ok-to-stop similar to ok-to-rm
Nitzan Mordechai
07:49 AM Bug #54122 (Resolved): Validate monitor ID provided with ok-to-stop similar to ok-to-rm
ceph mon ok-to-stop doesn't validate the monitor ID provided, and thus returns "quorum should be preserved" without... Nitzan Mordechai
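For reference, a minimal sketch of the command in question; "a" stands for an existing monitor id and "xyz" for a non-existent one, both placeholders:
    # Intended use: check whether stopping an existing monitor would still preserve quorum
    ceph mon ok-to-stop a
    # The reported bug: before the fix, a bogus id like this was not rejected
    ceph mon ok-to-stop xyz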

02/02/2022

05:25 PM Backport #53551: pacific: [RFE] Provide warning when the 'require-osd-release' flag does not matc...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44259
merged
Yuri Weinstein
02:04 PM Feature #54115 (In Progress): Log pglog entry size in OSD log if it exceeds certain size limit
Even after all PGs are active+clean, we see some OSDs consuming a high amount of memory. From dump_mempools, osd_pglog con... Prashant D
10:32 AM Bug #51002 (Resolved): regression in ceph daemonperf command output, osd columns aren't visible a...
Igor Fedotov
10:32 AM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
Igor Fedotov
12:02 AM Backport #51172: pacific: regression in ceph daemonperf command output, osd columns aren't visibl...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44175
merged
Yuri Weinstein
12:04 AM Backport #53702: pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44387
merged
Yuri Weinstein

02/01/2022

10:12 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/44181
merged
Yuri Weinstein
08:42 PM Backport #53486: pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44202
merged
Yuri Weinstein
08:40 PM Backport #51150: pacific: When read failed, ret can not take as data len, in FillInVerifyExtent
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44173
merged
Yuri Weinstein
08:40 PM Backport #53388: pacific: pg-temp entries are not cleared for PGs that no longer exist
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44096
merged
Yuri Weinstein
05:23 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
I will collect these logs as you've requested. As an update: I am now seeing snaptrim occurring automatically without... David Prude
04:27 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
Yes that is the case.
Can you collect the log starting at the manual repeer? The intent was to capture the logs s...
Christopher Hoffman
03:23 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
Christopher Hoffman wrote:
> 1. A single OSD will be fine, just ensure it is one exhibiting the issue.
> 2. Can you...
David Prude
02:41 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
David Prude wrote:
> Christopher Hoffman wrote:
> > Can you collect and share OSD logs (with debug_osd=20 and debug...
Christopher Hoffman
12:01 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
Christopher Hoffman wrote:
> Can you collect and share OSD logs (with debug_osd=20 and debug_ms=1) when you are enco...
David Prude
11:38 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
Neha Ojha wrote:
> Maybe you are missing the square brackets when specifying the mon_host like in https://docs.ceph....
Niklas Hambuechen
12:12 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
Maybe you are missing the square brackets when specifying the mon_host like in https://docs.ceph.com/en/pacific/rados... Neha Ojha
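A minimal example of the bracketed mon_host form being referred to, with placeholder addresses (see the linked docs for the authoritative syntax):
    # ceph.conf: brackets group each monitor's v2 and v1 addresses
    [global]
    mon_host = [v2:192.168.0.1:3300,v1:192.168.0.1:6789],[v2:192.168.0.2:3300,v1:192.168.0.2:6789]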
08:54 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Neha Ojha wrote:
> Are you using filestore or bluestore?
On bluestore
Christian Rohmann
12:20 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
Are you using filestore or bluestore? Neha Ojha
02:14 AM Bug #44184: Slow / Hanging Ops after pool creation
Neha Ojha wrote:
> So on both occasions this crash was a side effect of new pool creation? Can you provide the outpu...
Ist Gab
12:28 AM Bug #53667 (Fix Under Review): osd cannot be started after being set to stop
Neha Ojha

01/31/2022

11:59 PM Bug #44184: Slow / Hanging Ops after pool creation
Ist Gab wrote:
> Neha Ojha wrote:
> > Are you still seeing this problem? Will you be able to provide debug data aro...
Neha Ojha
11:53 PM Bug #54005 (Duplicate): Why can wrong parameters be specified when creating erasure-code-profile,...
Neha Ojha
11:38 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
/a/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642148 Laura Flores
11:35 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
/a/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642122 Laura Flores
10:56 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
/a/yuriw-2022-01-27_15:09:25-rados-wip-yuri6-testing-2022-01-26-1547-distro-default-smithi/6644093... Laura Flores
11:09 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
/a/yuriw-2022-01-27_15:09:25-rados-wip-yuri6-testing-2022-01-26-1547-distro-default-smithi/6644223 Laura Flores
11:07 PM Bug #53326 (Resolved): pgs wait for read lease after osd start
Vikhyat Umrao
10:21 PM Bug #51433 (Resolved): mgr spamming with repeated set pgp_num_actual while merging
nautilus is EOL Neha Ojha
10:20 PM Backport #53876 (Resolved): pacific: pgs wait for read lease after osd start
Neha Ojha
10:12 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
/a/yuriw-2022-01-27_14:57:16-rados-wip-yuri-testing-2022-01-26-1810-pacific-distro-default-smithi/6643449 Laura Flores
09:27 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
Can you collect and share OSD logs (with debug_osd=20 and debug_ms=1) when you are encountering this issue? Christopher Hoffman
03:14 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
We are also seeing this issue on *16.2.5*. We schedule cephfs snapshots via cron in a 24h7d2w rotation schedule. Over... David Prude
07:52 PM Backport #54082: pacific: mon: osd pool create <pool-name> with --bulk flag
pull request: https://github.com/ceph/ceph/pull/44847 Kamoltat (Junior) Sirivadhna
07:51 PM Backport #54082 (Resolved): pacific: mon: osd pool create <pool-name> with --bulk flag
Backporting https://github.com/ceph/ceph/pull/44241 to pacific Kamoltat (Junior) Sirivadhna
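A hedged usage sketch of the flag being backported, assuming the syntax from the referenced PR; mypool is a placeholder:
    # Create a pool pre-flagged as bulk so the autoscaler starts it with a larger PG count
    ceph osd pool create mypool --bulk
    # The flag can also be toggled and inspected on an existing pool
    ceph osd pool set mypool bulk true
    ceph osd pool get mypool bulk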
07:32 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
Happening in Pacific too:
/a/yuriw-2022-01-27_14:57:16-rados-wip-yuri-testing-2022-01-26-1810-pacific-distro-defau...
Laura Flores
02:38 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
Shu Yu wrote:
>
> Missing a message, PG 8.243 status
> # ceph pg ls 8 | grep -w 8.243
> 8.243 0 0 ...
Matan Breizman
12:18 PM Backport #53660 (Resolved): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44544
m...
Loïc Dachary
12:18 PM Backport #53943 (Resolved): octopus: mon: all mon daemon always crash after rm pool
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44700
m...
Loïc Dachary
12:18 PM Backport #53534 (Resolved): octopus: mon: mgrstatmonitor spams mgr with service_map
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44722
m...
Loïc Dachary
12:17 PM Backport #53877 (Resolved): octopus: pgs wait for read lease after osd start
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44585
m...
Loïc Dachary
12:17 PM Backport #53701 (Resolved): octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in bac...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43438
m...
Loïc Dachary
12:17 PM Backport #52833 (Resolved): octopus: osd: pg may get stuck in backfill_toofull after backfill is ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43438
m...
Loïc Dachary

01/28/2022

05:17 PM Backport #54048 (Rejected): octopus: [RFE] Add health warning in ceph status for filestore OSDs
Original tracker was accidentally marked for pending backport. We are not backporting related PR to pre-quincy. Marki... Prashant D
05:16 PM Backport #54047 (Rejected): nautilus: [RFE] Add health warning in ceph status for filestore OSDs
Original tracker was accidentally marked for pending backport. We are not backporting related PR to pre-quincy. Marki... Prashant D
05:14 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
It was accidentally marked as pending backport. We are not backporting PR for this tracker to pre-quincy. Marking it ... Prashant D
01:41 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
@Dan, I think maybe this is a copy-paste issue? Konstantin Shalygin
01:24 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
Why is this being backported to N and O?!
Filestore is deprecated since quincy, so we should only warn in quincy a...
Dan van der Ster
05:01 PM Backport #53978: quincy: [RFE] Limit slow request details to mgr log
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44764
merged
Yuri Weinstein
10:08 AM Bug #54050: OSD: move message to cluster log when osd hitting the pg hard limit
PR: https://github.com/ceph/ceph/pull/44821 dongdong tao
10:02 AM Bug #54050 (Closed): OSD: move message to cluster log when osd hitting the pg hard limit
The OSD will print the message below if a pg creation has hit the hard limit on the max number of pgs per osd.
---
202...
dongdong tao
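The hard limit mentioned above is derived from two config options; a minimal sketch for checking them (the defaults noted in the comments reflect my understanding, not this report):
    # Effective per-OSD PG hard limit is roughly mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio
    ceph config get mon mon_max_pg_per_osd             # default 250
    ceph config get osd osd_max_pg_per_osd_hard_ratio  # default 3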
05:50 AM Bug #52657 (In Progress): MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, ...
Aishwarya Mathuria

01/27/2022

06:36 PM Backport #54048 (Rejected): octopus: [RFE] Add health warning in ceph status for filestore OSDs
Backport Bot
06:36 PM Backport #54047 (Rejected): nautilus: [RFE] Add health warning in ceph status for filestore OSDs
Backport Bot
06:33 PM Feature #49275 (Pending Backport): [RFE] Add health warning in ceph status for filestore OSDs
Prashant D
06:33 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
Prashant D
03:07 PM Bug #53729: ceph-osd takes all memory before oom on boot
Gonzalo Aguilar Delgado wrote:
> Hi,
>
> Nothing a script can't do:
>
> > ceph osd pool ls | xargs -n1 -istr ...
Mark Nelson
09:33 AM Bug #53729: ceph-osd takes all memory before oom on boot
Mark Nelson wrote:
> In the mean time, Neha mentioned that you might be able to prevent the pgs from splitting by tu...
Gonzalo Aguilar Delgado
09:24 AM Bug #53729: ceph-osd takes all memory before oom on boot
Mark Nelson wrote:
> Hi Gonzalo,
>
> I'm not an expert regarding this code so please take my reply here with a gr...
Gonzalo Aguilar Delgado
02:28 PM Bug #53327 (Fix Under Review): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_f...
Nitzan Mordechai
12:05 AM Backport #53660: octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44544
merged
Yuri Weinstein
12:04 AM Backport #53943: octopus: mon: all mon daemon always crash after rm pool
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44700
merged
Yuri Weinstein

01/26/2022

11:54 PM Backport #53534: octopus: mon: mgrstatmonitor spams mgr with service_map
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44722
merged
Yuri Weinstein
11:39 PM Backport #53769: pacific: [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/44540
merged
Yuri Weinstein
08:53 PM Bug #53729: ceph-osd takes all memory before oom on boot
In the mean time, Neha mentioned that you might be able to prevent the pgs from splitting by turning off the autoscal... Mark Nelson
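A minimal sketch of the two ways the autoscaler can be switched off, per pool or (on releases that carry the noautoscale flag from #51213) cluster-wide; <pool> is a placeholder:
    # Per-pool
    ceph osd pool set <pool> pg_autoscale_mode off
    # Cluster-wide, where the noautoscale flag is available
    ceph osd set noautoscale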
08:34 PM Bug #53729: ceph-osd takes all memory before oom on boot
Hi Gonzalo,
I'm not an expert regarding this code so please take my reply here with a grain of salt (and others pl...
Mark Nelson
05:26 PM Bug #53729: ceph-osd takes all memory before oom on boot
How can I help to accelerate a bugfix or workaround?
If you comment your investigations, I can build a docker image t...
Gonzalo Aguilar Delgado
04:16 PM Bug #53326: pgs wait for read lease after osd start
https://github.com/ceph/ceph/pull/44585 merged Yuri Weinstein
12:27 AM Bug #53326: pgs wait for read lease after osd start
https://github.com/ceph/ceph/pull/44584 merged Yuri Weinstein
04:14 PM Backport #53701: octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
Mykola Golub wrote:
> PR: https://github.com/ceph/ceph/pull/43438
merged
Yuri Weinstein
04:14 PM Backport #52833: octopus: osd: pg may get stuck in backfill_toofull after backfill is interrupted...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43438
merged
Yuri Weinstein
12:06 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Igor Fedotov wrote:
> I doubt anyone can say what setup would be good for you without experiments in the field. M...
Ist Gab
12:04 PM Bug #44184: Slow / Hanging Ops after pool creation
Neha Ojha wrote:
> Are you still seeing this problem? Will you be able to provide debug data around this issue?
H...
Ist Gab
12:47 AM Bug #45318 (New): Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log r...
Octopus still has this issue /a/yuriw-2022-01-24_18:01:47-rados-wip-yuri10-testing-2022-01-24-0810-octopus-distro-def... Neha Ojha

01/25/2022

05:40 PM Bug #50608 (Need More Info): ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Neha Ojha
05:36 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
Here is a representative run (wip-dis-testing is essentially master):
https://pulpito.ceph.com/dis-2022-01-25_16:1...
Ilya Dryomov
12:50 AM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
Ilya Dryomov wrote:
> This has been bugging the rbd suite for a while. I don't think messenger failure injection is...
Neha Ojha
01:34 PM Bug #53327 (In Progress): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_s...
Nitzan Mordechai
10:39 AM Backport #53944 (In Progress): pacific: [RFE] Limit slow request details to mgr log
Prashant D
09:01 AM Backport #53978 (In Progress): quincy: [RFE] Limit slow request details to mgr log
Prashant D
08:51 AM Bug #54005 (Duplicate): Why can wrong parameters be specified when creating erasure-code-profile,...
My osd tree is like below:
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-7 0....
wang kevin
08:45 AM Bug #54004 (Rejected): When creating erasure-code-profile incorrectly set parameters, it can be c...
My osd tree is like below:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-7 0.19498 root mytest...
wang kevin

01/24/2022

11:19 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
This has been bugging the rbd suite for a while. I don't think messenger failure injection is the problem because th... Ilya Dryomov
11:12 PM Bug #53327 (New): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_...
Neha Ojha
10:59 PM Bug #53940 (Rejected): EC pool creation is setting min_size to K+1 instead of K
As discussed offline, we should revisit our recovery test coverage for various EC profiles, but closing this issue. Neha Ojha
10:56 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
Neha Ojha
10:44 PM Bug #44184: Slow / Hanging Ops after pool creation
Ist Gab wrote:
> Neha Ojha wrote:
>
> > Which version are you using?
>
> Octopus 15.2.14
Are you still seei...
Neha Ojha
10:39 PM Bug #52535 (Need More Info): monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ...
Neha Ojha
10:37 PM Bug #48997 (Can't reproduce): rados/singleton/all/recovery-preemption: defer backfill|defer recov...
Neha Ojha
10:36 PM Bug #50106 (Can't reproduce): scrub/osd-scrub-repair.sh: corrupt_scrub_erasure: return 1
Neha Ojha
10:36 PM Bug #50245 (Can't reproduce): TEST_recovery_scrub_2: Not enough recovery started simultaneously
Neha Ojha
10:35 PM Bug #49961 (Can't reproduce): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub_1 failed
Neha Ojha
10:35 PM Bug #46847 (Need More Info): Loss of placement information on OSD reboot
Is this issue reproducible in Octopus or later? Neha Ojha
10:32 PM Bug #50462 (Won't Fix - EOL): OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.co...
Please feel free to reopen if you see the issue in a recent version of Ceph.
Neha Ojha
10:31 PM Bug #49688 (Can't reproduce): FAILED ceph_assert(is_primary()) in submit_log_entries during Promo...
Neha Ojha
10:30 PM Bug #48028 (Won't Fix - EOL): ceph-mon always suffer lots of slow ops from v14.2.9
Please feel free to reopen if you see the issue in a recent version of Ceph. Neha Ojha
10:29 PM Bug #50512 (Won't Fix - EOL): upgrade:nautilus-p2p-nautilus: unhandled event in ToDelete
Neha Ojha
10:29 PM Bug #50473 (Can't reproduce): ceph_test_rados_api_lock_pp segfault in librados::v14_2_0::RadosCli...
Neha Ojha
10:28 PM Bug #50242 (Can't reproduce): test_repair_corrupted_obj fails with assert not inconsistent
Neha Ojha
10:28 PM Bug #50119 (Can't reproduce): Invalid read of size 4 in ceph::logging::Log::dump_recent()
Neha Ojha
10:26 PM Bug #47153 (Won't Fix - EOL): monitor crash during upgrade due to LogSummary encoding changes bet...
Neha Ojha
10:26 PM Bug #49523: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Haven't seen this in recent runs. Neha Ojha
10:24 PM Bug #49463 (Can't reproduce): qa/standalone/misc/rados-striper.sh: Caught signal in thread_name:r...
Neha Ojha
10:14 PM Bug #53910 (Closed): client: client session state stuck in opening and hang all the time
Neha Ojha
10:43 AM Bug #47273: ceph report missing osdmap_clean_epochs if answered by peon
> Is it possible that this is related?
I'm not sure, but I guess not.
I think this bug is rather about not forwa...
Dan van der Ster
06:05 AM Bug #52486 (Pending Backport): test tracker: please ignore
Deepika Upadhyay

01/22/2022

12:06 AM Backport #53978 (Resolved): quincy: [RFE] Limit slow request details to mgr log
https://github.com/ceph/ceph/pull/44764 Backport Bot
12:05 AM Backport #53977 (Rejected): quincy: mon: all mon daemon always crash after rm pool
Backport Bot
12:05 AM Backport #53974 (Resolved): quincy: BufferList.rebuild_aligned_size_and_memory failure
Backport Bot

01/21/2022

07:30 PM Backport #53972 (Resolved): pacific: BufferList.rebuild_aligned_size_and_memory failure
Backport Bot
07:25 PM Backport #53971 (Resolved): octopus: BufferList.rebuild_aligned_size_and_memory failure
Backport Bot
07:22 PM Bug #53969 (Pending Backport): BufferList.rebuild_aligned_size_and_memory failure
Neha Ojha
07:15 PM Bug #53969 (Fix Under Review): BufferList.rebuild_aligned_size_and_memory failure
Neha Ojha
07:14 PM Bug #53969 (Resolved): BufferList.rebuild_aligned_size_and_memory failure
... Neha Ojha
06:59 PM Bug #45345 (Can't reproduce): tasks/rados.py fails with "psutil.NoSuchProcess: psutil.NoSuchProce...
Neha Ojha
06:58 PM Bug #45318 (Can't reproduce): Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in c...
Neha Ojha
06:56 PM Bug #38375: OSD segmentation fault on rbd create
I do not have the files to reupload so might be worth closing this out as I have moved on to another release and this... Ryan Farrington
06:53 PM Bug #43553 (Can't reproduce): mon: client mon_status fails
Neha Ojha
06:49 PM Bug #43048 (Won't Fix - EOL): nautilus: upgrade/mimic-x/stress-split: failed to recover before ti...
Neha Ojha
06:48 PM Bug #42102 (Can't reproduce): use-after-free in Objecter timer handing
Neha Ojha
06:43 PM Bug #40521 (Can't reproduce): cli timeout (e.g., ceph pg dump)
Neha Ojha
06:38 PM Bug #23911 (Won't Fix - EOL): ceph:luminous: osd out/down when setup with ubuntu/bluestore
Neha Ojha
06:37 PM Bug #20952 (Can't reproduce): Glitchy monitor quorum causes spurious test failure
Neha Ojha
06:36 PM Bug #14115 (Can't reproduce): crypto: race in nss init
Neha Ojha
06:36 PM Bug #13385 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: NSS ...
Neha Ojha
06:35 PM Bug #11235 (Can't reproduce): test_rados.py test_aio_read is racy
Neha Ojha
05:24 PM Backport #53534 (In Progress): octopus: mon: mgrstatmonitor spams mgr with service_map
Cory Snyder
05:22 PM Backport #53535 (In Progress): pacific: mon: mgrstatmonitor spams mgr with service_map
Cory Snyder
03:55 PM Bug #47273: ceph report missing osdmap_clean_epochs if answered by peon
I am also seeing this behavior on the latest Octopus and Pacific releases.
The reason I'm looking is that I'm seei...
Steve Taylor

01/20/2022

10:24 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
Laura Flores wrote:
> Thanks for this info, Dan. We have held off on making a change to min_size, and we're currentl...
Vikhyat Umrao
08:16 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
Thanks for this info, Dan. We have held off on making a change to min_size, and we're currently discussing ways to en... Laura Flores
07:27 PM Backport #53943 (In Progress): octopus: mon: all mon daemon always crash after rm pool
Cory Snyder
06:44 PM Backport #53942 (In Progress): pacific: mon: all mon daemon always crash after rm pool
Cory Snyder
06:29 AM Bug #53910: client: client session state stuck in opening and hang all the time
Sorry, close this issue please. Ivan Guan
02:00 AM Backport #53944 (Resolved): pacific: [RFE] Limit slow request details to mgr log
https://github.com/ceph/ceph/pull/44771 Backport Bot
01:21 AM Feature #52424 (Pending Backport): [RFE] Limit slow request details to mgr log
Vikhyat Umrao
01:13 AM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
We have marked the primary OSD.33 down [1] and it has helped the stuck recovery_unfound pg to get unstuck and recover... Vikhyat Umrao

01/19/2022

11:18 PM Bug #53855: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
Myoungwon Oh: any ideas on this bug?
Neha Ojha
11:15 PM Bug #53875 (Duplicate): AssertionError: wait_for_recovery: failed before timeout expired due to d...
Neha Ojha
11:15 PM Backport #53943 (Resolved): octopus: mon: all mon daemon always crash after rm pool
https://github.com/ceph/ceph/pull/44700 Backport Bot
11:10 PM Backport #53942 (Resolved): pacific: mon: all mon daemon always crash after rm pool
https://github.com/ceph/ceph/pull/44698 Backport Bot
11:09 PM Bug #53910 (Need More Info): client: client session state stuck in opening and hang all the time
Can you provide more details about this bug? Neha Ojha
11:05 PM Bug #53740 (Pending Backport): mon: all mon daemon always crash after rm pool
Neha Ojha
09:00 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
Looks like the last time the PG was active was at "2022-01-18T17:38:23.338"... Neha Ojha
07:26 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
For history, here's where the default was set to k+1.
https://github.com/ceph/ceph/pull/8008/commits/48e40fcde7b19...
Dan van der Ster
06:53 PM Bug #53940 (Rejected): EC pool creation is setting min_size to K+1 instead of K
For more information please check the RHCS bug - https://bugzilla.redhat.com/show_bug.cgi?id=2039585. Vikhyat Umrao
03:33 PM Bug #53923 (In Progress): [Upgrade] mgr FAILED to decode MSG_PGSTATS
Neha Ojha
02:07 PM Bug #44092 (Fix Under Review): mon: config commands do not accept whitespace style config name
Patrick Donnelly
01:55 PM Backport #53933 (In Progress): pacific: Stretch mode: peering can livelock with acting set change...
Greg Farnum
01:50 PM Backport #53933 (Resolved): pacific: Stretch mode: peering can livelock with acting set changes s...
https://github.com/ceph/ceph/pull/44664 Backport Bot
01:46 PM Bug #53824 (Pending Backport): Stretch mode: peering can livelock with acting set changes swappin...
Greg Farnum

01/18/2022

09:20 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
Ceph OSD 33 Logs with grep unfound! Vikhyat Umrao
09:14 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
Ceph PG query! Vikhyat Umrao
09:11 PM Bug #53924 (Need More Info): EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
... Vikhyat Umrao
08:36 PM Bug #53923 (Resolved): [Upgrade] mgr FAILED to decode MSG_PGSTATS
... Vikhyat Umrao
05:42 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
/a/yuriw-2022-01-15_05:47:18-rados-wip-yuri8-testing-2022-01-14-1551-distro-default-smithi/6619577
/a/yuriw-2022-01-...
Laura Flores
04:23 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2022-01-14_23:22:09-rados-wip-yuri6-testing-2022-01-14-1207-distro-default-smithi/6617813 Laura Flores
08:26 AM Bug #53910 (Closed): client: client session state stuck in opening and hang all the time
Ivan Guan

01/16/2022

08:40 PM Bug #53729: ceph-osd takes all memory before oom on boot
Do you need something else to find a workaround or the full solution?
Is there anything I can do?
Gonzalo Aguilar Delgado
 
