Activity
From 01/09/2022 to 02/07/2022
02/07/2022
- 11:33 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- PG 7.dc4 - all osd logs.
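For reference, a minimal sketch of how the state discussed in this thread can be gathered (PG id taken from this report; assumes a recent Octopus/Pacific-era CLI):
    ceph pg 7.dc4 query          # full peering and recovery state of the PG
    ceph pg 7.dc4 list_unfound   # objects the PG currently considers unfound
    ceph health detail           # cluster-wide summary of unfound/degraded objects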
- 11:14 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- - PG query:...
- 09:31 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Vikhyat Umrao wrote:
> Vikhyat Umrao wrote:
> > - This was reproduced again today
>
> As this issue is random we...
- 08:30 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Vikhyat Umrao wrote:
> - This was reproduced again today
As this issue is random we did not have debug logs from ...
- 08:27 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- - This was reproduced again today...
- 11:17 PM Bug #54188 (Resolved): Setting too many PGs leads error handling overflow
- This happened on gibba001:...
- 11:15 PM Bug #54166: ceph version 15.2.15, osd configuration osd_op_num_shards_ssd or osd_op_num_threads_p...
- Sridhar, can you please take a look?
- 11:12 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- What is osd_scrub_sleep set to?
Ronen, this sounds similar to one of the issues you were looking into, here it is ...
- 02:46 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Additional information:
I tried disabling all client I/O and after that there's zero I/O on the devices hosting th...
- 02:45 AM Bug #54172 (Resolved): ceph version 16.2.7 PG scrubs not progressing
- A week ago I've upgraded a 16.2.4 cluster (3 nodes, 33 osds) to 16.2.7 using cephadm and since then we're experiencin...
- 10:57 PM Bug #53751 (Need More Info): "N monitors have not enabled msgr2" is always shown for new clusters
- Can you share the output of "ceph mon dump"? And how did you install this cluster? We are not seeing this issue in 16...
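A sketch of the check being requested here; every monitor should list a v2 (port 3300) address in the dump:
    ceph mon dump
    ceph mon enable-msgr2   # only if the mons are still v1-only and all daemons run Nautilus or later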
- 10:39 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Massive thanks for your reply Neha, I greatly appreciate it!
Neha Ojha wrote:
> Is it possible for you to trigg...
- 10:31 PM Bug #53663 (Need More Info): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadat...
- Is it possible for you to trigger a deep-scrub on one PG (with debug_osd=20,debug_ms=1), let it go into inconsistent ...
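Roughly, that capture would look like the following sketch (OSD ids and the PG id are placeholders for the acting set of the inconsistent PG):
    ceph tell osd.<id> config set debug_osd 20
    ceph tell osd.<id> config set debug_ms 1
    ceph pg deep-scrub <pgid>
    # revert the debug levels on the same OSDs once the scrub has completed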
- 03:52 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- The issue is still happening:
1) Find all pools with scrub errors via...
- 10:23 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- We can include clear_shards_repaired in master and backport it.
- 04:47 PM Bug #54182 (New): OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- The newly added warning OSD_TOO_MANY_REPAIRS (https://tracker.ceph.com/issues/41564) is raised on a certain count of ...
- 10:14 PM Bug #46847: Loss of placement information on OSD reboot
- Can you share your ec profile and the output of "ceph osd pool ls detail"?
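A sketch of the commands that produce the requested output (the profile name is whatever the EC pool references):
    ceph osd pool ls detail
    ceph osd erasure-code-profile ls
    ceph osd erasure-code-profile get <profile-name>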
- 12:35 AM Bug #46847: Loss of placement information on OSD reboot
- So in even more fun news, I created the EC pool according to the instructions provided in the documentation.
It's...
- 12:17 AM Bug #46847: Loss of placement information on OSD reboot
- Sorry I should add some context/data
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stab...
- 06:58 PM Bug #54180 (In Progress): In some cases osdmaptool takes forever to complete
- 02:44 PM Bug #54180 (Resolved): In some cases osdmaptool takes forever to complete
- with the attached file run the command:
osdmaptool osdmap.GD.bin --upmap out-file --upmap-deviation 1 --upmap-pool d...
- 05:52 PM Bug #53806 (Fix Under Review): unnecessarily long laggy PG state
02/06/2022
- 11:57 PM Bug #46847: Loss of placement information on OSD reboot
- Neha Ojha wrote:
> Is this issue reproducible in Octopus or later?
Yes. I hit it last night. It's minced one of m...
- 12:28 PM Bug #54166 (New): ceph version 15.2.15, osd configuration osd_op_num_shards_ssd or osd_op_num_thr...
- Configure osd_op_num_shards_ssd=8 or osd_op_num_threads_per_shard_ssd=8 in ceph.conf, use ceph daemon osd.x config ...
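A sketch of how the effective values can be checked on a running OSD (osd id is illustrative; run on the host carrying that daemon):
    ceph daemon osd.0 config get osd_op_num_shards_ssd
    ceph daemon osd.0 config get osd_op_num_threads_per_shard_ssd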
02/04/2022
- 06:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- http://pulpito.front.sepia.ceph.com/lflores-2022-01-31_19:11:11-rados:thrash-erasure-code-big-master-distro-default-s...
- 02:03 PM Backport #53480 (In Progress): pacific: Segmentation fault under Pacific 16.2.1 when using a cust...
- 12:00 AM Bug #53757 (Fix Under Review): I have a rados object that data size is 0, and this object have a ...
02/03/2022
- 09:26 PM Backport #53974 (Resolved): quincy: BufferList.rebuild_aligned_size_and_memory failure
- 07:08 PM Backport #53974 (In Progress): quincy: BufferList.rebuild_aligned_size_and_memory failure
- https://github.com/ceph/ceph/pull/44891
- 07:46 PM Bug #23117 (In Progress): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been ...
- 07:50 AM Bug #54122 (Fix Under Review): Validate monitor ID provided with ok-to-stop similar to ok-to-rm
- 07:49 AM Bug #54122 (Resolved): Validate monitor ID provided with ok-to-stop similar to ok-to-rm
- ceph mon ok-to-stop doesn't validate the monitor ID provided. Thus returns that "quorum should be preserved " without...
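A sketch of the two cases (mon names are illustrative; "ghost" is not in the monmap):
    ceph mon ok-to-stop a       # valid id, answer should reflect the quorum impact
    ceph mon ok-to-stop ghost   # unknown id, should be rejected rather than reported as safe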
02/02/2022
- 05:25 PM Backport #53551: pacific: [RFE] Provide warning when the 'require-osd-release' flag does not matc...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44259
merged
- 02:04 PM Feature #54115 (In Progress): Log pglog entry size in OSD log if it exceeds certain size limit
- Even after all PGs are active+clean, we see some OSDs consuming high amount memory. From dump_mempools, osd_pglog con...
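A sketch of how the osd_pglog mempool mentioned here can be sampled (osd id illustrative; uses the daemon's admin socket, which reports items and bytes per mempool):
    ceph daemon osd.0 dump_mempools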
- 10:32 AM Bug #51002 (Resolved): regression in ceph daemonperf command output, osd columns aren't visible a...
- 10:32 AM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
- 12:02 AM Backport #51172: pacific: regression in ceph daemonperf command output, osd columns aren't visibl...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44175
merged
- 12:04 AM Backport #53702: pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44387
merged
02/01/2022
- 10:12 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/44181
merged
- 08:42 PM Backport #53486: pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44202
merged
- 08:40 PM Backport #51150: pacific: When read failed, ret can not take as data len, in FillInVerifyExtent
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44173
merged
- 08:40 PM Backport #53388: pacific: pg-temp entries are not cleared for PGs that no longer exist
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44096
merged
- 05:23 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- I will collect these logs as you've requested. As an update: I am now seeing snaptrim occurring automatically without...
- 04:27 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Yes that is the case.
Can you collect the log starting at the manual repeer? The intent was to capture the logs s...
- 03:23 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Christopher Hoffman wrote:
> 1. A single OSD will be fine, just ensure it is one exhibiting the issue.
> 2. Can you...
- 02:41 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- David Prude wrote:
> Christopher Hoffman wrote:
> > Can you collect and share OSD logs (with debug_osd=20 and debug...
- 12:01 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Christopher Hoffman wrote:
> Can you collect and share OSD logs (with debug_osd=20 and debug_ms=1) when you are enco...
- 11:38 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Neha Ojha wrote:
> Maybe you are missing the square brackets when specifying the mon_host like in https://docs.ceph....
- 12:12 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Maybe you are missing the square brackets when specifying the mon_host like in https://docs.ceph.com/en/pacific/rados...
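The bracketed form being referred to looks roughly like this in ceph.conf (the address is illustrative; one such entry per monitor):
    [global]
        mon_host = [v2:192.168.0.10:3300,v1:192.168.0.10:6789]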
- 08:54 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Neha Ojha wrote:
> Are you using filestore or bluestore?
On bluestore
- 12:20 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Are you using filestore or bluestore?
- 02:14 AM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> So on both occasions this crash was a side effect of new pool creation? Can you provide the outpu...
- 12:28 AM Bug #53667 (Fix Under Review): osd cannot be started after being set to stop
01/31/2022
- 11:59 PM Bug #44184: Slow / Hanging Ops after pool creation
- Ist Gab wrote:
> Neha Ojha wrote:
> > Are you still seeing this problem? Will you be able to provide debug data aro...
- 11:53 PM Bug #54005 (Duplicate): Why can wrong parameters be specified when creating erasure-code-profile,...
- 11:38 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- /a/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642148
- 11:35 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
- /a/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642122
- 10:56 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
- /a/yuriw-2022-01-27_15:09:25-rados-wip-yuri6-testing-2022-01-26-1547-distro-default-smithi/6644093...
- 11:09 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- /a/yuriw-2022-01-27_15:09:25-rados-wip-yuri6-testing-2022-01-26-1547-distro-default-smithi/6644223
- 11:07 PM Bug #53326 (Resolved): pgs wait for read lease after osd start
- 10:21 PM Bug #51433 (Resolved): mgr spamming with repeated set pgp_num_actual while merging
- nautilus is EOL
- 10:20 PM Backport #53876 (Resolved): pacific: pgs wait for read lease after osd start
- 10:12 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-01-27_14:57:16-rados-wip-yuri-testing-2022-01-26-1810-pacific-distro-default-smithi/6643449
- 09:27 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Can you collect and share OSD logs (with debug_osd=20 and debug_ms=1) when you are encountering this issue?
- 03:14 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- We are also seeing this issue on *16.2.5*. We schedule cephfs snapshots via cron in a 24h7d2w rotation schedule. Over...
- 07:52 PM Backport #54082: pacific: mon: osd pool create <pool-name> with --bulk flag
- pull request: https://github.com/ceph/ceph/pull/44847
- 07:51 PM Backport #54082 (Resolved): pacific: mon: osd pool create <pool-name> with --bulk flag
- Backporting https://github.com/ceph/ceph/pull/44241 to pacific
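As I understand the feature being backported, usage is along these lines (pool name is illustrative):
    ceph osd pool create mypool --bulk
    ceph osd pool set mypool bulk false   # the flag can also be toggled after creation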
- 07:32 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
- Happening in Pacific too:
/a/yuriw-2022-01-27_14:57:16-rados-wip-yuri-testing-2022-01-26-1810-pacific-distro-defau...
- 02:38 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- Shu Yu wrote:
>
> Missing a message, PG 8.243 status
> # ceph pg ls 8 | grep -w 8.243
> 8.243 0 0 ...
- 12:18 PM Backport #53660 (Resolved): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44544
m...
- 12:18 PM Backport #53943 (Resolved): octopus: mon: all mon daemon always crash after rm pool
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44700
m...
- 12:18 PM Backport #53534 (Resolved): octopus: mon: mgrstatmonitor spams mgr with service_map
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44722
m...
- 12:17 PM Backport #53877 (Resolved): octopus: pgs wait for read lease after osd start
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44585
m...
- 12:17 PM Backport #53701 (Resolved): octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in bac...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43438
m...
- 12:17 PM Backport #52833 (Resolved): octopus: osd: pg may get stuck in backfill_toofull after backfill is ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43438
m...
01/28/2022
- 05:17 PM Backport #54048 (Rejected): octopus: [RFE] Add health warning in ceph status for filestore OSDs
- Original tracker was accidentally marked for pending backport. We are not backporting related PR to pre-quincy. Marki...
- 05:16 PM Backport #54047 (Rejected): nautilus: [RFE] Add health warning in ceph status for filestore OSDs
- Original tracker was accidentally marked for pending backport. We are not backporting related PR to pre-quincy. Marki...
- 05:14 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
- It was accidentally marked as pending backport. We are not backporting PR for this tracker to pre-quincy. Marking it ...
- 01:41 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
- @Dan, I think maybe this is a copy-paste issue?
- 01:24 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
- Why is this being backported to N and O?!
Filestore is deprecated since quincy, so we should only warn in quincy a...
- 05:01 PM Backport #53978: quincy: [RFE] Limit slow request details to mgr log
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44764
merged
- 10:08 AM Bug #54050: OSD: move message to cluster log when osd hitting the pg hard limit
- PR: https://github.com/ceph/ceph/pull/44821
- 10:02 AM Bug #54050 (Closed): OSD: move message to cluster log when osd hitting the pg hard limit
- The OSD will print the message below if a PG creation hits the hard limit on the maximum number of PGs per OSD.
---
202...
- 05:50 AM Bug #52657 (In Progress): MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, ...
01/27/2022
- 06:36 PM Backport #54048 (Rejected): octopus: [RFE] Add health warning in ceph status for filestore OSDs
- 06:36 PM Backport #54047 (Rejected): nautilus: [RFE] Add health warning in ceph status for filestore OSDs
- 06:33 PM Feature #49275 (Pending Backport): [RFE] Add health warning in ceph status for filestore OSDs
- 06:33 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
- 03:07 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> Hi,
>
> Nothing a script can't do:
>
> > ceph osd pool ls | xargs -n1 -istr ...
- 09:33 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Mark Nelson wrote:
> In the mean time, Neha mentioned that you might be able to prevent the pgs from splitting by tu...
- 09:24 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Mark Nelson wrote:
> Hi Gonzalo,
>
> I'm not an expert regarding this code so please take my reply here with a gr...
- 02:28 PM Bug #53327 (Fix Under Review): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_f...
- 12:05 AM Backport #53660: octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44544
merged
- 12:04 AM Backport #53943: octopus: mon: all mon daemon always crash after rm pool
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44700
merged
01/26/2022
- 11:54 PM Backport #53534: octopus: mon: mgrstatmonitor spams mgr with service_map
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44722
merged
- 11:39 PM Backport #53769: pacific: [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/44540
merged
- 08:53 PM Bug #53729: ceph-osd takes all memory before oom on boot
- In the mean time, Neha mentioned that you might be able to prevent the pgs from splitting by turning off the autoscal...
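A sketch of that per-pool workaround (the pool name is a placeholder):
    ceph osd pool autoscale-status
    ceph osd pool set <pool> pg_autoscale_mode off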
- 08:34 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Hi Gonzalo,
I'm not an expert regarding this code so please take my reply here with a grain of salt (and others pl...
- 05:26 PM Bug #53729: ceph-osd takes all memory before oom on boot
- How can I help to accelerate a bugfix or workaround?
If you comment your investigations I can build a docker image t...
- 04:16 PM Bug #53326: pgs wait for read lease after osd start
- https://github.com/ceph/ceph/pull/44585 merged
- 12:27 AM Bug #53326: pgs wait for read lease after osd start
- https://github.com/ceph/ceph/pull/44584 merged
- 04:14 PM Backport #53701: octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
- Mykola Golub wrote:
> PR: https://github.com/ceph/ceph/pull/43438
merged
- 04:14 PM Backport #52833: octopus: osd: pg may get stuck in backfill_toofull after backfill is interrupted...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43438
merged
- 12:06 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- >Igor Fedotov wrote:
> I doubt anyone can say what setup would be good for you without experiments in the field. M...
- 12:04 PM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> Are you still seeing this problem? Will you be able to provide debug data around this issue?
H...
- 12:47 AM Bug #45318 (New): Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log r...
- Octopus still has this issue /a/yuriw-2022-01-24_18:01:47-rados-wip-yuri10-testing-2022-01-24-0810-octopus-distro-def...
01/25/2022
- 05:40 PM Bug #50608 (Need More Info): ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- 05:36 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Here is a representative run (wip-dis-testing is essentially master):
https://pulpito.ceph.com/dis-2022-01-25_16:1...
- 12:50 AM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Ilya Dryomov wrote:
> This has been bugging the rbd suite for a while. I don't think messenger failure injection is...
- 01:34 PM Bug #53327 (In Progress): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_s...
- 10:39 AM Backport #53944 (In Progress): pacific: [RFE] Limit slow request details to mgr log
- 09:01 AM Backport #53978 (In Progress): quincy: [RFE] Limit slow request details to mgr log
- 08:51 AM Bug #54005 (Duplicate): Why can wrong parameters be specified when creating erasure-code-profile,...
- My osd tree is like below:
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-7 0....
- 08:45 AM Bug #54004 (Rejected): When creating erasure-code-profile incorrectly set parameters, it can be c...
- My osd tree is like below:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-7 0.19498 root mytest...
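For context, the kind of invocation involved, sketched with illustrative names (the crush-root matches the tree quoted above):
    ceph osd erasure-code-profile set testprofile k=2 m=1 crush-root=mytest crush-failure-domain=host
    ceph osd erasure-code-profile get testprofile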
01/24/2022
- 11:19 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- This has been bugging the rbd suite for a while. I don't think messenger failure injection is the problem because th...
- 11:12 PM Bug #53327 (New): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_...
- 10:59 PM Bug #53940 (Rejected): EC pool creation is setting min_size to K+1 instead of K
- As discussed offline, we should revisit our recovery test coverage for various EC profiles, but closing this issue.
- 10:56 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
- 10:44 PM Bug #44184: Slow / Hanging Ops after pool creation
- Ist Gab wrote:
> Neha Ojha wrote:
>
> > Which version are you using?
>
> Octopus 15.2.14
Are you still seei...
- 10:39 PM Bug #52535 (Need More Info): monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ...
- 10:37 PM Bug #48997 (Can't reproduce): rados/singleton/all/recovery-preemption: defer backfill|defer recov...
- 10:36 PM Bug #50106 (Can't reproduce): scrub/osd-scrub-repair.sh: corrupt_scrub_erasure: return 1
- 10:36 PM Bug #50245 (Can't reproduce): TEST_recovery_scrub_2: Not enough recovery started simultaneously
- 10:35 PM Bug #49961 (Can't reproduce): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub_1 failed
- 10:35 PM Bug #46847 (Need More Info): Loss of placement information on OSD reboot
- Is this issue reproducible in Octopus or later?
- 10:32 PM Bug #50462 (Won't Fix - EOL): OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.co...
- Please feel free to reopen if you see the issue in a recent version of Ceph.
- 10:31 PM Bug #49688 (Can't reproduce): FAILED ceph_assert(is_primary()) in submit_log_entries during Promo...
- 10:30 PM Bug #48028 (Won't Fix - EOL): ceph-mon always suffer lots of slow ops from v14.2.9
- Please feel free to reopen if you see the issue in a recent version of Ceph.
- 10:29 PM Bug #50512 (Won't Fix - EOL): upgrade:nautilus-p2p-nautilus: unhandled event in ToDelete
- 10:29 PM Bug #50473 (Can't reproduce): ceph_test_rados_api_lock_pp segfault in librados::v14_2_0::RadosCli...
- 10:28 PM Bug #50242 (Can't reproduce): test_repair_corrupted_obj fails with assert not inconsistent
- 10:28 PM Bug #50119 (Can't reproduce): Invalid read of size 4 in ceph::logging::Log::dump_recent()
- 10:26 PM Bug #47153 (Won't Fix - EOL): monitor crash during upgrade due to LogSummary encoding changes bet...
- 10:26 PM Bug #49523: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- Haven't seen this in recent runs.
- 10:24 PM Bug #49463 (Can't reproduce): qa/standalone/misc/rados-striper.sh: Caught signal in thread_name:r...
- 10:14 PM Bug #53910 (Closed): client: client session state stuck in opening and hang all the time
- 10:43 AM Bug #47273: ceph report missing osdmap_clean_epochs if answered by peon
- > Is it possible that this is related?
I'm not sure, but I guess not.
I think this bug is rather about not forwa...
- 06:05 AM Bug #52486 (Pending Backport): test tracker: please ignore
01/22/2022
- 12:06 AM Backport #53978 (Resolved): quincy: [RFE] Limit slow request details to mgr log
- https://github.com/ceph/ceph/pull/44764
- 12:05 AM Backport #53977 (Rejected): quincy: mon: all mon daemon always crash after rm pool
- 12:05 AM Backport #53974 (Resolved): quincy: BufferList.rebuild_aligned_size_and_memory failure
01/21/2022
- 07:30 PM Backport #53972 (Resolved): pacific: BufferList.rebuild_aligned_size_and_memory failure
- 07:25 PM Backport #53971 (Resolved): octopus: BufferList.rebuild_aligned_size_and_memory failure
- 07:22 PM Bug #53969 (Pending Backport): BufferList.rebuild_aligned_size_and_memory failure
- 07:15 PM Bug #53969 (Fix Under Review): BufferList.rebuild_aligned_size_and_memory failure
- 07:14 PM Bug #53969 (Resolved): BufferList.rebuild_aligned_size_and_memory failure
- ...
- 06:59 PM Bug #45345 (Can't reproduce): tasks/rados.py fails with "psutil.NoSuchProcess: psutil.NoSuchProce...
- 06:58 PM Bug #45318 (Can't reproduce): Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in c...
- 06:56 PM Bug #38375: OSD segmentation fault on rbd create
- I do not have the files to reupload so might be worth closing this out as I have moved on to another release and this...
- 06:53 PM Bug #43553 (Can't reproduce): mon: client mon_status fails
- 06:49 PM Bug #43048 (Won't Fix - EOL): nautilus: upgrade/mimic-x/stress-split: failed to recover before ti...
- 06:48 PM Bug #42102 (Can't reproduce): use-after-free in Objecter timer handing
- 06:43 PM Bug #40521 (Can't reproduce): cli timeout (e.g., ceph pg dump)
- 06:38 PM Bug #23911 (Won't Fix - EOL): ceph:luminous: osd out/down when setup with ubuntu/bluestore
- 06:37 PM Bug #20952 (Can't reproduce): Glitchy monitor quorum causes spurious test failure
- 06:36 PM Bug #14115 (Can't reproduce): crypto: race in nss init
- 06:36 PM Bug #13385 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: NSS ...
- 06:35 PM Bug #11235 (Can't reproduce): test_rados.py test_aio_read is racy
- 05:24 PM Backport #53534 (In Progress): octopus: mon: mgrstatmonitor spams mgr with service_map
- 05:22 PM Backport #53535 (In Progress): pacific: mon: mgrstatmonitor spams mgr with service_map
- 03:55 PM Bug #47273: ceph report missing osdmap_clean_epochs if answered by peon
- I am also seeing this behavior on the latest Octopus and Pacific releases.
The reason I'm looking is that I'm seei...
01/20/2022
- 10:24 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
- Laura Flores wrote:
> Thanks for this info, Dan. We have held off on making a change to min_size, and we're currentl...
- 08:16 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
- Thanks for this info, Dan. We have held off on making a change to min_size, and we're currently discussing ways to en...
- 07:27 PM Backport #53943 (In Progress): octopus: mon: all mon daemon always crash after rm pool
- 06:44 PM Backport #53942 (In Progress): pacific: mon: all mon daemon always crash after rm pool
- 06:29 AM Bug #53910: client: client session state stuck in opening and hang all the time
- Sorry, close this issue please.
- 02:00 AM Backport #53944 (Resolved): pacific: [RFE] Limit slow request details to mgr log
- https://github.com/ceph/ceph/pull/44771
- 01:21 AM Feature #52424 (Pending Backport): [RFE] Limit slow request details to mgr log
- 01:13 AM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- We have marked the primary OSD.33 down [1] and it has helped the stuck recovery_unfound pg to get unstuck and recover...
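A sketch of that step (osd id taken from this comment); marking the OSD down makes it re-assert itself and forces its PGs to re-peer:
    ceph osd down 33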
01/19/2022
- 11:18 PM Bug #53855: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- Myoungwon Oh: any ideas on this bug?
- 11:15 PM Bug #53875 (Duplicate): AssertionError: wait_for_recovery: failed before timeout expired due to d...
- 11:15 PM Backport #53943 (Resolved): octopus: mon: all mon daemon always crash after rm pool
- https://github.com/ceph/ceph/pull/44700
- 11:10 PM Backport #53942 (Resolved): pacific: mon: all mon daemon always crash after rm pool
- https://github.com/ceph/ceph/pull/44698
- 11:09 PM Bug #53910 (Need More Info): client: client session state stuck in opening and hang all the time
- Can you provide more details about this bug?
- 11:05 PM Bug #53740 (Pending Backport): mon: all mon daemon always crash after rm pool
- 09:00 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Looks like the last time the PG was active was at "2022-01-18T17:38:23.338"...
- 07:26 PM Bug #53940: EC pool creation is setting min_size to K+1 instead of K
- For history, here's where the default was set to k+1.
https://github.com/ceph/ceph/pull/8008/commits/48e40fcde7b19...
- 06:53 PM Bug #53940 (Rejected): EC pool creation is setting min_size to K+1 instead of K
- For more information please check the RHCS bug - https://bugzilla.redhat.com/show_bug.cgi?id=2039585.
- 03:33 PM Bug #53923 (In Progress): [Upgrade] mgr FAILED to decode MSG_PGSTATS
- 02:07 PM Bug #44092 (Fix Under Review): mon: config commands do not accept whitespace style config name
- 01:55 PM Backport #53933 (In Progress): pacific: Stretch mode: peering can livelock with acting set change...
- 01:50 PM Backport #53933 (Resolved): pacific: Stretch mode: peering can livelock with acting set changes s...
- https://github.com/ceph/ceph/pull/44664
- 01:46 PM Bug #53824 (Pending Backport): Stretch mode: peering can livelock with acting set changes swappin...
01/18/2022
- 09:20 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Ceph OSD 33 Logs with grep unfound!
- 09:14 PM Bug #53924: EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- Ceph PG query!
- 09:11 PM Bug #53924 (Need More Info): EC PG stuck recovery_unfound+undersized+degraded+remapped+peered
- ...
- 08:36 PM Bug #53923 (Resolved): [Upgrade] mgr FAILED to decode MSG_PGSTATS
- ...
- 05:42 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-01-15_05:47:18-rados-wip-yuri8-testing-2022-01-14-1551-distro-default-smithi/6619577
/a/yuriw-2022-01-...
- 04:23 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-01-14_23:22:09-rados-wip-yuri6-testing-2022-01-14-1207-distro-default-smithi/6617813
- 08:26 AM Bug #53910 (Closed): client: client session state stuck in opening and hang all the time
01/16/2022
- 08:40 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Do you need something else to find a workaround or the full solution?
Is there anything I can do?
01/14/2022
- 11:21 PM Bug #53895 (Resolved): Unable to format `ceph config dump` command output in yaml using `-f yaml`
- https://bugzilla.redhat.com/show_bug.cgi?id=2040709
- 10:45 AM Bug #43266 (Resolved): common: admin socket compiler warning
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:56 AM Bug #43887: ceph_test_rados_delete_pools_parallel failure
- /a/yuriw-2022-01-13_18:06:52-rados-wip-yuri3-testing-2022-01-13-0809-distro-default-smithi/6614510
- 12:26 AM Backport #53877 (In Progress): octopus: pgs wait for read lease after osd start
- 12:12 AM Backport #53876 (In Progress): pacific: pgs wait for read lease after osd start
01/13/2022
- 11:15 PM Backport #53877 (Resolved): octopus: pgs wait for read lease after osd start
- https://github.com/ceph/ceph/pull/44585
- 11:15 PM Backport #53876 (Resolved): pacific: pgs wait for read lease after osd start
- https://github.com/ceph/ceph/pull/44584
- 11:11 PM Bug #53326 (Pending Backport): pgs wait for read lease after osd start
- 10:54 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Neha Ojha wrote:
> Gonzalo Aguilar Delgado wrote:
> > Neha Ojha wrote:
> > > Like the other case reported in the m...
- 10:52 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Igor Fedotov wrote:
> One more case:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/FQXV452YLHBJ...
- 12:23 PM Bug #53729: ceph-osd takes all memory before oom on boot
- One more case:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/FQXV452YLHBJW6Y2UK7WUZP7HO5PVIA5/
- 10:13 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
- Same failed test, and same Traceback message as reported above. Pasted here is another relevant part of the log that ...
- 09:06 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-01-11_19:17:55-rados-wip-yuri5-testing-2022-01-11-0843-distro-default-smithi/6608450
- 09:01 PM Bug #53875 (Duplicate): AssertionError: wait_for_recovery: failed before timeout expired due to d...
- Description: rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/...
- 08:57 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- /a/yuriw-2022-01-12_21:37:22-rados-wip-yuri6-testing-2022-01-12-1131-distro-default-smithi/6611439
last pg map bef...
01/12/2022
- 11:04 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> Neha Ojha wrote:
> > Like the other case reported in the mailing list ([ceph-users...
- 09:50 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> Hi,
>
> The logs I've already provided had:
> --debug_osd 90 --debug_mon 2 --debug_filestore 7 --debug_monc 99 --debug...
- 08:40 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Neha Ojha wrote:
> Like the other case reported in the mailing list ([ceph-users] OSDs use 200GB RAM and crash) and ...
- 08:38 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Neha Ojha wrote:
> Like the other case reported in the mailing list ([ceph-users] OSDs use 200GB RAM and crash) and ...
- 08:37 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Hi,
The logs I've already provided had:
--debug_osd 90 --debug_mon 2 --debug_filestore 7 --debug_monc 99 --debug...
- 06:24 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Like the other case reported in the mailing list ([ceph-users] OSDs use 200GB RAM and crash) and https://tracker.ceph...
- 07:12 PM Bug #53855 (Resolved): rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- Description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/many msgr/async-v...
- 07:08 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- Later on in the example Neha originally posted (/a/yuriw-2021-11-15_19:24:05-rados-wip-yuri8-testing-2021-11-15-0845-...
- 06:55 PM Support #51609: OSD refuses to start (OOMK) due to pg split
- Tor Martin Ølberg wrote:
> Tor Martin Ølberg wrote:
> > After an upgrade to 15.2.13 from 15.2.4 my small home lab c...
- 06:19 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-01-11_19:17:55-rados-wip-yuri5-testing-2022-01-11-0843-distro-default-smithi/6608445/
- 10:03 AM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- This is present in 16.2.7. Any reason why the linked PR wasn't merged into that release?
01/11/2022
- 08:47 PM Backport #53719 (In Progress): octopus: mon: frequent cpu_tp had timed out messages
- 08:33 PM Backport #53718 (In Progress): pacific: mon: frequent cpu_tp had timed out messages
- 08:31 PM Backport #53507 (Duplicate): pacific: ceph -s mon quorum age negative number
- Backport was handled along with https://github.com/ceph/ceph/pull/43698 in PR: https://github.com/ceph/ceph/pull/43698
- 08:29 PM Backport #53660 (In Progress): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" w...
- 08:29 PM Backport #53659 (In Progress): pacific: mon: "FAILED ceph_assert(session_map.sessions.empty())" w...
- 08:27 PM Backport #53721 (Resolved): octopus: common: admin socket compiler warning
- The relevant code has already made it into Octopus, no further backport required.
- 08:27 PM Backport #53720 (Resolved): pacific: common: admin socket compiler warning
- The relevant code has already made it to Pacific, no further backport necessary.
- 08:14 PM Backport #53769 (In Progress): pacific: [ceph osd set noautoscale] Global on/off flag for PG auto...
- 08:14 PM Backport #53769: pacific: [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- https://github.com/ceph/ceph/pull/44540
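A sketch of the global flag this feature adds, as opposed to the per-pool pg_autoscale_mode setting:
    ceph osd set noautoscale
    ceph osd unset noautoscale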
- 01:55 PM Bug #53824 (Fix Under Review): Stretch mode: peering can livelock with acting set changes swappin...
- 12:14 AM Bug #53824: Stretch mode: peering can livelock with acting set changes swapping primary back and ...
- So, why is it accepting the non-acting-set member each time, when they seem to have the same data? There's a clue in ...
- 12:14 AM Bug #53824 (Pending Backport): Stretch mode: peering can livelock with acting set changes swappin...
- From https://bugzilla.redhat.com/show_bug.cgi?id=2025800
We're getting repeated swaps in the acting set, with logg... - 06:42 AM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
- /a/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599471...
- 05:29 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-01-06_15:57:04-rados-wip-yuri6-testing-2022-01-05-1255-distro-default-smithi/6599449...
01/10/2022
- 10:20 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Forget about previous comment.
The stack trace is just the opposite, seems that the call to encode in PGog::_writ...
- 10:02 PM Bug #53729: ceph-osd takes all memory before oom on boot
- I was taking a look to:
3,1 GiB: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (in /usr/bin/ce...
- 09:37 PM Bug #53729: ceph-osd takes all memory before oom on boot
- I did something better. I added a new OSD with bluestore to see if it's a problem of the filestore backend.
Then ...
- 09:43 AM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-01-08_17:57:43-rados-wip-yuri8-testing-2022-01-07-1541-distro-default-smithi/6603232
- 02:22 AM Bug #53740: mon: all mon daemon always crash after rm pool
- Neha Ojha wrote:
> Do you happen to have a coredump from this crash?
No