Activity
From 01/26/2022 to 02/24/2022
02/24/2022
- 10:45 PM Backport #54386: octopus: [RFE] Limit slow request details to mgr log
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/45154
ceph-backport.sh versi...
- 07:43 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- /a/sseshasa-2022-02-24_11:27:07-rados-wip-45118-45121-quincy-testing-distro-default-smithi/6704275/remote/smithi174/l...
- 07:19 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- /a/sseshasa-2022-02-24_11:27:07-rados-wip-45118-45121-quincy-testing-distro-default-smithi/6704402...
- 06:36 PM Bug #54368 (Duplicate): ModuleNotFoundError: No module named 'tasks.cephadm'
- 05:51 PM Backport #53644 (In Progress): pacific: Disable health warning when autoscaler is on
- 03:33 PM Backport #53551 (Resolved): pacific: [RFE] Provide warning when the 'require-osd-release' flag do...
- 08:56 AM Bug #54396: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snaptrim queue
- More context:...
- 08:44 AM Bug #54396 (Fix Under Review): Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears t...
- 08:41 AM Bug #54396 (Resolved): Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snapt...
- See https://www.spinics.net/lists/ceph-users/msg71061.html...
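A minimal way to observe the reported behaviour on a disposable test cluster might look like the following (the pool contents and timing are assumptions; the symptom is the snaptrim queue emptying without any trimming actually being done):
  # set the option to 0, then watch PGs in snaptrim/snaptrim_wait states
  ceph config set osd osd_pg_max_concurrent_snap_trims 0
  ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim
  # restore the default afterwards
  ceph config rm osd osd_pg_max_concurrent_snap_trims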
- 08:38 AM Backport #54393 (Resolved): quincy: The built-in osd bench test shows inflated results.
- https://github.com/ceph/ceph/pull/45141
- 08:37 AM Bug #54364 (Pending Backport): The built-in osd bench test shows inflated results.
- 02:45 AM Bug #51627: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- The error message looks similar to before, but the cause is different from the prior case.
Anyway, I posted the f...
02/23/2022
- 05:32 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- Happened in a dead job.
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6...
- 05:16 PM Bug #51627: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- Happened again. Could this be a new occurrence?
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800...
- 05:00 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698327
- 03:15 PM Backport #54386 (Resolved): octopus: [RFE] Limit slow request details to mgr log
02/22/2022
- 09:10 PM Bug #54210 (Fix Under Review): pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd p...
- 09:09 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
- After going through sentry, I've realized that the only occurrence of this bug in master happens before the merge of
...
- 08:39 PM Backport #54233 (In Progress): octopus: devices: mon devices appear empty when scraping SMART met...
- 08:39 PM Backport #54232 (In Progress): pacific: devices: mon devices appear empty when scraping SMART met...
- 08:14 PM Bug #54369 (New): mon/test_mon_osdmap_prune.sh: jq .osdmap_first_committed [[ 11 -eq 20 ]]
- /a/yuriw-2022-02-17_23:23:56-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6692990...
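For context, the failing check in the standalone test is essentially a comparison of this shape (paraphrased, assuming ceph report exposes the osdmap trim bounds; this is not the exact test code):
  first=$(ceph report 2>/dev/null | jq '.osdmap_first_committed')
  # the test expects pruning to have advanced osdmap_first_committed,
  # but here it was still 11 when 20 was expected
  [[ "$first" -eq 20 ]] || echo "osdmap_first_committed=$first, expected 20"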
- 07:31 PM Bug #54368 (Duplicate): ModuleNotFoundError: No module named 'tasks.cephadm'
- /a/yuriw-2022-02-17_23:23:56-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6692894...
- 07:19 PM Bug #47589: radosbench times out "reached maximum tries (800) after waiting for 4800 seconds"
- /a/yuriw-2022-02-17_23:23:56-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6692841
- 06:29 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-21_18:20:15-rados-wip-yuri11-testing-2022-02-21-0831-quincy-distro-default-smithi/6699270
Happene...
- 05:09 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- Chris Durham wrote:
> This issue bit us in our upgrade to 16.2.7 from 15.2.15. We have a manual cluster (non-cephadm...
- 04:11 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- This issue bit us in our upgrade to 16.2.7 from 15.2.15. We have a manual cluster (non-cephadm). We followed the proc...
- 12:57 PM Bug #54364 (Resolved): The built-in osd bench test shows inflated results.
- The built-in osd bench shows inflated results with up to 3x-4x the expected values.
Example:
Before:
{
"b...
- 07:13 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- We were able to narrow it down further. We can trigger the problem reliably by doing this:
- 2 clusters, multisite...
02/21/2022
- 05:09 PM Feature #44107 (Resolved): mon: produce stable election results when netsplits and other errors h...
- Oh, this has been done for ages.
- 12:47 PM Bug #51463: blocked requests while stopping/starting OSDs
- I think we hit the same issue while upgrading our nautilus cluster to pacific.
While I did not hit this when testing...
- 12:42 PM Backport #53339: pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive...
- Hey I got here through the following mailing list post: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thre...
- 01:27 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-02-17_22:49:55-rados-wip-yuri3-testing-2022-02-17-1256-distro-default-smithi/6692376...
02/20/2022
- 09:25 AM Bug #52901: osd/scrub: setting then clearing noscrub may lock a PG in 'scrubbing' state
- Will this be backported to a stable release?
- 09:23 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- I'm also seeing the same issue on 16.2.7, but it's been going on for almost two weeks. Already set and unset noscrub/...
- 09:18 AM Backport #53339: pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive...
- Could this be the reason I'm seeing a spam of "handle_scrub_reserve_grant: received unsolicited reservation grant" me...
02/18/2022
- 01:32 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Fortunately (or perhaps not so fortunately), in the process of dealing with this issue we performed a full restart of...
- 11:49 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Dieter Roels wrote:
> Hi Christian. Are your rgws collocated with the osds of the metadata pools?
> We now notice i...
- 11:31 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Hi Christian. Are your rgws collocated with the osds of the metadata pools?
We now notice in our clusters that the...
02/17/2022
- 09:43 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-16_15:53:49-rados-wip-yuri11-testing-2022-02-15-1643-distro-default-smithi/6688846
- 07:43 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
- yep this is a bug, thanks for letting me know, patch coming up.
- 06:54 PM Backport #54290 (Resolved): quincy: pybind/mgr/progress: disable pg recovery event by default
- 05:46 PM Bug #54316 (Resolved): mon/MonCommands.h: target_size_ratio range is incorrect
- Currently if we give `target_size_ratio` a value more than 1.0 using the command: `ceph osd pool create <pool-name> -...
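A rough illustration of the report, with a hypothetical pool name (prior to the fix the out-of-range ratio is accepted rather than rejected):
  ceph osd pool create testpool --target_size_ratio 2.0   # should be rejected, ratio > 1.0
  ceph osd pool autoscale-status                          # shows the ratio that was stored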
- 05:00 PM Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for ceph...
- Update:
On the monitor side of pool creation, target_size_ratio cannot be more than 1.0 or less than ...
- 04:52 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-02-16_00:25:26-rados-wip-yuri-testing-2022-02-15-1431-distro-default-smithi/6687342
Same issue with ...
- 04:30 PM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
- /a/yuriw-2022-02-16_00:25:26-rados-wip-yuri-testing-2022-02-15-1431-distro-default-smithi/6687338
- 11:54 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- We just observed 12 more scrub errors spread across 7 pgs and all on our primary (used for user access, read/write) z...
- 09:47 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Neha Ojha wrote:
> yite gu wrote:
> > Christian Rohmann wrote:
> > > yite gu wrote:
> > > > This is inconsistent ...
- 09:51 AM Bug #54296: OSDs using too much memory
- Hi Dan,
Thanks for your response.
I only adjusted osd_max_pg_log_entries and left osd_min_pg_log_entries alone. A...
- 09:03 AM Bug #54296: OSDs using too much memory
- Ruben Kerkhof wrote:
> One thing I tried was to set osd_max_pg_log_entries to 500 instead of the default of 10000, b...
- 09:33 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- I would just like to add that scrubs started all of a sudden and the cluster is HEALTH_OK again.
02/16/2022
- 10:32 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- /ceph/teuthology-archive/yuriw-2022-02-15_22:35:42-rados-wip-yuri8-testing-2022-02-15-1214-distro-default-smithi/6686...
- 09:20 PM Backport #53718: pacific: mon: frequent cpu_tp had timed out messages
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44545
merged
- 08:24 PM Backport #54290: quincy: pybind/mgr/progress: disable pg recovery event by default
- https://github.com/ceph/ceph/pull/45043 merged
- 07:59 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-15_22:40:39-rados-wip-yuri7-testing-2022-02-15-1102-quincy-distro-default-smithi/6686655/remote/smit...
- 07:07 PM Backport #53535: pacific: mon: mgrstatmonitor spams mgr with service_map
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44721
merged
- 07:06 PM Backport #53942: pacific: mon: all mon daemon always crash after rm pool
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44698
merged
- 07:00 PM Feature #54280: support truncation sequences in sparse reads
- Neha mentioned taking a look at the history, so I did a bit of git archeology today. The limitation dates back to the...
- 06:58 PM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Hello. Could you please provide the output from @ceph health detail@? We suspect the warning might have been replaced with ...
- 06:46 PM Bug #54255: utc time is used when ceph crash ls
- Yaarit, was this choice intentional?
- 06:44 PM Bug #51338 (Duplicate): osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*&g...
- 06:44 PM Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- André Cruz wrote:
> I'm also encountering this issue on Pacific (16.2.7):
>
> [...]
>
> Any pointers?
I thi...
- 06:31 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- yite gu wrote:
> Christian Rohmann wrote:
> > yite gu wrote:
> > > This is inconsistent pg 7.2 from your upload fi...
- 06:24 PM Bug #53663 (New): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- 09:50 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Dieter Roels wrote:
> After the repair the inconsistencies do not re-appear. However, we can reproduce the issue in ...
- 06:18 PM Bug #46847: Loss of placement information on OSD reboot
- Frank Schilder wrote:
> Could somebody please set the status back to open and Affected Versions to all?
The ticke...
- 06:13 PM Bug #53729 (Need More Info): ceph-osd takes all memory before oom on boot
- 12:21 PM Bug #54296: OSDs using too much memory
- Hi Igor,
See attachment.
One thing I tried was to set osd_max_pg_log_entries to 500 instead of the default of 1...
- 12:15 PM Bug #54296: OSDs using too much memory
- Hi Ruben,
please share full dump_mempools output.
- 10:34 AM Bug #54296 (Resolved): OSDs using too much memory
- One of our customers upgraded from Nautilus to Octopus, and now a lot of his OSDs are using way more ram than allowed...
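One way to see where that memory sits, assuming the usual dump_mempools layout (the osd id is an example):
  # pg log allocations are usually the pool to watch in reports like this
  ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog'
  # compare against the configured memory target
  ceph config get osd.0 osd_memory_target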
02/15/2022
- 11:21 PM Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for ceph...
- In summary,
the root cause of the problem is that after the upgrade to quincy, the cephfs metadata pool was somehow given a ...
- 10:57 PM Bug #53855 (Fix Under Review): rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlush...
- 02:07 AM Bug #53855: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- https://github.com/ceph/ceph/pull/45035
- 07:27 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- /a/yuriw-2022-02-08_17:00:23-rados-wip-yuri5-testing-2022-02-08-0733-pacific-distro-default-smithi/6670539
last pg...
- 07:15 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- Looks similar, but different test.
/a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-di...
- 06:55 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- /a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672070
- 06:47 PM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
- Hi Nitzan,
I checked your patch on the current pacific branch.
unfortunately I still get slow ops (slow >= 5 seco...
- 06:46 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
- /a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672005
- 03:45 PM Backport #54290 (Resolved): quincy: pybind/mgr/progress: disable pg recovery event by default
- 03:42 PM Bug #47273 (Fix Under Review): ceph report missing osdmap_clean_epochs if answered by peon
- 03:08 AM Bug #52421: test tracker
- Crash signature (v1) and Crash signature (v2) are of invalid format, and are breaking the telemetry crashes bot, remo...
02/14/2022
- 11:46 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-08_17:00:23-rados-wip-yuri5-testing-2022-02-08-0733-pacific-distro-default-smithi/6670360
- 11:29 PM Bug #51234: LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: 0 vs 0
- Pacific:
/a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672177
- 08:21 PM Feature #54280 (Resolved): support truncation sequences in sparse reads
- I've been working on sparse read support in the kclient, and got something working today, only to notice that after t...
- 03:39 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-02-11_22:59:19-rados-wip-yuri4-testing-2022-02-11-0858-distro-default-smithi/6677733
Last pg map bef...
- 10:06 AM Bug #46847: Loss of placement information on OSD reboot
- Could somebody please set the status back to open and Affected Versions to all?
02/11/2022
- 11:01 PM Backport #52769 (Resolved): octopus: pg scrub stat mismatch with special objects that have hash '...
- 10:41 PM Backport #52769: octopus: pg scrub stat mismatch with special objects that have hash 'ffffffff'
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/44978
merged
- 10:48 PM Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for ceph...
- The following path has MGR logs, Mon logs, Cluster logs, audit logs, and system logs....
- 10:39 PM Bug #54263 (Resolved): cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 327...
- Pacific version - 16.2.7-34.el8cp
Quincy version - 17.0.0-10315-ga00e8b31
After doing some analysis it looks like...
- 09:23 PM Bug #54262 (Closed): ERROR: test_cluster_info (tasks.cephfs.test_nfs.TestNFS)
- Since the PR has not merged yet, no need to create a tracker https://github.com/ceph/ceph/pull/44911#issuecomment-103...
- 08:48 PM Bug #54262 (Closed): ERROR: test_cluster_info (tasks.cephfs.test_nfs.TestNFS)
- /a/yuriw-2022-02-11_18:38:05-rados-wip-yuri-testing-2022-02-09-1607-distro-default-smithi/6677099/...
- 09:17 PM Backport #53769 (Resolved): pacific: [ceph osd set noautoscale] Global on/off flag for PG autosca...
- 08:52 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- 08:37 PM Bug #50089 (Fix Under Review): mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing n...
- https://github.com/ceph/ceph/pull/44993
- 12:35 PM Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- I'm also encountering this issue on Pacific (16.2.7):...
- 05:42 AM Bug #54255 (New): utc time is used when ceph crash ls
- ceph crash IDs currently use UTC time rather than local time;
it is a little confusing when debugging issues.
- 01:23 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Hmm, I've just tried to get rid of...
- 12:18 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Added the logs for OSD 12,23,24 part of pg 4.6b. I don't think the logs are from the beginning when the osd booted an...
02/10/2022
- 06:52 PM Bug #54238: cephadm upgrade pacifc to quincy -> causing osd's FULL/cascading failure
- ...
- 01:26 AM Bug #54238: cephadm upgrade pacifc to quincy -> causing osd's FULL/cascading failure
- The node e24-h01-000-r640 has a file - upgrade.txt from the following command:...
- 12:58 AM Bug #54238 (New): cephadm upgrade pacifc to quincy -> causing osd's FULL/cascading failure
- - Upgrade was started at 2022-02-08T01:54:28...
- 05:15 PM Backport #52771 (In Progress): nautilus: pg scrub stat mismatch with special objects that have ha...
- https://github.com/ceph/ceph/pull/44981
- 04:19 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- There was (I'll have to check in which Ceph versions) a bug, where setting noscrub or nodeepscrub at the "wrong"
tim...
- 02:57 PM Backport #52769 (In Progress): octopus: pg scrub stat mismatch with special objects that have has...
- https://github.com/ceph/ceph/pull/44978
- 02:46 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Christian Rohmann wrote:
> Dieter, could you maybe describe your test setup a little more? How many instances of R...
- 01:34 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Christian Rohmann wrote:
> Dieter Roels wrote:
> > All inconsistencies were on non-primary shards, so we repaired t...
- 01:17 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Dieter Roels wrote:
> Not sure if this helps or not, but we are experiencing very similar issues in our clusters the...
- 01:12 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Christian Rohmann wrote:
> yite gu wrote:
> > This is inconsistent pg 7.2 from your upload files. It is look like m...
- 12:17 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- yite gu wrote:
> This is inconsistent pg 7.2 from your upload files. It is look like mismatch osd is 10. So, you can...
- 11:20 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- yite gu wrote:
> "shards": [
> {
> "osd": 10,
> "primary": false,
> "error...
- 11:15 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- "shards": [
{
"osd": 10,
"primary": false,
"errors": [
...
- 10:36 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Not sure if this helps or not, but we are experiencing very similar issues in our clusters the last few days.
We a...
- 10:06 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- yite gu wrote:
> Can you show me that primary osd log report when happen deep-scrub error?
> I hope to know which o...
- 09:25 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Can you show me the primary osd log report from when the deep-scrub error happened?
I would like to know which osd shard hit the error
- 12:00 PM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- I installed the cluster using the "Manual Deployment" method (https://docs.ceph.com/en/pacific/install/manual-deploym...
- 08:50 AM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- Matan Breizman wrote:
> Shu Yu wrote:
> >
> > Missing a message, PG 8.243 status
> > # ceph pg ls 8 | grep -w 8....
02/09/2022
- 08:58 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
- 06:44 PM Backport #54233: octopus: devices: mon devices appear empty when scraping SMART metrics
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/44960
ceph-backport.sh versi...
- 02:36 PM Backport #54233 (Resolved): octopus: devices: mon devices appear empty when scraping SMART metrics
- https://github.com/ceph/ceph/pull/44960
- 06:41 PM Backport #54232: pacific: devices: mon devices appear empty when scraping SMART metrics
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/44959
ceph-backport.sh versi...
- 02:36 PM Backport #54232 (Resolved): pacific: devices: mon devices appear empty when scraping SMART metrics
- https://github.com/ceph/ceph/pull/44959
- 06:39 PM Bug #52416: devices: mon devices appear empty when scraping SMART metrics
- Ah, indeed! I don't think I would have been able to change the status myself though, so thanks for doing it!
- 02:32 PM Bug #52416 (Pending Backport): devices: mon devices appear empty when scraping SMART metrics
- Thanks, Benoît,
Once the status is changed to "Pending Backport" the bot should find it.
- 10:09 AM Bug #52416: devices: mon devices appear empty when scraping SMART metrics
- I'd like to backport this to Pacific and Octopus, but the Backport Bot didn't create the corresponding tickets; what ...
- 04:17 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
- Laura Flores wrote:
> [...]
>
> Also seen in pacific:
> /a/yuriw-2022-02-05_22:51:11-rados-wip-yuri2-testing-202...
02/08/2022
- 08:13 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- /a/yuriw-2022-02-05_22:51:11-rados-wip-yuri2-testing-2022-02-04-1646-pacific-distro-default-smithi/6663906
last pg...
- 07:21 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
- Junior, maybe you have an idea of what's going on?
- 07:20 PM Bug #54210 (Resolved): pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get ...
- ...
- 03:25 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- I did run a manual deep-scrub on another inconsistent PG as well, you'll find the logs of all OSDs handling this PG i...
- 03:05 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Neha - I did upload the logs of a deep-scrub via ceph-post-file: 1e5ff0f8-9b76-4489-8529-ee5e6f246093
There is a lit...
- 01:49 PM Bug #45457: CEPH Graylog Logging Missing "host" Field
- So is this going to be backported to Pacific?
- 03:08 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Some more information on the non-default config settings (excluding MGR):...
- 03:02 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- One more thing to add is that when I set the noscrub, nodeep-scrub flags the pgs actually don't stop scrubbing either...
- 12:16 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Correction. osd_recovery_sleep_hdd was set to 0.0 from the original 0.1. osd_scrub_sleep has been untouched.
- 12:12 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- This is set to 0.0. I think we did this to speed up recovery after we did some CRUSH tuning....
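For reference, the values being discussed can be checked like this (the osd id is an example):
  ceph config get osd osd_scrub_sleep
  ceph config get osd osd_recovery_sleep_hdd
  # what a running daemon actually uses
  ceph config show osd.12 osd_scrub_sleep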
02/07/2022
- 11:33 PM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- PG 7.dc4 - all osd logs.
- 11:14 PM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- - PG query:...
- 09:31 PM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- Vikhyat Umrao wrote:
> Vikhyat Umrao wrote:
> > - This was reproduced again today
>
> As this issue is random we...
- 08:30 PM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- Vikhyat Umrao wrote:
> - This was reproduced again today
As this issue is random we did not have debug logs from ...
- 08:27 PM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- - This was reproduced again today...
- 11:17 PM Bug #54188 (Resolved): Setting too many PGs leads error handling overflow
- This happened on gibba001:...
- 11:15 PM Bug #54166: ceph version 15.2.15, osd configuration osd_op_num_shards_ssd or osd_op_num_threads_p...
- Sridhar, can you please take a look?
- 11:12 PM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- What is osd_scrub_sleep set to?
Ronen, this sounds similar to one of the issues you were looking into, here it is ...
- 02:46 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- Additional information:
I tried disabling all client I/O and after that there's zero I/O on the devices hosting th...
- 02:45 AM Bug #54172 (Resolved): ceph version 16.2.7 PG scrubs not progressing
- A week ago I've upgraded a 16.2.4 cluster (3 nodes, 33 osds) to 16.2.7 using cephadm and since then we're experiencin...
- 10:57 PM Bug #53751 (Need More Info): "N monitors have not enabled msgr2" is always shown for new clusters
- Can you share the output of "ceph mon dump"? And how did you install this cluster? We are not seeing this issue in 16...
- 10:39 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Massive thanks for your reply Neha, I greatly appreciate it!
Neha Ojha wrote:
> Is it possible for you to trigg...
- 10:31 PM Bug #53663 (Need More Info): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadat...
- Is it possible for you to trigger a deep-scrub on one PG (with debug_osd=20,debug_ms=1), let it go into inconsistent ...
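A sketch of how that could be captured, using the pg and osd ids mentioned elsewhere in this thread as examples:
  # raise debug on the acting set of one affected PG, then rescrub it
  ceph tell osd.10 config set debug_osd 20
  ceph tell osd.10 config set debug_ms 1
  ceph pg deep-scrub 7.2
  # once it is flagged inconsistent, record the details
  rados list-inconsistent-obj 7.2 --format=json-pretty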
- 03:52 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- The issue is still happening:
1) Find all pools with scrub errors via...
- 10:23 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- We can include clear_shards_repaired in master and backport it.
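Assuming the command mentioned above lands and is backported, clearing the counter would presumably look like the first line below; until then the warning can only be muted:
  ceph tell osd.2 clear_shards_repaired   # proposed; availability depends on the backport
  ceph health mute OSD_TOO_MANY_REPAIRS   # existing workaround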
- 04:47 PM Bug #54182 (New): OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- The newly added warning OSD_TOO_MANY_REPAIRS (https://tracker.ceph.com/issues/41564) is raised on a certain count of ...
- 10:14 PM Bug #46847: Loss of placement information on OSD reboot
- Can you share your ec profile and the output of "ceph osd pool ls detail"?
- 12:35 AM Bug #46847: Loss of placement information on OSD reboot
- So in even more fun news, I created the EC pool according to the instructions provided in the documentation.
It's...
- 12:17 AM Bug #46847: Loss of placement information on OSD reboot
- Sorry I should add some context/data
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stab...
- 06:58 PM Bug #54180 (In Progress): In some cases osdmaptool takes forever to complete
- 02:44 PM Bug #54180 (Resolved): In some cases osdmaptool takes forever to complete
- with the attached file run the command:
osdmaptool osdmap.GD.bin --upmap out-file --upmap-deviation 1 --upmap-pool d...
- 05:52 PM Bug #53806 (Fix Under Review): unessesarily long laggy PG state
02/06/2022
- 11:57 PM Bug #46847: Loss of placement information on OSD reboot
- Neha Ojha wrote:
> Is this issue reproducible in Octopus or later?
Yes. I hit it last night. It's minced one of m...
- 12:28 PM Bug #54166 (New): ceph version 15.2.15, osd configuration osd_op_num_shards_ssd or osd_op_num_thr...
- Configure osd_op_num_shards_ssd=8 or osd_op_num_threads_per_shard_ssd=8 in ceph.conf, use ceph daemon osd.x config ...
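The report is truncated, but the comparison presumably being made is between the configured value and what the daemon reports (the osd id is an example; both options are read at daemon startup):
  ceph config set osd osd_op_num_shards_ssd 8
  # after restarting the OSD, check what it actually applied
  ceph daemon osd.0 config get osd_op_num_shards_ssd
  ceph daemon osd.0 config get osd_op_num_threads_per_shard_ssd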
02/04/2022
- 06:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- http://pulpito.front.sepia.ceph.com/lflores-2022-01-31_19:11:11-rados:thrash-erasure-code-big-master-distro-default-s...
- 02:03 PM Backport #53480 (In Progress): pacific: Segmentation fault under Pacific 16.2.1 when using a cust...
- 12:00 AM Bug #53757 (Fix Under Review): I have a rados object that data size is 0, and this object have a ...
02/03/2022
- 09:26 PM Backport #53974 (Resolved): quincy: BufferList.rebuild_aligned_size_and_memory failure
- 07:08 PM Backport #53974 (In Progress): quincy: BufferList.rebuild_aligned_size_and_memory failure
- https://github.com/ceph/ceph/pull/44891
- 07:46 PM Bug #23117 (In Progress): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been ...
- 07:50 AM Bug #54122 (Fix Under Review): Validate monitor ID provided with ok-to-stop similar to ok-to-rm
- 07:49 AM Bug #54122 (Resolved): Validate monitor ID provided with ok-to-stop similar to ok-to-rm
- ceph mon ok-to-stop doesn't validate the monitor ID provided. Thus it returns that "quorum should be preserved" without...
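An illustration of the asymmetry being described, with a made-up monitor id:
  ceph mon ok-to-stop nosuchmon   # before the fix: claims quorum would be preserved
  ceph mon ok-to-rm nosuchmon     # by contrast, rejects the unknown id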
02/02/2022
- 05:25 PM Backport #53551: pacific: [RFE] Provide warning when the 'require-osd-release' flag does not matc...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44259
merged
- 02:04 PM Feature #54115 (In Progress): Log pglog entry size in OSD log if it exceeds certain size limit
- Even after all PGs are active+clean, we see some OSDs consuming a high amount of memory. From dump_mempools, osd_pglog con...
- 10:32 AM Bug #51002 (Resolved): regression in ceph daemonperf command output, osd columns aren't visible a...
- 10:32 AM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
- 12:02 AM Backport #51172: pacific: regression in ceph daemonperf command output, osd columns aren't visibl...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44175
merged - 12:04 AM Backport #53702: pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44387
merged
02/01/2022
- 10:12 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/44181
merged
- 08:42 PM Backport #53486: pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44202
merged
- 08:40 PM Backport #51150: pacific: When read failed, ret can not take as data len, in FillInVerifyExtent
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44173
merged
- 08:40 PM Backport #53388: pacific: pg-temp entries are not cleared for PGs that no longer exist
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44096
merged
- 05:23 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- I will collect these logs as you've requested. As an update: I am now seeing snaptrim occurring automatically without...
- 04:27 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Yes that is the case.
Can you collect the log starting at the manual repeer? The intent was to capture the logs s...
- 03:23 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Christopher Hoffman wrote:
> 1. A single OSD will be fine, just ensure it is one exhibiting the issue.
> 2. Can you...
- 02:41 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- David Prude wrote:
> Christopher Hoffman wrote:
> > Can you collect and share OSD logs (with debug_osd=20 and debug...
- 12:01 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Christopher Hoffman wrote:
> Can you collect and share OSD logs (with debug_osd=20 and debug_ms=1) when you are enco...
- 11:38 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Neha Ojha wrote:
> Maybe you are missing the square brackets when specifying the mon_host like in https://docs.ceph....
- 12:12 AM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Maybe you are missing the square brackets when specifying the mon_host like in https://docs.ceph.com/en/pacific/rados...
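The bracketed form referred to there looks roughly like this (addresses are examples); ceph mon dump shows which addresses the monitors actually registered:
  # ceph.conf excerpt -- note the square brackets around each mon's address list:
  #   mon_host = [v2:192.168.0.1:3300,v1:192.168.0.1:6789],[v2:192.168.0.2:3300,v1:192.168.0.2:6789]
  ceph mon dump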
- 08:54 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Neha Ojha wrote:
> Are you using filestore or bluestore?
On bluestore
- 12:20 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Are you using filestore or bluestore?
- 02:14 AM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> So on both occasions this crash was a side effect of new pool creation? Can you provide the outpu...
- 12:28 AM Bug #53667 (Fix Under Review): osd cannot be started after being set to stop
01/31/2022
- 11:59 PM Bug #44184: Slow / Hanging Ops after pool creation
- Ist Gab wrote:
> Neha Ojha wrote:
> > Are you still seeing this problem? Will you be able to provide debug data aro...
- 11:53 PM Bug #54005 (Duplicate): Why can wrong parameters be specified when creating erasure-code-profile,...
- 11:38 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- /a/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642148
- 11:35 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
- /a/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642122
- 10:56 PM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
- /a/yuriw-2022-01-27_15:09:25-rados-wip-yuri6-testing-2022-01-26-1547-distro-default-smithi/6644093...
- 11:09 PM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- /a/yuriw-2022-01-27_15:09:25-rados-wip-yuri6-testing-2022-01-26-1547-distro-default-smithi/6644223
- 11:07 PM Bug #53326 (Resolved): pgs wait for read lease after osd start
- 10:21 PM Bug #51433 (Resolved): mgr spamming with repeated set pgp_num_actual while merging
- nautilus is EOL
- 10:20 PM Backport #53876 (Resolved): pacific: pgs wait for read lease after osd start
- 10:12 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-01-27_14:57:16-rados-wip-yuri-testing-2022-01-26-1810-pacific-distro-default-smithi/6643449
- 09:27 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Can you collect and share OSD logs (with debug_osd=20 and debug_ms=1) when you are encountering this issue?
- 03:14 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
We are also seeing this issue on *16.2.5*. We schedule cephfs snapshots via cron in a 24h7d2w rotation schedule. Over...
- 07:52 PM Backport #54082: pacific: mon: osd pool create <pool-name> with --bulk flag
- 07:52 PM Backport #54082: pacific: mon: osd pool create <pool-name> with --bulk flag
- pull request: https://github.com/ceph/ceph/pull/44847
- 07:51 PM Backport #54082 (Resolved): pacific: mon: osd pool create <pool-name> with --bulk flag
- Backporting https://github.com/ceph/ceph/pull/44241 to pacific
- 07:32 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
- Happening in Pacific too:
/a/yuriw-2022-01-27_14:57:16-rados-wip-yuri-testing-2022-01-26-1810-pacific-distro-defau...
- 02:38 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- Shu Yu wrote:
>
> Missing a message, PG 8.243 status
> # ceph pg ls 8 | grep -w 8.243
> 8.243 0 0 ...
- 12:18 PM Backport #53660 (Resolved): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44544
m...
- 12:18 PM Backport #53943 (Resolved): octopus: mon: all mon daemon always crash after rm pool
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44700
m...
- 12:18 PM Backport #53534 (Resolved): octopus: mon: mgrstatmonitor spams mgr with service_map
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44722
m...
- 12:17 PM Backport #53877 (Resolved): octopus: pgs wait for read lease after osd start
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44585
m...
- 12:17 PM Backport #53701 (Resolved): octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in bac...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43438
m...
- 12:17 PM Backport #52833 (Resolved): octopus: osd: pg may get stuck in backfill_toofull after backfill is ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43438
m...
01/28/2022
- 05:17 PM Backport #54048 (Rejected): octopus: [RFE] Add health warning in ceph status for filestore OSDs
- Original tracker was accidentally marked for pending backport. We are not backporting related PR to pre-quincy. Marki...
- 05:16 PM Backport #54047 (Rejected): nautilus: [RFE] Add health warning in ceph status for filestore OSDs
- Original tracker was accidentally marked for pending backport. We are not backporting related PR to pre-quincy. Marki...
- 05:14 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
It was accidentally marked as pending backport. We are not backporting the PR for this tracker to pre-quincy. Marking it ...
- 01:41 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
- 01:41 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
- @Dan, I think maybe this is a copy-paste issue?
- 01:24 PM Feature #49275: [RFE] Add health warning in ceph status for filestore OSDs
- Why is this being backported to N and O?!
Filestore is deprecated since quincy, so we should only warn in quincy a... - 05:01 PM Backport #53978: quincy: [RFE] Limit slow request details to mgr log
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44764
merged
- 10:08 AM Bug #54050: OSD: move message to cluster log when osd hitting the pg hard limit
- PR: https://github.com/ceph/ceph/pull/44821
- 10:02 AM Bug #54050 (Closed): OSD: move message to cluster log when osd hitting the pg hard limit
- The OSD will print the below message if a pg creation hits the hard limit of the max number of pgs per osd.
---
202...
- 05:50 AM Bug #52657 (In Progress): MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, ...
01/27/2022
- 06:36 PM Backport #54048 (Rejected): octopus: [RFE] Add health warning in ceph status for filestore OSDs
- 06:36 PM Backport #54047 (Rejected): nautilus: [RFE] Add health warning in ceph status for filestore OSDs
- 06:33 PM Feature #49275 (Pending Backport): [RFE] Add health warning in ceph status for filestore OSDs
- 06:33 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
- 03:07 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> Hi,
>
> Nothing a script can't do:
>
> > ceph osd pool ls | xargs -n1 -istr ...
- 09:33 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Mark Nelson wrote:
> In the mean time, Neha mentioned that you might be able to prevent the pgs from splitting by tu...
- 09:24 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Mark Nelson wrote:
> Hi Gonzalo,
>
> I'm not an expert regarding this code so please take my reply here with a gr...
- 02:28 PM Bug #53327 (Fix Under Review): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_f...
- 12:05 AM Backport #53660: octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44544
merged
- 12:04 AM Backport #53943: octopus: mon: all mon daemon always crash after rm pool
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44700
merged
01/26/2022
- 11:54 PM Backport #53534: octopus: mon: mgrstatmonitor spams mgr with service_map
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44722
merged
- 11:39 PM Backport #53769: pacific: [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/44540
merged
- 08:53 PM Bug #53729: ceph-osd takes all memory before oom on boot
- In the mean time, Neha mentioned that you might be able to prevent the pgs from splitting by turning off the autoscal...
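If that route is taken, disabling the autoscaler would presumably be done per pool or as the global default (the pool name is a placeholder):
  ceph osd pool set <pool> pg_autoscale_mode off
  ceph config set global osd_pool_default_pg_autoscale_mode off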
- 08:34 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Hi Gonzalo,
I'm not an expert regarding this code so please take my reply here with a grain of salt (and others pl...
- 05:26 PM Bug #53729: ceph-osd takes all memory before oom on boot
- How can I help to accelerate a bugfix or workaround?
If you comment your investigations I can build a docker image t...
- 04:16 PM Bug #53326: pgs wait for read lease after osd start
- https://github.com/ceph/ceph/pull/44585 merged
- 12:27 AM Bug #53326: pgs wait for read lease after osd start
- https://github.com/ceph/ceph/pull/44584 merged
- 04:14 PM Backport #53701: octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
- Mykola Golub wrote:
> PR: https://github.com/ceph/ceph/pull/43438
merged
- 04:14 PM Backport #52833: octopus: osd: pg may get stuck in backfill_toofull after backfill is interrupted...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43438
merged
- 12:06 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
> Igor Fedotov wrote:
> I doubt anyone can say what setup would be good for you without experiments in the field. M...
- 12:04 PM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> Are you still seeing this problem? Will you be able to provide debug data around this issue?
H...
- 12:47 AM Bug #45318 (New): Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log r...
- Octopus still has this issue /a/yuriw-2022-01-24_18:01:47-rados-wip-yuri10-testing-2022-01-24-0810-octopus-distro-def...