Activity
From 02/13/2022 to 03/14/2022
03/14/2022
- 08:24 PM Bug #54556 (Won't Fix): Pools are wrongly reported to have non-power-of-two pg_num after update
- We just updated our cluster from 14.2.1 to 14.2.22. Now (in addition to a few more) a new warning appears which we h...
- 05:21 PM Bug #54552 (Fix Under Review): ceph windows test hanging quincy backport PRs
- 03:42 PM Bug #54552 (Resolved): ceph windows test hanging quincy backport PRs
- ...
- 02:21 PM Backport #54526 (In Progress): pacific: cephadm upgrade pacific to quincy autoscaler is scaling p...
- 02:20 PM Backport #54526: pacific: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> ...
- https://github.com/ceph/ceph/pull/45364
- 02:21 PM Backport #54527 (In Progress): quincy: cephadm upgrade pacific to quincy autoscaler is scaling pg...
- 02:21 PM Backport #54527: quincy: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 3...
- https://github.com/ceph/ceph/pull/45363
- 02:10 PM Bug #46847: Loss of placement information on OSD reboot
- Oh also ceph pg repeer has not totally worked. I have a single object remaining unfound. ...
- 04:02 AM Bug #46847: Loss of placement information on OSD reboot
- Neha Ojha wrote:
> Frank Schilder wrote:
> > Could somebody please set the status back to open and Affected Version...
- 01:05 PM Bug #54548 (Won't Fix): mon hang when run ceph -s command after execute "ceph osd in osd.<x>" com...
- 1. run command "ceph osd in osd.<x>"
2. run command "ceph -s", I want to see progress, but "ceph -s" hangs at this ti...
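The reproduction above boils down to two commands (a minimal sketch; osd.3 is a placeholder id):
  ceph osd in osd.3   # mark a previously "out" OSD back in
  ceph -s             # reported to hang at this point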
03/13/2022
- 09:04 AM Bug #51307 (Fix Under Review): LibRadosWatchNotify.Watch2Delete fails
- https://github.com/ceph/ceph/pull/45366
- 08:34 AM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
- In that case it was not socket failure injection, it was:
2022-02-16T09:56:22.598+0000 15af4700 1 -- [v2:172.21.1...
03/11/2022
- 01:02 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Dan van der Ster wrote:
> Guillaume Fenollar wrote:
> > Dan van der Ster wrote:
> > > Could you revert that and tr...
- 08:22 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Guillaume Fenollar wrote:
> Dan van der Ster wrote:
> > Could you revert that and try running
> >
> > ceph-osd -...
- 07:09 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Dan van der Ster wrote:
> Could you revert that and try running
>
> ceph-osd --debug_ms=1 --debug_osd=20 --debug_...
- 06:47 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Guillaume Fenollar wrote:
> See that it reaches 14GB of RAM in 90 seconds approx and starts writing while crashing (...
- 03:09 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Dan van der Ster wrote:
> > Can you somehow annotate the usage over time in the log?
>
> Could you please also se...
- 03:02 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Mykola Golub wrote:
> Mykola Golub wrote:
>
> > pool 2 'ssd' replicated size 3 min_size 2 crush_rule 0 object_has...
- 09:54 AM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- We are having the same issue with ceph 15.2.13. We take RBD snapshots that get deleted after 3 days.
The problem ge...
- 03:19 AM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- ...
03/10/2022
- 11:58 PM Bug #54516: mon/config.sh: unrecognized config option 'debug asok'
- This was the first occurrence of this test failure according to the Sentry history (March 5th 2022), and it has since...
- 03:10 PM Bug #54516 (Won't Fix): mon/config.sh: unrecognized config option 'debug asok'
- /a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721689...
- 11:46 PM Bug #54521: daemon: Error while waiting for process to exit
- This looks a lot like a valgrind failure, but there were unfortunately no osd logs collected....
- 03:35 PM Bug #54521 (Need More Info): daemon: Error while waiting for process to exit
- This causes dead job: hit max job timeout
/a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-dis...
- 11:30 PM Bug #54529: mon/mon-bind.sh: Failure due to cores found
- "Failure due to cores found" means that there is a coredump, and indeed there is a crash. Did we merge something rece...
- 11:17 PM Bug #54529 (Duplicate): mon/mon-bind.sh: Failure due to cores found
- Looks like this failed due to external connection issues, but I'll log it for documentation.
/a/teuthology-2022-01...
- 11:30 PM Bug #54517: scrub/osd-scrub-snaps.sh: TEST FAILED WITH 1 ERRORS
- Ronen this looks a lot like https://tracker.ceph.com/issues/54458, just with a slightly different output. Can you che...
- 03:18 PM Bug #54517 (Duplicate): scrub/osd-scrub-snaps.sh: TEST FAILED WITH 1 ERRORS
- /a/teuthology-archive/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721751...
- 10:52 PM Bug #54296: OSDs using too much memory
- Hi Ruben, did you make any more progress on this?
I'm going through all the osd pglog memory usage tickets, and it...
- 10:21 PM Bug #53729: ceph-osd takes all memory before oom on boot
- > Can you somehow annotate the usage over time in the log?
Could you please also set debug_prioritycache=5 -- this...
- 09:35 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Guillaume Fenollar wrote:
> Neha Ojha wrote:
> > Can anyone provide osd logs with debug_osd=20,debug_ms=1 for OSDs ...
- 05:23 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Mykola Golub wrote:
> pool 2 'ssd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_...
- 05:16 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Neha Ojha wrote:
> Can anyone provide osd logs with debug_osd=20,debug_ms=1 for OSDs that are hitting OOM?
I just...
- 10:00 PM Backport #54527 (Resolved): quincy: cephadm upgrade pacific to quincy autoscaler is scaling pgs f...
- 10:00 PM Backport #54526 (Resolved): pacific: cephadm upgrade pacific to quincy autoscaler is scaling pgs ...
- 09:57 PM Bug #54263 (Pending Backport): cephadm upgrade pacific to quincy autoscaler is scaling pgs from 3...
- 09:15 PM Feature #54525 (New): osd/mon: log memory usage during tick
- The MDS has a nice feature that it prints out the rss and other memory stats every couple seconds at debug level 2.
...
- 06:13 PM Bug #54507 (Duplicate): workunit test cls/test_cls_rgw: Manager failed: thrashosds
- 03:28 PM Bug #51846: rados/test.sh: LibRadosList.ListObjectsCursor did not complete.
- /a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721371
/a/yuriw-2022-03-...
- 03:00 PM Bug #54515 (New): mon/health-mute.sh: TEST_mute: return 1 (HEALTH WARN 3 mgr modules have failed ...
- /a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721547...
- 02:48 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- /a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721464
- 10:50 AM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- Neha Ojha wrote:
> jianwei zhang wrote:
> > 1711'7107 : s0/1/2/3/4/5 are all present, so the write can proceed
> > 1715'7108 : s0/2/3/5 satisfy k=4, so...
- 01:57 AM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- Neha Ojha wrote:
> jianwei zhang wrote:
> > 1711'7107 : s0/1/2/3/4/5 are all present, so the write can proceed
> > 1715'7108 : s0/2/3/5 satisfy k=4, so...
- 04:59 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-03-04_21:56:41-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6721329
- 04:32 AM Bug #54511 (Resolved): test_pool_min_size: AssertionError: not clean before minsize thrashing starts
- /a/yuriw-2022-03-04_00:56:58-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6719015...
- 04:15 AM Bug #53767: qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
- /a/yuriw-2022-03-04_00:56:58-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6718855
- 01:48 AM Bug #51627: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- https://tracker.ceph.com/issues/54509
- 01:47 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
- https://github.com/ceph/ceph/pull/45137
- 01:47 AM Bug #54509 (Resolved): FAILED ceph_assert due to issue manifest API to the original object
- 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55f1f3750606]
2: ceph-osd(+0x5b...
03/09/2022
- 09:44 PM Bug #54507 (Duplicate): workunit test cls/test_cls_rgw: Manager failed: thrashosds
- /a/yuriw-2022-03-04_00:56:58-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6718934...
- 08:23 PM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- Hello Radoslaw,
thank you for your response!
About two weeks ago I did first remove and then add 6 OSDs. I did no...
- 07:35 PM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- Neha has made an interesting observation about the occurrences among different versions.
http://telemetry.front.se...
- 07:32 PM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- Hello Sebastian!
Was there any change to the OSD count? I mean OSD removal in particular.
- 01:10 AM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- I faced the same problem with ceph version 16.2.6. It occurred after shutting down all 3 physical servers of the clus...
- 08:16 PM Backport #54506 (In Progress): quincy: doc/rados/operations/placement-groups/#automated-scaling: ...
- https://github.com/ceph/ceph/pull/45321
- 07:50 PM Backport #54506 (Resolved): quincy: doc/rados/operations/placement-groups/#automated-scaling: --b...
- 08:15 PM Backport #54505 (In Progress): pacific: doc/rados/operations/placement-groups/#automated-scaling:...
- https://github.com/ceph/ceph/pull/45328
- 07:50 PM Backport #54505 (Resolved): pacific: doc/rados/operations/placement-groups/#automated-scaling: --...
- 07:59 PM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- If this is easily reproducible could you please provide us with logs of replicas for the failing PG? It can be figure...
- 07:55 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- More for my own reference, but it's clear that the rw_manager problem occurs here in the PrimaryLogPG code when prepp...
- 07:47 PM Bug #54485 (Pending Backport): doc/rados/operations/placement-groups/#automated-scaling: --bulk i...
- 07:46 PM Bug #51307 (In Progress): LibRadosWatchNotify.Watch2Delete fails
- 07:36 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Neha Ojha wrote:
> Can you share the output of "ceph osd dump"? I suspect that though you may have disabled the au...
- 07:31 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Neha Ojha wrote:
> Can anyone provide osd logs with debug_osd=20,debug_ms=1 for OSDs that are hitting OOM?
I uplo...
- 06:58 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Can anyone provide osd logs with debug_osd=20,debug_ms=1 for OSDs that are hitting OOM?
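A sketch of how the requested debug levels can be raised on a live OSD (osd.3 is a placeholder; for an OSD that OOMs during boot, the equivalent debug_osd/debug_ms settings can go into ceph.conf under [osd] instead):
  ceph tell osd.3 config set debug_osd 20
  ceph tell osd.3 config set debug_ms 1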
- 06:45 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Mykola Golub wrote:
> We seem to observe a similar issue (16.2.7). On a pool with autoscale disabled pg num was chan...
- 05:56 PM Bug #53729: ceph-osd takes all memory before oom on boot
- We seem to observe a similar issue (16.2.7). On a pool with autoscale disabled pg num was changed from 256 to 1024. A...
- 07:13 PM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- jianwei zhang wrote:
> 1711'7107 : s0/1/2/3/4/5 are all present, so the write can proceed
> 1715'7108 : s0/2/3/5 satisfy k=4, so the write can proceed
> 1715'7109 : s0/2...
- 06:43 AM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- ceph v15.2.13 tag
- 05:42 AM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- [root@node1 ceph]# zcat ceph.client.log-20220308.gz|grep 202000000034931.0000001a
2022-03-08T03:12:25.531+0800 7f484...
- 05:37 AM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- 1711'7107 : s0/1/2/3/4/5 are all present, so the write can proceed
1715'7108 : s0/2/3/5 satisfy k=4, so the write can proceed
1715'7109 : s0/2/3/5 satisfy k=4, so the write can proceed
1715'71...
- 05:35 AM Bug #53924: EC PG stuckrecovery_unfound+undersized+degraded+remapped+peered
- I had a similar problem with pg recovery_unfound ...
- 07:06 PM Bug #50042: rados/test.sh: api_watch_notify failures
- Let's use this tracker to track all the watch notify failures. For other api test failures, let's open new trackers. ...
- 03:36 PM Bug #50042: rados/test.sh: api_watch_notify failures
- Found a case of https://tracker.ceph.com/issues/45423 in master, which had a fix that was merged. Seems like it's pop...
- 05:06 PM Backport #54468 (In Progress): octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely...
- 05:06 PM Backport #54468 (In Progress): octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely...
- 04:56 PM Backport #54466 (In Progress): pacific: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely...
- 04:42 PM Backport #54467 (In Progress): quincy: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely ...
- 04:39 PM Backport #53659 (Resolved): pacific: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
- https://github.com/ceph/ceph/pull/44543 has been merged.
- 04:36 PM Backport #53978 (Resolved): quincy: [RFE] Limit slow request details to mgr log
- 04:36 PM Backport #53388 (Resolved): pacific: pg-temp entries are not cleared for PGs that no longer exist
- 04:36 PM Backport #51150 (Resolved): pacific: When read failed, ret can not take as data len, in FillInVer...
- 04:35 PM Backport #53486 (Resolved): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- 04:35 PM Backport #53702 (Resolved): pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in bac...
- 04:33 PM Backport #53942 (Resolved): pacific: mon: all mon daemon always crash after rm pool
- 04:33 PM Backport #53535 (Resolved): pacific: mon: mgrstatmonitor spams mgr with service_map
- 04:32 PM Backport #53718 (Resolved): pacific: mon: frequent cpu_tp had timed out messages
- 04:28 PM Backport #53480 (Resolved): pacific: Segmentation fault under Pacific 16.2.1 when using a custom ...
- 04:12 PM Backport #52077 (In Progress): octopus: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- 04:11 PM Backport #52078 (In Progress): pacific: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- 03:33 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- Seems like this may have come back:
/a/dgalloway-2022-03-09_02:34:58-rados-wip-45272-distro-basic-smithi/6727572
03/08/2022
- 06:52 PM Bug #51627: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- Myoungwon Oh wrote:
> The error message looks similar to before, but the cause is different from the prior case.
...
- 10:00 AM Bug #54489 (New): mon: ops get stuck in "resend forwarded message to leader"
- I hit this bug "BUG #22114":https://tracker.ceph.com/issues/22114#change-211414 in octopus.
"description": "log(2 ...
- 09:02 AM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
- Laura Flores wrote:
> /a/yuriw-2022-02-16_00:25:26-rados-wip-yuri-testing-2022-02-15-1431-distro-default-smithi/6687...
03/07/2022
- 03:00 PM Bug #54485 (Fix Under Review): doc/rados/operations/placement-groups/#automated-scaling: --bulk i...
- 02:45 PM Bug #54485 (Resolved): doc/rados/operations/placement-groups/#automated-scaling: --bulk invalid c...
- Command for creating a pool
was: `ceph osd create test_pool --bulk`
should be: `ceph osd pool create test_pool ...
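A sketch of the correction (the pool name is illustrative; the truncated fix presumably ends with the --bulk flag):
  ceph osd create test_pool --bulk        # wrong: "osd create" is not a pool command
  ceph osd pool create test_pool --bulk   # corrected form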
03/06/2022
- 08:42 PM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- Just got this again: during a recovery after doing maintenance on another node, this OSD crashed.
-1> 2022-03-06T...
03/04/2022
- 07:20 PM Backport #54232 (Resolved): pacific: devices: mon devices appear empty when scraping SMART metrics
- 06:40 PM Backport #54232: pacific: devices: mon devices appear empty when scraping SMART metrics
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44959
merged
- 04:51 PM Bug #50042: rados/test.sh: api_watch_notify failures
- /a/yuriw-2022-03-01_17:45:51-rados-wip-yuri3-testing-2022-02-28-0757-pacific-distro-default-smithi/6714656...
- 01:39 PM Bug #53729: ceph-osd takes all memory before oom on boot
- BTW I'm using Ceph 15.2.16
- 03:13 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Hi everyone,
I'm having this issue as well for several weeks. Sometimes situations stabilize by themselves, sometim...
03/03/2022
- 06:54 PM Bug #54458 (Resolved): osd-scrub-snaps.sh: TEST_scrub_snaps failed due to malformed log message
- 08:10 AM Bug #54458 (Fix Under Review): osd-scrub-snaps.sh: TEST_scrub_snaps failed due to malformed log m...
- 07:47 AM Bug #54458 (Resolved): osd-scrub-snaps.sh: TEST_scrub_snaps failed due to malformed log message
- (created by PR #44941)
the test expects the following line:
"...found snap mapper error on pg 1.0 oid 1:461f8b5e:... - 06:15 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-03-01_17:45:51-rados-wip-yuri3-testing-2022-02-28-0757-pacific-distro-default-smithi/6714724
- 06:13 PM Bug #50042: rados/test.sh: api_watch_notify failures
- /a/yuriw-2022-03-01_17:45:51-rados-wip-yuri3-testing-2022-02-28-0757-pacific-distro-default-smithi/6714863...
- 06:09 PM Bug #53294 (Duplicate): rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- Marking this one as the duplicate because the other Tracker has the PR attached to it.
- 06:02 PM Bug #47838: mon/test_mon_osdmap_prune.sh: first_pinned != trim_to
- /a/yuriw-2022-03-01_17:45:51-rados-wip-yuri3-testing-2022-02-28-0757-pacific-distro-default-smithi/6714654
- 05:55 PM Backport #54468 (Resolved): octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely cl...
- https://github.com/ceph/ceph/pull/45324
- 05:55 PM Backport #54467 (Resolved): quincy: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely cle...
- https://github.com/ceph/ceph/pull/45322
- 05:55 PM Backport #54466 (Resolved): pacific: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely cl...
- https://github.com/ceph/ceph/pull/45323
- 05:54 PM Bug #54396 (Pending Backport): Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears t...
- 05:48 PM Bug #54396 (Resolved): Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snapt...
- 03:59 PM Bug #53855 (Resolved): rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- 03:09 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- This seems to be a pretty high priority issue; we just hit it upgrading from nautilus to 16.2.7 on a cluster with 100...
- 09:50 AM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
- I hit this bug in octopus.
"description": "log(2 entries from seq 1 at 2021-12-20T10:43:38.225243+0800...
- 01:35 AM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
- This is a bit different to #47719. In that case we got an ENOENT when we expected an ENOTCONN but in the case of this...
- 12:17 AM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
- Thanks Laura and Radek. Let me take another look at this.
03/02/2022
- 11:14 PM Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for ceph...
- Update:
After recreating the problem by tweaking the upgrade/pacific-x/parallel suite and adding additional logs, ...
- 06:46 PM Bug #54263 (Fix Under Review): cephadm upgrade pacific to quincy autoscaler is scaling pgs from 3...
- 12:38 AM Bug #54263 (In Progress): cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> ...
- 11:04 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-03-01_22:42:19-rados-wip-yuri4-testing-2022-03-01-1206-distro-default-smithi/6715365
- 09:51 PM Backport #54412 (Rejected): pacific:osd:add pg_num_max value
- We don't need the backport in pacific at the moment; we might in the future, though.
- 07:20 PM Bug #54210 (Resolved): pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get ...
- 07:16 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- Let's add it to qa/valgrind.supp to suppress this error, based on Adam's comment https://tracker.ceph.com/issues/5213...
- 07:00 PM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
- 07:00 PM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
- Added a related one (hypothesis: same issue in multiple places, one of them already fixed by Brad).
03/01/2022
- 11:22 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- I'm guessing that the problem involves pgs that are stuck in the `active+recovering+undersized+remapped` state (or `a...
- 05:26 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- 05:26 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-02-15_16:22:25-rados-wip-yuri6-testing-2022-02-14-1456-distro-default-smithi/6685233...
- 08:11 PM Bug #54438: test/objectstore/store_test.cc: FAILED ceph_assert(bl_eq(state->contents[noid].data, ...
- /a/benhanokh-2021-08-04_06:12:22-rados-wip_gbenhano_ncbz-distro-basic-smithi/6310791/
- 05:40 PM Bug #54438 (New): test/objectstore/store_test.cc: FAILED ceph_assert(bl_eq(state->contents[noid]....
- /a/yuriw-2022-02-15_16:22:25-rados-wip-yuri6-testing-2022-02-14-1456-distro-default-smithi/6685291...
- 06:13 PM Bug #52319: LibRadosWatchNotify.WatchNotify2 fails
- I linked a related issue that looks very similar to this failure, except with a slightly different LibRadosWatchNotif...
- 06:11 PM Bug #54439 (New): LibRadosWatchNotify.WatchNotify2Multi fails
- /a/yuriw-2022-02-28_21:23:00-rados-wip-yuri-testing-2022-02-28-0823-quincy-distro-default-smithi/6711961...
- 05:21 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-15_16:22:25-rados-wip-yuri6-testing-2022-02-14-1456-distro-default-smithi/6685226
02/28/2022
- 09:28 PM Backport #54082 (Resolved): pacific: mon: osd pool create <pool-name> with --bulk flag
- 06:53 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
- I recovered logs from a scenario that looks very similar.
See the full result of `zcat /a/yuriw-2022-02-17_22:49:5...
- 11:34 AM Bug #54423 (New): osd/scrub: bogus DigestUpdate events are created, logged and (hopefully) rejected
- A mishandling of the counter of "the digest-updates we are waiting for, before finishing
with this scrubbed chunk" c...
02/25/2022
- 10:57 PM Backport #53480: pacific: Segmentation fault under Pacific 16.2.1 when using a custom crush locat...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44897
merged
- 10:56 PM Backport #54082: pacific: mon: osd pool create <pool-name> with --bulk flag
- Kamoltat Sirivadhna wrote:
> pull request: https://github.com/ceph/ceph/pull/44847
merged
- 09:59 PM Backport #54412 (Rejected): pacific:osd:add pg_num_max value
- https://github.com/ceph/ceph/pull/45173
- 05:55 PM Bug #50042: rados/test.sh: api_watch_notify failures
- /a/yuriw-2022-02-24_22:04:22-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6704772...
- 05:54 AM Bug #54364 (Resolved): The built-in osd bench test shows inflated results.
- 05:54 AM Backport #54393 (Resolved): quincy: The built-in osd bench test shows inflated results.
02/24/2022
- 10:45 PM Backport #54386: octopus: [RFE] Limit slow request details to mgr log
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/45154
ceph-backport.sh versi...
- 07:43 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- /a/sseshasa-2022-02-24_11:27:07-rados-wip-45118-45121-quincy-testing-distro-default-smithi/6704275/remote/smithi174/l...
- 07:19 PM Bug #53294: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- /a/sseshasa-2022-02-24_11:27:07-rados-wip-45118-45121-quincy-testing-distro-default-smithi/6704402...
- 06:36 PM Bug #54368 (Duplicate): ModuleNotFoundError: No module named 'tasks.cephadm'
- 05:51 PM Backport #53644 (In Progress): pacific: Disable health warning when autoscaler is on
- 03:33 PM Backport #53551 (Resolved): pacific: [RFE] Provide warning when the 'require-osd-release' flag do...
- 08:56 AM Bug #54396: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snaptrim queue
- More context:...
- 08:44 AM Bug #54396 (Fix Under Review): Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears t...
- 08:41 AM Bug #54396 (Resolved): Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the snapt...
- See https://www.spinics.net/lists/ceph-users/msg71061.html...
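A sketch of the trigger described in the report, assuming the intent was to pause trimming (the shipped default for osd_pg_max_concurrent_snap_trims is 2):
  ceph config set osd osd_pg_max_concurrent_snap_trims 0   # unexpectedly cleared the per-PG snaptrim queue
  ceph config set osd osd_pg_max_concurrent_snap_trims 2   # restore the default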
- 08:38 AM Backport #54393 (Resolved): quincy: The built-in osd bench test shows inflated results.
- https://github.com/ceph/ceph/pull/45141
- 08:37 AM Bug #54364 (Pending Backport): The built-in osd bench test shows inflated results.
- 02:45 AM Bug #51627: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- The error message looks similar to before, but the cause is different from the prior case.
Anyway, I posted the f...
02/23/2022
- 05:32 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- Happened in a dead job.
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6...
- 05:16 PM Bug #51627: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- Happened again. Could this be a new occurrence?
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800...
- 05:00 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698327
- 03:15 PM Backport #54386 (Resolved): octopus: [RFE] Limit slow request details to mgr log
02/22/2022
- 09:10 PM Bug #54210 (Fix Under Review): pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd p...
- 09:09 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
- After going through sentry, I've realized that the only occurrence of this bug in master happens before the merge of
...
- 08:39 PM Backport #54233 (In Progress): octopus: devices: mon devices appear empty when scraping SMART met...
- 08:39 PM Backport #54232 (In Progress): pacific: devices: mon devices appear empty when scraping SMART met...
- 08:14 PM Bug #54369 (New): mon/test_mon_osdmap_prune.sh: jq .osdmap_first_committed [[ 11 -eq 20 ]]
- /a/yuriw-2022-02-17_23:23:56-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6692990...
- 07:31 PM Bug #54368 (Duplicate): ModuleNotFoundError: No module named 'tasks.cephadm'
- /a/yuriw-2022-02-17_23:23:56-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6692894...
- 07:19 PM Bug #47589: radosbench times out "reached maximum tries (800) after waiting for 4800 seconds"
- /a/yuriw-2022-02-17_23:23:56-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6692841
- 06:29 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-21_18:20:15-rados-wip-yuri11-testing-2022-02-21-0831-quincy-distro-default-smithi/6699270
Happene...
- 05:09 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- Chris Durham wrote:
> This issue bit us in our upgrade to 16.2.7 from 15.2.15. We have a manual cluster (non-cephadm...
- 04:11 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- This issue bit us in our upgrade to 16.2.7 from 15.2.15. We have a manual cluster (non-cephadm). We followed the proc...
- 12:57 PM Bug #54364 (Resolved): The built-in osd bench test shows inflated results.
- The built-in osd bench shows inflated results with up to 3x-4x the expected values.
Example:
Before:
{
"b... - 07:13 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- We were able to narrow it down further. We can trigger the problem reliably by doing this:
- 2 clusters, multisite...
02/21/2022
- 05:09 PM Feature #44107 (Resolved): mon: produce stable election results when netsplits and other errors h...
- Oh, this has been done for ages.
- 12:47 PM Bug #51463: blocked requests while stopping/starting OSDs
- I think we hit the same issue while upgrading our nautilus cluster to pacific.
While I did not hit this when testing...
- 12:42 PM Backport #53339: pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive...
- Hey I got here through the following mailing list post: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thre...
- 01:27 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-02-17_22:49:55-rados-wip-yuri3-testing-2022-02-17-1256-distro-default-smithi/6692376...
02/20/2022
- 09:25 AM Bug #52901: osd/scrub: setting then clearing noscrub may lock a PG in 'scrubbing' state
- Will this be backported to a stable release?
- 09:23 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- I'm also seeing the same issue on 16.2.7, but it's been going on for almost two weeks. Already set and unset noscrub/...
- 09:18 AM Backport #53339: pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive...
- Could this be the reason I'm seeing a spam of "handle_scrub_reserve_grant: received unsolicited reservation grant" me...
02/18/2022
- 01:32 PM Bug #52026: osd: pgs went back into snaptrim state after osd restart
- Fortunately (or perhaps not so fortunately), in the process of dealing with this issue we performed a full restart of...
- 11:49 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Dieter Roels wrote:
> Hi Christian. Are your rgws collocated with the osds of the metadata pools?
> We now notice i...
- 11:31 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Hi Christian. Are your rgws collocated with the osds of the metadata pools?
We now notice in our clusters that the...
02/17/2022
- 09:43 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-16_15:53:49-rados-wip-yuri11-testing-2022-02-15-1643-distro-default-smithi/6688846
- 07:43 PM Bug #54210: pacific: mon/pg_autoscaler.sh: echo failed on "bash -c 'ceph osd pool get a pg_num | ...
- yep this is a bug, thanks for letting me know, patch coming up.
- 06:54 PM Backport #54290 (Resolved): quincy: pybind/mgr/progress: disable pg recovery event by default
- 05:46 PM Bug #54316 (Resolved): mon/MonCommands.h: target_size_ratio range is incorrect
- Currently if we give `target_size_ratio` a value more than 1.0 using the command: `ceph osd pool create <pool-name> -...
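For context, a sketch of how target_size_ratio is normally supplied per pool (pool name and ratio are placeholders; the create-time form quoted above is truncated):
  ceph osd pool set test_pool target_size_ratio 0.2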
- 05:00 PM Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for ceph...
- Update:
On the monitor side of pool creation, target_size_ratio cannot be more than 1.0 or less than ...
- 04:52 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-02-16_00:25:26-rados-wip-yuri-testing-2022-02-15-1431-distro-default-smithi/6687342
Same issue with ...
- 04:30 PM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
- /a/yuriw-2022-02-16_00:25:26-rados-wip-yuri-testing-2022-02-15-1431-distro-default-smithi/6687338
- 11:54 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- We just observed 12 more scrub errors spread across 7 pgs and all on our primary (used for user access, read/write) z...
- 09:47 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Neha Ojha wrote:
> yite gu wrote:
> > Christian Rohmann wrote:
> > > yite gu wrote:
> > > > This is inconsistent ...
- 09:51 AM Bug #54296: OSDs using too much memory
- Hi Dan,
Thanks for your response.
I only adjusted osd_max_pg_log_entries and left osd_min_pg_log_entries alone. A...
- 09:03 AM Bug #54296: OSDs using too much memory
- Ruben Kerkhof wrote:
> One thing I tried was to set osd_max_pg_log_entries to 500 instead of the default of 10000, b...
- 09:33 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- I would just like to add that scrubs started all of a sudden and the cluster is HEALTH_OK again.
02/16/2022
- 10:32 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- /ceph/teuthology-archive/yuriw-2022-02-15_22:35:42-rados-wip-yuri8-testing-2022-02-15-1214-distro-default-smithi/6686...
- 09:20 PM Backport #53718: pacific: mon: frequent cpu_tp had timed out messages
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44545
merged
- 08:24 PM Backport #54290: quincy: pybind/mgr/progress: disable pg recovery event by default
- https://github.com/ceph/ceph/pull/45043 merged
- 07:59 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-15_22:40:39-rados-wip-yuri7-testing-2022-02-15-1102-quincy-distro-default-smithi/6686655/remote/smit...
- 07:07 PM Backport #53535: pacific: mon: mgrstatmonitor spams mgr with service_map
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44721
merged
- 07:06 PM Backport #53942: pacific: mon: all mon daemon always crash after rm pool
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44698
merged
- 07:00 PM Feature #54280: support truncation sequences in sparse reads
- Neha mentioned taking a look at the history, so I did a bit of git archeology today. The limitation dates back to the...
- 06:58 PM Bug #53751: "N monitors have not enabled msgr2" is always shown for new clusters
- Hello. Could you please provide the output from @ceph health detail@? We suspect the warning might have been replaced with ...
- 06:46 PM Bug #54255: utc time is used when ceph crash ls
- Yaarit, was this choice intentional?
- 06:44 PM Bug #51338 (Duplicate): osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*&g...
- 06:44 PM Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- André Cruz wrote:
> I'm also encountering this issue on Pacific (16.2.7):
>
> [...]
>
> Any pointers?
I thi...
- 06:31 PM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- yite gu wrote:
> Christian Rohmann wrote:
> > yite gu wrote:
> > > This is inconsistent pg 7.2 from your upload fi...
- 06:24 PM Bug #53663 (New): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- 09:50 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- Dieter Roels wrote:
> After the repair the inconsistencies do not re-appear. However, we can reproduce the issue in ...
- 06:18 PM Bug #46847: Loss of placement information on OSD reboot
- Frank Schilder wrote:
> Could somebody please set the status back to open and Affected Versions to all?
The ticke...
- 06:13 PM Bug #53729 (Need More Info): ceph-osd takes all memory before oom on boot
- 12:21 PM Bug #54296: OSDs using too much memory
- Hi Igor,
See attachment.
One thing I tried was to set osd_max_pg_log_entries to 500 instead of the default of 1...
- 12:15 PM Bug #54296: OSDs using too much memory
- Hi Ruben,
please share full dump_mempools output.
- 10:34 AM Bug #54296 (Resolved): OSDs using too much memory
- One of our customers upgraded from Nautilus to Octopus, and now a lot of his OSDs are using way more ram than allowed...
02/15/2022
- 11:21 PM Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for ceph...
- In summary,
the root cause of the problem is after the upgrade to quincy, cephfs meta data pool was somehow given a ...
- 10:57 PM Bug #53855 (Fix Under Review): rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlush...
- 02:07 AM Bug #53855: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- https://github.com/ceph/ceph/pull/45035
- 07:27 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- /a/yuriw-2022-02-08_17:00:23-rados-wip-yuri5-testing-2022-02-08-0733-pacific-distro-default-smithi/6670539
last pg...
- 07:15 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- Looks similar, but different test.
/a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-di...
- 06:55 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- /a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672070
- 06:47 PM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
- Hi Nitzan,
I checked your patch on the current pacific branch.
unfortunately I still get slow ops (slow >= 5 seco...
- 06:46 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
- /a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672005
- 03:45 PM Backport #54290 (Resolved): quincy: pybind/mgr/progress: disable pg recovery event by default
- 03:42 PM Bug #47273 (Fix Under Review): ceph report missing osdmap_clean_epochs if answered by peon
- 03:08 AM Bug #52421: test tracker
- Crash signature (v1) and Crash signature (v2) are of invalid format, and are breaking the telemetry crashes bot, remo...
02/14/2022
- 11:46 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-02-08_17:00:23-rados-wip-yuri5-testing-2022-02-08-0733-pacific-distro-default-smithi/6670360
- 11:29 PM Bug #51234: LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: 0 vs 0
- Pacific:
/a/yuriw-2022-02-09_22:52:18-rados-wip-yuri5-testing-2022-02-09-1322-pacific-distro-default-smithi/6672177
- 08:21 PM Feature #54280 (Resolved): support truncation sequences in sparse reads
- I've been working on sparse read support in the kclient, and got something working today, only to notice that after t...
- 03:39 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-02-11_22:59:19-rados-wip-yuri4-testing-2022-02-11-0858-distro-default-smithi/6677733
Last pg map bef...
- 10:06 AM Bug #46847: Loss of placement information on OSD reboot
- Could somebody please set the status back to open and Affected Versions to all?