Activity
From 05/16/2022 to 06/14/2022
06/14/2022
- 09:40 PM Bug #49777: test_pool_min_size: 'check for active or peered' reached maximum tries (5) after wait...
- Running some tests to try and reproduce the issue and get a sense of how frequently it fails. This has actually been ...
- 09:05 PM Backport #51287 (In Progress): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (re...
- 08:08 PM Bug #53855: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- @Myoungwon Oh does this look like the same thing to you? Perhaps your fix needs to be backported to Pacific.
/a/yu...
- 03:03 PM Bug #52316: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(mons)
- /a/yuriw-2022-06-13_16:36:31-rados-wip-yuri7-testing-2022-06-13-0706-distro-default-smithi/6876523
Description: ra...
- 02:37 PM Bug #56034: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
- Another detail to note is that this particular test has the pg autoscaler enabled, as opposed to TEST_divergent_2(), ...
- 10:48 AM Bug #56034 (Resolved): qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
- /a/yuriw-2022-06-13_16:36:31-rados-wip-yuri7-testing-2022-06-13-0706-distro-default-smithi/6876516
Also historical...
- 06:22 AM Bug #56030: frequently down and up a osd may cause recovery not in asynchronous
- I set osd_async_recovery_min_cost = 0, hoping recovery will go async in all cases.
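A minimal sketch of applying that setting cluster-wide, assuming the standard ceph CLI is available:
    ceph config set osd osd_async_recovery_min_cost 0
With the cost threshold at 0, any recovery that is eligible for the async path should take it rather than synchronous recovery.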
- 03:57 AM Bug #56030 (Fix Under Review): frequently down and up a osd may cause recovery not in asynchronous
- ceph version: octopus 15.2.13
In my test cluster there are 6 OSDs, 3 for the bucket index pool and 3 for the other pools; there ar...
06/13/2022
- 10:40 PM Bug #56028 (New): thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.vers...
- This assertion is resurfacing in Pacific runs. The last fix for this was tracked in #46323, but this test branch incl...
- 10:27 PM Bug #52737: osd/tests: stat mismatch
- @Ronen I'm pretty sure this is a duplicate of #50222
- 10:26 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- /a/yuriw-2022-06-07_19:48:58-rados-wip-yuri6-testing-2022-06-07-0955-pacific-distro-default-smithi/6866688
- 03:12 AM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- /a/yuriw-2022-06-09_03:58:30-smoke-quincy-release-distro-default-smithi/6869659/
Test description: smoke/basic/{clus...
06/10/2022
- 05:38 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- Sridhar Seshasayee wrote:
> *+Quick Update+*
> This was again hit recently in
> /a/yuriw-2022-06-09_03:58:30-smoke...
- 05:10 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- verifying again
http://pulpito.front.sepia.ceph.com/yuriw-2022-06-11_02:22:38-smoke-quincy-release-distro-default-sm...
- 03:47 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- *+Quick Update+*
This was again hit recently in
/a/yuriw-2022-06-09_03:58:30-smoke-quincy-release-distro-default-sm...
- 05:20 PM Backport #55981: quincy: don't trim excessive PGLog::IndexedLog::dups entries on-line
- Radoslaw Zarzynski wrote:
> https://github.com/ceph/ceph/pull/46605
merged
- 04:52 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
- /a/yuriw-2022-06-10_03:10:47-rados-wip-yuri4-testing-2022-06-09-1510-quincy-distro-default-smithi/6871955
Coredump...
- 04:48 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- Nitzan Mordechai wrote:
> That could work, but when we have socket failure injection, the error callback will not be...
- 04:44 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-06-10_03:10:47-rados-wip-yuri4-testing-2022-06-09-1510-quincy-distro-default-smithi/6872050
- 04:39 PM Feature #55982: log the numbers of dups in PG Log
- https://github.com/ceph/ceph/pull/46607 merged
- 08:18 AM Bug #55995 (New): OSD Crash: /lib64/libpthread.so.0(+0x12ce0) [0x7f94cdcbbce0]
- Hi,
I recently upgraded my Ceph cluster from 14.2.x to 16.2.7 and switched to a Docker deployment. Since then, I see...
06/09/2022
- 09:42 PM Backport #55981: quincy: don't trim excessive PGLog::IndexedLog::dups entries on-line
- https://github.com/ceph/ceph/pull/46605
- 06:36 PM Backport #55981 (Resolved): quincy: don't trim excessive PGLog::IndexedLog::dups entries on-line
- 08:42 PM Backport #55985 (In Progress): octopus: log the numbers of dups in PG Log
- https://github.com/ceph/ceph/pull/46609
- 08:35 PM Backport #55985 (Resolved): octopus: log the numbers of dups in PG Log
- 08:40 PM Backport #55984 (In Progress): pacific: log the numbers of dups in PG Log
- https://github.com/ceph/ceph/pull/46608
- 08:35 PM Backport #55984 (Resolved): pacific: log the numbers of dups in PG Log
- 08:38 PM Backport #55983 (In Progress): quincy: log the numbers of dups in PG Log
- https://github.com/ceph/ceph/pull/46607
- 08:35 PM Backport #55983 (Resolved): quincy: log the numbers of dups in PG Log
- 08:32 PM Feature #55982 (Pending Backport): log the numbers of dups in PG Log
- Approved for `main`; QA is ongoing. Switching to _Pending backport_ before the merge to unblock backports.
- 08:03 PM Feature #55982 (Fix Under Review): log the numbers of dups in PG Log
- 07:59 PM Feature #55982 (Resolved): log the numbers of dups in PG Log
- This is a feature request that is critical for investigating and verifying the dups inflation issue.
- 01:28 PM Backport #55747: pacific: Support blocklisting a CIDR range
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46470
merged
06/08/2022
- 06:23 PM Bug #52969 (Fix Under Review): use "ceph df" command found pool max avail increase when there are...
- 06:22 PM Backport #55973 (Rejected): pacific: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi1...
- 06:22 PM Backport #55972 (Resolved): quincy: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10...
- 06:16 PM Bug #49525 (Pending Backport): found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi1012151...
- 06:13 PM Bug #55407 (Rejected): quincy osd's fail to boot and crash
- Closing this ticket. The new crash is tracked independently (https://tracker.ceph.com/issues/55698).
- 06:10 PM Bug #55851: Assert in Ceph messenger
- From Neha:
* http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?var-sig_v2=12eed3bdd041d05365...
- 06:04 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
- This isn't octopus-specific, as we saw it in pacific as well.
- 05:52 PM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- Not high priority. Possibly a test issue.
- 05:49 PM Bug #49777: test_pool_min_size: 'check for active or peered' reached maximum tries (5) after wait...
- Maybe let's talk about that in one of the RADOS Team meetings.
- 03:25 PM Backport #55309: pacific: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46429
merged
- 03:25 PM Backport #55308: pacific: Manager is failing to keep updated metadata in daemon_state for upgrade...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46427
merged
- 03:12 PM Bug #52724 (Duplicate): octopus: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- 03:09 PM Bug #53855 (Resolved): rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- 02:38 PM Bug #51076 (Resolved): "wait_for_recovery: failed before timeout expired" during thrashosd test w...
- 02:38 PM Backport #55743 (Resolved): octopus: "wait_for_recovery: failed before timeout expired" during th...
- 07:53 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- That could work, but when we have socket failure injection, the error callback will not be called in the Python API ...
- 06:17 AM Bug #55836 (Fix Under Review): add an asok command for pg log investigations
- 02:24 AM Backport #55305 (In Progress): quincy: Manager is failing to keep updated metadata in daemon_stat...
06/07/2022
- 05:42 PM Bug #53729 (Resolved): ceph-osd takes all memory before oom on boot
- 05:42 PM Bug #54296 (Resolved): OSDs using too much memory
- 05:41 PM Backport #55633 (Resolved): octopus: ceph-osd takes all memory before oom on boot
- 05:41 PM Backport #55631 (Resolved): pacific: ceph-osd takes all memory before oom on boot
- 05:13 PM Bug #49777: test_pool_min_size: 'check for active or peered' reached maximum tries (5) after wait...
- /a/yuriw-2022-05-31_21:35:41-rados-wip-yuri2-testing-2022-05-31-1300-pacific-distro-default-smithi/6856269
Descrip...
- 04:04 PM Backport #53972: pacific: BufferList.rebuild_aligned_size_and_memory failure
- Radoslaw Zarzynski wrote:
> https://github.com/ceph/ceph/pull/46215
merged
- 04:03 PM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- https://github.com/ceph/ceph/pull/46120 merged
- 04:02 PM Backport #55281: pacific: mon/OSDMonitor: properly set last_force_op_resend in stretch mode
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45870
merged
- 03:42 PM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2022-06-02_00:50:42-rados-wip-yuri4-testing-2022-06-01-1350-pacific-distro-default-smithi/6859734
Descrip...
- 03:30 PM Bug #48965: qa/standalone/osd/osd-force-create-pg.sh: TEST_reuse_id: return 1
- /a/yuriw-2022-06-02_00:50:42-rados-wip-yuri4-testing-2022-06-01-1350-pacific-distro-default-smithi/6859929
- 03:29 PM Bug #55906: cephfs/metrics/Types.h: In function 'std::ostream& operator<<(std::ostream&, const Cl...
- Oops, updated the wrong Tracker.
- 06:01 AM Bug #55906: cephfs/metrics/Types.h: In function 'std::ostream& operator<<(std::ostream&, const Cl...
- This has been fixed by https://tracker.ceph.com/issues/50822
- 05:55 AM Bug #55906 (New): cephfs/metrics/Types.h: In function 'std::ostream& operator<<(std::ostream&, co...
- /home/teuthworker/archive/yuriw-2022-06-02_14:44:32-rados-wip-yuri4-testing-2022-06-01-1350-pacific-distro-default-sm...
- 03:20 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
- /a/yuriw-2022-06-02_00:50:42-rados-wip-yuri4-testing-2022-06-01-1350-pacific-distro-default-smithi/6859916
- 02:01 PM Backport #55298: octopus: malformed json in a Ceph RESTful API call can stop all ceph-mon services
- nikhil kshirsagar wrote:
> please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/...
- 01:56 AM Bug #55905 (New): Failed to build rados.cpython-310-x86_64-linux-gnu.so
- I built Ceph on Ubuntu 22.04, but I hit this error. During my research I found a way to solve the error, but I don...
06/06/2022
- 08:06 PM Bug #55836: add an asok command for pg log investigations
- It'd be nice if we could retrieve pg log dups length by means of an existing command. FWIW, we log the "approx pg log...
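For context, a sketch of the offline workflow such an asok command would avoid (assuming a hypothetical OSD id 0 and pgid 1.0; data paths vary by deployment):
    systemctl stop ceph-osd@0
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 1.0 --op log
The OSD must be stopped first, which is exactly the intrusiveness the ticket describes.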
- 06:38 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- Tested with the fixed version and now it is working fine!...
- 04:56 PM Bug #51076 (Pending Backport): "wait_for_recovery: failed before timeout expired" during thrashos...
- 04:55 PM Bug #51076 (Resolved): "wait_for_recovery: failed before timeout expired" during thrashosd test w...
- 04:55 PM Backport #55745 (Resolved): pacific: "wait_for_recovery: failed before timeout expired" during th...
- 04:55 PM Bug #50842 (Resolved): pacific: recovery does not complete because of rw_manager lock not being ...
- 02:58 PM Backport #55746: quincy: Support blocklisting a CIDR range
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46469
merged
06/05/2022
- 10:18 AM Bug #55407: quincy osd's fail to boot and crash
- Radoslaw Zarzynski wrote:
> This looks like something new and unrelated to other crashes in this ticket, so created ...
06/03/2022
- 10:14 PM Bug #46877: mon_clock_skew_check: expected MON_CLOCK_SKEW but got none
- Spotted in Quincy:
/a/yuriw-2022-06-02_20:24:42-rados-wip-yuri5-testing-2022-06-02-0825-quincy-distro-default-smit...
- 02:50 PM Bug #55851 (Resolved): Assert in Ceph messenger
- Context:
Ceph balancer was busy balancing: PG remaps...
- 01:23 AM Backport #55306 (Resolved): quincy: prometheus metrics shows incorrect ceph version for upgraded ...
- should be fixed in 17.2.1
06/02/2022
- 07:26 PM Bug #55836 (Resolved): add an asok command for pg log investigations
- The rationale is that @ceph-objectstore-tool -op log@ requires stopping the OSD, and thus is intrusive.
This feature i...
- 09:39 AM Backport #55767 (In Progress): octopus: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
- 09:38 AM Backport #55768 (In Progress): pacific: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
- 08:35 AM Bug #54004 (Rejected): When creating erasure-code-profile incorrectly set parameters, it can be c...
- 08:35 AM Bug #54004: When creating erasure-code-profile incorrectly set parameters, it can be created succ...
- The profile can be created, but that doesn't mean that you will use it yet.
As long as you are not using it, no err...
- 06:32 AM Bug #54172: ceph version 16.2.7 PG scrubs not progressing
- We've seen this at a customer's cluster as well. A simple repeer of the PG gets it unstuck. We've not investigated a...
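For reference, a sketch of that repeer workaround, assuming a hypothetical stuck pgid 1.2a:
    ceph pg repeer 1.2a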
06/01/2022
- 02:52 PM Backport #55631: pacific: ceph-osd takes all memory before oom on boot
- This PR is ready to merge. Can this be executed so this change will end up in the next Pacific release?
- 10:47 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- jianwei zhang wrote:
> https://github.com/ceph/ceph/pull/46478
test result
- 10:40 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- https://github.com/ceph/ceph/pull/46478
- 08:15 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- ...
- 07:18 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- The original intention of raising this question is that testers (users) are confused as to why MAX_AVAIL does not dec...
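A quick way to watch the values being discussed, assuming the standard ceph CLI:
    ceph df detail
Note that per-pool MAX AVAIL is a projection based on the fullest OSDs the pool maps to, so it can shift in surprising directions while objects are degraded or being moved.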
- 06:17 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- step4 vs step5:
4. kill -9 osd.0.pid - OSD.0 OUT unset nobackfill --> recovery HEALTH_OK
STORED = 1.1G ///increase 1...
- 06:15 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- 5. remove out osd.0...
- 06:03 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- for Problem2 step1 vs step4:
osd.0 already out and recovery complete HEALTH_OK, but STORED/(DATA) 1.0G increase ...
- 05:59 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- for ceph df detail commands
I don't think raw_used_rate should be adjusted:...
- 05:51 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- For the MAX AVAIL field, I think the down or out osd.0 should be excluded...
- 05:44 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- Problem1 step1 vs step2:
1. ceph cluster initial state
STORED = 1.0G
(DATA) = 1.0G
MAX AVAIL = 260G
2. ...
- 05:30 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- ceph v15.2.13
I found the same problem.
1. ceph cluster initial state...
- 06:23 AM Fix #54565 (Resolved): Add snaptrim stats to the existing PG stats.
- 06:23 AM Backport #54612 (Resolved): quincy: Add snaptrim stats to the existing PG stats.
- 06:22 AM Bug #55186 (Resolved): Doc: Update mclock release notes regarding an existing issue.
- 06:21 AM Backport #55219 (Resolved): quincy: Doc: Update mclock release notes regarding an existing issue.
- 06:19 AM Feature #51984 (Resolved): [RFE] Provide warning when the 'require-osd-release' flag does not mat...
- 06:18 AM Backport #53549 (Rejected): nautilus: [RFE] Provide warning when the 'require-osd-release' flag d...
- The backport to nautilus was deemed not needed. See BZ https://bugzilla.redhat.com/show_bug.cgi?id=2033078 for more d...
- 05:57 AM Backport #53550 (Resolved): octopus: [RFE] Provide warning when the 'require-osd-release' flag do...
- 04:58 AM Bug #49525 (Fix Under Review): found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi1012151...
- Indeed caused by scrub starting while the PG is being snap-trimmed.
- 04:51 AM Bug #55794 (Duplicate): scrub: scrub is not prevented from started while snap-trimming is in prog...
- Laura Flores wrote:
> @Ronen is this already tracked in #49525?
Yes. Thanks. I will mark as duplicate.
05/31/2022
- 11:52 PM Bug #54316 (Resolved): mon/MonCommands.h: target_size_ratio range is incorrect
- 11:51 PM Backport #54567 (Resolved): pacific: mon/MonCommands.h: target_size_ratio range is incorrect
- 11:50 PM Backport #54568 (Resolved): octopus: mon/MonCommands.h: target_size_ratio range is incorrect
- 11:33 PM Backport #55747 (In Progress): pacific: Support blocklisting a CIDR range
- 11:18 PM Backport #55746 (In Progress): quincy: Support blocklisting a CIDR range
- 10:26 PM Bug #55794: scrub: scrub is not prevented from started while snap-trimming is in progress
- @Ronen is this already tracked in #49525?
- 09:38 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
- Laura Flores wrote:
> /a/yuriw-2022-05-27_21:59:17-rados-wip-yuri-testing-2022-05-27-0934-distro-default-smithi/6851...
- 09:35 PM Bug #55809 (New): "Leak_IndirectlyLost" valgrind report on mon.c
- /a/yuriw-2022-05-27_21:59:17-rados-wip-yuri-testing-2022-05-27-0934-distro-default-smithi/6851271/remote/smithi085/lo...
- 06:13 PM Backport #53971 (Resolved): octopus: BufferList.rebuild_aligned_size_and_memory failure
- 06:07 PM Backport #53971: octopus: BufferList.rebuild_aligned_size_and_memory failure
- Radoslaw Zarzynski wrote:
> https://github.com/ceph/ceph/pull/46216
merged
- 03:10 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- Other reported instances of this `wait_for_clean` assertion failure where the pgmap has a pg stuck in recovery have l...
- 03:04 PM Bug #55726: Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on clients
- Hi,
I set debug mode on OSDs and MONs but didn't find the string 'choose_acting'.
Also, what I found: our EC profile ...
- 02:42 PM Bug #39150 (Resolved): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- 02:41 PM Bug #50659 (Resolved): Segmentation fault under Pacific 16.2.1 when using a custom crush location...
- 02:39 PM Bug #53306 (Resolved): ceph -s mon quorum age negative number
- 02:38 PM Backport #55280 (Resolved): quincy: mon/OSDMonitor: properly set last_force_op_resend in stretch ...
- 02:37 PM Bug #53327 (Resolved): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shut...
- 02:34 PM Backport #55632 (Resolved): quincy: ceph-osd takes all memory before oom on boot
- 12:52 PM Bug #55435 (Fix Under Review): mon/Elector: notify_ranked_removed() does not properly erase dead_...
- 05:18 AM Bug #55798 (Fix Under Review): scrub starts message missing in cluster log
- 05:15 AM Bug #55798 (Pending Backport): scrub starts message missing in cluster log
- We used to log "scrub starts" and "deep-scrub starts" messages when the scrub/deep-scrub process has started for the pg...
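A quick check for those messages in the cluster log, assuming the default log location:
    grep -E '(deep-)?scrub starts' /var/log/ceph/ceph.log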
05/30/2022
- 01:27 PM Bug #55773 (Fix Under Review): Assertion failure (ceph_assert(have_pending)) when creating new OS...
- 01:16 PM Backport #55309 (In Progress): pacific: prometheus metrics shows incorrect ceph version for upgra...
- 01:13 PM Backport #55308 (In Progress): pacific: Manager is failing to keep updated metadata in daemon_sta...
- 12:24 PM Bug #55794 (Duplicate): scrub: scrub is not prevented from started while snap-trimming is in prog...
- Scrub code only tests the target PG for 'active' & 'clean'. And snap-trimming PGs are 'clean'.
For example:
http...
- 09:25 AM Backport #55792 (Rejected): octopus: CEPH Graylog Logging Missing "host" Field
- 09:25 AM Backport #55791 (Rejected): pacific: CEPH Graylog Logging Missing "host" Field
05/27/2022
- 10:29 PM Bug #55787 (New): mon/crush_ops.sh: Error ENOENT: item osd.7 does not exist
- Found in an Octopus teuthology run:
/a/yuriw-2022-05-14_14:30:10-rados-wip-yuri5-testing-2022-05-13-1402-octopus-d...
- 03:59 PM Bug #55383 (Resolved): monitor cluster logs(ceph.log) appear empty until rotated
- 03:59 PM Backport #55742 (Resolved): quincy: monitor cluster logs(ceph.log) appear empty until rotated
- 03:50 PM Backport #55742: quincy: monitor cluster logs(ceph.log) appear empty until rotated
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46374
merged
- 03:35 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-05-26_23:23:48-rados-wip-yuri2-testing-2022-05-26-1430-quincy-distro-default-smithi/6849426...
05/26/2022
- 10:56 PM Bug #55776 (New): octopus: map exx had wrong cluster addr
- Description: rados/objectstore/{backends/ceph_objectstore_tool supported-random-distro$/{ubuntu_18.04}}
/a/yuriw-2...
- 10:33 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- /a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832544
Descrip...
- 05:06 PM Bug #55773: Assertion failure (ceph_assert(have_pending)) when creating new OSDs during OSD deplo...
- +*ANALYSIS*+
Note that the analysis is for the first crash when the leader was: mon.f25-h23-000-r730xd.rdu2.scalel...
- 04:54 PM Bug #55773 (Resolved): Assertion failure (ceph_assert(have_pending)) when creating new OSDs durin...
- See https://bugzilla.redhat.com/show_bug.cgi?id=2086419 for more details.
+*Assertion Failure*+... - 08:52 AM Bug #55355: osd thread deadlock
- I think this problem may be a problem with ProtocolV2...
- 01:56 AM Bug #55750: mon: slow request of very long time
- https://github.com/ceph/ceph/pull/41516
https://github.com/ceph/ceph/commit/a124ee85b03e15f4ea371358008ecac65f9f4e50...
05/25/2022
- 08:37 PM Bug #55750: mon: slow request of very long time
- Radoslaw Zarzynski wrote:
> Could you please provide info on which version of Ceph this issue happened on?
# ceph -...
- 06:19 PM Bug #55750 (Need More Info): mon: slow request of very long time
- Could you please provide info on which version of Ceph this issue happened on?
- 08:17 PM Bug #53895 (Resolved): Unable to format `ceph config dump` command output in yaml using `-f yaml`
- 06:46 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- Not urgent, perhaps not low-hanging-fruit but still good as a training issue.
- 06:41 PM Bug #55726 (Need More Info): Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on c...
- It would be really helpful to compare logs around @choose_acting@ from Nautilus vs Octopus.
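A sketch of gathering that, assuming raising debug_osd to 20 is acceptable and logs are in the default location:
    ceph tell 'osd.*' config set debug_osd 20
    grep choose_acting /var/log/ceph/ceph-osd.*.log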
- 06:32 PM Backport #55768 (Resolved): pacific: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
- https://github.com/ceph/ceph/pull/46499
- 06:32 PM Backport #55767 (Rejected): octopus: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
- https://github.com/ceph/ceph/pull/46500
- 06:28 PM Bug #45868 (Pending Backport): rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
- 06:27 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- Let me paste Laura's comment from https://github.com/ceph/ceph/pull/45825:
> @NitzanMordhai perhaps similar logi...
- 06:11 PM Bug #46847: Loss of placement information on OSD reboot
- Notes from the bug scrub:
1. There is a theoretical way to enter backfill instead of recovery in such a scenario.
...
- 05:57 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- https://tracker.ceph.com/issues/53685 shows the issue is not restricted just to @MOSDPGLog@.
- 05:56 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- The investigation doc: https://docs.google.com/document/d/1s-Vzv3yLTMSO8Hz_MHMg5ix1v53P4jlN6dX1L06yyls/edit#.
- 03:38 PM Backport #55743 (In Progress): octopus: "wait_for_recovery: failed before timeout expired" during...
- 03:37 PM Backport #55745 (In Progress): pacific: "wait_for_recovery: failed before timeout expired" during...
- 03:33 PM Backport #55744 (Resolved): quincy: "wait_for_recovery: failed before timeout expired" during thr...
- 03:01 PM Backport #55624 (Resolved): quincy: Unable to format `ceph config dump` command output in yaml us...
- 12:12 PM Feature #55764 (New): Adaptive mon_warn_pg_not_deep_scrubbed_ratio according to actual scrub thro...
- This request comes from the Science Users Working Group https://pad.ceph.com/p/Ceph_Science_User_Group_20220524
Fo...
- 06:54 AM Bug #55355: osd thread deadlock
- ...
- 06:53 AM Bug #55355: osd thread deadlock
- 910583--wait-->910587: (gdb) t 32 wait MgrClient::lock (owner=910587)
910587--wait-->910429: (gdb) t 62 hold MgrClie...
- 03:43 AM Bug #55355: osd thread deadlock
- thread-35 : holding AsyncMessenger::lock, waiting AsyncConnection::lock
thread-3: holding AsyncConnection::lock, wai...
- 03:12 AM Bug #55355: osd thread deadlock
- ...
- 03:06 AM Bug #55355: osd thread deadlock
- ceph v15.2.13
I found an almost identical stack waiting for a lock...
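For anyone reproducing, a sketch for dumping all thread backtraces from a stuck OSD, assuming gdb with debug symbols and a single ceph-osd process on the host:
    gdb --batch -p "$(pidof ceph-osd)" -ex 'thread apply all bt' > osd-threads.txt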
05/24/2022
- 03:51 PM Backport #55744 (In Progress): quincy: "wait_for_recovery: failed before timeout expired" during ...
- 10:40 AM Bug #55662 (Rejected): EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop....
- 10:39 AM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
- The test needed osd_read_ec_check_for_errors to be set to true; when it is set, the EIO error is ignored and we can g...
- 03:04 AM Bug #55750: mon: slow request of very long time
- It appears that this mon request has been completed, but it was never erased from ops_in_flight_sharded?
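A way to inspect what the monitor still tracks as in flight, assuming admin-socket access to a hypothetical mon.a:
    ceph daemon mon.a ops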
- 02:47 AM Bug #55750: mon: slow request of very long time
- ...
- 02:45 AM Bug #55750 (Need More Info): mon: slow request of very long time
- ...
- 02:38 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
- We are seeing this bug on a replicated pool in Nautilus, 14.2.15 to 14.2.22.
Two of our OSDs are stuck in a crash loop ...
05/23/2022
- 11:56 PM Backport #55747 (Resolved): pacific: Support blocklisting a CIDR range
- https://github.com/ceph/ceph/pull/46470
- 11:56 PM Backport #55746 (Resolved): quincy: Support blocklisting a CIDR range
- https://github.com/ceph/ceph/pull/46469
- 11:52 PM Feature #53050: Support blocklisting a CIDR range
- The Backport field was empty, therefore no backport tickets were created.
- 11:32 PM Backport #55745 (Resolved): pacific: "wait_for_recovery: failed before timeout expired" during th...
- https://github.com/ceph/ceph/pull/46391
- 11:32 PM Backport #55744 (Resolved): quincy: "wait_for_recovery: failed before timeout expired" during thr...
- https://github.com/ceph/ceph/pull/46384
- 11:31 PM Backport #55743 (Resolved): octopus: "wait_for_recovery: failed before timeout expired" during th...
- https://github.com/ceph/ceph/pull/46392
- 11:26 PM Bug #51076 (Pending Backport): "wait_for_recovery: failed before timeout expired" during thrashos...
- 09:24 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- /a/yuriw-2022-05-19_18:50:25-rados-wip-yuri4-testing-2022-05-19-0831-quincy-distro-default-smithi/6841763
Descriptio...
- 08:16 PM Backport #55742 (In Progress): quincy: monitor cluster logs(ceph.log) appear empty until rotated
- 07:55 PM Backport #55742 (Resolved): quincy: monitor cluster logs(ceph.log) appear empty until rotated
- https://github.com/ceph/ceph/pull/46374
- 07:50 PM Bug #55383 (Pending Backport): monitor cluster logs(ceph.log) appear empty until rotated
05/21/2022
- 11:45 PM Backport #55306 (In Progress): quincy: prometheus metrics shows incorrect ceph version for upgrad...
- 11:41 PM Backport #55306: quincy: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
- including this in https://github.com/ceph/ceph/pull/46360
05/20/2022
- 03:33 PM Bug #55726: Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on clients
- https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NQKDCBJ2SH3DTUCMV6KU4T3EGKOSCGJV/
- 02:14 PM Bug #55726 (Need More Info): Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on c...
- Hi
I observed high latencies and mount points hanging since the Octopus release,
and it's still observed on Pacific l...
- 03:24 PM Bug #46847: Loss of placement information on OSD reboot
- We also encountered a similar issue with an EC pool during rebalance; sometimes (OSD overload or PG peering crash), th...
05/19/2022
- 10:01 PM Bug #51076 (Fix Under Review): "wait_for_recovery: failed before timeout expired" during thrashos...
- 09:19 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- Neha Ojha wrote:
> Laura Flores wrote:
> > /a/yuriw-2022-03-25_18:42:52-rados-wip-yuri7-testing-2022-03-24-1341-pac...
- 01:55 PM Bug #55711 (Fix Under Review): mon: race condition between `mgr fail` and MgrMonitor::prepare_beacon()
- 01:51 PM Bug #55711 (Resolved): mon: race condition between `mgr fail` and MgrMonitor::prepare_beacon()
- https://gist.github.com/rzarzynski/25ac59c8422e9ad0b1710a765a77f19a#the-race-condition
- 06:01 AM Bug #55708 (Fix Under Review): Reducing 2 Monitors Causes Stray Daemon
- Example of the problem:
Roles:
smithi001: mon.a
smithi002: mon.b
smithi070: mon.c
smithi100: mon.d
smithi2...
- 05:27 AM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
- I used /qa/standalone/erasure-code/test-erasure-eio.sh; the test that failed is TEST_ec_object_attr_read_error when i...
05/18/2022
- 09:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832699
- 08:56 PM Bug #52316: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(mons)
- /a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832711...
- 07:37 PM Bug #53485 (Fix Under Review): monstore: logm entries are not garbage collected
- 01:37 PM Bug #53485: monstore: logm entries are not garbage collected
- PR https://github.com/ceph/ceph/pull/44511
- 06:26 PM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
- Can you please add the test that helped you discover this issue? I believe the same test was passing with other EC pl...
- 06:17 PM Bug #55407: quincy osd's fail to boot and crash
- This looks like something new and unrelated to other crashes in this ticket, so created a new one: https://tracker.ce...
- 06:17 PM Bug #51858: octopus: rados/test_crash.sh failure
- /a/nojha-2022-05-17_22:38:06-rados-wip-lrc-fix-pacific-distro-basic-smithi/6839177
- 06:17 PM Bug #55698 (New): osd: segfault at boot up
- In the https://tracker.ceph.com/issues/55407#note-14 an OSD crash during early boot up is reported:...
- 06:09 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
- The common theme between these failures (this one and #47026) is @check()@ function of @qa/standalone/osd-backfill/os...
- 03:02 PM Bug #55695: Shutting down a monitor forces Paxos to restart and sometimes disregard subsequent co...
- https://docs.google.com/document/d/1ucVz54vMlm26oiqQoqJ2upUPmiROd4AmwSwbVM_s2A0/edit#
- 03:01 PM Bug #55695 (Fix Under Review): Shutting down a monitor forces Paxos to restart and sometimes disr...
- *Problem:*
mon.a
mon.b
mon.c
mon.d
mon.e
ceph -a stop mon.d
ceph mon remove d
.
.
mon.d is down...
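A sketch of checking the quorum state after such a removal, assuming the standard ceph CLI:
    ceph quorum_status -f json-pretty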
05/17/2022
- 11:21 PM Feature #55693 (Fix Under Review): Limit the Health Detail MSG log size in cluster logs
- RHBZ# https://bugzilla.redhat.com/show_bug.cgi?id=2087527
Version-Release number of selected component (if applica... - 10:58 PM Backport #55513: quincy: mount.ceph fails to understand AAAA records from SRV record
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46113
merged
- 10:56 PM Backport #55280: quincy: mon/OSDMonitor: properly set last_force_op_resend in stretch mode
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45871
merged - 09:17 PM Bug #53485: monstore: logm entries are not garbage collected
- Sorry, forgot to add - we faced this issue on v15.2.13 and on v14.2.22 as well.
- 09:17 PM Bug #53485: monstore: logm entries are not garbage collected
- We observed this several times on the customer side. Each of the 3 mons' store.db was rapidly growing and had tons of logm keys ...
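A sketch for confirming the growth comes from logm entries, assuming the mon is stopped, a hypothetical store path, and that ceph-monstore-tool's dump-keys lists one prefixed key per line:
    du -sh /var/lib/ceph/mon/ceph-a/store.db
    ceph-monstore-tool /var/lib/ceph/mon/ceph-a dump-keys | grep -c logm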
- 06:05 PM Backport #52077 (Resolved): octopus: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- 09:15 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
- ...
- 01:38 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
- ...
05/16/2022
- 05:07 PM Bug #55670 (Fix Under Review): osdmaptool is not mapping child pgs to the target OSDs
- 09:01 AM Bug #55670 (Fix Under Review): osdmaptool is not mapping child pgs to the target OSDs
- Step to reproduce the issue:
1. ceph osd getmap > osdmap.bin
2. ./bin/osdmaptool --test-map-pgs-dump --pool <pool...
- 02:44 PM Bug #55559 (Duplicate): osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
- 02:41 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
- I opened a new issue since a different test failed this time. The failure does look the same though, so maybe the one...
- 02:43 PM Bug #47026: osd-backfill-stats.sh fails in TEST_backfill_ec_down_all_out
- This was originally tracked in #55559. A different test was affected (TEST_backfill_ec_prim_out), but the failure loo...
- 09:40 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
- https://github.com/ceph/ceph/pull/46281
added code for the master branch
- 09:17 AM Bug #55407: quincy osd's fail to boot and crash
- Hi! Today I just built master again.
Rebuilt the OSD, and on first boot:
...
-79> 2022-05-16T09:10:21.707+0000...
- 08:51 AM Bug #55669: osd: add log for pg peering and activiting complete
- https://github.com/ceph/ceph/pull/46279
- 08:51 AM Bug #55669 (New): osd: add log for pg peering and activiting complete
- ...
- 08:22 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- Since I was developing on Ceph 15.2.13, I have not adapted it to master.
If there is no problem with the review, consi...
- 08:21 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- ...
- 08:16 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- ...
- 08:07 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- ...
- 08:06 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- ...
- 08:05 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- The problem was first described at this link:
https://tracker.ceph.com/issues/54966?next_issue_id=54965
- 08:03 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- https://github.com/ceph/ceph/pull/46276
- 07:56 AM Bug #55668 (New): osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recover...
- problem:...
- 06:34 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
- Manuel Lausch wrote:
> Hi Nitzan,
> I checked your patch on the current pacific branch.
>
> unfortunately I stil...
- 06:30 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
- pull_request: https://github.com/ceph/ceph/pull/46273
- 04:09 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
- For scenarios where a single node has both mon and osd, and the number of osd on a single node is large, or the mon e...
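A sketch of disabling the behavior while investigating, assuming the option is available in the running release:
    ceph config set osd osd_fast_fail_on_connection_refused false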
- 03:12 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
- My test result:...
- 03:11 AM Bug #55665 (Fix Under Review): osd: osd_fast_fail_on_connection_refused will cause the mon to con...
- The first issue is described at https://tracker.ceph.com/issues/55067
Problem Description:... - 04:10 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- https://tracker.ceph.com/issues/55665?next_issue_id=55662