Activity
From 07/26/2022 to 08/24/2022
08/24/2022
- 06:31 PM Backport #57288 (Resolved): pacific: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Paralle...
- https://github.com/ceph/ceph/pull/45582
- 06:30 PM Backport #57288 (Rejected): pacific: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Paralle...
- 06:30 PM Backport #57289 (Rejected): quincy: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<Parallel...
- 06:29 PM Bug #53000 (Pending Backport): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMap...
- 06:27 PM Bug #53000 (Fix Under Review): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMap...
- 06:24 PM Bug #51168 (New): ceph-osd state machine crash during peering process
- I plan to work on this one and combine it with implementing the backfill cancellation in crimson. However, not a terribl...
- 06:16 PM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
- I think the reoccurrences are about a failure in a different place – at least in the latest one the @LibRadosServicePP...
- 06:06 PM Bug #57165: expected valgrind issues and found none
- Bumped the priority up as I'm afraid the longer we wait with ensuring valgrind is fully operational, the greater is t...
- 12:19 AM Bug #57165: expected valgrind issues and found none
- What I'm seeing is that the jobs in question were told to expect valgrind errors via the @expect_valgrind_errors: tru...
- 05:50 PM Bug #57163: free(): invalid pointer
- How about having it as _High_?
- 04:15 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
- Downstream ceph-ansible BZ - https://bugzilla.redhat.com/show_bug.cgi?id=2121097
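For reference, a minimal sketch of how the autoscaler can already be pinned by hand today (pool name is illustrative); this feature request is essentially about not having to do this manually for existing pools when upgrading:
<pre>
# keep an existing pool's pg_num fixed by turning the autoscaler off for that pool
ceph osd pool set mypool pg_autoscale_mode off
# or change the default applied to newly created pools
ceph config set global osd_pool_default_pg_autoscale_mode off
</pre>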
- 03:50 PM Feature #57180 (Rejected): option for pg_autoscaler to retain same state of existing pools when u...
- 12:22 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alexandre Marangone wrote:
> Attached the debug_osd 20 logs for one of the OSDs. I turned off (deep)scrub because the logs...
08/23/2022
- 11:27 PM Bug #57163: free(): invalid pointer
- Maybe "urgent" is too dramatic, but this seems to be affecting a lot of tests in main.
- 10:50 PM Bug #57163: free(): invalid pointer
- /a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986255...
- 04:19 PM Bug #57163: free(): invalid pointer
- Local Run with -fsanitize=address warns about a data race at the same stage, may be relevant....
- 03:53 PM Bug #57163: free(): invalid pointer
- Kefu Chai wrote:
> /a/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883...
- 03:33 PM Bug #57163: free(): invalid pointer
- This failure (as of right now) only occurs on Ubuntu 20.04. See https://github.com/ceph/ceph/pull/47642 for some exam...
- 03:11 PM Bug #57163: free(): invalid pointer
- /a/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883/teuthology.log
- 10:13 PM Bug #57165: expected valgrind issues and found none
- To me, this seems like a Teuthology failure. Perhaps Zack Cerza can rule this theory in/out.
In any case, it look...
- 10:09 PM Bug #57165: expected valgrind issues and found none
- /a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197
- 07:48 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
- We discussed this one and the issue is in the following rolling upgrade playbook -
https://github.com/ceph/ceph-a...
- 07:03 PM Feature #57180: option for pg_autoscaler to retain same state of existing pools when upgrading to...
- Update:
Since all the upgrade suites in Pacific turn the autoscaler off by default, I had to write a new upgrade t...
- 07:00 PM Bug #57267 (New): Valgrind reports memory "Leak_IndirectlyLost" errors on ceph-mon in "KeyServerD...
- /a/yuriw-2022-08-19_20:57:42-rados-wip-yuri6-testing-2022-08-19-0940-pacific-distro-default-smithi/6981517/remote/smi...
- 05:30 PM Backport #56641 (In Progress): quincy: Log at 1 when Throttle::get_or_fail() fails
- https://github.com/ceph/ceph/pull/47765
- 05:17 PM Backport #56642 (In Progress): pacific: Log at 1 when Throttle::get_or_fail() fails
- https://github.com/ceph/ceph/pull/47764
- 03:32 PM Bug #57122 (Resolved): test failure: rados:singleton-nomsgr librados_hello_world
- 03:30 PM Backport #57258 (Resolved): pacific: Assert in Ceph messenger
- https://github.com/ceph/ceph/pull/48255
- 03:30 PM Backport #57257 (Resolved): quincy: Assert in Ceph messenger
- https://github.com/ceph/ceph/pull/47931
- 03:28 PM Bug #55851 (Pending Backport): Assert in Ceph messenger
- 03:12 PM Bug #56147 (Resolved): snapshots will not be deleted after upgrade from nautilus to pacific
- 03:11 PM Backport #56579 (Resolved): pacific: snapshots will not be deleted after upgrade from nautilus to...
- 03:09 PM Backport #56579: pacific: snapshots will not be deleted after upgrade from nautilus to pacific
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47134
merged
- 09:29 AM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
- /a/yuriw-2022-08-22_21:19:34-rados-wip-yuri4-testing-2022-08-18-1020-pacific-distro-default-smithi/6986471
- 07:54 AM Backport #54386 (Resolved): octopus: [RFE] Limit slow request details to mgr log
08/22/2022
- 08:52 PM Backport #57029: pacific: rados/test.sh: Early exit right after LibRados global tests complete
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47451
merged
- 06:24 PM Bug #51168: ceph-osd state machine crash during peering process
- Radoslaw Zarzynski wrote:
> The PG was in @ReplicaActive@ so we shouldn't see any backfill activity. A delayed event...
- 03:50 PM Bug #53000: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_d...
- Quincy PR:
https://jenkins.ceph.com/job/ceph-pull-requests/102036/consoleFull
- 03:31 PM Bug #43268: Restrict admin socket commands more from the Ceph tool
- Radek, I think this was misunderstood. It's a security issue that resulted from exposing all admin socket commands vi...
- 01:24 PM Bug #57152: segfault in librados via libcephsqlite
- Matan Breizman wrote:
> I have managed to reproduce a similar segfault.
> The relevant code:
> https://github.com/ce...
- 08:44 AM Bug #57152: segfault in librados via libcephsqlite
- I have managed to reproduce a similar segfault.
The relevant code:
https://github.com/ceph/ceph/blob/main/src/SimpleR...
- 09:45 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in these recent pacific runs:
1. https://pulpito.ceph.com/yuriw-2022-08-18_23:16:33-fs-wip-yuri10-testing-202...
08/21/2022
- 06:39 AM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
- Stefan Kooman wrote:
> Is this bug also affecting rbd snapshots / clones?
Yes
08/19/2022
- 11:24 PM Backport #55157: quincy: mon: config commands do not accept whitespace style config name
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47381
merged
- 09:27 PM Backport #57209 (Resolved): quincy: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- https://github.com/ceph/ceph/pull/47932
- 09:27 PM Backport #57208 (In Progress): pacific: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- 09:18 PM Bug #49727 (Pending Backport): lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- /a/yuriw-2022-08-11_16:46:00-rados-wip-yuri3-testing-2022-08-11-0809-pacific-distro-default-smithi/6968195...
- 04:21 PM Bug #56147: snapshots will not be deleted after upgrade from nautilus to pacific
- Is this bug also affecting rbd snapshots / clones?
- 11:34 AM Backport #55631: pacific: ceph-osd takes all memory before oom on boot
- Off-line fix: https://github.com/ceph/ceph/pull/46252
Online fix: https://github.com/ceph/ceph/pull/47701
- 10:26 AM Backport #55631 (In Progress): pacific: ceph-osd takes all memory before oom on boot
- Unresolving, as the ultimate fix consists of 2 PRs (off-line + on-line trimming).
- 10:28 AM Backport #55632: quincy: ceph-osd takes all memory before oom on boot
- Radoslaw Zarzynski wrote:
> Unresolving, as the ultimate fix consists of 2 PRs while the 2nd one is under review....
- 10:24 AM Backport #55632 (In Progress): quincy: ceph-osd takes all memory before oom on boot
- Unresolving, as the ultimate fix consists of 2 PRs while the 2nd one is under review.
- 10:06 AM Backport #55632: quincy: ceph-osd takes all memory before oom on boot
- The on-line fix (done by the OSD, in contrast to the COT-based off-line one) is here: https://github.com/ceph/ceph/pull/47688.
- 07:48 AM Bug #57190 (New): pg shard status inconsistency in one pg
- ...
- 05:57 AM Backport #55309 (In Progress): pacific: prometheus metrics shows incorrect ceph version for upgra...
- 05:56 AM Backport #55309 (New): pacific: prometheus metrics shows incorrect ceph version for upgraded ceph...
- 04:17 AM Backport #55309: pacific: prometheus metrics shows incorrect ceph version for upgraded ceph daemon
- Reverted backport PR#46429 from pacific (revert PR https://github.com/ceph/ceph/pull/46921) due to tracker https://tr...
- 05:54 AM Backport #55308 (In Progress): pacific: Manager is failing to keep updated metadata in daemon_sta...
- 05:52 AM Backport #55308 (New): pacific: Manager is failing to keep updated metadata in daemon_state for u...
- 04:16 AM Backport #55308: pacific: Manager is failing to keep updated metadata in daemon_state for upgrade...
- We had to revert backport PR#46427 from pacific (revert PR https://github.com/ceph/ceph/pull/46920) due to https://tr...
08/18/2022
- 07:51 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
- 07:50 PM Bug #23117 (Duplicate): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been ex...
- 07:50 PM Bug #57185 (Duplicate): EC 4+2 PG stuck in activating+degraded+remapped
- 07:49 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- This should have been easily caught if we had this implemented:
https://tracker.ceph.com/issues/23117
https://git...
- 07:43 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- Tim - Two workaround as you see your script sometimes balancer module keeps changing PGs stat that is not a valid tes...
- 07:43 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- - Here we were testing OSD failure - recovery/backfill for mClock Scheduler and for this we bring in phases one OSD n...
- 07:41 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- We did capture the debug logs(debug_osd = 20 and debug_ms = 1) and they are here - f28-h28-000-r630.rdu2.scalelab.red...
- 07:25 PM Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped
- From Cluster logs:...
- 05:31 PM Bug #57185 (Duplicate): EC 4+2 PG stuck in activating+degraded+remapped
- - PG Query...
- 04:27 PM Feature #57180 (Rejected): option for pg_autoscaler to retain same state of existing pools when u...
- Currently, any version of Ceph that is >= Pacific will have autoscaler enabled by default even for existing pools.
W...
- 03:20 PM Bug #57152: segfault in librados via libcephsqlite
- Patrick Donnelly wrote:
> Matan Breizman wrote:
> > > So the problem is that gcc 8.5, which compiles successfully, ...
- 01:19 PM Bug #57152: segfault in librados via libcephsqlite
- Matan Breizman wrote:
> > So the problem is that gcc 8.5, which compiles successfully, generates code which causes t...
- 01:11 PM Bug #57152: segfault in librados via libcephsqlite
- > So the problem is that gcc 8.5, which compiles successfully, generates code which causes the segfault?
From the ...
- 12:54 PM Bug #57152: segfault in librados via libcephsqlite
- Matan Breizman wrote:
> This PR seems to resolve the compilation errors mentioned above.
> Please let me know your ...
- 12:44 PM Bug #57152: segfault in librados via libcephsqlite
- This PR seems to resolve the compilation errors mentioned above.
Please let me know your thoughts.
- 11:59 AM Bug #57152: segfault in librados via libcephsqlite
- There seems to be an issue with the gcc version used to compile; I noticed a similar issue when compiling `examples/lib...
- 03:18 PM Backport #57030: quincy: rados/test.sh: Early exit right after LibRados global tests complete
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47452
merged
- 03:16 PM Backport #56578 (Resolved): quincy: snapshots will not be deleted after upgrade from nautilus to ...
- 03:15 PM Backport #56578: quincy: snapshots will not be deleted after upgrade from nautilus to pacific
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47133
merged
- 11:44 AM Bug #45721 (Fix Under Review): CommandFailedError: Command failed (workunit test rados/test_pytho...
- 07:48 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- /a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973760
- 07:22 AM Bug #50536: "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on master.
- OSError: Socket is closed
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-defau...
- 06:40 AM Bug #57165: expected valgrind issues and found none
- /a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
/a/yuriw-2...
08/17/2022
- 07:05 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Although there was a report from Telemetry, we still need more logs (read: a reoccurrence at Sepia) which, hopefully, ...
- 07:00 PM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- It looks like this must be some inconsistency between the head_obc->blocked state, and its presence in objects_blocke...
- 06:56 PM Bug #56661 (Need More Info): Quincy: OSD crashing one after another with data loss with ceph_asse...
- Moving into _Need More Info_ :-( per Myoungwon Oh's comment.
- 06:41 PM Bug #57017 (Fix Under Review): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
- 06:34 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- We've been poking at @read_log_and_missing()@ pretty recently (the dups issue). Does it ring a bell?
- 06:32 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- @Ronen: @debug_verify_stored_missing@ is an input parameter with a default value. Unfortunately, it doesn't look like a ...
- 06:24 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- Even if we don't want to deep dive into it right now, we should refactor the assertion:...
- 06:19 PM Bug #57147: qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
- Neha Ojha wrote:
> How reproducible is this? Following logs indicate that we ran out of space.
>
We have seen t...
- 02:25 PM Bug #57147: qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
- How reproducible is this? Following logs indicate that we ran out of space....
- 06:04 PM Bug #57152: segfault in librados via libcephsqlite
- The reporter on the ML shared the .mgr pool in question:...
- 01:59 PM Bug #57165 (Resolved): expected valgrind issues and found none
- ...
- 01:42 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- a/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975387
- 01:35 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- ...
- 01:24 PM Bug #57119 (Fix Under Review): Heap command prints with "ceph tell", but not with "ceph daemon"
- 01:12 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- There is another thing we should unify: @daemon@ and @tell@ treat @ss@ (the error stream) differently when the error ...
- 12:48 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- Similar problem may affect e.g. @flush_store_cache@:...
- 12:27 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- Oops, this is the case:...
- 12:25 PM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- The interface between `outbl` and the formatter is described at the beginning of the function:...
- 12:27 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- Appending the stringstream to the "outbl" bufferlist here makes the output print with the "daemon" version, but it is...
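For reference, the two invocations being compared in this tracker (osd.0 is just an example target; "stats" is one of the heap subcommands):
<pre>
ceph tell osd.0 heap stats     # prints the tcmalloc heap stats
ceph daemon osd.0 heap stats   # admin socket path that was not printing the output
</pre>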
- 12:11 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- ...
- 12:08 AM Bug #57119: Heap command prints with "ceph tell", but not with "ceph daemon"
- ...
- 01:21 PM Bug #57163 (Resolved): free(): invalid pointer
- ...
08/16/2022
- 07:28 PM Bug #57152 (Resolved): segfault in librados via libcephsqlite
- We have a post on the ML about a segfault in the mgr:
"[ceph-users] Quincy: Corrupted devicehealth sqlite3 databas... - 06:07 PM Bug #57122 (Fix Under Review): test failure: rados:singleton-nomsgr librados_hello_world
- 06:00 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- The fix is just to change any reference from "master" to "main":...
- 05:51 PM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- A couple present in this run:
/a/yuriw-2022-08-15_18:43:38-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-sm...
- 07:30 AM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- What surprised me is we still use GNU Make there.
- 07:30 AM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- Huh, this doesn't look like a compiler's failure. It happens earlier:...
- 01:36 PM Bug #57147 (New): qa: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) failure
- The teuthology link https://pulpito.ceph.com/yuriw-2022-08-11_16:57:01-fs-wip-yuri3-testing-2022-08-11-0809-pacific-d...
- 07:53 AM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- How about adding it to the suppression file, per comment #1?
- 07:50 AM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- Would you like to have a look? Neha confirms the logs mentioned in #9 are still available.
- 07:40 AM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
- NCB was added in Quincy, so breaking the relationship.
- 07:28 AM Bug #57136: ecpool pg stay active+clean+remapped
- Radoslaw Zarzynski wrote:
> It looks @osd.18@ isn't up. Could you please share the @ceph -s@ and @ceph osd tree@?
<...
- 07:20 AM Bug #57136 (Need More Info): ecpool pg stay active+clean+remapped
- It looks @osd.18@ isn't up. Could you please share the @ceph -s@ and @ceph osd tree@?
- 05:29 AM Bug #57136 (Need More Info): ecpool pg stay active+clean+remapped
- I created an EC pool; the erasure code profile is:...
- 07:12 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> Wow, I'm quite surprised to see this is taking so much time to be resolved.
> Can...
- 07:09 AM Bug #53729 (Fix Under Review): ceph-osd takes all memory before oom on boot
- 04:23 AM Backport #56134 (In Progress): quincy: scrub starts message missing in cluster log
08/15/2022
- 03:29 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Attached the debug_osd 20 logs for one of the OSDs. I turned off (deep)scrub because the logs were spammed with scrub error...
- 11:22 AM Backport #51173 (Rejected): nautilus: regression in ceph daemonperf command output, osd columns a...
- Nautilus is EOL
- 11:21 AM Bug #51115 (Resolved): When read failed, ret can not take as data len, in FillInVerifyExtent
- 11:21 AM Backport #51151 (Rejected): nautilus: When read failed, ret can not take as data len, in FillInVe...
- Nautilus is EOL
- 11:21 AM Bug #50978 (Resolved): unaligned access to member variables of crush_work_bucket
- 11:21 AM Backport #50985 (Rejected): nautilus: unaligned access to member variables of crush_work_bucket
- Nautilus is EOL
- 11:19 AM Bug #50763 (Resolved): osd: write_trunc omitted to clear data digest
- 11:19 AM Backport #50789 (Rejected): nautilus: osd: write_trunc omitted to clear data digest
- Nautilus is EOL
- 11:17 AM Bug #50745 (Resolved): max_misplaced was replaced by target_max_misplaced_ratio
- 11:17 AM Backport #50749 (Rejected): nautilus: max_misplaced was replaced by target_max_misplaced_ratio
- Nautilus is EOL
- 08:19 AM Bug #56386 (Can't reproduce): Writes to a cephfs after metadata pool snapshot causes inconsistent...
08/14/2022
- 08:59 AM Bug #57122: test failure: rados:singleton-nomsgr librados_hello_world
- Possibly related to the move to C++20? The makefile specifies c++11.
- 08:58 AM Bug #57122 (Resolved): test failure: rados:singleton-nomsgr librados_hello_world
- e.g. rfriedma-2022-08-13_08:48:40-rados:singleton-nomsgr-wip-rf-like-main-distro-default-smithi/6970569
(the branc...
08/12/2022
- 08:18 PM Bug #57119 (Resolved): Heap command prints with "ceph tell", but not with "ceph daemon"
- *How to reproduce:*
# Start a vstart cluster, or access any working cluster
# Run `ceph tell <daemon>.<id> heap <he...
- 02:21 PM Backport #57117 (Resolved): quincy: mon: race condition between `mgr fail` and MgrMonitor::prepar...
- https://github.com/ceph/ceph/pull/50979
- 02:17 PM Bug #55711 (Pending Backport): mon: race condition between `mgr fail` and MgrMonitor::prepare_bea...
- 01:55 PM Cleanup #56581 (Resolved): mon: fix ElectionLogic warnings
08/11/2022
- 11:54 PM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
- @Adam maybe you'd have an idea of what's going on here?
- 11:54 PM Bug #56097: Timeout on `sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ...
- This one went dead after a while:
/a/yuriw-2022-08-04_20:43:31-rados-wip-yuri6-testing-2022-08-04-0617-pacific-distro...
- 06:57 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
- So I thought this may have been because I re-used the name, so I went to create a pool with a different name to conti...
- 06:54 PM Bug #57105: quincy: ceph osd pool set <pool> size math error
- Looks like one of the placement groups is "inactive":...
- 06:46 PM Bug #57105 (Resolved): quincy: ceph osd pool set <pool> size math error
- Context: I created a pool with a block device and intentionally filled a set of OSDs.
This of course broke things,...
- 04:07 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- I won't be able to rerun the patched branch until Monday. Haven't you been able to reproduce it? Feels trivial to so ...
- 05:01 AM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alex, can you share logs of the OSD that caused the 500s? My theory is a pglog mismatch during peering, since one osd (with the...
- 03:56 PM Bug #57074 (Duplicate): common: Latest version of main experiences build failures
- 03:56 PM Bug #57074 (Duplicate): common: Latest version of main experiences build failures
- 02:16 PM Bug #57097 (Fix Under Review): ceph status does not report an application is not enabled on the p...
- 12:36 PM Bug #57097 (Pending Backport): ceph status does not report an application is not enabled on the p...
- If a pool has 0 objects in it, then ceph status (and ceph health detail) does not report that an application is not enabled for ...
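A minimal sketch of the reproduce path described above (pool name is illustrative):
<pre>
ceph osd pool create testpool 32        # pool without any application enabled
ceph health detail                      # no POOL_APP_NOT_ENABLED warning while the pool holds 0 objects
rados -p testpool put obj1 /etc/hosts   # write a single object
ceph health detail                      # the warning now appears
</pre>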
- 12:30 PM Bug #57049 (Fix Under Review): cluster logging does not adhere to mon_cluster_log_file_level
- 10:25 AM Bug #52807 (Resolved): ceph-erasure-code-tool: new tool to encode/decode files
- 10:24 AM Backport #52808 (Rejected): nautilus: ceph-erasure-code-tool: new tool to encode/decode files
- Nautilus is EOL
- 10:24 AM Bug #52448 (Resolved): osd: pg may get stuck in backfill_toofull after backfill is interrupted du...
- 10:24 AM Backport #52832 (Rejected): nautilus: osd: pg may get stuck in backfill_toofull after backfill is...
- 10:24 AM Backport #52832 (Resolved): nautilus: osd: pg may get stuck in backfill_toofull after backfill is...
- Nautilus is EOL
- 10:23 AM Backport #52938 (Rejected): nautilus: Primary OSD crash caused corrupted object and further crash...
- Nautilus is EOL
- 10:21 AM Backport #52771 (Rejected): nautilus: pg scrub stat mismatch with special objects that have hash ...
- Nautilus is EOL
- 10:20 AM Bug #42742 (Resolved): "failing miserably..." in Infiniband.cc
- 10:20 AM Backport #42848 (Rejected): nautilus: "failing miserably..." in Infiniband.cc
- Nautilus is EOL
- 10:20 AM Bug #43656 (Resolved): AssertionError: not all PGs are active or peered 15 seconds after marking ...
- 10:20 AM Backport #43776 (Rejected): nautilus: AssertionError: not all PGs are active or peered 15 seconds...
- 10:19 AM Backport #43776 (Resolved): nautilus: AssertionError: not all PGs are active or peered 15 seconds...
- Nautilus is EOL
08/10/2022
- 02:35 PM Bug #57074: common: Latest version of main experiences build failures
- @Kefu the main issue seems to be that install-deps is broken for Centos 8 Stream; currently, it halts when trying to ...
- 06:13 AM Bug #57074: common: Latest version of main experiences build failures
- Probably another thing I can do is to make cmake error out if a C++ compiler not compliant with the C++20 standard i...
- 06:11 AM Bug #57074: common: Latest version of main experiences build failures
- Laura, I am sorry for the inconvenience.
This is expected if the stock gcc compiler on an aged (even if not ancient) ...
- 05:17 AM Bug #57074: common: Latest version of main experiences build failures
- I have encountered these compilation errors on Ubuntu 20.04. Basically you need gcc version >= 10.1. Using install-de...
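One possible way to get past this on Ubuntu 20.04, sketched under the assumption that gcc-10/g++-10 from the distro repositories are acceptable (cmake honors CC/CXX on the first configure):
<pre>
sudo apt-get install -y gcc-10 g++-10
CC=gcc-10 CXX=g++-10 ./do_cmake.sh
</pre>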
- 02:03 PM Fix #56709 (Resolved): test/osd/TestPGLog: Fix confusing description between log and olog.
- 10:44 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in these pacific runs
1. https://pulpito.ceph.com/yuriw-2022-08-04_20:54:08-fs-wip-yuri6-testing-2022-08-04-061...
- 07:15 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in https://pulpito.ceph.com/yuriw-2022-08-09_15:36:21-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default...
- 07:11 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in https://pulpito.ceph.com/yuriw-2022-08-04_11:54:20-fs-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default...
- 08:48 AM Bug #43813 (Resolved): objecter doesn't send osd_op
- 08:48 AM Backport #43992 (Rejected): nautilus: objecter doesn't send osd_op
- Nautilus is EOL
- 08:48 AM Bug #52486 (Closed): test tracker: please ignore
- 08:47 AM Backport #52498 (Rejected): nautilus: test tracker: please ignore
- 08:47 AM Backport #52497 (Rejected): octopus: test tracker: please ignore
- 08:47 AM Backport #52495 (Rejected): pacific: test tracker: please ignore
- 07:54 AM Bug #52509 (Can't reproduce): PG merge: PG stuck in premerge+peered state
- We have never experienced this problem again
- 05:56 AM Backport #57076 (In Progress): pacific: Invalid read of size 8 in handle_recovery_delete()
08/09/2022
- 07:49 PM Bug #57074: common: Latest version of main experiences build failures
- Per https://en.cppreference.com/w/cpp/compiler_support/20 (found by Mark Nelson), only some features were enabled in ...
- 05:23 PM Bug #57074: common: Latest version of main experiences build failures
- '-std=c++2a' seems to be the way that gcc versions < 9 add support for C++20, per https://gcc.gnu.org/projects/cxx-st...
- 05:08 PM Bug #57074: common: Latest version of main experiences build failures
- The gcc version running here is 8.5.0:
$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-15)
For the out...
- 04:52 PM Bug #57074: common: Latest version of main experiences build failures
- One thing that stands out from the command line is '-std=c++2a' instead of '-std=c++20'. What compiler version is run...
- 03:07 PM Bug #57074 (Duplicate): common: Latest version of main experiences build failures
- Built on:...
- 07:22 PM Backport #56663 (Resolved): pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should...
- 04:52 PM Backport #56663: pacific: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= ...
- Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/47211
merged
- 07:22 PM Backport #56664 (Resolved): quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should ...
- 04:43 PM Backport #56664: quincy: mgr/DaemonServer:: adjust_pgs gap > max_pg_num_change should be gap >= m...
- Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/47210
merged
- 05:59 PM Backport #57025 (Resolved): quincy: test_pool_min_size:AssertionError:wait_for_clean:failed befor...
- 05:57 PM Backport #57024 (Resolved): quincy: test_pool_min_size: 'check for active or peered' reached maxi...
- 05:56 PM Backport #57019 (Resolved): quincy: test_pool_min_size: AssertionError: not clean before minsize ...
- 05:43 PM Backport #57076 (Resolved): pacific: Invalid read of size 8 in handle_recovery_delete()
- https://github.com/ceph/ceph/pull/47525
- 04:51 PM Backport #56099: pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- Laura Flores wrote:
> https://github.com/ceph/ceph/pull/46748
merged
- 04:44 PM Bug #55153 (Resolved): Make the mClock config options related to [res, wgt, lim] modifiable durin...
- 04:44 PM Backport #56498 (Resolved): quincy: Make the mClock config options related to [res, wgt, lim] mod...
- 04:40 PM Backport #56498: quincy: Make the mClock config options related to [res, wgt, lim] modifiable dur...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/47020
merged
- 04:41 PM Bug #55435: mon/Elector: notify_ranked_removed() does not properly erase dead_ping in the case of...
- https://github.com/ceph/ceph/pull/47086 merged
- 04:06 PM Bug #52124 (Pending Backport): Invalid read of size 8 in handle_recovery_delete()
- 02:36 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958376
- 03:32 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- /a/yuriw-2022-08-08_22:19:17-rados-wip-yuri-testing-2022-08-08-1230-quincy-distro-default-smithi/6962388/
- 01:53 PM Bug #45318: Health check failed: 2/6 mons down, quorum b,a,c,e (MON_DOWN)" in cluster log running...
- /a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958138
- 12:35 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Hi,
Another longer one.
OSD.25, data on sdh, db on sdb
- 11:13 AM Bug #56530 (Resolved): Quincy: High CPU and slow progress during backfill
- 11:13 AM Backport #57052 (Resolved): quincy: Quincy: High CPU and slow progress during backfill
- 09:11 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in recent quincy run https://pulpito.ceph.com/yuriw-2022-08-02_21:20:37-fs-wip-yuri7-testing-2022-07-27-0808-qui...
- 08:08 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- Seeing this in a Quincy run:
/a/yuriw-2022-08-08_22:19:32-rados-wip-yuri4-testing-2022-08-08-1009-quincy-distro-defa...
- 06:58 AM Bug #47589: radosbench times out "reached maximum tries (800) after waiting for 4800 seconds"
- Seeing this on a Quincy run:
/a/yuriw-2022-08-08_22:19:32-rados-wip-yuri4-testing-2022-08-08-1009-quincy-distro-defa...
- 06:34 AM Backport #49775 (Rejected): nautilus: Get more parallel scrubs within osd_max_scrubs limits
- Nautilus is EOL
08/08/2022
- 03:53 PM Bug #57061 (Fix Under Review): Use single cluster log level (mon_cluster_log_level) config to con...
- 02:56 PM Bug #57061 (Fix Under Review): Use single cluster log level (mon_cluster_log_level) config to con...
- We do not control the verbosity of the cluster logs which are getting logged to stderr, graylog and journald. Each Log...
- 12:19 PM Bug #49231: MONs unresponsive over extended periods of time
- We are planning to upgrade to Octopus. However, I do not believe we can reproduce the issue here. The above config ha...
- 08:51 AM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- Dan van der Ster wrote:
> Good point. In fact it is sufficient to just create some files in the cephfs after taking ...
- 08:43 AM Backport #51497 (Rejected): nautilus: mgr spamming with repeated set pgp_num_actual while merging
- Nautilus is EOL
- 08:42 AM Bug #48212 (Resolved): pool last_epoch_clean floor is stuck after pg merging
- Nautilus is EOL
- 08:41 AM Backport #52644 (Rejected): nautilus: pool last_epoch_clean floor is stuck after pg merging
- Nautilus EOL
- 04:50 AM Backport #57052 (In Progress): quincy: Quincy: High CPU and slow progress during backfill
- 04:30 AM Backport #57052 (Resolved): quincy: Quincy: High CPU and slow progress during backfill
- https://github.com/ceph/ceph/pull/47490
- 04:27 AM Bug #56530 (Pending Backport): Quincy: High CPU and slow progress during backfill
08/07/2022
- 09:38 AM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
- http://pulpito.front.sepia.ceph.com/rfriedma-2022-08-06_12:17:03-rados-wip-rf-snprefix-distro-default-smithi/6960416/...
08/06/2022
- 10:06 PM Backport #55631: pacific: ceph-osd takes all memory before oom on boot
- Who is in charge of this one? Has there been any progress?
- 10:03 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Wow, I'm quite surprised to see this is taking so much time to be resolved.
Can someone do a small recap on what's ...
08/05/2022
- 04:13 PM Bug #57049 (Duplicate): cluster logging does not adhere to mon_cluster_log_file_level
- Even after setting mon_cluster_log_file_level to info or a less verbose level, we are still seeing that debug logs are getti...
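For reference, the setting being discussed here can be applied and inspected like this (a sketch; "info" is just an example level):
<pre>
ceph config set mon mon_cluster_log_file_level info
ceph config get mon mon_cluster_log_file_level
</pre>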
- 01:26 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Hello,
We haven't had as many stalled moments these last days, only ~5 min.
I've taken some logs but really at the ...
08/04/2022
- 03:29 PM Bug #55809: "Leak_IndirectlyLost" valgrind report on mon.c
- /a/yuriw-2022-08-03_20:33:43-rados-wip-yuri8-testing-2022-08-03-1028-quincy-distro-default-smithi/6957591
- 11:36 AM Fix #57040 (Fix Under Review): osd: Update osd's IOPS capacity using async Context completion ins...
- 10:47 AM Fix #57040 (Resolved): osd: Update osd's IOPS capacity using async Context completion instead of ...
- The method, OSD::mon_cmd_set_config(), sets a config option related to
mClock during OSD boot-up. The method waits o...
- 05:12 AM Backport #57030 (In Progress): quincy: rados/test.sh: Early exit right after LibRados global test...
- 05:10 AM Backport #57029 (In Progress): pacific: rados/test.sh: Early exit right after LibRados global tes...
08/03/2022
- 07:26 PM Backport #57020: pacific: test_pool_min_size: AssertionError: not clean before minsize thrashing ...
- https://github.com/ceph/ceph/pull/47446
- 03:10 PM Backport #57020 (Resolved): pacific: test_pool_min_size: AssertionError: not clean before minsize...
- 07:25 PM Backport #57022: pacific: test_pool_min_size: 'check for active or peered' reached maximum tries ...
- https://github.com/ceph/ceph/pull/47446
- 03:12 PM Backport #57022 (Resolved): pacific: test_pool_min_size: 'check for active or peered' reached max...
- 07:23 PM Backport #57019: quincy: test_pool_min_size: AssertionError: not clean before minsize thrashing s...
- https://github.com/ceph/ceph/pull/47445
- 03:10 PM Backport #57019 (Resolved): quincy: test_pool_min_size: AssertionError: not clean before minsize ...
- 07:22 PM Backport #57024: quincy: test_pool_min_size: 'check for active or peered' reached maximum tries (...
- https://github.com/ceph/ceph/pull/47445
- 03:12 PM Backport #57024 (Resolved): quincy: test_pool_min_size: 'check for active or peered' reached maxi...
- 07:22 PM Backport #57023 (Rejected): octopus: test_pool_min_size: 'check for active or peered' reached max...
- 03:12 PM Backport #57023 (Rejected): octopus: test_pool_min_size: 'check for active or peered' reached max...
- 07:13 PM Backport #57026: pacific: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout ...
- https://github.com/ceph/ceph/pull/47446/
- 03:16 PM Backport #57026 (Resolved): pacific: test_pool_min_size:AssertionError:wait_for_clean:failed befo...
- 07:04 PM Backport #57025: quincy: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout e...
- https://github.com/ceph/ceph/pull/47445
- 03:16 PM Backport #57025 (Resolved): quincy: test_pool_min_size:AssertionError:wait_for_clean:failed befor...
- 05:30 PM Backport #57030 (Resolved): quincy: rados/test.sh: Early exit right after LibRados global tests c...
- https://github.com/ceph/ceph/pull/47452
- 05:30 PM Backport #57029 (Resolved): pacific: rados/test.sh: Early exit right after LibRados global tests ...
- https://github.com/ceph/ceph/pull/47451
- 05:27 PM Bug #55001 (Pending Backport): rados/test.sh: Early exit right after LibRados global tests complete
- 03:05 PM Bug #55001: rados/test.sh: Early exit right after LibRados global tests complete
- https://github.com/ceph/ceph/pull/47165 merged
- 03:10 PM Bug #51904 (Pending Backport): test_pool_min_size:AssertionError:wait_for_clean:failed before tim...
- 03:09 PM Bug #54511 (Pending Backport): test_pool_min_size: AssertionError: not clean before minsize thras...
- 03:09 PM Bug #49777 (Pending Backport): test_pool_min_size: 'check for active or peered' reached maximum t...
- 02:58 PM Bug #57017 (Pending Backport): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
- There are certain scenarios in a degraded stretched cluster where we will try to go into the function Monitor::go_recov...
- 02:33 PM Feature #23493 (Resolved): config: strip/escape single-quotes in values when setting them via con...
- 11:50 AM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- To get some more insight on the issue, I would suggest doing the following once the issue is faced again:
1) For OSD-...
- 06:02 AM Bug #55773 (Resolved): Assertion failure (ceph_assert(have_pending)) when creating new OSDs durin...
- 06:01 AM Backport #56060 (Resolved): quincy: Assertion failure (ceph_assert(have_pending)) when creating n...
08/01/2022
- 11:32 PM Bug #37808: osd: osdmap cache weak_refs assert during shutdown
- /a/yuriw-2022-07-27_22:35:53-rados-wip-yuri8-testing-2022-07-27-1303-pacific-distro-default-smithi/6950918
- 08:13 PM Tasks #56952 (In Progress): Set mgr_pool to true for a handful of tests in the rados qa suite
- 17.2.2 had the libcephsqlite failure. I am scheduling some rados/thrash tests here to see the current results. Since ...
- 05:54 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- 05:54 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- I was able to try the patch on Pacific this morning. Running one OSD with the patch, getting 500s from RGW when I pre...
- 03:05 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- 03:05 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- I've just had a latency plateau. No scrub/deep-scrub on the impacted OSD during that time...
At least, no message in...
- 12:42 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Zero occurrences of "timed out" in all my ceph-osd logs for 2 days. But, as I have increased bluestore_prefer_deferre...
- 12:10 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- 12:10 PM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Gilles Mocellin wrote:
> This morning, I have :
> PG_NOT_DEEP_SCRUBBED: 11 pgs not deep-scrubbed in time
> Never h...
- 08:36 AM Bug #56733: Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- This morning, I have :
PG_NOT_DEEP_SCRUBBED: 11 pgs not deep-scrubbed in time
Never had before Pacific.
Could it...
- 06:29 AM Backport #55157 (In Progress): quincy: mon: config commands do not accept whitespace style config...
- 06:28 AM Backport #55156 (In Progress): pacific: mon: config commands do not accept whitespace style confi...
- 06:03 AM Bug #52124 (Fix Under Review): Invalid read of size 8 in handle_recovery_delete()
07/29/2022
- 02:31 PM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
- ...
- 10:52 AM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
- The wrong return code is just an echo of a failure with an auth entity deletion:...
- 05:57 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- As I said, I don't have any more logs, as I had to bring the cluster back into a working state.
As this issue is comi... - 04:24 AM Bug #56661: Quincy: OSD crashing one after another with data loss with ceph_assert_fail
- Hm... at first glance, OSD calls stop_block() on a head object, which is already stopped, in kick_object_context_bloc...
07/28/2022
- 09:54 PM Feature #56956 (Fix Under Review): osdc: Add objecter fastfail
- There is no point in waiting indefinitely when the PG of an object is inactive. It is appropriate to cancel the op in suc...
- 07:12 PM Tasks #56952 (Closed): Set mgr_pool to true for a handful of tests in the rados qa suite
- In most places in the rados suite we use `sudo ceph config set mgr mgr_pool false --force` (see https://github.com/ce...
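For reference, the corresponding toggle in the other direction, i.e. what the task proposes to use for the selected tests (same option quoted above):
<pre>
sudo ceph config set mgr mgr_pool true --force
</pre>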
- 01:37 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- That is very strange. I've been able to reproduce 100% of the time with this:...
- 12:41 PM Bug #56707 (Fix Under Review): pglog growing unbounded on EC with copy by ref
- 12:41 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alex, thanks for the information. Unfortunately, I couldn't recreate the issue, but I did find some issue with refco...
- 02:23 AM Bug #56926 (New): crash: int BlueFS::_flush_range_F(BlueFS::FileWriter*, uint64_t, uint64_t): abort
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=97c9a15c7262222fd841813a...
- 02:22 AM Bug #56903 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8749e9b5d1fac718fbbb96fb...
- 02:22 AM Bug #56901 (New): crash: LogMonitor::log_external_backlog()
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=64ca4b6b04c168da450a852a...
- 02:22 AM Bug #56896 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=50bf2266e28cc1764b47775b...
- 02:22 AM Bug #56895 (New): crash: void MissingLoc::add_active_missing(const pg_missing_t&): assert(0 == "u...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f96348a2ae0d2c754de01fc7...
- 02:22 AM Bug #56892 (New): crash: StackStringBuf<4096ul>::xsputn(char const*, long)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3a3287f5eaa9fbb99295b2b7...
- 02:22 AM Bug #56890 (New): crash: MOSDRepOp::encode_payload(unsigned long)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9be8aeab4dd246c5baf1f1c7...
- 02:22 AM Bug #56889 (New): crash: MOSDRepOp::encode_payload(unsigned long)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=fce79f2ea6c1a34825a23dd9...
- 02:22 AM Bug #56888 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8df8f5fbb1ef85f0956e0f78...
- 02:22 AM Bug #56887 (New): crash: void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::Col...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=625223857a28a74eae75273a...
- 02:22 AM Bug #56883 (New): crash: rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Sli...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ae08527e7a8d310b5740fbf6...
- 02:21 AM Bug #56878 (New): crash: MonitorDBStore::get_synchronizer(std::pair<std::basic_string<char, std::...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5cacc7785f8a352e3cd86982...
- 02:21 AM Bug #56873 (New): crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=210d418989a6bc9fdb60989c...
- 02:21 AM Bug #56872 (New): crash: __cxa_rethrow()
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5ce84c33423abe42eac8cc98...
- 02:21 AM Bug #56871 (New): crash: __cxa_rethrow()
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3c6c9906c46f7979e39f2a3d...
- 02:21 AM Bug #56867 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e151a6a9ae5a0a079dad1ca4...
- 02:21 AM Bug #56863 (New): crash: void RDMAConnectedSocketImpl::handle_connection(): assert(!r)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d1c8198db9a116b38c161a79...
- 02:21 AM Bug #56856 (New): crash: ceph::buffer::list::iterator_impl<true>::copy(unsigned int, std::basic_s...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=03d7803d6cda8b31445b5fa2...
- 02:21 AM Bug #56855 (New): crash: rocksdb::CompactionJob::Run()
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=b79a082186434ab8becebddb...
- 02:21 AM Bug #56850 (Resolved): crash: void PaxosService::propose_pending(): assert(have_pending)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=106ff764dfe8a5f766a511a1...
- 02:21 AM Bug #56849 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=5ff0cd923e0b4beb646ae133...
- 02:20 AM Bug #56848 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=0dcd9dfbff0c25591d64a41a...
- 02:20 AM Bug #56847 (Duplicate): crash: void PaxosService::propose_pending(): assert(have_pending)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7a53cbc0bcdeffa2f26d71d0...
- 02:20 AM Bug #56843 (New): crash: int fork_function(int, std::ostream&, std::function<signed char()>): ass...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=339539062c280c5c4e5e605c...
- 02:20 AM Bug #56837 (New): crash: __assert_perror_fail()
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8b423fcbfb14f36724d15462...
- 02:20 AM Bug #56835 (New): crash: ceph::logging::detail::JournaldClient::JournaldClient(): assert(fd > 0)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e226e4ce8be4c94d64dd6104...
- 02:20 AM Bug #56833 (New): crash: __assert_perror_fail()
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e0d06d29c57064910751db9d...
- 02:20 AM Bug #56826 (New): crash: MOSDPGLog::encode_payload(unsigned long)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ee3ed1408924d926185a65e3...
- 02:20 AM Bug #56821 (New): crash: MOSDRepOp::encode_payload(unsigned long)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=6d21b2c78bcc5092dac5bcc9...
- 02:19 AM Bug #56816 (New): crash: unsigned long const md_config_t::get_val<unsigned long>(ConfigValues con...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=66ff3f43b85f15283932865d...
- 02:19 AM Bug #56814 (New): crash: rocksdb::MemTableIterator::key() const
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7329bea2aaafb66aa5060938...
- 02:19 AM Bug #56813 (New): crash: MOSDPGLog::encode_payload(unsigned long)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=e4eeb1a3b34df8062d7d1788...
- 02:19 AM Bug #56809 (New): crash: MOSDPGScan::encode_payload(unsigned long)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=2fe9b06ce88dccd8c9fe8f41...
- 02:18 AM Bug #56797 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3c3fa597eda743682305f64b...
- 02:18 AM Bug #56796 (New): crash: void ECBackend::handle_recovery_push(const PushOp&, RecoveryMessages*, b...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=dbf2120428a133c3689fa508...
- 02:18 AM Bug #56794 (New): crash: void LogMonitor::_create_sub_incremental(MLog*, int, version_t): assert(...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=1f3b5497ed0df042120d8ff7...
- 02:17 AM Bug #56793 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=54876dfe5b7062de7d1d3ee5...
- 02:17 AM Bug #56789 (New): crash: void RDMAConnectedSocketImpl::handle_connection(): assert(!r)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a87f94f67786787071927f90...
- 02:17 AM Bug #56787 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f58b099fd24ce33032cf74bd...
- 02:17 AM Bug #56785 (New): crash: void OSDShard::register_and_wake_split_child(PG*): assert(!slot->waiting...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d44ea277d2ae53e186d6b488...
- 02:16 AM Bug #56781 (New): crash: virtual void OSDMonitor::update_from_paxos(bool*): assert(version > osdm...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=4aed07fd08164fe65fe7c6e0...
- 02:16 AM Bug #56780 (New): crash: virtual void AuthMonitor::update_from_paxos(bool*): assert(version > key...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=11756492895a3349dfb227aa...
- 02:16 AM Bug #56779 (New): crash: void MissingLoc::add_active_missing(const pg_missing_t&): assert(0 == "u...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=83d5be7b2d08c79f23a10dba...
- 02:16 AM Bug #56778 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=803b4a91fd84c3d26353cb47...
- 02:16 AM Bug #56776 (New): crash: std::string MonMap::get_name(unsigned int) const: assert(n < ranks.size())
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=7464294c2c2ac69856297e37...
- 02:16 AM Bug #56773 (New): crash: int64_t BlueFS::_read_random(BlueFS::FileReader*, uint64_t, uint64_t, ch...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=ba26d388e9213afb18b683ee...
- 02:16 AM Bug #56772 (New): crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_overlap....
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=62b8a9e7f0bb7fc1fc81b2dc...
- 02:16 AM Bug #56770 (New): crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots....
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d9289f1067de7f0cc0e374ff...
- 02:15 AM Bug #56764 (New): crash: uint64_t SnapSet::get_clone_bytes(snapid_t) const: assert(clone_size.cou...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3969752632dfdff2c710083a...
- 02:14 AM Bug #56756 (New): crash: long const md_config_t::get_val<long>(ConfigValues const&, std::basic_st...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=a4792692d74b82c4590d9b51...
- 02:14 AM Bug #56755 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
*New crash events were reported via Telemetry with newer versions (['16.2.6', '16.2.7', '16.2.9']) than encountered...
- 02:14 AM Bug #56754 (New): crash: DeviceList::DeviceList(ceph::common::CephContext*): assert(num)
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=17b0ccd87cab46177149698e...
- 02:14 AM Bug #56752 (New): crash: void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_t) [wi...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=34f05776defb000d033885b3...
- 02:14 AM Bug #56750 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=eb1729ae63d80bd79b6ea92b...
- 02:14 AM Bug #56749 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9bfe9728f3e90e92bcab42f9...
- 02:13 AM Bug #56748 (New): crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): a...
*New crash events were reported via Telemetry with newer versions (['16.2.0', '16.2.1', '16.2.2', '16.2.5', '16.2.6...
- 02:13 AM Bug #56747 (New): crash: std::__cxx11::string MonMap::get_name(unsigned int) const: assert(n < ra...
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=0846d215ecad4c78633623e5...
07/27/2022
- 11:46 PM Backport #56736 (In Progress): quincy: unessesarily long laggy PG state
- https://github.com/ceph/ceph/pull/47901
- 11:46 PM Backport #56735 (Resolved): octopus: unessesarily long laggy PG state
- 11:46 PM Backport #56734 (In Progress): pacific: unessesarily long laggy PG state
- https://github.com/ceph/ceph/pull/47899
- 11:40 PM Bug #53806 (Pending Backport): unessesarily long laggy PG state
- 06:23 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943721/remote/smithi042/l...
- 05:58 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- Moving to next week's bug scrub.
- 05:59 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- Tried that a few times for different PGs on different OSDs, but it doesn't help
- 05:47 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- Pascal Ehlert wrote:
> This indeed happened during an upgrade from Octopus to Pacific.
> I had forgotten to reduce ...
- 12:24 PM Bug #56386: Writes to a cephfs after metadata pool snapshot causes inconsistent objects
- This indeed happened during an upgrade from Octopus to Pacific.
I had forgotten to reduce the number of ranks in Cep...
- 05:54 PM Bug #56574: rados/valgrind-leaks: cluster [WRN] Health check failed: 2 osds down (OSD_DOWN)" in c...
- Nitzan, could it be a different issue?
- 04:27 PM Bug #56733 (New): Since Pacific upgrade, sporadic latencies plateau on random OSD/disks
- Hello,
Since our upgrade to Pacific, we suffer from sporadic latencies on disks, not always the same.
The cluster...
- 02:09 PM Bug #55851 (Fix Under Review): Assert in Ceph messenger
- 01:37 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- >1. "dumping the refcount" - how did you dump the refcount?
I extracted it with rados getxattr refcont and used the...
- 10:50 AM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alex
A few more questions, so I'll be able to recreate the scenario as you got it:
1. "dumping the refcount" - how did ...
- 07:20 AM Backport #56723 (Resolved): quincy: osd thread deadlock
- https://github.com/ceph/ceph/pull/47930
- 07:20 AM Backport #56722 (Resolved): pacific: osd thread deadlock
- https://github.com/ceph/ceph/pull/48254
- 07:16 AM Bug #55355 (Pending Backport): osd thread deadlock
07/26/2022
- 03:14 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
- All the tests that this has failed on involve thrashing. Specifically, they all use thrashosds-health.yaml (https://g...
- 03:09 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- That was faster than I thought. Attached the massif output file (let me know if that's what you expect; not super familiar wit...
- 02:41 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- I don't have one handy; everything is in Prometheus, and sharing a screenshot of all the mempools isn't very legible. Valgr...
- 02:04 PM Bug #56707: pglog growing unbounded on EC with copy by ref
- Alexandre, can you please send us the dump_mempools output, and if you can, also run valgrind massif?
- 02:57 PM Backport #51287 (Resolved): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry...
- 02:32 PM Bug #55851: Assert in Ceph messenger
- Perhaps we should move it into the @deactivate_existing@ part of @reuse_connection()@ where we hold both locks at the same time.
- 02:28 PM Bug #55851 (In Progress): Assert in Ceph messenger
- 02:27 PM Bug #55851: Assert in Ceph messenger
- It looks like @reuse_connection()@ holds the ...
- 02:18 PM Bug #55851: Assert in Ceph messenger
- The number of elements in @FrameAssembler::m_desc@ can be altered only by:
1. ... - 03:17 AM Fix #56709 (Resolved): test/osd/TestPGLog: Fix confusing description between log and olog.
- https://github.com/ceph/ceph/pull/47272
test/osd/TestPGLog.cc has a mistaken description between log and olog in ...