Activity
From 02/19/2024 to 03/19/2024
Today
- 01:14 PM Bug #64869 (In Progress): rados/thrash: slow reservation response from 1 (115547ms) in cluster log
- The problem is: how to differentiate between instances where one of the scrub reservation messages is queued as 'wait...
- 12:48 PM Bug #64866 (In Progress): rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNo...
- the client log shows cookie 94576533816384...
- 12:20 PM Bug #64978 (New): from rgw suite: HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg peering
- from https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-19_01:03:50-rgw-wip-cbodley-testing-distro-default-smithi/7...
- 05:28 AM Bug #64854 (Fix Under Review): decoding chunk_refs_by_hash_t returns wrong values
- 02:20 AM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
- Radoslaw Zarzynski wrote:
> Would need logs with @debug_mon=20@ and @debug_rocksdb=20@ from period before the assert...
- 02:01 AM Bug #62209: can not promote object at readonly tier mode
- In the scenario of cache tiering, are there any other solutions?
03/18/2024
- 07:29 PM Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
- and https://github.com/ceph/ceph/pull/54214
- 07:29 PM Bug #64972 (New): qa: "ceph tell 4.3a deep-scrub" command not found
- ...
- 07:24 PM Bug #63967 (Resolved): qa/tasks/ceph.py: "ceph tell <pgid> deep_scrub" fails
- 06:56 PM Bug #64646: ceph osd pool rmsnap clone object leak
- In QA.
- 06:56 PM Bug #64854: decoding chunk_refs_by_hash_t returns wrong values
- Hmm, I guess I saw a PR for that.
- 06:55 PM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
- Would need logs with @debug_mon=20@ and @debug_rocksdb=20@ from the period before the assertion.
- 06:51 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
- Nothing new but still observing. Bump up.
- 06:50 PM Bug #64866: rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1 failed
- Hi Nitzan! Would you mind taking a look?
- 06:49 PM Bug #64863: rados/thrash-old-clients: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in clu...
- Hmm, I think I saw Laura's PR for @MON_DOWN@.
- 06:44 PM Bug #58436: ceph cluster log reporting log level in numeric format for the clog messages
- Do we need to backport?
- 06:43 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- In QA.
- 06:36 PM Bug #64558: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
- Sent to QA.
- 06:28 PM Bug #57782 (Fix Under Review): [mon] high cpu usage by fn_monstore thread
- The fix awaits QA.
- 06:26 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
- Passed QA.
- 06:25 PM Bug #64938: Pool created with single PG splits into many on single OSD causes OSD to hit max_pgs_...
- Reviewed.
- 06:21 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
- https://github.com/ceph/ceph/pull/54492 merged
- 06:12 PM Bug #64968 (Fix Under Review): mon: MON_DOWN warnings when mons are first booting
- 04:11 PM Bug #64968 (Fix Under Review): mon: MON_DOWN warnings when mons are first booting
- ...
- 05:58 PM Bug #56393: failed to complete snap trimming before timeout
- Hi Matan,
would you mind taking a look? Not a high priority.
- 01:53 PM Bug #56393: failed to complete snap trimming before timeout
- /a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603381/...
- 05:52 PM Bug #64347: src/osd/PG.cc: FAILED ceph_assert(!bad || !cct->_conf->osd_debug_verify_cached_snaps)
- In QA.
- 04:26 PM Bug #64347: src/osd/PG.cc: FAILED ceph_assert(!bad || !cct->_conf->osd_debug_verify_cached_snaps)
- /a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603610/
- 05:48 PM Bug #64917: SnapMapperTest.CheckObjectKeyFormat object key changed
- I think this is already tackled by https://github.com/ceph/ceph/pull/56142.
Assigning to Matan for confirmation. I...
- 04:31 PM Bug #64917: SnapMapperTest.CheckObjectKeyFormat object key changed
- /a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603418/
/a/yuriw-2024-03...
- 05:43 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
- Bump up.
- 05:12 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
- /a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603349
- 05:41 PM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
- In QA.
- 05:40 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
- I'm going to propose a patch removing the @--force@.
- 05:39 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Bump up.
03/16/2024
03/15/2024
- 10:49 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
- I recently created a draft PR https://github.com/ceph/ceph/pull/56233/, adding the additional arguments peering_bucke...
- 10:14 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
- WIP PR: https://github.com/ceph/ceph/pull/56233
- 09:02 AM Bug #56393: failed to complete snap trimming before timeout
- /a/yuriw-2024-03-13_19:25:03-rados-wip-yuri6-testing-2024-03-12-0858-distro-default-smithi/7597884
/a/yuriw-2024-03-...
- 08:06 AM Bug #64942 (New): rados/verify: valgrind reports "Invalid read of size 8" error.
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587319
/a/yuriw-2024-03-...
- 01:01 AM Bug #64938 (Fix Under Review): Pool created with single PG splits into many on single OSD causes ...
- 12:51 AM Bug #64938 (Fix Under Review): Pool created with single PG splits into many on single OSD causes ...
- With autoscale mode ON, if a new pool is created without specifying the pg_num/pgp_num values then the pool gets crea...
03/14/2024
- 02:18 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
- peering_crush_bucket_[count|target|barrier]
- 01:47 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
- Don't forget that there is also pg_pool_t::peering_crush_bucket_count that directly requires a minimum number of high...
- 12:38 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
- My current plan, a script to set up a vstart cluster and test the above hypothesis:...
- 01:17 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
- /a/yuriw-2024-03-13_19:26:09-rados-wip-yuri-testing-2024-03-12-1240-reef-distro-default-smithi/7598397
/a/yuriw-2024...
- 12:57 PM Backport #63559: reef: Heartbeat crash in osd
- /a/yuriw-2024-03-13_19:26:09-rados-wip-yuri-testing-2024-03-12-1240-reef-distro-default-smithi/7598201
- 11:00 AM Bug #64917 (New): SnapMapperTest.CheckObjectKeyFormat object key changed
- /a/yuriw-2024-03-12_18:29:22-rados-wip-yuri8-testing-2024-03-11-1138-distro-default-smithi/7594695...
03/13/2024
- 04:44 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
- Hi,
Thanks to this article https://blog.palark.com/sre-troubleshooting-ceph-systemd-containerd/, I think root caus...
- 01:34 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
- Ilya Dryomov wrote:
> No, snap2 would continue to exist and one should be able to "rollback" to it. Rollback is rea...
- 10:16 AM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
- Matan Breizman wrote:
> Ilya Dryomov wrote:
> > Put another way: rollback is a destructive operation. One isn't ex...
- 10:00 AM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
- Ilya Dryomov wrote:
> Put another way: rollback is a destructive operation. One isn't expected to be able to go bac...
- 01:15 PM Bug #64897 (New): unittest_ceph_crypto - valgrind failed
- Running the unit test with valgrind:
ctest -R unittest_ceph_crypto -T memcheck...
- 01:14 PM Bug #64895 (New): unittest_perf_counters_cache - valgrind failed
- Running the unit test with valgrind:
ctest -R unittest_perf_counters_cache -T memcheck...
- 01:13 PM Bug #64893 (New): unittest_bufferlist - valgrind failed
- Running the unit test with valgrind:
ctest -R unittest_bufferlist -T memcheck...
- 01:11 PM Bug #64892 (New): unittest_ipaddr - valgrind failed
- Running the unit test with valgrind:
ctest -R unittest_ipaddr -T memcheck...
- 01:08 PM Bug #64891 (New): unittest_admin_socket - valgrind failed
- Running the unit test with valgrind:
ctest -R unittest_admin_socket -T memcheck...
- 08:08 AM Backport #64881 (In Progress): reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible ben...
- 07:34 AM Backport #64881 (In Progress): reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible ben...
- https://github.com/ceph/ceph/pull/56151
- 07:32 AM Bug #64314 (Resolved): cluster log: Cluster log level string representation missing in the cluste...
- 07:30 AM Fix #64573 (Pending Backport): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cl...
- 05:26 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- Brad Hubbard wrote:
> Nitzan Mordechai wrote:
> > now the segfault happens on check_one function where we also have...
- 02:27 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- Nitzan Mordechai wrote:
> now the segfault happens on check_one function where we also have pre-regex to truncate th...
03/12/2024
- 08:29 PM Bug #64725 (Fix Under Review): rados/singleton: application not enabled on pool 'rbd'
- 01:48 PM Bug #64725: rados/singleton: application not enabled on pool 'rbd'
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587549
- 06:21 PM Bug #58436: ceph cluster log reporting log level in numeric format for the clog messages
- https://github.com/ceph/ceph/pull/49730 merged
- 05:03 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
- Ilya Dryomov wrote:
> This is because rollback discards all changes made to image HEAD and makes it identical to the...
- 04:30 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
- Matan Breizman wrote:
> the suggested change here suggests that the disk usage should actually be:
> NAME ...
- 04:13 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
- Hi Matan,
We are able to roll back and forth between arbitrary snapshots and the suggested change in https://...
- 02:24 PM Bug #64735 (Need More Info): OSD/MON: rollback_to snap the latest overlap is not right
- We should first understand whether this is a bug or intentional behavior, given the following order of operations:
<...
- 03:35 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
- /a/yuriw-2024-03-08_16:19:51-rados-wip-yuri2-testing-2024-03-01-1606-distro-default-smithi/7587184
- 01:20 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587334
- 03:33 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
- /a/yuriw-2024-03-08_16:19:51-rados-wip-yuri2-testing-2024-03-01-1606-distro-default-smithi/7587174/
- 01:18 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587531
/a/yuriw-2024-03-...
- 02:15 PM Bug #64869 (In Progress): rados/thrash: slow reservation response from 1 (115547ms) in cluster log
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587833
The cluster log...
- 01:27 PM Bug #64866 (In Progress): rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNo...
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587349
There was a sim...
- 01:19 PM Bug #62832 (Resolved): common: config_proxy deadlock during shutdown (and possibly other times)
- 01:19 PM Backport #63457 (Resolved): quincy: common: config_proxy deadlock during shutdown (and possibly o...
- 12:44 PM Bug #64863 (New): rados/thrash-old-clients: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c ...
- The following tests in the rados suite failed with the warning:
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testi...
- 12:21 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587455
- 11:26 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- Now the segfault happens in the check_one function, where we also have a pre-regex to truncate the output that is causing the segfa...
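A generic mitigation sketch, not the test's actual code (names here are illustrative): some std::regex implementations recurse per input character and can exhaust the stack on very large strings, so bounding the text before matching is a common way to avoid the crash.
<pre>
#include <iostream>
#include <regex>
#include <string>

// Illustrative only: truncate a potentially huge command output to a fixed
// bound before handing it to std::regex, which may recurse deeply otherwise.
std::string first_line_of(const std::string& output) {
  constexpr std::size_t kMaxScan = 4096;          // assumed safe scan window
  const std::string bounded = output.substr(0, kMaxScan);
  static const std::regex line_re("([^\\n]*)");   // text up to the first newline
  std::smatch m;
  if (std::regex_search(bounded, m, line_re)) {
    return m[1].str();
  }
  return {};
}

int main() {
  std::cout << first_line_of("omap_key_count: 42\nrest of the output...") << "\n";
}
</pre>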
- 07:55 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- according to the console logs:...
- 04:22 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- Radoslaw Zarzynski wrote:
> The fix isn't merged yet which could explain the reoccurrence above
The run mentioned...
- 08:28 AM Bug #64514 (Duplicate): LibRadosTwoPoolsPP.PromoteSnapScrub test failed
- Closing as this is a duplicate.
- 08:27 AM Bug #64646: ceph osd pool rmsnap clone object leak
- Radoslaw Zarzynski wrote:
> Need a squid backport as well.
Awaiting main merge (https://github.com/ceph/ceph/pull...
- 06:31 AM Bug #64854 (Fix Under Review): decoding chunk_refs_by_hash_t returns wrong values
- When running the ceph-dencoder test on a clang-14 build, the JSON dump of chunk_refs_by_hash_t will show:...
- 06:02 AM Bug #56393: failed to complete snap trimming before timeout
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587430
/a/yuriw-2024-03-...
- 02:08 AM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
- Radoslaw Zarzynski wrote:
> Looks like a mon-scrub failure. This can be caused by a HW issue or by a corruption.
> ...
03/11/2024
- 08:55 PM Bug #64438: NeoRadosWatchNotify.WatchNotifyTimeout times out along with FAILED ceph_assert(op->se...
- Fails here in the neorados test:...
- 07:18 PM Feature #64849 (New): rados: Support read_from_replica everywhere
- The Objecter supports read-from-replica if you pass in the LOCALIZE_READS flag. If we want to serve all read IO from ...
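A minimal librados C++ sketch of a localized read; the cluster credentials, pool, and object names are placeholders. @OPERATION_LOCALIZE_READS@ is the public librados flag corresponding to the Objecter's localize-reads behavior.
<pre>
#include <rados/librados.hpp>
#include <iostream>

int main() {
  librados::Rados cluster;
  cluster.init("admin");              // connect as client.admin
  cluster.conf_read_file(nullptr);    // default ceph.conf search path
  if (cluster.connect() < 0) return 1;

  librados::IoCtx ioctx;
  if (cluster.ioctx_create("rbd", ioctx) < 0) return 1;  // placeholder pool

  librados::ObjectReadOperation op;
  librados::bufferlist bl;
  op.read(0, 1024, &bl, nullptr);     // read 1 KiB at offset 0

  // The flag asks for the read to be served by the closest replica
  // rather than always going to the primary.
  librados::AioCompletion* c = librados::Rados::aio_create_completion();
  ioctx.aio_operate("some_object", c, &op, librados::OPERATION_LOCALIZE_READS, nullptr);
  c->wait_for_complete();
  std::cout << "rc=" << c->get_return_value() << "\n";
  c->release();
  cluster.shutdown();
}
</pre>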
- 06:40 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
- There is a PR posted: https://github.com/ceph/ceph/pull/55991
- 06:06 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
- Hi Matan! Would you mind taking a look?
- 06:18 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
- Bump up.
- 06:16 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- Review in progress.
- 06:15 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
- Bump up.
- 06:09 PM Bug #64725: rados/singleton: application not enabled on pool 'rbd'
- Fix is to add this to the ignorelist.
- 06:02 PM Bug #64646: ceph osd pool rmsnap clone object leak
- Need a squid backport as well.
- 06:00 PM Bug #64824 (Need More Info): mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err =...
- Looks like a mon-scrub failure. This can be caused by a HW issue or by a corruption.
Is there a sign of malfunctioni... - 08:24 AM Bug #64824 (Need More Info): mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err =...
- -1> 2024-03-11T02:29:03.716+0000 7f6600eaf700 -1 /root/rpmbuild/BUILD/ceph-16.2.14/src/mon/Monitor.cc: In functio...
- 05:55 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- The fix isn't merged yet which could explain the reoccurrence above
- 02:45 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587684
/a/yuriw-2024-03-...
- 05:51 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Bump up.
- 05:50 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
- 1. I'm still not sure we need @--force@. 2. If it turns out to be justified, shouldn't it be @--yes-i-really-really-mean-it@?
- 05:42 PM Bug #64314: cluster log: Cluster log level string representation missing in the cluster logs.
- Still in testing.
03/10/2024
- 07:37 AM Bug #64657 (Rejected): Ceph test cases starting cluster not waiting for OSDs to join fully
- 茁野 鲍, thanks for letting us know!
I'll reject this bug.
03/08/2024
- 11:50 PM Bug #64804 (Duplicate): gcc-13 apparently breaks SafeTimer
- 04:07 AM Bug #64804 (Duplicate): gcc-13 apparently breaks SafeTimer
- https://github.com/ceph/ceph/pull/55886
Probably related to https://bugzilla.redhat.com/show_bug.cgi?id=2241339.
- 10:19 AM Bug #62338: osd: choose_async_recovery_ec may select an acting set < min_size
- Hello again.
Apparently I got a tiny little bit too excited.
I tested the case described above with 16.2.15 and...
- 12:26 AM Bug #64802 (New): rados: generalize stretch mode pg temp handling to be usable without stretch mode
- PeeringState::calc_replicated_acting_stretch encodes special behavior for stretch clusters which prohibits the primar...
03/07/2024
- 12:17 PM Bug #64788 (Fix Under Review): EpollDriver::del_event() crashes when the nic is unplugged
- 11:48 AM Bug #64788 (Fix Under Review): EpollDriver::del_event() crashes when the nic is unplugged
- librbd uses msgr to talk to its Ceph cluster. If the client's NIC is hot-unplugged, there is a chance that @EpollDriver...
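A generic defensive-handling sketch (not Ceph's actual EpollDriver code): once the NIC is gone, the fd may already be invalid by the time it is unregistered, so a failing @epoll_ctl(EPOLL_CTL_DEL)@ (EBADF/ENOENT) is better treated as "already removed" than as a fatal error.
<pre>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/epoll.h>

// Tolerate EPOLL_CTL_DEL failures for fds whose underlying device vanished.
int del_event_safely(int epfd, int fd) {
  if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, nullptr) < 0) {
    if (errno == EBADF || errno == ENOENT) {
      return 0;  // fd already closed or never registered; nothing to undo
    }
    std::cerr << "epoll_ctl(DEL) failed: " << std::strerror(errno) << "\n";
    return -errno;
  }
  return 0;
}
</pre>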
- 09:04 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
- Thank you for addressing this issue; I appreciate your effort in fixing it.
I apologize for the oversight o...
03/06/2024
- 07:15 PM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
- ...
- 07:14 PM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
- I think the direct reason behind the test's hang is the death of @osd.5@:...
- 08:22 AM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
- removed the "Related issues"
- 08:21 AM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
- The last op that LibRadosAioEC.MultiWritePP is trying to do is writing the oid_MultiWritePP_ object:...
- 03:20 PM Bug #63389: Failed to encode map X with expected CRC
- The problem was caused by a commit that introduced the commented-out check for @SERVER_REEF@ in @OSDMap::encode()@....
- 08:21 AM Bug #64735 (Need More Info): OSD/MON: rollback_to snap the latest overlap is not right
- When rolling back to a snap, we use the latest clone's current overlap as input to @intersection_of@ with the older snapshot's clone overlap; see the sketch below.
...
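A simplified illustration of the @intersection_of@ step, with plain byte ranges standing in for Ceph's interval_set; this is not the OSD's actual code, only the set arithmetic it performs.
<pre>
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// Each pair is a [begin, end) byte range a clone shares with its successor.
using Ranges = std::vector<std::pair<uint64_t, uint64_t>>;

Ranges intersection_of(const Ranges& a, const Ranges& b) {
  Ranges out;
  for (const auto& [ab, ae] : a)
    for (const auto& [bb, be] : b) {
      uint64_t lo = std::max(ab, bb), hi = std::min(ae, be);
      if (lo < hi) out.emplace_back(lo, hi);  // keep only the common bytes
    }
  return out;
}

int main() {
  Ranges older_overlap{{0, 4096}, {8192, 16384}};   // older snapshot's clone overlap
  Ranges latest_overlap{{0, 2048}, {12288, 20480}}; // latest clone's current overlap
  for (const auto& [b, e] : intersection_of(older_overlap, latest_overlap))
    std::cout << "[" << b << ", " << e << ")\n";    // unchanged in both
}
</pre>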
- 07:43 AM Bug #62338: osd: choose_async_recovery_ec may select an acting set < min_size
- Hello. Just FYI, this fixes a very nasty issue in my EC setup.
Here are some details.
The EC setup and crush rule...
03/05/2024
- 10:50 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
- /a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581448
- 10:47 PM Bug #64726 (New): LibRadosAioEC.MultiWritePP hang and pkill
- /a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581519...
- 10:42 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
- /a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581575
- 10:33 PM Bug #64725 (Fix Under Review): rados/singleton: application not enabled on pool 'rbd'
- /a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581526...
- 10:24 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
- /a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581722
/a/yuriw-2024-03-04_20:52:58-rados-ree... - 10:24 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
- Update on this: The PR is ready to be reviewed again.
- 01:04 PM Bug #64514 (In Progress): LibRadosTwoPoolsPP.PromoteSnapScrub test failed
- 01:04 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
- This may be related to the bug fixed in https://tracker.ceph.com/issues/64347. However, the outcome here is different, whi...
- 08:08 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
- Without the full log it will be hard to tell whether the symptoms I see are exactly what 茁野 鲍 sees, but we are missing t...
03/04/2024
- 09:19 PM Backport #63526 (Resolved): quincy: crash: int OSD::shutdown(): assert(end_time - start_time_func...
- 08:45 PM Bug #61140: crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd_fast_...
- https://github.com/ceph/ceph/pull/55134 merged
- 08:07 PM Backport #58337 (Rejected): pacific: mon-stretched_cluster: degraded stretched mode lead to Monit...
- 08:06 PM Backport #58337 (Duplicate): pacific: mon-stretched_cluster: degraded stretched mode lead to Moni...
- pacific is EOL
- 08:07 PM Bug #59271 (Resolved): mon: FAILED ceph_assert(osdmon()->is_writeable())
- 08:07 PM Backport #59700 (Rejected): pacific: mon: FAILED ceph_assert(osdmon()->is_writeable())
- pacific is EOL
- 08:06 PM Bug #57017 (Resolved): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
- 08:00 PM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
- Hi Nitzan! Would you mind taking a look?
- 07:59 PM Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd
- Looks like a typical symptom of (CPU/memory) starvation.
- 07:59 PM Bug #64646: ceph osd pool rmsnap clone object leak
- note from bug scrub: reviewed, went to QA.
- 07:58 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
- Bump up.
- 07:56 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- note from bug scrub: reviewed, changes requested.
- 07:55 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
- Might be something new. Bump up and observe.
- 07:53 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- note from scrub: the PR is approved. Needs-qa.
- 07:51 PM Bug #64674 (Resolved): src/scripts/ceph-backport.sh
- I guess we don't need to backport anything.
- 07:49 PM Bug #64258: osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
- note from bug scrub: reviewed.
- 01:40 PM Bug #64258 (Fix Under Review): osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
- 07:49 PM Bug #64695: Aborted signal starting in AsyncConnection::send_message()
- ...
- 05:39 PM Bug #64695 (New): Aborted signal starting in AsyncConnection::send_message()
- /a/yuriw-2024-03-01_16:47:30-rados-wip-yuri11-testing-2024-02-28-0950-reef-distro-default-smithi/7577623...
- 07:44 PM Bug #64314: cluster log: Cluster log level string representation missing in the cluster logs.
- Still in QA. Bump up.
- 07:36 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
- Thank you very, very much for the scenario! This sheds a lot of light on what has happened.
I'm not sure whether th... - 07:32 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- note from bug scrub: Aishwarya is addressing the review's comments.
- 06:27 PM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
- The fix goes into QA.
- 12:18 AM Bug #63066: rados/objectstore - application not enabled on pool '.mgr'
- /a/yuriw-2024-02-28_15:47:41-rados-wip-yuri4-testing-2024-02-27-1111-quincy-distro-default-smithi/7575815
/a/yuriw-2...
03/01/2024
- 11:19 PM Bug #64674: src/scripts/ceph-backport.sh
- revert PR: https://github.com/ceph/ceph/pull/55884
will fix this - 11:16 PM Bug #64674 (Resolved): src/scripts/ceph-backport.sh
- src/script/ceph-backport.sh: line 1737: ../../../ceph/.github/pull_request_template.md: No such file or directory
... - 11:01 PM Backport #64673 (In Progress): quincy: test_pool_min_size: AssertionError: wait_for_clean: failed...
- 10:58 PM Backport #64673 (In Progress): quincy: test_pool_min_size: AssertionError: wait_for_clean: failed...
- https://github.com/ceph/ceph/pull/55882
- 10:58 PM Backport #64672 (New): pacific: test_pool_min_size: AssertionError: wait_for_clean: failed before...
- 10:58 PM Backport #64671 (New): reef: test_pool_min_size: AssertionError: wait_for_clean: failed before ti...
- 10:55 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- /a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576306
- 10:54 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
- /a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576311
- 10:53 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
- /a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576314
- 09:30 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
- /a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576298
- 10:53 PM Bug #59172 (Pending Backport): test_pool_min_size: AssertionError: wait_for_clean: failed before ...
- 10:51 PM Bug #64670 (New): LibRadosAioEC.RoundTrip2 hang and pkill
- /a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576303...
- 12:11 PM Backport #64649 (In Progress): quincy: min_last_epoch_clean is not updated, causing osdmap to be ...
- 12:00 PM Backport #64650 (In Progress): reef: min_last_epoch_clean is not updated, causing osdmap to be un...
- 11:44 AM Backport #64651 (In Progress): squid: min_last_epoch_clean is not updated, causing osdmap to be u...
- 09:19 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
- E.g., to reproduce the issue:
diff slicer-src/src/test/osd/safe-to-destroy.sh
function run() {
@@ -32,18 +32,3... - 09:12 AM Bug #64657 (Rejected): Ceph test cases starting cluster not waiting for OSDs to join fully
- I've identified an issue in the Ceph testing framework where, after starting a temporary cluster using functions like...
02/29/2024
- 09:25 PM Backport #64406: reef: Failed to encode map X with expected CRC
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/55712
merged - 09:00 PM Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd
- Laura Flores wrote:
> /a/yuriw-2024-02-22_21:33:08-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smith... - 09:00 PM Bug #64637 (New): LeakPossiblyLost in BlueStore::_do_write_small() in osd
- 08:57 PM Bug #64637 (Duplicate): LeakPossiblyLost in BlueStore::_do_write_small() in osd
- 08:54 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- /a/yuriw-2024-02-28_22:39:54-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7576288
- 08:42 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
- /a/yuriw-2024-02-28_22:39:54-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7576292
- 06:26 PM Backport #64651 (In Progress): squid: min_last_epoch_clean is not updated, causing osdmap to be u...
- https://github.com/ceph/ceph/pull/55865
- 06:15 PM Backport #64650 (In Progress): reef: min_last_epoch_clean is not updated, causing osdmap to be un...
- https://github.com/ceph/ceph/pull/55867
- 06:15 PM Backport #64649 (In Progress): quincy: min_last_epoch_clean is not updated, causing osdmap to be ...
- https://github.com/ceph/ceph/pull/55868
- 06:08 PM Bug #63883 (Pending Backport): min_last_epoch_clean is not updated, causing osdmap to be unable t...
- 02:46 PM Bug #64646 (Fix Under Review): ceph osd pool rmsnap clone object leak
- There are two ways to remove pool snaps: the rados tool, or the mon command (ceph osd pool rmsnap); see the sketch below.
It seems that the monitor c...
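A minimal sketch of the first path via librados (pool and snapshot names are placeholders); the second path is the mon command @ceph osd pool rmsnap <pool> <snap>@.
<pre>
#include <rados/librados.hpp>
#include <iostream>

int main() {
  librados::Rados cluster;
  cluster.init("admin");
  cluster.conf_read_file(nullptr);
  if (cluster.connect() < 0) return 1;

  librados::IoCtx ioctx;
  if (cluster.ioctx_create("mypool", ioctx) < 0) return 1;

  // What `rados -p mypool rmsnap mysnap` invokes under the hood (assumption):
  // IoCtx::snap_remove() deletes the named pool snapshot.
  int r = ioctx.snap_remove("mysnap");
  std::cout << "snap_remove rc=" << r << "\n";

  cluster.shutdown();
}
</pre>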
- 07:02 AM Bug #53342: Exiting scrub checking -- not all pgs scrubbed
- Radoslaw Zarzynski wrote:
> Ronen, do we need any backporting?
No. The fix (55478) made it in time for Squid. - 04:19 AM Bug #64471: osd: upgrades from v18.2.[01] to main fail with "heartbeat_check: no reply from"
- Xiubo Li wrote:
> Patrick,
>
> From console_logs/smithi196.log, the kernel just crashed when copying data from us... - 12:44 AM Bug #64471: osd: upgrades from v18.2.[01] to main fail with "heartbeat_check: no reply from"
- Xiubo Li wrote:
> Patrick,
>
> From console_logs/smithi196.log, the kernel just crashed when copying data from us...
02/28/2024
- 10:32 PM Bug #64637 (New): LeakPossiblyLost in BlueStore::_do_write_small() in osd
- /a/yuriw-2024-02-22_21:33:08-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7571350...
- 12:07 AM Bug #64471: osd: upgrades from v18.2.[01] to main fail with "heartbeat_check: no reply from"
- Patrick,
From console_logs/smithi196.log, the kernel just crashed when copying data from userspace:...
02/27/2024
- 04:34 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
- Hi guys,
this bug came up a few weeks ago and I've asked one of the PR authors of the run I was reviewing to take ... - 03:28 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
- Nicolas Dandrimont wrote:
> I believe I understand what might have gone wrong though: One of our benchmarking script... - 02:01 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
- Hi!
Radoslaw Zarzynski wrote:
> Loic, is there an object store of one of those dead OSDs available for investigat... - 12:29 PM Bug #64504: aio ops queued but never executed
- So far, what I see is that the client op didn't get scheduled at all on osd.1 due to a continuous stream of higher-priori...
- 01:02 AM Bug #64194 (Duplicate): make check(arm64): unittest_rgw_dmclock_scheduler Failed
02/26/2024
- 07:32 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- Bump up.
- 07:30 PM Bug #64347: src/osd/PG.cc: FAILED ceph_assert(!bad || !cct->_conf->osd_debug_verify_cached_snaps)
- Bump up – needs qa.
- 07:24 PM Bug #64438: NeoRadosWatchNotify.WatchNotifyTimeout times out along with FAILED ceph_assert(op->se...
- Continuing to look at this bug.
- 07:20 PM Bug #64258 (In Progress): osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
- Moving back to _in progress_ per https://github.com/ceph/ceph/pull/55410#issuecomment-1945423142.
- 07:14 PM Bug #64460: rados/upgrade/parallel: "[WRN] MON_DOWN: 1/3 mons down, quorum a,b" in cluster log
- This needs a whitelist PR. Discussed in bug scrub.
- 07:09 PM Bug #64471: osd: upgrades from v18.2.[01] to main fail with "heartbeat_check: no reply from"
- I think we were investigating it together with Patrick on @CEPH-RADOS@. The finding was that the entire node was unre...
- 07:07 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
- Bump up.
- 07:06 PM Bug #53342 (Resolved): Exiting scrub checking -- not all pgs scrubbed
- Ronen, do we need any backporting?
- 07:05 PM Bug #61385: TEST_dump_scrub_schedule fails from "key is query_active: negation:0 # expected: true...
- In the previous tracker, the error message was "expected: false, in actual: true". This one is "expected: true, in ac...
- 07:02 PM Bug #64504: aio ops queued but never executed
- Asked Sridhar to judge whether it's dmclock-related.
- 06:58 PM Bug #62777: rados/valgrind-leaks: expected valgrind issues and found none
- Let's watch to see if this is fixed by https://github.com/ceph/ceph/pull/52639.
- 06:55 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
- Hmm, it seems to happen *before* the scrub part:...
- 06:50 PM Bug #64558 (Fix Under Review): librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
- 02:43 AM Bug #64558 (Fix Under Review): librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
- librados::OPERATION_FULL_FORCE should be translated to CEPH_OSD_FLAG_FULL_FORCE before calling IoCtxImpl::remove(); see the sketch below.
...
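A standalone sketch of the shape of such a fix; the helper name and the constant values are illustrative stand-ins, not the actual librados internals.
<pre>
#include <cstdint>
#include <iostream>

// Re-declared locally for illustration only; the real definitions live in
// librados.hpp and the Ceph wire-protocol headers.
constexpr uint32_t LIBRADOS_OPERATION_FULL_FORCE = 0x80;     // illustrative value
constexpr uint32_t CEPH_OSD_FLAG_FULL_FORCE_WIRE = 0x100000; // illustrative value

// Hypothetical helper mirroring the needed translation: without this mapping,
// the caller's "force even when full" intent is dropped before the remove op
// is built.
uint32_t translate_remove_flags(uint32_t api_flags) {
  uint32_t osd_flags = 0;
  if (api_flags & LIBRADOS_OPERATION_FULL_FORCE)
    osd_flags |= CEPH_OSD_FLAG_FULL_FORCE_WIRE;
  return osd_flags;
}

int main() {
  std::cout << std::hex << translate_remove_flags(LIBRADOS_OPERATION_FULL_FORCE) << "\n";
}
</pre>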
- 06:46 PM Backport #64576 (New): quincy: Incorrect behavior on combined cmpext+write ops in the face of ses...
- 06:46 PM Backport #64575 (New): reef: Incorrect behavior on combined cmpext+write ops in the face of sessi...
- 06:44 PM Bug #64314: cluster log: Cluster log level string representation missing in the cluster logs.
- Bump up. Already in the QA.
- 06:42 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Bump up.
- 06:40 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
- Bump up.
- 06:40 PM Bug #64192 (Pending Backport): Incorrect behavior on combined cmpext+write ops in the face of ses...
- 05:04 PM Fix #64573 (Fix Under Review): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cl...
- 04:37 PM Fix #64573 (Pending Backport): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cl...
- The changes introduced as part of PR: https://github.com/ceph/ceph/pull/53524
made the randomized values of osd_op_q... - 02:19 PM Bug #64562: Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
- .
- 02:18 PM Bug #64562: Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
- No problem with the rename Igor, thank you!
Igor Fedotov wrote:
> Hi Paolo,
> mind me renaming the ticket to som... - 02:16 PM Bug #64562: Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
- Igor Fedotov wrote:
> Hi Paolo,
> mind me renaming the ticket to something like "Occasional segmentation faults in ... - 01:27 PM Bug #64562: Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
- Hi Paolo,
mind me renaming the ticket to something like "Occasional segmentation faults in ScrubQueue::collect_ripe_... - 10:07 AM Bug #64562 (New): Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
- Hello!
Igor Fedotov suggested that I open a new ticket under the RADOS subproject; the original ticket is [[https://tra...
02/22/2024
- 05:21 PM Feature #56956: osdc: Add objecter fastfail
- > There is no point in indefinitely waiting when pg of an object is inactive.
This is not correct for CephFS or RB...
- 03:04 PM Backport #64406 (In Progress): reef: Failed to encode map X with expected CRC
- 07:04 AM Bug #59196 (Fix Under Review): ceph_test_lazy_omap_stats segfault while waiting for active+clean
- 12:04 AM Backport #63843 (In Progress): quincy: Add health error if one or more OSDs registered v1/v2 publ...
- 12:01 AM Backport #63842 (In Progress): reef: Add health error if one or more OSDs registered v1/v2 public...
02/21/2024
- 07:42 PM Bug #64519: OSD/MON: No snapshot metadata keys trimming
- This reminded me of the notes in https://pad.ceph.com/p/removing_removed_snaps/timeslider#4651 that talk about why th...
- 10:22 AM Bug #64519 (New): OSD/MON: No snapshot metadata keys trimming
- The Monitor's keys of purged_snap_ / purged_epoch_ and OSD's PSN_ (SnapMapper::PURGED_SNAP_PREFIX) keys are not trimm...
- 03:13 PM Bug #63389: Failed to encode map X with expected CRC
- The actual fix (not just the log message change) is: https://github.com/ceph/ceph/pull/55401.
It got approved 2 hour...
- 03:50 AM Bug #64514 (Duplicate): LibRadosTwoPoolsPP.PromoteSnapScrub test failed
- In rados_api_tests: ...
02/20/2024
- 05:33 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
- This tracker looks very interesting: https://tracker.ceph.com/issues/57757.
- 05:00 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
- Links to crash sites:
* https://github.com/ceph/ceph/blob/v17.2.7/src/osd/ECBackend.cc#L676
* https://github.co...
- 05:33 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
- /a/yuriw-2024-02-14_14:58:57-rados-wip-yuri4-testing-2024-02-13-1546-distro-default-smithi/7560007/
- 05:17 PM Bug #62777: rados/valgrind-leaks: expected valgrind issues and found none
- /a/yuriw-2024-02-14_14:58:57-rados-wip-yuri4-testing-2024-02-13-1546-distro-default-smithi/7559915
- 11:50 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
- Thanks to Aishwarya, who also looked at the queued ops that weren't executed; I opened a new bug for it: https://tracker...
- 11:43 AM Bug #64504 (New): aio ops queued but never executed
- A few teuthology tests failed when trying to execute aio_write, after which wait_for_complete never completed; see the sketch below for the call pattern.
the...
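A minimal sketch of the call pattern these tests use, with placeholder pool and object names; in the bug, the op stays queued on the OSD so the wait never returns.
<pre>
#include <rados/librados.hpp>
#include <iostream>

int main() {
  librados::Rados cluster;
  cluster.init("admin");
  cluster.conf_read_file(nullptr);
  if (cluster.connect() < 0) return 1;

  librados::IoCtx ioctx;
  if (cluster.ioctx_create("test-pool", ioctx) < 0) return 1;

  librados::bufferlist bl;
  bl.append("payload");

  librados::AioCompletion* c = librados::Rados::aio_create_completion();
  ioctx.aio_write("obj", c, bl, bl.length(), 0);  // (oid, completion, data, len, off)
  c->wait_for_complete();   // hangs if the queued op is never executed
  std::cout << "rc=" << c->get_return_value() << "\n";
  c->release();
  cluster.shutdown();
}
</pre>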
02/19/2024
- 06:34 PM Bug #61385: TEST_dump_scrub_schedule fails from "key is query_active: negation:0 # expected: true...
- Why copy this into a new bug report instead of marking it as a duplicate?
- 06:32 PM Bug #53342: Exiting scrub checking -- not all pgs scrubbed
- I think we can close this bug as 'resolved'.
- 06:31 PM Bug #62119: timeout on reserving replicas
- @aishwarya - I think we can lower the severity, or maybe even close this bug.
It seems as though some specific tests... - 06:28 PM Bug #64310 (Rejected): osd/scrub: PGs remain in the scrub queue after an interval change
- My mistake. Not exactly a bug.
(Fuller explanation:
recent changes to the scrub state-machine changed the point in ...
- 06:25 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
- Will take a look.