Activity

From 02/19/2024 to 03/19/2024

Today

05:28 AM Bug #64854 (Fix Under Review): decoding chunk_refs_by_hash_t return wrong values
Nitzan Mordechai
02:20 AM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
Radoslaw Zarzynski wrote:
> Would need logs with @debug_mon=20@ and @debug_rocksdb=20@ from period before the assert...
yite gu
02:01 AM Bug #62209: can not promote object at readonly tier mode
In the scenario of cache tiering, are there any other solutions? Arthur ho

03/18/2024

07:29 PM Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
and https://github.com/ceph/ceph/pull/54214 Patrick Donnelly
07:29 PM Bug #64972 (New): qa: "ceph tell 4.3a deep-scrub" command not found
... Patrick Donnelly
07:24 PM Bug #63967 (Resolved): qa/tasks/ceph.py: "ceph tell <pgid> deep_scrub" fails
Patrick Donnelly
06:56 PM Bug #64646: ceph osd pool rmsnap clone object leak
In QA. Radoslaw Zarzynski
06:56 PM Bug #64854: decoding chunk_refs_by_hash_t return wrong values
Hmm, I guess I saw a PR for that. Radoslaw Zarzynski
06:55 PM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
Would need logs with @debug_mon=20@ and @debug_rocksdb=20@ from period before the assertion. Radoslaw Zarzynski
06:51 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
Nothing new but still observing. Bump up. Radoslaw Zarzynski
06:50 PM Bug #64866: rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1 failed
Hi Nitzan! Would you mind taking a look? Radoslaw Zarzynski
06:49 PM Bug #64863: rados/thrash-old-clients: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in clu...
Hmm, I think I saw Laura's PR for @MON_DOWN@. Radoslaw Zarzynski
06:44 PM Bug #58436: ceph cluster log reporting log level in numeric format for the clog messages
Do we need to backport? Radoslaw Zarzynski
06:43 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
In QA. Radoslaw Zarzynski
06:36 PM Bug #64558: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
Sent to QA. Radoslaw Zarzynski
06:28 PM Bug #57782 (Fix Under Review): [mon] high cpu usage by fn_monstore thread
The fix awaits QA. Radoslaw Zarzynski
06:26 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
Passed QA. Radoslaw Zarzynski
06:25 PM Bug #64938: Pool created with single PG splits into many on single OSD causes OSD to hit max_pgs_...
Reviewed. Radoslaw Zarzynski
06:21 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
https://github.com/ceph/ceph/pull/54492 merged Yuri Weinstein
06:12 PM Bug #64968 (Fix Under Review): mon: MON_DOWN warnings when mons are first booting
Patrick Donnelly
04:11 PM Bug #64968 (Fix Under Review): mon: MON_DOWN warnings when mons are first booting
... Patrick Donnelly
05:58 PM Bug #56393: failed to complete snap trimming before timeout
Hi Matan,
would you mind taking a look? Not a high priority.
Radoslaw Zarzynski
01:53 PM Bug #56393: failed to complete snap trimming before timeout
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603381/... Aishwarya Mathuria
05:52 PM Bug #64347: src/osd/PG.cc: FAILED ceph_assert(!bad || !cct->_conf->osd_debug_verify_cached_snaps)
In QA. Radoslaw Zarzynski
04:26 PM Bug #64347: src/osd/PG.cc: FAILED ceph_assert(!bad || !cct->_conf->osd_debug_verify_cached_snaps)
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603610/ Aishwarya Mathuria
05:48 PM Bug #64917: SnapMapperTest.CheckObjectKeyFormat object key changed
I think this is already tackled by https://github.com/ceph/ceph/pull/56142.
Assigning to Matan for confirmation. I...
Radoslaw Zarzynski
04:31 PM Bug #64917: SnapMapperTest.CheckObjectKeyFormat object key changed
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603418/
/a/yuriw-2024-03...
Aishwarya Mathuria
05:43 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
Bump up. Radoslaw Zarzynski
05:12 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603349 Aishwarya Mathuria
05:41 PM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
In QA. Radoslaw Zarzynski
05:40 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
I'm going to propose a patch removing the @--force@. Radoslaw Zarzynski
05:39 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Bump up. Radoslaw Zarzynski

03/16/2024

06:22 PM Backport #64406 (Resolved): reef: Failed to encode map X with expected CRC
Ilya Dryomov

03/15/2024

10:49 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
I recently created a draft PR https://github.com/ceph/ceph/pull/56233/, adding the additional arguments peering_bucke... Kamoltat (Junior) Sirivadhna
10:14 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
WIP PR: https://github.com/ceph/ceph/pull/56233 Kamoltat (Junior) Sirivadhna
09:02 AM Bug #56393: failed to complete snap trimming before timeout
/a/yuriw-2024-03-13_19:25:03-rados-wip-yuri6-testing-2024-03-12-0858-distro-default-smithi/7597884
/a/yuriw-2024-03-...
Aishwarya Mathuria
08:06 AM Bug #64942 (New): rados/verify: valgrind reports "Invalid read of size 8" error.
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587319
/a/yuriw-2024-03-...
Sridhar Seshasayee
01:01 AM Bug #64938 (Fix Under Review): Pool created with single PG splits into many on single OSD causes ...
Prashant D
12:51 AM Bug #64938 (Fix Under Review): Pool created with single PG splits into many on single OSD causes ...
With autoscale mode ON, if a new pool is created without specifying the pg_num/pgp_num values then the pool gets crea... Prashant D

03/14/2024

02:18 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
peering_crush_bucket_[count|target|barrier] Kamoltat (Junior) Sirivadhna
01:47 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
Don't forget that there is also pg_pool_t::peering_crush_bucket_count that directly requires a minimum number of high... Greg Farnum
12:38 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
My current script to set up a vstart cluster and test the above hypothesis:... Kamoltat (Junior) Sirivadhna
01:17 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2024-03-13_19:26:09-rados-wip-yuri-testing-2024-03-12-1240-reef-distro-default-smithi/7598397
/a/yuriw-2024...
Aishwarya Mathuria
12:57 PM Backport #63559: reef: Heartbeat crash in osd
/a/yuriw-2024-03-13_19:26:09-rados-wip-yuri-testing-2024-03-12-1240-reef-distro-default-smithi/7598201 Aishwarya Mathuria
11:00 AM Bug #64917 (New): SnapMapperTest.CheckObjectKeyFormat object key changed
/a/yuriw-2024-03-12_18:29:22-rados-wip-yuri8-testing-2024-03-11-1138-distro-default-smithi/7594695... Nitzan Mordechai

03/13/2024

04:44 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
Hi,
Thanks to this article https://blog.palark.com/sre-troubleshooting-ceph-systemd-containerd/, I think root caus...
Peter Goron
01:34 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Ilya Dryomov wrote:
> No, snap2 would continue to exist and one should be able to "rollback" to it. Rollback is rea...
Matan Breizman
10:16 AM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Matan Breizman wrote:
> Ilya Dryomov wrote:
> > Put another way: rollback is a destructive operation. One isn't ex...
Ilya Dryomov
10:00 AM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Ilya Dryomov wrote:
> Put another way: rollback is a destructive operation. One isn't expected to be able to go bac...
Matan Breizman
01:15 PM Bug #64897 (New): unittest_ceph_crypto - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_ceph_crypto -T memcheck...
Nitzan Mordechai
01:14 PM Bug #64895 (New): unittest_perf_counters_cache - valgrind failed

Running the unit test with valgrind:
ctest -R unittest_perf_counters_cache -T memcheck...
Nitzan Mordechai
01:13 PM Bug #64893 (New): unittest_bufferlist - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_bufferlist -T memcheck...
Nitzan Mordechai
01:11 PM Bug #64892 (New): unittest_ipaddr - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_ipaddr -T memcheck...
Nitzan Mordechai
01:08 PM Bug #64891 (New): unittest_admin_socket - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_admin_socket -T memcheck...
Nitzan Mordechai
08:08 AM Backport #64881 (In Progress): reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible ben...
Sridhar Seshasayee
07:34 AM Backport #64881 (In Progress): reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible ben...
https://github.com/ceph/ceph/pull/56151 Backport Bot
07:32 AM Bug #64314 (Resolved): cluster log: Cluster log level string representation missing in the cluste...
Sridhar Seshasayee
07:30 AM Fix #64573 (Pending Backport): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cl...
Sridhar Seshasayee
05:26 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Brad Hubbard wrote:
> Nitzan Mordechai wrote:
> > now the segfault happens on check_one function where we also have...
Nitzan Mordechai
02:27 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Nitzan Mordechai wrote:
> now the segfault happens on check_one function where we also have pre-regex to truncate th...
Brad Hubbard

03/12/2024

08:29 PM Bug #64725 (Fix Under Review): rados/singleton: application not enabled on pool 'rbd'
Laura Flores
01:48 PM Bug #64725: rados/singleton: application not enabled on pool 'rbd'
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587549 Sridhar Seshasayee
06:21 PM Bug #58436: ceph cluster log reporting log level in numeric format for the clog messages
https://github.com/ceph/ceph/pull/49730 merged Yuri Weinstein
05:03 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Ilya Dryomov wrote:
> This is because rollback discards all changes made to image HEAD and makes it identical to the...
Ilya Dryomov
04:30 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Matan Breizman wrote:
> the suggested change here suggests that the disk usage should actually be:
> NAME ...
Ilya Dryomov
04:13 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Hi Matan,
We are able to roll back back and forth between arbitrary snapshots and the suggested change in https://...
Ilya Dryomov
02:24 PM Bug #64735 (Need More Info): OSD/MON: rollback_to snap the latest overlap is not right
We should first understand whether this is a bug or intentional behavior, given the following order of operations:
<...
Matan Breizman
03:35 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-08_16:19:51-rados-wip-yuri2-testing-2024-03-01-1606-distro-default-smithi/7587184 Matan Breizman
01:20 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587334 Sridhar Seshasayee
03:33 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2024-03-08_16:19:51-rados-wip-yuri2-testing-2024-03-01-1606-distro-default-smithi/7587174/ Matan Breizman
01:18 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587531
/a/yuriw-2024-03-...
Sridhar Seshasayee
02:15 PM Bug #64869 (New): rados/thrash: slow reservation response from 1 (115547ms) in cluster log
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587833
The cluster log...
Sridhar Seshasayee
01:27 PM Bug #64866 (New): rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1 ...
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587349
There was a sim...
Sridhar Seshasayee
01:19 PM Bug #62832 (Resolved): common: config_proxy deadlock during shutdown (and possibly other times)
Patrick Donnelly
01:19 PM Backport #63457 (Resolved): quincy: common: config_proxy deadlock during shutdown (and possibly o...
Patrick Donnelly
12:44 PM Bug #64863 (New): rados/thrash-old-clients: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c ...
The following tests in the rados suite failed with the warning:
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testi...
Sridhar Seshasayee
12:21 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587455 Sridhar Seshasayee
11:26 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Now the segfault happens in the check_one function, where we also have a pre-regex to truncate the output, which is causing the segfa... Nitzan Mordechai
07:55 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
according to the console logs:... Nitzan Mordechai
04:22 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Radoslaw Zarzynski wrote:
> The fix isn't merged yet which could explain the reoccurrence above
The run mentioned...
Sridhar Seshasayee
08:28 AM Bug #64514 (Duplicate): LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Closing as this is a duplicate. Matan Breizman
08:27 AM Bug #64646: ceph osd pool rmsnap clone object leak
Radoslaw Zarzynski wrote:
> Need a squid backport as well.
Awaiting main merge (https://github.com/ceph/ceph/pull...
Matan Breizman
06:31 AM Bug #64854 (Fix Under Review): decoding chunk_refs_by_hash_t return wrong values
When running the ceph-dencoder test on a clang-14 build, the JSON dump of chunk_refs_by_hash_t shows:... Nitzan Mordechai
06:02 AM Bug #56393: failed to complete snap trimming before timeout
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587430
/a/yuriw-2024-03-...
Sridhar Seshasayee
02:08 AM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
Radoslaw Zarzynski wrote:
> Looks like a mon-scrub failure. This can be caused by a HW issue or by a corruption.
> ...
yite gu

03/11/2024

08:55 PM Bug #64438: NeoRadosWatchNotify.WatchNotifyTimeout times out along with FAILED ceph_assert(op->se...
Fails here in the neorados test:... Laura Flores
07:18 PM Feature #64849 (New): rados: Support read_from_replica everywhere
The Objecter supports read-from-replica if you pass in the LOCALIZE_READS flag. If we want to serve all read IO from ... Greg Farnum
06:40 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
There is PR posted: https://github.com/ceph/ceph/pull/55991 Ilya Dryomov
06:06 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Hi Matan! Would you mind taking a look? Radoslaw Zarzynski
06:18 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
Bump up. Radoslaw Zarzynski
06:16 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
Review in progress. Radoslaw Zarzynski
06:15 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Bump up. Radoslaw Zarzynski
06:09 PM Bug #64725: rados/singleton: application not enabled on pool 'rbd'
Fix is to add this to the ignorelist. Laura Flores
06:02 PM Bug #64646: ceph osd pool rmsnap clone object leak
Need a squid backport as well. Radoslaw Zarzynski
06:00 PM Bug #64824 (Need More Info): mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err =...
Looks like a mon-scrub failure. This can be caused by a HW issue or by a corruption.
Is there a sign of malfunctioni...
Radoslaw Zarzynski
08:24 AM Bug #64824 (Need More Info): mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err =...
-1> 2024-03-11T02:29:03.716+0000 7f6600eaf700 -1 /root/rpmbuild/BUILD/ceph-16.2.14/src/mon/Monitor.cc: In functio... yite gu
05:55 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
The fix isn't merged yet which could explain the reoccurrence above Radoslaw Zarzynski
02:45 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587684
/a/yuriw-2024-03-...
Sridhar Seshasayee
05:51 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Bump up. Radoslaw Zarzynski
05:50 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
1. I'm still not sure we need @--force@. 2. If it turns out to be justified, shouldn't it be @--yes-i-really-really-mean-it@? Radoslaw Zarzynski
05:42 PM Bug #64314: cluster log: Cluster log level string representation missing in the cluster logs.
Still in testing. Radoslaw Zarzynski

03/10/2024

07:37 AM Bug #64657 (Rejected): Ceph test cases starting cluster not waiting for OSDs to join fully
茁野 鲍 Thanks for letting us know!
I'll reject that bug.
Nitzan Mordechai

03/08/2024

11:50 PM Bug #64804 (Duplicate): gcc-13 apparently breaks SafeTimer
Samuel Just
04:07 AM Bug #64804 (Duplicate): gcc-13 apparently breaks SafeTimer
https://github.com/ceph/ceph/pull/55886
Probably related to https://bugzilla.redhat.com/show_bug.cgi?id=2241339 .
Samuel Just
10:19 AM Bug #62338: osd: choose_async_recovery_ec may select an acting set < min_size
Hello again.
Apparently I got a tiny little bit too excited.
I tested the case described above with 16.2.15 and...
Bartosz Rabiega
12:26 AM Bug #64802 (New): rados: generalize stretch mode pg temp handling to be usable without stretch mode
PeeringState::calc_replicated_acting_stretch encodes special behavior for stretch clusters which prohibits the primar... Samuel Just

03/07/2024

12:17 PM Bug #64788 (Fix Under Review): EpollDriver::del_event() crashes when the nic is unplugged
Kefu Chai
11:48 AM Bug #64788 (Fix Under Review): EpollDriver::del_event() crashes when the nic is unplugged
librbd uses msgr to talk to its Ceph cluster. If the client's NIC is hot-unplugged, there is a chance that @EpollDriver... Kefu Chai
09:04 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
Thank you for addressing this issue. I appreciate your effort in fixing the issue.
I apologize for the oversight o...
茁野 鲍

03/06/2024

07:15 PM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
... Radoslaw Zarzynski
07:14 PM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
I think the direct reason behind the test's hang is the death of @osd.5@:... Radoslaw Zarzynski
08:22 AM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
removed the "Related issues" Nitzan Mordechai
08:21 AM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
The last op that LibRadosAioEC.MultiWritePP tries to do is writing the oid_MultiWritePP_ obj:... Nitzan Mordechai
03:20 PM Bug #63389: Failed to encode map X with expected CRC
The problem came because of a commit that introduced the commented-out check for @SERVER_REEF@ in @OSDMap::encode()@.... Radoslaw Zarzynski
08:21 AM Bug #64735 (Need More Info): OSD/MON: rollback_to snap the latest overlap is not right
When rolling back to a snap (rollback_to), we take the intersection (intersection_of) of the latest clone's current overlap and the older snapshot's clone overlap.
...
dian xing
07:43 AM Bug #62338: osd: choose_async_recovery_ec may select an acting set < min_size
Hello. Just FYI, this fixes a very nasty issue in my EC setup.
Here are some details.
The EC setup and crush rule...
Bartosz Rabiega

03/05/2024

10:50 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581448 Laura Flores
10:47 PM Bug #64726 (New): LibRadosAioEC.MultiWritePP hang and pkill
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581519... Laura Flores
10:42 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581575 Laura Flores
10:33 PM Bug #64725 (Fix Under Review): rados/singleton: application not enabled on pool 'rbd'
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581526... Laura Flores
10:24 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581722
/a/yuriw-2024-03-04_20:52:58-rados-ree...
Laura Flores
10:24 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
Update on this: The PR is ready to be reviewed again. Laura Flores
01:04 PM Bug #64514 (In Progress): LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Matan Breizman
01:04 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
This may be related to bug fixed in https://tracker.ceph.com/issues/64347. However, the outcome here is different whi... Matan Breizman
08:08 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
Without the full log it will be hard to tell whether the symptoms I see are exactly what 茁野 鲍 sees, but we are missing t... Nitzan Mordechai

03/04/2024

09:19 PM Backport #63526 (Resolved): quincy: crash: int OSD::shutdown(): assert(end_time - start_time_func...
Igor Fedotov
08:45 PM Bug #61140: crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd_fast_...
https://github.com/ceph/ceph/pull/55134 merged Yuri Weinstein
08:07 PM Backport #58337 (Rejected): pacific: mon-stretched_cluster: degraded stretched mode lead to Monit...
Konstantin Shalygin
08:06 PM Backport #58337 (Duplicate): pacific: mon-stretched_cluster: degraded stretched mode lead to Moni...
pacific is EOL Konstantin Shalygin
08:07 PM Bug #59271 (Resolved): mon: FAILED ceph_assert(osdmon()->is_writeable())
Konstantin Shalygin
08:07 PM Backport #59700 (Rejected): pacific: mon: FAILED ceph_assert(osdmon()->is_writeable())
pacific is EOL Konstantin Shalygin
08:06 PM Bug #57017 (Resolved): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
Konstantin Shalygin
08:00 PM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
Hi Nitzan! Would you mind taking a look? Radoslaw Zarzynski
07:59 PM Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd
Looks like a typical symptom of (CPU/memory) starvation. Radoslaw Zarzynski
07:59 PM Bug #64646: ceph osd pool rmsnap clone object leak
note from bug scrub: reviewed, went to QA. Radoslaw Zarzynski
07:58 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Bump up. Radoslaw Zarzynski
07:56 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
note from bug scrub: reviewed, changes requested. Radoslaw Zarzynski
07:55 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
Might be something new. Bump up and observe. Radoslaw Zarzynski
07:53 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
note from scrub: the PR is approved. Needs-qa. Radoslaw Zarzynski
07:51 PM Bug #64674 (Resolved): src/scripts/ceph-backport.sh
I guess we don't need to backport anything. Radoslaw Zarzynski
07:49 PM Bug #64258: osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
note from bug scrub: reviewed. Radoslaw Zarzynski
01:40 PM Bug #64258 (Fix Under Review): osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
Nitzan Mordechai
07:49 PM Bug #64695: Aborted signal starting in AsyncConnection::send_message()
... Radoslaw Zarzynski
05:39 PM Bug #64695 (New): Aborted signal starting in AsyncConnection::send_message()
/a/yuriw-2024-03-01_16:47:30-rados-wip-yuri11-testing-2024-02-28-0950-reef-distro-default-smithi/7577623... Laura Flores
07:44 PM Bug #64314: cluster log: Cluster log level string representation missing in the cluster logs.
Still in QA. Bump up. Radoslaw Zarzynski
07:36 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
Thank you very, very much for the scenario! This throws a lot of light on what has happened.
I'm not sure whether th...
Radoslaw Zarzynski
07:32 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
note from bug scrub: Aishwarya is addressing the review's comments. Radoslaw Zarzynski
06:27 PM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
The fix goes into QA. Radoslaw Zarzynski
12:18 AM Bug #63066: rados/objectstore - application not enabled on pool '.mgr'
/a/yuriw-2024-02-28_15:47:41-rados-wip-yuri4-testing-2024-02-27-1111-quincy-distro-default-smithi/7575815
/a/yuriw-2...
Laura Flores

03/01/2024

11:19 PM Bug #64674: src/scripts/ceph-backport.sh
revert PR: https://github.com/ceph/ceph/pull/55884
will fix this
Kamoltat (Junior) Sirivadhna
11:16 PM Bug #64674 (Resolved): src/scripts/ceph-backport.sh
src/script/ceph-backport.sh: line 1737: ../../../ceph/.github/pull_request_template.md: No such file or directory
...
Kamoltat (Junior) Sirivadhna
11:01 PM Backport #64673 (In Progress): quincy: test_pool_min_size: AssertionError: wait_for_clean: failed...
Kamoltat (Junior) Sirivadhna
10:58 PM Backport #64673 (In Progress): quincy: test_pool_min_size: AssertionError: wait_for_clean: failed...
https://github.com/ceph/ceph/pull/55882 Backport Bot
10:58 PM Backport #64672 (New): pacific: test_pool_min_size: AssertionError: wait_for_clean: failed before...
Backport Bot
10:58 PM Backport #64671 (New): reef: test_pool_min_size: AssertionError: wait_for_clean: failed before ti...
Backport Bot
10:55 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576306 Laura Flores
10:54 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576311 Laura Flores
10:53 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576314 Laura Flores
09:30 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576298 Laura Flores
10:53 PM Bug #59172 (Pending Backport): test_pool_min_size: AssertionError: wait_for_clean: failed before ...
Kamoltat (Junior) Sirivadhna
10:51 PM Bug #64670 (New): LibRadosAioEC.RoundTrip2 hang and pkill
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576303... Laura Flores
12:11 PM Backport #64649 (In Progress): quincy: min_last_epoch_clean is not updated, causing osdmap to be ...
Mykola Golub
12:00 PM Backport #64650 (In Progress): reef: min_last_epoch_clean is not updated, causing osdmap to be un...
Mykola Golub
11:44 AM Backport #64651 (In Progress): squid: min_last_epoch_clean is not updated, causing osdmap to be u...
Mykola Golub
09:19 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
E.g., to reproduce the issue:
diff slicer-src/src/test/osd/safe-to-destroy.sh
function run() {
@@ -32,18 +32,3...
茁野 鲍
09:12 AM Bug #64657 (Rejected): Ceph test cases starting cluster not waiting for OSDs to join fully
I've identified an issue in the Ceph testing framework where, after starting a temporary cluster using functions like... 茁野 鲍

02/29/2024

09:25 PM Backport #64406: reef: Failed to encode map X with expected CRC
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/55712
merged
Yuri Weinstein
09:00 PM Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd
Laura Flores wrote:
> /a/yuriw-2024-02-22_21:33:08-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smith...
Laura Flores
09:00 PM Bug #64637 (New): LeakPossiblyLost in BlueStore::_do_write_small() in osd
Laura Flores
08:57 PM Bug #64637 (Duplicate): LeakPossiblyLost in BlueStore::_do_write_small() in osd
Laura Flores
08:54 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
/a/yuriw-2024-02-28_22:39:54-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7576288 Laura Flores
08:42 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-02-28_22:39:54-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7576292
Laura Flores
06:26 PM Backport #64651 (In Progress): squid: min_last_epoch_clean is not updated, causing osdmap to be u...
https://github.com/ceph/ceph/pull/55865 Backport Bot
06:15 PM Backport #64650 (In Progress): reef: min_last_epoch_clean is not updated, causing osdmap to be un...
https://github.com/ceph/ceph/pull/55867 Backport Bot
06:15 PM Backport #64649 (In Progress): quincy: min_last_epoch_clean is not updated, causing osdmap to be ...
https://github.com/ceph/ceph/pull/55868 Backport Bot
06:08 PM Bug #63883 (Pending Backport): min_last_epoch_clean is not updated, causing osdmap to be unable t...
Mykola Golub
02:46 PM Bug #64646 (Fix Under Review): ceph osd pool rmsnap clone object leak
There are 2 ways to remove pool snaps, rados tool or mon command (ceph osd pool rmsnap).
It seems that the monitor c...
Matan Breizman
07:02 AM Bug #53342: Exiting scrub checking -- not all pgs scrubbed
Radoslaw Zarzynski wrote:
> Ronen, do we need any backporting?
No. The fix (55478) made it in time for Squid.
Ronen Friedman
04:19 AM Bug #64471: osd: upgrades from v18.2.[01] to main fail with "heartbeat_check: no reply from"
Xiubo Li wrote:
> Patrick,
>
> From console_logs/smithi196.log, the kernel just crashed when copying data from us...
Venky Shankar
12:44 AM Bug #64471: osd: upgrades from v18.2.[01] to main fail with "heartbeat_check: no reply from"
Xiubo Li wrote:
> Patrick,
>
> From console_logs/smithi196.log, the kernel just crashed when copying data from us...
Xiubo Li

02/28/2024

10:32 PM Bug #64637 (New): LeakPossiblyLost in BlueStore::_do_write_small() in osd
/a/yuriw-2024-02-22_21:33:08-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7571350... Laura Flores
12:07 AM Bug #64471: osd: upgrades from v18.2.[01] to main fail with "heartbeat_check: no reply from"
Patrick,
From console_logs/smithi196.log, the kernel just crashed when copying data from userspace:...
Xiubo Li

02/27/2024

04:34 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Hi guys,
this bug came up a few weeks ago and I've asked one of the PR authors of the run I was reviewing to take ...
Kamoltat (Junior) Sirivadhna
03:28 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
Nicolas Dandrimont wrote:
> I believe I understand what might have gone wrong though: One of our benchmarking script...
Nicolas Dandrimont
02:01 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
Hi!
Radoslaw Zarzynski wrote:
> Loic, is there an object store of one of those dead OSDs available for investigat...
Nicolas Dandrimont
12:29 PM Bug #64504: aio ops queued but never executed
So far what I see is that the client op didn't get scheduled at all on osd.1 due to a continuous stream of higher priori... Sridhar Seshasayee
01:02 AM Bug #64194 (Duplicate): make check(arm64): unittest_rgw_dmclock_scheduler Failed
Rixin Luo

02/26/2024

07:32 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
Bump up. Radoslaw Zarzynski
07:30 PM Bug #64347: src/osd/PG.cc: FAILED ceph_assert(!bad || !cct->_conf->osd_debug_verify_cached_snaps)
Bump up – needs qa. Radoslaw Zarzynski
07:24 PM Bug #64438: NeoRadosWatchNotify.WatchNotifyTimeout times out along with FAILED ceph_assert(op->se...
Continuing to look at this bug. Laura Flores
07:20 PM Bug #64258 (In Progress): osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
Moving back to _in progress_ per https://github.com/ceph/ceph/pull/55410#issuecomment-1945423142. Radoslaw Zarzynski
07:14 PM Bug #64460: rados/upgrade/parallel: "[WRN] MON_DOWN: 1/3 mons down, quorum a,b" in cluster log
This needs a whitelist PR. Discussed in bug scrub. Laura Flores
07:09 PM Bug #64471: osd: upgrades from v18.2.[01] to main fail with "heartbeat_check: no reply from"
I think we were investigating it together with Patrick on @CEPH-RADOS@. The finding was that the entire node was unre... Radoslaw Zarzynski
07:07 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
Bump up. Radoslaw Zarzynski
07:06 PM Bug #53342 (Resolved): Exiting scrub checking -- not all pgs scrubbed
Ronen, do we need any backporting? Radoslaw Zarzynski
07:05 PM Bug #61385: TEST_dump_scrub_schedule fails from "key is query_active: negation:0 # expected: true...
In the previous tracker, the error message was "expected: false, in actual: true". This one is "expected: true, in ac... Laura Flores
07:02 PM Bug #64504: aio ops queued but never executed
Asked Sridhar to judge whether it's dmclock-related. Radoslaw Zarzynski
06:58 PM Bug #62777: rados/valgrind-leaks: expected valgrind issues and found none
Let's watch to see if this is fixed by https://github.com/ceph/ceph/pull/52639. Laura Flores
06:55 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Hmm, it seems to happen *before* the scrub part:... Radoslaw Zarzynski
06:50 PM Bug #64558 (Fix Under Review): librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
Radoslaw Zarzynski
02:43 AM Bug #64558 (Fix Under Review): librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
librados::OPERATION_FULL_FORCE should be translated to CEPH_OSD_FLAG_FULL_FORCE before calling IoCtxImpl::remove().
...
Yuanrun Chen
06:46 PM Backport #64576 (New): quincy: Incorrect behavior on combined cmpext+write ops in the face of ses...
Backport Bot
06:46 PM Backport #64575 (New): reef: Incorrect behavior on combined cmpext+write ops in the face of sessi...
Backport Bot
06:44 PM Bug #64314: cluster log: Cluster log level string representation missing in the cluster logs.
Bump up. Already in QA. Radoslaw Zarzynski
06:42 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Bump up. Radoslaw Zarzynski
06:40 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
bump up Radoslaw Zarzynski
06:40 PM Bug #64192 (Pending Backport): Incorrect behavior on combined cmpext+write ops in the face of ses...
Radoslaw Zarzynski
05:04 PM Fix #64573 (Fix Under Review): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cl...
Sridhar Seshasayee
04:37 PM Fix #64573 (Pending Backport): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cl...
The changes introduced as part of PR: https://github.com/ceph/ceph/pull/53524
made the randomized values of osd_op_q...
Sridhar Seshasayee
02:19 PM Bug #64562: Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
. Pablo Higueras
02:18 PM Bug #64562: Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
No problem with the rename, Igor. Thank you!
Igor Fedotov wrote:
> Hi Paolo,
> mind me renaming the ticket to som...
Pablo Higueras
02:16 PM Bug #64562: Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
Igor Fedotov wrote:
> Hi Paolo,
> mind me renaming the ticket to something like "Occasional segmentation faults in ...
Pablo Higueras
01:27 PM Bug #64562: Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
Hi Paolo,
mind me renaming the ticket to something like "Occasional segmentation faults in ScrubQueue::collect_ripe_...
Igor Fedotov
10:07 AM Bug #64562 (New): Occasional segmentation faults in ScrubQueue::collect_ripe_jobs
Hello!
Igor Fedotov suggested that I open a new ticket under the RADOS subproject; the original ticket is [[https://tra...
Pablo Higueras

02/22/2024

05:21 PM Feature #56956: osdc: Add objecter fastfail
> There is no point in indefinitely waiting when pg of an object is inactive.
This is not correct for CephFS or RB...
Dan van der Ster
03:04 PM Backport #64406 (In Progress): reef: Failed to encode map X with expected CRC
Radoslaw Zarzynski
07:04 AM Bug #59196 (Fix Under Review): ceph_test_lazy_omap_stats segfault while waiting for active+clean
Nitzan Mordechai
12:04 AM Backport #63843 (In Progress): quincy: Add health error if one or more OSDs registered v1/v2 publ...
Prashant D
12:01 AM Backport #63842 (In Progress): reef: Add health error if one or more OSDs registered v1/v2 public...
Prashant D

02/21/2024

07:42 PM Bug #64519: OSD/MON: No snapshot metadata keys trimming
This reminded me of the notes in https://pad.ceph.com/p/removing_removed_snaps/timeslider#4651 that talk about why th... Joshua Baergen
10:22 AM Bug #64519 (New): OSD/MON: No snapshot metadata keys trimming
The Monitor's keys of purged_snap_ / purged_epoch_ and OSD's PSN_ (SnapMapper::PURGED_SNAP_PREFIX) keys are not trimm... Matan Breizman
03:13 PM Bug #63389: Failed to encode map X with expected CRC
The actual fix (not just the log message change) is: https://github.com/ceph/ceph/pull/55401.
It got approved 2 hour...
Radoslaw Zarzynski
03:50 AM Bug #64514 (Duplicate): LibRadosTwoPoolsPP.PromoteSnapScrub test failed
In rados_api_tests: ... Aishwarya Mathuria

02/20/2024

05:33 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
This tracker looks very interesting: https://tracker.ceph.com/issues/57757. Radoslaw Zarzynski
05:00 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
Links to crash sites:
* https://github.com/ceph/ceph/blob/v17.2.7/src/osd/ECBackend.cc#L676
* https://github.co...
Radoslaw Zarzynski
05:33 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2024-02-14_14:58:57-rados-wip-yuri4-testing-2024-02-13-1546-distro-default-smithi/7560007/ Aishwarya Mathuria
05:17 PM Bug #62777: rados/valgrind-leaks: expected valgrind issues and found none
/a/yuriw-2024-02-14_14:58:57-rados-wip-yuri4-testing-2024-02-13-1546-distro-default-smithi/7559915 Aishwarya Mathuria
11:50 AM Bug #58130: LibRadosAio.SimpleWrite hang and pkill
Thanks to Aishwarya, who also looked at the queued ops that didn't execute. I opened a new bug for it: https://tracker... Nitzan Mordechai
11:43 AM Bug #64504 (New): aio ops queued but never executed
A few teuthology tests failed when trying to execute aio_write; wait_for_complete then never completed.
the...
Nitzan Mordechai

02/19/2024

06:34 PM Bug #61385: TEST_dump_scrub_schedule fails from "key is query_active: negation:0 # expected: true...
Why copy this into a new bug report instead of marking it as a duplicate?
Ronen Friedman
06:32 PM Bug #53342: Exiting scrub checking -- not all pgs scrubbed
I think we can close this bug as 'resolved' Ronen Friedman
06:31 PM Bug #62119: timeout on reserving replicsa
@aishwarya - I think we can lower the severity, or maybe even close this bug.
It seems as though some specific tests...
Ronen Friedman
06:28 PM Bug #64310 (Rejected): osd/scrub: PGs remain in the scrub queue after an interval change
My mistake. Not exactly a bug.
(Fuller explanation:
recent changes to the scrub state-machine changed the point in ...
Ronen Friedman
06:25 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
Will take a look.
Ronen Friedman
 
