Activity

From 02/29/2024 to 03/29/2024

Today

03:05 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Closing https://github.com/ceph/ceph/pull/55596 in favour of https://github.com/ceph/ceph/pull/56574 Brad Hubbard

03/28/2024

11:14 AM Bug #65185: OSD_SCRUB_ERROR, inconsistent pg in upgrade tests
The OSDMap CRC issue is clearly there but I'm not sure / I doubt it can explain the scrub error.
Let's ask Ronen for...
Radoslaw Zarzynski
11:03 AM Bug #65186: OSDs unreachable in upgrade test
... Radoslaw Zarzynski
11:02 AM Backport #65198: squid: Failed to encode map X with expected CRC
https://github.com/ceph/ceph/pull/56553 Radoslaw Zarzynski
10:31 AM Backport #65198 (In Progress): squid: Failed to encode map X with expected CRC
Radoslaw Zarzynski
10:29 AM Backport #65198 (In Progress): squid: Failed to encode map X with expected CRC
Backport Bot
07:03 AM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
... yite gu

03/27/2024

10:35 PM Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
Laura Flores wrote:
> Strange, the syntax in the text snippet works in a vstart cluster:
> [...]
The issue, I be...
Patrick Donnelly
10:09 PM Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
Strange, the syntax in the text snippet works in a vstart cluster:... Laura Flores
08:33 PM Bug #65186: OSDs unreachable in upgrade test
/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7615991 Laura Flores
08:29 PM Bug #65186: OSDs unreachable in upgrade test
Possibly a dupe of the related tracker (crc encoding issues) Laura Flores
08:28 PM Bug #65186 (New): OSDs unreachable in upgrade test
/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616011/remote/smithi087/log/a8e8c570-e819-11ee... Laura Flores
08:31 PM Bug #65185: OSD_SCRUB_ERROR, inconsistent pg in upgrade tests
Laura Flores wrote:
> /a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616025/remote/smithi098...
Laura Flores
08:21 PM Bug #65185 (New): OSD_SCRUB_ERROR, inconsistent pg in upgrade tests
/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616025/remote/smithi098/log/b1f19696-e81a-11ee... Laura Flores
04:43 PM Bug #65183 (Fix Under Review): Overriding an EC pool needs the "--yes-i-really-mean-it" flag in a...
Radoslaw Zarzynski
04:23 PM Bug #65183: Overriding an EC pool needs the "--yes-i-really-mean-it" flag in addition to "force"
Likely coming from this change:
https://github.com/ceph/ceph/pull/56287
Laura Flores
04:23 PM Bug #65183 (Fix Under Review): Overriding an EC pool needs the "--yes-i-really-mean-it" flag in a...
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623454... Laura Flores
12:07 PM Bug #51725 (Resolved): make bufferlist::c_str() skip rebuild when it isn't necessary
Konstantin Shalygin
12:06 PM Backport #52595 (Rejected): pacific: make bufferlist::c_str() skip rebuild when it isn't necessary
Pacific is EOL Konstantin Shalygin
12:06 PM Bug #51843 (Resolved): osd/scrub: OSD crashes at PG removal
Konstantin Shalygin
12:06 PM Backport #53340 (Rejected): pacific: osd/scrub: OSD crashes at PG removal
Pacific is EOL Konstantin Shalygin
12:05 PM Bug #53294 (Resolved): rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
Konstantin Shalygin
12:05 PM Bug #49525 (Resolved): found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e4 ...
Konstantin Shalygin
12:04 PM Backport #55973 (Rejected): pacific: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi1...
Pacific is EOL Konstantin Shalygin
12:04 PM Backport #56656 (Rejected): pacific: rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlu...
Pacific is EOL Konstantin Shalygin
12:02 PM Backport #64672 (Rejected): pacific: test_pool_min_size: AssertionError: wait_for_clean: failed b...
Pacific is EOL Konstantin Shalygin
12:01 PM Backport #64410 (In Progress): quincy: map eXX had wrong heartbeat addr
Konstantin Shalygin
12:00 PM Backport #64412 (In Progress): reef: map eXX had wrong heartbeat addr
Konstantin Shalygin
11:58 AM Backport #64411 (Rejected): pacific: map eXX had wrong heartbeat addr
Pacific is EOL Konstantin Shalygin
11:56 AM Backport #64407 (Rejected): pacific: Expected warnings that need to be whitelisted cause rados/ce...
Pacific is EOL Konstantin Shalygin
11:56 AM Backport #64157 (Rejected): pacific: CommandFailedError (rados/test_python.sh): "RADOS object not...
Pacific is EOL Konstantin Shalygin
11:55 AM Backport #59675 (Rejected): pacific: osd:tick checking mon for new map
Pacific is EOL Konstantin Shalygin
11:54 AM Backport #58870 (Rejected): pacific: ClsLock.TestExclusiveEphemeralStealEphemeral failed
Pacific is EOL Konstantin Shalygin
11:16 AM Bug #57061: Use single cluster log level (mon_cluster_log_level) config to control verbosity of c...
In QA. Radoslaw Zarzynski
11:15 AM Bug #64258: osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
Sent to QA. Radoslaw Zarzynski
09:19 AM Bug #54744: crash: void MonMap::add(const mon_info_t&): assert(addr_mons.count(a) == 0)
The priority level is set to "minor" ... when the time comes that messenger v1 is deprecated ... operators will disab... Stefan Kooman
09:17 AM Bug #54744: crash: void MonMap::add(const mon_info_t&): assert(addr_mons.count(a) == 0)
This should be fixed indeed. I wanted to disable msgv1 on this cluster. I already had set the flag "ceph config set m... Stefan Kooman
02:17 AM Feature #65163 (New): Rados:Provide options for data compression levels, specified with -l, to en...
wei zhu

03/26/2024

04:37 PM Bug #54439: LibRadosWatchNotify.WatchNotify2Multi fails
/a/yuriw-2024-03-20_18:33:32-rados-wip-yuri6-testing-2024-03-18-1406-squid-distro-default-smithi/7613112... Laura Flores
04:26 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2024-03-20_18:33:32-rados-wip-yuri6-testing-2024-03-18-1406-squid-distro-default-smithi/7613235 Laura Flores
04:20 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-20_18:33:32-rados-wip-yuri6-testing-2024-03-18-1406-squid-distro-default-smithi/7613108 Laura Flores
02:45 PM Bug #64519: OSD/MON: No snapshot metadata keys trimming
Adding 53545 as a candidate for fixing this issue, this will require additional documentation on how to use the tool ... Matan Breizman
09:06 AM Bug #64519 (In Progress): OSD/MON: No snapshot metadata keys trimming
https://tracker.ceph.com/issues/62983 should help with avoiding the gaps in the purged snaps ids intervals. As a resu... Matan Breizman
12:57 PM Backport #65150 (In Progress): squid: cluster log: Cluster log level string representation missin...
Sridhar Seshasayee
11:59 AM Backport #65150 (In Progress): squid: cluster log: Cluster log level string representation missin...
https://github.com/ceph/ceph/pull/56478 Backport Bot
12:55 PM Backport #65151 (In Progress): squid: singleton/ec-inconsistent-hinfo.yaml: Include a possible be...
Sridhar Seshasayee
12:50 PM Backport #65151 (In Progress): squid: singleton/ec-inconsistent-hinfo.yaml: Include a possible be...
https://github.com/ceph/ceph/pull/56477 Sridhar Seshasayee
11:57 AM Fix #64573 (Pending Backport): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cl...
Sridhar Seshasayee
04:32 AM Fix #64573 (Resolved): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cluster lo...
Sridhar Seshasayee
11:56 AM Bug #58436: ceph cluster log reporting log level in numeric format for the clog messages
Radoslaw Zarzynski wrote:
> Do we need to backport?
Yes, this along with https://tracker.ceph.com/issues/64314 ne...
Sridhar Seshasayee
11:56 AM Bug #64314 (Pending Backport): cluster log: Cluster log level string representation missing in th...
Sridhar Seshasayee
11:30 AM Backport #65141 (In Progress): reef: osd: modify PG deletion cost for mClock scheduler
Aishwarya Mathuria
07:27 AM Backport #65141 (In Progress): reef: osd: modify PG deletion cost for mClock scheduler
https://github.com/ceph/ceph/pull/56475 Backport Bot
11:26 AM Backport #65140 (In Progress): squid: osd: modify PG deletion cost for mClock scheduler
Aishwarya Mathuria
07:27 AM Backport #65140 (In Progress): squid: osd: modify PG deletion cost for mClock scheduler
https://github.com/ceph/ceph/pull/56474 Backport Bot
09:01 AM Bug #62983 (Resolved): OSD/MON: purged snap keys are not merged
An alternative solution is to avoid incrementing the snapid on removal to avoid the gaps. Matan Breizman
07:20 AM Bug #65139 (Pending Backport): osd: modify PG deletion cost for mClock scheduler
Aishwarya Mathuria
07:20 AM Bug #65139 (Pending Backport): osd: modify PG deletion cost for mClock scheduler
With the osd_delete_sleep_ssd and osd_delete_sleep_hdd options disabled with mClock, it was noticed that PG deletion ... Aishwarya Mathuria
04:35 AM Bug #62171 (Resolved): All OSD shards should use the same scheduler type when osd_op_queue=debug_...
Sridhar Seshasayee
04:34 AM Backport #63874 (Resolved): reef: All OSD shards should use the same scheduler type when osd_op_q...
Sridhar Seshasayee
04:32 AM Backport #64881 (Resolved): reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible benign...
Sridhar Seshasayee

03/25/2024

09:37 PM Bug #63066: rados/objectstore - application not enabled on pool '.mgr'
/a/yuriw-2024-03-22_13:10:42-rados-wip-yuri7-testing-2024-03-20-1625-quincy-distro-default-smithi/7616976 Laura Flores
09:24 PM Backport #64881: reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cluster lo...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/56151
merged
Yuri Weinstein
09:24 PM Bug #64725: rados/singleton: application not enabled on pool 'rbd'
/a/yuriw-2024-03-22_13:10:42-rados-wip-yuri7-testing-2024-03-20-1625-quincy-distro-default-smithi/7616657 Laura Flores
09:23 PM Backport #63874: reef: All OSD shards should use the same scheduler type when osd_op_queue=debug_...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/54981
merged
Yuri Weinstein
09:05 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2024-03-22_13:09:48-rados-wip-yuri11-testing-2024-03-21-0851-reef-distro-default-smithi/7616706 Laura Flores
08:57 PM Bug #59057 (Resolved): rados/test_envlibrados_for_rocksdb.sh: No rule to make target 'rocksdb_env...
Laura Flores
06:39 PM Bug #64854: decoding chunk_refs_by_hash_t return wrong values
In QA. Radoslaw Zarzynski
06:38 PM Bug #63891 (Fix Under Review): mon/AuthMonitor: fix potential repeated global_id
Bump up. Radoslaw Zarzynski
06:37 PM Bug #64997 (Need More Info): There is always an osd process that takes up high cpu
Note from bugscrub: need a summary here. Radoslaw Zarzynski
06:36 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
Looks like a starvation? Radoslaw Zarzynski
06:28 PM Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
Radoslaw Zarzynski wrote:
> Patrick, are you posting the PR as a culprit?
yes, is it not?
Patrick Donnelly
05:50 PM Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
Patrick, are you posting the PR as a culprit? Radoslaw Zarzynski
06:18 PM Bug #63198: rados/thrash: AssertionError: wait_for_recovery: failed before timeout expired
Bump up but not terribly high prio. Radoslaw Zarzynski
06:14 PM Bug #65013: replica_read not available on most recently updated objects in each PG
Bump up. Radoslaw Zarzynski
06:12 PM Bug #65044: osd/scrub: must disable reservation timeout for reserver-based requests
Bump up. Radoslaw Zarzynski
06:03 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
A command for local replication from Ronen (thanks!):... Radoslaw Zarzynski
08:38 AM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620817 Nitzan Mordechai
06:02 PM Bug #62209: can not promote object at readonly tier mode
If the fix helps, we can reopen and merge. Radoslaw Zarzynski
03:49 AM Bug #62209: can not promote object at readonly tier mode
Okay, thanks, I will try it. By the way, does ceph have other caching solutions now? Arthur ho
05:57 PM Backport #65081 (Resolved): squid: mon: MON_DOWN warnings when mons are first booting
Patrick Donnelly
05:57 PM Bug #56393: failed to complete snap trimming before timeout
I will merge the PR mentioned by Matan above. Radoslaw Zarzynski
10:08 AM Bug #56393: failed to complete snap trimming before timeout
Radoslaw Zarzynski wrote:
> Hi Matan,
> would you mind taking a look? Not a high priority.
I suspect that the ne...
Matan Breizman
05:52 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Review in progress. Radoslaw Zarzynski
05:51 PM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
Bump up. Radoslaw Zarzynski
05:48 PM Backport #65121 (New): reef: PG autoscaler tuning => catastrophic ceph cluster crash
Backport Bot
05:48 PM Backport #65120 (New): squid: PG autoscaler tuning => catastrophic ceph cluster crash
Backport Bot
05:48 PM Backport #65119 (New): quincy: PG autoscaler tuning => catastrophic ceph cluster crash
Backport Bot
05:45 PM Bug #64333 (Pending Backport): PG autoscaler tuning => catastrophic ceph cluster crash
Zac has already created some backports. Radoslaw Zarzynski
03:12 PM Bug #64802 (Fix Under Review): rados: generalize stretch mode pg temp handling to be usable witho...
Kamoltat (Junior) Sirivadhna
02:44 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
Okay so the latest change that I added will have two commands:... Kamoltat (Junior) Sirivadhna
01:45 PM Bug #63881 (Fix Under Review): Inaccurate pg splits/merges and pool deletion/creation on OSD mapgap
Matan Breizman

03/24/2024

12:05 PM Backport #65097 (In Progress): squid: ceph osd pool rmsnap clone object leak
Matan Breizman
11:58 AM Backport #65097 (In Progress): squid: ceph osd pool rmsnap clone object leak
https://github.com/ceph/ceph/pull/56432 Backport Bot
12:05 PM Backport #65096 (In Progress): reef: ceph osd pool rmsnap clone object leak
Matan Breizman
11:58 AM Backport #65096 (In Progress): reef: ceph osd pool rmsnap clone object leak
https://github.com/ceph/ceph/pull/56431 Backport Bot
12:04 PM Backport #65095 (In Progress): quincy: ceph osd pool rmsnap clone object leak
Matan Breizman
11:57 AM Backport #65095 (In Progress): quincy: ceph osd pool rmsnap clone object leak
https://github.com/ceph/ceph/pull/56430 Backport Bot
11:55 AM Bug #64646 (Pending Backport): ceph osd pool rmsnap clone object leak
Matan Breizman
09:46 AM Bug #64917 (Fix Under Review): SnapMapperTest.CheckObjectKeyFormat object key changed
Matan Breizman

03/23/2024

03:50 PM Bug #65044 (Fix Under Review): osd/scrub: must disable reservation timeout for reserver-based req...
Ronen Friedman

03/22/2024

06:00 PM Bug #65090 (New): rados: the object most recently written on a pg won't be available for replica ...
In practice, this means that at any time at least one object on each PG won't be available for replica read. On a po... Samuel Just
05:59 PM Bug #65086 (New): rados: replicas do not initialize their mlcod value upon activation, replica re...
... Samuel Just
05:55 PM Bug #65085 (New): rados: replica mlcod tends to lag by two cycles rather than one limiting replic...
The replica and the primary populate RepModify::last_complete and RepGather::pg_local_last_complete prior to doing Pr... Samuel Just
04:08 PM Backport #65082 (In Progress): reef: mon: MON_DOWN warnings when mons are first booting
Patrick Donnelly
03:59 PM Backport #65082 (In Progress): reef: mon: MON_DOWN warnings when mons are first booting
https://github.com/ceph/ceph/pull/56408 Backport Bot
04:06 PM Backport #65081 (In Progress): squid: mon: MON_DOWN warnings when mons are first booting
Patrick Donnelly
03:59 PM Backport #65081 (Resolved): squid: mon: MON_DOWN warnings when mons are first booting
https://github.com/ceph/ceph/pull/56407 Backport Bot
03:56 PM Bug #64968 (Pending Backport): mon: MON_DOWN warnings when mons are first booting
Patrick Donnelly
02:41 PM Backport #62921 (In Progress): quincy: mon/MonmapMonitor: do not propose on error in prepare_update
Patrick Donnelly
02:39 PM Backport #62923 (In Progress): reef: mon/MonmapMonitor: do not propose on error in prepare_update
Patrick Donnelly
02:34 PM Backport #62922 (Rejected): pacific: mon/MonmapMonitor: do not propose on error in prepare_update
EOL Patrick Donnelly
01:31 PM Bug #65044 (In Progress): osd/scrub: must disable reservation timeout for reserver-based requests
Ronen Friedman
11:41 AM Backport #65072 (New): squid: rados/thrash: slow reservation response from 1 (115547ms) in cluste...
Backport Bot
11:34 AM Bug #64869 (Pending Backport): rados/thrash: slow reservation response from 1 (115547ms) in clust...
Ronen Friedman
01:06 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Looking at the above crash which is referred to in https://github.com/ceph/ceph/pull/55596#issuecomment-2011798771
...
Brad Hubbard

03/21/2024

11:51 PM Bug #65013: replica_read not available on most recently updated objects in each PG
'last_complete_ondisk is updated to' only happens on the primary from PeeringState::calc_min_last_complete_ondisk(), ... Samuel Just
02:56 PM Bug #65013: replica_read not available on most recently updated objects in each PG
Also note, pgs from the data pool that couldn't serve from replica:... Yehuda Sadeh
02:52 PM Bug #65013: replica_read not available on most recently updated objects in each PG
Reproduced the issue again on a new dev environment.
Created env:...
Yehuda Sadeh
12:20 AM Bug #65013: replica_read not available on most recently updated objects in each PG
Some definitions:
pg_info_t::last_update: most recent update seen by an OSD (primary or replica), should be the sa...
Samuel Just
03:11 PM Bug #65044 (Fix Under Review): osd/scrub: must disable reservation timeout for reserver-based req...
Ronen Friedman
03:00 PM Backport #65042 (New): squid: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
Backport Bot
03:00 PM Backport #65041 (New): quincy: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
Backport Bot
03:00 PM Backport #65040 (New): reef: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
Backport Bot
03:00 PM Bug #64558 (Pending Backport): librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
Ilya Dryomov
09:23 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609959 Aishwarya Mathuria
09:22 AM Bug #64917: SnapMapperTest.CheckObjectKeyFormat object key changed
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609912 Aishwarya Mathuria
07:47 AM Bug #63198: rados/thrash: AssertionError: wait_for_recovery: failed before timeout expired
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609848 Aishwarya Mathuria
07:35 AM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609843 Aishwarya Mathuria

03/20/2024

11:20 PM Backport #63879: quincy: tools/ceph_objectstore_tool: Support get/set/superblock
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/55014
merged
Yuri Weinstein
11:15 PM Bug #65013: replica_read not available on most recently updated objects in each PG
In addition to the observation that mlcod is lagging by more than it should, there's a second detail:... Samuel Just
11:04 PM Bug #65013: replica_read not available on most recently updated objects in each PG
... Samuel Just
10:47 PM Bug #65013: replica_read not available on most recently updated objects in each PG
A 4G image will be comprised of 1024 4M objects. The above script creates 64 pgs. The best possible case with the c... Samuel Just
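A rough check of the figures in the comment above, as a minimal sketch. It assumes binary units (a 4 GiB image split into 4 MiB objects) and a perfectly even spread of objects across PGs, which real placement does not guarantee. The variable names are illustrative only.

```python
# Sizes are assumed to be binary units (GiB / MiB); all names are illustrative.
IMAGE_SIZE = 4 * 1024**3    # "4G" image
OBJECT_SIZE = 4 * 1024**2   # "4M" objects
PG_COUNT = 64               # number of PGs quoted in the comment

object_count = IMAGE_SIZE // OBJECT_SIZE

# This is an average only: actual object placement is pseudo-random,
# so individual PGs will hold more or fewer objects than this.
avg_objects_per_pg = object_count / PG_COUNT

print(object_count, avg_objects_per_pg)  # prints: 1024 16.0
```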
10:26 PM Bug #65013: replica_read not available on most recently updated objects in each PG
The following tests are done on ad0cb1eb1609caa646abbbdf6ebccd4dfda0b417 from https://github.com/ceph/ceph/pull/56180... Samuel Just
08:57 PM Bug #65013: replica_read not available on most recently updated objects in each PG
Is that branch https://github.com/ceph/ceph/pull/56180/files ad0cb1eb1609caa646abbbdf6ebccd4dfda0b417 ? Samuel Just
08:40 PM Bug #65013: replica_read not available on most recently updated objects in each PG
From that log line, looks like last_update is 162'9763 and mlcod is 162'9761 -- that looks like 2 updates behind? Samuel Just
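The "2 updates behind" reading can be checked with a small sketch. It assumes the two values use the epoch'version notation shown in the log line and that comparing version counters is only meaningful when the epochs match. The helper names are hypothetical and not part of Ceph.

```python
def parse_eversion(text):
    """Split an "epoch'version" string such as "162'9763" into two ints."""
    epoch, version = text.split("'")
    return int(epoch), int(version)


def version_lag(last_update, mlcod):
    """Return how many versions mlcod trails last_update within one epoch."""
    lu_epoch, lu_version = parse_eversion(last_update)
    ml_epoch, ml_version = parse_eversion(mlcod)
    if lu_epoch != ml_epoch:
        # Assumption: this sketch does not try to compare across epochs.
        raise ValueError("epochs differ; lag is not a simple subtraction")
    return lu_version - ml_version


# Values taken from the comment above.
print(version_lag("162'9763", "162'9761"))  # prints: 2
```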
08:25 PM Bug #65013 (New): replica_read not available on most recently updated objects in each PG
In my dev environment, when trying to read from replica (leveraging crush_location), the osd rejects the requests and... Yehuda Sadeh
09:57 PM Backport #65014 (New): reef: rados/singleton: application not enabled on pool 'rbd'
Backport Bot
09:51 PM Bug #64725 (Pending Backport): rados/singleton: application not enabled on pool 'rbd'
Laura Flores
03:04 PM Bug #62209: can not promote object at readonly tier mode
You can refer to this: https://github.com/ceph/ceph/pull/52672 Jack Lv
02:59 PM Bug #64869: rados/thrash: slow reservation response from 1 (115547ms) in cluster log
Updating the backport field per today's core-sync. Radoslaw Zarzynski
11:41 AM Bug #64869 (Fix Under Review): rados/thrash: slow reservation response from 1 (115547ms) in clust...
Ronen Friedman
02:34 PM Bug #65008 (New): EC pool - PGs down even if min size is satisfied
Hello, I've been evaluating an erasure-coded Ceph setup with the following requirements:
- k+m 7+5
- 3 racks
- 5 hosts ...
Bartosz Rabiega
12:35 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
... Nitzan Mordechai
10:02 AM Bug #64866 (Fix Under Review): rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.Wa...
Nitzan Mordechai
09:24 AM Bug #64866: rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1 failed
After checking more deeply, only watch_check will give us the timeout return code.
The client log also shows that we wa...
Nitzan Mordechai
03:25 AM Bug #64997 (Need More Info): There is always an osd process that takes up high cpu
refer to: https://github.com/rook/rook/issues/13901 cao yong
03:24 AM Bug #63891: mon/AuthMonitor: fix potential repeated global_id
does anyone see this issue? Min Shi

03/19/2024

02:53 PM Backport #64396 (Resolved): quincy: mon: health store size growing infinitely
Konstantin Shalygin
02:47 PM Backport #64396: quincy: mon: health store size growing infinitely
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/55549
merged
Yuri Weinstein
02:45 PM Backport #63843: quincy: Add health error if one or more OSDs registered v1/v2 public ip addresse...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/55698
merged
Yuri Weinstein
01:51 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
While reading the code, I found that @--force@ allows overruling the stripe alignment rules. @--yes-i-really-mean-it@ is p... Radoslaw Zarzynski
01:14 PM Bug #64869 (In Progress): rados/thrash: slow reservation response from 1 (115547ms) in cluster log
The problem is: how to differentiate between instances where one of the scrub reservation messages is queued as 'wait... Ronen Friedman
12:48 PM Bug #64866 (In Progress): rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNo...
the client log shows cookie 94576533816384...
Nitzan Mordechai
12:20 PM Bug #64978 (New): from rgw suite: HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg peering
from https://qa-proxy.ceph.com/teuthology/cbodley-2024-03-19_01:03:50-rgw-wip-cbodley-testing-distro-default-smithi/7... Casey Bodley
05:28 AM Bug #64854 (Fix Under Review): decoding chunk_refs_by_hash_t return wrong values
Nitzan Mordechai
02:20 AM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
Radoslaw Zarzynski wrote:
> Would need logs with @debug_mon=20@ and @debug_rocksdb=20@ from period before the assert...
yite gu
02:01 AM Bug #62209: can not promote object at readonly tier mode
In the scenario of cache tiering, are there any other solutions? Arthur ho

03/18/2024

07:29 PM Bug #64972: qa: "ceph tell 4.3a deep-scrub" command not found
and https://github.com/ceph/ceph/pull/54214 Patrick Donnelly
07:29 PM Bug #64972 (New): qa: "ceph tell 4.3a deep-scrub" command not found
... Patrick Donnelly
07:24 PM Bug #63967 (Resolved): qa/tasks/ceph.py: "ceph tell <pgid> deep_scrub" fails
Patrick Donnelly
06:56 PM Bug #64646: ceph osd pool rmsnap clone object leak
In QA. Radoslaw Zarzynski
06:56 PM Bug #64854: decoding chunk_refs_by_hash_t return wrong values
Hmm, I guess I saw a PR for that. Radoslaw Zarzynski
06:55 PM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
Would need logs with @debug_mon=20@ and @debug_rocksdb=20@ from period before the assertion. Radoslaw Zarzynski
06:51 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
Nothing new but still observing. Bump up. Radoslaw Zarzynski
06:50 PM Bug #64866: rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify3/1 failed
Hi Nitzan! Would you mind taking a look? Radoslaw Zarzynski
06:49 PM Bug #64863: rados/thrash-old-clients: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in clu...
Hmm, I think I saw Laura's PR for @MON_DOWN@. Radoslaw Zarzynski
06:44 PM Bug #58436: ceph cluster log reporting log level in numeric format for the clog messages
Do we need to backport? Radoslaw Zarzynski
06:43 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
In QA. Radoslaw Zarzynski
06:36 PM Bug #64558: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove
Sent to QA. Radoslaw Zarzynski
06:28 PM Bug #57782 (Fix Under Review): [mon] high cpu usage by fn_monstore thread
The fix awaits QA. Radoslaw Zarzynski
06:26 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
Passed QA. Radoslaw Zarzynski
06:25 PM Bug #64938: Pool created with single PG splits into many on single OSD causes OSD to hit max_pgs_...
Reviewed. Radoslaw Zarzynski
06:21 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
https://github.com/ceph/ceph/pull/54492 merged Yuri Weinstein
06:12 PM Bug #64968 (Fix Under Review): mon: MON_DOWN warnings when mons are first booting
Patrick Donnelly
04:11 PM Bug #64968 (Pending Backport): mon: MON_DOWN warnings when mons are first booting
... Patrick Donnelly
05:58 PM Bug #56393: failed to complete snap trimming before timeout
Hi Matan,
would you mind taking a look? Not a high priority.
Radoslaw Zarzynski
01:53 PM Bug #56393: failed to complete snap trimming before timeout
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603381/... Aishwarya Mathuria
05:52 PM Bug #64347: src/osd/PG.cc: FAILED ceph_assert(!bad || !cct->_conf->osd_debug_verify_cached_snaps)
In QA. Radoslaw Zarzynski
04:26 PM Bug #64347: src/osd/PG.cc: FAILED ceph_assert(!bad || !cct->_conf->osd_debug_verify_cached_snaps)
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603610/ Aishwarya Mathuria
05:48 PM Bug #64917: SnapMapperTest.CheckObjectKeyFormat object key changed
I think this is already tackled by https://github.com/ceph/ceph/pull/56142.
Assigning to Matan for confirmation. I...
Radoslaw Zarzynski
04:31 PM Bug #64917: SnapMapperTest.CheckObjectKeyFormat object key changed
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603418/
/a/yuriw-2024-03...
Aishwarya Mathuria
05:43 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
Bump up. Radoslaw Zarzynski
05:12 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603349 Aishwarya Mathuria
05:41 PM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
In QA. Radoslaw Zarzynski
05:40 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
I'm going to propose a patch removing the @--force@. Radoslaw Zarzynski
05:39 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Bump up. Radoslaw Zarzynski

03/16/2024

06:22 PM Backport #64406 (Resolved): reef: Failed to encode map X with expected CRC
Ilya Dryomov

03/15/2024

10:49 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
I recently created a draft PR https://github.com/ceph/ceph/pull/56233/, adding the additional arguments peering_bucke... Kamoltat (Junior) Sirivadhna
10:14 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
WIP PR: https://github.com/ceph/ceph/pull/56233 Kamoltat (Junior) Sirivadhna
09:02 AM Bug #56393: failed to complete snap trimming before timeout
/a/yuriw-2024-03-13_19:25:03-rados-wip-yuri6-testing-2024-03-12-0858-distro-default-smithi/7597884
/a/yuriw-2024-03-...
Aishwarya Mathuria
08:06 AM Bug #64942 (New): rados/verify: valgrind reports "Invalid read of size 8" error.
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587319
/a/yuriw-2024-03-...
Sridhar Seshasayee
01:01 AM Bug #64938 (Fix Under Review): Pool created with single PG splits into many on single OSD causes ...
Prashant D
12:51 AM Bug #64938 (Fix Under Review): Pool created with single PG splits into many on single OSD causes ...
With autoscale mode ON, if a new pool is created without specifying the pg_num/pgp_num values then the pool gets crea... Prashant D

03/14/2024

02:18 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
peering_crush_bucket_[count|target|barrier] Kamoltat (Junior) Sirivadhna
01:47 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
Don't forget that there is also pg_pool_t::peering_crush_bucket_count that directly requires a minimum number of high... Greg Farnum
12:38 PM Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode
My plan: the current script to set up a vstart cluster to test out the above hypothesis:... Kamoltat (Junior) Sirivadhna
01:17 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2024-03-13_19:26:09-rados-wip-yuri-testing-2024-03-12-1240-reef-distro-default-smithi/7598397
/a/yuriw-2024...
Aishwarya Mathuria
12:57 PM Backport #63559: reef: Heartbeat crash in osd
/a/yuriw-2024-03-13_19:26:09-rados-wip-yuri-testing-2024-03-12-1240-reef-distro-default-smithi/7598201 Aishwarya Mathuria
11:00 AM Bug #64917 (Fix Under Review): SnapMapperTest.CheckObjectKeyFormat object key changed
/a/yuriw-2024-03-12_18:29:22-rados-wip-yuri8-testing-2024-03-11-1138-distro-default-smithi/7594695... Nitzan Mordechai

03/13/2024

04:44 PM Bug #57782: [mon] high cpu usage by fn_monstore thread
Hi,
Thanks to this article https://blog.palark.com/sre-troubleshooting-ceph-systemd-containerd/, I think root caus...
Peter Goron
01:34 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Ilya Dryomov wrote:
> No, snap2 would continue to exist and one should be able to "rollback" to it. Rollback is rea...
Matan Breizman
10:16 AM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Matan Breizman wrote:
> Ilya Dryomov wrote:
> > Put another way: rollback is a destructive operation. One isn't ex...
Ilya Dryomov
10:00 AM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Ilya Dryomov wrote:
> Put another way: rollback is a destructive operation. One isn't expected to be able to go bac...
Matan Breizman
01:15 PM Bug #64897 (New): unittest_ceph_crypto - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_ceph_crypto -T memcheck...
Nitzan Mordechai
01:14 PM Bug #64895 (New): unittest_perf_counters_cache - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_perf_counters_cache -T memcheck...
Nitzan Mordechai
01:13 PM Bug #64893 (New): unittest_bufferlist - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_bufferlist -T memcheck...
Nitzan Mordechai
01:11 PM Bug #64892 (New): unittest_ipaddr - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_ipaddr -T memcheck...
Nitzan Mordechai
01:08 PM Bug #64891 (New): unittest_admin_socket - valgrind failed
Running the unit test with valgrind:
ctest -R unittest_admin_socket -T memcheck...
Nitzan Mordechai
08:08 AM Backport #64881 (In Progress): reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible ben...
Sridhar Seshasayee
07:34 AM Backport #64881 (Resolved): reef: singleton/ec-inconsistent-hinfo.yaml: Include a possible benign...
https://github.com/ceph/ceph/pull/56151 Backport Bot
07:32 AM Bug #64314 (Resolved): cluster log: Cluster log level string representation missing in the cluste...
Sridhar Seshasayee
07:30 AM Fix #64573 (Pending Backport): singleton/ec-inconsistent-hinfo.yaml: Include a possible benign cl...
Sridhar Seshasayee
05:26 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Brad Hubbard wrote:
> Nitzan Mordechai wrote:
> > now the segfault happens on check_one function where we also have...
Nitzan Mordechai
02:27 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Nitzan Mordechai wrote:
> now the segfault happens on check_one function where we also have pre-regex to truncate th...
Brad Hubbard

03/12/2024

08:29 PM Bug #64725 (Fix Under Review): rados/singleton: application not enabled on pool 'rbd'
Laura Flores
01:48 PM Bug #64725: rados/singleton: application not enabled on pool 'rbd'
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587549 Sridhar Seshasayee
06:21 PM Bug #58436: ceph cluster log reporting log level in numeric format for the clog messages
https://github.com/ceph/ceph/pull/49730 merged Yuri Weinstein
05:03 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Ilya Dryomov wrote:
> This is because rollback discards all changes made to image HEAD and makes it identical to the...
Ilya Dryomov
04:30 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Matan Breizman wrote:
> the suggested change here suggests that the disk usage should actually be:
> NAME ...
Ilya Dryomov
04:13 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Hi Matan,
We are able to roll back and forth between arbitrary snapshots and the suggested change in https://...
Ilya Dryomov
02:24 PM Bug #64735 (Need More Info): OSD/MON: rollback_to snap the latest overlap is not right
We should first understand whether this is a bug or intentional behavior, given the following order of operations:
<...
Matan Breizman
03:35 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-08_16:19:51-rados-wip-yuri2-testing-2024-03-01-1606-distro-default-smithi/7587184 Matan Breizman
01:20 PM Bug #64437: qa/standalone/scrub/osd-scrub-repair.sh: TEST_repair_stats_ec: test 26 = 13
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587334 Sridhar Seshasayee
03:33 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2024-03-08_16:19:51-rados-wip-yuri2-testing-2024-03-01-1606-distro-default-smithi/7587174/ Matan Breizman
01:18 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587531
/a/yuriw-2024-03-...
Sridhar Seshasayee
02:15 PM Bug #64869 (Pending Backport): rados/thrash: slow reservation response from 1 (115547ms) in clust...
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587833
The cluster log...
Sridhar Seshasayee
01:27 PM Bug #64866 (Fix Under Review): rados/test.sh: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.Wa...
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587349
There was a sim...
Sridhar Seshasayee
01:19 PM Bug #62832 (Resolved): common: config_proxy deadlock during shutdown (and possibly other times)
Patrick Donnelly
01:19 PM Backport #63457 (Resolved): quincy: common: config_proxy deadlock during shutdown (and possibly o...
Patrick Donnelly
12:44 PM Bug #64863 (New): rados/thrash-old-clients: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c ...
The following tests in the rados suite failed with the warning:
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testi...
Sridhar Seshasayee
12:21 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587455 Sridhar Seshasayee
11:26 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Now the segfault happens in the check_one function, where we also have a pre-regex to truncate the output that is causing segfa... Nitzan Mordechai
07:55 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
according to the console logs:... Nitzan Mordechai
04:22 AM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Radoslaw Zarzynski wrote:
> The fix isn't merged yet which could explain the reoccurrence above
The run mentioned...
Sridhar Seshasayee
08:28 AM Bug #64514 (Duplicate): LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Closing as this is a duplicate. Matan Breizman
08:27 AM Bug #64646: ceph osd pool rmsnap clone object leak
Radoslaw Zarzynski wrote:
> Need a squid backport as well.
Awaiting main merge (https://github.com/ceph/ceph/pull...
Matan Breizman
06:31 AM Bug #64854 (Fix Under Review): decoding chunk_refs_by_hash_t return wrong values
When running the ceph-dencoder test on a clang-14 build, the JSON dump of chunk_refs_by_hash_t will show:... Nitzan Mordechai
06:02 AM Bug #56393: failed to complete snap trimming before timeout
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587430
/a/yuriw-2024-03-...
Sridhar Seshasayee
02:08 AM Bug #64824: mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err == 0)
Radoslaw Zarzynski wrote:
> Looks like a mon-scrub failure. This can be caused by a HW issue or by a corruption.
> ...
yite gu

03/11/2024

08:55 PM Bug #64438: NeoRadosWatchNotify.WatchNotifyTimeout times out along with FAILED ceph_assert(op->se...
Fails here in the neorados test:... Laura Flores
07:18 PM Feature #64849 (New): rados: Support read_from_replica everywhere
The Objecter supports read-from-replica if you pass in the LOCALIZE_READS flag. If we want to serve all read IO from ... Greg Farnum
06:40 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
There is a PR posted: https://github.com/ceph/ceph/pull/55991 Ilya Dryomov
06:06 PM Bug #64735: OSD/MON: rollback_to snap the latest overlap is not right
Hi Matan! Would you mind taking a look? Radoslaw Zarzynski
06:18 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
Bump up. Radoslaw Zarzynski
06:16 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
Review in progress. Radoslaw Zarzynski
06:15 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Bump up. Radoslaw Zarzynski
06:09 PM Bug #64725: rados/singleton: application not enabled on pool 'rbd'
Fix is to add this to the ignorelist. Laura Flores
06:02 PM Bug #64646: ceph osd pool rmsnap clone object leak
Need a squid backport as well. Radoslaw Zarzynski
06:00 PM Bug #64824 (Need More Info): mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err =...
Looks like a mon-scrub failure. This can be caused by a HW issue or by a corruption.
Is there a sign of malfunctioni...
Radoslaw Zarzynski
08:24 AM Bug #64824 (Need More Info): mon: ceph-16.2.14/src/mon/Monitor.cc: 5661: FAILED ceph_assert(err =...
-1> 2024-03-11T02:29:03.716+0000 7f6600eaf700 -1 /root/rpmbuild/BUILD/ceph-16.2.14/src/mon/Monitor.cc: In functio... yite gu
05:55 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
The fix isn't merged yet which could explain the reoccurrence above Radoslaw Zarzynski
02:45 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587684
/a/yuriw-2024-03-...
Sridhar Seshasayee
05:51 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
Bump up. Radoslaw Zarzynski
05:50 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
1. I'm still not sure we need @--force@. 2. If it turns out to be justified, shouldn't it be @--yes-i-really-really-mean-it@? Radoslaw Zarzynski
05:42 PM Bug #64314: cluster log: Cluster log level string representation missing in the cluster logs.
Still in testing. Radoslaw Zarzynski

03/10/2024

07:37 AM Bug #64657 (Rejected): Ceph test cases starting cluster not waiting for OSDs to join fully
茁野 鲍, thanks for letting us know!
I'll reject this bug.
Nitzan Mordechai

03/08/2024

11:50 PM Bug #64804 (Duplicate): gcc-13 apparently breaks SafeTimer
Samuel Just
04:07 AM Bug #64804 (Duplicate): gcc-13 apparently breaks SafeTimer
https://github.com/ceph/ceph/pull/55886
Probably related to https://bugzilla.redhat.com/show_bug.cgi?id=2241339 .
Samuel Just
10:19 AM Bug #62338: osd: choose_async_recovery_ec may select an acting set < min_size
Hello again.
Apparently I got a tiny little bit too excited.
I tested the case described above with 16.2.15 and...
Bartosz Rabiega
12:26 AM Bug #64802 (Fix Under Review): rados: generalize stretch mode pg temp handling to be usable witho...
PeeringState::calc_replicated_acting_stretch encodes special behavior for stretch clusters which prohibits the primar... Samuel Just

03/07/2024

12:17 PM Bug #64788 (Fix Under Review): EpollDriver::del_event() crashes when the nic is unplugged
Kefu Chai
11:48 AM Bug #64788 (Fix Under Review): EpollDriver::del_event() crashes when the nic is unplugged
librbd uses msgr to talk to its Ceph cluster. If the client's NIC is hot-unplugged, there is a chance that @EpollDriver... Kefu Chai
09:04 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
Thank you for addressing this issue. I appreciate your effort in fixing the issue.
I apologize for the oversight o...
茁野 鲍

03/06/2024

07:15 PM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
... Radoslaw Zarzynski
07:14 PM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
I think the direct reason behind the test's hang is the death of @osd.5@:... Radoslaw Zarzynski
08:22 AM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
removed the "Related issues" Nitzan Mordechai
08:21 AM Bug #64726: LibRadosAioEC.MultiWritePP hang and pkill
The last op that LibRadosAioEC.MultiWritePP is trying to do is writing the oid_MultiWritePP_ obj:... Nitzan Mordechai
03:20 PM Bug #63389: Failed to encode map X with expected CRC
The problem came because of a commit that introduced the commented-out check for @SERVER_REEF@ in @OSDMap::encode()@.... Radoslaw Zarzynski
08:21 AM Bug #64735 (Need More Info): OSD/MON: rollback_to snap the latest overlap is not right
When rolling back to a snap (rollback_to), we use the latest clone's current overlap to compute the intersection_of with the older snapshot's clone overlap.
...
dian xing
07:43 AM Bug #62338: osd: choose_async_recovery_ec may select an acting set < min_size
Hello. Just FYI, this fixes a very nasty issue in my EC setup.
Here are some details.
The EC setup and crush rule...
Bartosz Rabiega

03/05/2024

10:50 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581448 Laura Flores
10:47 PM Bug #64726 (New): LibRadosAioEC.MultiWritePP hang and pkill
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581519... Laura Flores
10:42 PM Bug #55141: thrashers/fastread: assertion failure: rollback_info_trimmed_to == head
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581575 Laura Flores
10:33 PM Bug #64725 (Pending Backport): rados/singleton: application not enabled on pool 'rbd'
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581526... Laura Flores
10:24 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581722
/a/yuriw-2024-03-04_20:52:58-rados-ree...
Laura Flores
10:24 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
Update on this: The PR is ready to be reviewed again. Laura Flores
01:04 PM Bug #64514 (In Progress): LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Matan Breizman
01:04 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
This may be related to the bug fixed in https://tracker.ceph.com/issues/64347. However, the outcome here is different whi... Matan Breizman
08:08 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
Without the full log it will be hard to tell whether the symptoms that I see are exactly what 茁野 鲍 sees, but we are missing t... Nitzan Mordechai

03/04/2024

09:19 PM Backport #63526 (Resolved): quincy: crash: int OSD::shutdown(): assert(end_time - start_time_func...
Igor Fedotov
08:45 PM Bug #61140: crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd_fast_...
https://github.com/ceph/ceph/pull/55134 merged Yuri Weinstein
08:07 PM Backport #58337 (Rejected): pacific: mon-stretched_cluster: degraded stretched mode lead to Monit...
Konstantin Shalygin
08:06 PM Backport #58337 (Duplicate): pacific: mon-stretched_cluster: degraded stretched mode lead to Moni...
pacific is EOL Konstantin Shalygin
08:07 PM Bug #59271 (Resolved): mon: FAILED ceph_assert(osdmon()->is_writeable())
Konstantin Shalygin
08:07 PM Backport #59700 (Rejected): pacific: mon: FAILED ceph_assert(osdmon()->is_writeable())
pacific is EOL Konstantin Shalygin
08:06 PM Bug #57017 (Resolved): mon-stretched_cluster: degraded stretched mode lead to Monitor crash
Konstantin Shalygin
08:00 PM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
Hi Nitzan! Would you mind taking a look? Radoslaw Zarzynski
07:59 PM Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd
Looks like a typical symptom of (CPU/memory) starvation. Radoslaw Zarzynski
07:59 PM Bug #64646: ceph osd pool rmsnap clone object leak
note from bug scrub: reviewed, went to QA. Radoslaw Zarzynski
07:58 PM Bug #64514: LibRadosTwoPoolsPP.PromoteSnapScrub test failed
Bump up. Radoslaw Zarzynski
07:56 PM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
note from bug scrub: reviewed, changes requested. Radoslaw Zarzynski
07:55 PM Bug #64670: LibRadosAioEC.RoundTrip2 hang and pkill
Might be something new. Bump up and observe. Radoslaw Zarzynski
07:53 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
note from scrub: the PR is approved. Needs-qa. Radoslaw Zarzynski
07:51 PM Bug #64674 (Resolved): src/scripts/ceph-backport.sh
I guess we don't need to backport anything. Radoslaw Zarzynski
07:49 PM Bug #64258: osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
note from bug scrub: reviewed. Radoslaw Zarzynski
01:40 PM Bug #64258 (Fix Under Review): osd/PrimaryLogPG.cc: FAILED ceph_assert(inserted)
Nitzan Mordechai
07:49 PM Bug #64695: Aborted signal starting in AsyncConnection::send_message()
... Radoslaw Zarzynski
05:39 PM Bug #64695 (New): Aborted signal starting in AsyncConnection::send_message()
/a/yuriw-2024-03-01_16:47:30-rados-wip-yuri11-testing-2024-02-28-0950-reef-distro-default-smithi/7577623... Laura Flores
07:44 PM Bug #64314: cluster log: Cluster log level string representation missing in the cluster logs.
Still in QA. Bump up. Radoslaw Zarzynski
07:36 PM Bug #64333: PG autoscaler tuning => catastrophic ceph cluster crash
Thank you very, very much for the scenario! This throws a lot of light on what has happened.
I'm not sure whether th...
Radoslaw Zarzynski
07:32 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
note from bug scrub: Aishwarya is addressing the review's comments. Radoslaw Zarzynski
06:27 PM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
The fix goes into QA. Radoslaw Zarzynski
12:18 AM Bug #63066: rados/objectstore - application not enabled on pool '.mgr'
/a/yuriw-2024-02-28_15:47:41-rados-wip-yuri4-testing-2024-02-27-1111-quincy-distro-default-smithi/7575815
/a/yuriw-2...
Laura Flores

03/01/2024

11:19 PM Bug #64674: src/scripts/ceph-backport.sh
revert PR: https://github.com/ceph/ceph/pull/55884
will fix this
Kamoltat (Junior) Sirivadhna
11:16 PM Bug #64674 (Resolved): src/scripts/ceph-backport.sh
src/script/ceph-backport.sh: line 1737: ../../../ceph/.github/pull_request_template.md: No such file or directory
...
Kamoltat (Junior) Sirivadhna
11:01 PM Backport #64673 (In Progress): quincy: test_pool_min_size: AssertionError: wait_for_clean: failed...
Kamoltat (Junior) Sirivadhna
10:58 PM Backport #64673 (In Progress): quincy: test_pool_min_size: AssertionError: wait_for_clean: failed...
https://github.com/ceph/ceph/pull/55882 Backport Bot
10:58 PM Backport #64672 (Rejected): pacific: test_pool_min_size: AssertionError: wait_for_clean: failed b...
Backport Bot
10:58 PM Backport #64671 (New): reef: test_pool_min_size: AssertionError: wait_for_clean: failed before ti...
Backport Bot
10:55 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576306 Laura Flores
10:54 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576311 Laura Flores
10:53 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576314 Laura Flores
09:30 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576298 Laura Flores
10:53 PM Bug #59172 (Pending Backport): test_pool_min_size: AssertionError: wait_for_clean: failed before ...
Kamoltat (Junior) Sirivadhna
10:51 PM Bug #64670 (New): LibRadosAioEC.RoundTrip2 hang and pkill
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576303... Laura Flores
12:11 PM Backport #64649 (In Progress): quincy: min_last_epoch_clean is not updated, causing osdmap to be ...
Mykola Golub
12:00 PM Backport #64650 (In Progress): reef: min_last_epoch_clean is not updated, causing osdmap to be un...
Mykola Golub
11:44 AM Backport #64651 (In Progress): squid: min_last_epoch_clean is not updated, causing osdmap to be u...
Mykola Golub
09:19 AM Bug #64657: Ceph test cases starting cluster not waiting for OSDs to join fully
E.g., to reproduce the issue:
diff slicer-src/src/test/osd/safe-to-destroy.sh
function run() {
@@ -32,18 +32,3...
茁野 鲍
09:12 AM Bug #64657 (Rejected): Ceph test cases starting cluster not waiting for OSDs to join fully
I've identified an issue in the Ceph testing framework where, after starting a temporary cluster using functions like... 茁野 鲍

02/29/2024

09:25 PM Backport #64406: reef: Failed to encode map X with expected CRC
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/55712
merged
Yuri Weinstein
09:00 PM Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd
Laura Flores wrote:
> /a/yuriw-2024-02-22_21:33:08-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smith...
Laura Flores
09:00 PM Bug #64637 (New): LeakPossiblyLost in BlueStore::_do_write_small() in osd
Laura Flores
08:57 PM Bug #64637 (Duplicate): LeakPossiblyLost in BlueStore::_do_write_small() in osd
Laura Flores
08:54 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
/a/yuriw-2024-02-28_22:39:54-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7576288 Laura Flores
08:42 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/yuriw-2024-02-28_22:39:54-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7576292
Laura Flores
06:26 PM Backport #64651 (In Progress): squid: min_last_epoch_clean is not updated, causing osdmap to be u...
https://github.com/ceph/ceph/pull/55865 Backport Bot
06:15 PM Backport #64650 (In Progress): reef: min_last_epoch_clean is not updated, causing osdmap to be un...
https://github.com/ceph/ceph/pull/55867 Backport Bot
06:15 PM Backport #64649 (In Progress): quincy: min_last_epoch_clean is not updated, causing osdmap to be ...
https://github.com/ceph/ceph/pull/55868 Backport Bot
06:08 PM Bug #63883 (Pending Backport): min_last_epoch_clean is not updated, causing osdmap to be unable t...
Mykola Golub
02:46 PM Bug #64646 (Pending Backport): ceph osd pool rmsnap clone object leak
There are two ways to remove pool snaps: the rados tool or the mon command (ceph osd pool rmsnap).
It seems that the monitor c...
Matan Breizman
07:02 AM Bug #53342: Exiting scrub checking -- not all pgs scrubbed
Radoslaw Zarzynski wrote:
> Ronen, do we need any backporting?
No. The fix (55478) made it in time for Squid.
Ronen Friedman
 
