Activity

From 08/31/2023 to 09/29/2023

09/29/2023

10:46 PM Cleanup #62851 (Closed): OSD::ShardedOpWQ::stop_for_fast_shutdown(): warning: comparison of integ...
Closing, as the issue that addresses this is attached to https://tracker.ceph.com/issues/61140. Laura Flores
08:57 PM Bug #63014 (Fix Under Review): osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency an...
Neha Ojha
03:46 PM Bug #48896 (Fix Under Review): osd/OSDMap.cc: FAILED ceph_assert(osd_weight.count(i.first))
Mykola Golub

09/28/2023

05:56 PM Bug #63029 (Fix Under Review): Upmap balancer: output of "osdmaptool <file> --upmap" says no opti...
Laura Flores
05:47 PM Bug #63029: Upmap balancer: output of "osdmaptool <file> --upmap" says no optimizations, even tho...
I have looked into the issue. The problem exists on Reef, but does not reproduce in Quincy (17.2.6).
When I ran th...
Laura Flores
05:46 PM Bug #63029 (Pending Backport): Upmap balancer: output of "osdmaptool <file> --upmap" says no opti...
Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2241104 Laura Flores
09:07 AM Fix #63015 (Resolved): mClockScheduler: Add ability to handle high priority operations
Add a high priority queue to the mClockScheduler that allows scheduling items
with higher priority before items in the...
Sridhar Seshasayee
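A minimal sketch of the idea in #63015 above, assuming nothing about the actual mClockScheduler change: a separate strict-priority FIFO that is always drained before the regular QoS-managed queue, so high-priority items never wait behind mClock-scheduled ones. Class and member names below are hypothetical.
<pre>
// Illustrative only; names are hypothetical and this is not the actual
// mClockScheduler code. A strict high-priority FIFO is drained before
// the regular QoS-managed queue.
#include <deque>
#include <optional>
#include <utility>

template <typename Item>
class TwoLevelQueue {
  std::deque<Item> high_prio;  // always served first
  std::deque<Item> normal;     // stands in for the mClock-managed queue
public:
  void enqueue(Item item, bool high_priority) {
    (high_priority ? high_prio : normal).push_back(std::move(item));
  }
  std::optional<Item> dequeue() {
    auto& q = !high_prio.empty() ? high_prio : normal;
    if (q.empty()) return std::nullopt;
    Item item = std::move(q.front());
    q.pop_front();
    return item;
  }
  bool empty() const { return high_prio.empty() && normal.empty(); }
};
</pre>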
07:01 AM Bug #63014: osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency and bandwidth when us...
test-case-9 : osd_op_num_shards_hdd = 1 / osd_op_num_threads_per_shard_hdd = 5 (new value), rados bench -t 100... jianwei zhang
06:59 AM Bug #63014: osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency and bandwidth when us...
test-case-8 : osd_op_num_shards_hdd = 1 / osd_op_num_threads_per_shard_hdd = 5 (new value), rados bench -t 10... jianwei zhang
06:50 AM Bug #63014: osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency and bandwidth when us...
test-case-7 : osd_op_num_shards_hdd = 1 / osd_op_num_threads_per_shard_hdd = 5 (new value), rados bench -t 5... jianwei zhang
06:47 AM Bug #63014: osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency and bandwidth when us...
test-case-5 : osd_op_num_shards_hdd = 1 / osd_op_num_threads_per_shard_hdd = 5 (new value), rados bench -t 1... jianwei zhang
06:44 AM Bug #63014: osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency and bandwidth when us...
test-case-5 : osd_op_num_shards_hdd = 5 / osd_op_num_threads_per_shard_hdd = 1 (default value), rados bench -t 1
<pr...
jianwei zhang
06:33 AM Bug #63014: osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency and bandwidth when us...
test-case-3 : osd_op_num_shards_hdd = 5 / osd_op_num_threads_per_shard_hdd = 1 (default value), rados bench -t 10
<p...
jianwei zhang
06:13 AM Bug #63014: osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency and bandwidth when us...
* solution:... jianwei zhang
05:43 AM Bug #63014 (Fix Under Review): osd perf : Effects of osd_op_num_shard_[hdd/ssd] on op latency an...
background:
* add patch1: https://github.com/ceph/ceph/pull/53417 (ref: https://tracker.ceph.com/issues/62812)
* ad...
jianwei zhang
02:28 AM Bug #62704: Cephfs different monitor has a different LAST_DEEP_SCRUB state
Radoslaw Zarzynski wrote:
> Hi! Could you please elaborate on the symptom? Was the integer values absolutely the sam...
fuchen ma

09/27/2023

06:34 PM Bug #63008 (Need More Info): rados: race condition in rados_ping_monitor can cause segmentation f...
In the CI of go-ceph we noticed a sporadic segmentation fault in the test for rados_ping_monitor. (See https://github... Sven Anderson
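For context on the API involved, a minimal librados C-API call sequence around rados_ping_monitor(), compiled as C++. The client id "admin", mon id "a" and the ceph.conf path are placeholders; the race reported above was observed in go-ceph's binding of this call, not in this snippet.
<pre>
// Minimal librados example around rados_ping_monitor(); error handling is
// abbreviated and the ids/paths are placeholders.
#include <rados/librados.h>
#include <cstdio>

int main() {
  rados_t cluster;
  if (rados_create(&cluster, "admin") < 0) return 1;
  rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
  if (rados_connect(cluster) < 0) { rados_shutdown(cluster); return 1; }

  char *out = nullptr;
  size_t outlen = 0;
  int r = rados_ping_monitor(cluster, "a", &out, &outlen);
  if (r == 0) {
    std::printf("mon reply (%zu bytes): %.*s\n", outlen, (int)outlen, out);
    rados_buffer_free(out);  // the reply buffer is owned by the caller
  }
  rados_shutdown(cluster);
  return 0;
}
</pre>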

09/26/2023

11:19 PM Backport #62995 (In Progress): quincy: Add detail description for delayed op in osd log file
https://github.com/ceph/ceph/pull/53690 Prashant D
11:04 PM Backport #62995 (In Progress): quincy: Add detail description for delayed op in osd log file
We do not have the tracker for PR#50531, so the code has not been backported to reef and other old releases. The details fo... Prashant D
11:05 PM Backport #62996 (Resolved): pacific: Add detail description for delayed op in osd log file
We do not have the tracker for PR#50531, so the code has not been backported to reef and other old releases. The details fo... Prashant D
09:26 PM Backport #62993 (In Progress): reef: Add detail description for delayed op in osd log file
Prashant D
09:26 PM Backport #62993: reef: Add detail description for delayed op in osd log file
https://github.com/ceph/ceph/pull/53688 Prashant D
09:15 PM Backport #62993 (Resolved): reef: Add detail description for delayed op in osd log file
We do not have the tracker for PR#50531, so the code has not been backported to reef and other old releases. The details fo... Prashant D
06:53 PM Bug #59172 (Fix Under Review): test_pool_min_size: AssertionError: wait_for_clean: failed before ...
Kamoltat (Junior) Sirivadhna
06:48 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/lflores-2023-09-08_18:08:19-rados-wip-lflores-testing-2023-09-08-1504-reef-distro-default-smithi/7391363 Laura Flores
06:43 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
/a/lflores-2023-09-08_18:08:19-rados-wip-lflores-testing-2023-09-08-1504-reef-distro-default-smithi/7391156 Laura Flores
06:39 PM Bug #62992: Heartbeat crash in reset_timeout and clear_timeout
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?var-sig_v2=e037b11ac9f7a2e20c20fcf65d4439bf3f... Laura Flores
06:38 PM Bug #62992 (Pending Backport): Heartbeat crash in reset_timeout and clear_timeout
/a/lflores-2023-09-08_18:08:19-rados-wip-lflores-testing-2023-09-08-1504-reef-distro-default-smithi/7391228... Laura Flores
02:55 PM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
Radoslaw Zarzynski wrote:
> Hello Mosharaf! Thanks for the update. It looks to the @rm-pg-upmap-primary@ has workaro...
Laura Flores
12:03 PM Bug #62983 (Need More Info): OSD/MON: purged snap keys are not merged
purged_snap_ keys are stored in the monitor for each snapshot removed.
The keys are merged on contiguous snap id interv...
Matan Breizman
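A minimal sketch of the interval-merging behaviour #62983 describes, assuming only what the entry says: contiguous or overlapping [begin, end) snap-id ranges collapse into a single range, which is what keeps the number of purged_snap_ keys bounded. This is not the monitor's actual key format or code.
<pre>
// Illustrative interval merge, not the monitor's actual code or key format.
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>

using snapid = std::uint64_t;

// ranges maps begin -> end for disjoint, non-adjacent [begin, end) intervals.
void insert_purged(std::map<snapid, snapid>& ranges, snapid begin, snapid end) {
  auto it = ranges.lower_bound(begin);
  if (it != ranges.begin()) {
    auto prev = std::prev(it);
    if (prev->second >= begin) {        // touches or overlaps the previous range
      begin = prev->first;
      end = std::max(end, prev->second);
      it = ranges.erase(prev);
    }
  }
  while (it != ranges.end() && it->first <= end) {  // absorb following ranges
    end = std::max(end, it->second);
    it = ranges.erase(it);
  }
  ranges[begin] = end;                  // one merged key instead of many
}
</pre>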
10:35 AM Feature #62981 (Pending Backport): Share mon's purged snapshots with OSD
purged snapshots are marked in the monitor as `purged_snap_` keys. See OSDMonitor::make_purged_snap_key().
While in ...
Matan Breizman
08:49 AM Bug #62965: osd perf error
Radoslaw Zarzynski wrote:
> Is the junky duration
>
> [...]
>
> the problem here? If so, there was a fix: h...
zhipeng li
01:31 AM Bug #62965: osd perf error
zhipeng li wrote:
> Radoslaw Zarzynski wrote:
> > Is the junky duration
> >
> > [...]
> >
> > the problem her...
zhipeng li
01:30 AM Bug #62965: osd perf error
zhipeng li wrote:
> in performance testing, many small file read latency may be bigger than 1000s for system clock is...
zhipeng li
01:26 AM Bug #62965: osd perf error
maybe we can skip it like this https://github.com/ceph/ceph/pull/53643 zhipeng li
01:14 AM Bug #62965: osd perf error
Radoslaw Zarzynski wrote:
> Is the junky duration
>
> [...]
>
> the problem here? If so, there was a fix: h...
zhipeng li

09/25/2023

06:09 PM Bug #62934: unittest_osdmap (Subprocess aborted) during OSDMapTest.BUG_42485
The evidence from Neha shows it's not a recent thing -- it was seen in 2021: https://github.com/ceph/ceph/pull/41848#issuec... Radoslaw Zarzynski
06:04 PM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
Bump this up for next bug scrub. Radoslaw Zarzynski
06:02 PM Bug #62965: osd perf error
Is the junky duration... Radoslaw Zarzynski
09:20 AM Bug #62965 (New): osd perf error
test RGW with cosbench with 4k files,
get osd slow ops info as below:
ceph daemon osd.14 dump_historic_slow_ops ...
zhipeng li
05:58 PM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
Hello Mosharaf! Thanks for the update. It looks like @rm-pg-upmap-primary@ has worked around the problem.
```
ce...
Radoslaw Zarzynski
11:14 AM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
Laura Flores wrote:
> Hey Mosharaf, any updates on the state of your cluster?
>
> We would still need a copy of y...
Mosharaf Hossain

09/22/2023

07:48 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
test-8 : client_lim = 0 / client_res = 0.5 / client_wgt = 60 / iops=240 / bw=240M... jianwei zhang

09/21/2023

08:38 PM Bug #61815 (Resolved): PgScrubber cluster warning is misspelled
Backport of PR#52590 to reef and quincy is not required as custom reaction *ReservingReplicas::react(const Reservatio... Prashant D
08:07 PM Bug #62934 (New): unittest_osdmap (Subprocess aborted) during OSDMapTest.BUG_42485
full log: https://jenkins.ceph.com/job/ceph-pull-requests/122260/consoleText... Casey Bodley
07:52 PM Bug #61453 (Resolved): the mgr, osd version information missing in "ceph versions" command during...
Backport PRs had been merged in upstream branches. Prashant D
05:54 PM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...
Perfect thing to discuss during Perf Weekly meetings. Radoslaw Zarzynski
05:49 PM Backport #62927 (New): reef: "stuck peering for" warning is misleading
Backport Bot
05:49 PM Backport #62926 (New): quincy: "stuck peering for" warning is misleading
Backport Bot
05:48 PM Bug #62704 (Need More Info): Cephfs different monitor has a different LAST_DEEP_SCRUB state
Hi! Could you please elaborate on the symptom? Were the integer values absolutely the same in the third round? Radoslaw Zarzynski
05:45 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
In response to #33: this seems to be something different, as the fix for https://tracker.ceph.com/issues/51688 got merged in... Radoslaw Zarzynski
05:44 PM Bug #51688 (Pending Backport): "stuck peering for" warning is misleading
Radoslaw Zarzynski
05:37 PM Bug #62119: timeout on reserving replicas
Per the core sync on 20 Sep Aishwarya is working on this. Radoslaw Zarzynski
05:36 PM Bug #62832: common: config_proxy deadlock during shutdown (and possibly other times)
Oops, the PR links somehow leads to 404. A Github issue? Radoslaw Zarzynski
02:05 AM Bug #62832 (Fix Under Review): common: config_proxy deadlock during shutdown (and possibly other ...
Patrick Donnelly
03:55 PM Backport #62923 (New): reef: mon/MonmapMonitor: do not propose on error in prepare_update
Backport Bot
03:55 PM Backport #62922 (New): pacific: mon/MonmapMonitor: do not propose on error in prepare_update
Backport Bot
03:54 PM Backport #62921 (New): quincy: mon/MonmapMonitor: do not propose on error in prepare_update
Backport Bot
03:52 PM Bug #58974 (Pending Backport): mon/MonmapMonitor: do not propose on error in prepare_update
Patrick Donnelly
03:43 PM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
test-7: client_lim = 1 / client_res = 0.5 / client_wgt = 60 / iops=240 / bw=240M ... jianwei zhang
02:54 PM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...

cost = max (1M, 200K) = 1M
bw = 240M
mclock_queue_delay = 1 / 240 = 0.0041 s
200K_on_disk_lat = 200 / 1024 /...
jianwei zhang
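As a cross-check of the arithmetic quoted in the comment above (1 MiB cost floor, 200 KiB requests, 240 MiB/s bandwidth; the exact mClock cost formula is not reproduced here), a small standalone calculation:
<pre>
// Back-of-the-envelope check of the numbers quoted above; not the
// mClockScheduler code. Values are taken from the comment.
#include <algorithm>
#include <cstdio>

int main() {
  const double cost_floor_mib = 1.0;           // assumed 1 MiB minimum cost
  const double request_mib    = 200.0 / 1024;  // 200 KiB request
  const double bw_mib_s       = 240.0;         // configured bandwidth, MiB/s

  const double cost = std::max(cost_floor_mib, request_mib);  // = 1 MiB
  const double queue_delay_s = cost / bw_mib_s;               // 1/240 ~= 0.0042 s
  std::printf("cost = %.3f MiB, per-request delay = %.4f s\n",
              cost, queue_delay_s);
  return 0;
}
</pre>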
02:45 PM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
test-6 : client_lim = 1 / client_res = 0.5 / client_wgt = 60 / iops=240 / bw=240M
* bsize=200K
* BW = 39M / 240M ...
jianwei zhang
02:37 PM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
Sridhar Seshasayee wrote:
> jianwei zhang wrote:
> > For HDD,
> >
> > The problem scenario is as follows:
> > P...
jianwei zhang
09:15 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
jianwei zhang wrote:
> For HDD,
>
> The problem scenario is as follows:
> Preconditions:
> bandwidth=100MiB/s
...
Sridhar Seshasayee
02:55 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
hi Sridhar Seshasayee,
I am still quite confused about cost calculation and latency tag.
For HDD,
The proble...
jianwei zhang
01:33 PM Bug #62918 (Fix Under Review): all write io block but no flush is triggered

If the memory page is large, __maybe_wait_for_writeback_ may wait for dirty bufferheads to exceed the limit, but dirty data si...
Jack Lv
10:21 AM Bug #62171: All OSD shards should use the same scheduler type when osd_op_queue=debug_random.
Tests with the fix show the same scheduler type applied on all OSD shards.
OSD logs from vstart cluster with osd_op_queue...
Sridhar Seshasayee

09/20/2023

10:14 PM Cleanup #62911: Label used for Ceph health alert POOL_NEARFULL is in discordance with documentati...
Prashant D wrote:
> The prometheus alert should be changed to POOL_NEARFULL instead of changing ceph warning to POOL...
Laura Flores
10:13 PM Cleanup #62911: Label used for Ceph health alert POOL_NEARFULL is in discordance with documentati...
Claiming this issue for the time being for Grace Hopper Open Source Day! Laura Flores
10:11 PM Cleanup #62911: Label used for Ceph health alert POOL_NEARFULL is in discordance with documentati...
The prometheus alert should be changed to POOL_NEARFULL instead of changing ceph warning to POOL_NEAR_FULL... Prashant D
10:04 PM Cleanup #62911 (Fix Under Review): Label used for Ceph health alert POOL_NEARFULL is in discordan...
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2238396
Description of problem:
The label provided by the "ceph ...
Laura Flores
08:47 PM Bug #62836 (Need More Info): CEPH zero iops after upgrade to Reef and manual read balancer
Laura Flores
06:39 AM Bug #62872: ceph osd_max_backfills default value is 1000
I found that there is already a solution here: https://tracker.ceph.com/issues/58529. Under the mclock_scheduler flow... tan changzhi

09/19/2023

09:39 PM Feature #54525 (New): osd/mon: log memory usage during tick
Laura Flores
03:13 PM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
Hey Mosharaf, any updates on the state of your cluster?
We would still need a copy of your actual osdmap file (ach...
Laura Flores
02:34 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
http://pulpito.front.sepia.ceph.com/mchangir-2023-09-12_05:40:22-fs-wip-mchangir-testing-20230908.140927-testing-defa... Milind Changire
11:29 AM Bug #61140 (Fix Under Review): crash: int OSD::shutdown(): assert(end_time - start_time_func < cc...
Igor Fedotov
02:57 AM Bug #62872 (New): ceph osd_max_backfills default value is 1000
In the ceph 17 version, the ceph recovery parameters do not take effect. After checking the configuration parameters,... tan changzhi

09/18/2023

04:47 PM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
Hi Mosharaf,
Regarding the traceback you got when trying to apply the rm-pg-upmap-primary command:...
Laura Flores
09:15 AM Bug #62777: rados/valgrind-leaks: expected valgrind issues and found none
Also, the monitors didn't stop; we are checking valgrind logs of the running process (the memory leak error will show onl... Nitzan Mordechai

09/17/2023

06:24 AM Bug #62704: Cephfs different monitor has a different LAST_DEEP_SCRUB state
Radoslaw Zarzynski wrote:
> Is this consistent? Was there a third attempt, on mon node 1, showing @1586:7591@ again?...
fuchen ma

09/15/2023

07:46 PM Bug #45253: Inconsistent characters allowed set for device classes
Claiming this issue for the time being for Grace Hopper Open Source Day. Laura Flores
07:45 PM Bug #45253: Inconsistent characters allowed set for device classes
Steps to reproduce in a vstart cluster:
1. Check the current crush tree:...
Laura Flores
07:03 PM Cleanup #62855: src/test/TestTimers.cc: warning: control reaches end of non-void function
Claiming this one for the time being for Grace Hopper Open Source Day. Laura Flores
07:02 PM Cleanup #62855 (Resolved): src/test/TestTimers.cc: warning: control reaches end of non-void function
Seen on the latest main SHA (01ef9e5e91e73422cf11f9b49d06815e4ed75c0d):... Laura Flores
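For reference, the class of warning reported in #62855 is the generic "control reaches end of non-void function" (-Wreturn-type). A minimal, non-Ceph illustration and the usual fix:
<pre>
// Generic illustration of "control reaches end of non-void function"
// (-Wreturn-type); unrelated to the actual TestTimers.cc code.
int sign_buggy(int x) {
  if (x > 0) return 1;
  if (x < 0) return -1;
  // no return on this path -> compiler warns (intentional, to show the warning)
}

int sign_fixed(int x) {
  if (x > 0) return 1;
  if (x < 0) return -1;
  return 0;  // every path now returns a value
}
</pre>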
06:24 PM Cleanup #62851: OSD::ShardedOpWQ::stop_for_fast_shutdown(): warning: comparison of integer expres...
Claiming this one for the time being for Grace Hopper Open Source Day. Laura Flores
06:23 PM Cleanup #62851 (Resolved): OSD::ShardedOpWQ::stop_for_fast_shutdown(): warning: comparison of int...
Seen on the latest main SHA (01ef9e5e91e73422cf11f9b49d06815e4ed75c0d):... Laura Flores
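Likewise, the warning reported in #62851 is the generic signed/unsigned comparison warning (-Wsign-compare). A minimal, non-Ceph illustration and a typical fix:
<pre>
// Generic illustration of "comparison of integer expressions of different
// signedness" (-Wsign-compare); not the actual OSD::ShardedOpWQ code.
#include <cstddef>
#include <vector>

int sum_buggy(const std::vector<int>& v) {
  int total = 0;
  for (int i = 0; i < v.size(); ++i)          // int vs std::size_t -> warning
    total += v[i];
  return total;
}

int sum_fixed(const std::vector<int>& v) {
  int total = 0;
  for (std::size_t i = 0; i < v.size(); ++i)  // index type matches size_type
    total += v[i];
  return total;
}
</pre>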
06:13 PM Bug #53251 (Closed): compiler warning about deprecated fmt::format_to()
Doesn't seem relevant anymore based on the latest main SHA (01ef9e5e91e73422cf11f9b49d06815e4ed75c0d).
(I ran `nin...
Laura Flores
05:18 PM Tasks #61816 (Closed): Add logging to the read balancer
Laura Flores
08:17 AM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...
I have an idea:
Bind the time_event to the throttle.
When waking up the EventCenter::process_events msgr-worker thread,
Dete...
jianwei zhang
08:12 AM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...
... jianwei zhang
07:13 AM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...
# ms_time_events_min_wait_interval=1000us
* osd cpu
* osd cpu : 176
* osd msgr worker : 3 * 35% = 105%
* 6...
jianwei zhang
06:15 AM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...
Avoid meaningless idling of msgr worker thread jianwei zhang
06:14 AM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...
solution:
The core idea is that after the first time_event timeout expires in nanoseconds,
Force alignment to epo...
jianwei zhang
05:36 AM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...
code logic analysis:
* clock accuracy issue
* end_time > now : Nanosecond comparison, but converted to microseconds...
jianwei zhang
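A minimal sketch of the granularity issue described in the analysis above (illustrative only, not the EventCenter code): truncating a nanosecond deadline to a coarser unit can yield a zero-length wait, so the worker wakes immediately, finds the timer not yet expired, and spins; rounding the wait up avoids the zero-length sleep.
<pre>
// Illustrative only, not the EventCenter code: truncating a nanosecond
// deadline to microseconds can produce a zero-length wait (busy loop);
// rounding up gives a wait that actually passes the deadline.
#include <chrono>
#include <cstdio>

int main() {
  using namespace std::chrono;
  const auto remaining = nanoseconds(700);  // deadline is 700 ns away

  // duration_cast truncates toward zero: 700 ns -> 0 us -> immediate wakeup
  const auto wait_trunc = duration_cast<microseconds>(remaining);

  // ceil rounds up: 700 ns -> 1 us -> the thread really sleeps
  const auto wait_ceil = ceil<microseconds>(remaining);

  std::printf("truncated wait = %lld us, rounded-up wait = %lld us\n",
              static_cast<long long>(wait_trunc.count()),
              static_cast<long long>(wait_ceil.count()));
  return 0;
}
</pre>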
05:26 AM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...

Log statistical analysis:
* After the msgr-worker thread was awakened, 93.34% of wakeups (1756073 / 1881227) processed 0 events.
...
jianwei zhang
05:20 AM Bug #62512: osd msgr-worker high cpu 300% due to throttle-osd_client_messages get_or_fail_fail (o...
2023-09-15 problem analysis:... jianwei zhang
12:46 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
we prepare to:
use osd bench or rados bench to test the bandwidth of the osd,
and use fio to test random iops of the hdd
jianwei zhang
12:45 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
Sridhar Seshasayee wrote:
> Responses to your questions.
>
> Q: how to test and get osd_mclock_max_sequential_ba...
jianwei zhang

09/14/2023

09:02 PM Backport #59702 (Resolved): reef: mon: FAILED ceph_assert(osdmon()->is_writeable())
https://github.com/ceph/ceph/pull/51409
Merged
Kamoltat (Junior) Sirivadhna
08:59 PM Bug #58894 (Resolved): [pg-autoscaler][mgr] does not throw warn to increase PG count on pools wit...
All backports have been resolved. Kamoltat (Junior) Sirivadhna
08:58 PM Backport #62820 (Resolved): reef: [pg-autoscaler][mgr] does not throw warn to increase PG count o...
Merged Kamoltat (Junior) Sirivadhna
07:28 PM Bug #46437 (Closed): Admin Socket leaves behind .asok files after daemons (ex: RGW) shut down gra...
Ali Maredia
06:54 PM Backport #51195 (In Progress): pacific: [rfe] increase osd_max_write_op_reply_len default value t...
Konstantin Shalygin
05:24 PM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
Mosharaf Hossain wrote:
> Stefan Kooman wrote:
> > @Mosharaf Hossain:
> >
> > Do you also have a performance ove...
Laura Flores
05:17 PM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer

Stefan Kooman wrote:
> @Mosharaf Hossain:
>
> Do you also have a performance overview when you were runni...
Mosharaf Hossain
03:36 PM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
After running `ceph osd dump`, you should see entries like this at the end of the output, which indicate each pg you ... Laura Flores
03:13 PM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
Hi Mosharaf, as Stefan wrote above, you can get your osdmap file by running the following command, where "osdmap" is ... Laura Flores
10:10 AM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
@Mosharaf Hossain:
What kind of client do you use to access the VM storage (i.e. kernel client rbd, krbd, or librb...
Stefan Kooman
09:52 AM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
The dashboard values show all "0", but the graph indicates it's still doing IO, as does "ceph -s". It might (also) be... Stefan Kooman
09:47 AM Bug #62836: CEPH zero iops after upgrade to Reef and manual read balancer
@Mosharaf Hossain:
Do you also have a performance overview when you were running Quincy? Quincy would then be the ...
Stefan Kooman
04:31 AM Bug #62836 (Need More Info): CEPH zero iops after upgrade to Reef and manual read balancer
We've recently performed an upgrade on our Cephadm cluster, transitioning from Ceph Quincy to Reef. However, followi... Mosharaf Hossain
01:54 PM Backport #50697 (In Progress): pacific: common: the dump of thread IDs is in dec instead of hex
Konstantin Shalygin
01:52 PM Backport #56649 (In Progress): pacific: [Progress] Do not show NEW PG_NUM value for pool if autos...
Konstantin Shalygin
01:50 PM Backport #56648 (Resolved): quincy: [Progress] Do not show NEW PG_NUM value for pool if autoscale...
Konstantin Shalygin
01:50 PM Backport #50831 (In Progress): pacific: pacific ceph-mon: mon initial failed on aarch64
Konstantin Shalygin
10:29 AM Bug #62826 (Fix Under Review): crushmap holds the previous rule for an EC pool created with name ...
Nitzan Mordechai
07:44 AM Bug #62839 (New): Teuthology failure in LibRadosTwoPoolsPP.HitSetWrite

Branch tested: wip-rf-fshut ('main' b690343128 as of 12.9.23 + changes to one shutdown function (commit 210dbd4ff19...
Ronen Friedman

09/13/2023

10:58 PM Bug #62833: [Reads Balancer] osdmaptool with --read option creates suggestions for primary O...
The symptom is an error like this after applying a pg-upmap-primary command:... Laura Flores
10:56 PM Bug #62833 (Fix Under Review): [Reads Balancer] osdmaptool with --read option creates sugges...
Laura Flores
06:07 PM Bug #62833 (Resolved): [Reads Balancer] osdmaptool with --read option creates suggestions fo...
See the BZ for more details: https://bugzilla.redhat.com/show_bug.cgi?id=2237574 Laura Flores
09:02 PM Backport #61569 (Resolved): quincy: the mgr, osd version information missing in "ceph versions" c...
Prashant D
06:57 PM Bug #62568: Coredump in rados_aio_write_op_operate
... Radoslaw Zarzynski
06:52 PM Bug #62704: Cephfs different monitor has a different LAST_DEEP_SCRUB state
Is this consistent? Was there a third attempt, on mon node 1, showing @1586:7591@ again?
And BTW, this is mgr-handle...
Radoslaw Zarzynski
06:43 PM Bug #62213: crush: choose leaf with type = 0 may incorrectly map out osds
Bump up. Radoslaw Zarzynski
06:42 PM Bug #62769 (Duplicate): ninja fails during build osd.cc
Neha Ojha
06:34 PM Bug #62776: rados: cluster [WRN] overall HEALTH_WARN - do not have an application enabled
Not a priority. Radoslaw Zarzynski
06:33 PM Bug #62777: rados/valgrind-leaks: expected valgrind issues and found none
Yeah, we have a test intentionally causing a leak just to ensure valgrind truly works.
I wonder what might happen if this t...
Radoslaw Zarzynski
06:28 PM Bug #62788 (Rejected): mon: mon store db loss file
This is likely filesystem corruption or a hardware error.... Radoslaw Zarzynski
06:19 PM Bug #62119: timeout on reserving replicas
Bumping this up. Radoslaw Zarzynski
06:18 PM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
note from scrub: let's observe. Radoslaw Zarzynski
06:17 PM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
The fix approved but waits for QA. Bumping this up. Radoslaw Zarzynski
03:28 PM Bug #62832 (Pending Backport): common: config_proxy deadlock during shutdown (and possibly other ...
Saw this deadlock in teuthology where I was doing parallel `ceph config set` commands:... Patrick Donnelly
01:50 PM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
Responses to your questions.
Q: how to test and get osd_mclock_max_sequential_bandwidth_hdd and osd_mclock_max_ca...
Sridhar Seshasayee
12:21 PM Bug #62826 (Fix Under Review): crushmap holds the previous rule for an EC pool created with name ...
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2224324
Description of problem:
If an EC pool is created with...
Nitzan Mordechai

09/12/2023

07:03 PM Bug #62669 (Fix Under Review): Pacific: multiple scrub and deep-scrub start message repeating for...
Prashant D
05:38 PM Backport #62820 (Resolved): reef: [pg-autoscaler][mgr] does not throw warn to increase PG count o...
Kamoltat (Junior) Sirivadhna
05:08 PM Bug #58894 (Pending Backport): [pg-autoscaler][mgr] does not throw warn to increase PG count on p...
Oops, it looks like this tracker missed a reef backport (the patches are absent in v18.2.0). Radoslaw Zarzynski
04:53 PM Backport #62819 (In Progress): reef: osd: choose_async_recovery_ec may select an acting set < min...
https://github.com/ceph/ceph/pull/54550 Backport Bot
04:53 PM Backport #62818 (Resolved): pacific: osd: choose_async_recovery_ec may select an acting set < min...
https://github.com/ceph/ceph/pull/54548 Backport Bot
04:53 PM Backport #62817 (In Progress): quincy: osd: choose_async_recovery_ec may select an acting set < m...
https://github.com/ceph/ceph/pull/54549 Backport Bot
04:51 PM Bug #62338 (Pending Backport): osd: choose_async_recovery_ec may select an acting set < min_size
Radoslaw Zarzynski
03:09 PM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
A version has been modified; please review.
For cost calculation, the core idea is to take the larger value of user i...
jianwei zhang
07:35 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
!https://tracker.ceph.com/attachments/download/6655/rados_bench_pr_52809.png!
osd/scheduler/mClockScheduler: Use s...
jianwei zhang
07:18 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
!https://tracker.ceph.com/attachments/download/6653/tell_bench.png!
!https://tracker.ceph.com/attachments/download...
jianwei zhang
07:16 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
another question:
how to test and get osd_mclock_max_sequential_bandwidth_hdd and osd_mclock_max_capacity_iops_hdd...
jianwei zhang
05:54 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
jianwei zhang wrote:
> One is not to add osd_bandwidth_cost_per_io cost:
> !https://tracker.ceph.com/attachments/do...
jianwei zhang
05:37 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
Please help me check if there are any errors in the process of calculating cost.
If nothing goes wrong,
Please disc...
jianwei zhang
05:35 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
Incremental step calculation method:... jianwei zhang
05:30 AM Bug #62812: osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io in mClockS...
One is not to add osd_bandwidth_cost_per_io cost:
!https://tracker.ceph.com/attachments/download/6650/add_osd_bandwi...
jianwei zhang
05:29 AM Bug #62812 (Resolved): osd: Is it necessary to unconditionally increase osd_bandwidth_cost_per_io...
In this PR, the IOPS-based QoS cost calculation method is removed and the Bandwidth-based QoS cost calculation method... jianwei zhang
09:32 AM Bug #57628 (In Progress): osd:PeeringState.cc: FAILED ceph_assert(info.history.same_interval_sinc...
Matan Breizman
09:31 AM Bug #57628: osd:PeeringState.cc: FAILED ceph_assert(info.history.same_interval_since != 0)
WIP: https://gist.github.com/Matan-B/40b5a7ee30e9e73d20c052594365aae8
This seems to be highly related to map gap e...
Matan Breizman
05:25 AM Bug #62811: PGs stuck in backfilling state after their primary OSD is removed by setting its crus...
Analysis of the issue was performed by taking a single PG (6.15a) on osd.34 on which backfill didn't start.
I hav...
Sridhar Seshasayee
03:43 AM Bug #62811 (New): PGs stuck in backfilling state after their primary OSD is removed by setting it...
I am pasting the problem description from the original BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2233777
<pr...
Sridhar Seshasayee

09/11/2023

07:22 PM Bug #57628: osd:PeeringState.cc: FAILED ceph_assert(info.history.same_interval_since != 0)
selected by holding the ALT key :-) Yaarit Hatuka
06:49 PM Bug #57628: osd:PeeringState.cc: FAILED ceph_assert(info.history.same_interval_since != 0)
If anyone knows how to properly select multiple affected versions, please go ahead.
v18.0.0, v14.0.0, v15.0.0, and...
Laura Flores
06:43 PM Bug #57628: osd:PeeringState.cc: FAILED ceph_assert(info.history.same_interval_since != 0)
/a/lflores-2023-09-08_20:36:06-rados-wip-lflores-testing-2-2023-09-08-1755-distro-default-smithi/7391621 Laura Flores
07:39 AM Bug #62788 (Rejected): mon: mon store db loss file
... yite gu

09/08/2023

11:54 PM Bug #62669: Pacific: multiple scrub and deep-scrub start message repeating for a same PG
Thanks to Cory and David for providing the osd.426 debug logs to find out the reason for scrub getting initiated for ... Prashant D
05:38 AM Bug #62669: Pacific: multiple scrub and deep-scrub start message repeating for a same PG
The multiple scrub start messages for the same PG indicate that there is a problem with scrubbing in the pacific re... Prashant D
10:05 PM Bug #62777 (New): rados/valgrind-leaks: expected valgrind issues and found none
rados/valgrind-leaks/{1-start 2-inject-leak/mon centos_latest}
/a/yuriw-2023-08-11_02:49:40-rados-wip-yuri4-testin...
Laura Flores
07:43 PM Bug #62776 (New): rados: cluster [WRN] overall HEALTH_WARN - do not have an application enabled
Description: rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/connectivity msgr-failures/few msgr/async-v1... Laura Flores
01:08 PM Bug #62769 (Duplicate): ninja fails during build osd.cc
[124/210] Building CXX object src/osd/CMakeFiles/osd.dir/OSD.cc.o
FAILED: src/osd/CMakeFiles/osd.dir/OSD.cc.o
/usr...
MOHIT AGRAWAL

09/07/2023

04:41 PM Feature #61788 (Resolved): Adding missing types to ceph-dencoder
J. Eric Ivancich

09/06/2023

09:24 AM Bug #62596 (Closed): osd: Remove leaked clone objects (SnapMapper malformed key)
Matan Breizman
08:25 AM Bug #62704: Cephfs different monitor has a different LAST_DEEP_SCRUB state
Unrelated to cephfs - moving to RADOS component. Venky Shankar

09/05/2023

10:00 PM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
/a/yuriw-2023-09-01_19:14:47-rados-wip-batrick-testing-20230831.124848-pacific-distro-default-smithi/7386290 Laura Flores
08:20 PM Bug #50245 (New): TEST_recovery_scrub_2: Not enough recovery started simultaneously
/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369212
Worth looking i...
Laura Flores
08:10 PM Bug #61774: centos 9 testing reveals rocksdb "Leak_StillReachable" memory leak in mons
/a/yuriw-2023-08-16_18:39:08-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7370286/ Kamoltat (Junior) Sirivadhna
08:10 PM Bug #62119: timeout on reserving replicas
/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369280
/a/yuriw-2023-08-...
Laura Flores
07:38 PM Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369175 Laura Flores
07:17 AM Bug #62704 (Closed): Cephfs different monitor has a different LAST_DEEP_SCRUB state
I ran 'ceph pg dump pgs' on each monitor node and found that the values of REPORTED are inconsistent. For example:
Th...
fuchen ma

09/04/2023

11:44 AM Bug #59531: quincy: "OSD bench result of 228617.361065 IOPS exceeded the threshold limit of 500.0...
https://pulpito.ceph.com/rishabh-2023-08-25_06:38:25-fs-wip-rishabh-2023aug3-b5-testing-default-smithi/7379315 Rishabh Dave
08:08 AM Backport #59676: reef: osd:tick checking mon for new map
https://github.com/ceph/ceph/pull/53269 yite gu
08:03 AM Bug #62568: Coredump in rados_aio_write_op_operate
Hi, any feedback? Nokia ceph-users

09/03/2023

05:30 AM Documentation #62680 (In Progress): Docs for setting up multisite RGW don't work
This procedure is expected to be tested during the first week of September 2023. Zac Dover
05:29 AM Documentation #62680 (In Progress): Docs for setting up multisite RGW don't work
An email from Petr Bena:
Hello,
My goal is to set up multisite RGW with 2 separate Ceph clusters in separate dat...
Zac Dover

09/01/2023

06:55 PM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
/teuthology/pdonnell-2023-08-31_15:31:51-fs-wip-batrick-testing-20230831.124848-pacific-distro-default-smithi/7385689... Patrick Donnelly

08/31/2023

11:14 PM Bug #62669: Pacific: multiple scrub and deep-scrub start message repeating for a same PG
The scrub "starts" message should be logged when remotes are reserved and scrubber is initiated for the PG. Checking ... Prashant D
07:56 PM Bug #62669: Pacific: multiple scrub and deep-scrub start message repeating for a same PG
The "starts" message was reintroduced in pacific with PR https://github.com/ceph/ceph/pull/48070. The multiple scrub ... Prashant D
07:51 PM Bug #62669 (Resolved): Pacific: multiple scrub and deep-scrub start message repeating for a same PG
The ceph cluster log is reporting "scrub starts" and "deep-scrub starts" for the same PG multiple times within sh... Prashant D
09:53 AM Bug #61140: crash: int OSD::shutdown(): assert(end_time - start_time_func < cct->_conf->osd_fast_...
We just observed this "noise" for quite a few OSDs on rolling reboots... it would be nice to have this "not" treated an... Christian Rohmann
05:54 AM Bug #62568: Coredump in rados_aio_write_op_operate
We are considering using quincy/reef to test. BTW, even if the memory is exhausted, the expected response from mal... Nokia ceph-users
 
