Activity

From 01/26/2021 to 02/24/2021

02/24/2021

10:07 PM Bug #49463 (Can't reproduce): qa/standalone/misc/rados-striper.sh: Caught signal in thread_name:r...
... Neha Ojha
09:34 PM Bug #49461 (Duplicate): rados/upgrade/pacific-x/parallel: upgrade incomplete
... Neha Ojha
09:26 PM Bug #49460 (Fix Under Review): qa/workunits/cephtool/test.sh: test_mon_osd_create_destroy fails
Neha Ojha
08:58 PM Bug #49460 (Resolved): qa/workunits/cephtool/test.sh: test_mon_osd_create_destroy fails
... Neha Ojha
09:00 PM Bug #49212 (Fix Under Review): mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to cl...
earlier,... Sage Weil
08:48 PM Bug #49212 (In Progress): mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class '...
... Sage Weil
10:45 AM Bug #49353: Random OSDs being marked as down even when there is very less activity on the cluster...
Nokia ceph-users wrote:
> Do you suspect that this is something relevant to 14.2.2 and could be solved with a higher...
Igor Fedotov
04:55 AM Bug #49353: Random OSDs being marked as down even when there is very less activity on the cluster...
Do you suspect that this is something relevant to 14.2.2 and could be solved with a higher version? Nokia ceph-users
09:16 AM Bug #49448 (New): If OSD types are changed, pools rules can become unresolvable without providing...
When some OSDs in a cluster are of a specific type, such as hdd_aes, and the type is used in a rule, if the type of s... linzhou zhou
05:07 AM Bug #49428 (Triaged): ceph_test_rados_api_snapshots fails with "rados_mon_command osd pool create...
Brad Hubbard
04:50 AM Bug #49428: ceph_test_rados_api_snapshots fails with "rados_mon_command osd pool create failed wi...
TLDR skip to ********* MON.A ************** below.
So this looks like a race. The calls seem to be serialized in t...
Brad Hubbard
12:48 AM Bug #49428: ceph_test_rados_api_snapshots fails with "rados_mon_command osd pool create failed wi...
Here's the error from the mon log.... Brad Hubbard
05:06 AM Bug #47719 (In Progress): api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
Brad Hubbard
12:50 AM Bug #49427: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
Most likely, the problem is that the object being dirtied is present, but the prior clone is missing pending recovery. Samuel Just

02/23/2021

11:52 PM Bug #49427: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
dec_refcount_by_dirty is related to tiering/dedup, which was added fairly recently in https://github.com/ceph/ceph/pu... Neha Ojha
10:36 PM Bug #49427: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
/a/bhubbard-2021-02-23_02:25:14-rados-master-distro-basic-smithi/5905669 Brad Hubbard
01:44 AM Bug #49427 (Resolved): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_...
/a/bhubbard-2021-02-22_23:51:15-rados-master-distro-basic-smithi/5904732
rados/verify/{centos_latest ceph clusters...
Brad Hubbard
11:15 PM Bug #49403: Caught signal (aborted) on mgrmap epoch 1 during librados init (rados-striper)
/a/sage-2021-02-23_06:29:23-rados-wip-sage-testing-2021-02-22-2228-distro-basic-smithi/5906245
Sage Weil
09:06 PM Backport #49055 (In Progress): nautilus: pick_a_shard() always select shard 0
Nathan Cutler
03:40 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
... Deepika Upadhyay
12:23 PM Bug #49353: Random OSDs being marked as down even when there is very less activity on the cluster...
Nokia ceph-users wrote:
> Hi , Another occurrence
>
> _2021-02-22 09:19:43.010071 mon.cn1 (mon.0) 267937 : cluste...
Nokia ceph-users
12:23 PM Bug #49353: Random OSDs being marked as down even when there is very less activity on the cluster...
Hi , Another occurrence
_2021-02-22 09:19:43.010071 mon.cn1 (mon.0) 267937 : cluster [INF] osd.146 marked down aft...
Nokia ceph-users
04:34 AM Bug #48065: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
Sage Weil wrote:
> BTW Mykola I would suggest using 'ceph osd crush reweight osd.N' (which works fine already) inste...
Mykola Golub
02:02 AM Bug #49428 (Duplicate): ceph_test_rados_api_snapshots fails with "rados_mon_command osd pool crea...
/a/bhubbard-2021-02-22_23:51:15-rados-master-distro-basic-smithi/5904720... Brad Hubbard
12:42 AM Bug #49069 (Resolved): mds crashes on v15.2.8 -> master upgrade decoding MMgrConfigure
Sage Weil

02/22/2021

08:22 PM Bug #48065: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
BTW Mykola I would suggest using 'ceph osd crush reweight osd.N' (which works fine already) instead of the 'ceph osd ... Sage Weil
08:22 PM Bug #48065 (Fix Under Review): "ceph osd crush set|reweight-subtree" commands do not set weight o...
Sage Weil
07:37 PM Bug #46318 (In Progress): mon_recovery: quorum_status times out
Neha Ojha wrote:
> We are still seeing these.
>
> /a/teuthology-2021-01-18_07:01:01-rados-master-distro-basic-smi...
Sage Weil
09:50 AM Bug #49409 (New): osd run into dead loop and tell slow request when rollback snap with using cach...
xin mycho
08:59 AM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
If we are happy with https://github.com/ceph/ceph/pull/39601 in theory perhaps we need to extend it to cover the othe... Brad Hubbard
06:31 AM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
Hey Sage, I think you meant /a/sage-2021-02-20_16:46:42-rados-wip-sage2-testing-2021-02-20-0942-distro-basic-smithi/5... Brad Hubbard
02:17 AM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
First, the relevant test code from src/test/librados/watch_notify.cc.... Brad Hubbard

02/21/2021

04:50 PM Backport #49404 (Resolved): pacific: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
https://github.com/ceph/ceph/pull/39597
Backport Bot
04:48 PM Bug #48984 (Pending Backport): lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
Sage Weil
04:46 PM Bug #49403 (Duplicate): Caught signal (aborted) on mgrmap epoch 1 during librados init (rados-str...
... Sage Weil
04:42 PM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
/a/sage-2021-02-20_16:46:42-rados-wip-sage2-testing-2021-02-20-0942-distro-basic-smithi/5899129
Sage Weil
11:48 AM Bug #48998: Scrubbing terminated -- not all pgs were active and clean
rados/singleton/{all/lost-unfound-delete mon_election/classic msgr-failures/none msgr/async-v1only objectstore/bluest... Kefu Chai
03:35 AM Backport #49402 (Resolved): octopus: rados: Health check failed: 1/3 mons down, quorum a,c (MON_D...
https://github.com/ceph/ceph/pull/40138 Backport Bot
03:35 AM Backport #49401 (Resolved): pacific: rados: Health check failed: 1/3 mons down, quorum a,c (MON_D...
https://github.com/ceph/ceph/pull/40137 Backport Bot
03:32 AM Bug #45441 (Pending Backport): rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" ...
Kefu Chai

02/20/2021

07:41 PM Bug #48386 (Resolved): Paxos::restart() and Paxos::shutdown() can race leading to use-after-free ...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
12:33 PM Bug #49395 (Resolved): ceph-test rpm missing gtest dependencies
Kefu Chai
03:47 AM Bug #49395 (Fix Under Review): ceph-test rpm missing gtest dependencies
Patrick Donnelly

02/19/2021

11:59 PM Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failu...
/a/teuthology-2021-02-17_03:31:03-rados-pacific-distro-basic-smithi/5889472 Neha Ojha
11:30 PM Backport #49398 (Resolved): pacific: rados/dashboard: Health check failed: Telemetry requires re-...
https://github.com/ceph/ceph/pull/39484 Backport Bot
11:30 PM Backport #49397 (Resolved): octopus: rados/dashboard: Health check failed: Telemetry requires re-...
https://github.com/ceph/ceph/pull/39704 Backport Bot
11:29 PM Bug #49212 (Duplicate): mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ss...
Neha Ojha
11:25 PM Bug #48990 (Pending Backport): rados/dashboard: Health check failed: Telemetry requires re-opt-in...
Neha Ojha
11:24 PM Bug #48990: rados/dashboard: Health check failed: Telemetry requires re-opt-in (TELEMETRY_CHANGED...
pacific backport merged: https://github.com/ceph/ceph/pull/39484 Josh Durgin
11:24 PM Bug #40809: qa: "Failed to send signal 1: None" in rados
Deepika Upadhyay wrote:
> this happens due to dispatch delay.
> Testing with increased values for a test case can ...
Neha Ojha
07:45 AM Bug #40809: qa: "Failed to send signal 1: None" in rados
this happens due to dispatch delay.
Testing with increased values for a test case can lead to this failure:
/ceph/...
Deepika Upadhyay
11:05 PM Bug #44945: Mon High CPU usage when another mon syncing from it
Wout van Heeswijk wrote:
> I think this might be related to #42830. If so it may be resolved with Ceph Nautilus 14.2...
Neha Ojha
11:04 PM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
Will do. Brad Hubbard
10:54 PM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
Brad, can you please take a look at this one? Neha Ojha
09:25 PM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
/a/teuthology-2021-02-17_03:31:03-rados-pacific-distro-basic-smithi/5889235 Neha Ojha
10:49 PM Bug #49359: osd: warning: unused variable
f9f9270d75d3bc6383604addefc2386318ecfc8b was done to fix another warning, definitely not high priority :) Neha Ojha
10:47 PM Bug #45441 (Fix Under Review): rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" ...
Sage Weil
10:44 PM Bug #39039 (Duplicate): mon connection reset, command not resent
let's track this at #45647 Sage Weil
10:33 PM Bug #47003 (Duplicate): ceph_test_rados test error. Reponses out of order due to the connection d...
Neha Ojha
10:29 PM Feature #39339: prioritize backfill of metadata pools, automatically
I think this tracker can be marked resolved since pull request 29181 merged. David Zafman
10:26 PM Bug #48468 (Need More Info): ceph-osd crash before being up again
Hi Clement,
Can you reproduce this with logs?...
Sage Weil
10:19 PM Bug #49393 (Need More Info): Segmentation fault in ceph::logging::Log::entry()
Sage Weil
09:11 PM Bug #49393 (Can't reproduce): Segmentation fault in ceph::logging::Log::entry()
... Neha Ojha
10:16 PM Bug #49395 (Resolved): ceph-test rpm missing gtest dependencies
... Sage Weil
09:28 PM Bug #48841 (Fix Under Review): test_turn_off_module: wait_until_equal timed out
Neha Ojha
08:09 PM Bug #49392 (Resolved): osd ok-to-stop too conservative
Currently 'osd ok-to-stop' is too conservative: if the pg is degraded, and is touched by an osd we might stop, it alw... Sage Weil
04:43 PM Backport #49320 (In Progress): octopus: thrash_cache_writeback_proxy_none: FAILED ceph_assert(ver...
https://github.com/ceph/ceph/pull/39578 Neha Ojha
02:51 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
investigating the 2 unfound objects, `when all_unfound_are_queried_or_lost all of
might_have_unfound` all participat...
Deepika Upadhyay
01:53 PM Bug #49104: crush weirdness: degraded PGs not marked as such, and choose_total_tries = 50 is too ...
Neha Ojha wrote:
> Regarding Problem A, will it be possible for you to share osd logs with debug_osd=20 to demonstra...
Dan van der Ster
01:26 PM Backport #49377 (Resolved): pacific: building libcrc32
https://github.com/ceph/ceph/pull/39902 Backport Bot
10:31 AM Bug #49231: MONs unresponsive over extended periods of time
I think I found the reason for this behaviour. I managed to pull extended logs during an incident and saw that the MO... Frank Schilder
01:40 AM Bug #48984 (Fix Under Review): lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
Neha Ojha

02/18/2021

05:20 PM Bug #49359: osd: warning: unused variable
https://stackoverflow.com/a/50176479 Patrick Donnelly
05:17 PM Bug #49359 (New): osd: warning: unused variable
... Patrick Donnelly
03:30 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
... Sebastian Wagner
03:21 PM Bug #49259 (Resolved): test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
Sebastian Wagner
03:21 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
turned out to be caused by https://github.com/ceph/ceph/pull/39530 Sebastian Wagner
10:16 AM Bug #49353 (Need More Info): Random OSDs being marked as down even when there is very less activi...
osd.149 went down at 03:25:26
2021-01-14 03:25:25.974634 mon.cn1 (mon.0) 384654 : cluster [INF] osd.149 marked down ...
Igor Fedotov
09:51 AM Bug #49353 (Need More Info): Random OSDs being marked as down even when there is very less activi...
Hi,
We have recently seen some random OSDs being marked down with the below message on one of our Nautilus cl...
Nokia ceph-users
08:06 AM Backport #48495 (Resolved): nautilus: Paxos::restart() and Paxos::shutdown() can race leading to ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39160
m...
Nathan Cutler

02/17/2021

08:22 PM Bug #48984 (In Progress): lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
David Zafman
08:18 PM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
Proposed fix in https://github.com/ceph/ceph/pull/39535
Needs extensive testing
David Zafman
05:41 PM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
If a requested scrub runs into a rejected remote reservation, the m_planned_scrub is already reset. This means tha...
David Zafman
07:51 PM Bug #48990: rados/dashboard: Health check failed: Telemetry requires re-opt-in (TELEMETRY_CHANGED...
https://github.com/ceph/ceph/pull/39484 merged Yuri Weinstein
04:29 PM Backport #49073: nautilus: crash in Objecter and CRUSH map lookup
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/39197
merged
Yuri Weinstein
04:29 PM Backport #48495: nautilus: Paxos::restart() and Paxos::shutdown() can race leading to use-after-f...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/39160
merged
Yuri Weinstein
10:20 AM Backport #49320 (Resolved): octopus: thrash_cache_writeback_proxy_none: FAILED ceph_assert(versio...
https://github.com/ceph/ceph/pull/39578 Backport Bot
10:15 AM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
http://qa-proxy.ceph.com/teuthology/yuriw-2021-02-16_16:01:09-rados-wip-yuri-testing-2021-02-08-1109-octopus-distro-b... Deepika Upadhyay

02/16/2021

10:51 PM Bug #49259 (Need More Info): test_rados_api tests timeout with cephadm (plus extremely large OSD ...
Brad Hubbard
09:30 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
From IRC:... Neha Ojha
06:00 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
Sebastian Wagner wrote:
> sage: this is related to thrashing and only happens within cephadm. non-cephadm is not aff...
Neha Ojha
08:47 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
Argh! So it does, my bad. Please ignore comment 22 for now. Brad Hubbard
08:44 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
Deepika Upadhyay wrote:
> /ceph/teuthology-archive/yuriw-2021-02-15_20:25:26-rados-wip-yuri3-testing-2021-02-15-1020...
Neha Ojha
07:51 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
Deepika Upadhyay wrote:
> /ceph/teuthology-archive/yuriw-2021-02-15_20:25:26-rados-wip-yuri3-testing-2021-02-15-1020...
Brad Hubbard
07:14 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
-/ceph/teuthology-archive/yuriw-2021-02-15_20:25:26-rados-wip-yuri3-testing-2021-02-15-1020-nautilus-distro-basic-gib... Deepika Upadhyay
03:28 PM Bug #49303: FTBFS due to cmake's inability to find std::filesystem on a CentOS8 on aarch64
Deepika, -I don't understand why or how the "workaround" addresses the issue here. probably you could file a PR based... Kefu Chai
10:58 AM Bug #49303: FTBFS due to cmake's inability to find std::filesystem on a CentOS8 on aarch64
Hey Kefu! Should we use this workaround while the real bug is being fixed?... Deepika Upadhyay
08:39 AM Bug #49303: FTBFS due to cmake's inability to find std::filesystem on a CentOS8 on aarch64
Created https://github.com/ceph/ceph/pull/39491 in the hope of working around this. Kefu Chai
04:49 AM Bug #49303: FTBFS due to cmake's inability to find std::filesystem on a CentOS8 on aarch64
filed https://bugzilla.redhat.com/show_bug.cgi?id=1929043 Kefu Chai
04:48 AM Bug #49303 (In Progress): FTBFS due to cmake's inability to find std::filesystem on a CentOS8 on ...
... Kefu Chai
01:23 PM Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate...
Bug #40868 is not related Jos Collin

02/15/2021

10:35 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
... Brad Hubbard
03:40 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
sage: this is related to thrashing and only happens within cephadm. non-cephadm is not affected Sebastian Wagner
04:22 AM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
Managed to reproduce this with some manageably large OSD logs.
On the first osd, just before the slow ops begin we...
Brad Hubbard

02/13/2021

12:28 AM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
https://pulpito.ceph.com/swagner-2021-02-11_11:00:52-rados:cephadm-wip-swagner3-testing-2021-02-10-1322-distro-basic-... Sebastian Wagner

02/12/2021

11:00 PM Backport #48986 (Resolved): pacific: ceph osd df tree reporting incorrect SIZE value for rack hav...
Brad Hubbard
10:44 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
This ran for nearly 24 hours.... Brad Hubbard
10:58 AM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
Brad Hubbard wrote:
> https://tracker.ceph.com/issues/39039 ?
>
> Maybe it would be worth a try to see if disabli...
Sebastian Wagner
07:11 AM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
https://tracker.ceph.com/issues/39039 ?
Maybe it would be worth a try to see if disabling cephx improves the situa...
Brad Hubbard
05:39 AM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
... Brad Hubbard
03:42 AM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
The stuck op is a copy of 16:8744f7fc:test-rados-api-smithi091-35842-17::big:head to 16:2b70cbe7:test-rados-api-smith... Brad Hubbard
02:34 AM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
From swagner-2021-02-11_10:31:33-rados:cephadm-wip-swagner-testing-2021-02-09-1126-distro-basic-smithi/5874513 there'... Brad Hubbard
10:09 PM Bug #49190 (Resolved): LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != ob...
Neha Ojha
10:00 PM Bug #49087 (Resolved): pacific: rados/upgrade/nautilus-x-singleton fails on 20.04
Neha Ojha
04:16 PM Bug #49087: pacific: rados/upgrade/nautilus-x-singleton fails on 20.04
https://github.com/ceph/ceph/pull/39214 merged Yuri Weinstein
06:15 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
https://github.com/ceph/ceph/pull/39179 merged Yuri Weinstein
06:11 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
These two unfound objects are of interest to us. Let's figure out why these are unfound.... Neha Ojha
01:50 PM Feature #49275 (Fix Under Review): [RFE] Add health warning in ceph status for filestore OSDs
Prashant D
01:43 PM Feature #49275 (Resolved): [RFE] Add health warning in ceph status for filestore OSDs
Along with health warn for filestore osds, the health detail should give OSD numbers which are still on filestore to ... Prashant D
09:06 AM Support #49268 (Closed): Blocked IOs up to 30 seconds when host powered down
Hello all,
I am facing an "issue" with my Ceph cluster.
I have a small 6-node cluster.
Each node has 2 OSDs ...
Julien Demais

02/11/2021

11:26 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
/a/swagner-2021-02-11_10:31:33-rados:cephadm-wip-swagner-testing-2021-02-09-1126-distro-basic-smithi/5874516 shows th... Neha Ojha
09:21 PM Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
/a/swagner-2021-02-11_10:31:33-rados:cephadm-wip-swagner-testing-2021-02-09-1126-distro-basic-smithi/5874516/
See...
Neha Ojha
08:01 PM Bug #49259 (Resolved): test_rados_api tests timeout with cephadm (plus extremely large OSD logs)
swagner-2021-02-11_10:31:33-rados:cephadm-wip-swagner-testing-2021-02-09-1126-distro-basic-smithi/5874513... Sebastian Wagner
09:16 AM Bug #49231: MONs unresponsive over extended periods of time
Update: I am now seeing this issue 2 to 3 times a day; it's getting really irritating. Possibly due to the large re... Frank Schilder
04:54 AM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
Trying a potential patch to see if I understand the actual root cause here. Brad Hubbard

02/10/2021

08:21 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
... Deepika Upadhyay
04:49 PM Bug #48786 (Fix Under Review): api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapR...
Neha Ojha
04:38 PM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
https://pulpito.ceph.com/swagner-2021-02-10_11:41:39-rados:cephadm-wip-swagner-testing-2021-02-09-1126-distro-basic-s... Sebastian Wagner
03:12 PM Bug #46847: Loss of placement information on OSD reboot
Given the "severity" I'd be really glad if some of the Ceph core devs could have a look at this :) I'm really not tha... Jonas Jelten
11:00 AM Bug #46847: Loss of placement information on OSD reboot
Thanks for getting back on this. Your observations are exactly what I see as well. A note about severity of this bug.... Frank Schilder
10:40 AM Bug #49231 (New): MONs unresponsive over extended periods of time
Version: 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
I'm repeatedly observing that the MONs ...
Frank Schilder

02/09/2021

10:21 PM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
http://qa-proxy.ceph.com/teuthology/bhubbard-2021-02-09_20:24:03-rados:singleton-nomsgr:all:lazy_omap_stats_output.ya... Brad Hubbard
06:32 AM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
If we look at the output from http://qa-proxy.ceph.com/teuthology/ideepika-2021-01-22_07:01:14-rados-wip-deepika-test... Brad Hubbard
02:43 AM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
From http://qa-proxy.ceph.com/teuthology/bhubbard-2021-02-08_22:46:10-rados:singleton-nomsgr:all:lazy_omap_stats_outp... Brad Hubbard
08:57 PM Bug #47380: mon: slow ops due to osd_failure
Copying my note from https://tracker.ceph.com/issues/43893#note-4
> Looking at this ticket again it's not a no_rep...
Greg Farnum
03:19 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
see teuthology: /home/ideepika/crt.log... Deepika Upadhyay

02/08/2021

11:49 PM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
I have what looks like three reproducers here, http://pulpito.front.sepia.ceph.com/bhubbard-2021-02-08_22:46:10-rados... Brad Hubbard
11:42 PM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
In nojha-2021-02-01_21:31:14-rados-wip-39145-distro-basic-smithi/5847125, where I ran the command manually, following... Neha Ojha
04:46 AM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
Looking into this further, in the successful case (from a fresh run on master) we see the following output.... Brad Hubbard
10:21 PM Backport #48496 (Resolved): octopus: Paxos::restart() and Paxos::shutdown() can race leading to u...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39161
m...
Nathan Cutler
07:28 PM Backport #48496: octopus: Paxos::restart() and Paxos::shutdown() can race leading to use-after-fr...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/39161
merged
Yuri Weinstein
07:18 PM Bug #48998: Scrubbing terminated -- not all pgs were active and clean
... Deepika Upadhyay
06:41 PM Bug #49064 (Resolved): test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInR...
Neha Ojha
06:31 PM Bug #49064: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder ...
https://github.com/ceph/ceph/pull/39264 merged Yuri Weinstein
06:40 PM Backport #49134 (Resolved): pacific: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBB...
Neha Ojha
06:29 PM Bug #49069: mds crashes on v15.2.8 -> master upgrade decoding MMgrConfigure
https://github.com/ceph/ceph/pull/39237 merged Yuri Weinstein
10:46 AM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
/ceph/teuthology-archive/yuriw-2021-02-07_16:27:00-rados-wip-yuri8-testing-2021-01-27-1208-octopus-distro-basic-smi...
Deepika Upadhyay
10:40 AM Bug #49212 (Resolved): mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd...
... Deepika Upadhyay
09:00 AM Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failu...
... Deepika Upadhyay

02/07/2021

10:42 PM Bug #48984 (Need More Info): lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
Brad Hubbard
10:42 PM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
I haven't been able to reproduce this but the following is a review based on the code.
The last output from src/te...
Brad Hubbard

02/06/2021

06:48 AM Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate...
created https://github.com/ceph/ceph/pull/39331 before this issue is addressed. Kefu Chai
06:42 AM Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate...
it is a regression introduced by https://github.com/ceph/ceph/pull/37954 Kefu Chai
01:17 AM Bug #49196 (New): ceph-monstore-tool: sort by IP addresses if not already sorted
Details in https://tracker.ceph.com/issues/49158 Neha Ojha

02/05/2021

11:24 PM Bug #49104: crush weirdness: degraded PGs not marked as such, and choose_total_tries = 50 is too ...
I don't see an issue with increasing choose_total_tries as long as it is not a very high value.
Regarding Problem A, wi...
Neha Ojha
11:05 PM Bug #49158 (Fix Under Review): doc: ceph-monstore-tools might create wrong monitor store
Neha Ojha
10:46 PM Bug #49136 (Duplicate): osd/PGLog.h: FAILED ceph_assert(miter == missing.get_items().end() || (mi...
Neha Ojha
10:45 PM Bug #49136: osd/PGLog.h: FAILED ceph_assert(miter == missing.get_items().end() || (miter->second....
I don't think this is a new issue, just a recurrence of https://tracker.ceph.com/issues/20874. Also, we only hit this i... Neha Ojha
09:51 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
https://tracker.ceph.com/issues/48417#note-14
rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_...
Neha Ojha
09:45 PM Bug #48417: unfound EC objects in sepia's LRC after upgrade
... Neha Ojha
10:34 AM Bug #49190 (Resolved): LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != ob...
I created the branch two days ago and haven't seen this error before:... Sebastian Wagner
01:45 AM Backport #49156 (In Progress): pacific: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
https://github.com/ceph/ceph/pull/39313 Neha Ojha

02/04/2021

09:39 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
Deepika Upadhyay wrote:
> so an interesting thing that I can see, all the inactive pg's have weird osd_id number in ...
Neha Ojha
09:03 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
http://pulpito.front.sepia.ceph.com/ideepika-2021-02-01_14:53:09-rados:thrash-erasure-code-wip-cbodley-testing-distro... Deepika Upadhyay
07:37 PM Bug #45702: PGLog::read_log_and_missing: ceph_assert(miter == missing.get_items().end() || (miter...
https://tracker.ceph.com/issues/49136 may have a different root cause, so marking it as related and not duplicate ... Neha Ojha
07:25 PM Bug #47813 (Fix Under Review): osd op age is 4294967296
Neha Ojha
06:42 PM Bug #47452 (Resolved): invalid values of crush-failure-domain should not be allowed while creatin...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
06:08 AM Bug #49158 (Fix Under Review): doc: ceph-monstore-tools might create wrong monitor store
I tried to recover mon quorum from OSDs with the following document.
https://github.com/ceph/ceph/blob/master/doc/...
Satoru Takeuchi
04:20 AM Backport #49156 (Resolved): pacific: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
https://github.com/ceph/ceph/pull/39313 Backport Bot
04:17 AM Bug #48745 (Pending Backport): Segmentation fault in PrimaryLogPG::cancel_manifest_ops
Kefu Chai
01:53 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
Myoungwon Oh wrote:
> This is the same issue as https://tracker.ceph.com/issues/48786, https://tracker.ceph.com/issu...
Neha Ojha
01:52 AM Bug #47024 (In Progress): rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
Myoungwon Oh
01:51 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
This is the same issue as https://tracker.ceph.com/issues/48786, https://tracker.ceph.com/issues/48915.
I already po...
Myoungwon Oh
12:25 AM Backport #49145 (In Progress): pacific: out of order op
https://github.com/ceph/ceph/pull/39284 Neha Ojha

02/03/2021

11:50 PM Backport #49145 (Resolved): pacific: out of order op
https://github.com/ceph/ceph/pull/39284 Backport Bot
11:49 PM Bug #48793 (Pending Backport): out of order op
Neha Ojha
11:36 PM Bug #48990 (Fix Under Review): rados/dashboard: Health check failed: Telemetry requires re-opt-in...
Sage Weil
11:35 PM Bug #48990: rados/dashboard: Health check failed: Telemetry requires re-opt-in (TELEMETRY_CHANGED...
I think it's this test:... Sage Weil
11:01 PM Bug #49139 (Resolved): rados/perf: cosbench workloads hang forever
Neha Ojha
09:01 PM Bug #49139 (Pending Backport): rados/perf: cosbench workloads hang forever
Neha Ojha
06:51 PM Bug #49139 (Fix Under Review): rados/perf: cosbench workloads hang forever
Neha Ojha
05:25 PM Bug #49139 (Resolved): rados/perf: cosbench workloads hang forever
... Neha Ojha
11:01 PM Backport #49144 (Resolved): pacific: rados/perf: cosbench workloads hang forever
Neha Ojha
09:06 PM Backport #49144 (In Progress): pacific: rados/perf: cosbench workloads hang forever
https://github.com/ceph/ceph/pull/39280 Neha Ojha
09:05 PM Backport #49144 (Resolved): pacific: rados/perf: cosbench workloads hang forever
Backport Bot
07:54 PM Bug #48841 (In Progress): test_turn_off_module: wait_until_equal timed out
Kamoltat (Junior) Sirivadhna
07:15 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
Myoungwon Oh, do you know what could have broken these tests? Neha Ojha
07:12 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
/a/nojha-2021-02-01_21:31:14-rados-wip-39145-distro-basic-smithi/5846988 Neha Ojha
07:11 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
... Neha Ojha
04:48 PM Bug #49136 (Duplicate): osd/PGLog.h: FAILED ceph_assert(miter == missing.get_items().end() || (mi...
... Neha Ojha
03:09 PM Backport #49134 (In Progress): pacific: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest....
https://github.com/ceph/ceph/pull/39264 Neha Ojha
03:00 PM Backport #49134 (Resolved): pacific: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBB...
https://github.com/ceph/ceph/pull/39264 Backport Bot
02:56 PM Bug #49064 (Pending Backport): test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoa...
Neha Ojha
11:14 AM Bug #47813 (In Progress): osd op age is 4294967296
Deepika Upadhyay

02/02/2021

10:51 PM Bug #49069 (Pending Backport): mds crashes on v15.2.8 -> master upgrade decoding MMgrConfigure
Sage Weil
10:01 PM Bug #48984: lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
/a/nojha-2021-02-01_21:31:14-rados-wip-39145-distro-basic-smithi/5847125 was going to hang
pool 1 'rbd' replicated...
Neha Ojha
07:13 PM Feature #49089: msg: add new func support_reencode
So we just want to use this to expand the pool of messages which can be reencoded inline with the fast prepare mechan... Greg Farnum
08:14 AM Feature #49089: msg: add new func support_reencode
PR: https://github.com/ceph/ceph/pull/34659 tries to fix this. jianpeng ma
08:13 AM Feature #49089 (Fix Under Review): msg: add new func support_reencode
Currently, we use Messenger::ms_can_fast_dispatch to verify whether a Message supports reencode. Now we add a new API of... jianpeng ma
06:43 PM Bug #49064 (Fix Under Review): test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoa...
Neha Ojha
12:28 AM Bug #49064: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder ...
Trying to reproduce this in https://pulpito.ceph.com/nojha-2021-02-01_22:48:44-rados:singleton-master-distro-basic-sm... Neha Ojha
04:34 PM Bug #48745 (Fix Under Review): Segmentation fault in PrimaryLogPG::cancel_manifest_ops
Kefu Chai
06:57 AM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
Sorry, it seems that this was my bad.
https://github.com/ceph/ceph/pull/39217
Myoungwon Oh
03:01 PM Bug #49104 (Triaged): crush weirdness: degraded PGs not marked as such, and choose_total_tries = ...
With a 14.2.11 cluster, I found a strange case where using the nautilus optimal tunables, the default choose_total_tr... Dan van der Ster
09:59 AM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
... Deepika Upadhyay
06:45 AM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
... Deepika Upadhyay
01:40 AM Bug #49087 (Fix Under Review): pacific: rados/upgrade/nautilus-x-singleton fails on 20.04
Neha Ojha
01:21 AM Bug #49087 (Resolved): pacific: rados/upgrade/nautilus-x-singleton fails on 20.04
bd4081a8981367a68820e4acc4bc2ea98f1b77c6 added 20.04 targets for pacific after which rados/upgrade/nautilus-x-singlet... Neha Ojha

02/01/2021

07:41 PM Backport #48595 (Resolved): nautilus: nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_abo...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39125
m...
Nathan Cutler
05:31 PM Backport #48595: nautilus: nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_abort: return 1
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/39125
merged
Yuri Weinstein
07:39 PM Backport #48379 (Resolved): nautilus: invalid values of crush-failure-domain should not be allowe...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39124
m...
Nathan Cutler
05:29 PM Backport #48379: nautilus: invalid values of crush-failure-domain should not be allowed while cre...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/39124
merged
Yuri Weinstein
06:59 PM Bug #49064 (In Progress): test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeys...
Neha Ojha
06:15 PM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
rados/thrash/{0-size-min-size-overrides/2-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overrides/{more... Neha Ojha
06:05 PM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
... Deepika Upadhyay
02:49 PM Bug #49069 (Fix Under Review): mds crashes on v15.2.8 -> master upgrade decoding MMgrConfigure
Sage Weil
11:57 AM Bug #48909: clog slow request overwhelm monitors
PR: https://github.com/ceph/ceph/pull/39199 gerald yang
04:20 AM Backport #49073 (In Progress): nautilus: crash in Objecter and CRUSH map lookup
Kefu Chai
04:19 AM Backport #49073 (Resolved): nautilus: crash in Objecter and CRUSH map lookup
https://github.com/ceph/ceph/pull/39197 Kefu Chai

01/31/2021

10:39 PM Bug #49072: Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
Note that Kefu did the heavy lifting in comment 3. Brad Hubbard
10:37 PM Bug #49072 (Resolved): Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
/a/kchai-2021-01-11_11:52:22-rados-wip-kefu2-testing-2021-01-10-1949-distro-basic-smithi/5777646
PG::recovery_stat...
Brad Hubbard
10:35 PM Bug #49072: Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
/a/jafaj-2021-01-05_16:20:30-rados-wip-jan-testing-2021-01-05-1401-distro-basic-smithi/5756811 with logs, coredump is... Brad Hubbard
10:34 PM Bug #49072: Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
Looks like this might be it.... Brad Hubbard
10:29 PM Bug #49072 (Resolved): Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
I suspect there is memory corruption involved and that this is a badly corrupted stack.... Brad Hubbard
02:48 PM Bug #49069 (Resolved): mds crashes on v15.2.8 -> master upgrade decoding MMgrConfigure
... Sage Weil

01/30/2021

12:45 PM Bug #48793 (Fix Under Review): out of order op
Ronen Friedman
12:29 AM Bug #48990: rados/dashboard: Health check failed: Telemetry requires re-opt-in (TELEMETRY_CHANGED...
Trying to reproduce the issue:
https://pulpito.ceph.com/yaarit-2021-01-29_19:21:30-rados:dashboard-pacific-distro-ba...
Yaarit Hatuka

01/29/2021

07:16 PM Backport #48986 (In Progress): pacific: ceph osd df tree reporting incorrect SIZE value for rack ...
https://github.com/ceph/ceph/pull/39180 Neha Ojha
07:12 PM Backport #49058 (In Progress): pacific: thrash_cache_writeback_proxy_none: FAILED ceph_assert(ver...
https://github.com/ceph/ceph/pull/39179 Neha Ojha
03:40 PM Backport #49058 (Resolved): pacific: thrash_cache_writeback_proxy_none: FAILED ceph_assert(versio...
https://github.com/ceph/ceph/pull/39179 Backport Bot
06:36 PM Bug #49064: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder ...
... Neha Ojha
06:22 PM Bug #49064 (Resolved): test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInR...
... Neha Ojha
03:38 PM Bug #46323 (Pending Backport): thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == o...
Neha Ojha
01:18 AM Bug #46323 (Fix Under Review): thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == o...
Neha Ojha
08:17 AM Backport #48496 (In Progress): octopus: Paxos::restart() and Paxos::shutdown() can race leading t...
Nathan Cutler
08:15 AM Backport #48495 (In Progress): nautilus: Paxos::restart() and Paxos::shutdown() can race leading ...
Nathan Cutler
08:14 AM Backport #48495: nautilus: Paxos::restart() and Paxos::shutdown() can race leading to use-after-f...
> @Nathan, Definitely. It was originally "Seen in Nautilus" (downstream BZ) and the race condition still exists.
@...
Nathan Cutler
07:30 AM Backport #49055 (Resolved): nautilus: pick_a_shard() always select shard 0
https://github.com/ceph/ceph/pull/39651 Backport Bot
07:30 AM Backport #49054 (Resolved): pacific: pick_a_shard() always select shard 0
https://github.com/ceph/ceph/pull/39977 Backport Bot
07:30 AM Backport #49053 (Resolved): octopus: pick_a_shard() always select shard 0
https://github.com/ceph/ceph/pull/39978 Backport Bot
07:25 AM Bug #49052 (Resolved): pick_a_shard() always select shard 0
Kefu Chai
07:18 AM Bug #47003: ceph_test_rados test error. Reponses out of order due to the connection drops data.
/a//kchai-2021-01-28_03:28:19-rados-wip-kefu-testing-2021-01-27-1353-distro-basic-smithi/5834177 Kefu Chai
07:00 AM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
I did a run with https://github.com/ceph/ceph/pull/38906/commits (passes)
http://pulpito.front.sepia.ceph.com/idee...
Deepika Upadhyay
01:17 AM Bug #49050 (New): Make thrash_cache_writeback_proxy_none work with writeback overlay
In https://github.com/ceph/ceph/pull/39152, we have disabled some tests since
(1) they cause noise in daily rados r...
Neha Ojha

01/28/2021

11:04 PM Backport #48496: octopus: Paxos::restart() and Paxos::shutdown() can race leading to use-after-fr...
Nathan Cutler wrote:
> @Brad, are you sure this backport is applicable to octopus? If it's not applicable, please ch...
Brad Hubbard
11:11 AM Backport #48496 (Need More Info): octopus: Paxos::restart() and Paxos::shutdown() can race leadin...
@Brad, are you sure this backport is applicable to octopus? If it's not applicable, please change Status to "Rejected... Nathan Cutler
11:02 PM Backport #48495: nautilus: Paxos::restart() and Paxos::shutdown() can race leading to use-after-f...
Nathan Cutler wrote:
> @Brad, are you sure this backport is applicable to nautilus? If it's not applicable, please c...
Brad Hubbard
11:10 AM Backport #48495 (Need More Info): nautilus: Paxos::restart() and Paxos::shutdown() can race leadi...
@Brad, are you sure this backport is applicable to nautilus? If it's not applicable, please change Status to "Rejecte... Nathan Cutler
06:22 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
/a/teuthology-2021-01-26_19:05:09-rados-pacific-distro-basic-smithi/5831527 Neha Ojha
05:08 PM Bug #48946 (Fix Under Review): Disable and re-enable clog_to_monitors could trigger assertion
Dan Hill
01:52 AM Bug #48946 (In Progress): Disable and re-enable clog_to_monitors could trigger assertion
Dan Hill
01:22 PM Bug #48793 (In Progress): out of order op
In the revised scrub code there is a period in which:
- the scrub is marked as 'preempted', and
- preemption is alr...
Ronen Friedman
11:16 AM Backport #48987 (In Progress): nautilus: ceph osd df tree reporting incorrect SIZE value for rack...
Nathan Cutler
11:13 AM Backport #48595 (In Progress): nautilus: nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_...
Nathan Cutler
11:07 AM Backport #48379 (In Progress): nautilus: invalid values of crush-failure-domain should not be all...
Nathan Cutler

01/27/2021

08:10 PM Bug #36473 (Resolved): hung osd_repop, bluestore committed but failed to trigger repop_commit
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:07 PM Bug #39525 (Resolved): lz4 compressor corrupts data when buffers are unaligned
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:06 PM Bug #40792 (Resolved): monc: send_command to specific down mon breaks other mon msgs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:05 PM Bug #41190 (Resolved): osd: pg stuck in waitactingchange when new acting set doesn't change
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:04 PM Bug #42452 (Resolved): msg/async: the event center is blocked by rdma construct conection for tra...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:04 PM Bug #42477 (Resolved): Rados should use the '-o outfile' convention
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:03 PM Bug #42977 (Resolved): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:03 PM Bug #43311 (Resolved): asynchronous recovery + backfill might spin pg undersized for a long time
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:03 PM Bug #43582 (Resolved): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:02 PM Bug #44407 (Resolved): mon: Get session_map_lock before remove_session
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:01 PM Bug #45076 (Resolved): rados: Sharded OpWQ drops suicide_grace after waiting for work
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:00 PM Bug #47044 (Resolved): PG::_delete_some isn't optimal iterating objects
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:00 PM Bug #47328 (Resolved): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:56 PM Documentation #23354 (Resolved): doc: osd_op_queue & osd_op_queue_cut_off
Konstantin Shalygin
07:54 PM Backport #48481 (Rejected): mimic: PG::_delete_some isn't optimal iterating objects
Nathan Cutler
07:54 PM Backport #47992 (Rejected): mimic: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler
07:54 PM Backport #44467 (Rejected): mimic: mon: Get session_map_lock before remove_session
Nathan Cutler
07:54 PM Backport #45025 (Rejected): mimic: hung osd_repop, bluestore committed but failed to trigger repo...
Nathan Cutler
07:54 PM Backport #44369 (Rejected): mimic: msg/async: the event center is blocked by rdma construct conec...
Nathan Cutler
07:54 PM Backport #44088 (Rejected): mimic: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Nathan Cutler
07:54 PM Backport #44368 (Rejected): mimic: Rados should use the '-o outfile' convention
Nathan Cutler
07:54 PM Backport #44086 (Rejected): mimic: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Nathan Cutler
07:54 PM Backport #43622 (Rejected): mimic: pg: fastinfo incorrect when last_update moves backward in time
Nathan Cutler
07:54 PM Backport #43991 (Rejected): mimic: objecter doesn't send osd_op
Nathan Cutler
07:54 PM Backport #41732 (Rejected): mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missin...
Nathan Cutler
07:54 PM Backport #42847 (Rejected): mimic: "failing miserably..." in Infiniband.cc
Nathan Cutler
07:54 PM Backport #41546 (Rejected): mimic: monc: send_command to specific down mon breaks other mon msgs
Nathan Cutler
07:53 PM Backport #45892 (Rejected): mimic: osd: pg stuck in waitactingchange when new acting set doesn't ...
mimic EOL Nathan Cutler
07:53 PM Backport #45358 (Rejected): mimic: rados: Sharded OpWQ drops suicide_grace after waiting for work
mimic EOL Nathan Cutler
07:52 PM Backport #45038 (Rejected): mimic: mon: reset min_size when changing pool size
Nathan Cutler
07:51 PM Backport #44489 (Rejected): mimic: lz4 compressor corrupts data when buffers are unaligned
mimic EOL Nathan Cutler
07:51 PM Backport #43470 (Rejected): mimic: asynchronous recovery + backfill might spin pg undersized for ...
mimic EOL Nathan Cutler
07:33 PM Backport #45891 (Rejected): luminous: osd: pg stuck in waitactingchange when new acting set doesn...
luminous EOL Nathan Cutler
07:20 PM Bug #43929 (Resolved): osd: Allow 64-char hostname to be added as the "host" in CRUSH
Nathan Cutler
07:19 PM Backport #43988 (Rejected): luminous: osd: Allow 64-char hostname to be added as the "host" in CRUSH
luminous EOL Nathan Cutler
07:18 PM Bug #42114 (Resolved): mon: /var/lib/ceph/mon/* data (esp rocksdb) is not 0600
Nathan Cutler
07:18 PM Backport #42201 (Rejected): mimic: mon: /var/lib/ceph/mon/* data (esp rocksdb) is not 0600
mimic EOL Nathan Cutler
07:18 PM Bug #42577 (Rejected): acting_recovery_backfill won't catch all up peers
Nathan Cutler
07:17 PM Backport #42202 (Rejected): luminous: mon: /var/lib/ceph/mon/* data (esp rocksdb) is not 0600
Nathan Cutler
07:16 PM Backport #42996 (Rejected): luminous: acting_recovery_backfill won't catch all up peers
EOL Nathan Cutler
07:14 PM Bug #23816 (Resolved): disable bluestore cache caused a rocksdb error
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:14 PM Bug #24664 (Resolved): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:12 PM Bug #38724 (Resolved): _txc_add_transaction error (39) Directory not empty not handled on operati...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:12 PM Bug #39174 (Resolved): crushtool crash on Fedora 28 and newer
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:12 PM Bug #39390 (Resolved): filestore pre-split may not split enough directories
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:12 PM Bug #39439 (Resolved): osd: segv in _preboot -> heartbeat
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:10 PM Bug #40483 (Resolved): Pool settings aren't populated to OSD after restart.
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:09 PM Bug #40634 (Resolved): mon: auth mon isn't loading full KeyServerData after restart
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:09 PM Bug #40804 (Resolved): ceph mgr module ls -f plain crashes mon
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:08 PM Bug #41601 (Resolved): oi(object_info_t).size does not match on disk size
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:07 PM Bug #43306 (Resolved): segv in collect_sys_info
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:02 PM Backport #44084 (Rejected): luminous: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Nathan Cutler
07:02 PM Backport #44087 (Rejected): luminous: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Nathan Cutler
07:02 PM Backport #43621 (Rejected): luminous: pg: fastinfo incorrect when last_update moves backward in time
Nathan Cutler
07:02 PM Backport #43632 (Rejected): luminous: segv in collect_sys_info
Nathan Cutler
07:02 PM Backport #41702 (Rejected): luminous: oi(object_info_t).size does not match on disk size
Nathan Cutler
07:02 PM Backport #40889 (Rejected): luminous: Pool settings aren't populated to OSD after restart.
Nathan Cutler
07:02 PM Backport #41547 (Rejected): luminous: monc: send_command to specific down mon breaks other mon msgs
Nathan Cutler
07:02 PM Backport #40883 (Rejected): luminous: ceph mgr module ls -f plain crashes mon
Nathan Cutler
07:02 PM Backport #39694 (Rejected): luminous: _txc_add_transaction error (39) Directory not empty not han...
Nathan Cutler
07:02 PM Backport #40731 (Rejected): luminous: mon: auth mon isn't loading full KeyServerData after restart
Nathan Cutler
07:02 PM Backport #39681 (Rejected): luminous: filestore pre-split may not split enough directories
Nathan Cutler
07:02 PM Backport #39309 (Rejected): luminous: crushtool crash on Fedora 28 and newer
Nathan Cutler
07:02 PM Backport #39515 (Rejected): luminous: osd: segv in _preboot -> heartbeat
Nathan Cutler
07:02 PM Backport #24888 (Rejected): luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
Nathan Cutler
07:02 PM Backport #23926 (Rejected): luminous: disable bluestore cache caused a rocksdb error
Nathan Cutler
04:32 PM Bug #48793 (Triaged): out of order op
Neha Ojha
12:44 AM Bug #48793: out of order op
Ronen, can you please check if this is related to your scrub refactor? Neha Ojha
12:37 AM Bug #48793: out of order op
/a/jafaj-2021-01-05_16:20:30-rados-wip-jan-testing-2021-01-05-1401-distro-basic-smithi/5756733
Following is the pr...
Neha Ojha
11:17 AM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
rados/singleton/{all/thrash_cache_writeback_proxy_none mon_election/classic msgr-failures/few msgr/async-v2only objec... Kefu Chai

01/26/2021

10:53 PM Bug #49020 (New): rados subcommand rmomapkey does not report error when key provided not found
In the following command, the object exists in the pool, but there is no omap key "waldo". When the following is run:... J. Eric Ivancich
10:05 PM Bug #48793 (New): out of order op
Neha Ojha
07:42 PM Bug #48793 (Need More Info): out of order op
rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/classic msgr-failures/osd-delay objectst... Neha Ojha
07:39 PM Bug #48793: out of order op
This is not related to https://github.com/ceph/ceph/pull/38111.
rados/thrash/{0-size-min-size-overrides/2-size-2-m...
Neha Ojha
09:30 AM Backport #49009 (Resolved): octopus: osd crash in OSD::heartbeat when dereferencing null session
https://github.com/ceph/ceph/pull/40277 Backport Bot
09:30 AM Backport #49008 (Resolved): pacific: osd crash in OSD::heartbeat when dereferencing null session
https://github.com/ceph/ceph/pull/40246 Backport Bot
09:28 AM Bug #48821 (Pending Backport): osd crash in OSD::heartbeat when dereferencing null session
Kefu Chai
01:20 AM Bug #48998: Scrubbing terminated -- not all pgs were active and clean
PG 2.4 is in active+recovering+undersized+degraded+remapped... Neha Ojha
01:16 AM Bug #48998 (New): Scrubbing terminated -- not all pgs were active and clean
... Neha Ojha
12:38 AM Bug #48997 (Can't reproduce): rados/singleton/all/recovery-preemption: defer backfill|defer recov...
... Neha Ojha
 
