Activity

From 10/13/2020 to 11/11/2020

11/11/2020

08:46 PM Bug #45947: ceph_test_rados_watch_notify hang seen in nautilus
Agreed, it looks related. Let's leave it here for now. Brad Hubbard
08:24 PM Bug #45947: ceph_test_rados_watch_notify hang seen in nautilus
... Deepika Upadhyay
04:06 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...

https://pulpito.ceph.com/yuriw-2020-11-10_19:24:45-rados-wip-yuri4-testing-2020-11-10-0959-distro-basic-smithi/
ht...
Deepika Upadhyay
01:27 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
Fails deterministically: https://pulpito.ceph.com/nojha-2020-11-09_22:09:31-rados:monthrash-master-distro-basic-smith... Neha Ojha
03:59 PM Documentation #47523 (Resolved): ceph df documentation is outdated
Zac Dover
02:20 PM Bug #47697 (Resolved): mon: set session_timeout when adding to session_map
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:19 PM Bug #47951 (Resolved): MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:18 PM Backport #47748 (Resolved): nautilus: mon: set session_timeout when adding to session_map
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37554
m...
Nathan Cutler
01:05 AM Backport #47748: nautilus: mon: set session_timeout when adding to session_map
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37554
merged
Yuri Weinstein
02:04 PM Bug #48183 (New): monmap::build_initial returns different error val on FreeBSD
3/3 Test #132: unittest_mon_monmap ..............***Failed 0.14 sec
Running main() from gmock_main.cc
[=========...
Willem Jan Withagen
01:56 PM Backport #47747 (Resolved): octopus: mon: set session_timeout when adding to session_map
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37553
m...
Nathan Cutler
11:22 AM Bug #47044: PG::_delete_some isn't optimal iterating objects
Can confirm this makes replacing hw for an S3 cluster quite intrusive, due to the block-db devs getting overloaded (i... Dan van der Ster
10:14 AM Feature #48182 (Resolved): osd: allow remote read by calling cls method from within cls context
Currently, a cls method can only access an object's data and metadata.
However, in some cases, it would be useful if...
Ken Iizawa
06:37 AM Bug #48163: osd: osd crash due to FAILED ceph_assert(current_best)
Please close this issue. wencong wan
06:36 AM Bug #48163: osd: osd crash due to FAILED ceph_assert(current_best)
Perf dump shows the crashed osd has too many active connections. Finally, we found some ceph-fuse clients on other ho... wencong wan

11/10/2020

10:34 PM Bug #48077 (Resolved): Allowing scrub configs begin_day/end_day to include 7 and begin_hour/end_h...
David Zafman
10:04 PM Bug #48173 (New): Additional test cases for osd-recovery-scrub.sh

3. Remote reservation non-overlapping PGs start recovery on PG that has a replica
4. failed local (need sleep in O...
David Zafman
08:13 PM Bug #48172 (New): Nautilus 14.2.13 osdmap not trimming on clean cluster
We have a cluster running 14.2.13 (some OSDs are on 14.2.9; they are on Debian). The cluster is in active+clean state, ... Marcin Śliwiński
07:32 PM Backport #47747: octopus: mon: set session_timeout when adding to session_map
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37553
merged
Yuri Weinstein
06:57 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
http://pulpito.front.sepia.ceph.com/gregf-2020-11-06_17:45:44-rados-wip-stretch-fixes-116-2-distro-basic-smithi/ has ... Greg Farnum
06:57 PM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
http://pulpito.front.sepia.ceph.com/gregf-2020-11-06_17:45:44-rados-wip-stretch-fixes-116-2-distro-basic-smithi/5597065 Greg Farnum
11:25 AM Bug #48163 (Closed): osd: osd crash due to FAILED ceph_assert(current_best)
More than 90 osds crashed in my cluster; the crash info is as follows:
"os_version_id": "7",
"assert_condition...
wencong wan
09:52 AM Bug #48065: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
Actually, the problem with the weight not updated on the class subtree is easily reproducible on a vstart cluster (se... Mykola Golub
12:23 AM Bug #18445 (Won't Fix): ceph: ping <mon.id> doesn't connect to cluster
Long since fixed differently Dan Mick

11/09/2020

11:57 PM Bug #48065 (Need More Info): "ceph osd crush set|reweight-subtree" commands do not set weight on ...
This does sound like a bug. Can you please share the osdmap? Neha Ojha
10:11 PM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
Fails on every run
https://pulpito.ceph.com/teuthology-2020-11-08_07:01:02-rados-master-distro-basic-smithi/
http...
Neha Ojha
09:52 PM Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failu...
rados/singleton/{all/max-pg-per-osd.from-replica mon_election/connectivity msgr-failures/many msgr/async-v2only objec... Neha Ojha
08:35 PM Bug #48153 (Fix Under Review): collection_list_legacy: pg inconsistent
Mykola Golub
06:35 PM Bug #48153: collection_list_legacy: pg inconsistent
Just as a note: the problem is observed only when the scrub is run for a pg that has both old and new versions o... Mykola Golub
06:23 PM Bug #48153 (In Progress): collection_list_legacy: pg inconsistent
> And if you see "collection_list_legacy" in the log, then I am interested to see more details about object names you... Mykola Golub
05:08 PM Bug #48153: collection_list_legacy: pg inconsistent
> ceph tell osd.430 injectargs '--debug-osd=10'
sorry, it should be '--debug-bluestore=10'.
And if you see "c...
Mykola Golub
04:17 PM Bug #48153: collection_list_legacy: pg inconsistent
Alexander, I am building a cluster to reproduce the issue.
In the meantime, could you please increase debug_osd level on ...
Mykola Golub
03:38 PM Bug #48153: collection_list_legacy: pg inconsistent
Mykola Golub wrote:
> Alexander, did you have osds of different versions in your acting set? If you did, what exactl...
Alexander Kazansky
03:31 PM Bug #48153: collection_list_legacy: pg inconsistent
> missing replica happens only on osd with 14.2.12 or 14.2.13, on 14.2.11 all ok.
Can I assume that in [206,430,41...
Mykola Golub
03:26 PM Bug #48153: collection_list_legacy: pg inconsistent
Alexander, did you have osds of different versions in your acting set? If you did, what exactly were they?
Also is i...
Mykola Golub
03:07 PM Bug #48153 (Resolved): collection_list_legacy: pg inconsistent
Hello, people.
I have a problem with OSDs on Nautilus 14.2.12 and 14.2.13.
After updating some OSDs to 14.2.12 or 14.2.13 sta...
Alexander Kazansky
02:09 PM Backport #47362: nautilus: pgs inconsistent, union_shard_errors=missing
@Alexander -
This issue is closed and it is only by chance that I saw your comment on it and decided to respond.
...
Nathan Cutler
11:20 AM Backport #47362: nautilus: pgs inconsistent, union_shard_errors=missing
14.2.13 bug happens
ceph pg 21.4c1 query | jq .state...
Alexander Kazansky
12:50 PM Feature #48151: osd: allow remote read by calling cls method from within cls context
I should have created this ticket under RADOS project, not Ceph project. Ken Iizawa
12:43 PM Feature #48151 (Closed): osd: allow remote read by calling cls method from within cls context
Currently, a cls method can only access an object's data and metadata.
However, in some cases, it would be useful if...
Ken Iizawa

11/08/2020

04:25 PM Bug #46318: mon_recovery: quorum_status times out
/a/kchai-2020-11-08_14:53:34-rados-wip-kefu-testing-2020-11-07-2116-distro-basic-smithi/5602229/ Kefu Chai

11/07/2020

02:34 AM Bug #45706: Memory usage in buffer_anon showing unbounded growth in osds on EC pool. (14.2.9)
There's this buffer::list::rebuild buffer_anon leak fix in the master branch that may solve the issue:
https://git...
Zac Medico

11/06/2020

11:03 PM Bug #48033 (Need More Info): mon: after unrelated crash: handle_auth_request failed to assign glo...
https://tracker.ceph.com/issues/47654#note-7 and https://tracker.ceph.com/issues/47654#note-8 may help to understand ... Neha Ojha

11/05/2020

09:09 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Greg, I am assigning this bug to you, let me know if you need anything from me. Neha Ojha
05:59 PM Backport #47993 (Resolved): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37818
m...
Nathan Cutler
05:17 PM Backport #47993: nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37818
merged
Yuri Weinstein
05:59 PM Backport #47825: nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37815
m...
Nathan Cutler
05:28 PM Backport #47825 (Resolved): nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
David Zafman
05:16 PM Backport #47825: nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37815
merged
Yuri Weinstein
05:57 PM Backport #47826: octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37853
m...
Nathan Cutler
05:28 PM Backport #47826 (Resolved): octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
David Zafman
04:27 PM Backport #47826: octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37853
merged
Yuri Weinstein
05:56 PM Backport #47994 (Resolved): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37819
m...
Nathan Cutler
04:22 PM Backport #47994: octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37819
merged
Yuri Weinstein
05:56 PM Backport #47987 (Resolved): octopus: MonClient: mon_host with DNS Round Robin results in 'unable ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37817
m...
Nathan Cutler
04:22 PM Backport #47987: octopus: MonClient: mon_host with DNS Round Robin results in 'unable to parse ad...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37817
merged
Yuri Weinstein
05:29 PM Bug #46405 (Resolved): osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
David Zafman
04:15 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
... Deepika Upadhyay
12:46 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
/a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590078 Neha Ojha
12:44 AM Bug #48030: mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeout not getti...
/a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590040 looks similar Neha Ojha
12:41 AM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
/a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590019 Neha Ojha
12:39 AM Bug #48029: Exiting scrub checking -- not all pgs scrubbed.
rados/singleton-nomsgr/{all/osd_stale_reads mon_election/connectivity rados supported-random-distro$/{ubuntu_latest}}... Neha Ojha

11/04/2020

11:39 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Neha Ojha wrote:
> Greg Farnum wrote:
> > Oh the bug does occur while executing commands to change the strategy. Bu...
Greg Farnum
11:10 PM Bug #47654 (Triaged): test_mon_pg: mon fails to join quorum to due election strategy mismatch
Greg Farnum wrote:
> Oh the bug does occur while executing commands to change the strategy. But this all still looks...
Neha Ojha
07:46 PM Bug #46264: mon: check for mismatched daemon versions

Delay 7 days by default with a config value before health warning/error when detected.
David Zafman
07:46 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
Enhancement:
Possible test cases to replace existing test
1. Simple test for "not scheduling scrubs due to active...
David Zafman

11/03/2020

11:02 PM Bug #48077 (Fix Under Review): Allowing scrub configs begin_day/end_day to include 7 and begin_ho...
David Zafman
11:51 AM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Oh the bug does occur while executing commands to change the strategy. But this all still looks fine to me and certai... Greg Farnum
11:39 AM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Hmm, I'm confused about the "^C" output as the incoming message strategy. Going over things, as best I can tell those... Greg Farnum
12:46 AM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
When mon.c calls for election... Neha Ojha
07:45 AM Bug #48060: data loss in EC pool
osd.22 crashed every minute with '/build/ceph-15.2.5/src/osd/osd_types.cc: 5698: FAILED ceph_assert(clone_size.count(... Hannes Tamme

11/02/2020

11:34 PM Bug #48077 (Resolved): Allowing scrub configs begin_day/end_day to include 7 and begin_hour/end_h...

- Make the range of "osd scrub begin/end week day" 0-6 in options.cc
- Handle code in [1] to deal with this and te...
David Zafman
07:26 PM Bug #48060: data loss in EC pool
During the day the unfound object list grows. The current state is:
[ERR] PG_DAMAGED: Possible data damage: 5 pgs recovery_u...
Hannes Tamme
06:04 AM Bug #48060: data loss in EC pool
When osd.22, osd.34 and osd.43 were down, ceph -w gave us endless lines of:
2020-11-02T07:18:00.464219+0200 osd.45 ...
Hannes Tamme
05:18 AM Bug #48060: data loss in EC pool
We accepted the data loss and executed the command:
root@ik01:~# ceph pg 30.17 mark_unfound_lost delete
pg has 1 object...
Hannes Tamme
11:02 AM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
observed failing test case in interactive on error mode, with this config for yaml: ... Deepika Upadhyay
07:12 AM Bug #48065 (Resolved): "ceph osd crush set|reweight-subtree" commands do not set weight on device...
We noticed that if one sets an osd crush weight using the command... Mykola Golub
01:12 AM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
/a/teuthology-2020-10-28_07:01:02-rados-master-distro-basic-smithi/5567279 shows the following at the time the failur... Brad Hubbard

11/01/2020

04:09 PM Bug #48060 (New): data loss in EC pool
We have data LOSS in our EC pool k4m2.
Pool is used for RBD volumes. 15 RBD volumes have broken objects.
Broken ob...
Hannes Tamme
02:14 PM Bug #48059 (New): core dump running osdmaptool
I have an Octopus (15.2.4) cluster with degraded and unfound objects, and PGs that have been stuck in the degraded an... Michael Thomas

10/31/2020

05:13 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
I can reproduce this issue locally using a vstart cluster with master HEAD at 038750c78afd56c7becf744cf7dc4f8d115793... Kefu Chai
03:12 AM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
... Kefu Chai

10/30/2020

09:47 PM Bug #45243: nautilus: qa/standalone/scrub/osd-scrub-repair.sh fails with osd-scrub-repair.sh:698:...
Haven't seen it recently but there might still be a race somewhere which causes this. Neha Ojha
09:45 PM Bug #44945 (Need More Info): Mon High CPU usage when another mon syncing from it
Which ceph version is this? I'd be curious to know if this is still an issue with Octopus since we improved removed s... Neha Ojha
09:40 PM Bug #44694 (Duplicate): MON_DOWN during cluster setup
https://tracker.ceph.com/issues/45441 seems to be same issue. Neha Ojha
09:38 PM Bug #44643 (Can't reproduce): leaked buffer (alloc from MonClient::handle_auth_request)
Neha Ojha
09:37 PM Bug #44243 (Can't reproduce): memstore make check test fails
Neha Ojha
09:36 PM Bug #44217 (Can't reproduce): Leaked connection (alloc from AsyncMessenger::add_accept)
Neha Ojha
09:27 PM Bug #43915 (Can't reproduce): leaked Session (alloc from OSD::ms_handle_authentication)
Neha Ojha
09:24 PM Bug #43591: /sbin/fstrim can interfere with umount
This might still be a problem, just haven't seen it recently. Neha Ojha
09:16 PM Bug #43185 (Resolved): ceph -s not showing client activity
Neha Ojha
09:15 PM Bug #42921 (Can't reproduce): osd: segmentation fault in PGLog::check
Neha Ojha
09:14 PM Bug #42706 (Can't reproduce): LibRadosList.EnumerateObjectsSplit fails
Neha Ojha
09:13 PM Bug #42186 (Can't reproduce): "2019-10-04T19:31:51.053283+0000 osd.7 (osd.7) 108 : cluster [ERR] ...
Neha Ojha
09:12 PM Bug #42175 (Can't reproduce): _txc_add_transaction error (2) No such file or directory not handl...
Neha Ojha
09:11 PM Bug #41943 (Closed): ceph-mgr fails to report OSD status correctly
Closing for lack of information and also luminous is EOL now. Please feel free to reopen if this reproduces on a rece... Neha Ojha
09:10 PM Bug #41748 (Can't reproduce): log [ERR] : 7.19 caller_ops.size 62 > log size 61
Neha Ojha
09:09 PM Bug #40820 (Closed): standalone/scrub/osd-scrub-test.sh +3 day failed assert
Haven't seen this in a while. Neha Ojha
09:08 PM Bug #40721 (Can't reproduce): backfill caught in loop from block
Neha Ojha
09:07 PM Bug #40522 (Can't reproduce): on_local_recover doesn't touch?
Neha Ojha
09:05 PM Bug #40454 (Can't reproduce): snap_mapper error, scrub gets r -2..repaired
Neha Ojha
09:04 PM Bug #41183 (Resolved): pg autoscale on EC pools
Josh Durgin
06:28 PM Bug #47930 (In Progress): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: ret...
David Zafman
04:16 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
to reproduce, we just need to change, `s/mon client directed command retry: 5/mon client directed command retry: 2 ru... Deepika Upadhyay
10:43 AM Bug #48042 (Fix Under Review): Log "ceph health detail" periodically in cluster log
Prashant D
10:24 AM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
鑫 王 wrote:
> *A slow IO will occur during execution.*
> I have another question why is the field buffer_anon also g...
Igor Fedotov

10/29/2020

11:39 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Based on f7099f72faccb09aea5054c0b428bf89be67141c, "failed to assign global_id" is expected when we are not quorum. T... Neha Ojha
07:55 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Looking at logs from /a/nojha-2020-10-28_21:12:45-rados:singleton-bluestore-master-distro-basic-smithi/5569512/
We...
Neha Ojha
05:34 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
I am able to reproduce this without msgr failure injection.
rados:singleton-bluestore/{all/cephtool mon_election/...
Neha Ojha
08:08 PM Bug #43193 (Fix Under Review): "ceph ping mon.<id>" cannot work
I can confirm this. There is a detailed explanation at https://github.com/ceph/ceph/pull/37716 but the briefest summa... Nathan Cutler
04:45 PM Bug #48042: Log "ceph health detail" periodically in cluster log
Neha Ojha wrote:
> This will help us spot things like obvious network issues which can lead to racks/hosts down in a...
Neha Ojha
04:39 PM Bug #48042 (Resolved): Log "ceph health detail" periodically in cluster log
This will help us spot things like obvious network issues which can lead to racks/hosts down in a cluster. Also gives... Neha Ojha
03:57 PM Documentation #18986: Need to document monitor health configuration values
This got outdated; we will discuss which metrics are relevant and need to be documented, and update soon. Deepika Upadhyay
08:45 AM Backport #47986: nautilus: MonClient: mon_host with DNS Round Robin results in 'unable to parse a...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37816
m...
Nathan Cutler
03:31 AM Backport #47986 (Resolved): nautilus: MonClient: mon_host with DNS Round Robin results in 'unable...
Brad Hubbard
04:42 AM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
hi Igor,
I can't explain why OSD handles 850K(writing 7426528761/8704), but when load is very low (a client iodep...
Stellar Wang
02:43 AM Bug #48028: ceph-mon always suffer lots of slow ops from v14.2.9
Yao Ning wrote:
> root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
> cluster:
> id: 299a04ba-dd3e-...
Yao Ning

10/28/2020

06:42 PM Bug #48033 (Closed): mon: after unrelated crash: handle_auth_request failed to assign global_id; ...
ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)
I have tried to extensively se...
Peter Gervai
06:17 PM Backport #47986: nautilus: MonClient: mon_host with DNS Round Robin results in 'unable to parse a...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37816
merged
Yuri Weinstein
05:12 PM Bug #45190: osd dump times out
... Neha Ojha
05:00 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
/a/teuthology-2020-10-28_07:01:02-rados-master-distro-basic-smithi/5567283 Neha Ojha
04:59 PM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
rados/singleton/{all/mon-config mon_election/connectivity msgr-failures/many msgr/async objectstore/bluestore-comp-lz... Neha Ojha
04:57 PM Bug #48030 (Resolved): mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeou...
... Neha Ojha
04:52 PM Bug #48029 (New): Exiting scrub checking -- not all pgs scrubbed.
... Neha Ojha
03:28 PM Bug #48028: ceph-mon always suffer lots of slow ops from v14.2.9
Yao Ning wrote:
> root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
> cluster:
> id: 299a04ba-dd3e-...
Yao Ning
03:21 PM Bug #48028 (Won't Fix - EOL): ceph-mon always suffer lots of slow ops from v14.2.9
root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
cluster:
id: 299a04ba-dd3e-43a7-af17-628190cf742f
...
Yao Ning
01:16 PM Bug #48026 (New): Mon crashes when adding 4th OSD
*Context*: I'm running Ceph Octopus 15.2.5 (the latest as of this bug) using Rook on a toy Kubernetes cluster of two ... Lalit Maganti

10/27/2020

10:49 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1

We only need 1 pool with 1 pg, if we orchestrate carefully. The existing test is more like a shotgun, sending lots...
David Zafman
08:14 PM Bug #47930 (Triaged): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
/a/teuthology-2020-10-21_07:01:02-rados-master-distro-basic-smithi/5544900 - here the failure occurred because the la... Neha Ojha
09:06 PM Bug #47952: Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with filestor...
Neha Ojha wrote:
> 14.2.12 introduced the following change in https://github.com/ceph/ceph/pull/37474, which is prob...
Prashant Tambe
09:34 AM Backport #47826 (In Progress): octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: ret...
Nathan Cutler
09:34 AM Backport #47741 (Duplicate): octopus: mon: set session_timeout when adding to session_map
Nathan Cutler

10/26/2020

09:11 PM Backport #47994 (In Progress): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler
10:34 AM Backport #47994 (Resolved): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
https://github.com/ceph/ceph/pull/37819 Nathan Cutler
09:05 PM Backport #47993 (In Progress): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler
10:34 AM Backport #47993 (Resolved): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
https://github.com/ceph/ceph/pull/37818 Nathan Cutler
09:04 PM Backport #47987 (In Progress): octopus: MonClient: mon_host with DNS Round Robin results in 'unab...
Nathan Cutler
10:32 AM Backport #47987 (Resolved): octopus: MonClient: mon_host with DNS Round Robin results in 'unable ...
https://github.com/ceph/ceph/pull/37817 Nathan Cutler
09:02 PM Backport #47986 (In Progress): nautilus: MonClient: mon_host with DNS Round Robin results in 'una...
Nathan Cutler
10:32 AM Backport #47986 (Resolved): nautilus: MonClient: mon_host with DNS Round Robin results in 'unable...
https://github.com/ceph/ceph/pull/37816 Nathan Cutler
08:39 PM Backport #47825 (In Progress): nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: re...
Nathan Cutler
05:28 PM Bug #44981 (Resolved): rados/test_envlibrados_for_rocksdb.sh build failure (seen in nautilus)
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
05:27 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
sentry event: https://sentry.ceph.com/organizations/ceph/issues/10/events/2bdb1a2346cf4325b1bfaa7adf609f15/?project=2... Neha Ojha
11:21 AM Bug #47974: Slow requests due to unhealthy hearbeat - 'OSD::osd_op_tp thread 0x7f7f85903700' had ...
Did you perform any large pool/PG removals recently? Or maybe some data rebalancing that could result in PG migratio... Igor Fedotov
11:08 AM Backport #45781 (Rejected): mimic: rados/test_envlibrados_for_rocksdb.sh build failure (seen in n...
mimic EOL Nathan Cutler
10:46 AM Backport #47898 (Resolved): octopus: mon stat prints plain text with -f json
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37705
m...
Nathan Cutler
10:33 AM Backport #47992 (Rejected): mimic: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler
07:00 AM Bug #47951 (Pending Backport): MonClient: mon_host with DNS Round Robin results in 'unable to par...
Kefu Chai

10/25/2020

01:56 PM Bug #47328 (Pending Backport): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Igor Fedotov
04:26 AM Bug #47929: Huge RAM Usage on OSD recovery
Neha Ojha wrote:
> Can you export and upload a copy of the problematic PG via ceph-post-file?
ceph-post-file: 7639cc...
Luis Felipe Domínguez Vega

10/24/2020

04:05 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
... Kefu Chai
07:40 AM Bug #47974 (New): Slow requests due to unhealthy hearbeat - 'OSD::osd_op_tp thread 0x7f7f85903700...
Slow requests observed due to unhealthy heartbeats on osd.2.
/a/sseshasa-2020-10-23_12:25:57-rados-wip-sseshasa-tes...
Sridhar Seshasayee
02:23 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
Alex Litvak wrote:
> Will the fix to it be posted soon? I am building ceph in containers from existing releases, is...
Alex Litvak
02:23 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
Will the fix to it be posted soon? I am building ceph in containers from existing releases, is there a tag I can use... Alex Litvak

10/23/2020

10:08 PM Bug #47929: Huge RAM Usage on OSD recovery
Neha Ojha wrote:
> Can you export and upload a copy of the problematic PG via ceph-post-file?
There are different P...
Luis Felipe Domínguez Vega
08:11 PM Bug #47929: Huge RAM Usage on OSD recovery
Can you export and upload a copy of the problematic PG via ceph-post-file? Neha Ojha
05:42 PM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
This appears to break any sort of resolution of IPv6 addresses from hostnames. This affects qemu's usage of rbd, in ... Troy Ablan
11:30 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
The fix is probably:... Jonas Jelten
07:20 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
Seems like this commit broke this functionality: https://github.com/ceph/ceph/commit/2f075704073ff80f94c70cf79516028d... Wido den Hollander
03:48 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
/a/teuthology-2020-10-23_07:01:02-rados-master-distro-basic-smithi/5550707 Neha Ojha
03:40 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
/a/teuthology-2020-10-23_07:01:02-rados-master-distro-basic-smithi/5550826 Neha Ojha
03:17 PM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
Andrew Mitroshin wrote:
> Injecting into mgr has solved the issue, thanks!
What command did you use to inject int...
Scott Hubbard
02:03 PM Backport #47898: octopus: mon stat prints plain text with -f json
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37705
merged
Yuri Weinstein
01:51 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
http://qa-proxy.ceph.com/teuthology/yuriw-2020-10-20_19:54:27-rados-wip-yuri-testing-2020-10-20-0934-octopus-distro-b... Deepika Upadhyay
10:17 AM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
This still exists in 14.2.11. When you have network issues, you end up with SLOW_OPS with osd_f... Rafal Wadolowski

10/22/2020

10:11 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
/a/kchai-2020-10-21_07:01:44-rados-wip-kefu-testing-2020-10-21-1144-distro-basic-smithi/5545065 Neha Ojha
07:52 PM Bug #47952: Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with filestor...
14.2.12 introduced the following change in https://github.com/ceph/ceph/pull/37474, which is probably the case you ar... Neha Ojha
07:08 PM Bug #47952 (New): Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with fi...
Tried pool creation using ceph-ansibles-4.0 and the replicated pool creation failed with the following error:
Build : Nautilus 1...
Prashant Tambe
06:10 PM Bug #47929: Huge RAM Usage on OSD recovery
No, the export-import approach does not work: on recovery, when that PG needs to be recovered, it gets OOM-killed. Luis Felipe Domínguez Vega
01:52 PM Bug #47929: Huge RAM Usage on OSD recovery
There is some strange behavior, because now on another failing OSD it does not work at all and I executed the export-remove a... Luis Felipe Domínguez Vega
03:47 AM Bug #47929: Huge RAM Usage on OSD recovery

Changed and used the ...
Luis Felipe Domínguez Vega
05:10 PM Bug #47951 (Fix Under Review): MonClient: mon_host with DNS Round Robin results in 'unable to par...
Patrick Donnelly
05:06 PM Bug #47951 (In Progress): MonClient: mon_host with DNS Round Robin results in 'unable to parse ad...
Patrick Donnelly
04:34 PM Bug #47951 (Resolved): MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
I performed a test upgrade to 14.2.12 today on a cluster using IPv6 with Round Robin DNS for mon_host... Wido den Hollander
04:33 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
... Neha Ojha
02:54 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
Deepika Upadhyay wrote:
http://qa-proxy.ceph.com/teuthology/yuriw-2020-10-20_15:30:01-rados-wip-yuri5-testing-2020...
Deepika Upadhyay
01:06 PM Bug #47949 (New): scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
... Deepika Upadhyay
03:53 PM Bug #40777 (New): hit assert in AuthMonitor::update_from_paxos
Neha Ojha
03:22 PM Bug #47767: octopus: setting noscrub crashed osd process
It happened again moments after setting nodeep-scrub:... Dan van der Ster
11:00 AM Bug #46732: teuthology.exceptions.MaxWhileTries: 'check for active or peered' reached maximum tri...
Saw this recently, with the same configuration description:
/a/yuriw-2020-10-20_15:30:01-rados-wip-yuri5-testing-2020-...
Deepika Upadhyay
07:41 AM Bug #47945 (Duplicate): scrubbing failure
description: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/short_
2-recovery-overri...
Deepika Upadhyay
04:59 AM Bug #46845 (Resolved): Newly orchestrated OSD fails with 'unable to find any IPv4 address in netw...
https://github.com/ceph/ceph/pull/37709 Kefu Chai

10/21/2020

10:41 PM Bug #47929: Huge RAM Usage on OSD recovery
Well, try with:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd id> --pgid "<stuck_pg get from log>" ...
Luis Felipe Domínguez Vega
06:15 PM Bug #47929: Huge RAM Usage on OSD recovery
ceph -s: https://pastebin.ubuntu.com/p/3rjd435Sdh/
ceph pg dump: https://pastebin.ubuntu.com/p/THsSd2J33s/
Luis Felipe Domínguez Vega
05:57 PM Bug #47929: Huge RAM Usage on OSD recovery
Can you please provide the output of "ceph -s" and "ceph pg dump"? Neha Ojha
04:29 PM Bug #47929 (New): Huge RAM Usage on OSD recovery
Hi, today my infrastructure provider had a blackout; Ceph then tried to
recover but is in an inconsistent state becaus...
Luis Felipe Domínguez Vega
10:32 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1

Before scrubs were started in the background, not all PGs were in recovery. But somehow in this case the scrubs, p...
David Zafman
10:10 PM Bug #47930 (Resolved): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
... Neha Ojha
10:13 PM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
rados/thrash-erasure-code-overwrites/{bluestore-bitmap ceph clusters/{fixed-2 openstack} fast/fast mon_election/conne... Neha Ojha
10:07 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
/a/teuthology-2020-10-21_07:01:02-rados-master-distro-basic-smithi/5544858 Neha Ojha
04:05 PM Bug #46318: mon_recovery: quorum_status times out
rados/monthrash/{ceph clusters/3-mons mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/blues... Neha Ojha
01:43 PM Bug #47328 (Fix Under Review): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Igor Fedotov
01:19 PM Bug #47328 (In Progress): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Igor Fedotov

10/20/2020

10:15 PM Bug #40777: hit assert in AuthMonitor::update_from_paxos
https://github.com/facebook/rocksdb/issues/5558 shows the same issue. Brad Hubbard
12:28 PM Bug #47907 (Can't reproduce): test_mon_mon: ceph mon stat -f json parse error
Kefu Chai
06:19 AM Bug #47907: test_mon_mon: ceph mon stat -f json parse error
passed at https://pulpito.ceph.com/kchai-2020-10-20_04:38:01-rados-master-distro-basic-smithi/... Kefu Chai
12:08 AM Bug #47907 (Can't reproduce): test_mon_mon: ceph mon stat -f json parse error
... Neha Ojha
08:12 AM Bug #44420 (Fix Under Review): cephadm cluster: "ceph ping mon.*" works fine, but "ceph ping mon....
Mykola Golub

10/19/2020

07:23 PM Feature #47732: Issue health warning if a performance issue is occurring especially for ceph-osd ...
Look at swap to make sure memory isn't overprovisioned to containers, for example.
Do containers swap or crash if...
David Zafman
07:17 PM Feature #47732: Issue health warning if a performance issue is occurring especially for ceph-osd ...
Include in Orchestrator checks? David Zafman
09:01 AM Backport #47899 (In Progress): nautilus: mon stat prints plain text with -f json
Nathan Cutler
08:34 AM Backport #47899 (Resolved): nautilus: mon stat prints plain text with -f json
https://github.com/ceph/ceph/pull/37706 Nathan Cutler
08:58 AM Backport #47898 (In Progress): octopus: mon stat prints plain text with -f json
Nathan Cutler
08:34 AM Backport #47898 (Resolved): octopus: mon stat prints plain text with -f json
https://github.com/ceph/ceph/pull/37705 Nathan Cutler
06:08 AM Bug #46816 (Pending Backport): mon stat prints plain text with -f json
Kefu Chai
06:01 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
... Kefu Chai

10/16/2020

02:45 PM Bug #43795 (Resolved): Ceph tools utilizing "global_[pre_]init" no longer process "early" environ...
Nathan Cutler
02:45 PM Backport #43996 (Rejected): mimic: Ceph tools utilizing "global_[pre_]init" no longer process "ea...
mimic EOL Nathan Cutler

10/15/2020

02:10 PM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
鑫 王 wrote:
> hi Igor,
> Is there any new progress?
Hi!
I haven't managed to reproduce this locally for mast...
Igor Fedotov
10:20 AM Bug #44420: cephadm cluster: "ceph ping mon.*" works fine, but "ceph ping mon.<id>" is broken
Sebastian Wagner wrote:
> might be a cephadm issue.
Indeed, it does seem to happen *only* when the daemon is runn...
Nathan Cutler
08:30 AM Backport #47741: octopus: mon: set session_timeout when adding to session_map
Nathan Cutler wrote:
> @Konstantin: Which master PR are you intending to backport to octopus here?
@Nathan:
look...
Wei-Chung Cheng

10/14/2020

01:57 PM Bug #43887: ceph_test_rados_delete_pools_parallel failure
rados/monthrash/{ceph clusters/3-mons msgr-failures/few msgr/async objectstore/filestore-xfs
rados supported-random-...
Deepika Upadhyay
01:52 PM Bug #24057 (New): cbt fails to copy results to the archive dir
Deepika Upadhyay

10/13/2020

12:02 AM Bug #47838: mon/test_mon_osdmap_prune.sh: first_pinned != trim_to
OK, fails 1 out of 10 times but seems new; need to look more.... Neha Ojha
 
