Activity

From 10/01/2020 to 10/30/2020

10/30/2020

09:47 PM Bug #45243: nautilus: qa/standalone/scrub/osd-scrub-repair.sh fails with osd-scrub-repair.sh:698:...
Haven't seen it recently but there might still be a race somewhere which causes this. Neha Ojha
09:45 PM Bug #44945 (Need More Info): Mon High CPU usage when another mon syncing from it
Which ceph version is this? I'd be curious to know if this is still an issue with Octopus since we improved removed s... Neha Ojha
09:40 PM Bug #44694 (Duplicate): MON_DOWN during cluster setup
https://tracker.ceph.com/issues/45441 seems to be same issue. Neha Ojha
09:38 PM Bug #44643 (Can't reproduce): leaked buffer (alloc from MonClient::handle_auth_request)
Neha Ojha
09:37 PM Bug #44243 (Can't reproduce): memstore make check test fails
Neha Ojha
09:36 PM Bug #44217 (Can't reproduce): Leaked connection (alloc from AsyncMessenger::add_accept)
Neha Ojha
09:27 PM Bug #43915 (Can't reproduce): leaked Session (alloc from OSD::ms_handle_authentication)
Neha Ojha
09:24 PM Bug #43591: /sbin/fstrim can interfere with umount
This might still be a problem, just haven't seen it recently. Neha Ojha
09:16 PM Bug #43185 (Resolved): ceph -s not showing client activity
Neha Ojha
09:15 PM Bug #42921 (Can't reproduce): osd: segmentation fault in PGLog::check
Neha Ojha
09:14 PM Bug #42706 (Can't reproduce): LibRadosList.EnumerateObjectsSplit fails
Neha Ojha
09:13 PM Bug #42186 (Can't reproduce): "2019-10-04T19:31:51.053283+0000 osd.7 (osd.7) 108 : cluster [ERR] ...
Neha Ojha
09:12 PM Bug #42175 (Can't reproduce): _txc_add_transaction error (2) No such file or directory not handl...
Neha Ojha
09:11 PM Bug #41943 (Closed): ceph-mgr fails to report OSD status correctly
Closing for lack of information and also luminous is EOL now. Please feel free to reopen if this reproduces on a rece... Neha Ojha
09:10 PM Bug #41748 (Can't reproduce): log [ERR] : 7.19 caller_ops.size 62 > log size 61
Neha Ojha
09:09 PM Bug #40820 (Closed): standalone/scrub/osd-scrub-test.sh +3 day failed assert
Haven't seen this in a while. Neha Ojha
09:08 PM Bug #40721 (Can't reproduce): backfill caught in loop from block
Neha Ojha
09:07 PM Bug #40522 (Can't reproduce): on_local_recover doesn't touch?
Neha Ojha
09:05 PM Bug #40454 (Can't reproduce): snap_mapper error, scrub gets r -2..repaired
Neha Ojha
09:04 PM Bug #41183 (Resolved): pg autoscale on EC pools
Josh Durgin
06:28 PM Bug #47930 (In Progress): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: ret...
David Zafman
04:16 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
to reproduce, we just need to change, `s/mon client directed command retry: 5/mon client directed command retry: 2 ru... Deepika Upadhyay
10:43 AM Bug #48042 (Fix Under Review): Log "ceph health detail" periodically in cluster log
Prashant D
10:24 AM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
鑫 王 wrote:
> *A slow IO will occur during execution.*
> I have another question why is the field buffer_anon also g...
Igor Fedotov

10/29/2020

11:39 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Based on f7099f72faccb09aea5054c0b428bf89be67141c, "failed to assign global_id" is expected when we are not quorum. T... Neha Ojha
07:55 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
Looking at logs from /a/nojha-2020-10-28_21:12:45-rados:singleton-bluestore-master-distro-basic-smithi/5569512/
We...
Neha Ojha
05:34 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
I am able to reproduce this without msgr failure injection.
rados:singleton-bluestore/{all/cephtool mon_election/...
Neha Ojha
08:08 PM Bug #43193 (Fix Under Review): "ceph ping mon.<id>" cannot work
I can confirm this. There is a detailed explanation at https://github.com/ceph/ceph/pull/37716 but the briefest summa... Nathan Cutler
04:45 PM Bug #48042: Log "ceph health detail" periodically in cluster log
Neha Ojha wrote:
> This will help us spot things like obvious network issues which can lead to racks/hosts down in a...
Neha Ojha
04:39 PM Bug #48042 (Resolved): Log "ceph health detail" periodically in cluster log
This will help us spot things like obvious network issues which can lead to racks/hosts down in a cluster. Also gives... Neha Ojha
03:57 PM Documentation #18986: Need to document monitor health configuration values
This got outdated; we will discuss which metrics are relevant and need to be documented, and update soon. Deepika Upadhyay
08:45 AM Backport #47986: nautilus: MonClient: mon_host with DNS Round Robin results in 'unable to parse a...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37816
m...
Nathan Cutler
03:31 AM Backport #47986 (Resolved): nautilus: MonClient: mon_host with DNS Round Robin results in 'unable...
Brad Hubbard
04:42 AM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
hi Igor,
I can't explain why OSD handles 850K(writing 7426528761/8704), but when load is very low (a client iodep...
Stellar Wang
02:43 AM Bug #48028: ceph-mon always suffer lots of slow ops from v14.2.9
Yao Ning wrote:
> root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
> cluster:
> id: 299a04ba-dd3e-...
Yao Ning

10/28/2020

06:42 PM Bug #48033 (Closed): mon: after unrelated crash: handle_auth_request failed to assign global_id; ...
ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)
I have tried to extensively se...
Peter Gervai
06:17 PM Backport #47986: nautilus: MonClient: mon_host with DNS Round Robin results in 'unable to parse a...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37816
merged
Yuri Weinstein
05:12 PM Bug #45190: osd dump times out
... Neha Ojha
05:00 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
/a/teuthology-2020-10-28_07:01:02-rados-master-distro-basic-smithi/5567283 Neha Ojha
04:59 PM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
rados/singleton/{all/mon-config mon_election/connectivity msgr-failures/many msgr/async objectstore/bluestore-comp-lz... Neha Ojha
04:57 PM Bug #48030 (Resolved): mon/caps.sh: mgr command(pg dump) waits forever due to rados_mon_op_timeou...
... Neha Ojha
04:52 PM Bug #48029 (New): Exiting scrub checking -- not all pgs scrubbed.
... Neha Ojha
03:28 PM Bug #48028: ceph-mon always suffer lots of slow ops from v14.2.9
Yao Ning wrote:
> root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
> cluster:
> id: 299a04ba-dd3e-...
Yao Ning
03:21 PM Bug #48028 (Won't Fix - EOL): ceph-mon always suffer lots of slow ops from v14.2.9
root@worker-2:~# docker exec ceph-mon-worker-2 ceph -s
cluster:
id: 299a04ba-dd3e-43a7-af17-628190cf742f
...
Yao Ning
01:16 PM Bug #48026 (New): Mon crashes when adding 4th OSD
*Context*: I'm running Ceph Octopus 15.2.5 (the latest as of this bug) using Rook on a toy Kubernetes cluster of two ... Lalit Maganti

10/27/2020

10:49 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1

We only need 1 pool with 1 pg, if we orchestrate carefully. The existing test is more like a shotgun, sending lots...
David Zafman
08:14 PM Bug #47930 (Triaged): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
/a/teuthology-2020-10-21_07:01:02-rados-master-distro-basic-smithi/5544900 - here the failure occurred because the la... Neha Ojha
09:06 PM Bug #47952: Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with filestor...
Neha Ojha wrote:
> 14.2.12 introduced the following change in https://github.com/ceph/ceph/pull/37474, which is prob...
Prashant Tambe
09:34 AM Backport #47826 (In Progress): octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: ret...
Nathan Cutler
09:34 AM Backport #47741 (Duplicate): octopus: mon: set session_timeout when adding to session_map
Nathan Cutler

10/26/2020

09:11 PM Backport #47994 (In Progress): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler
10:34 AM Backport #47994 (Resolved): octopus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
https://github.com/ceph/ceph/pull/37819 Nathan Cutler
09:05 PM Backport #47993 (In Progress): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler
10:34 AM Backport #47993 (Resolved): nautilus: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
https://github.com/ceph/ceph/pull/37818 Nathan Cutler
09:04 PM Backport #47987 (In Progress): octopus: MonClient: mon_host with DNS Round Robin results in 'unab...
Nathan Cutler
10:32 AM Backport #47987 (Resolved): octopus: MonClient: mon_host with DNS Round Robin results in 'unable ...
https://github.com/ceph/ceph/pull/37817 Nathan Cutler
09:02 PM Backport #47986 (In Progress): nautilus: MonClient: mon_host with DNS Round Robin results in 'una...
Nathan Cutler
10:32 AM Backport #47986 (Resolved): nautilus: MonClient: mon_host with DNS Round Robin results in 'unable...
https://github.com/ceph/ceph/pull/37816 Nathan Cutler
08:39 PM Backport #47825 (In Progress): nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: re...
Nathan Cutler
05:28 PM Bug #44981 (Resolved): rados/test_envlibrados_for_rocksdb.sh build failure (seen in nautilus)
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
05:27 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
sentry event: https://sentry.ceph.com/organizations/ceph/issues/10/events/2bdb1a2346cf4325b1bfaa7adf609f15/?project=2... Neha Ojha
11:21 AM Bug #47974: Slow requests due to unhealthy hearbeat - 'OSD::osd_op_tp thread 0x7f7f85903700' had ...
Did you perform any large pool/PG removals recently? Or may be some data rebalancing that could result in PG migratio... Igor Fedotov
11:08 AM Backport #45781 (Rejected): mimic: rados/test_envlibrados_for_rocksdb.sh build failure (seen in n...
mimic EOL Nathan Cutler
10:46 AM Backport #47898 (Resolved): octopus: mon stat prints plain text with -f json
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37705
m...
Nathan Cutler
10:33 AM Backport #47992 (Rejected): mimic: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Nathan Cutler
07:00 AM Bug #47951 (Pending Backport): MonClient: mon_host with DNS Round Robin results in 'unable to par...
Kefu Chai

10/25/2020

01:56 PM Bug #47328 (Pending Backport): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Igor Fedotov
04:26 AM Bug #47929: Huge RAM Usage on OSD recovery
Neha Ojha wrote:
> Can you export and upload a copy of the problematic PG via ceph-post-file?
ceph-post-file: 7639cc...
Luis Felipe Domínguez Vega

10/24/2020

04:05 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
... Kefu Chai
07:40 AM Bug #47974 (New): Slow requests due to unhealthy hearbeat - 'OSD::osd_op_tp thread 0x7f7f85903700...
Slow requests observed due to unhealthy heartbeats on osd.2.
/a/sseshasa-2020-10-23_12:25:57-rados-wip-sseshasa-tes...
Sridhar Seshasayee
02:23 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
Alex Litvak wrote:
> Will the fix to it be posted soon? I am building ceph in containers from existing releases, is...
Alex Litvak
02:23 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
Will the fix to it be posted soon? I am building ceph in containers from existing releases, is there a tag I can use... Alex Litvak

10/23/2020

10:08 PM Bug #47929: Huge RAM Usage on OSD recovery
Neha Ojha wrote:
> Can you export and upload a copy of the problematic PG via ceph-post-file?
there are different P...
Luis Felipe Domínguez Vega
08:11 PM Bug #47929: Huge RAM Usage on OSD recovery
Can you export and upload a copy of the problematic PG via ceph-post-file? Neha Ojha
05:42 PM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
This appears to break any sort of resolution of IPv6 addresses from hostnames. This affects qemu's usage of rbd, in ... Troy Ablan
11:30 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
The fix is probably:... Jonas Jelten
07:20 AM Bug #47951: MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
Seems like this commit broke this functionality: https://github.com/ceph/ceph/commit/2f075704073ff80f94c70cf79516028d... Wido den Hollander
03:48 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
/a/teuthology-2020-10-23_07:01:02-rados-master-distro-basic-smithi/5550707 Neha Ojha
03:40 PM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
/a/teuthology-2020-10-23_07:01:02-rados-master-distro-basic-smithi/5550826 Neha Ojha
03:17 PM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
Andrew Mitroshin wrote:
> Injecting into mgr has solved the issue, thanks!
What command did you use to inject int...
Scott Hubbard
02:03 PM Backport #47898: octopus: mon stat prints plain text with -f json
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37705
merged
Yuri Weinstein
01:51 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
http://qa-proxy.ceph.com/teuthology/yuriw-2020-10-20_19:54:27-rados-wip-yuri-testing-2020-10-20-0934-octopus-distro-b... Deepika Upadhyay
10:17 AM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
Still exists in 14.2.11. When you have network issues, you end up with SLOW_OPS with osd_f... Rafal Wadolowski

10/22/2020

10:11 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
/a/kchai-2020-10-21_07:01:44-rados-wip-kefu-testing-2020-10-21-1144-distro-basic-smithi/5545065 Neha Ojha
07:52 PM Bug #47952: Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with filestor...
14.2.12 introduced the following change in https://github.com/ceph/ceph/pull/37474, which is probably the case you ar... Neha Ojha
07:08 PM Bug #47952 (New): Replicated pool creation fails Nautilus 14.2.12 build when cluster runs with fi...
Tried pool creation using ceph-ansibles-4.0 and the replication pool failed with the following error:
Build : Nautilus 1...
Prashant Tambe
06:10 PM Bug #47929: Huge RAM Usage on OSD recovery
No, the export-import approach does not work: on recovery, when that PG needs to be recovered, the OSD is OOM-killed. Luis Felipe Domínguez Vega
01:52 PM Bug #47929: Huge RAM Usage on OSD recovery
There is some strange behavior, because now on another failing OSD it does not work at all and I execute the export-remove a... Luis Felipe Domínguez Vega
03:47 AM Bug #47929: Huge RAM Usage on OSD recovery

Changed and used the ...
Luis Felipe Domínguez Vega
05:10 PM Bug #47951 (Fix Under Review): MonClient: mon_host with DNS Round Robin results in 'unable to par...
Patrick Donnelly
05:06 PM Bug #47951 (In Progress): MonClient: mon_host with DNS Round Robin results in 'unable to parse ad...
Patrick Donnelly
04:34 PM Bug #47951 (Resolved): MonClient: mon_host with DNS Round Robin results in 'unable to parse addrs'
I performed a test upgrade to 14.2.12 today on a cluster using IPv6 with Round Robin DNS for mon_host... Wido den Hollander
04:33 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
... Neha Ojha
02:54 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
Deepika Upadhyay wrote:
http://qa-proxy.ceph.com/teuthology/yuriw-2020-10-20_15:30:01-rados-wip-yuri5-testing-2020...
Deepika Upadhyay
01:06 PM Bug #47949 (New): scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
... Deepika Upadhyay
03:53 PM Bug #40777 (New): hit assert in AuthMonitor::update_from_paxos
Neha Ojha
03:22 PM Bug #47767: octopus: setting noscrub crashed osd process
It happened again moments after setting nodeep-scrub:... Dan van der Ster
11:00 AM Bug #46732: teuthology.exceptions.MaxWhileTries: 'check for active or peered' reached maximum tri...
saw this recently, with same configuration description:
/a/yuriw-2020-10-20_15:30:01-rados-wip-yuri5-testing-2020-...
Deepika Upadhyay
07:41 AM Bug #47945 (Duplicate): scrubbing failure
description: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/short_
2-recovery-overri...
Deepika Upadhyay
04:59 AM Bug #46845 (Resolved): Newly orchestrated OSD fails with 'unable to find any IPv4 address in netw...
https://github.com/ceph/ceph/pull/37709 Kefu Chai

10/21/2020

10:41 PM Bug #47929: Huge RAM Usage on OSD recovery
Well try with:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd id> --pgid "<stuck_pg get from log>" ...
Luis Felipe Domínguez Vega
06:15 PM Bug #47929: Huge RAM Usage on OSD recovery
ceph -s: https://pastebin.ubuntu.com/p/3rjd435Sdh/
ceph pg dump: https://pastebin.ubuntu.com/p/THsSd2J33s/
Luis Felipe Domínguez Vega
05:57 PM Bug #47929: Huge RAM Usage on OSD recovery
Can you please provide the output of "ceph -s" and "ceph pg dump"? Neha Ojha
04:29 PM Bug #47929 (New): Huge RAM Usage on OSD recovery
Hi, today my infra provider had a blackout, then Ceph tried to
recover but is in an inconsistent state becaus...
Luis Felipe Domínguez Vega
10:32 PM Bug #47930: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1

Before scrubs were started in the background, not all PGs were in recovery. But somehow in this case the scrubs, p...
David Zafman
10:10 PM Bug #47930 (Resolved): scrub/osd-recovery-scrub.sh: TEST_recovery_scrub: wait_background: return 1
... Neha Ojha
10:13 PM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
rados/thrash-erasure-code-overwrites/{bluestore-bitmap ceph clusters/{fixed-2 openstack} fast/fast mon_election/conne... Neha Ojha
10:07 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
/a/teuthology-2020-10-21_07:01:02-rados-master-distro-basic-smithi/5544858 Neha Ojha
04:05 PM Bug #46318: mon_recovery: quorum_status times out
rados/monthrash/{ceph clusters/3-mons mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/blues... Neha Ojha
01:43 PM Bug #47328 (Fix Under Review): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Igor Fedotov
01:19 PM Bug #47328 (In Progress): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
Igor Fedotov

10/20/2020

10:15 PM Bug #40777: hit assert in AuthMonitor::update_from_paxos
https://github.com/facebook/rocksdb/issues/5558 shows the same issue. Brad Hubbard
12:28 PM Bug #47907 (Can't reproduce): test_mon_mon: ceph mon stat -f json parse error
Kefu Chai
06:19 AM Bug #47907: test_mon_mon: ceph mon stat -f json parse error
passed at https://pulpito.ceph.com/kchai-2020-10-20_04:38:01-rados-master-distro-basic-smithi/... Kefu Chai
12:08 AM Bug #47907 (Can't reproduce): test_mon_mon: ceph mon stat -f json parse error
... Neha Ojha
08:12 AM Bug #44420 (Fix Under Review): cephadm cluster: "ceph ping mon.*" works fine, but "ceph ping mon....
Mykola Golub

10/19/2020

07:23 PM Feature #47732: Issue health warning if a performance issue is occurring especially for ceph-osd ...
Look at swap to make sure memory isn't over provisioned to containers, for example.
Do containers swap or crash if...
David Zafman
07:17 PM Feature #47732: Issue health warning if a performance issue is occurring especially for ceph-osd ...
Include in Orchestrator checks? David Zafman
09:01 AM Backport #47899 (In Progress): nautilus: mon stat prints plain text with -f json
Nathan Cutler
08:34 AM Backport #47899 (Resolved): nautilus: mon stat prints plain text with -f json
https://github.com/ceph/ceph/pull/37706 Nathan Cutler
08:58 AM Backport #47898 (In Progress): octopus: mon stat prints plain text with -f json
Nathan Cutler
08:34 AM Backport #47898 (Resolved): octopus: mon stat prints plain text with -f json
https://github.com/ceph/ceph/pull/37705 Nathan Cutler
06:08 AM Bug #46816 (Pending Backport): mon stat prints plain text with -f json
Kefu Chai
06:01 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
... Kefu Chai

10/16/2020

02:45 PM Bug #43795 (Resolved): Ceph tools utilizing "global_[pre_]init" no longer process "early" environ...
Nathan Cutler
02:45 PM Backport #43996 (Rejected): mimic: Ceph tools utilizing "global_[pre_]init" no longer process "ea...
mimic EOL Nathan Cutler

10/15/2020

02:10 PM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
鑫 王 wrote:
> hi Igor,
> Is there any new progress?
Hi!
I haven't managed to reproduce this locally for mast...
Igor Fedotov
10:20 AM Bug #44420: cephadm cluster: "ceph ping mon.*" works fine, but "ceph ping mon.<id>" is broken
Sebastian Wagner wrote:
> might be a cephadm issue.
Indeed, it does seem to happen *only* when the daemon is runn...
Nathan Cutler
08:30 AM Backport #47741: octopus: mon: set session_timeout when adding to session_map
Nathan Cutler wrote:
> @Konstantin: Which master PR are you intending to backport to octopus here?
@Nathan:
look...
Wei-Chung Cheng

10/14/2020

01:57 PM Bug #43887: ceph_test_rados_delete_pools_parallel failure
rados/monthrash/{ceph clusters/3-mons msgr-failures/few msgr/async objectstore/filestore-xfs
rados supported-random-...
Deepika Upadhyay
01:52 PM Bug #24057 (New): cbt fails to copy results to the archive dir
Deepika Upadhyay

10/13/2020

12:02 AM Bug #47838: mon/test_mon_osdmap_prune.sh: first_pinned != trim_to
ok, fails 1 out of 10 times but seems new, need to look more.... Neha Ojha

10/12/2020

11:45 PM Bug #47838: mon/test_mon_osdmap_prune.sh: first_pinned != trim_to
Did not reproduce here: https://pulpito.ceph.com/nojha-2020-10-12_19:42:27-rados:monthrash-master-distro-basic-smithi... Neha Ojha
05:08 PM Bug #47838 (In Progress): mon/test_mon_osdmap_prune.sh: first_pinned != trim_to
... Neha Ojha
11:33 PM Bug #24057: cbt fails to copy results to the archive dir
http://qa-proxy.ceph.com/teuthology/yuriw-2020-10-09_19:36:20-rados-wip-yuri-testing-2020-10-09-1112-octopus-distro-b... Deepika Upadhyay
12:40 PM Bug #46816 (Fix Under Review): mon stat prints plain text with -f json
Nathan Cutler
11:35 AM Bug #46816: mon stat prints plain text with -f json
https://github.com/ceph/ceph/pull/37635 Joao Eduardo Luis
09:47 AM Bug #46816 (Triaged): mon stat prints plain text with -f json
Joao Eduardo Luis

10/10/2020

11:30 AM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/a/kchai-2020-10-10_09:47:31-rados-wip-kefu-testing-2020-10-09-1210-distro-basic-smithi/5512894 Kefu Chai
08:59 AM Backport #47826 (Resolved): octopus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
https://github.com/ceph/ceph/pull/37853 Nathan Cutler
08:59 AM Backport #47825 (Resolved): nautilus: osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
https://github.com/ceph/ceph/pull/37815 Nathan Cutler
08:54 AM Bug #47813 (Fix Under Review): osd op age is 4294967296
Every few seconds on a 14.2.11 OSD I see an op with age close to 2^32:... Dan van der Ster
03:52 AM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
hi Igor,
Is there any new progress?
Stellar Wang

10/09/2020

03:04 PM Backport #47363 (Resolved): octopus: pgs inconsistent, union_shard_errors=missing
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37048
m...
Nathan Cutler
03:39 AM Bug #47673: cephfs 4k randwrite + EC pool(2+1) + single node all OSDs OOM
Igor Fedotov wrote:
> Would you please collect perf counter dumps for both running benchmark (e.g. ) and on its comp...
Stellar Wang

10/08/2020

09:57 PM Bug #47804 (New): EC backend implementation isn't optimal when handling 4k overwrites
EC backend performs redundant read/write ops when handling partial 4k-aligned overwrite.
E.g. there is 64K object i...
Igor Fedotov
06:54 PM Backport #47363: octopus: pgs inconsistent, union_shard_errors=missing
Mykola Golub wrote:
> https://github.com/ceph/ceph/pull/37048
merged
Yuri Weinstein
06:44 PM Bug #46405 (Pending Backport): osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
Neha Ojha
05:54 PM Bug #45190: osd dump times out
/a/yuriw-2020-10-05_22:17:06-rados-wip-yuri7-testing-2020-10-05-1338-octopus-distro-basic-smithi/5500126/teuthology.l... Deepika Upadhyay
05:22 PM Bug #45948: ceph_test_rados_delete_pools_parallel failed with error -2 on nautilus
rados/monthrash/{ceph clusters/3-mons msgr-failures/few msgr/async objectstore/filestore-xfs
rados supported-ran...
Deepika Upadhyay
05:13 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
description: rados/singleton/{all/thrash_cache_writeback_proxy_none msgr-failures/few
msgr/async-v2only objectst...
Deepika Upadhyay

10/07/2020

09:34 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
rados/singleton/{all/thrash_cache_writeback_proxy_none mon_election/classic msgr-failures/few msgr/async objectstore/... Neha Ojha
11:44 AM Feature #47775: limit osd_pglog size by memory
I've seen that octopus has a new option osd_target_pg_log_entries_per_osd. This looks good. I would therefore amend m... Dan van der Ster
11:17 AM Feature #47775 (New): limit osd_pglog size by memory
We have an S3 cluster with OSDs running out of memory due to the large amount of ram needed to hold 3000 pglog entrie... Dan van der Ster

10/06/2020

08:37 PM Bug #44352: pool listings are slow after deleting objects
We compacted OSDs from time to time and it helped at some time. We then moved .rgw.root pool just to SSD drives (it ... Serg Protsun
03:00 PM Bug #47767: octopus: setting noscrub crashed osd process
Most likely a regression caused by https://github.com/ceph/ceph/pull/36292. Neha Ojha
02:41 PM Bug #47767: octopus: setting noscrub crashed osd process
Sorry about the formatting -- I think it's still legible.
I found other osds with missing object errors also trigg...
Dan van der Ster
02:20 PM Bug #47767 (Resolved): octopus: setting noscrub crashed osd process
We just had a crash of one osd (out of ~1200) moments after we set noscrub and nodeep-scrub on the cluster.
Here...
Dan van der Ster
12:58 PM Feature #47766 (New): crushtool compile support from json crush map
When programmatically manipulating a CRUSH map, JSON is often used. Unfortunately, "crushtool" does not support compi... Sébastien Han

10/05/2020

10:00 PM Feature #42659 (Duplicate): add a health_warn when mon_osd_report_timeout <= mon_osd_report_timeout
Neha Ojha
09:56 PM Feature #42659 (Resolved): add a health_warn when mon_osd_report_timeout <= mon_osd_report_timeout
Neha Ojha
10:00 PM Bug #40668 (Resolved): mon_osd_report_timeout should not be allowed to be less than 2x the value ...
Neha Ojha
09:57 PM Bug #40668 (Duplicate): mon_osd_report_timeout should not be allowed to be less than 2x the value...
Neha Ojha
09:31 PM Backport #47741: octopus: mon: set session_timeout when adding to session_map
@Konstantin: Which master PR are you intending to backport to octopus here? Nathan Cutler
02:27 AM Backport #47741 (Duplicate): octopus: mon: set session_timeout when adding to session_map
Konstantin Shalygin
03:46 PM Backport #47748 (In Progress): nautilus: mon: set session_timeout when adding to session_map
Wei-Chung Cheng
01:13 PM Backport #47748 (Resolved): nautilus: mon: set session_timeout when adding to session_map
https://github.com/ceph/ceph/pull/37554 Nathan Cutler
03:46 PM Backport #47747 (In Progress): octopus: mon: set session_timeout when adding to session_map
Wei-Chung Cheng
01:13 PM Backport #47747 (Resolved): octopus: mon: set session_timeout when adding to session_map
https://github.com/ceph/ceph/pull/37553 Nathan Cutler
03:16 PM Documentation #47754 (New): Orchestrator implementation status table is old
https://docs.ceph.com/en/latest/mgr/orchestrator/#current-implementation-status
This table, showing the current im...
Zac Dover
03:11 PM Documentation #47522 (Closed): Document "ceph df detail"
Zac Dover
03:09 PM Documentation #47523 (In Progress): ceph df documentation is outdated
Zac Dover
12:44 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
Seems like the leader has been down for a long time: ... Deepika Upadhyay
06:17 AM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
http://qa-proxy.ceph.com/teuthology/yuriw-2020-09-16_23:57:37-rados-wip-yuri8-testing-2020-09-16-2220-octopus-distro-... Deepika Upadhyay

10/04/2020

05:48 AM Bug #47697 (Pending Backport): mon: set session_timeout when adding to session_map
Kefu Chai

10/03/2020

02:09 AM Bug #37532 (Resolved): mon: expected_num_objects warning triggers on bluestore-only setups
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:05 AM Bug #44815 (Resolved): Pool stats increase after PG merged (PGMap::apply_incremental doesn't subt...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:04 AM Bug #46024 (Resolved): larger osd_scrub_max_preemptions values cause Floating point exception
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:04 AM Bug #46216 (Resolved): mon: log entry with garbage generated by bad memory access
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:03 AM Bug #46705 (Resolved): Negative peer_num_objects crashes osd
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:03 AM Bug #46824 (Resolved): "No such file or directory" when exporting or importing a pool if locator ...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:02 AM Bug #47159 (Resolved): add ability to clean_temps in osdmaptool
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:02 AM Bug #47309 (Resolved): mon/mon-last-epoch-clean.sh failure
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
01:36 AM Backport #47346 (Resolved): octopus: mon/mon-last-epoch-clean.sh failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37349
m...
Nathan Cutler
01:36 AM Backport #47251 (Resolved): octopus: add ability to clean_temps in osdmaptool
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37348
m...
Nathan Cutler
01:25 AM Backport #47345 (Resolved): nautilus: mon/mon-last-epoch-clean.sh failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37478
m...
Nathan Cutler
01:25 AM Backport #47250 (Resolved): nautilus: add ability to clean_temps in osdmaptool
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37477
m...
Nathan Cutler
01:24 AM Backport #46965 (Resolved): nautilus: Pool stats increase after PG merged (PGMap::apply_increment...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37476
m...
Nathan Cutler
01:24 AM Backport #46935 (Resolved): nautilus: "No such file or directory" when exporting or importing a p...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37475
m...
Nathan Cutler
01:24 AM Backport #46738 (Resolved): nautilus: mon: expected_num_objects warning triggers on bluestore-onl...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37474
m...
Nathan Cutler
01:23 AM Backport #46710 (Resolved): nautilus: Negative peer_num_objects crashes osd
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37473
m...
Nathan Cutler
01:23 AM Backport #46262 (Resolved): nautilus: larger osd_scrub_max_preemptions values cause Floating poin...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37470
m...
Nathan Cutler
01:23 AM Backport #46461 (Resolved): nautilus: pybind/mgr/balancer: should use "==" and "!=" for comparing...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37471
m...
Nathan Cutler

10/02/2020

10:09 PM Bug #40367 (Can't reproduce): "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-n...
Neha Ojha
10:07 PM Bug #40081 (Closed): mon: luminous crash attempting to decode maps after nautilus quorum has been...
Doesn't apply to nautilus and future releases. Luminous and mimic are EOL. Neha Ojha
10:02 PM Bug #40029 (Resolved): ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(Cep...
Neha Ojha
10:02 PM Bug #40029 (Rejected): ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(Cep...
This only seems to be a problem with luminous and mimic, which are EOL now. Neha Ojha
09:56 PM Bug #39366 (Can't reproduce): ClsLock.TestRenew failure
Neha Ojha
09:55 PM Bug #38513 (Rejected): luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item)...
Luminous is EOL. Neha Ojha
09:54 PM Bug #38402 (Can't reproduce): ceph-objectstore-tool on down osd w/ not enough in osds
Neha Ojha
09:53 PM Bug #38375 (Need More Info): OSD segmentation fault on rbd create
Seems like we have lost the ceph-post-file due to one of the lab incidents. Neha Ojha
09:46 PM Bug #38064 (Duplicate): librados::OPERATION_FULL_TRY not completely implemented, test LibRadosAio...
Josh Durgin
09:42 PM Bug #35974: Apparent export-diff/import-diff corruption
Looking at this again, it seems like a potential bug when reading from replicas and encountering an EIO - this should... Josh Durgin
09:18 PM Bug #46318: mon_recovery: quorum_status times out
Joao, are you working on a fix for this? Neha Ojha
09:15 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
Haven't seen this in a while. Neha Ojha
09:11 PM Bug #44362 (Can't reproduce): osd: uninitialized memory in sendmsg
Neha Ojha
09:03 PM Bug #46405 (Fix Under Review): osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning: return 1
Neha Ojha
06:30 PM Feature #47732 (New): Issue health warning if a performance issue is occurring especially for cep...

This feature would identify a false network ping warning which might occur with a very busy ceph-osd(s).
The mon...
David Zafman
06:26 PM Backport #47346: octopus: mon/mon-last-epoch-clean.sh failure
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37349
merged
Yuri Weinstein
06:25 PM Backport #47251: octopus: add ability to clean_temps in osdmaptool
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37348
merged
Yuri Weinstein
04:30 PM Bug #45191: erasure-code/test-erasure-eio.sh: TEST_ec_single_recovery_error fails
http://qa-proxy.ceph.com/teuthology/yuriw-2020-10-01_17:46:11-rados-wip-yuri5-testing-2020-10-01-0834-octopus-distro-... Deepika Upadhyay

10/01/2020

11:04 PM Backport #46287 (Rejected): nautilus: mon: log entry with garbage generated by bad memory access
Not required for Nautilus. Patrick Donnelly
09:18 PM Bug #47508 (In Progress): Multiple read errors cause repeated entry/exit recovery for each error
David Zafman
06:10 PM Bug #47692: qa/standalone/osd/osd-backfill-stats.sh TEST_backfill_sizeup wait_for_clean timeout

qa/standalone/osd/osd-backfill-stats.sh:213: TEST_backfill_sizeup_out
https://pulpito.ceph.com/dzafman-2020-09...
David Zafman
05:18 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
/a/teuthology-2020-10-01_07:01:02-rados-master-distro-basic-smithi/5486214 Neha Ojha
05:14 PM Bug #47719 (Resolved): api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
... Neha Ojha
04:54 PM Backport #47345: nautilus: mon/mon-last-epoch-clean.sh failure
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37478
merged
Yuri Weinstein
04:54 PM Backport #47250: nautilus: add ability to clean_temps in osdmaptool
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37477
merged
Yuri Weinstein
04:53 PM Backport #46965: nautilus: Pool stats increase after PG merged (PGMap::apply_incremental doesn't ...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37476
merged
Yuri Weinstein
04:53 PM Backport #46935: nautilus: "No such file or directory" when exporting or importing a pool if loca...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37475
merged
Yuri Weinstein
04:52 PM Backport #46738: nautilus: mon: expected_num_objects warning triggers on bluestore-only setups
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37474
merged
Yuri Weinstein
04:51 PM Backport #46710: nautilus: Negative peer_num_objects crashes osd
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37473
merged
Yuri Weinstein
04:51 PM Backport #46262: nautilus: larger osd_scrub_max_preemptions values cause Floating point exception
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37470
merged
Yuri Weinstein
04:38 PM Backport #46461: nautilus: pybind/mgr/balancer: should use "==" and "!=" for comparing strings
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37471
merged
Yuri Weinstein
08:58 AM Bug #47697: mon: set session_timeout when adding to session_map
Not to my knowledge. When it happens, it is mostly transparent to the user -- the peer reopens the socket and attemp... Ilya Dryomov
02:22 AM Bug #47697: mon: set session_timeout when adding to session_map
Were there upstream QA tests that failed because of this? How did you learn of this problem? Patrick Donnelly
07:56 AM Bug #47712 (New): hdd pg's migrating when converting one ssd class osd to dmcrypt
I have pgs of hdd pools remapping when I take out an ssd osd.
change crush reweight of ssd osd 33 to 0.0
[@ce...
none none
 
