Activity

From 12/13/2020 to 01/11/2021

01/11/2021

11:58 PM Bug #48789 (In Progress): qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
David Zafman
07:41 PM Bug #48789 (Triaged): qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
related to https://github.com/ceph/ceph/pull/38651 Neha Ojha
10:54 PM Bug #48842: qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed

This can easily be explained by a slow test machine. The 10 second sleep wasn't enough time to get recovery initia...
David Zafman
09:35 PM Bug #48842: qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed
... Neha Ojha
07:39 PM Bug #48842 (Resolved): qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed
... Neha Ojha
09:57 PM Bug #48843 (Resolved): Get more parallel scrubs within osd_max_scrubs limits

When a reservation failure prevents a PG from scrubbing, other possible scrubbable PGs aren't tried.
David Zafman
07:43 PM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
rados/singleton-nomsgr/{all/recovery-unfound-found mon_election/connectivity rados supported-random-distro$/{centos_8... Neha Ojha
07:32 PM Bug #48841 (Resolved): test_turn_off_module: wait_until_equal timed out
... Neha Ojha
07:10 PM Bug #48840 (Closed): Octopus: Assert failure: test_ceph_osd_pool_create_utf8
FAIL: test_rados.TestCommand.test_ceph_osd_pool_create_utf8
test_ceph_osd_pool_create_utf8 should also work if a c...
Deepika Upadhyay
06:56 PM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
Myoungwon Oh wrote:
> Hm... I can't find any clues in /a/nojha-2021-01-07_00\:06\:49-rados-master-distro-basic-smith...
Neha Ojha
06:09 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
/ceph/teuthology-archive/yuriw-2021-01-08_16:38:07-rados-wip-yuri4-testing-2021-01-07-1041-octopus-distro-basic-smith... Deepika Upadhyay
06:08 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
-/ceph/teuthology-archive/yuriw-2021-01-08_16:38:07-rados-wip-yuri4-testing-2021-01-07-1041-octopus-distro-basic-smit... Deepika Upadhyay
04:06 AM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
http://pulpito.front.sepia.ceph.com:80/gregf-2021-01-09_02:02:11-rados-wip-stretch-updates-108-2-distro-basic-smithi/... Greg Farnum
06:05 PM Bug #48793 (Triaged): out of order op
details in https://tracker.ceph.com/issues/48777#note-7 Neha Ojha
10:16 AM Bug #48793: out of order op
http://qa-proxy.ceph.com/teuthology/ideepika-2020-12-18_14:27:53-rados:thrash-erasure-code-master-distro-basic-smith... Deepika Upadhyay
06:05 PM Bug #48485: osd thrasher timeout
Seems related; adding here, will verify later.
/ceph/teuthology-archive/yuriw-2021-01-08_16:38:07-rados-wip-yuri4-tes...
Deepika Upadhyay
04:50 PM Backport #48482 (Resolved): nautilus: PG::_delete_some isn't optimal iterating objects
Igor Fedotov
04:43 PM Backport #48482: nautilus: PG::_delete_some isn't optimal iterating objects
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38478
merged
Yuri Weinstein
11:51 AM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
... Kefu Chai
10:52 AM Bug #48821: osd crash in OSD::heartbeat when dereferencing null session
The fix seems just to check that the session pointer is not null before trying to use it. If the problem is not deepe... Mykola Golub
10:48 AM Bug #48821 (Resolved): osd crash in OSD::heartbeat when dereferencing null session
For an unhealthy (unstable) cluster with flip-flopping osds we observed crashes like this:... Mykola Golub
08:00 AM Bug #48212: poollast_epoch_clean floor is stuck after pg merging
We got a little relief by reducing mon_osdmap_full_prune_min from the default 10,000 to 1,000 but osdmaps still grew ... David Herselman
03:57 AM Bug #48503: scrub stat mismatch on bytes
http://pulpito.front.sepia.ceph.com:80/gregf-2021-01-09_02:02:11-rados-wip-stretch-updates-108-2-distro-basic-smithi/... Greg Farnum
03:40 AM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
http://pulpito.front.sepia.ceph.com:80/gregf-2021-01-09_02:02:11-rados-wip-stretch-updates-108-2-distro-basic-smithi/... Greg Farnum

01/09/2021

10:55 PM Bug #46978: OSD: shutdown of a OSD Host causes slow requests
Hi Manuel,
Would you be able to test a patch for this issue?
If so, what OS and Ceph packages/version do you run?
...
Mauricio Oliveira
04:47 PM Bug #48721: tcmalloc doesn't release memory
Josh Durgin wrote:
> Those stats show the memory is mostly used by the mon or released by tcmalloc but the kernel ha...
Seena Fallah
04:16 PM Bug #48212: poollast_epoch_clean floor is stuck after pg merging
Reproduced on another merge cycle. Restarting only the leading mon, waiting 5 minutes and then creating a new epoch r... David Herselman
01:13 PM Bug #48212: poollast_epoch_clean floor is stuck after pg merging
Apologies, the leading monitor bit is misleading. The osdmap data is immediately trimmed the moment the last monito... David Herselman
12:56 PM Bug #48212: poollast_epoch_clean floor is stuck after pg merging
We're running Ceph Octopus 15.2.8 with the same problem. Our monitors ran out of space after enabling autoscale as os... David Herselman
06:53 AM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
Hm... I can't find any clues in /a/nojha-2021-01-07_00\:06\:49-rados-master-distro-basic-smithi/5761073.
Can we repr...
Myoungwon Oh

01/08/2021

10:31 PM Bug #48721: tcmalloc doesn't release memory
Those stats show the memory is mostly used by the mon or released by tcmalloc but the kernel hasn't reclaimed it.
...
Josh Durgin
10:30 PM Bug #48775 (Duplicate): FAILED ceph_assert(is_primary()) in PG::scrub()
Neha Ojha
10:27 PM Bug #48732 (Need More Info): Marking OSDs out causes mon daemons to crash following tcmalloc: lar...
It would be great if you could share a reproducer for this, or reproduce it and capture monitor logs with debugging enabled. Neha Ojha
10:21 PM Bug #48790: rados/multimon: MON_DOWN in mon_election/connectivity with mon_clock_no_skews
Greg, could you please take a look and see if my theory makes sense. Neha Ojha
10:17 PM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
Myoungwon Oh: I am assigning this to you for more inputs. Neha Ojha
12:26 AM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
Xie Xingguo/Myoungwon Oh: this seems to be a new regression in master, do you know what could have caused it? I don't s... Neha Ojha
12:11 AM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
/a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smithi/5761073 Neha Ojha
07:07 PM Bug #48536 (Rejected): ceph tool: osd crush create-or-move cannot accept multiple crush buckets
I'm dumb; this works just fine with the expected syntax:... Greg Farnum
12:59 PM Bug #48789: qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
I suspect this is related to the changes in column family in RocksDB.
(Maybe https://github.com/ceph/ceph/pull/38310...
Ronen Friedman
03:50 AM Bug #48789: qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
/a//kchai-2021-01-07_03:01:15-rados-wip-kefu-testing-2021-01-05-2058-distro-basic-smithi/5761661
my branch include...
Kefu Chai
12:02 AM Bug #48793 (Resolved): out of order op
... Neha Ojha

01/07/2021

09:06 PM Bug #48789: qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
/a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smithi/5761090 Neha Ojha
06:17 PM Bug #48789 (Resolved): qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
... Neha Ojha
08:58 PM Bug #48775: FAILED ceph_assert(is_primary()) in PG::scrub()
@Ronen: /a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smithi/5760959 has logs Neha Ojha
08:56 AM Bug #48775: FAILED ceph_assert(is_primary()) in PG::scrub()
Seems to be the same problem supposedly solved by https://github.com/ceph/ceph/pull/38730.
Verifying.
Ronen Friedman
12:25 AM Bug #48775: FAILED ceph_assert(is_primary()) in PG::scrub()
/a/teuthology-2021-01-05_07:01:02-rados-master-distro-basic-smithi/5755459 Neha Ojha
12:12 AM Bug #48775 (Duplicate): FAILED ceph_assert(is_primary()) in PG::scrub()
... Neha Ojha
08:55 PM Bug #48790: rados/multimon: MON_DOWN in mon_election/connectivity with mon_clock_no_skews
My first impression is that this is related to election_strategy connectivity.
All mons are in quorum here:
<pr...
Neha Ojha
06:56 PM Bug #48790 (New): rados/multimon: MON_DOWN in mon_election/connectivity with mon_clock_no_skews
rados/multimon/{clusters/9 mon_election/connectivity msgr-failures/many msgr/async-v1only no_pools objectstore/bluest... Neha Ojha
06:05 PM Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failu...
rados/thrash-erasure-code-shec/{ceph clusters/{fixed-4 openstack} mon_election/classic msgr-failures/few objectstore/... Neha Ojha
04:03 PM Bug #48786 (Resolved): api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2...
Run: https://pulpito.ceph.com/teuthology-2021-01-07_05:00:03-smoke-master-distro-basic-smithi/
Job: 5761861
Logs: h...
Yuri Weinstein
12:20 PM Backport #48480 (Resolved): octopus: PG::_delete_some isn't optimal iterating objects
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38477
m...
Nathan Cutler
12:19 PM Backport #48243 (Resolved): octopus: collection_list_legacy: pg inconsistent
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38098
m...
Nathan Cutler

01/06/2021

04:27 PM Backport #48480: octopus: PG::_delete_some isn't optimal iterating objects
Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38477
merged
Yuri Weinstein
04:24 PM Backport #48243: octopus: collection_list_legacy: pg inconsistent
Mykola Golub wrote:
> https://github.com/ceph/ceph/pull/38098
merged
Yuri Weinstein
10:20 AM Bug #48764 (New): Octopus: Radosmodel tests fails when trying DeleteOp::_begin()
... Deepika Upadhyay
09:17 AM Bug #45615: api_watch_notify_pp: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify/1 f...
/ceph/teuthology-archive/yuriw-2021-01-04_18:28:05-rados-wip-yuri2-testing-2021-01-04-0837-octopus-distro-basic-smith... Deepika Upadhyay
06:40 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
/a//kchai-2021-01-06_02:57:51-rados-wip-kefu-testing-2021-01-05-2058-distro-basic-smithi/5758216 Kefu Chai

01/05/2021

03:27 PM Bug #48750 (Resolved): ceph config set using osd/host mask not working
this does not work, tested with 14.2.9 and 14.2.16:... Kenneth Waegeman

01/04/2021

07:43 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
This seems related to https://bugzilla.redhat.com/show_bug.cgi?id=1826450 our circumstances are highly similar. Wes Dillingham
05:42 PM Bug #48745 (Resolved): Segmentation fault in PrimaryLogPG::cancel_manifest_ops
... Neha Ojha

12/31/2020

06:08 PM Bug #48732 (Need More Info): Marking OSDs out causes mon daemons to crash following tcmalloc: lar...
On a 14.2.11 zero-load cluster I am taking some osd servers out of service.
I began marking OSDs out in preparation ...
Wes Dillingham
11:31 AM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
/a//kchai-2020-12-31_09:50:24-rados-wip-kefu-testing-2020-12-31-1427-distro-basic-smithi/5749187 Kefu Chai

12/30/2020

03:22 AM Bug #48323 (Resolved): "size 0 != clone_size 10" in clog - clone size mismatch when deduped objec...
Kefu Chai

12/29/2020

10:11 AM Bug #48712 (Fix Under Review): ceph_assert(is_primary()) in PG::scrub()
Ronen Friedman

12/28/2020

11:54 AM Bug #48719 (Fix Under Review): ceph: qa/standalone/scrub/osd-recovery-scrub.sh: erroneous 'Recove...
suggested PR: 38723 Ronen Friedman

12/27/2020

08:52 PM Bug #48721 (New): tcmalloc doesn't release memory
Ceph Monitor hasn't released memory for many days! Using nautilus 14.2.14.... Seena Fallah
11:03 AM Bug #48712 (In Progress): ceph_assert(is_primary()) in PG::scrub()
Caused when a PGScrub message is queued by a primary, but only de-queued after an interval change.
(Specifica...
Ronen Friedman
08:00 AM Bug #48719 (Triaged): ceph: qa/standalone/scrub/osd-recovery-scrub.sh: erroneous 'Recovery never ...
Working on a fix that will search the log-files for the 'recovering' text,
instead of polling the 'pg stat'.
Ronen Friedman
07:25 AM Bug #48719 (Fix Under Review): ceph: qa/standalone/scrub/osd-recovery-scrub.sh: erroneous 'Recove...
The log shows:
@
osd-recovery-scrub.sh:182: TEST_recovery_scrub_2:  echo 'Recovery never started'
@
Even though...
Ronen Friedman

12/26/2020

01:25 PM Bug #48468: ceph-osd crash before being up again
Tried with a systemd service instead of the docker container, same behaviour as well.
I've submitted the new crash-...
Clément Hampaï

12/25/2020

06:12 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
Chang Liu wrote:
> see https://github.com/ceph/ceph/pull/16675
seems that patch was already applied since the "FA...
huang jun

12/24/2020

02:40 PM Bug #48669: libec_isa.so with TEXTREL for ceph-v15.2.8 on arrch64
duplicate #48681 Alexey Shabalin
09:16 AM Bug #48468: ceph-osd crash before being up again
Little update: I've tried with v15.2.8. Sadly, same behaviour. Clément Hampaï
06:10 AM Bug #48599 (Resolved): Segmentation fault in ~C_SetManifestRefCountDone()
Kefu Chai
05:17 AM Bug #47945 (Duplicate): scrubbing failure
Kefu Chai
04:09 AM Bug #47945: scrubbing failure
/a//kchai-2020-12-23_05:37:18-rados-wip-kefu-testing-2020-12-23-1139-distro-basic-smithi/5732435/ Kefu Chai
04:18 AM Bug #48712: ceph_assert(is_primary()) in PG::scrub()
Hi Ronen, do you mind taking a look? Kefu Chai
04:18 AM Bug #48712 (Resolved): ceph_assert(is_primary()) in PG::scrub()
... Kefu Chai

12/22/2020

08:34 PM Bug #48581: MON: global_init: error reading config file
Oscar Segarra wrote:
> 2020-12-12 00:24:28 /opt/ceph-container/bin/entrypoint.sh: STAYALIVE: container will not die...
Rocky Cardwell
08:00 PM Documentation #23777 (Resolved): doc: description of OSD_OUT_OF_ORDER_FULL problem
I concur with Anthony, and have accordingly changed the status of this issue to RESOLVED. Zac Dover

12/21/2020

10:15 AM Bug #48540: _txc_add_transaction error (17) File exists not handled on operation
I have not seen this behavior anymore, so I would suggest to archive this issue for reference until it happens again ... Arthur S
03:40 AM Bug #48600 (Resolved): osd: valgrind: Invalid read of size 8
Kefu Chai

12/18/2020

02:29 PM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
investigating using: https://github.com/ideepika/ceph/pull/new/wip-tracker-48613
teuthology command: ...
Deepika Upadhyay
09:58 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
... Kefu Chai
03:44 AM Bug #48669 (New): libec_isa.so with TEXTREL for ceph-v15.2.8 on arrch64
Wrong library erasure-code/libec_isa.so after enable isa-l EC for aarch64 platform (commit 9091b7cc32fc0d031ab44dd264... Alexey Shabalin

12/17/2020

11:31 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
Deepika Upadhyay wrote:
> master with 5 default: https://pulpito.ceph.com/ideepika-2020-12-16_07:01:58-rados:monthra...
Neha Ojha
07:13 AM Bug #48645 (New): Ceph-OSD octopus memory leak
Hi everyone,
Our team operates a Ceph Cluster(ceph version 15.2.5 (2c93eff00150f0cc5f106a559557a58d3d7b6f1f) octop...
David Marthy
06:45 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
... Kefu Chai

12/16/2020

02:09 PM Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "fail...
master with 5 default: https://pulpito.ceph.com/ideepika-2020-12-16_07:01:58-rados:monthrash-master-distro-basic-smit... Deepika Upadhyay

12/15/2020

08:43 PM Bug #48599 (Fix Under Review): Segmentation fault in ~C_SetManifestRefCountDone()
Neha Ojha
01:37 AM Bug #48599: Segmentation fault in ~C_SetManifestRefCountDone()
https://github.com/ceph/ceph/pull/38576 Myoungwon Oh
12:13 AM Bug #48599: Segmentation fault in ~C_SetManifestRefCountDone()
Ok. I’ll take a look. Myoungwon Oh
07:03 PM Bug #48613 (Resolved): Reproduce https://tracker.ceph.com/issues/48417
Use the osd thrashing tests to reproduce https://tracker.ceph.com/issues/48417.
1. only applies to EC
2. aim is t...
Neha Ojha
06:58 PM Bug #48611 (Resolved): osd: Delay sending info to new backfill peer resetting last_backfill until...
This should be relatively harmless as any osd with lb=MIN wouldn’t have been sent any real IOs anyway. Might be analo... Neha Ojha
06:54 PM Bug #48609 (Closed): osd/PGLog: don’t fast-forward can_rollback_to during merge_log if the log is...
(a) Doing so is wrong specifically for intervals where we go peered but not active.
See PGLog.cc:456, we uncondition...
Neha Ojha

12/14/2020

10:19 PM Bug #48042 (Resolved): Log "ceph health detail" periodically in cluster log
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
10:11 PM Bug #48599: Segmentation fault in ~C_SetManifestRefCountDone()
Probably related to https://github.com/ceph/ceph/pull/37546/commits/29d442b4c7e0be1fd9f765049d00c65a978fa373.
@Myoun...
Neha Ojha
09:03 PM Bug #48599 (Resolved): Segmentation fault in ~C_SetManifestRefCountDone()
... Neha Ojha
09:14 PM Bug #48600 (Fix Under Review): osd: valgrind: Invalid read of size 8
Patrick Donnelly
09:10 PM Bug #48600 (Resolved): osd: valgrind: Invalid read of size 8
... Patrick Donnelly
08:47 PM Bug #47767 (Resolved): octopus: setting noscrub crashed osd process
David Zafman
06:15 PM Backport #48596 (Resolved): octopus: nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_abor...
https://github.com/ceph/ceph/pull/40278 Backport Bot
06:15 PM Backport #48595 (Resolved): nautilus: nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_abo...
https://github.com/ceph/ceph/pull/39125 Backport Bot
06:14 PM Bug #48566 (Pending Backport): nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_abort: re...

This is probably fixed by https://github.com/ceph/ceph/pull/38472 which doesn't have a tracker so this will be that...
David Zafman
03:25 PM Feature #48590 (Rejected): Add ability to blocklist a cephx entity name, a set of entities by a l...
Background:
The need for fencing in a kubernetes multicluster scenario is presented here: https://lists.ceph.io/hy...
Shyamsundar Ranganathan
12:16 PM Backport #48228 (Resolved): octopus: Log "ceph health detail" periodically in cluster log
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38345
m...
Nathan Cutler
03:12 AM Bug #48583 (In Progress): nautilus: Log files are created with rights root:root
Brad Hubbard
01:28 AM Bug #48583 (Resolved): nautilus: Log files are created with rights root:root
On 7148b7c3ae254fbc7796ae63d86ea681c68e0d88... Brad Hubbard

12/13/2020

01:12 PM Bug #48581 (New): MON: global_init: error reading config file
2020-12-12 00:24:28 /opt/ceph-container/bin/entrypoint.sh: STAYALIVE: container will not die if a command fails.,
2...
Oscar Segarra
 
