Activity

From 02/29/2020 to 03/29/2020

03/29/2020

01:37 PM Bug #44184: Slow / Hanging Ops after pool creation
When searching through the code I found this in src/mon/OSDMonitor.cc... Wido den Hollander

03/28/2020

09:23 PM Bug #44798 (Fix Under Review): librados mon_command (mgr) command hang
Sage Weil
09:13 PM Bug #44798 (Resolved): librados mon_command (mgr) command hang
- mon starts
- mgr starts
- mgr fetches mon metadata
- more mons are added to the cluster (post-bootstrap)
- libra...
Sage Weil
12:27 PM Backport #43469 (Resolved): nautilus: asynchronous recovery + backfill might spin pg undersized f...
Kefu Chai
10:46 AM Bug #44797: mon/cephx : trace of a deleted customer in the "auth" index
It was a hidden character.
I do not have the rights to close the ticket
David Casier
10:40 AM Bug #44797 (Closed): mon/cephx : trace of a deleted customer in the "auth" index
... David Casier
12:59 AM Backport #44770 (Resolved): octopus: fast luminous -> nautilus -> octopus upgrade asserts out
Sage Weil
12:57 AM Backport #44717 (Resolved): octopus: osd/PeeringState.cc: 5582: FAILED ceph_assert(ps->is_acting(...
Sage Weil
12:57 AM Bug #44631: ceph pg dump error code 124
/a/sage-2020-03-27_13:32:58-rados-wip-sage3-testing-2020-03-26-1757-distro-basic-smithi/4895381 Sage Weil

03/27/2020

09:45 PM Backport #43257: mimic: monitor config store: Deleting logging config settings does not decrease ...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33327
merged
Yuri Weinstein
03:37 AM Bug #44184: Slow / Hanging Ops after pool creation
Andrew Mitroshin wrote:
> Jan Fajerski wrote:
> > Andrew Mitroshin wrote:
> > > Could you please submit output for...
hoan nv

03/26/2020

10:37 AM Backport #44770 (In Progress): octopus: fast luminous -> nautilus -> octopus upgrade asserts out
Kefu Chai
10:36 AM Backport #44770 (Resolved): octopus: fast luminous -> nautilus -> octopus upgrade asserts out
https://github.com/ceph/ceph/pull/34204 Kefu Chai
10:34 AM Bug #44759 (Pending Backport): fast luminous -> nautilus -> octopus upgrade asserts out
Kefu Chai
07:36 AM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
Unfortunately, we still hit this after the patch (https://github.com/ceph/ceph/pull/20571) was applied.
we didn't do exp...
huang jun
12:11 AM Bug #44532 (Pending Backport): nautilus: FAILED ceph_assert(head.version == 0 || e.version.versio...
xie xingguo

03/25/2020

11:40 PM Backport #44206 (Resolved): nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap:...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33530
m...
Nathan Cutler
11:21 PM Backport #44081 (Resolved): nautilus: ceph -s does not show >32bit pg states
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33275
m...
Nathan Cutler
11:21 PM Backport #43997 (Resolved): nautilus: Ceph tools utilizing "global_[pre_]init" no longer process ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33261
m...
Nathan Cutler
10:15 PM Bug #44759 (Fix Under Review): fast luminous -> nautilus -> octopus upgrade asserts out
Sage Weil
09:58 PM Bug #44759 (Resolved): fast luminous -> nautilus -> octopus upgrade asserts out
... Sage Weil
03:05 PM Bug #44755 (Resolved): Create stronger affinity between drivegroup specs and osd daemons
We currently only show the name of the drivegroup spec in `orch ls`... Joshua Schmid

03/24/2020

10:03 PM Bug #44724: compressor: Set default Zstd compression level to 1
I've closed *PR:* https://github.com/ceph/ceph/pull/34133 in favor of making it a separate commit as part of:
*PR:...
Bryan Stillwell

03/23/2020

11:35 PM Bug #44314: osd-backfill-stats.sh failing intermittently in TEST_backfill_sizeup_out() (degraded ...
Re-purposing this ticket to track a new failure, note that this is different from https://tracker.ceph.com/issues/445... Neha Ojha
10:41 PM Bug #44724: compressor: Set default Zstd compression level to 1
*PR*: https://github.com/ceph/ceph/pull/34133 Bryan Stillwell
10:33 PM Bug #44724 (New): compressor: Set default Zstd compression level to 1
The default compression level of 5 for Zstandard is too high for the majority
of use cases since it requires too man...
Bryan Stillwell
06:55 PM Bug #43807 (Pending Backport): osd-backfill-recovery-log.sh fails
Sage Weil
04:22 PM Bug #43807 (Fix Under Review): osd-backfill-recovery-log.sh fails
Fix 2: https://github.com/ceph/ceph/pull/34126 Neha Ojha
04:17 PM Bug #43807: osd-backfill-recovery-log.sh fails
Neha Ojha wrote:
> /a/sage-2020-03-17_13:59:54-rados-wip-sage-testing-2020-03-17-0740-distro-basic-smithi/4863239
>...
Neha Ojha
03:40 PM Backport #44717 (In Progress): octopus: osd/PeeringState.cc: 5582: FAILED ceph_assert(ps->is_acti...
Nathan Cutler
03:28 PM Backport #44717 (Resolved): octopus: osd/PeeringState.cc: 5582: FAILED ceph_assert(ps->is_acting(...
https://github.com/ceph/ceph/pull/34123 Nathan Cutler
03:39 PM Bug #44184: Slow / Hanging Ops after pool creation
Hi, I'm dealing with a similar (not sure whether the same) problem on 13.2.8. I've narrowed it down to osdmaps not bein... Nikola Ciprich
12:57 PM Bug #44715 (Resolved): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back(...
... Sage Weil
12:16 PM Backport #44711 (Resolved): nautilus: pgs entering premerge state that still need backfill
https://github.com/ceph/ceph/pull/34354 Nathan Cutler

03/22/2020

08:20 PM Backport #44206: nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33530
merged
Yuri Weinstein
03:56 PM Bug #44684 (Pending Backport): pgs entering premerge state that still need backfill
Sage Weil
09:15 AM Bug #44658: seastar is busting unit tests
> I don't know how dependencies keep getting added without them being added to these install lists, but it is a repea... Kefu Chai
09:04 AM Bug #44658 (Resolved): seastar is busting unit tests
Kefu Chai

03/21/2020

05:46 PM Support #44703 (New): "ceph osd df" reports an OSD using 147GB of disk for 47GB data, 3MB omap, 1...
We are seeing the file mismatch in rook 1.2 when using replication instead of erasure coding as ceph object store…s... Shikhar Goel
08:47 AM Bug #44702 (New): Double destroy_qp causes segmentation fault
-4> 2020-03-21T16:17:32.996+0800 ffff93c649f0 -1 osd.26 9142 set_numa_affinity unable to identify public interfac... chunsong feng
06:28 AM Bug #44662 (Resolved): qa/standalone/osd/osd-markdown.sh: markdown_N_impl fails in TEST_markdown_...
Kefu Chai
05:37 AM Bug #44507 (Pending Backport): osd/PeeringState.cc: 5582: FAILED ceph_assert(ps->is_acting(osd_wi...
xie xingguo

03/20/2020

09:20 PM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
This also affects Nautilus v14.2.6. On one of our clusters it caused multiple performance problems when we were rebo... Bryan Stillwell
09:08 PM Bug #43807 (In Progress): osd-backfill-recovery-log.sh fails
Neha Ojha
07:42 PM Backport #44081: nautilus: ceph -s does not show >32bit pg states
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33275
merged
Yuri Weinstein
07:42 PM Backport #43997: nautilus: Ceph tools utilizing "global_[pre_]init" no longer process "early" env...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33261
merged
Yuri Weinstein
01:23 PM Bug #44694 (Duplicate): MON_DOWN during cluster setup
Periodically tests see a MON_DOWN while the cluster is being built. This is presumably because the mons just aren't ... Sage Weil
08:54 AM Bug #44691 (Fix Under Review): mon/caps.sh fails with "Expected return 13, got 0"
Kefu Chai
07:32 AM Bug #44691 (New): mon/caps.sh fails with "Expected return 13, got 0"
... Kefu Chai
08:50 AM Bug #44184: Slow / Hanging Ops after pool creation
Jan Fajerski wrote:
> Andrew Mitroshin wrote:
> > Could you please submit output for the command
> >
> > [...]
...
Andrew Mitroshin

03/19/2020

10:43 PM Backport #44689 (Resolved): nautilus: osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:69...
https://github.com/ceph/ceph/pull/35048 Nathan Cutler
10:43 PM Backport #44686 (Resolved): nautilus: osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clea...
https://github.com/ceph/ceph/pull/35047 Nathan Cutler
10:43 PM Backport #44685 (Resolved): octopus: osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean...
https://github.com/ceph/ceph/pull/34806 Nathan Cutler
09:27 PM Bug #44684: pgs entering premerge state that still need backfill
it's the mgr's fault..... Sage Weil
09:02 PM Bug #44684 (Resolved): pgs entering premerge state that still need backfill
... Sage Weil
08:59 PM Bug #43807: osd-backfill-recovery-log.sh fails
/a/sage-2020-03-17_13:59:54-rados-wip-sage-testing-2020-03-17-0740-distro-basic-smithi/4863239
Comparing a failed ...
Neha Ojha
07:19 PM Bug #43861 (Resolved): ceph_test_rados_watch_notify hang
Sage Weil
04:44 PM Bug #43914 (Resolved): nautilus: ceph tell command times out
Nathan Cutler
03:14 PM Bug #44662 (Fix Under Review): qa/standalone/osd/osd-markdown.sh: markdown_N_impl fails in TEST_m...
Neha Ojha
08:34 AM Bug #44662: qa/standalone/osd/osd-markdown.sh: markdown_N_impl fails in TEST_markdown_boot
aah, this looks like it has the same root cause as https://tracker.ceph.com/issues/44518.
On the monitor side:
...
xie xingguo
01:02 PM Bug #44518 (Pending Backport): osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean timeout
Sage Weil
03:51 AM Bug #44518 (Fix Under Review): osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean timeout
Neha Ojha

03/18/2020

10:33 PM Bug #44518 (In Progress): osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean timeout
Neha Ojha
09:30 PM Bug #43861 (Fix Under Review): ceph_test_rados_watch_notify hang
Sage Weil
09:16 PM Bug #43861: ceph_test_rados_watch_notify hang
i think we should consider just dropping this test. it's old test code written by colin almost 10 years ago and i'm ... Sage Weil
09:14 PM Bug #43861: ceph_test_rados_watch_notify hang
another one like comment 5 above... Sage Weil
07:31 PM Bug #44439 (Pending Backport): osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST...
David Zafman
05:13 PM Bug #44062 (Resolved): LibRadosWatchNotify.WatchNotify failure
Sage Weil
02:16 AM Bug #44062 (In Progress): LibRadosWatchNotify.WatchNotify failure
Brad Hubbard
02:15 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
https://github.com/ceph/ceph/pull/34011 Brad Hubbard
05:13 PM Bug #44582 (Resolved): LibRadosMisc.ShutdownRace
Sage Weil
02:32 PM Feature #44025 (Resolved): Make it harder to set pool replica size to 1
Based on https://github.com/rook/rook/pull/5023#issuecomment-600344198 Neha Ojha
02:23 PM Feature #44025 (Pending Backport): Make it harder to set pool replica size to 1
Deepika Upadhyay
08:15 AM Bug #44184: Slow / Hanging Ops after pool creation
Andrew Mitroshin wrote:
> Could you please submit output for the command
>
> [...]
% ceph osd dump | grep req...
Jan Fajerski
08:09 AM Bug #44184: Slow / Hanging Ops after pool creation
Dan van der Ster wrote:
> See https://tracker.ceph.com/issues/37875
>
> With `ceph pg dump -f json | jq .osd_epoc...
Jan Fajerski
07:38 AM Bug #43975 (Resolved): Slow Requests/OP's types not getting logged
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:36 AM Backport #44413 (Resolved): nautilus: FTBFS on s390x in openSUSE Build Service due to presence of...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33716
m...
Nathan Cutler
07:35 AM Backport #44259 (Resolved): nautilus: Slow Requests/OP's types not getting logged
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33503
m...
Nathan Cutler
02:37 AM Bug #44658 (Fix Under Review): seastar is busting unit tests
Kefu Chai

03/17/2020

07:51 PM Bug #44352: pool listings are slow after deleting objects
For other pools, it takes just a fraction of time to list objects with ... Serg Protsun
07:44 PM Bug #44352: pool listings are slow after deleting objects
Abhishek Lekshmanan wrote:
> around 20s were taken to just list contents of the pool which is what happened in the d...
Serg Protsun
07:21 PM Bug #44662 (Resolved): qa/standalone/osd/osd-markdown.sh: markdown_N_impl fails in TEST_markdown_...
... Neha Ojha
06:21 PM Backport #44413: nautilus: FTBFS on s390x in openSUSE Build Service due to presence of -O2 in RPM...
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/33716
merged
Yuri Weinstein
06:16 PM Backport #44259: nautilus: Slow Requests/OP's types not getting logged
Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/33503
merged
Yuri Weinstein
06:08 PM Bug #43888: osd/osd-bench.sh 'tell osd.N bench' hang
/a/nojha-2020-03-16_17:35:35-rados:standalone-master-distro-basic-smithi/4860664/ Neha Ojha
05:57 PM Bug #43807 (New): osd-backfill-recovery-log.sh fails
Note that this is a resurrection of the same failure with different symptoms
/a/sage-2020-03-17_13:59:54-rados-wip...
Neha Ojha
05:21 PM Bug #43807: osd-backfill-recovery-log.sh fails
/a/sage-2020-03-17_13:59:54-rados-wip-sage-testing-2020-03-17-0740-distro-basic-smithi/4863239 Sage Weil
05:04 PM Bug #44311: crash in Objecter and CRUSH map lookup
Conversation from IRC:
<neha> mahatic: I would like to know Aemerson's thoughts on https://tracker.ceph.com/issues...
Neha Ojha
04:11 PM Bug #44582: LibRadosMisc.ShutdownRace
110 is ETIMEDOUT... Sage Weil
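The numeric-to-symbolic mapping mentioned here can be checked with Python's standard `errno` module. This is only a minimal sketch: `describe_errno` is a hypothetical helper name, and the value 110 for ETIMEDOUT is the Linux numbering (other platforms may use a different number).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform.

    Hypothetical helper for illustration; accepts negated values too,
    since C APIs commonly return errors as -errno.
    """
    value = abs(code)
    # errno.errorcode maps numeric values to names such as "ETIMEDOUT".
    name = errno.errorcode.get(value, "UNKNOWN")
    return name, os.strerror(value)


if __name__ == "__main__":
    # On Linux this is expected to name ETIMEDOUT.
    print(describe_errno(-110))
```

Looking the value up at run time, rather than hard-coding a table, keeps the answer correct for whatever platform the test ran on.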
03:54 PM Bug #44582: LibRadosMisc.ShutdownRace
/a/sage-2020-03-17_14:06:37-rados:verify-wip-sage-testing-2020-03-16-2107-distro-basic-smithi/4863370 Sage Weil
04:07 PM Bug #44658 (Resolved): seastar is busting unit tests
run-make-check.sh does not pass any more in seastar tests. I have tried on my old rex box and one of the new vossi ma... Greg Farnum
04:04 PM Bug #44453 (Resolved): mon: fix/improve mon sync over small keys
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:15 PM Bug #44643 (Can't reproduce): leaked buffer (alloc from MonClient::handle_auth_request)
... Sage Weil
08:27 AM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
Hi Greg, are you sure this ticket is a duplicate? In my case, the cluster is healthy with no DOWN OSDs, so it shouldn... Nikola Ciprich
01:46 AM Bug #44507 (Fix Under Review): osd/PeeringState.cc: 5582: FAILED ceph_assert(ps->is_acting(osd_wi...
Neha Ojha

03/16/2020

10:40 PM Bug #44022 (Resolved): mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd ...
Nathan Cutler
10:33 PM Backport #44464 (Resolved): nautilus: mon: fix/improve mon sync over small keys
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33765
m...
Nathan Cutler
10:25 PM Bug #44631: ceph pg dump error code 124
124 -> process was killed by SIGTERM (according to https://www.howtogeek.com/423286/how-to-use-the-timeout-command-on... Sage Weil
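For reference when reading such failures, the exit statuses documented for the GNU coreutils `timeout` command can be kept in a small lookup table. This is a sketch, assuming the test harness wraps the command in GNU `timeout` without `--preserve-status`; `explain_exit_status` is a hypothetical helper, not part of Ceph or teuthology.

```python
# Exit statuses documented for GNU coreutils `timeout`.
TIMEOUT_STATUSES = {
    124: "command timed out (timeout signalled it, TERM by default)",
    125: "the timeout command itself failed",
    126: "command was found but could not be invoked",
    127: "command was not found",
    137: "command (or timeout itself) was sent KILL (128+9)",
}


def explain_exit_status(code):
    """Describe an exit status as seen through a `timeout` wrapper."""
    # Any other value is the wrapped command's own exit status.
    return TIMEOUT_STATUSES.get(
        code, f"exit status {code} passed through from the command"
    )


if __name__ == "__main__":
    print(explain_exit_status(124))
```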
10:24 PM Bug #44631 (New): ceph pg dump error code 124
... Sage Weil
10:17 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
... Sage Weil
02:52 PM Bug #44184: Slow / Hanging Ops after pool creation
Another customer of ours has reported this behaviour. This cluster was, most likely, installed with Hammer and is now... Wout van Heeswijk
03:58 AM Bug #44062 (Resolved): LibRadosWatchNotify.WatchNotify failure
Brad Hubbard

03/14/2020

03:35 PM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
https://github.com/ceph/ceph/pull/33594 merged Yuri Weinstein

03/13/2020

11:23 PM Bug #44518: osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean timeout
/a/sage-2020-03-09_01:44:37-rados:standalone-wip-sage2-testing-2020-03-08-1456-distro-basic-smithi/4838500
queue_r...
Neha Ojha
07:45 PM Bug #44518: osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean timeout
Comparing a passed test with a failed one:
PASSED - note the PG mapping [2,4,3]/[1,0] backfill=[2,3,4]...
Neha Ojha
08:51 PM Bug #44566 (Resolved): ceph tell segv: librados fini vs protocolv2
Sage Weil
12:26 PM Bug #44566: ceph tell segv: librados fini vs protocolv2
for the record, all threads in the second instance:... Sage Weil
12:25 PM Bug #44566 (Fix Under Review): ceph tell segv: librados fini vs protocolv2
Sage Weil
12:22 PM Bug #44566: ceph tell segv: librados fini vs protocolv2
I think this is related to the fix for #44526, https://github.com/ceph/ceph/pull/33825, which skips rados shutdown.
...
Sage Weil
12:20 PM Bug #44566: ceph tell segv: librados fini vs protocolv2
again... Sage Weil
04:11 PM Bug #44184: Slow / Hanging Ops after pool creation
Could you please submit output for the command... Andrew Mitroshin
12:57 PM Bug #44184: Slow / Hanging Ops after pool creation
Jan Fajerski wrote:
> Can confirm seeing issues with osd map pruning ("oldest_map": 41985 vs "newest_map": 83376) an...
Dan van der Ster
12:53 PM Bug #44184: Slow / Hanging Ops after pool creation
Can confirm seeing issues with osd map pruning ("oldest_map": 41985 vs "newest_map": 83376) and large osd_map_cache_m... Jan Fajerski
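To put the two quoted epochs in perspective, the number of osdmap epochs still retained is simply their difference. The sketch below assumes a JSON document carrying `oldest_map` and `newest_map` fields as quoted in the comment; the function name `osdmap_epoch_span` is hypothetical, and the sample contains only the two values given above.

```python
import json


def osdmap_epoch_span(status_json):
    """Return how many epochs lie between the oldest and newest osdmap."""
    status = json.loads(status_json)
    return status["newest_map"] - status["oldest_map"]


if __name__ == "__main__":
    # Values taken from the comment above; all other fields are omitted.
    sample = '{"oldest_map": 41985, "newest_map": 83376}'
    print(osdmap_epoch_span(sample))  # 41391
```

A span this large on a healthy cluster is what suggests that old maps are not being trimmed.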
12:12 PM Bug #44595 (New): cache tiering: Error: oid 48 copy_from 493 returned error code -2
... Sage Weil

03/12/2020

08:59 PM Bug #44586 (New): Deleting a pool w/ in-flight ops might crash client osdc
The rbd-mirror test cases conclude the test by deleting the pools just to ensure the daemons survive. It appears that... Jason Dillaman
05:32 PM Bug #44243: memstore make check test fails
here is a similar failure from ceph_test_objectstore:... Sage Weil
03:14 PM Bug #44582 (Resolved): LibRadosMisc.ShutdownRace
... Sage Weil
02:32 PM Bug #44352: pool listings are slow after deleting objects
around 20s were taken to just list contents of the pool which is what happened in the debug logs, what time is taken ... Abhishek Lekshmanan
02:16 PM Bug #44184: Slow / Hanging Ops after pool creation
So this message came along on the users mailinglist:... Wido den Hollander

03/11/2020

05:27 PM Bug #44566 (Resolved): ceph tell segv: librados fini vs protocolv2
... Sage Weil
02:35 PM Bug #44184: Slow / Hanging Ops after pool creation
Forgot to mention: the cluster has been running since Jewel 10.2.4. I think upgrading from Jewel or older seems to be... Paul Emmerich
02:31 PM Bug #44184: Slow / Hanging Ops after pool creation
Oh, running mostly 14.2.4 and some OSDs on 14.2.5 on Ubuntu 16.04. Small cluster with only 127 OSDs. Paul Emmerich
02:29 PM Bug #44184: Slow / Hanging Ops after pool creation
I've encountered another cluster in the wild that hit this bug. It seems to be triggered somewhat reliably a few hour... Paul Emmerich
02:38 AM Bug #44062 (In Progress): LibRadosWatchNotify.WatchNotify failure
Brad Hubbard
02:02 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
i think https://github.com/ceph/ceph/pull/33871 may help... Sage Weil
12:43 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
/a/sage-2020-03-10_16:51:17-rados-wip-sage3-testing-2020-03-10-1037-distro-basic-smithi/4844006
I think this is fa...
Sage Weil

03/10/2020

10:20 PM Bug #44373 (Resolved): objecter: invalid read
Sage Weil
10:03 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
... Sage Weil
10:01 PM Bug #44518: osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean timeout
slightly different version of this:... Sage Weil
10:00 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
/a/sage-2020-03-10_16:51:17-rados-wip-sage3-testing-2020-03-10-1037-distro-basic-smithi/4844127 Sage Weil
06:36 PM Bug #43861: ceph_test_rados_watch_notify hang
... Sage Weil
04:48 PM Backport #44464 (In Progress): nautilus: mon: fix/improve mon sync over small keys
Nathan Cutler
04:47 PM Backport #44464 (Resolved): nautilus: mon: fix/improve mon sync over small keys
Nathan Cutler
03:15 PM Bug #44184: Slow / Hanging Ops after pool creation
Hi everyone, we were able to collect some debug logs that seem to exhibit this case. The log is fairly large, you ca... Jan Fajerski
01:23 PM Bug #44420: cephadm cluster: "ceph ping mon.*" works fine, but "ceph ping mon.<id>" is broken
might be a cephadm issue. Sebastian Wagner
06:30 AM Bug #44536 (New): Segmentation fault when it sets ENABLE_COVERAGE:BOOL=ON
1. It sets option(ENABLE_COVERAGE "Coverage is enabled" ON) in CMakeLists.txt
2.
root@dev:/data/liugangbiao/zy/cep...
yin zheng
12:12 AM Bug #44362: osd: uninitialized memory in sendmsg
@Yehuda: what's the status of this ticket?
Are you able to replicate the issue locally, or does it happen solely at sepia?
Radoslaw Zarzynski

03/09/2020

11:57 PM Bug #44532 (Resolved): nautilus: FAILED ceph_assert(head.version == 0 || e.version.version > head...
Run: http://pulpito.ceph.com/yuriw-2020-03-07_18:26:25-rados-wip-yuri8-testing-2020-03-06-2005-nautilus-distro-basic-... Yuri Weinstein
08:28 PM Bug #43865 (Resolved): osd-scrub-test.sh fails date check
Sage Weil
07:18 PM Bug #44427: osd: stuck during shutdown
/a/sage-2020-03-09_14:07:51-rados-wip-sage4-testing-2020-03-09-0634-distro-basic-smithi/4841228
Sage Weil
07:06 PM Bug #39039 (Need More Info): mon connection reset, command not resent
can't reproduce :( Sage Weil
07:06 PM Bug #44517: osd/osd-backfill-space.sh TEST_backfill_multi_partial: pgs didn't go active+clean
see http://pulpito.ceph.com/sage-2020-03-09_01:44:37-rados:standalone-wip-sage2-testing-2020-03-08-1456-distro-basic-... Sage Weil
12:12 PM Bug #44517 (New): osd/osd-backfill-space.sh TEST_backfill_multi_partial: pgs didn't go active+clean
... Sage Weil
07:06 PM Bug #44518: osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean timeout
see http://pulpito.ceph.com/sage-2020-03-09_01:44:37-rados:standalone-wip-sage2-testing-2020-03-08-1456-distro-basic-... Sage Weil
12:14 PM Bug #44518 (Resolved): osd/osd-backfill-stats.sh TEST_backfill_out2: wait_for_clean timeout
... Sage Weil
06:42 AM Bug #43382: medium io/system load causes quorum failure
couldn't reproduce, close Anonymous
06:41 AM Bug #43185: ceph -s not showing client activity
fixed with 14.2.8 Anonymous
03:06 AM Bug #44439 (Resolved): osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST_repair_...
Sage Weil

03/08/2020

11:49 PM Bug #44510 (New): osd/osd-recovery-space.sh TEST_recovery_test_simple failure
... Sage Weil
07:53 PM Bug #43865 (Fix Under Review): osd-scrub-test.sh fails date check
Sage Weil
04:30 PM Bug #39039: mon connection reset, command not resent
trying to reproduce here:
http://pulpito.ceph.com/sage-39039/
http://pulpito.ceph.com/sage-39039-lessdebug/
Sage Weil
04:28 PM Bug #44229 (Can't reproduce): monclient: _check_auth_rotating possible clock skew, rotating keys ...
Sage Weil
02:13 AM Bug #44507 (Resolved): osd/PeeringState.cc: 5582: FAILED ceph_assert(ps->is_acting(osd_with_shard...
... Sage Weil

03/07/2020

07:54 PM Bug #39039 (In Progress): mon connection reset, command not resent
/a/sage-2020-03-07_14:00:00-rados-master-distro-basic-smithi/4834734 Sage Weil
02:21 PM Bug #44454 (Resolved): expected valgrind issues and found none
Sage Weil
01:57 PM Bug #44454 (In Progress): expected valgrind issues and found none
Sage Weil
02:13 PM Bug #44362: osd: uninitialized memory in sendmsg
hmm, seeing this now on master, after the existing whitelist was updated to the new symbols in 31a7a461382a3a979c12e1... Sage Weil
04:44 AM Bug #44362: osd: uninitialized memory in sendmsg
@sage I think we can close it. It seems that my research tracks @rzarzynski's, so I'll take his original conclusions. Yehuda Sadeh
12:58 PM Bug #43861: ceph_test_rados_watch_notify hang
no output at all, like comment 2 above:
/a/sage-2020-03-06_17:29:42-rados-wip-sage4-testing-2020-03-05-1645-distro-b...
Sage Weil
08:09 AM Bug #43185: ceph -s not showing client activity
new dump...after disabling almost all mgr modules Anonymous

03/06/2020

11:39 PM Bug #43865: osd-scrub-test.sh fails date check
reproducing this here: http://pulpito.ceph.com/sage-2020-03-06_22:05:09-rados:standalone-wip-sage4-testing-2020-03-05... Sage Weil
10:52 PM Bug #43862 (Can't reproduce): mkfs fsck found fatal error: (2) No such file or directory during c...
Sage Weil
06:15 PM Feature #43377: Make Zstandard compression level a configurable option
*PR*: https://github.com/ceph/ceph/pull/33790 Bryan Stillwell
05:39 PM Bug #44362: osd: uninitialized memory in sendmsg
Merged https://github.com/ceph/ceph/pull/33757 ... should we keep this open or close it? Sage Weil
03:01 PM Bug #43882 (Need More Info): osd to mon connection lost, osd stuck down
i thought i reproduced this, but it was a bug in another PR i was testing. Sage Weil
02:40 PM Bug #43882 (In Progress): osd to mon connection lost, osd stuck down
Sage Weil
12:28 PM Bug #43150 (Resolved): osd-scrub-snaps.sh fails
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
12:25 PM Backport #44070 (Resolved): luminous: Add builtin functionality in ceph-kvstore-tool to repair co...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33195
m...
Nathan Cutler
12:05 PM Backport #43852 (Resolved): nautilus: osd-scrub-snaps.sh fails
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33274
m...
Nathan Cutler
10:37 AM Backport #44490 (Resolved): nautilus: lz4 compressor corrupts data when buffers are unaligned
https://github.com/ceph/ceph/pull/35004 Nathan Cutler
10:36 AM Backport #44489 (Rejected): mimic: lz4 compressor corrupts data when buffers are unaligned
https://github.com/ceph/ceph/pull/35054 Nathan Cutler
10:33 AM Backport #44486 (Resolved): nautilus: Nautilus: Random mon crashes in failed assertion at ceph::t...
https://github.com/ceph/ceph/pull/34542 Nathan Cutler
10:30 AM Backport #44468 (Resolved): nautilus: mon: Get session_map_lock before remove_session
https://github.com/ceph/ceph/pull/34677 Nathan Cutler
10:30 AM Backport #44467 (Rejected): mimic: mon: Get session_map_lock before remove_session
Nathan Cutler
10:30 AM Backport #44464 (Resolved): nautilus: mon: fix/improve mon sync over small keys
https://github.com/ceph/ceph/pull/33765 Nathan Cutler
07:03 AM Bug #44454 (Resolved): expected valgrind issues and found none
Kefu Chai
03:31 AM Bug #44454 (In Progress): expected valgrind issues and found none
running with suite-repo pointing to the commit just *before* the py3 task merge faf701d33aeb6e1657c969a41223b37a6972b... Sage Weil
04:34 AM Bug #44439 (Fix Under Review): osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST...
David Zafman
02:47 AM Bug #44439: osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST_repair_stats_ec: ...
This did reproduce after multiple runs. I added a flush_pg_stats and ran it many times without seeing the failure. David Zafman
03:37 AM Bug #44373 (Fix Under Review): objecter: invalid read
Fix at https://github.com/ceph/ceph/pull/33771 Adam Emerson

03/05/2020

10:13 PM Bug #44454 (Resolved): expected valgrind issues and found none
http://pulpito.ceph.com/sage-2020-03-05_19:46:30-rados:valgrind-leaks-wip-sage4-testing-2020-03-05-0754-distro-basic-... Sage Weil
09:06 PM Bug #44439: osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST_repair_stats_ec: ...
hmm, does not reproduce locally for me. Neha Ojha
01:20 PM Bug #44439 (Resolved): osd/osd-scrub-repair.sh fails: scrub/osd-scrub-repair.sh:698: TEST_repair_...
... Sage Weil
08:19 PM Bug #44453: mon: fix/improve mon sync over small keys
Nautilus backport: https://github.com/ceph/ceph/pull/33765 Dan van der Ster
07:18 PM Bug #44453 (Resolved): mon: fix/improve mon sync over small keys
Background: [ceph-users] Can't add a ceph-mon to existing large cluster Neha Ojha
07:32 PM Bug #42830: problem returning mon to cluster
Workaround in our case is: `ceph config set mon mon_sync_max_payload_size 4096`
We have 5 mons again!
Dan van der Ster
02:35 PM Bug #42830: problem returning mon to cluster
I also posted this on the mailinglist, but let me post it here as well:... Wido den Hollander
04:45 PM Backport #44070: luminous: Add builtin functionality in ceph-kvstore-tool to repair corrupted key...

> https://github.com/ceph/ceph/pull/33195
merged
Yuri Weinstein
02:02 PM Bug #44385 (Resolved): ClsHello.WriteReturnData failure
Sage Weil
01:19 PM Bug #41923 (Can't reproduce): 3 different ceph-osd asserts caused by enabling auto-scaler
Sage Weil

03/04/2020

10:25 PM Bug #44311 (New): crash in Objecter and CRUSH map lookup
Neha Ojha
01:42 PM Bug #44311: crash in Objecter and CRUSH map lookup
Scratch that. If you replace qa/workunits/rbd/read-flags.sh with this script https://gist.github.com/MahatiC/a4bf4310... Mahati Chamarthy
01:35 PM Bug #44311: crash in Objecter and CRUSH map lookup
Neha Ojha wrote:
> Is this something that started appearing recently? Do you have a commit or version that works for...
Mahati Chamarthy
10:11 PM Bug #44400: Marking OSD out causes primary-affinity 0 to be ignored when up_set has no common OSD...
This is worth investigating, currently nothing in the choose_acting() function looks at primary-affinity. Neha Ojha
10:07 PM Bug #44348 (Resolved): thrasher can trigger osd shutdown
Neha Ojha
09:40 PM Bug #44427 (New): osd: stuck during shutdown
... Sage Weil
08:55 PM Bug #44362: osd: uninitialized memory in sendmsg
The hole represented by @filler@ is supposed to carry two things:
* zero-byte long ciphertext's fragment acquired fr...
Radoslaw Zarzynski
07:58 PM Bug #37656 (Triaged): FileStore::_do_transaction() crashed with error 17 (merge collection vs osd...
Sage Weil
07:56 PM Bug #37656: FileStore::_do_transaction() crashed with error 17 (merge collection vs osd restart)
the merge happens right before we shut down:... Sage Weil
07:40 PM Bug #37656: FileStore::_do_transaction() crashed with error 17 (merge collection vs osd restart)
... Sage Weil
02:30 PM Bug #44420 (Fix Under Review): cephadm cluster: "ceph ping mon.*" works fine, but "ceph ping mon....
$SUBJ says it all, almost - The error is:... Nathan Cutler
12:52 PM Bug #43365 (Pending Backport): Nautilus: Random mon crashes in failed assertion at ceph::time_det...
Kefu Chai
05:44 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
... Kefu Chai
12:43 PM Bug #44407 (Pending Backport): mon: Get session_map_lock before remove_session
Kefu Chai
10:33 AM Bug #44407 (Fix Under Review): mon: Get session_map_lock before remove_session
Kefu Chai
06:08 AM Bug #44407 (Resolved): mon: Get session_map_lock before remove_session
We should protect session_map with session_map_lock. Xiaofei Cui
10:59 AM Backport #44413 (In Progress): nautilus: FTBFS on s390x in openSUSE Build Service due to presence...
Kefu Chai
10:58 AM Backport #44413 (Resolved): nautilus: FTBFS on s390x in openSUSE Build Service due to presence of...
https://github.com/ceph/ceph/pull/33716 Kefu Chai
04:45 AM Bug #39525 (Pending Backport): lz4 compressor corrupts data when buffers are unaligned
Kefu Chai
12:16 AM Bug #44385 (Fix Under Review): ClsHello.WriteReturnData failure
reproduced locally by making the test loop and setting ms_inject_socket_failures=500 on the osd. confirmed this fixe... Sage Weil
12:00 AM Bug #44385 (In Progress): ClsHello.WriteReturnData failure
Sage Weil

03/03/2020

10:13 PM Bug #44362 (In Progress): osd: uninitialized memory in sendmsg
Sage Weil
03:15 AM Bug #44362: osd: uninitialized memory in sendmsg
It seems to me that the specific commit just exposed an existing issue that for some reason didn't show up before (lik... Yehuda Sadeh
07:23 PM Bug #44400 (Won't Fix): Marking OSD out causes primary-affinity 0 to be ignored when up_set has n...
Process:
Set primary-affinity 0 on osd.0
Watch 'ceph osd ls-by-primary osd.0' until it has 0 PGs listed.
Mark os...
Wes Dillingham
07:12 PM Bug #43150: osd-scrub-snaps.sh fails
https://github.com/ceph/ceph/pull/33274 merged Yuri Weinstein
04:09 PM Bug #43365 (Fix Under Review): Nautilus: Random mon crashes in failed assertion at ceph::time_det...
Sage Weil
12:58 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
Hi,
same behaviour for us: one of the 3 mons crashes randomly, nearly once per day.
We are using Ceph 14.2.6 PVE ...
David DELON
04:09 PM Bug #44311: crash in Objecter and CRUSH map lookup
Is this something that started appearing recently? Do you have a commit or version that works for this same command? ... Neha Ojha
02:48 PM Bug #44184: Slow / Hanging Ops after pool creation
We've got a similar case with plenty of slow op indications, many of them osd_op_create ones.
Which eventually g...
Igor Fedotov
12:34 AM Bug #44388 (New): osd: valgrind: Invalid read of size 8
... Patrick Donnelly

03/02/2020

10:29 PM Bug #44362: osd: uninitialized memory in sendmsg
the takeaway from http://pulpito.ceph.com/sage-2020-03-02_17:19:00-rados:verify-master-distro-basic-smithi/ is that t... Sage Weil
09:08 PM Bug #44362: osd: uninitialized memory in sendmsg
The regression is between these commits: d27f512d1731988cf7f369559f2fc324f1592047..7b0e18c09eb6060ee23f00c06dac4203a2... Sage Weil
08:39 PM Bug #44385 (Resolved): ClsHello.WriteReturnData failure
... Sage Weil
06:57 PM Bug #44311: crash in Objecter and CRUSH map lookup
To give more context, this issue is blocking progress on rbd op threads config change -> https://github.com/ceph/ceph... Mahati Chamarthy
06:04 PM Bug #44358 (Resolved): messenger addr nonces aren't unique with cephadm
Sage Weil
02:06 PM Bug #44373 (Resolved): objecter: invalid read
... Sage Weil
12:36 PM Backport #44370 (Resolved): nautilus: msg/async: the event center is blocked by rdma construct co...
https://github.com/ceph/ceph/pull/34780 Nathan Cutler
12:36 PM Backport #44369 (Rejected): mimic: msg/async: the event center is blocked by rdma construct conec...
Nathan Cutler
12:36 PM Backport #44368 (Rejected): mimic: Rados should use the '-o outfile' convention
Nathan Cutler

03/01/2020

11:00 PM Bug #44362 (Can't reproduce): osd: uninitialized memory in sendmsg
... Sage Weil
10:55 PM Bug #44358 (Fix Under Review): messenger addr nonces aren't unique with cephadm
Sage Weil
07:47 AM Bug #42452 (Pending Backport): msg/async: the event center is blocked by rdma construct conection...
Kefu Chai
04:18 AM Backport #44360 (In Progress): nautilus: Rados should use the '-o outfile' convention
Kefu Chai
04:18 AM Backport #44360: nautilus: Rados should use the '-o outfile' convention
https://github.com/ceph/ceph/pull/33641 Kefu Chai
04:17 AM Backport #44360 (Resolved): nautilus: Rados should use the '-o outfile' convention
https://github.com/ceph/ceph/pull/33641 Kefu Chai
04:08 AM Bug #42477 (Pending Backport): Rados should use the '-o outfile' convention
we have to backport this change, otherwise we have ... Kefu Chai

02/29/2020

06:24 AM Bug #43185: ceph -s not showing client activity
... Anonymous
06:19 AM Bug #43185: ceph -s not showing client activity
... Anonymous
12:18 AM Bug #44314: osd-backfill-stats.sh failing intermittently in TEST_backfill_sizeup_out() (degraded ...

The kick_recovery_wq didn't get backfill restarted on the failed run. Or a recovery attempt (periodic?) was someho...
David Zafman
 
