Activity

From 01/08/2020 to 02/06/2020

02/06/2020

11:55 PM Feature #44025: Make it harder to set pool replica size to 1
Neha Ojha wrote:
> Setting pool size to 1 is dangerous. Add an option like yes_i_really_really_mean_it, similar to w...
Neha Ojha
11:50 PM Feature #44025 (Resolved): Make it harder to set pool replica size to 1
Setting pool size to 1 is dangerous. Add an option like yes_i_really_really_mean_it, similar to what we have for pool... Neha Ojha
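The guard requested in this feature can be sketched roughly as follows. This is only an illustration of the idea, not Ceph code: `set_pool_size`, `ConfirmationRequired`, and the `pools` dictionary are hypothetical names, and only the flag name `yes_i_really_really_mean_it` comes from the entry above.

```python
class ConfirmationRequired(Exception):
    """Raised when a dangerous change is requested without explicit confirmation."""


def set_pool_size(pools, name, size, yes_i_really_really_mean_it=False):
    """Set the replica count for a pool in a plain dict (hypothetical helper).

    A size of 1 means a single copy of the data, so it is refused unless
    the caller passes the explicit confirmation flag.
    """
    if size < 1:
        raise ValueError("pool size must be at least 1")
    if size == 1 and not yes_i_really_really_mean_it:
        raise ConfirmationRequired(
            "size 1 keeps a single replica; pass "
            "yes_i_really_really_mean_it=True to proceed"
        )
    pools[name] = size
    return pools[name]
```

Under these assumptions, a call with `size=1` and no flag raises `ConfirmationRequired`, while larger sizes pass through unchanged.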
11:53 PM Bug #44024 (Fix Under Review): change in utime_t rendering ('T' separator) conflicts with cache t...
Sage Weil
11:26 PM Bug #44024 (Resolved): change in utime_t rendering ('T' separator) conflicts with cache tiering h...
crash like... Sage Weil
06:15 PM Bug #44022 (Resolved): mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd ...
The crash happens on a mimic OSD. Telemetry crash reports have been reporting similar crashes in 14.2.4 (may or may no... Neha Ojha
12:50 PM Bug #44015 (New): Cant compile src/tools/rados/rados.cc on 32 bit systems
On my machine size_t is unsigned int. This causes an overflow in src/tools/rados/rados.cc:776: max_obj_len = 5ull * 1... Stefan Bischoff
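To illustrate why a 64-bit constant does not fit in a 32-bit `size_t`, here is a small sketch that emulates unsigned storage at a given width. The 5 GiB value is an assumed example only; the actual expression in rados.cc is truncated in the entry above, and `store_in_unsigned` is a hypothetical helper.

```python
def store_in_unsigned(value, bits):
    """Return value as an unsigned integer of the given width would hold it.

    Unsigned integers wrap modulo 2**bits, so the high bits are dropped.
    """
    return value & ((1 << bits) - 1)


# Assumed example value (5 GiB), not taken from the truncated source line.
wanted = 5 * 1024 ** 3

stored_32 = store_in_unsigned(wanted, 32)  # 32-bit size_t (unsigned int)
stored_64 = store_in_unsigned(wanted, 64)  # 64-bit size_t

print(wanted, stored_32, stored_64)
```

With the assumed value, the 64-bit result is unchanged while the 32-bit result loses its high bits, which is the kind of mismatch a compiler can reject when the conversion happens in a constant expression.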
04:48 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
For all log entries of all OSDs at 2020-01-28 03:18 with pg[ information and osd primary these are the log lines th...
David Zafman
03:26 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
From mgr.x's log after the last time pg_stats are received we see ... Neha Ojha
03:42 AM Bug #44004: "ceph" command crashes
not reproducible in my testbed. Kefu Chai
03:36 AM Bug #44004: "ceph" command crashes
Sometimes the "ceph" command fails with a segmentation fault, here is the core_backtrace. It seems that it has someth... Xuehan Xu
02:59 AM Bug #44004 (Can't reproduce): "ceph" command crashes
On the most recent master, after building, I ran the command "./bin/ceph -s --connect-timeout 1 -c /home/xuxuehan/cep... Xuehan Xu
01:37 AM Feature #42638 (Fix Under Review): Allow specifying pg_autoscale_mode when creating a new pool
Neha Ojha

02/05/2020

10:20 PM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
Hmm that prepare_failure() does look like it's behaving a little differently than some of the regular op flow; we mus... Greg Farnum
09:27 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
Logs: /a/dzafman-2020-01-27_22:00:09-upgrade:mimic-x-master-distro-basic-smithi/4712686
There is more than one bu...
Neha Ojha
07:43 PM Backport #43997 (Resolved): nautilus: Ceph tools utilizing "global_[pre_]init" no longer process ...
https://github.com/ceph/ceph/pull/33261 Nathan Cutler
07:43 PM Backport #43996 (Rejected): mimic: Ceph tools utilizing "global_[pre_]init" no longer process "ea...
Nathan Cutler
07:42 PM Backport #43992 (Rejected): nautilus: objecter doesn't send osd_op
Nathan Cutler
07:42 PM Backport #43991 (Rejected): mimic: objecter doesn't send osd_op
Nathan Cutler
07:42 PM Backport #43989 (Resolved): nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
https://github.com/ceph/ceph/pull/33147 Nathan Cutler
07:42 PM Backport #43988 (Rejected): luminous: osd: Allow 64-char hostname to be added as the "host" in CRUSH
https://github.com/ceph/ceph/pull/33146 Nathan Cutler
07:42 PM Backport #43987 (Resolved): mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
https://github.com/ceph/ceph/pull/33145 Nathan Cutler
05:39 PM Bug #42347 (Won't Fix): nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_fligh...
we've backported the osd fast shutdown ( https://github.com/ceph/ceph/pull/32743 ), so this will effectively go away ... Sage Weil
01:39 PM Bug #43975: Slow Requests/OP's types not getting logged
- Types - src/osd/OpRequest.h... Vikhyat Umrao
12:54 PM Bug #43975 (Resolved): Slow Requests/OP's types not getting logged
- From ceph.log... Vikhyat Umrao

02/04/2020

11:46 AM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
-The problem is not only about heap corruption. Stacks are affected as well. Moreover, there is an interesting corrup... Radoslaw Zarzynski
03:28 AM Bug #43813 (Pending Backport): objecter doesn't send osd_op
Sage Weil

02/03/2020

09:49 PM Bug #43954 (New): Issue health warning or error if MON or OSD daemons are holding onto excessive ...
Brad Hubbard
08:35 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
`Thread 63 (Thread 0x7f2e36318700 (LWP 55988))` is poisoned as well.... Radoslaw Zarzynski
08:00 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
It looks like a freshly heap-allocated `OSDMap` instance got corrupted:... Radoslaw Zarzynski
02:35 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
It looks like the entire `PGTempMap::data` has been corrupted:... Radoslaw Zarzynski
01:12 PM Bug #43948 (New): Remapped PGs are sometimes not deleted from previous OSDs
I noticed on several clusters (all Nautilus 14.2.6) that on occasion, some OSDs may still hold data for some PGs long... Eric Petit

02/01/2020

04:53 PM Bug #43861: ceph_test_rados_watch_notify hang
same?
/a/sage-2020-02-01_03:27:35-rados-wip-sage-testing-2020-01-31-1746-distro-basic-smithi/4723146
ceph_test_wa...
Sage Weil

01/31/2020

11:31 PM Bug #43795 (Pending Backport): Ceph tools utilizing "global_[pre_]init" no longer process "early"...
Sage Weil
11:09 PM Bug #43185: ceph -s not showing client activity
Can you grab a wallclock profiler dump from the mgr process when its usage goes to 100%?
Learn more about how to use...
Neha Ojha
06:41 AM Bug #43185: ceph -s not showing client activity
strace for the hanging mgr thread... Anonymous
06:37 AM Bug #43185: ceph -s not showing client activity
There's almost no load apart from scrubbing, like this is pretty average io:
client: 20 MiB/s rd, 61 MiB/s w...
Anonymous
10:34 PM Bug #43365 (Closed): Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signe...
FWIW the two clusters reporting this crash via telemetry are both Ubuntu 18.04
closing this as not a ceph issue; l...
Sage Weil
06:02 PM Bug #43813 (Fix Under Review): objecter doesn't send osd_op
Sage Weil
03:50 AM Bug #43813 (In Progress): objecter doesn't send osd_op
Sage Weil
03:46 AM Bug #43813: objecter doesn't send osd_op
/a/sage-2020-01-30_22:27:29-rados-wip-sage-testing-2020-01-30-1230-distro-basic-smithi/4719487... Sage Weil
05:24 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
Only happens when upgrading from mimic to nautilus, see https://tracker.ceph.com/issues/43048#note-7. Neha Ojha
12:23 PM Bug #43885: failed to reach quorum size 9 before timeout expired
Update: Tried running the test a few times but haven't been able to reproduce it. I will continue my attempts. In the... Sridhar Seshasayee
06:37 AM Bug #43885: failed to reach quorum size 9 before timeout expired
There does not appear to be a crash in this case, but there is an election that seems to take a long time followed by... Brad Hubbard
10:24 AM Bug #43929 (Pending Backport): osd: Allow 64-char hostname to be added as the "host" in CRUSH
Kefu Chai
10:16 AM Bug #43929 (Resolved): osd: Allow 64-char hostname to be added as the "host" in CRUSH
On a Linux system it is possible to set a 64-character hostname when
HOST_NAME_MAX is set to 64. It means that if...
Michal Skalski
09:46 AM Backport #43928 (In Progress): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Nathan Cutler
09:43 AM Backport #43928 (Resolved): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
https://github.com/ceph/ceph/pull/33007 Nathan Cutler
09:43 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
But if the issue was introduced in 2008, then we'd need to backport further than nautilus... Nathan Cutler
09:42 AM Bug #42977 (Pending Backport): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Adding nautilus backport per Greg's comment "looking at the nautilus code it is susceptible to this too." Nathan Cutler
03:56 AM Bug #42977 (Resolved): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Sage Weil
01:33 AM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
https://github.com/ceph/ceph/pull/19076 is a possible solution to this issue. Brad Hubbard

01/30/2020

11:03 PM Bug #43602 (Won't Fix): Core dumps not collected in standalone tests for distros using systemd-co...
The real fix done elsewhere is to configure the core location so that systemd-coredump is not used. It isn't worth t... David Zafman
04:43 PM Bug #43602 (Fix Under Review): Core dumps not collected in standalone tests for distros using sys...
Sage Weil
04:43 PM Bug #43602 (Resolved): Core dumps not collected in standalone tests for distros using systemd-cor...
Sage Weil
08:30 PM Backport #43651 (Resolved): luminous: Improve upmap change reporting in logs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32666
m...
Nathan Cutler
08:29 PM Backport #43651 (In Progress): luminous: Improve upmap change reporting in logs
Nathan Cutler
07:40 PM Backport #43919 (Resolved): nautilus: osd stuck down
https://github.com/ceph/ceph/pull/35024 Nathan Cutler
07:39 PM Backport #43916 (Resolved): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) ...
https://github.com/ceph/ceph/pull/33155 Nathan Cutler
04:46 PM Bug #43864 (Resolved): osd/repro_long_log.sh failure
Should have been fixed by https://github.com/ceph/ceph/pull/32945. Neha Ojha
04:17 PM Bug #43864: osd/repro_long_log.sh failure
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718221 Sage Weil
04:41 PM Bug #43889: expected MON_CLOCK_SKEW but got none
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718332
Sage Weil
04:16 PM Bug #43889: expected MON_CLOCK_SKEW but got none
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718133 Sage Weil
04:40 PM Bug #43915 (New): leaked Session (alloc from OSD::ms_handle_authentication)
... Sage Weil
04:37 PM Bug #43914 (Need More Info): nautilus: ceph tell command times out
see https://github.com/ceph/ceph/pull/32989 Sage Weil
04:35 PM Bug #43914 (Resolved): nautilus: ceph tell command times out
... Sage Weil
04:17 PM Bug #43885: failed to reach quorum size 9 before timeout expired
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718154
description: rados/...
Sage Weil
03:19 PM Feature #43910 (New): Utilize new Linux kernel v5.6 prctl PR_SET_IO_FLUSHER option
See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d19f1c8e1937baf74e1962aae9f90fa3ae... Jason Dillaman
02:51 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
the second time,... Sage Weil
02:50 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
if i start the osd manually, i can reproduce the same crash:... Sage Weil
02:48 PM Bug #43903 (Resolved): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
... Sage Weil
12:37 PM Bug #42977 (Fix Under Review): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Hmm this not-returning issue seems to date from 2008 (3859475bbfafb8754841af41044cb41124e87fc7); I'm not sure why it'... Greg Farnum
10:42 AM Bug #42977 (In Progress): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Yep looks like something went horribly wrong in refactoring — we correctly call the new election on receiving an old ... Greg Farnum
09:57 AM Documentation #43896 (Resolved): nautilus upgrade should recommend ceph-osd restarts after enabli...
Following an upgrade to nautilus and `ceph mon enable-msgr2`, running nautilus osds will not yet bind to their v2 add... Dan van der Ster
07:22 AM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
> We can clear that slow op either by restarting mon.cepherin-mon-7cb9b591e1 or with `ceph osd fail osd.170`.
too ...
Dan van der Ster
07:21 AM Bug #43893 (Duplicate): lingering osd_failure ops (due to failure_info holding references?)
On Nautilus v14.2.6 we see osd_failure ops which linger:... Dan van der Ster
04:11 AM Bug #43892 (Pending Backport): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during ...
Sage Weil

01/29/2020

11:18 PM Bug #43892 (Fix Under Review): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during ...
Sage Weil
11:15 PM Bug #43892 (Resolved): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o upg...
... Sage Weil
10:07 PM Bug #43885: failed to reach quorum size 9 before timeout expired
I wonder if this is somehow related to the election issue we saw in https://tracker.ceph.com/issues/42977. Seems to b... Neha Ojha
01:14 PM Bug #43885 (Can't reproduce): failed to reach quorum size 9 before timeout expired
This pops up occasionally. Here is a recent one:... Sage Weil
09:15 PM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
I think defer() is called by mon.e in receive_propose() because of the following... Neha Ojha
07:39 PM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
on mon.g (3), the epoch is 55 (or looks that way, it just sent these):... Sage Weil
12:41 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Let's see what happened in /a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/46... Neha Ojha
07:19 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
I'm seeing a lot of this in a sample of log segments from osd.6 which is reporting the slow ops. The log for osd.6...
David Zafman
03:55 PM Bug #43882 (Need More Info): osd to mon connection lost, osd stuck down
adding debug: https://github.com/ceph/ceph/pull/32968 Sage Weil
01:06 PM Bug #43882 (Can't reproduce): osd to mon connection lost, osd stuck down
This is a similar symptom to #43825, but it does not appear to be related to split/merge.
OSD is marked down, but ...
Sage Weil
01:45 PM Bug #43889 (Resolved): expected MON_CLOCK_SKEW but got none
description: rados/multimon/{clusters/6.yaml msgr-failures/many.yaml msgr/async.yaml
no_pools.yaml objectstore...
Sage Weil
01:44 PM Bug #43888: osd/osd-bench.sh 'tell osd.N bench' hang
https://github.com/ceph/ceph/pull/32961 to debug Sage Weil
01:41 PM Bug #43888 (Resolved): osd/osd-bench.sh 'tell osd.N bench' hang
... Sage Weil
01:36 PM Bug #43887 (Resolved): ceph_test_rados_delete_pools_parallel failure
... Sage Weil
01:23 PM Bug #43825 (Pending Backport): osd stuck down
Sage Weil
10:03 AM Backport #43881 (Resolved): mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
https://github.com/ceph/ceph/pull/33154 Nathan Cutler
10:03 AM Backport #43880 (Rejected): luminous: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
https://github.com/ceph/ceph/pull/33153 Nathan Cutler
10:03 AM Backport #43879 (Resolved): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
https://github.com/ceph/ceph/pull/33152 Nathan Cutler

01/28/2020

11:22 PM Bug #43864 (In Progress): osd/repro_long_log.sh failure
David Zafman
08:03 PM Bug #43864 (Resolved): osd/repro_long_log.sh failure
... Sage Weil
08:44 PM Bug #43865: osd-scrub-test.sh fails date check
This looks like a case where the sleep time wasn't sufficient. The previous run had set 2 days and the next test swi... David Zafman
08:07 PM Bug #43865 (Resolved): osd-scrub-test.sh fails date check
... Sage Weil
08:08 PM Bug #38345 (Pending Backport): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Sage Weil
08:07 PM Bug #43826 (Resolved): osd: leak of from send_lease
Sage Weil
07:59 PM Bug #43862 (Can't reproduce): mkfs fsck found fatal error: (2) No such file or directory during c...
... Sage Weil
07:45 PM Bug #43861: ceph_test_rados_watch_notify hang
/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713217 Sage Weil
07:43 PM Bug #43861 (Resolved): ceph_test_rados_watch_notify hang
... Sage Weil
07:34 PM Bug #43825 (Fix Under Review): osd stuck down
Sage Weil
07:27 PM Bug #43825 (In Progress): osd stuck down
we are splitting:... Sage Weil
06:59 PM Bug #43825: osd stuck down
2020-01-28T14:56:26.155+0000 7fd3ba08d700 20 osd.6 285 identify_splits_and_merges 1.5 e245 to e285 pg_nums {76=28,89=... Sage Weil
06:39 PM Bug #43825: osd stuck down
... Sage Weil
07:24 PM Bug #43185: ceph -s not showing client activity
Are you observing any client activity in the cluster logs when "ceph -s" isn't reporting them?
It is sometimes poss...
Neha Ojha
06:27 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
The master branch passed, but my nautilus run hit the same issue:
http://pulpito.ceph.com/dzafman-2020-01-27_21:...
David Zafman
10:42 AM Backport #43852 (Resolved): nautilus: osd-scrub-snaps.sh fails
https://github.com/ceph/ceph/pull/33274 Nathan Cutler
09:40 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
Just an update on my side:
After upgrading our monitor Ubuntu 18.04 packages (apt-get upgrade) with the 5.3.0-26-g...
Alex Walender

01/27/2020

09:00 PM Bug #43150 (Pending Backport): osd-scrub-snaps.sh fails
David Zafman
05:05 PM Bug #43807 (Resolved): osd-backfill-recovery-log.sh fails
Sage Weil
04:37 PM Bug #43810 (Resolved): all/recovery_preemption.yaml hang with down pgs
Sage Weil
01:41 PM Bug #43810 (Fix Under Review): all/recovery_preemption.yaml hang with down pgs
Sage Weil
04:02 PM Backport #43821 (In Progress): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_wi...
Nathan Cutler
03:57 PM Bug #43656: AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
Hi Sage:
This issue appears to have been introduced by https://github.com/ceph/ceph/pull/17619 - a major octopus ...
Nathan Cutler
03:56 PM Backport #43776 (Need More Info): nautilus: AssertionError: not all PGs are active or peered 15 s...
The master PR appears to be fixing an issue introduced by https://github.com/ceph/ceph/pull/17619 - a major octopus f... Nathan Cutler
03:28 PM Backport #43772 (In Progress): nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
Nathan Cutler
03:23 PM Backport #43731 (In Progress): nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending...
Nathan Cutler
02:41 PM Backport #43630 (In Progress): mimic: segv in collect_sys_info
Nathan Cutler
02:37 PM Backport #43631 (In Progress): nautilus: segv in collect_sys_info
Nathan Cutler
01:26 PM Bug #43826 (Fix Under Review): osd: leak of from send_lease
Sage Weil
12:57 PM Backport #43822: nautilus: Ceph assimilate-conf results in config entries which can not be removed
https://github.com/ceph/ceph/pull/32856 yin zheng
12:55 PM Backport #43822 (In Progress): nautilus: Ceph assimilate-conf results in config entries which can...
Nathan Cutler
12:50 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
We also have a problem:
{
"os_version_id": "10",
"assert_condition": "z >= signedspan::zero()",
"...
Zoltan Fodor
11:58 AM Bug #43833 (Resolved): shaman on bionic/cromson: cmake error: undefined reference to `pthread_cre...
I'm getting this with the current master on shaman:... Sebastian Wagner

01/26/2020

05:20 PM Bug #43826 (Resolved): osd: leak of from send_lease
... Sage Weil
05:18 PM Bug #43807: osd-backfill-recovery-log.sh fails
//a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703160 Sage Weil
05:13 PM Bug #43825 (Need More Info): osd stuck down
https://github.com/ceph/ceph/pull/32885 to debug Sage Weil
05:11 PM Bug #43825: osd stuck down
huh, also /a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703159 osd.7 Sage Weil
05:07 PM Bug #43825 (Resolved): osd stuck down
osd stuck at epoch 99, cluster at 2000 or something.
monc fails to reconnect to the mon
/a/sage-2020-01-24_23:2...
Sage Weil
04:33 PM Bug #43810: all/recovery_preemption.yaml hang with down pgs
/a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4702992 Sage Weil
10:40 AM Backport #43822 (Resolved): nautilus: Ceph assimilate-conf results in config entries which can no...
https://github.com/ceph/ceph/pull/32856 Nathan Cutler
10:40 AM Backport #43821 (Resolved): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_...
https://github.com/ceph/ceph/pull/32908 Nathan Cutler
03:54 AM Bug #43552 (Pending Backport): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
Kefu Chai
03:19 AM Bug #43653: test-crash.yaml produce cores
... Kefu Chai

01/25/2020

11:55 AM Backport #43623 (Need More Info): nautilus: pg: fastinfo incorrect when last_update moves backwar...
@Kefu this is non-trivial because of the crimson cleanup commit - can you take it? Nathan Cutler
11:03 AM Backport #43473 (In Progress): nautilus: recursive lock of OpTracker::lock (70)
Nathan Cutler
10:18 AM Backport #43471 (In Progress): nautilus: negative num_objects can set PG_STATE_DEGRADED
Nathan Cutler
12:50 AM Bug #39555 (Resolved): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
David Zafman
12:50 AM Backport #41499 (Rejected): mimic: backfill_toofull while OSDs are not full (Unneccessary HEALTH_...
Portions of the original pull request are already in Mimic; the rest doesn't make sense without complete backfill full ...
David Zafman

01/24/2020

10:52 PM Bug #43807 (Fix Under Review): osd-backfill-recovery-log.sh fails
Neha Ojha
10:36 PM Bug #43807 (In Progress): osd-backfill-recovery-log.sh fails
Neha Ojha
10:22 PM Bug #43807 (Fix Under Review): osd-backfill-recovery-log.sh fails
Neha Ojha
08:00 PM Bug #43807: osd-backfill-recovery-log.sh fails
/a/sage-2020-01-24_13:15:58-rados-wip-sage2-testing-2020-01-23-1953-distro-basic-smithi/4701051 Sage Weil
04:06 PM Bug #43807: osd-backfill-recovery-log.sh fails
The test needs to be updated due to https://github.com/ceph/ceph/pull/32683 - anything else that sets the log lengths... Josh Durgin
01:21 PM Bug #43807 (Resolved): osd-backfill-recovery-log.sh fails
... Sage Weil
10:20 PM Backport #41499 (In Progress): mimic: backfill_toofull while OSDs are not full (Unneccessary HEAL...
David Zafman
08:05 PM Bug #43296 (Pending Backport): Ceph assimilate-conf results in config entries which can not be re...
Sage Weil
07:56 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
/a/sage-2020-01-24_13:15:58-rados-wip-sage2-testing-2020-01-23-1953-distro-basic-smithi/4700893 Sage Weil
07:20 PM Bug #43810: all/recovery_preemption.yaml hang with down pgs
/a/sage-2020-01-24_13:15:58-rados-wip-sage2-testing-2020-01-23-1953-distro-basic-smithi/4700883
Sage Weil
01:30 PM Bug #43810 (Resolved): all/recovery_preemption.yaml hang with down pgs
... Sage Weil
04:54 PM Bug #43813 (Need More Info): objecter doesn't send osd_op
maybe this will help debug: https://github.com/ceph/ceph/pull/32850 Sage Weil
04:44 PM Bug #43813: objecter doesn't send osd_op
current thinking: paused = true... Sage Weil
02:30 PM Bug #43813 (Resolved): objecter doesn't send osd_op
/a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/4697914
ceph-client.admin....
Sage Weil
04:46 PM Backport #43469 (In Progress): nautilus: asynchronous recovery + backfill might spin pg undersize...
Nathan Cutler
04:41 PM Backport #43346 (In Progress): nautilus: short pg log + cache tier ceph_test_rados out of order r...
Nathan Cutler
04:39 PM Backport #43319 (In Progress): nautilus: PeeringState::GoClean will call purge_strays uncondition...
Nathan Cutler
04:25 PM Backport #43256 (In Progress): nautilus: monitor config store: Deleting logging config settings d...
Nathan Cutler
04:23 PM Backport #43245 (In Progress): nautilus: osd: increase priority in certain OSD perf counters
Nathan Cutler
04:22 PM Backport #43239 (In Progress): nautilus: ok-to-stop incorrect for some ec pgs
Nathan Cutler
04:21 PM Backport #43099 (In Progress): nautilus: nautilus:osd: network numa affinity not supporting subne...
Nathan Cutler
03:59 PM Bug #17945: ceph_test_rados_api_tier: failed to decode hitset in HitSetWrite test
... Sage Weil
03:52 PM Backport #43783 (In Progress): nautilus: mgr commands fail when using non-client auth
Nathan Cutler
03:43 PM Bug #43795: Ceph tools utilizing "global_[pre_]init" no longer process "early" environment options
Original post: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/RDQZ6E3XEGLPGYEBAUNV7DVYJUR5DLWR/ Jason Dillaman
03:41 PM Bug #43795 (Fix Under Review): Ceph tools utilizing "global_[pre_]init" no longer process "early"...
Jason Dillaman
02:08 PM Backport #40891 (Resolved): nautilus: Pool settings aren't populated to OSD after restart.
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32123
m...
Nathan Cutler
02:02 PM Backport #43530: nautilus: Change default upmap_max_deviation to 5
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31956
m...
Nathan Cutler
02:02 PM Backport #43529: nautilus: Remove use of rules batching for upmap balancer
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31956
m...
Nathan Cutler
02:02 PM Backport #43092: nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31956
m...
Nathan Cutler
02:01 PM Backport #42797: nautilus: unnecessary error message "calc_pg_upmaps failed to build overfull/und...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31956
m...
Nathan Cutler
01:53 PM Bug #24974: Segmentation fault in tcmalloc::ThreadCache::ReleaseToCentralCache()
... Sage Weil
01:18 PM Bug #42977 (Triaged): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
/a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/4697995
I bet this has a s...
Sage Weil

01/23/2020

08:24 PM Bug #43795 (Resolved): Ceph tools utilizing "global_[pre_]init" no longer process "early" environ...
Commit 7f23142f5ccc5ac8153d32b2c9a8353593831967 in PR 20172 [1] dropped the "env_to_vec" calls issued prior to invoki... Jason Dillaman
06:00 PM Bug #43582: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
/a/sage-2020-01-23_15:27:54-rados-wip-sage2-testing-2020-01-23-0635-distro-basic-smithi/4696610 Sage Weil
04:57 PM Backport #43783 (Resolved): nautilus: mgr commands fail when using non-client auth
https://github.com/ceph/ceph/pull/32769 Nathan Cutler
04:55 PM Backport #43776 (Rejected): nautilus: AssertionError: not all PGs are active or peered 15 seconds...
Nathan Cutler
04:55 PM Backport #43772 (Resolved): nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
https://github.com/ceph/ceph/pull/32844 Nathan Cutler
09:36 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
I also have this problem
I’m running on:
Supermicro X11DPU
Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz
10GBase...
Andreas Grip

01/22/2020

11:10 PM Backport #41500 (Rejected): luminous: backfill_toofull while OSDs are not full (Unneccessary HEAL...
David Zafman
11:07 PM Bug #38309 (Resolved): Limit loops waiting for force-backfill/force-recovery to happen
David Zafman
11:06 PM Backport #38352 (Rejected): luminous: Limit loops waiting for force-backfill/force-recovery to ha...
David Zafman
11:06 PM Bug #38840 (Resolved): snaps missing in mapper, should be: ca was r -2...repaired
David Zafman
11:06 PM Backport #39520 (Rejected): luminous: snaps missing in mapper, should be: ca was r -2...repaired
David Zafman
11:05 PM Bug #41522 (Resolved): ceph-objectstore-tool can't remove head with bad snapset
David Zafman
11:05 PM Backport #41597 (Rejected): luminous: ceph-objectstore-tool can't remove head with bad snapset
David Zafman
10:11 PM Bug #43643 (Need More Info): Error ENOTSUP: Some osds belong to multiple subtrees: [0, 1, 2, 3, 4...
When are you seeing this error? Any related logs will be helpful. Neha Ojha
08:27 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
The root cause of this issue was introduced in https://github.com/ceph/ceph/pull/30223, which aligns with Jason's com... Neha Ojha
04:58 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
I'm not really sure what the fix is here. The OSD doesn't really know (and can't really know) that the message comin... Sage Weil
04:50 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
Ah, those are two different connections. after sending the first message,... Sage Weil
04:37 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
client log is cluster2-client.mirror.3.24765.log
Sage Weil
12:45 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
I reproduced the bug with increased messenger and objecter logging on the client and osd.
Logs: /a/nojha-2020-01-1...
Neha Ojha
05:12 PM Bug #43403 (Resolved): unittest_lockdep unreliable
Sage Weil
04:35 PM Backport #40891: nautilus: Pool settings aren't populated to OSD after restart.
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32123
merged
Yuri Weinstein
02:19 PM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
Hi all,
What do "jewel to luminous split upgrades" and "boundary conditions" mean?
We're currently in the middl...
Aleksei Zakharov
05:40 AM Bug #43602: Core dumps not collected in standalone tests for distros using systemd-coredump
Changed to a fix to just report misconfiguration David Zafman
05:39 AM Bug #43312 (Resolved): Change default upmap_max_deviation to 5
David Zafman
05:39 AM Backport #43530 (Resolved): nautilus: Change default upmap_max_deviation to 5
David Zafman
05:33 AM Backport #43726 (In Progress): nautilus: osd-recovery-space.sh has a race
David Zafman
05:32 AM Bug #42756 (Resolved): unnecessary error message "calc_pg_upmaps failed to build overfull/underfull"
David Zafman
05:31 AM Backport #42797 (Resolved): nautilus: unnecessary error message "calc_pg_upmaps failed to build o...
David Zafman
05:30 AM Bug #42718 (Resolved): Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman
05:29 AM Backport #43092 (Resolved): nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman
05:28 AM Backport #43246 (In Progress): nautilus: Nearfull warnings are incorrect
David Zafman
05:17 AM Bug #43307 (Resolved): Remove use of rules batching for upmap balancer
David Zafman
05:17 AM Backport #43529 (Resolved): nautilus: Remove use of rules batching for upmap balancer
David Zafman
04:35 AM Bug #43752 (New): Master tracker for upmap performance improvements
I put this bug in the rados project because most of the actual code is in OSDMap::calc_pg_upmaps().
David Zafman
12:15 AM Bug #42566: mgr commands fail when using non-client auth
nautilus backport: https://github.com/ceph/ceph/pull/32769 Sage Weil
12:11 AM Bug #42566 (Pending Backport): mgr commands fail when using non-client auth
ah, this does need to be backported. see #42666 Sage Weil
12:12 AM Bug #42666 (Duplicate): mgropen from mgr comes from unknown.$id instead of mgr.$id
The problem is actually the same as #42566: the second/additional mgrc instance is sending the mgropen based on the !... Sage Weil

01/21/2020

11:10 PM Backport #43530: nautilus: Change default upmap_max_deviation to 5
David Zafman wrote:
> https://github.com/ceph/ceph/pull/31956
merged
Yuri Weinstein
11:10 PM Bug #43296: Ceph assimilate-conf results in config entries which can not be removed
adding nautilus backport since the bug was reported against that version Nathan Cutler
11:10 PM Backport #43529: nautilus: Remove use of rules batching for upmap balancer
David Zafman wrote:
> https://github.com/ceph/ceph/pull/31956
merged
Yuri Weinstein
11:10 PM Backport #43092: nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman wrote:
> https://github.com/ceph/ceph/pull/31956
merged
Yuri Weinstein
11:10 PM Backport #42797: nautilus: unnecessary error message "calc_pg_upmaps failed to build overfull/und...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31956
merged
Yuri Weinstein
10:49 PM Backport #43232 (Need More Info): nautilus: pgs stuck in laggy state
supposed to bake in master for a couple months Nathan Cutler
09:17 PM Bug #43403 (Fix Under Review): unittest_lockdep unreliable
Sage Weil
09:10 PM Bug #43552 (Fix Under Review): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
Sage Weil
07:17 PM Bug #42666: mgropen from mgr comes from unknown.$id instead of mgr.$id
compare to vstart, on that same version,... Sage Weil
07:16 PM Bug #42666: mgropen from mgr comes from unknown.$id instead of mgr.$id
from reesi001, with a fresh mgr restart,... Sage Weil
03:19 PM Bug #43721 (Pending Backport): qa/standalone/misc/ok-to-stop.sh occasionally fails
Sage Weil
04:20 AM Bug #43656 (Pending Backport): AssertionError: not all PGs are active or peered 15 seconds after ...
Sage Weil

01/20/2020

10:54 PM Bug #43296 (Fix Under Review): Ceph assimilate-conf results in config entries which can not be re...
Sage Weil
10:22 PM Bug #41255 (Resolved): backfill_toofull seen on cluster where the most full OSD is at 1%
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
10:18 PM Backport #43731 (Resolved): nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
https://github.com/ceph/ceph/pull/32905 Nathan Cutler
10:17 PM Backport #43726 (Resolved): nautilus: osd-recovery-space.sh has a race
https://github.com/ceph/ceph/pull/32774 Nathan Cutler
10:10 PM Backport #41584 (Resolved): mimic: backfill_toofull seen on cluster where the most full OSD is at 1%
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32361
m...
Nathan Cutler
08:01 PM Backport #41584: mimic: backfill_toofull seen on cluster where the most full OSD is at 1%
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32361
merged
Yuri Weinstein
09:39 PM Backport #43531 (Resolved): mimic: Change default upmap_max_deviation to 5
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31957
m...
Nathan Cutler
09:39 PM Backport #43094 (Resolved): mimic: Improve OSDMap::calc_pg_upmaps() efficiency
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31957
m...
Nathan Cutler
08:00 PM Backport #43094: mimic: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman wrote:
> https://github.com/ceph/ceph/pull/31957
merged
Yuri Weinstein
09:39 PM Backport #42798 (Resolved): mimic: unnecessary error message "calc_pg_upmaps failed to build over...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31957
m...
Nathan Cutler
08:00 PM Backport #42798: mimic: unnecessary error message "calc_pg_upmaps failed to build overfull/underf...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31957
merged
Yuri Weinstein
08:00 PM Backport #43530: nautilus: Change default upmap_max_deviation to 5
https://github.com/ceph/ceph/pull/31957 merged Yuri Weinstein
07:25 PM Bug #43721 (Fix Under Review): qa/standalone/misc/ok-to-stop.sh occasionally fails
Sage Weil
07:15 PM Bug #43721 (Resolved): qa/standalone/misc/ok-to-stop.sh occasionally fails
... Sage Weil
06:03 PM Bug #38345 (Fix Under Review): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Sage Weil
05:53 PM Bug #43397 (Resolved): FS_DEGRADED to cluster log despite --no-mon-health-to-clog
Sage Weil
03:48 PM Bug #43656 (Fix Under Review): AssertionError: not all PGs are active or peered 15 seconds after ...
Sage Weil
03:40 PM Bug #43656: AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
/a/sage-2020-01-20_14:10:17-rados:thrash-erasure-code-wip-sage-testing-2020-01-19-1713-distro-basic-smithi/4688160 Sage Weil
12:13 AM Bug #43653 (Resolved): test-crash.yaml produce cores
Sage Weil

01/19/2020

05:49 PM Bug #43653 (Fix Under Review): test-crash.yaml produce cores
Sage Weil
03:43 PM Bug #42918 (Closed): memory corruption and lockups with I-Object
this was reverted in master Sage Weil
03:15 PM Bug #39398 (Duplicate): osd: fast_info need update when pglog rewind
Sage Weil
03:12 AM Bug #43404 (Pending Backport): mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
Sage Weil

01/18/2020

08:39 PM Bug #43592 (Pending Backport): osd-recovery-space.sh has a race
Sage Weil
06:53 PM Bug #43422 (Resolved): qa/standalone/mon/osd-pool-create.sh fails to grep utf8 pool name
Sage Weil
02:56 PM Bug #42666 (In Progress): mgropen from mgr comes from unknown.$id instead of mgr.$id
Sage Weil
10:39 AM Backport #43652 (In Progress): mimic: Improve upmap change reporting in logs
Nathan Cutler
10:38 AM Backport #43650 (In Progress): nautilus: Improve upmap change reporting in logs
Nathan Cutler
10:32 AM Backport #42878 (Resolved): nautilus: ceph_test_admin_socket_output fails in rados qa suite
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32063
m...
Nathan Cutler
10:20 AM Backport #43620 (In Progress): nautilus: mon shutdown timeout (race with async compaction)
Nathan Cutler
09:33 AM Backport #24471: luminous: Ceph-osd crash when activate SPDK
Thanks, Kefu Nathan Cutler
05:51 AM Backport #24471: luminous: Ceph-osd crash when activate SPDK
I should not have added it to luminous, as it's 44851549bbc58520a32c15a7db5097b5e44dd53f which introduced queue_t.
...
Kefu Chai

01/17/2020

11:30 PM Bug #43656: AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
In this case, the workload happened to delete the old pool/pgs and create a new one right before the check, so the ne... Sage Weil
11:29 PM Bug #43656 (Resolved): AssertionError: not all PGs are active or peered 15 seconds after marking ...
... Sage Weil
10:17 PM Backport #43651 (Resolved): luminous: Improve upmap change reporting in logs
David Zafman
07:50 PM Backport #43651 (In Progress): luminous: Improve upmap change reporting in logs
David Zafman
07:49 PM Backport #43651 (Resolved): luminous: Improve upmap change reporting in logs
https://github.com/ceph/ceph/pull/32666 David Zafman
09:10 PM Bug #43653 (Resolved): test-crash.yaml produce cores
/a/sage-2020-01-17_18:51:12-rados-wip-sage-testing-2020-01-17-1009-distro-basic-smithi/4678354 Sage Weil
08:21 PM Bug #43422 (Fix Under Review): qa/standalone/mon/osd-pool-create.sh fails to grep utf8 pool name
Sage Weil
07:49 PM Backport #43652 (Resolved): mimic: Improve upmap change reporting in logs
https://github.com/ceph/ceph/pull/32717 David Zafman
07:49 PM Backport #43650 (Resolved): nautilus: Improve upmap change reporting in logs
https://github.com/ceph/ceph/pull/32716 David Zafman
07:48 PM Bug #41016 (Pending Backport): Improve upmap change reporting in logs
David Zafman
04:28 AM Bug #43643: Error ENOTSUP: Some osds belong to multiple subtrees: [0, 1, 2, 3, 4, 5, 6, 7, 8]
crush map dump file. 伟 宋
04:25 AM Bug #43643 (Need More Info): Error ENOTSUP: Some osds belong to multiple subtrees: [0, 1, 2, 3, 4...
Error ENOTSUP: Some osds belong to multiple subtrees: [0, 1, 2, 3, 4, 5, 6, 7, 8]
version: ceph 12.2.11
I did no...
伟 宋

01/16/2020

04:49 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Hi @Frank - please open a new bug report and include the output of "ceph health detail" in your bug description. This... Nathan Cutler
02:52 PM Bug #38296 (Resolved): segv in fgets() in collect_sys_info reading /proc/cpuinfo
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:46 PM Backport #43632 (Rejected): luminous: segv in collect_sys_info
Nathan Cutler
02:46 PM Backport #43631 (Resolved): nautilus: segv in collect_sys_info
https://github.com/ceph/ceph/pull/32901 Nathan Cutler
02:46 PM Backport #43630 (Resolved): mimic: segv in collect_sys_info
https://github.com/ceph/ceph/pull/32902 Nathan Cutler
02:45 PM Backport #43623 (Rejected): nautilus: pg: fastinfo incorrect when last_update moves backward in time
Nathan Cutler
02:44 PM Backport #43622 (Rejected): mimic: pg: fastinfo incorrect when last_update moves backward in time
Nathan Cutler
02:44 PM Backport #43621 (Rejected): luminous: pg: fastinfo incorrect when last_update moves backward in time
Nathan Cutler
02:44 PM Backport #43620 (Resolved): nautilus: mon shutdown timeout (race with async compaction)
https://github.com/ceph/ceph/pull/32715 Nathan Cutler
01:01 PM Bug #43403: unittest_lockdep unreliable
Also happening in nautilus PRs Nathan Cutler
10:59 AM Documentation #20867 (Closed): OSD::build_past_intervals_parallel()'s comment is stale
This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio... Zac Dover
10:52 AM Documentation #16356: doc: manual deployment of ceph monitor needs fix
I'll take this one and look into it. Zac Dover
10:02 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
Hello Greg,
I have dug a little bit deeper between the appearance of the crashes and the states of our machines...
Alex Walender
06:03 AM Bug #41313: PG distribution completely messed up since Nautilus
After being told on the ML that my low PG count is supposed to be the reason, I raised it. Nothing changed, still sam... Anonymous
02:07 AM Bug #41016: Improve upmap change reporting in logs
We aren't going to add logging to _apply_upmap() because it is invasive to get a CephContext into that function to be... David Zafman
12:59 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
Looking at the logs that Jason added http://qa-proxy.ceph.com/teuthology/jdillaman-2020-01-09_16:20:49-rbd-wip-jd-tes... Neha Ojha

01/15/2020

11:45 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
Hmm, what hardware are you guys running this on?
We consulted with our C++ time guru and the current theory is a b...
Greg Farnum
10:07 PM Bug #43584: MON_DOWN during mon_join process
I'm pretty sure this is a test issue, since we don't make guarantees about monitor elections, especially on first boo... Greg Farnum
09:04 PM Bug #43404 (Fix Under Review): mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
Sage Weil
07:33 PM Bug #41016 (In Progress): Improve upmap change reporting in logs
David Zafman
06:24 PM Bug #41016: Improve upmap change reporting in logs

The function _apply_upmap() would be getting called very frequently. This makes logging there problematic. Also, ...
David Zafman
06:58 PM Bug #43587 (Pending Backport): mon shutdown timeout (race with async compaction)
Sage Weil
02:48 PM Bug #43587 (Fix Under Review): mon shutdown timeout (race with async compaction)
Sage Weil
04:54 PM Backport #42878: nautilus: ceph_test_admin_socket_output fails in rados qa suite
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32063
merged
Yuri Weinstein
01:50 PM Bug #42668 (Need More Info): ceph daemon osd.* fails in osd container but ceph daemon mds.* does ...
Nathan Cutler
11:51 AM Feature #42638: Allow specifying pg_autoscale_mode when creating a new pool
I really think this issue should be closed.
I tested with the autoscaler globally set to on. Tested the pool creatio...
Stephan Müller
06:11 AM Bug #43602 (Fix Under Review): Core dumps not collected in standalone tests for distros using sys...
David Zafman
05:19 AM Bug #43602 (Won't Fix): Core dumps not collected in standalone tests for distros using systemd-co...
David Zafman
04:53 AM Bug #43306 (Pending Backport): segv in collect_sys_info
Kefu Chai
04:19 AM Bug #43580 (Pending Backport): pg: fastinfo incorrect when last_update moves backward in time
Kefu Chai

01/14/2020

09:28 PM Bug #40649: set_mon_vals failed to set cluster_network = 10.1.2.0/24: Configuration option 'clust...
FYI, I was able to remove the config settings with:
$ ceph config rm <who> <what>
followed by
$ ceph config ...
Frank Ritchie
08:13 PM Bug #43485 (Resolved): Deprecated full/nearfull added back by mistake
David Zafman
03:46 PM Bug #43582 (In Progress): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Kefu Chai
03:46 PM Bug #43582: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
I double-checked update_mgrmap() in ceph_monstore_tool.cc, which is called when handling the rebuild subcommand. will try... Kefu Chai
01:50 PM Bug #43597: stuck waiting for pg to advance to epoch
1.c... Sage Weil
01:36 PM Bug #43597 (New): stuck waiting for pg to advance to epoch
... Sage Weil
08:58 AM Bug #43306: segv in collect_sys_info
https://github.com/ceph/ceph/pull/32630 is posted to avoid using fgets(). Kefu Chai
08:40 AM Documentation #4568 (Closed): FAQ entry for changing journal size/moving journal
This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio... Zac Dover
08:39 AM Documentation #3466 (Closed): rados manpage: bench still documents "read" rather than "seq/rand"
This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio... Zac Dover
08:37 AM Documentation #3447 (Closed): doc: how to recover from a failed journal device
This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio... Zac Dover
08:36 AM Documentation #3218 (Closed): Doc: osdmaptool manpage out of date with code *and* usage
This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio... Zac Dover
08:35 AM Documentation #3166 (Closed): doc: Explain OSD up/down, in/out: what does it mean, where does it ...
This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio... Zac Dover
08:34 AM Documentation #3054 (Closed): doc: omap, tmap, xattrs
This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio... Zac Dover
08:32 AM Documentation #2272 (Closed): FAQs: RADOS reliability and availability
This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio... Zac Dover
02:35 AM Bug #43592 (Resolved): osd-recovery-space.sh has a race

The function wait_for_state() returns success when there are no PGs in a selected state. The test's purpose of wai...
David Zafman
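
The race described in this entry can be sketched in Python. This is an illustration only, not the shell code from osd-recovery-space.sh: `get_pg_states` is a hypothetical callable that returns the current list of PG state strings, and the two-phase wait shown here is one possible way to close the window, not necessarily the fix that was merged.

```python
import time


def wait_until_state_clears(get_pg_states, state, timeout=5.0, interval=0.01):
    """Wait for `state` to appear in some PG, then wait for it to clear.

    A check that only waits for "no PG is in `state`" succeeds at once if
    it runs before any PG has entered that state. Requiring the state to
    be observed first avoids that early success.
    """
    deadline = time.monotonic() + timeout
    seen = False
    while time.monotonic() < deadline:
        present = any(state in s for s in get_pg_states())
        if present:
            seen = True
        elif seen:
            return True  # the state was observed and has now cleared
        time.sleep(interval)
    return False  # timed out: the state never appeared or never cleared
```

A caller that polls only for the absence of a state would return success on the very first poll in the case where no PG has started recovering yet, which is the situation the entry describes.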

01/13/2020

10:06 PM Backport #43532 (Resolved): luminous: Change default upmap_max_deviation to 5
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32586
m...
Nathan Cutler
10:05 PM Backport #39474 (Resolved): luminous: segv in fgets() in collect_sys_info reading /proc/cpuinfo
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32349
m...
Nathan Cutler
10:04 PM Backport #41730 (Resolved): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31855
m...
Nathan Cutler
08:28 PM Bug #43591 (New): /sbin/fstrim can interfere with umount
... Sage Weil
08:13 PM Bug #43306 (Fix Under Review): segv in collect_sys_info
Sage Weil
08:11 PM Bug #43306: segv in collect_sys_info
#38296 changed the buffer to 1024 chars, but /proc/cpuinfo can be bigger than that, too. On smithi (8 CPUs), it's 9... Sage Weil
02:47 PM Bug #43587 (Resolved): mon shutdown timeout (race with async compaction)
... Sage Weil
02:42 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
FWIW, I am seeing this issue after an upgrade from 12.2.12 to 14.2.6.
The status is HEALTH_WARN not HEALTH_ERR but...
Frank Ritchie
02:21 PM Bug #43404: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
/a/sage-2020-01-12_21:37:03-rados-wip-sage-testing-2020-01-12-0621-distro-basic-smithi/4660728... Sage Weil
02:16 PM Bug #43584 (Resolved): MON_DOWN during mon_join process
/a/sage-2020-01-12_21:37:03-rados-wip-sage-testing-2020-01-12-0621-distro-basic-smithi/4660691... Sage Weil
02:02 PM Bug #43582 (Resolved): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
... Sage Weil
01:39 PM Bug #43580 (Fix Under Review): pg: fastinfo incorrect when last_update moves backward in time
Kefu Chai
01:05 PM Bug #43580 (Resolved): pg: fastinfo incorrect when last_update moves backward in time
If, during peering, last_update moves backwards, we may rewrite the full info but leave a fastinfo record in place wi... Sage Weil
12:24 PM Bug #42821 (Resolved): src/msg/async/net_handler.cc: Fix compilation
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
12:23 PM Bug #43454 (Resolved): ceph monitor crashes after updating 'mon_memory_target' config setting.
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
12:19 PM Backport #43495 (Resolved): nautilus: ceph monitor crashes after updating 'mon_memory_target' con...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32520
m...
Nathan Cutler
12:13 PM Backport #42997 (Resolved): nautilus: acting_recovery_backfill won't catch all up peers
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32064
m...
Nathan Cutler
12:13 PM Backport #42853 (Resolved): nautilus: format error: ceph osd stat --format=json
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32062
m...
Nathan Cutler
12:12 PM Backport #42846 (Resolved): nautilus: src/msg/async/net_handler.cc: Fix compilation
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31736
m...
Nathan Cutler
07:38 AM Bug #43555: raw usage is far from total pool usage
only overwrite file liang sibin

01/12/2020

09:29 PM Backport #43532: luminous: Change default upmap_max_deviation to 5
David Zafman wrote:
> https://github.com/ceph/ceph/pull/32586
merged
Yuri Weinstein

01/10/2020

11:36 PM Bug #43555 (New): raw usage is far from total pool usage
ceph -v
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)...
liang sibin
10:32 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
http://pulpito.ceph.com/nojha-2020-01-10_19:11:03-rbd:mirror-thrash-master-distro-basic-smithi/4653675/
Observatio...
Neha Ojha
08:30 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
Reproduces with -s rbd:mirror-thrash and --filter 'rbd-mirror-fsx-workunit'
http://pulpito.ceph.com/nojha-2020-01-...
Neha Ojha
10:03 PM Bug #43553 (Can't reproduce): mon: client mon_status fails
... Patrick Donnelly
09:07 PM Bug #40649: set_mon_vals failed to set cluster_network = 10.1.2.0/24: Configuration option 'clust...
This also happened to me during an upgrade from Luminous to Nautilus.
The cluster/public networks were not defined...
Frank Ritchie
07:26 PM Bug #43552 (Resolved): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
... Patrick Donnelly
02:45 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
We are also running into this issue.
Jan 07 19:03:42 pmxc05 ceph-mon[3701783]: 2020-01-07 19:03:42.625 7fe59c03d...
Glen Aidukas
01:39 PM Bug #39665 (Resolved): kstore: memory may leak on KStore::_do_read_stripe
Kefu Chai
01:34 PM Bug #43412 (Resolved): cephadm ceph_manager IndexError: list index out of range
Kefu Chai
04:55 AM Backport #43532 (In Progress): luminous: Change default upmap_max_deviation to 5
David Zafman
04:54 AM Backport #43531 (In Progress): mimic: Change default upmap_max_deviation to 5
David Zafman

01/09/2020

10:00 PM Bug #42328 (New): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
This issue is still occurring with today's master branch:
http://qa-proxy.ceph.com/teuthology/jdillaman-2020-01-09...
Jason Dillaman
04:56 PM Backport #43495: nautilus: ceph monitor crashes after updating 'mon_memory_target' config setting.
Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/32520
merged
Yuri Weinstein
02:28 AM Bug #43412 (Fix Under Review): cephadm ceph_manager IndexError: list index out of range
Kefu Chai
12:39 AM Backport #43529 (In Progress): nautilus: Remove use of rules batching for upmap balancer
David Zafman
12:27 AM Backport #43529 (Resolved): nautilus: Remove use of rules batching for upmap balancer
https://github.com/ceph/ceph/pull/31956 David Zafman
12:39 AM Backport #43530 (In Progress): nautilus: Change default upmap_max_deviation to 5
David Zafman
12:28 AM Backport #43530 (Resolved): nautilus: Change default upmap_max_deviation to 5
https://github.com/ceph/ceph/pull/31956 David Zafman
12:28 AM Backport #43532 (Resolved): luminous: Change default upmap_max_deviation to 5
https://github.com/ceph/ceph/pull/32586 David Zafman
12:28 AM Backport #43531 (Resolved): mimic: Change default upmap_max_deviation to 5
https://github.com/ceph/ceph/pull/31957 David Zafman

01/08/2020

10:23 PM Bug #43312 (Pending Backport): Change default upmap_max_deviation to 5
Neha Ojha
10:10 PM Bug #43307 (Pending Backport): Remove use of rules batching for upmap balancer
Neha Ojha
10:09 PM Bug #43397 (Fix Under Review): FS_DEGRADED to cluster log despite --no-mon-health-to-clog
Neha Ojha
10:04 PM Bug #43412: cephadm ceph_manager IndexError: list index out of range
Kefu's got a PR for this Josh Durgin
05:31 AM Bug #43412: cephadm ceph_manager IndexError: list index out of range
I'm guessing it's caused by there being no pools at the time. So the random choice fails. Maybe we need to do somethi... Matthew Oliver
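
The guess above can be illustrated with a minimal Python sketch. It assumes the failure comes from calling `random.choice` on an empty list of pools; the helper name `pick_random_pool` and the pool names are hypothetical and are not taken from the ceph_manager test code.

```python
import random


def pick_random_pool(pools):
    """Return a random pool name, or None when no pools exist yet.

    Hypothetical helper: random.choice() raises IndexError on an empty
    sequence, so the empty case is handled explicitly here.
    """
    if not pools:
        return None
    return random.choice(pools)


# With no pools, the guarded helper returns None instead of raising.
print(pick_random_pool([]))
# With pools present, it returns one of them.
print(pick_random_pool(["rbd", "cephfs_data"]) in ("rbd", "cephfs_data"))
```

Returning None is only one option; a caller could instead skip the step or wait until a pool has been created.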
10:02 PM Bug #43422: qa/standalone/mon/osd-pool-create.sh fails to grep utf8 pool name
probably need to set LANG to utf8 Josh Durgin
08:23 AM Bug #43185: ceph -s not showing client activity
We run 14.2.4. I sometimes see the mgr process at 100%, and I have been told that the reason for the lack of activity shown might be... Anonymous
02:24 AM Bug #43520 (In Progress): segfault in kstore's pending stripes
Chang Liu
02:23 AM Bug #43520: segfault in kstore's pending stripes
ceph version 14.2.1-700.3.0.2.407 (c823e6bbf85437561d2165c0f4b5d8c6bd726975) nautilus (stable)
1: (()+0xf5e0) [0x7f...
Chang Liu
02:20 AM Bug #43520 (In Progress): segfault in kstore's pending stripes
Chang Liu
 
