Activity
From 01/07/2020 to 02/05/2020
02/05/2020
- 10:20 PM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
- Hmm that prepare_failure() does look like it's behaving a little differently than some of the regular op flow; we mus...
- 09:27 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- Logs: /a/dzafman-2020-01-27_22:00:09-upgrade:mimic-x-master-distro-basic-smithi/4712686
There is more than one bu...
- 07:43 PM Backport #43997 (Resolved): nautilus: Ceph tools utilizing "global_[pre_]init" no longer process ...
- https://github.com/ceph/ceph/pull/33261
- 07:43 PM Backport #43996 (Rejected): mimic: Ceph tools utilizing "global_[pre_]init" no longer process "ea...
- 07:42 PM Backport #43992 (Rejected): nautilus: objecter doesn't send osd_op
- 07:42 PM Backport #43991 (Rejected): mimic: objecter doesn't send osd_op
- 07:42 PM Backport #43989 (Resolved): nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- https://github.com/ceph/ceph/pull/33147
- 07:42 PM Backport #43988 (Rejected): luminous: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- https://github.com/ceph/ceph/pull/33146
- 07:42 PM Backport #43987 (Resolved): mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- https://github.com/ceph/ceph/pull/33145
- 05:39 PM Bug #42347 (Won't Fix): nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_fligh...
- we've backported the osd fast shutdown ( https://github.com/ceph/ceph/pull/32743 ), so this will effectively go away ...
- 01:39 PM Bug #43975: Slow Requests/OP's types not getting logged
- - Types - src/osd/OpRequest.h...
- 12:54 PM Bug #43975 (Resolved): Slow Requests/OP's types not getting logged
- - From ceph.log...
02/04/2020
- 11:46 AM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- -The problem is not only about heap corruption. Stacks are affected as well. Moreover, there is an interesting corrup...
- 03:28 AM Bug #43813 (Pending Backport): objecter doesn't send osd_op
02/03/2020
- 09:49 PM Bug #43954 (New): Issue health warning or error if MON or OSD daemons are holding onto excessive ...
- 08:35 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- `Thread 63 (Thread 0x7f2e36318700 (LWP 55988))` is poisoned as well....
- 08:00 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- It looks that a freshly heap-allocated `OSDMap` instance got corrupted:...
- 02:35 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- It looks the entire `PGTempMap::data` has been corrupted:...
- 01:12 PM Bug #43948 (New): Remapped PGs are sometimes not deleted from previous OSDs
- I noticed on several clusters (all Nautilus 14.2.6) that on occasion, some OSDs may still hold data for some PGs long...
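A quick way to confirm such a leftover on a suspect OSD, sketched with placeholder ids (the PG id and osd id below are illustrative, and ceph-objectstore-tool requires the OSD to be stopped first):
$ ceph pg map 1.2f
$ systemctl stop ceph-osd@7
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op list-pgs | grep '^1\.2f'
If the PG is still listed although osd.7 is no longer in its up/acting set, the old copy was never removed.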
02/01/2020
- 04:53 PM Bug #43861: ceph_test_rados_watch_notify hang
- same?
/a/sage-2020-02-01_03:27:35-rados-wip-sage-testing-2020-01-31-1746-distro-basic-smithi/4723146
ceph_test_wa...
01/31/2020
- 11:31 PM Bug #43795 (Pending Backport): Ceph tools utilizing "global_[pre_]init" no longer process "early"...
- 11:09 PM Bug #43185: ceph -s not showing client activity
- Can you grab a wallclock profiler dump from the mgr process when its usage goes to 100%?
Learn more about how to use...
- 06:41 AM Bug #43185: ceph -s not showing client activity
- strace for the hanging mgr thread...
- 06:37 AM Bug #43185: ceph -s not showing client activity
- There's almost no load apart from scrubbing, like this is pretty average io:
client: 20 MiB/s rd, 61 MiB/s w...
- 10:34 PM Bug #43365 (Closed): Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signe...
- FWIW the two clusters reporting this crash via telemetry are both Ubuntu 18.04
closing this as not a ceph issue; l...
- 06:02 PM Bug #43813 (Fix Under Review): objecter doesn't send osd_op
- 03:50 AM Bug #43813 (In Progress): objecter doesn't send osd_op
- 03:46 AM Bug #43813: objecter doesn't send osd_op
- /a/sage-2020-01-30_22:27:29-rados-wip-sage-testing-2020-01-30-1230-distro-basic-smithi/4719487...
- 05:24 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- Only happens when upgrading from mimic to nautilus, see https://tracker.ceph.com/issues/43048#note-7.
- 12:23 PM Bug #43885: failed to reach quorum size 9 before timeout expired
- Update: Tried running the test a few times but haven't been able to reproduce it. I will continue my attempts. In the...
- 06:37 AM Bug #43885: failed to reach quorum size 9 before timeout expired
- There does not appear to be a crash in this case, but there is an election that seems to take a long time followed by...
- 10:24 AM Bug #43929 (Pending Backport): osd: Allow 64-char hostname to be added as the "host" in CRUSH
- 10:16 AM Bug #43929 (Resolved): osd: Allow 64-char hostname to be added as the "host" in CRUSH
On Linux systems it is possible to set a 64-character hostname when
HOST_NAME_MAX is set to 64. It means that if...
- 09:46 AM Backport #43928 (In Progress): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 09:43 AM Backport #43928 (Resolved): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- https://github.com/ceph/ceph/pull/33007
- 09:43 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- But if the issue was introduced in 2008, then we'd need to backport further than nautilus...
- 09:42 AM Bug #42977 (Pending Backport): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Adding nautilus backport per Greg's comment "looking at the nautilus code it is susceptible to this too."
- 03:56 AM Bug #42977 (Resolved): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 01:33 AM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
- https://github.com/ceph/ceph/pull/19076 is a possible solution to this issue.
01/30/2020
- 11:03 PM Bug #43602 (Won't Fix): Core dumps not collected in standalone tests for distros using systemd-co...
- The real fix done elsewhere is to configure the core location so that systemd-coredump is not used. It isn't worth t...
- 04:43 PM Bug #43602 (Fix Under Review): Core dumps not collected in standalone tests for distros using sys...
- 04:43 PM Bug #43602 (Resolved): Core dumps not collected in standalone tests for distros using systemd-cor...
- 08:30 PM Backport #43651 (Resolved): luminous: Improve upmap change reporting in logs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32666
m...
- 08:29 PM Backport #43651 (In Progress): luminous: Improve upmap change reporting in logs
- 07:40 PM Backport #43919 (Resolved): nautilus: osd stuck down
- https://github.com/ceph/ceph/pull/35024
- 07:39 PM Backport #43916 (Resolved): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) ...
- https://github.com/ceph/ceph/pull/33155
- 04:46 PM Bug #43864 (Resolved): osd/repro_long_log.sh failure
- Should have been fixed by https://github.com/ceph/ceph/pull/32945.
- 04:17 PM Bug #43864: osd/repro_long_log.sh failure
- /a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718221
- 04:41 PM Bug #43889: expected MON_CLOCK_SKEW but got none
- /a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718332
- 04:16 PM Bug #43889: expected MON_CLOCK_SKEW but got none
- /a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718133
- 04:40 PM Bug #43915 (New): leaked Session (alloc from OSD::ms_handle_authentication)
- ...
- 04:37 PM Bug #43914 (Need More Info): nautilus: ceph tell command times out
- see https://github.com/ceph/ceph/pull/32989
- 04:35 PM Bug #43914 (Resolved): nautilus: ceph tell command times out
- ...
- 04:17 PM Bug #43885: failed to reach quorum size 9 before timeout expired
- /a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718154
description: rados/...
- 03:19 PM Feature #43910 (New): Utilize new Linux kernel v5.6 prctl PR_SET_IO_FLUSHER option
- See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d19f1c8e1937baf74e1962aae9f90fa3ae...
- 02:51 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- the second time,...
- 02:50 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- if i start the osd manually, i can reproduce the same crash:...
- 02:48 PM Bug #43903 (Resolved): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- ...
- 12:37 PM Bug #42977 (Fix Under Review): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Hmm this not-returning issue seems to date from 2008 (3859475bbfafb8754841af41044cb41124e87fc7); I'm not sure why it'...
- 10:42 AM Bug #42977 (In Progress): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Yep looks like something went horribly wrong in refactoring — we correctly call the new election on receiving an old ...
- 09:57 AM Documentation #43896 (Resolved): nautilus upgrade should recommend ceph-osd restarts after enabli...
- Following an upgrade to nautilus and `ceph mon enable-msgr2`, running nautilus osds will not yet bind to their v2 add...
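A sketch of the sequence the report argues the upgrade notes should spell out (standard Nautilus CLI; the osd id in the spot-check is illustrative):
$ ceph mon enable-msgr2
$ systemctl restart ceph-osd.target        # on each OSD host, so the OSDs re-register with v2 addresses
$ ceph osd dump | grep '^osd.0 '           # spot-check that a v2 address now appears alongside v1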
- 07:22 AM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
- > We can clear that slow op either by restarting mon.cepherin-mon-7cb9b591e1 or with `ceph osd fail osd.170`.
too ...
- 07:21 AM Bug #43893 (Duplicate): lingering osd_failure ops (due to failure_info holding references?)
- On Nautilus v14.2.6 we see osd_failure ops which linger:...
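For reference, a sketch of the workaround quoted above (the mon name and osd id come from the report; using the mon admin-socket op dump to see the stuck request is an assumption):
$ ceph daemon mon.cepherin-mon-7cb9b591e1 ops   # the lingering osd_failure op should be listed here
$ ceph osd fail osd.170                         # clears it, as does restarting that mon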
- 04:11 AM Bug #43892 (Pending Backport): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during ...
01/29/2020
- 11:18 PM Bug #43892 (Fix Under Review): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during ...
- 11:15 PM Bug #43892 (Resolved): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o upg...
- ...
- 10:07 PM Bug #43885: failed to reach quorum size 9 before timeout expired
- I wonder if this is somehow related to the election issue we saw in https://tracker.ceph.com/issues/42977. Seems to b...
- 01:14 PM Bug #43885 (Can't reproduce): failed to reach quorum size 9 before timeout expired
- This pops up occasionally. Here is a recent one:...
- 09:15 PM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- I think defer() is called by mon.e in receive_propose() because of the following...
- 07:39 PM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- on mon.g (3), the epoch is 55 (or looks that way, it just sent these):...
- 12:41 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Let's see what happened in /a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/46...
- 07:19 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
I'm seeing a lot of this in a sample of log segments from osd.6 which is reporting the slow ops. The log for osd.6...
- 03:55 PM Bug #43882 (Need More Info): osd to mon connection lost, osd stuck down
- adding debug: https://github.com/ceph/ceph/pull/32968
- 01:06 PM Bug #43882 (Can't reproduce): osd to mon connection lost, osd stuck down
- This is a similar symptom to #43825, but it does not appear to be related to split/merge.
OSD is marked down, but ...
- 01:45 PM Bug #43889 (Resolved): expected MON_CLOCK_SKEW but got none
- description: rados/multimon/{clusters/6.yaml msgr-failures/many.yaml msgr/async.yaml
no_pools.yaml objectstore...
- 01:44 PM Bug #43888: osd/osd-bench.sh 'tell osd.N bench' hang
- https://github.com/ceph/ceph/pull/32961 to debug
- 01:41 PM Bug #43888 (Resolved): osd/osd-bench.sh 'tell osd.N bench' hang
- ...
- 01:36 PM Bug #43887 (Resolved): ceph_test_rados_delete_pools_parallel failure
- ...
- 01:23 PM Bug #43825 (Pending Backport): osd stuck down
- 10:03 AM Backport #43881 (Resolved): mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- https://github.com/ceph/ceph/pull/33154
- 10:03 AM Backport #43880 (Rejected): luminous: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- https://github.com/ceph/ceph/pull/33153
- 10:03 AM Backport #43879 (Resolved): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- https://github.com/ceph/ceph/pull/33152
01/28/2020
- 11:22 PM Bug #43864 (In Progress): osd/repro_long_log.sh failure
- 08:03 PM Bug #43864 (Resolved): osd/repro_long_log.sh failure
- ...
- 08:44 PM Bug #43865: osd-scrub-test.sh fails date check
- This looks like a case where the sleep time wasn't sufficient. The previous run had set 2 days and the next test swi...
- 08:07 PM Bug #43865 (Resolved): osd-scrub-test.sh fails date check
- ...
- 08:08 PM Bug #38345 (Pending Backport): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 08:07 PM Bug #43826 (Resolved): osd: leak of from send_lease
- 07:59 PM Bug #43862 (Can't reproduce): mkfs fsck found fatal error: (2) No such file or directory during c...
- ...
- 07:45 PM Bug #43861: ceph_test_rados_watch_notify hang
- /a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713217
- 07:43 PM Bug #43861 (Resolved): ceph_test_rados_watch_notify hang
- ...
- 07:34 PM Bug #43825 (Fix Under Review): osd stuck down
- 07:27 PM Bug #43825 (In Progress): osd stuck down
- we are splitting:...
- 06:59 PM Bug #43825: osd stuck down
- 2020-01-28T14:56:26.155+0000 7fd3ba08d700 20 osd.6 285 identify_splits_and_merges 1.5 e245 to e285 pg_nums {76=28,89=...
- 06:39 PM Bug #43825: osd stuck down
- ...
- 07:24 PM Bug #43185: ceph -s not showing client activity
- Are you observing any client activity in the cluster logs when "ceph -s" isn't reporting them?
It is sometimes poss... - 06:27 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
The master branch passed, but my nautilus run hit the same issue:
http://pulpito.ceph.com/dzafman-2020-01-27_21:...
- 10:42 AM Backport #43852 (Resolved): nautilus: osd-scrub-snaps.sh fails
- https://github.com/ceph/ceph/pull/33274
- 09:40 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Just an update on my side:
After upgrading our monitor Ubuntu 18.04 packages (apt-get upgrade) with the 5.3.0-26-g...
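For anyone else correlating these crashes, the Nautilus crash module can pull the details off an affected cluster (the crash id is a placeholder):
$ ceph crash ls
$ ceph crash info <crash-id> | grep assert_condition    # should show "z >= signedspan::zero()"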
01/27/2020
- 09:00 PM Bug #43150 (Pending Backport): osd-scrub-snaps.sh fails
- 05:05 PM Bug #43807 (Resolved): osd-backfill-recovery-log.sh fails
- 04:37 PM Bug #43810 (Resolved): all/recovery_preemption.yaml hang with down pgs
- 01:41 PM Bug #43810 (Fix Under Review): all/recovery_preemption.yaml hang with down pgs
- 04:02 PM Backport #43821 (In Progress): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_wi...
- 03:57 PM Bug #43656: AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
- Hi Sage:
This issue appears to have been introduced by https://github.com/ceph/ceph/pull/17619 - a major octopus ...
- 03:56 PM Backport #43776 (Need More Info): nautilus: AssertionError: not all PGs are active or peered 15 s...
- The master PR appears to be fixing an issue introduced by https://github.com/ceph/ceph/pull/17619 - a major octopus f...
- 03:28 PM Backport #43772 (In Progress): nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
- 03:23 PM Backport #43731 (In Progress): nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending...
- 02:41 PM Backport #43630 (In Progress): mimic: segv in collect_sys_info
- 02:37 PM Backport #43631 (In Progress): nautilus: segv in collect_sys_info
- 01:26 PM Bug #43826 (Fix Under Review): osd: leak of from send_lease
- 12:57 PM Backport #43822: nautilus: Ceph assimilate-conf results in config entries which can not be removed
- https://github.com/ceph/ceph/pull/32856
- 12:55 PM Backport #43822 (In Progress): nautilus: Ceph assimilate-conf results in config entries which can...
- 12:50 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- We also have a problem:
{
"os_version_id": "10",
"assert_condition": "z >= signedspan::zero()",
"... - 11:58 AM Bug #43833 (Resolved): shaman on bionic/cromson: cmake error: undefined reference to `pthread_cre...
- I'm getting this with the current master on shaman:...
01/26/2020
- 05:20 PM Bug #43826 (Resolved): osd: leak of from send_lease
- ...
- 05:18 PM Bug #43807: osd-backfill-recovery-log.sh fails
- /a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703160
- 05:13 PM Bug #43825 (Need More Info): osd stuck down
- https://github.com/ceph/ceph/pull/32885 to debug
- 05:11 PM Bug #43825: osd stuck down
- huh, also /a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703159 osd.7
- 05:07 PM Bug #43825 (Resolved): osd stuck down
- osd stuck at epoch 99, cluster at 2000 or something.
monc fails to reconnect to the mon
/a/sage-2020-01-24_23:2...
- 04:33 PM Bug #43810: all/recovery_preemption.yaml hang with down pgs
- /a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4702992
- 10:40 AM Backport #43822 (Resolved): nautilus: Ceph assimilate-conf results in config entries which can no...
- https://github.com/ceph/ceph/pull/32856
- 10:40 AM Backport #43821 (Resolved): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_...
- https://github.com/ceph/ceph/pull/32908
- 03:54 AM Bug #43552 (Pending Backport): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
- 03:19 AM Bug #43653: test-crash.yaml produce cores
- ...
01/25/2020
- 11:55 AM Backport #43623 (Need More Info): nautilus: pg: fastinfo incorrect when last_update moves backwar...
- @Kefu this is non-trivial because of the crimson cleanup commit - can you take it?
- 11:03 AM Backport #43473 (In Progress): nautilus: recursive lock of OpTracker::lock (70)
- 10:18 AM Backport #43471 (In Progress): nautilus: negative num_objects can set PG_STATE_DEGRADED
- 12:50 AM Bug #39555 (Resolved): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- 12:50 AM Backport #41499 (Rejected): mimic: backfill_toofull while OSDs are not full (Unneccessary HEALTH_...
Portions of the original pull request are already in Mimic; the rest doesn't make sense without complete backfill full ...
01/24/2020
- 10:52 PM Bug #43807 (Fix Under Review): osd-backfill-recovery-log.sh fails
- 10:36 PM Bug #43807 (In Progress): osd-backfill-recovery-log.sh fails
- 10:22 PM Bug #43807 (Fix Under Review): osd-backfill-recovery-log.sh fails
- 08:00 PM Bug #43807: osd-backfill-recovery-log.sh fails
- /a/sage-2020-01-24_13:15:58-rados-wip-sage2-testing-2020-01-23-1953-distro-basic-smithi/4701051
- 04:06 PM Bug #43807: osd-backfill-recovery-log.sh fails
- The test needs to be updated due to https://github.com/ceph/ceph/pull/32683 - anything else that sets the log lengths...
- 01:21 PM Bug #43807 (Resolved): osd-backfill-recovery-log.sh fails
- ...
- 10:20 PM Backport #41499 (In Progress): mimic: backfill_toofull while OSDs are not full (Unneccessary HEAL...
- 08:05 PM Bug #43296 (Pending Backport): Ceph assimilate-conf results in config entries which can not be re...
- 07:56 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- /a/sage-2020-01-24_13:15:58-rados-wip-sage2-testing-2020-01-23-1953-distro-basic-smithi/4700893
- 07:20 PM Bug #43810: all/recovery_preemption.yaml hang with down pgs
- /a/sage-2020-01-24_13:15:58-rados-wip-sage2-testing-2020-01-23-1953-distro-basic-smithi/4700883
- 01:30 PM Bug #43810 (Resolved): all/recovery_preemption.yaml hang with down pgs
- ...
- 04:54 PM Bug #43813 (Need More Info): objecter doesn't send osd_op
- maybe this will help debug: https://github.com/ceph/ceph/pull/32850
- 04:44 PM Bug #43813: objecter doesn't send osd_op
- current thinking: paused = true...
- 02:30 PM Bug #43813 (Resolved): objecter doesn't send osd_op
- /a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/4697914
ceph-client.admin....
- 04:46 PM Backport #43469 (In Progress): nautilus: asynchronous recovery + backfill might spin pg undersize...
- 04:41 PM Backport #43346 (In Progress): nautilus: short pg log + cache tier ceph_test_rados out of order r...
- 04:39 PM Backport #43319 (In Progress): nautilus: PeeringState::GoClean will call purge_strays uncondition...
- 04:25 PM Backport #43256 (In Progress): nautilus: monitor config store: Deleting logging config settings d...
- 04:23 PM Backport #43245 (In Progress): nautilus: osd: increase priority in certain OSD perf counters
- 04:22 PM Backport #43239 (In Progress): nautilus: ok-to-stop incorrect for some ec pgs
- 04:21 PM Backport #43099 (In Progress): nautilus: nautilus:osd: network numa affinity not supporting subne...
- 03:59 PM Bug #17945: ceph_test_rados_api_tier: failed to decode hitset in HitSetWrite test
- ...
- 03:52 PM Backport #43783 (In Progress): nautilus: mgr commands fail when using non-client auth
- 03:43 PM Bug #43795: Ceph tools utilizing "global_[pre_]init" no longer process "early" environment options
- Original post: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/RDQZ6E3XEGLPGYEBAUNV7DVYJUR5DLWR/
- 03:41 PM Bug #43795 (Fix Under Review): Ceph tools utilizing "global_[pre_]init" no longer process "early"...
- 02:08 PM Backport #40891 (Resolved): nautilus: Pool settings aren't populated to OSD after restart.
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32123
m...
- 02:02 PM Backport #43530: nautilus: Change default upmap_max_deviation to 5
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31956
m...
- 02:02 PM Backport #43529: nautilus: Remove use of rules batching for upmap balancer
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31956
m...
- 02:02 PM Backport #43092: nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31956
m...
- 02:01 PM Backport #42797: nautilus: unnecessary error message "calc_pg_upmaps failed to build overfull/und...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31956
m...
- 01:53 PM Bug #24974: Segmentation fault in tcmalloc::ThreadCache::ReleaseToCentralCache()
- ...
- 01:18 PM Bug #42977 (Triaged): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- /a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/4697995
I bet this has a s...
01/23/2020
- 08:24 PM Bug #43795 (Resolved): Ceph tools utilizing "global_[pre_]init" no longer process "early" environ...
- Commit 7f23142f5ccc5ac8153d32b2c9a8353593831967 in PR 20172 [1] dropped the "env_to_vec" calls issued prior to invoki...
- 06:00 PM Bug #43582: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- /a/sage-2020-01-23_15:27:54-rados-wip-sage2-testing-2020-01-23-0635-distro-basic-smithi/4696610
- 04:57 PM Backport #43783 (Resolved): nautilus: mgr commands fail when using non-client auth
- https://github.com/ceph/ceph/pull/32769
- 04:55 PM Backport #43776 (Rejected): nautilus: AssertionError: not all PGs are active or peered 15 seconds...
- 04:55 PM Backport #43772 (Resolved): nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
- https://github.com/ceph/ceph/pull/32844
- 09:36 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- I also have this problem
I’m running on:
Supermicro X11DPU
Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz
10GBase...
01/22/2020
- 11:10 PM Backport #41500 (Rejected): luminous: backfill_toofull while OSDs are not full (Unneccessary HEAL...
- 11:07 PM Bug #38309 (Resolved): Limit loops waiting for force-backfill/force-recovery to happen
- 11:06 PM Backport #38352 (Rejected): luminous: Limit loops waiting for force-backfill/force-recovery to ha...
- 11:06 PM Bug #38840 (Resolved): snaps missing in mapper, should be: ca was r -2...repaired
- 11:06 PM Backport #39520 (Rejected): luminous: snaps missing in mapper, should be: ca was r -2...repaired
- 11:05 PM Bug #41522 (Resolved): ceph-objectstore-tool can't remove head with bad snapset
- 11:05 PM Backport #41597 (Rejected): luminous: ceph-objectstore-tool can't remove head with bad snapset
- 10:11 PM Bug #43643 (Need More Info): Error ENOTSUP: Some osds belong to multiple subtrees: [0, 1, 2, 3, 4...
- When are you seeing this error? Any related logs will be helpful.
- 08:27 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- The root cause of this issue was introduced in https://github.com/ceph/ceph/pull/30223, which aligns with Jason's com...
- 04:58 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- I'm not really sure what the fix is here. The OSD doesn't really know (and can't really know) that the message comin...
- 04:50 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- Ah, those are two different connections. after sending the first message,...
- 04:37 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- client log is cluster2-client.mirror.3.24765.log
- 12:45 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- I reproduced the bug with increased messenger and objecter logging on the client and osd.
Logs: /a/nojha-2020-01-1... - 05:12 PM Bug #43403 (Resolved): unittest_lockdep unreliable
- 04:35 PM Backport #40891: nautilus: Pool settings aren't populated to OSD after restart.
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32123
merged
- 02:19 PM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
- Hi all,
what does "jewel to luminous split upgrades" and "boundary conditions" mean?
We're currently in the middl...
- 05:40 AM Bug #43602: Core dumps not collected in standalone tests for distros using systemd-coredump
- Changed to a fix to just report misconfiguration
- 05:39 AM Bug #43312 (Resolved): Change default upmap_max_deviation to 5
- 05:39 AM Backport #43530 (Resolved): nautilus: Change default upmap_max_deviation to 5
- 05:33 AM Backport #43726 (In Progress): nautilus: osd-recovery-space.sh has a race
- 05:32 AM Bug #42756 (Resolved): unnecessary error message "calc_pg_upmaps failed to build overfull/underfull"
- 05:31 AM Backport #42797 (Resolved): nautilus: unnecessary error message "calc_pg_upmaps failed to build o...
- 05:30 AM Bug #42718 (Resolved): Improve OSDMap::calc_pg_upmaps() efficiency
- 05:29 AM Backport #43092 (Resolved): nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
- 05:28 AM Backport #43246 (In Progress): nautilus: Nearfull warnings are incorrect
- 05:17 AM Bug #43307 (Resolved): Remove use of rules batching for upmap balancer
- 05:17 AM Backport #43529 (Resolved): nautilus: Remove use of rules batching for upmap balancer
- 04:35 AM Bug #43752 (New): Master tracker for upmap performance improvements
I put this bug in rados project because most of the actual code is in OSDMap::calc_pg_upmaps().
- 12:15 AM Bug #42566: mgr commands fail when using non-client auth
- nautilus backport: https://github.com/ceph/ceph/pull/32769
- 12:11 AM Bug #42566 (Pending Backport): mgr commands fail when using non-client auth
- ah, this does need to be backported. see #42666
- 12:12 AM Bug #42666 (Duplicate): mgropen from mgr comes from unknown.$id instead of mgr.$id
- The problem is actually the same as #42566: the second/additional mgrc instance is sending the mgropen based on the !...
01/21/2020
- 11:10 PM Backport #43530: nautilus: Change default upmap_max_deviation to 5
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/31956
merged
- 11:10 PM Bug #43296: Ceph assimilate-conf results in config entries which can not be removed
- adding nautilus backport since the bug was reported against that version
- 11:10 PM Backport #43529: nautilus: Remove use of rules batching for upmap balancer
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/31956
merged
- 11:10 PM Backport #43092: nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/31956
merged
- 11:10 PM Backport #42797: nautilus: unnecessary error message "calc_pg_upmaps failed to build overfull/und...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31956
merged
- 10:49 PM Backport #43232 (Need More Info): nautilus: pgs stuck in laggy state
- supposed to bake in master for a couple months
- 09:17 PM Bug #43403 (Fix Under Review): unittest_lockdep unreliable
- 09:10 PM Bug #43552 (Fix Under Review): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
- 07:17 PM Bug #42666: mgropen from mgr comes from unknown.$id instead of mgr.$id
- compare to vstart, on that same version,...
- 07:16 PM Bug #42666: mgropen from mgr comes from unknown.$id instead of mgr.$id
- from reesi001, with a fresh mgr restart,...
- 03:19 PM Bug #43721 (Pending Backport): qa/standalone/misc/ok-to-stop.sh occasionally fails
- 04:20 AM Bug #43656 (Pending Backport): AssertionError: not all PGs are active or peered 15 seconds after ...
01/20/2020
- 10:54 PM Bug #43296 (Fix Under Review): Ceph assimilate-conf results in config entries which can not be re...
- 10:22 PM Bug #41255 (Resolved): backfill_toofull seen on cluster where the most full OSD is at 1%
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:18 PM Backport #43731 (Resolved): nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- https://github.com/ceph/ceph/pull/32905
- 10:17 PM Backport #43726 (Resolved): nautilus: osd-recovery-space.sh has a race
- https://github.com/ceph/ceph/pull/32774
- 10:10 PM Backport #41584 (Resolved): mimic: backfill_toofull seen on cluster where the most full OSD is at 1%
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32361
m...
- 08:01 PM Backport #41584: mimic: backfill_toofull seen on cluster where the most full OSD is at 1%
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32361
merged
- 09:39 PM Backport #43531 (Resolved): mimic: Change default upmap_max_deviation to 5
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31957
m...
- 09:39 PM Backport #43094 (Resolved): mimic: Improve OSDMap::calc_pg_upmaps() efficiency
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31957
m...
- 08:00 PM Backport #43094: mimic: Improve OSDMap::calc_pg_upmaps() efficiency
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/31957
merged
- 09:39 PM Backport #42798 (Resolved): mimic: unnecessary error message "calc_pg_upmaps failed to build over...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31957
m...
- 08:00 PM Backport #42798: mimic: unnecessary error message "calc_pg_upmaps failed to build overfull/underf...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31957
merged
- 08:00 PM Backport #43530: nautilus: Change default upmap_max_deviation to 5
- https://github.com/ceph/ceph/pull/31957 merged
- 07:25 PM Bug #43721 (Fix Under Review): qa/standalone/misc/ok-to-stop.sh occasionally fails
- 07:15 PM Bug #43721 (Resolved): qa/standalone/misc/ok-to-stop.sh occasionally fails
- ...
- 06:03 PM Bug #38345 (Fix Under Review): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 05:53 PM Bug #43397 (Resolved): FS_DEGRADED to cluster log despite --no-mon-health-to-clog
- 03:48 PM Bug #43656 (Fix Under Review): AssertionError: not all PGs are active or peered 15 seconds after ...
- 03:40 PM Bug #43656: AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
- /a/sage-2020-01-20_14:10:17-rados:thrash-erasure-code-wip-sage-testing-2020-01-19-1713-distro-basic-smithi/4688160
- 12:13 AM Bug #43653 (Resolved): test-crash.yaml produce cores
01/19/2020
- 05:49 PM Bug #43653 (Fix Under Review): test-crash.yaml produce cores
- 03:43 PM Bug #42918 (Closed): memory corruption and lockups with I-Object
- this was reverted in master
- 03:15 PM Bug #39398 (Duplicate): osd: fast_info need update when pglog rewind
- 03:12 AM Bug #43404 (Pending Backport): mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
01/18/2020
- 08:39 PM Bug #43592 (Pending Backport): osd-recovery-space.sh has a race
- 06:53 PM Bug #43422 (Resolved): qa/standalone/mon/osd-pool-create.sh fails to grep utf8 pool name
- 02:56 PM Bug #42666 (In Progress): mgropen from mgr comes from unknown.$id instead of mgr.$id
- 10:39 AM Backport #43652 (In Progress): mimic: Improve upmap change reporting in logs
- 10:38 AM Backport #43650 (In Progress): nautilus: Improve upmap change reporting in logs
- 10:32 AM Backport #42878 (Resolved): nautilus: ceph_test_admin_socket_output fails in rados qa suite
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32063
m...
- 10:20 AM Backport #43620 (In Progress): nautilus: mon shutdown timeout (race with async compaction)
- 09:33 AM Backport #24471: luminous: Ceph-osd crash when activate SPDK
- Thanks, Kefu
- 05:51 AM Backport #24471: luminous: Ceph-osd crash when activate SPDK
- I should not have added it to luminous, as it's 44851549bbc58520a32c15a7db5097b5e44dd53f which introduced queue_t.
...
01/17/2020
- 11:30 PM Bug #43656: AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
- In this case, the workload happened to delete the old pool/pgs and create a new one right before the check, so the ne...
- 11:29 PM Bug #43656 (Resolved): AssertionError: not all PGs are active or peered 15 seconds after marking ...
- ...
- 10:17 PM Backport #43651 (Resolved): luminous: Improve upmap change reporting in logs
- 07:50 PM Backport #43651 (In Progress): luminous: Improve upmap change reporting in logs
- 07:49 PM Backport #43651 (Resolved): luminous: Improve upmap change reporting in logs
- https://github.com/ceph/ceph/pull/32666
- 09:10 PM Bug #43653 (Resolved): test-crash.yaml produce cores
- /a/sage-2020-01-17_18:51:12-rados-wip-sage-testing-2020-01-17-1009-distro-basic-smithi/4678354
- 08:21 PM Bug #43422 (Fix Under Review): qa/standalone/mon/osd-pool-create.sh fails to grep utf8 pool name
- 07:49 PM Backport #43652 (Resolved): mimic: Improve upmap change reporting in logs
- https://github.com/ceph/ceph/pull/32717
- 07:49 PM Backport #43650 (Resolved): nautilus: Improve upmap change reporting in logs
- https://github.com/ceph/ceph/pull/32716
- 07:48 PM Bug #41016 (Pending Backport): Improve upmap change reporting in logs
- 04:28 AM Bug #43643: Error ENOTSUP: Some osds belong to multiple subtrees: [0, 1, 2, 3, 4, 5, 6, 7, 8]
- crush map dump file.
- 04:25 AM Bug #43643 (Need More Info): Error ENOTSUP: Some osds belong to multiple subtrees: [0, 1, 2, 3, 4...
- Error ENOTSUP: Some osds belong to multiple subtrees: [0, 1, 2, 3, 4, 5, 6, 7, 8]
version: ceph 12.2.11
I did no...
01/16/2020
- 04:49 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- Hi @Frank - please open a new bug report and include the output of "ceph health detail" in your bug description. This...
- 02:52 PM Bug #38296 (Resolved): segv in fgets() in collect_sys_info reading /proc/cpuinfo
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 02:46 PM Backport #43632 (Rejected): luminous: segv in collect_sys_info
- 02:46 PM Backport #43631 (Resolved): nautilus: segv in collect_sys_info
- https://github.com/ceph/ceph/pull/32901
- 02:46 PM Backport #43630 (Resolved): mimic: segv in collect_sys_info
- https://github.com/ceph/ceph/pull/32902
- 02:45 PM Backport #43623 (Rejected): nautilus: pg: fastinfo incorrect when last_update moves backward in time
- 02:44 PM Backport #43622 (Rejected): mimic: pg: fastinfo incorrect when last_update moves backward in time
- 02:44 PM Backport #43621 (Rejected): luminous: pg: fastinfo incorrect when last_update moves backward in time
- 02:44 PM Backport #43620 (Resolved): nautilus: mon shutdown timeout (race with async compaction)
- https://github.com/ceph/ceph/pull/32715
- 01:01 PM Bug #43403: unittest_lockdep unreliable
- Also happening in nautilus PRs
- 10:59 AM Documentation #20867 (Closed): OSD::build_past_intervals_parallel()'s comment is stale
- This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio...
- 10:52 AM Documentation #16356: doc: manual deployment of ceph monitor needs fix
- I'll take this one and look into it.
- 10:02 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Hello Greg,
I have dug a little bit deeper into the relationship between the appearance of the crashes and the states of our machines...
- After being told on the ML that my low PG count is supposed to be the reason, I raised it. Nothing changed, still sam...
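For reference, the usual way to inspect and correct this kind of imbalance is the upmap balancer discussed elsewhere in this log; a minimal sketch, assuming all clients are Luminous or newer:
$ ceph osd df tree                                   # the PGS column shows the per-OSD spread
$ ceph osd set-require-min-compat-client luminous
$ ceph balancer mode upmap
$ ceph balancer on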
- 02:07 AM Bug #41016: Improve upmap change reporting in logs
- We aren't going to add logging to _apply_upmap() because it is invasive to get a CephContext into that function to be...
- 12:59 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- Looking at the logs that Jason added http://qa-proxy.ceph.com/teuthology/jdillaman-2020-01-09_16:20:49-rbd-wip-jd-tes...
01/15/2020
- 11:45 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Hmm, what hardware are you guys running this on?
We consulted with our C++ time guru and the current theory is a b...
- 10:07 PM Bug #43584: MON_DOWN during mon_join process
- I'm pretty sure this is a test issue, since we don't make guarantees about monitor elections, especially on first boo...
- 09:04 PM Bug #43404 (Fix Under Review): mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- 07:33 PM Bug #41016 (In Progress): Improve upmap change reporting in logs
- 06:24 PM Bug #41016: Improve upmap change reporting in logs
The function _apply_upmap() would be getting called very frequently. This makes logging there problematic. Also, ...
- 06:58 PM Bug #43587 (Pending Backport): mon shutdown timeout (race with async compaction)
- 02:48 PM Bug #43587 (Fix Under Review): mon shutdown timeout (race with async compaction)
- 04:54 PM Backport #42878: nautilus: ceph_test_admin_socket_output fails in rados qa suite
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32063
merged
- 01:50 PM Bug #42668 (Need More Info): ceph daemon osd.* fails in osd container but ceph daemon mds.* does ...
- 11:51 AM Feature #42638: Allow specifying pg_autoscale_mode when creating a new pool
- I really think this issue should be closed.
I tested with the autoscaler globally set to on. Tested the pool creatio...
- 06:11 AM Bug #43602 (Fix Under Review): Core dumps not collected in standalone tests for distros using sys...
- 05:19 AM Bug #43602 (Won't Fix): Core dumps not collected in standalone tests for distros using systemd-co...
- 04:53 AM Bug #43306 (Pending Backport): segv in collect_sys_info
- 04:19 AM Bug #43580 (Pending Backport): pg: fastinfo incorrect when last_update moves backward in time
01/14/2020
- 09:28 PM Bug #40649: set_mon_vals failed to set cluster_network = 10.1.2.0/24: Configuration option 'clust...
- FYI, I was able to remove the config settings with:
$ ceph config rm <who> <what>
followed by
$ ceph config ...
- 08:13 PM Bug #43485 (Resolved): Deprecated full/nearfull added back by mistake
- 03:46 PM Bug #43582 (In Progress): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 03:46 PM Bug #43582: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- I double-checked update_mgrmap() in ceph_monstore_tool.cc, which is called when handling the rebuild subcommand. Will try...
- 01:50 PM Bug #43597: stuck waiting for pg to advance to epoch
- 1.c...
- 01:36 PM Bug #43597 (New): stuck waiting for pg to advance to epoch
- ...
- 08:58 AM Bug #43306: segv in collect_sys_info
- https://github.com/ceph/ceph/pull/32630 is posted to avoid using fgets().
- 08:40 AM Documentation #4568 (Closed): FAQ entry for changing journal size/moving journal
- This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio...
- 08:39 AM Documentation #3466 (Closed): rados manpage: bench still documents "read" rather than "seq/rand"
- This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio...
- 08:37 AM Documentation #3447 (Closed): doc: how to recover from a failed journal device
- This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio...
- 08:36 AM Documentation #3218 (Closed): Doc: osdmaptool manpage out of date with code *and* usage
- This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio...
- 08:35 AM Documentation #3166 (Closed): doc: Explain OSD up/down, in/out: what does it mean, where does it ...
- This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio...
- 08:34 AM Documentation #3054 (Closed): doc: omap, tmap, xattrs
- This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio...
- 08:32 AM Documentation #2272 (Closed): FAQs: RADOS reliability and availability
- This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prio...
- 02:35 AM Bug #43592 (Resolved): osd-recovery-space.sh has a race
The function wait_for_state() returns success when there are no PGs in a selected state. The test's purpose of wai...
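A hypothetical sketch of the race (the helper name comes from the report; the body is illustrative, not the actual qa/standalone helper):
wait_for_state() {
    local state=$1 timeout=$2
    for ((i=0; i<timeout; i++)); do
        # Succeeds as soon as no PG reports $state -- including immediately,
        # before any PG has had a chance to enter it. That is the race.
        ceph pg dump pgs 2>/dev/null | grep -q "$state" || return 0
        sleep 1
    done
    return 1
}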
01/13/2020
- 10:06 PM Backport #43532 (Resolved): luminous: Change default upmap_max_deviation to 5
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32586
m...
- 10:05 PM Backport #39474 (Resolved): luminous: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32349
m...
- 10:04 PM Backport #41730 (Resolved): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31855
m...
- 08:28 PM Bug #43591 (New): /sbin/fstrim can interfere with umount
- ...
- 08:13 PM Bug #43306 (Fix Under Review): segv in collect_sys_info
- 08:11 PM Bug #43306: segv in collect_sys_info
- #38296 changed the buffer to 1024 chars, but /proc/cpuinfo can be bigger than that, too. On smithi (8 CPUs), it's 9...
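A quick way to confirm the point about the fixed-size buffer, on any machine with more than a couple of CPUs:
$ wc -c /proc/cpuinfo        # easily exceeds 1024 bytes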
- 02:47 PM Bug #43587 (Resolved): mon shutdown timeout (race with async compaction)
- ...
- 02:42 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- FWIW, I am seeing this issue after an upgrade from 12.2.12 to 14.2.6.
The status is HEALTH_WARN not HEALTH_ERR but...
- 02:21 PM Bug #43404: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- /a/sage-2020-01-12_21:37:03-rados-wip-sage-testing-2020-01-12-0621-distro-basic-smithi/4660728...
- 02:16 PM Bug #43584 (Resolved): MON_DOWN during mon_join process
- /a/sage-2020-01-12_21:37:03-rados-wip-sage-testing-2020-01-12-0621-distro-basic-smithi/4660691...
- 02:02 PM Bug #43582 (Resolved): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- ...
- 01:39 PM Bug #43580 (Fix Under Review): pg: fastinfo incorrect when last_update moves backward in time
- 01:05 PM Bug #43580 (Resolved): pg: fastinfo incorrect when last_update moves backward in time
- If, during peering, last_update moves backwards, we may rewrite the full info but leave a fastinfo record in place wi...
- 12:24 PM Bug #42821 (Resolved): src/msg/async/net_handler.cc: Fix compilation
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:23 PM Bug #43454 (Resolved): ceph monitor crashes after updating 'mon_memory_target' config setting.
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:19 PM Backport #43495 (Resolved): nautilus: ceph monitor crashes after updating 'mon_memory_target' con...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32520
m...
- 12:13 PM Backport #42997 (Resolved): nautilus: acting_recovery_backfill won't catch all up peers
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32064
m...
- 12:13 PM Backport #42853 (Resolved): nautilus: format error: ceph osd stat --format=json
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32062
m...
- 12:12 PM Backport #42846 (Resolved): nautilus: src/msg/async/net_handler.cc: Fix compilation
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31736
m...
- 07:38 AM Bug #43555: raw usage is far from total pool usage
- only overwrite file
01/12/2020
- 09:29 PM Backport #43532: luminous: Change default upmap_max_deviation to 5
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/32586
merged
01/10/2020
- 11:36 PM Bug #43555 (New): raw usage is far from total pool usage
- ceph -v
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)...
- 10:32 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- http://pulpito.ceph.com/nojha-2020-01-10_19:11:03-rbd:mirror-thrash-master-distro-basic-smithi/4653675/
Observatio...
- 08:30 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- Reproduces with -s rbd:mirror-thrash and --filter 'rbd-mirror-fsx-workunit'
http://pulpito.ceph.com/nojha-2020-01-...
- 10:03 PM Bug #43553 (Can't reproduce): mon: client mon_status fails
- ...
- 09:07 PM Bug #40649: set_mon_vals failed to set cluster_network = 10.1.2.0/24: Configuration option 'clust...
- This also happened to me during an upgrade from Luminous to Nautilus.
The cluster/public networks were not defined...
- 07:26 PM Bug #43552 (Resolved): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
- ...
- 02:45 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- We are also running into this issue.
Jan 07 19:03:42 pmxc05 ceph-mon[3701783]: 2020-01-07 19:03:42.625 7fe59c03d...
- 01:39 PM Bug #39665 (Resolved): kstore: memory may leak on KStore::_do_read_stripe
- 01:34 PM Bug #43412 (Resolved): cephadm ceph_manager IndexError: list index out of range
- 04:55 AM Backport #43532 (In Progress): luminous: Change default upmap_max_deviation to 5
- 04:54 AM Backport #43531 (In Progress): mimic: Change default upmap_max_deviation to 5
01/09/2020
- 10:00 PM Bug #42328 (New): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- This issue is still occurring with today's master branch:
http://qa-proxy.ceph.com/teuthology/jdillaman-2020-01-09...
- 04:56 PM Backport #43495: nautilus: ceph monitor crashes after updating 'mon_memory_target' config setting.
- Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/32520
merged
- 02:28 AM Bug #43412 (Fix Under Review): cephadm ceph_manager IndexError: list index out of range
- 12:39 AM Backport #43529 (In Progress): nautilus: Remove use of rules batching for upmap balancer
- 12:27 AM Backport #43529 (Resolved): nautilus: Remove use of rules batching for upmap balancer
- https://github.com/ceph/ceph/pull/31956
- 12:39 AM Backport #43530 (In Progress): nautilus: Change default upmap_max_deviation to 5
- 12:28 AM Backport #43530 (Resolved): nautilus: Change default upmap_max_deviation to 5
- https://github.com/ceph/ceph/pull/31956
- 12:28 AM Backport #43532 (Resolved): luminous: Change default upmap_max_deviation to 5
- https://github.com/ceph/ceph/pull/32586
- 12:28 AM Backport #43531 (Resolved): mimic: Change default upmap_max_deviation to 5
- https://github.com/ceph/ceph/pull/31957
01/08/2020
- 10:23 PM Bug #43312 (Pending Backport): Change default upmap_max_deviation to 5
- 10:10 PM Bug #43307 (Pending Backport): Remove use of rules batching for upmap balancer
- 10:09 PM Bug #43397 (Fix Under Review): FS_DEGRADED to cluster log despite --no-mon-health-to-clog
- 10:04 PM Bug #43412: cephadm ceph_manager IndexError: list index out of range
- Kefu's got a PR for this
- 05:31 AM Bug #43412: cephadm ceph_manager IndexError: list index out of range
- I'm guessing it's caused by there being no pools at the time. So the random choice fails. Maybe we need to do somethi...
- 10:02 PM Bug #43422: qa/standalone/mon/osd-pool-create.sh fails to grep utf8 pool name
- probably need to set LANG to utf8
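For example (the exact invocation of the standalone test is illustrative):
$ LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 qa/standalone/mon/osd-pool-create.sh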
- 08:23 AM Bug #43185: ceph -s not showing client activity
- We run 14.2.4. I see the mgr process at 100% sometimes and I've been told that the reason for the lack of activity shown might be...
- 02:24 AM Bug #43520 (In Progress): segfault in kstore's pending stripes
- 02:23 AM Bug #43520: segfault in kstore's pending stripes
- ceph version 14.2.1-700.3.0.2.407 (c823e6bbf85437561d2165c0f4b5d8c6bd726975) nautilus (stable)
1: (()+0xf5e0) [0x7f...
- 02:20 AM Bug #43520 (In Progress): segfault in kstore's pending stripes
01/07/2020
- 02:46 PM Documentation #41389 (Resolved): wrong datatype describing crush_rule
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 02:45 PM Bug #42177 (Resolved): osd/PrimaryLogPG.cc: 13068: FAILED ceph_assert(obc)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 02:43 PM Bug #42906 (Resolved): ceph-mon --mkfs: public_address type (v1|v2) is not respected
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:25 AM Backport #43495 (In Progress): nautilus: ceph monitor crashes after updating 'mon_memory_target' ...
- 10:24 AM Backport #43495 (New): nautilus: ceph monitor crashes after updating 'mon_memory_target' config s...
- 10:01 AM Backport #43495 (Resolved): nautilus: ceph monitor crashes after updating 'mon_memory_target' con...
- https://github.com/ceph/ceph/pull/32520
- 09:34 AM Bug #43454: ceph monitor crashes after updating 'mon_memory_target' config setting.
- Tested the fix without using rocksdb and confirmed that the crash is not observed now:
2020-01-07T12:53:09.942+053...
- 08:41 AM Bug #43454 (Pending Backport): ceph monitor crashes after updating 'mon_memory_target' config set...
- 02:46 AM Backport #39474: luminous: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32349
merged
- 02:45 AM Backport #41730: luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31855
merged