Activity
From 01/26/2020 to 02/24/2020
02/24/2020
- 10:38 PM Bug #24835 (Can't reproduce): osd daemon spontaneous segfault
- 07:42 PM Bug #44076 (Pending Backport): mon: update + monmap update triggers spawn loop
- 07:36 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- https://github.com/ceph/ceph/pull/33470 - fixing the order of msgr2 vs nautilus install is the first step here.
- 05:48 PM Bug #44248: Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause the osd to crash
- ...
- 04:25 PM Bug #44275 (Fix Under Review): NameError: name 'retval' is not defined
- 04:17 PM Bug #44275 (Resolved): NameError: name 'retval' is not defined
- ...
- 03:50 PM Bug #42830: problem returning mon to cluster
- I noticed there is very little osdmap caching in the leader mon -- here we see only 1 single osdmap in the mempool.
...
- 05:45 AM Backport #44259 (In Progress): nautilus: Slow Requests/OP's types not getting logged
- 05:03 AM Backport #44259 (Resolved): nautilus: Slow Requests/OP's types not getting logged
- https://github.com/ceph/ceph/pull/33503
- 05:24 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- More ftr: the corruption occurs in the crush part of the osdmap:...
- 05:16 AM Bug #43975 (Pending Backport): Slow Requests/OP's types not getting logged
02/23/2020
- 10:08 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Likely related....
- 09:29 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Adding crash signature (cf2864eb1281dffc3340730dc2caae163b4c0170132bcbd3dcbd6147d8f29fa8) for the crash described in ...
- 09:05 PM Bug #43861: ceph_test_rados_watch_notify hang
- ...
- 02:29 PM Bug #41313: PG distribution completely messed up since Nautilus
- ...
- 12:13 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- A bit more about our incident ftr.
The cluster has 1301 osds in total: 752 filestore and 549 bluestore. The filest...
02/22/2020
- 04:47 PM Bug #44248 (Resolved): Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause th...
- ...
- 01:25 PM Backport #44206: nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- Started a backport here https://github.com/ceph/ceph/pull/33483
- 09:49 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- > o->decode(obl); <------ HERE
I have gdb working now on a coredump so can confirm that:...
- 01:00 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- ^^ Is a weird red-herring. The FFFFFFFF is because the osdmap contains the crc32c in the last 4 bytes, so that cancel...
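The cancellation described in the note above is a general CRC property: a checksum computed over a buffer that carries its own CRC in its last 4 bytes collapses to a fixed residue, no matter what the payload is. A minimal sketch of that effect, using CRC-32 from Python's standard library rather than the crc32c (Castagnoli) variant Ceph actually uses, so the residue constant here is 0x2144DF1C instead of the 0xFFFFFFFF reported for the osdmaps:

```python
import struct
import zlib

def crc_of_self_checked_buffer(payload: bytes) -> int:
    """CRC over a buffer whose last 4 bytes are its own CRC (like an osdmap)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    framed = payload + struct.pack("<I", crc)  # payload || little-endian CRC
    return zlib.crc32(framed) & 0xFFFFFFFF

# Different payloads, identical result: the trailing CRC cancels out
# to the fixed CRC-32 residue.
print(hex(crc_of_self_checked_buffer(b"osdmap epoch 100")))           # 0x2144df1c
print(hex(crc_of_self_checked_buffer(b"a completely different map")))  # 0x2144df1c
```

This is why the constant checksum over the *good* maps was a red herring rather than evidence of identical data.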
- 01:23 AM Bug #43914: nautilus: ceph tell command times out
- This is on nautilus: /a/nojha-2020-02-21_20:34:10-upgrade:mimic-x:stress-split-nautilus-distro-basic-smithi/4788575/
...
- 01:14 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- /a/sage-2020-02-21_21:08:33-rados-wip-sage3-testing-2020-02-21-1218-distro-basic-smithi/4788714...
02/21/2020
- 10:48 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Found something. The crc32c for all my *good* maps is FFFFFFFF (and I assure you they are different maps.. gsutil out...
- 10:16 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Just to provide the same update I gave to Dan van der Ster over email:
IIRC, we saw this 1-2 times more after the ...
- 09:42 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- This is continuing to happen for us. Log file here.
ceph-post-file: 589aa7aa-7a80-49a2-ba55-376e467c4550
- 10:19 PM Bug #42830: problem returning mon to cluster
- Seeing the same here in 13.2.8 starting a new empty mon. Leader's CPU goes to 100%, until an election is called then ...
- 09:03 PM Bug #44243 (Can't reproduce): memstore make check test fails
- ...
- 01:29 PM Bug #42328 (New): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- It looks like this is still occurring even with a branch that included 8182f52149: http://qa-proxy.ceph.com/teutholo...
- 01:21 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
- Bastian Mäuser wrote:
> This is still an issue on 14.2.6 (at least the one shipped with proxmox)
It will appear i...
- 12:49 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
- FTR this looks identical to https://tracker.ceph.com/issues/39525#note-6
- 12:25 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- So the timeout, as previously mentioned, was 10 seconds although osd_default_notify_timeout is 30 seconds by default....
02/20/2020
- 07:02 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- ok, the first crash isn't because we just got bad data.. it's because we just read bad data off of disk. see:...
- 04:09 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Notes from CERN incident:
- identical corruption, different OSDmaps on different OSDs:...
- 05:40 PM Bug #44229 (New): monclient: _check_auth_rotating possible clock skew, rotating keys expired way ...
- seems to affect cephadm bootstrap tests
first, the error message doesn't make sense, since the bound 2020-02-20T16...
- 12:20 PM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> Hi Wido,
>
> I did come across something like this while investigating https://tracker.ceph.co...
- 12:42 AM Bug #44217 (Can't reproduce): Leaked connection (alloc from AsyncMessenger::add_accept)
- ...
02/19/2020
- 11:42 PM Bug #44076 (Fix Under Review): mon: update + monmap update triggers spawn loop
- 10:45 PM Bug #44157 (Resolved): cli throws bad exception on control-c
- 10:11 PM Bug #44120 (Need More Info): NVMEDevice failed in certain NVMe Disk
- Can you attach logs from the crash? Which version are you using?
- 10:08 PM Bug #44184 (Need More Info): Slow / Hanging Ops after pool creation
- Hi Wido,
I did come across something like this while investigating https://tracker.ceph.com/issues/43048. It was a...
- 07:18 PM Bug #44184: Slow / Hanging Ops after pool creation
- On the Ceph users list there are multiple reports of people experiencing this:
- https://www.spinics.net/lists/cep...
- 04:55 PM Bug #37656 (New): FileStore::_do_transaction() crashed with error 17 (merge collection vs osd res...
- /a/teuthology-2020-02-11_02:30:03-upgrade:mimic-x-nautilus-distro-basic-smithi/4753470/
upgrade:mimic-x/stress-spl...
- 11:00 AM Bug #43151 (Resolved): ok-to-stop incorrect for some ec pgs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 11:00 AM Bug #43721 (Resolved): qa/standalone/misc/ok-to-stop.sh occasionally fails
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:59 AM Backport #44206 (Resolved): nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap:...
- https://github.com/ceph/ceph/pull/33530
02/18/2020
- 07:55 PM Bug #43903 (Pending Backport): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- 07:52 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- ...
- 04:43 PM Bug #44184 (Need More Info): Slow / Hanging Ops after pool creation
- On a cluster with 1405 OSDs I've run into a situation for the second time now where a pool creation resulted in mas...
- 10:28 AM Backport #44085 (Resolved): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33278
m...
- 10:28 AM Backport #44082 (Resolved): nautilus: expected MON_CLOCK_SKEW but got none
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33276
m...
- 10:27 AM Backport #43772 (Resolved): nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32844
m...
- 10:27 AM Backport #43239 (Resolved): nautilus: ok-to-stop incorrect for some ec pgs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32844
m...
02/17/2020
- 11:45 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /ceph/teuthology-archive/pdonnell-2020-02-15_16:51:06-fs-wip-pdonnell-testing-20200215.033325-distro-basic-smithi/476...
- 07:08 PM Backport #42662 (In Progress): nautilus:Issue a HEALTH_WARN when a Pool is configured with [min_]...
- 02:16 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
- This is still an issue on 14.2.6 (at least the one shipped with proxmox)
02/16/2020
02/15/2020
- 03:11 PM Bug #44157 (Fix Under Review): cli throws bad exception on control-c
- 02:37 PM Bug #44041 (Resolved): osd: MLease in stray state -> Crashed
- 02:37 PM Bug #42328 (Resolved): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- 02:36 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- /a/sage-2020-02-15_04:59:38-rados-wip-sage3-testing-2020-02-14-1951-distro-basic-smithi/4765960
- 02:56 AM Bug #43975: Slow Requests/OP's types not getting logged
- Before and after logs to show the extra information relating to slow op/types:
Before:
--------...
02/14/2020
- 09:58 PM Bug #43975 (Fix Under Review): Slow Requests/OP's types not getting logged
- 08:22 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- Neha Ojha wrote:
> pg3.4, which is stuck in "peering" shows similar behavior as https://tracker.ceph.com/issues/4304...
- 03:28 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- pg3.4, which is stuck in "peering" shows similar behavior as https://tracker.ceph.com/issues/43048#note-15
osd.10 ...
- 07:24 PM Bug #44156 (Fix Under Review): RenewLease sent to pre-octopus osds during upgrade
- 05:20 PM Bug #44156 (Resolved): RenewLease sent to pre-octopus osds during upgrade
- ...
- 05:35 PM Bug #44157 (Resolved): cli throws bad exception on control-c
- ...
- 05:23 PM Backport #44085: nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33278
merged
- 05:23 PM Backport #44082: nautilus: expected MON_CLOCK_SKEW but got none
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33276
merged
- 05:22 PM Backport #43772: nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32844
merged
- 05:22 PM Backport #43239: nautilus: ok-to-stop incorrect for some ec pgs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32844
merged
- 02:48 PM Backport #43996 (Need More Info): mimic: Ceph tools utilizing "global_[pre_]init" no longer proce...
- should be based on the nautilus backport
- 02:37 PM Backport #42662 (New): nautilus:Issue a HEALTH_WARN when a Pool is configured with [min_]size == 1
- Changed status to re-attempt the backport.
- 02:23 PM Backport #43621: luminous: pg: fastinfo incorrect when last_update moves backward in time
- nautilus backport is marked non-trivial, so this one is also non-trivial
- 02:23 PM Backport #43622 (Need More Info): mimic: pg: fastinfo incorrect when last_update moves backward i...
- nautilus backport is marked non-trivial, so this one is also non-trivial
- 02:21 PM Backport #43472 (In Progress): mimic: negative num_objects can set PG_STATE_DEGRADED
- 02:19 PM Backport #43470 (In Progress): mimic: asynchronous recovery + backfill might spin pg undersized f...
- 02:18 PM Backport #43320 (In Progress): mimic: PeeringState::GoClean will call purge_strays unconditionally
- 12:56 PM Backport #43257 (In Progress): mimic: monitor config store: Deleting logging config settings does...
- 12:50 PM Backport #42996 (In Progress): luminous: acting_recovery_backfill won't catch all up peers
- 12:40 PM Backport #42998 (In Progress): mimic: acting_recovery_backfill won't catch all up peers
- 12:28 PM Backport #42879 (In Progress): mimic: ceph_test_admin_socket_output fails in rados qa suite
- 12:26 PM Backport #42852 (In Progress): mimic: format error: ceph osd stat --format=json
- 09:29 AM Bug #43296 (Resolved): Ceph assimilate-conf results in config entries which can not be removed
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:29 AM Bug #43404 (Resolved): mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:28 AM Bug #43552 (Resolved): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:28 AM Bug #43892 (Resolved): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o upg...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:26 AM Backport #43879 (Resolved): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33152
m... - 09:26 AM Backport #43821 (Resolved): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32908
m... - 09:25 AM Backport #43916 (Resolved): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33155
m... - 09:25 AM Backport #43989 (Resolved): nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33147
m... - 09:24 AM Backport #43928 (Resolved): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33007
m... - 09:23 AM Backport #43731 (Resolved): nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32905
m... - 09:23 AM Backport #43822 (Resolved): nautilus: Ceph assimilate-conf results in config entries which can no...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32856
m... - 06:16 AM Bug #44120: NVMEDevice failed in certain NVMe Disk
- I tested that my NVMe card can create at most 6 queue pairs.
- 02:48 AM Bug #43861: ceph_test_rados_watch_notify hang
- Almost certainly the same issue as #44062
02/13/2020
- 11:19 PM Bug #43124 (Resolved): Probably legal crush rules cause upmaps to be cleaned
- 11:05 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- Reproduced and I see #43808 while doing so, so I'm going to treat them as related for now at least.
I think we can ...
- 02:53 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- I can't reproduce this so far. If anyone can reproduce it reliably maybe we could try increasing the notify timeout i...
- 02:22 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- Ah, that's right, from memory these Warnings are related to valgrind. Valgrind is also notorious for slowing things d...
- 02:02 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- When trying to reproduce I am seeing a *lot* of these which may, or may not, be related....
- 10:48 PM Feature #44131 (New): Add AAAA DNS record for drop.ceph.com
- drop.ceph.com is only reachable through IPv4 because of the lack of an IPv6 DNS record (AAAA). For IPv6-only clusters th...
- 08:15 PM Backport #43879: nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33152
merged
- 08:12 PM Backport #43821: nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32908
merged
- 08:09 PM Backport #43916: nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33155
merged
- 08:08 PM Backport #43989: nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33147
merged
- 07:35 PM Backport #43928: nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33007
merged
- 07:30 PM Backport #43731: nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32905
merged
- 07:29 PM Backport #43822: nautilus: Ceph assimilate-conf results in config entries which can not be removed
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32856
merged
- 06:08 PM Feature #44025 (In Progress): Make it harder to set pool replica size to 1
- 03:43 PM Bug #43975: Slow Requests/OP's types not getting logged
- The logging to cluster logs was removed as part of re-factoring effort in mimic. Here are the commits of interest:
...
- 02:21 PM Backport #44085 (In Progress): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump E...
- 02:19 PM Bug #44120 (Need More Info): NVMEDevice failed in certain NVMe Disk
- I got the following error:
nvme_ctrlr.c: 308:spdk_nvme_ctrlr_alloc_io_qpair: *ERROR*: No free I/O queue IDs
Th...
- 02:15 PM Backport #44082 (In Progress): nautilus: expected MON_CLOCK_SKEW but got none
- 02:09 PM Backport #44081 (In Progress): nautilus: ceph -s does not show >32bit pg states
- 02:06 PM Backport #43346: nautilus: short pg log + cache tier ceph_test_rados out of order reply
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32848
m...
- 02:06 PM Backport #43346 (Resolved): nautilus: short pg log + cache tier ceph_test_rados out of order reply
- 11:54 AM Backport #43346 (In Progress): nautilus: short pg log + cache tier ceph_test_rados out of order r...
- 02:03 PM Backport #43852 (In Progress): nautilus: osd-scrub-snaps.sh fails
- 11:31 AM Backport #43997 (In Progress): nautilus: Ceph tools utilizing "global_[pre_]init" no longer proce...
- 10:27 AM Bug #44089 (Fix Under Review): mon: --format=json does not work for config get or show
- 08:37 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...
- grep for checking ASCII-only names:...
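The actual grep command in the note above is truncated in this export. As an illustration only (not the reporter's command), a check equivalent to grepping for non-ASCII-only names can be sketched like this, flagging object names that contain multi-byte characters:

```python
def has_non_ascii(name: str) -> bool:
    """True if the object name contains any multi-byte (non-ASCII) character."""
    return any(ord(ch) > 0x7F for ch in name)

# Hypothetical object names for illustration; only the Cyrillic one is flagged.
names = ["rbd_data.1234", "каталог-01", "plain_object"]
suspect = [n for n in names if has_non_ascii(n)]
print(suspect)
```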
- 08:35 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...
- Hi, David
> Do all the objects with missing copies have names that included multi-byte characters?
yes, most of...
- 12:29 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...
- Two questions:
Do all the objects with missing copies have names that included multi-byte characters?
Are the...
- 07:09 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- I'd like to reopen this, since there are now reports about crashes on CentOS (see possible duplicate linked to this i...
- 01:43 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- In the failure that sage observed on master, I looked at pg4.7, which is stuck in creating+peering.
osd.10(mimic) ...
02/12/2020
- 11:58 PM Feature #44108 (In Progress): mon: osd: handle 2-(main-)site stretch clusters explicitly, so no a...
- People have hacked together stretch clusters on top of Ceph using 3 sites for years, or even using 2 sites and interv...
- 11:56 PM Feature #44107 (Fix Under Review): mon: produce stable election results when netsplits and other ...
- 11:56 PM Feature #44107 (Fix Under Review): mon: produce stable election results when netsplits and other ...
- 11:53 PM Feature #44107 (Resolved): mon: produce stable election results when netsplits and other errors h...
- Right now, in netsplits and similar error conditions the monitors do not produce a stable quorum: whichever monitors ...
- 10:42 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- Sure Neha
- 10:16 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
- Brad, can you please take a look at this?
- 12:18 AM Bug #44062 (Triaged): LibRadosWatchNotify.WatchNotify failure
- /a/sage-2020-02-11_20:49:48-rados-wip-sage-testing-2020-02-11-1121-distro-basic-smithi/4755080
- 10:35 PM Bug #44004 (Can't reproduce): "ceph" command crashes
- 10:34 PM Bug #44015: Cant compile src/tools/rados/rados.cc on 32 bit systems
- Following is the explanation for why it was done....
- 04:33 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- Runs:
* http://pulpito.ceph.com/rzarzynski_bug43903,
* http://pulpito.ceph.com/rzarzynski_bug43903_more_pgnum_c...
- 03:58 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Saw something similar but on CentOS 8: https://tracker.ceph.com/issues/44078.
Marking as related for now, possibly i...
- 07:18 AM Feature #44025: Make it harder to set pool replica size to 1
- Deepika Upadhyay wrote:
> I assume you are talking about:
>
> > To remove a pool the mon_allow_pool_delete flag ...
- 05:13 AM Feature #44025: Make it harder to set pool replica size to 1
- Greg Farnum wrote:
> Pool deletion also requires a config option to be set on the monitor before it's allowed throug...
- 03:44 AM Bug #44092 (Resolved): mon: config commands do not accept whitespace style config name
- e.g....
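For context on #44092: Ceph treats option names written with spaces or dashes (ceph.conf style) as equivalent to the canonical underscore form, and the bug is that the mon config commands were not applying that equivalence. A minimal sketch of the normalization involved, with a hypothetical helper name (not the function from the Ceph source tree):

```python
def normalize_option_name(name: str) -> str:
    """Map a ceph.conf-style option name (spaces or dashes) to the
    canonical underscore form. Hypothetical helper for illustration."""
    return name.strip().replace(" ", "_").replace("-", "_")

print(normalize_option_name("mon allow pool delete"))  # mon_allow_pool_delete
print(normalize_option_name("osd-max-backfills"))      # osd_max_backfills
```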
02/11/2020
- 10:31 PM Backport #44070 (In Progress): luminous: Add builtin functionality in ceph-kvstore-tool to repair...
- https://github.com/ceph/ceph/pull/33195
- 10:28 AM Backport #44070: luminous: Add builtin functionality in ceph-kvstore-tool to repair corrupted key...
- We need a backport of PR 16745 and subsequent PRs. Refer to the original tracker #17730 for adding support to repair leveld...
- 03:07 AM Backport #44070 (New): luminous: Add builtin functionality in ceph-kvstore-tool to repair corrupt...
- We seem to have it in ceph-kvstore-tool as the "destructive-repair" option? Does this option do leveldb/rocksdb repair?...
- 03:01 AM Backport #44070 (Closed): luminous: Add builtin functionality in ceph-kvstore-tool to repair corr...
- 02:57 AM Backport #44070 (Resolved): luminous: Add builtin functionality in ceph-kvstore-tool to repair co...
- In some cases like ceph cluster upgrade or due to filesystem issue, the leveldb/rocksdb gets corrupted which can caus...
- 10:20 PM Bug #44089 (Fix Under Review): mon: --format=json does not work for config get or show
- In addition to the json output not working, when giving either these commands a specific key to fetch:...
- 09:58 PM Bug #38358 (Resolved): short pg log + cache tier ceph_test_rados out of order reply
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:55 PM Feature #41647 (Resolved): pg_autoscaler should show a warning if pg_num isn't a power of two
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:54 PM Bug #42346 (Resolved): Nearfull warnings are incorrect
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:54 PM Bug #42411 (Resolved): nautilus:osd: network numa affinity not supporting subnet port
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:54 PM Bug #42566 (Resolved): mgr commands fail when using non-client auth
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:53 PM Bug #42780 (Resolved): recursive lock of OpTracker::lock (70)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:53 PM Bug #42961 (Resolved): osd: increase priority in certain OSD perf counters
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:53 PM Backport #44088 (Rejected): mimic: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 09:53 PM Backport #44087 (Rejected): luminous: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 09:51 PM Backport #44086 (Rejected): mimic: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 09:51 PM Backport #44085 (Resolved): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- https://github.com/ceph/ceph/pull/33278
- 09:51 PM Backport #44084 (Rejected): luminous: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 09:51 PM Bug #43587 (Resolved): mon shutdown timeout (race with async compaction)
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:51 PM Bug #43592 (Resolved): osd-recovery-space.sh has a race
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:51 PM Backport #44083 (Resolved): mimic: expected MON_CLOCK_SKEW but got none
- https://github.com/ceph/ceph/pull/34370
- 09:50 PM Backport #44082 (Resolved): nautilus: expected MON_CLOCK_SKEW but got none
- https://github.com/ceph/ceph/pull/33276
- 09:50 PM Backport #44081 (Resolved): nautilus: ceph -s does not show >32bit pg states
- https://github.com/ceph/ceph/pull/33275
- 09:38 PM Backport #43256 (Resolved): nautilus: monitor config store: Deleting logging config settings does...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32846
m...
- 09:37 PM Backport #43631 (Resolved): nautilus: segv in collect_sys_info
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32901
m...
- 09:37 PM Backport #43473 (Resolved): nautilus: recursive lock of OpTracker::lock (70)
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32858
m...
- 09:36 PM Backport #43245 (Resolved): nautilus: osd: increase priority in certain OSD perf counters
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32845
m...
- 09:36 PM Backport #43726 (Resolved): nautilus: osd-recovery-space.sh has a race
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32774
m...
- 04:26 PM Bug #43795: Ceph tools utilizing "global_[pre_]init" no longer process "early" environment options
- Backport also requires -https://github.com/ceph/ceph/pull/33213- https://github.com/ceph/ceph/pull/33243
- 04:03 PM Bug #43795: Ceph tools utilizing "global_[pre_]init" no longer process "early" environment options
- There will be a second fix for this issue since CLI optionals are no longer overriding the environment.
- 04:03 PM Backport #43997: nautilus: Ceph tools utilizing "global_[pre_]init" no longer process "early" env...
- There will be a second fix for this issue since CLI optionals are no longer overriding the environment.
- 04:02 PM Backport #43996: mimic: Ceph tools utilizing "global_[pre_]init" no longer process "early" enviro...
- There will be a second fix for this issue since CLI optionals are no longer overriding the environment.
- 02:28 PM Bug #44067 (Resolved): cephtool/test.sh test fails to scrub all pools
- 02:27 PM Bug #44076 (Resolved): mon: update + monmap update triggers spawn loop
- - upgrade monitors from mimic to octopus
- quorum of 2/3 monitors
- enable msgr2
then
- third monitor probes...
- 09:57 AM Bug #44072 (New): Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_...
- Hi,
I set severity=Critical to grab attention because I think this is a serious problem!
We have two different Lu...
- 06:18 AM Bug #43582 (Pending Backport): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 02:57 AM Bug #44050 (Resolved): mon tell command args don't work
- 02:34 AM Bug #43885 (Can't reproduce): failed to reach quorum size 9 before timeout expired
- This hasn't shown up in master for a while and Sridhar has also not been able to reproduce this, hence reducing prior...
- 12:50 AM Bug #44053 (Resolved): test_envlibrados_for_rocksdb.sh fails on master
- 12:50 AM Bug #44053 (Rejected): test_envlibrados_for_rocksdb.sh fails on master
- 12:49 AM Bug #43833 (Resolved): shaman on bionic/cromson: cmake error: undefined reference to `pthread_cre...
- The error message is misleading. The root cause is...
02/10/2020
- 11:14 PM Bug #43889 (Pending Backport): expected MON_CLOCK_SKEW but got none
- 02:41 PM Bug #43889 (Fix Under Review): expected MON_CLOCK_SKEW but got none
- 09:37 PM Backport #43256: nautilus: monitor config store: Deleting logging config settings does not decrea...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32846
merged
- 08:44 PM Backport #43631: nautilus: segv in collect_sys_info
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32901
merged
- 08:43 PM Backport #43473: nautilus: recursive lock of OpTracker::lock (70)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32858
merged
- 08:41 PM Backport #43245: nautilus: osd: increase priority in certain OSD perf counters
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32845
merged
- 08:38 PM Backport #43726: nautilus: osd-recovery-space.sh has a race
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32774
merged
- 06:50 PM Feature #44025: Make it harder to set pool replica size to 1
- Pool deletion also requires a config option to be set on the monitor before it's allowed through.
I think we should ...
- 05:27 PM Bug #44067 (Fix Under Review): cephtool/test.sh test fails to scrub all pools
- 05:14 PM Bug #44067 (Resolved): cephtool/test.sh test fails to scrub all pools
- ...
- 02:55 PM Bug #44052 (Pending Backport): ceph -s does not show >32bit pg states
- 02:42 PM Bug #44062 (Resolved): LibRadosWatchNotify.WatchNotify failure
- ...
- 02:37 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- /a/sage-2020-02-09_21:18:03-rados-wip-sage2-testing-2020-02-09-1152-distro-basic-smithi/4749175...
- 10:37 AM Backport #42120 (Resolved): nautilus: pg_autoscaler should show a warning if pg_num isn't a power...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30689
m... - 10:37 AM Backport #43471 (Resolved): nautilus: negative num_objects can set PG_STATE_DEGRADED
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32857
m... - 10:37 AM Backport #43346 (Resolved): nautilus: short pg log + cache tier ceph_test_rados out of order reply
- 10:36 AM Backport #43319 (Resolved): nautilus: PeeringState::GoClean will call purge_strays unconditionally
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32847
m... - 10:36 AM Backport #43099 (Resolved): nautilus: nautilus:osd: network numa affinity not supporting subnet port
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32843
m... - 10:36 AM Backport #43246 (Resolved): nautilus: Nearfull warnings are incorrect
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32773
m...
- 10:00 AM Backport #43650 (Resolved): nautilus: Improve upmap change reporting in logs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32716
m...
- 10:00 AM Backport #43620 (Resolved): nautilus: mon shutdown timeout (race with async compaction)
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32715
m...
- 09:59 AM Backport #43783 (Resolved): nautilus: mgr commands fail when using non-client auth
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32769
m...
- 09:41 AM Bug #43582 (Fix Under Review): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 05:19 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Nathan Cutler wrote:
> But if the issue was introduced in 2008, then we'd need to backport further than nautilus...
...
02/09/2020
- 05:40 PM Bug #43889 (In Progress): expected MON_CLOCK_SKEW but got none
- 01:31 PM Bug #44053 (Fix Under Review): test_envlibrados_for_rocksdb.sh fails on master
- 01:30 PM Bug #44053 (Resolved): test_envlibrados_for_rocksdb.sh fails on master
- see https://github.com/ceph/ceph/commit/c724369010a753bd44e11a534d1f42156c4fc12d
should be fixed by https://github...
- 12:45 AM Bug #42328 (Fix Under Review): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
- 12:25 AM Bug #43903 (In Progress): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
02/08/2020
- 09:55 PM Backport #43919 (In Progress): nautilus: osd stuck down
- 09:47 PM Backport #43916 (In Progress): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pendin...
- 09:43 PM Backport #43881 (In Progress): mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 09:42 PM Backport #43880 (In Progress): luminous: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 09:41 PM Backport #43879 (In Progress): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 09:12 PM Backport #43989 (In Progress): nautilus: osd: Allow 64-char hostname to be added as the "host" in...
- 09:11 PM Backport #43988 (In Progress): luminous: osd: Allow 64-char hostname to be added as the "host" in...
- 09:10 PM Backport #43987 (In Progress): mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- 09:08 PM Backport #43992 (In Progress): nautilus: objecter doesn't send osd_op
- 09:05 PM Backport #43991 (In Progress): mimic: objecter doesn't send osd_op
- 06:11 PM Bug #44052 (Fix Under Review): ceph -s does not show >32bit pg states
- 06:07 PM Bug #44052 (Resolved): ceph -s does not show >32bit pg states
- ceph -s does not show newer pg states, like repair_failed
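- The failure mode (a 64-bit state mask narrowed to 32 bits on the display path, silently dropping newer flags) can be sketched as follows. The flag names and bit positions here are illustrative stand-ins, not Ceph's actual definitions from src/osd/osd_types.h:

```cpp
#include <cstdint>

// Illustrative flags only; the real pg state bits live in src/osd/osd_types.h.
constexpr uint64_t STATE_ACTIVE        = 1ULL << 0;
constexpr uint64_t STATE_REPAIR_FAILED = 1ULL << 35;  // a ">32bit" state

// Buggy display path: narrowing the mask to 32 bits drops high flags.
inline bool shown_narrowed(uint64_t mask, uint64_t flag) {
    uint32_t narrowed = static_cast<uint32_t>(mask);
    return (narrowed & static_cast<uint32_t>(flag)) != 0;
}

// Correct display path: keep the full 64-bit mask end to end.
inline bool shown_full(uint64_t mask, uint64_t flag) {
    return (mask & flag) != 0;
}
```

With `mask = STATE_ACTIVE | STATE_REPAIR_FAILED`, `shown_narrowed` reports only the low state while `shown_full` reports both, matching the symptom of `repair_failed` never appearing in `ceph -s`.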
- 03:26 PM Bug #44050 (Fix Under Review): mon tell command args don't work
- 02:37 PM Bug #44050: mon tell command args don't work
- 'ceph tell mon.a help' works, but '-h' does not.
- 02:07 PM Bug #44050 (Resolved): mon tell command args don't work
- Also, 'ceph tell mon.a force-sync --yes-i-really-mean-it' seems to be broken:...
- 02:11 PM Feature #42638 (Resolved): Allow specifying pg_autoscale_mode when creating a new pool
- 01:53 PM Bug #43889: expected MON_CLOCK_SKEW but got none
- /a/sage-2020-02-07_23:51:30-rados-wip-sage2-testing-2020-02-07-1439-distro-basic-smithi/4742672
- 01:34 PM Bug #44024 (Resolved): change in utime_t rendering ('T' separator) conflicts with cache tiering h...
- 08:18 AM Bug #43885: failed to reach quorum size 9 before timeout expired
- Since I could not reproduce the issue, I analyzed logs from the original run:
/a/sage-2020-01-29_20:14:58-rados-wip-...
- 01:13 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- David Zafman wrote:
> For all log entries of all OSDs at 2020-01-28 03:18 with pg[ information and osd primary these...
- 12:27 AM Backport #42120: nautilus: pg_autoscaler should show a warning if pg_num isn't a power of two
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30689
merged
02/07/2020
- 10:31 PM Backport #43471: nautilus: negative num_objects can set PG_STATE_DEGRADED
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32857
merged
- 10:31 PM Backport #43346: nautilus: short pg log + cache tier ceph_test_rados out of order reply
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32848
merged
- 10:30 PM Backport #43319: nautilus: PeeringState::GoClean will call purge_strays unconditionally
- Nathan Cutler wrote:
Reviewed-by: Neha Ojha <nojha@redhat.com>
> https://github.com/ceph/ceph/pull/32847
merged
- 10:29 PM Backport #43099: nautilus: nautilus:osd: network numa affinity not supporting subnet port
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32843
merged
- 10:29 PM Backport #43246: nautilus: Nearfull warnings are incorrect
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32773
merged
- 10:11 PM Backport #43650: nautilus: Improve upmap change reporting in logs
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/32716
merged
- 10:09 PM Backport #43620: nautilus: mon shutdown timeout (race with async compaction)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32715
merged
- 10:03 PM Backport #43783: nautilus: mgr commands fail when using non-client auth
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32769
merged
- 08:34 PM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
- For whatever reason we do not have complete osd logs for this, but from nojha-2020-02-06_01:27:32-upgrade:mimic-x:str...
- 05:28 PM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
- ...
- 04:34 PM Bug #44041 (Fix Under Review): osd: MLease in stray state -> Crashed
- 04:03 PM Bug #44041 (Resolved): osd: MLease in stray state -> Crashed
- ...
02/06/2020
- 11:55 PM Feature #44025: Make it harder to set pool replica size to 1
- Neha Ojha wrote:
> Setting pool size to 1 is dangerous. Add an option like yes_i_really_really_mean_it, similar to w...
- 11:50 PM Feature #44025 (Resolved): Make it harder to set pool replica size to 1
- Setting pool size to 1 is dangerous. Add an option like yes_i_really_really_mean_it, similar to what we have for pool...
- 11:53 PM Bug #44024 (Fix Under Review): change in utime_t rendering ('T' separator) conflicts with cache t...
- 11:26 PM Bug #44024 (Resolved): change in utime_t rendering ('T' separator) conflicts with cache tiering h...
- crash like...
- 06:15 PM Bug #44022 (Resolved): mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd ...
- The crash happens on a mimic OSD. Telemetry crash reports have been reporting similar crashes in 14.2.4(may or may no...
- 12:50 PM Bug #44015 (New): Can't compile src/tools/rados/rados.cc on 32-bit systems
- On my machine size_t is unsigned int. This causes an overflow in src/tools/rados/rados.cc:776: max_obj_len = 5ull * 1...
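- The overflow class is easy to demonstrate. A minimal sketch, assuming the truncated expression is a 64-bit product (`5ull * ...`) assigned to `size_t`; `uint32_t` stands in below for a 32-bit `size_t` so the wrap reproduces on any host, and the 5 GiB value is an assumption for illustration:

```cpp
#include <cstdint>

// On 32-bit systems size_t is 32 bits wide; uint32_t models that width here.
// The product is computed in unsigned long long (the 5ull operand promotes it),
// so the 64-bit value itself is fine...
constexpr unsigned long long big = 5ull * 1024 * 1024 * 1024;  // 5368709120

// ...but storing it into a 32-bit size type silently wraps modulo 2^32:
constexpr uint32_t wrapped = static_cast<uint32_t>(big);       // 1073741824, i.e. 1 GiB
```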
- 04:48 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
For all log entries of all OSDs at 2020-01-28 03:18 with pg[ information and osd primary these are the log lines th...
- 03:26 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- From mgr.x's log after the last time pg_stats are received we see ...
- 03:42 AM Bug #44004: "ceph" command crashes
- not reproducible in my testbed.
- 03:36 AM Bug #44004: "ceph" command crashes
- Sometimes the "ceph" command fails with a segmentation fault, here is the core_backtrace. It seems that it has someth...
- 02:59 AM Bug #44004 (Can't reproduce): "ceph" command crashes
- On the most recent master, after building, I ran the command "./bin/ceph -s --connect-timeout 1 -c /home/xuxuehan/cep...
- 01:37 AM Feature #42638 (Fix Under Review): Allow specifying pg_autoscale_mode when creating a new pool
02/05/2020
- 10:20 PM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
- Hmm that prepare_failure() does look like it's behaving a little differently than some of the regular op flow; we mus...
- 09:27 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- Logs: /a/dzafman-2020-01-27_22:00:09-upgrade:mimic-x-master-distro-basic-smithi/4712686
There is more than one bu...
- 07:43 PM Backport #43997 (Resolved): nautilus: Ceph tools utilizing "global_[pre_]init" no longer process ...
- https://github.com/ceph/ceph/pull/33261
- 07:43 PM Backport #43996 (Rejected): mimic: Ceph tools utilizing "global_[pre_]init" no longer process "ea...
- 07:42 PM Backport #43992 (Rejected): nautilus: objecter doesn't send osd_op
- 07:42 PM Backport #43991 (Rejected): mimic: objecter doesn't send osd_op
- 07:42 PM Backport #43989 (Resolved): nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- https://github.com/ceph/ceph/pull/33147
- 07:42 PM Backport #43988 (Rejected): luminous: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- https://github.com/ceph/ceph/pull/33146
- 07:42 PM Backport #43987 (Resolved): mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- https://github.com/ceph/ceph/pull/33145
- 05:39 PM Bug #42347 (Won't Fix): nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_fligh...
- we've backported the osd fast shutdown ( https://github.com/ceph/ceph/pull/32743 ), so this will effectively go away ...
- 01:39 PM Bug #43975: Slow Requests/OP's types not getting logged
- - Types - src/osd/OpRequest.h...
- 12:54 PM Bug #43975 (Resolved): Slow Requests/OP's types not getting logged
- - From ceph.log...
02/04/2020
- 11:46 AM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- The problem is not only about heap corruption. Stacks are affected as well. Moreover, there is an interesting corrup...
- 03:28 AM Bug #43813 (Pending Backport): objecter doesn't send osd_op
02/03/2020
- 09:49 PM Bug #43954 (New): Issue health warning or error if MON or OSD daemons are holding onto excessive ...
- 08:35 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- `Thread 63 (Thread 0x7f2e36318700 (LWP 55988))` is poisoned as well....
- 08:00 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- It looks like a freshly heap-allocated `OSDMap` instance got corrupted:...
- 02:35 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- It looks like the entire `PGTempMap::data` has been corrupted:...
- 01:12 PM Bug #43948 (New): Remapped PGs are sometimes not deleted from previous OSDs
- I noticed on several clusters (all Nautilus 14.2.6) that on occasion, some OSDs may still hold data for some PGs long...
02/01/2020
- 04:53 PM Bug #43861: ceph_test_rados_watch_notify hang
- same?
/a/sage-2020-02-01_03:27:35-rados-wip-sage-testing-2020-01-31-1746-distro-basic-smithi/4723146
ceph_test_wa...
01/31/2020
- 11:31 PM Bug #43795 (Pending Backport): Ceph tools utilizing "global_[pre_]init" no longer process "early"...
- 11:09 PM Bug #43185: ceph -s not showing client activity
- Can you grab a wallclock profiler dump from the mgr process when its usage goes to 100%?
Learn more about how to use... - 06:41 AM Bug #43185: ceph -s not showing client activity
- strace for the hanging mgr thread...
- 06:37 AM Bug #43185: ceph -s not showing client activity
- There's almost no load apart from scrubbing, like this is pretty average io:
client: 20 MiB/s rd, 61 MiB/s w...
- 10:34 PM Bug #43365 (Closed): Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signe...
- FWIW the two clusters reporting this crash via telemetry are both Ubuntu 18.04.
closing this as not a ceph issue; l...
- 06:02 PM Bug #43813 (Fix Under Review): objecter doesn't send osd_op
- 03:50 AM Bug #43813 (In Progress): objecter doesn't send osd_op
- 03:46 AM Bug #43813: objecter doesn't send osd_op
- /a/sage-2020-01-30_22:27:29-rados-wip-sage-testing-2020-01-30-1230-distro-basic-smithi/4719487...
- 05:24 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
- Only happens when upgrading from mimic to nautilus, see https://tracker.ceph.com/issues/43048#note-7.
- 12:23 PM Bug #43885: failed to reach quorum size 9 before timeout expired
- Update: Tried running the test a few times but haven't been able to reproduce it. I will continue my attempts. In the...
- 06:37 AM Bug #43885: failed to reach quorum size 9 before timeout expired
- There does not appear to be a crash in this case, but there is an election that seems to take a long time followed by...
- 10:24 AM Bug #43929 (Pending Backport): osd: Allow 64-char hostname to be added as the "host" in CRUSH
- 10:16 AM Bug #43929 (Resolved): osd: Allow 64-char hostname to be added as the "host" in CRUSH
- On Linux systems it is possible to set a 64-character hostname when
HOST_NAME_MAX is set to 64. It means that if...
- 09:46 AM Backport #43928 (In Progress): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 09:43 AM Backport #43928 (Resolved): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- https://github.com/ceph/ceph/pull/33007
- 09:43 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- But if the issue was introduced in 2008, then we'd need to backport further than nautilus...
- 09:42 AM Bug #42977 (Pending Backport): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Adding nautilus backport per Greg's comment "looking at the nautilus code it is susceptible to this too."
- 03:56 AM Bug #42977 (Resolved): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 01:33 AM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
- https://github.com/ceph/ceph/pull/19076 is a possible solution to this issue.
01/30/2020
- 11:03 PM Bug #43602 (Won't Fix): Core dumps not collected in standalone tests for distros using systemd-co...
- The real fix done elsewhere is to configure the core location so that systemd-coredump is not used. It isn't worth t...
- 04:43 PM Bug #43602 (Fix Under Review): Core dumps not collected in standalone tests for distros using sys...
- 04:43 PM Bug #43602 (Resolved): Core dumps not collected in standalone tests for distros using systemd-cor...
- 08:30 PM Backport #43651 (Resolved): luminous: Improve upmap change reporting in logs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32666
m...
- 08:29 PM Backport #43651 (In Progress): luminous: Improve upmap change reporting in logs
- 07:40 PM Backport #43919 (Resolved): nautilus: osd stuck down
- https://github.com/ceph/ceph/pull/35024
- 07:39 PM Backport #43916 (Resolved): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) ...
- https://github.com/ceph/ceph/pull/33155
- 04:46 PM Bug #43864 (Resolved): osd/repro_long_log.sh failure
- Should have been fixed by https://github.com/ceph/ceph/pull/32945.
- 04:17 PM Bug #43864: osd/repro_long_log.sh failure
- /a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718221
- 04:41 PM Bug #43889: expected MON_CLOCK_SKEW but got none
- /a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718332
- 04:16 PM Bug #43889: expected MON_CLOCK_SKEW but got none
- /a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718133
- 04:40 PM Bug #43915 (New): leaked Session (alloc from OSD::ms_handle_authentication)
- ...
- 04:37 PM Bug #43914 (Need More Info): nautilus: ceph tell command times out
- see https://github.com/ceph/ceph/pull/32989
- 04:35 PM Bug #43914 (Resolved): nautilus: ceph tell command times out
- ...
- 04:17 PM Bug #43885: failed to reach quorum size 9 before timeout expired
- /a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718154
description: rados/...
- 03:19 PM Feature #43910 (New): Utilize new Linux kernel v5.6 prctl PR_SET_IO_FLUSHER option
- See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d19f1c8e1937baf74e1962aae9f90fa3ae...
- 02:51 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- the second time,...
- 02:50 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- if i start the osd manually, i can reproduce the same crash:...
- 02:48 PM Bug #43903 (Resolved): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
- ...
- 12:37 PM Bug #42977 (Fix Under Review): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Hmm this not-returning issue seems to date from 2008 (3859475bbfafb8754841af41044cb41124e87fc7); I'm not sure why it'...
- 10:42 AM Bug #42977 (In Progress): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Yep looks like something went horribly wrong in refactoring — we correctly call the new election on receiving an old ...
- 09:57 AM Documentation #43896 (Resolved): nautilus upgrade should recommend ceph-osd restarts after enabli...
- Following an upgrade to nautilus and `ceph mon enable-msgr2`, running nautilus osds will not yet bind to their v2 add...
- 07:22 AM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
- > We can clear that slow op either by restarting mon.cepherin-mon-7cb9b591e1 or with `ceph osd fail osd.170`.
too ...
- 07:21 AM Bug #43893 (Duplicate): lingering osd_failure ops (due to failure_info holding references?)
- On Nautilus v14.2.6 we see osd_failure ops which linger:...
- 04:11 AM Bug #43892 (Pending Backport): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during ...
01/29/2020
- 11:18 PM Bug #43892 (Fix Under Review): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during ...
- 11:15 PM Bug #43892 (Resolved): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o upg...
- ...
- 10:07 PM Bug #43885: failed to reach quorum size 9 before timeout expired
- I wonder if this is somehow related to the election issue we saw in https://tracker.ceph.com/issues/42977. Seems to b...
- 01:14 PM Bug #43885 (Can't reproduce): failed to reach quorum size 9 before timeout expired
- This pops up occasionally. Here is a recent one:...
- 09:15 PM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- I think defer() is called by mon.e in receive_propose() because of the following...
- 07:39 PM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- on mon.g (3), the epoch is 55 (or looks that way, it just sent these):...
- 12:41 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- Let's see what happened in /a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/46...
- 07:19 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
I'm seeing a lot of this in a sample of log segments from osd.6 which is reporting the slow ops. The log for osd.6...
- 03:55 PM Bug #43882 (Need More Info): osd to mon connection lost, osd stuck down
- adding debug: https://github.com/ceph/ceph/pull/32968
- 01:06 PM Bug #43882 (Can't reproduce): osd to mon connection lost, osd stuck down
- This is a similar symptom to #43825, but it does not appear to be related to split/merge.
OSD is marked down, but ...
- 01:45 PM Bug #43889 (Resolved): expected MON_CLOCK_SKEW but got none
- description: rados/multimon/{clusters/6.yaml msgr-failures/many.yaml msgr/async.yaml
no_pools.yaml objectstore...
- 01:44 PM Bug #43888: osd/osd-bench.sh 'tell osd.N bench' hang
- https://github.com/ceph/ceph/pull/32961 to debug
- 01:41 PM Bug #43888 (Resolved): osd/osd-bench.sh 'tell osd.N bench' hang
- ...
- 01:36 PM Bug #43887 (Resolved): ceph_test_rados_delete_pools_parallel failure
- ...
- 01:23 PM Bug #43825 (Pending Backport): osd stuck down
- 10:03 AM Backport #43881 (Resolved): mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- https://github.com/ceph/ceph/pull/33154
- 10:03 AM Backport #43880 (Rejected): luminous: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- https://github.com/ceph/ceph/pull/33153
- 10:03 AM Backport #43879 (Resolved): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- https://github.com/ceph/ceph/pull/33152
01/28/2020
- 11:22 PM Bug #43864 (In Progress): osd/repro_long_log.sh failure
- 08:03 PM Bug #43864 (Resolved): osd/repro_long_log.sh failure
- ...
- 08:44 PM Bug #43865: osd-scrub-test.sh fails date check
- This looks like a case where the sleep time wasn't sufficient. The previous run had set 2 days and the next test swi...
- 08:07 PM Bug #43865 (Resolved): osd-scrub-test.sh fails date check
- ...
- 08:08 PM Bug #38345 (Pending Backport): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- 08:07 PM Bug #43826 (Resolved): osd: leak of from send_lease
- 07:59 PM Bug #43862 (Can't reproduce): mkfs fsck found fatal error: (2) No such file or directory during c...
- ...
- 07:45 PM Bug #43861: ceph_test_rados_watch_notify hang
- /a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713217
- 07:43 PM Bug #43861 (Resolved): ceph_test_rados_watch_notify hang
- ...
- 07:34 PM Bug #43825 (Fix Under Review): osd stuck down
- 07:27 PM Bug #43825 (In Progress): osd stuck down
- we are splitting:...
- 06:59 PM Bug #43825: osd stuck down
- 2020-01-28T14:56:26.155+0000 7fd3ba08d700 20 osd.6 285 identify_splits_and_merges 1.5 e245 to e285 pg_nums {76=28,89=...
- 06:39 PM Bug #43825: osd stuck down
- ...
- 07:24 PM Bug #43185: ceph -s not showing client activity
- Are you observing any client activity in the cluster logs when "ceph -s" isn't reporting them?
It is sometimes poss... - 06:27 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
The master branch passed, but my nautilus run hit the same issue:
http://pulpito.ceph.com/dzafman-2020-01-27_21:...
- 10:42 AM Backport #43852 (Resolved): nautilus: osd-scrub-snaps.sh fails
- https://github.com/ceph/ceph/pull/33274
- 09:40 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- Just an update on my side:
After upgrading our monitor Ubuntu 18.04 packages (apt-get upgrade) with the 5.3.0-26-g...
01/27/2020
- 09:00 PM Bug #43150 (Pending Backport): osd-scrub-snaps.sh fails
- 05:05 PM Bug #43807 (Resolved): osd-backfill-recovery-log.sh fails
- 04:37 PM Bug #43810 (Resolved): all/recovery_preemption.yaml hang with down pgs
- 01:41 PM Bug #43810 (Fix Under Review): all/recovery_preemption.yaml hang with down pgs
- 04:02 PM Backport #43821 (In Progress): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_wi...
- 03:57 PM Bug #43656: AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
- Hi Sage:
This issue appears to have been introduced by https://github.com/ceph/ceph/pull/17619 - a major octopus ...
- 03:56 PM Backport #43776 (Need More Info): nautilus: AssertionError: not all PGs are active or peered 15 s...
- The master PR appears to be fixing an issue introduced by https://github.com/ceph/ceph/pull/17619 - a major octopus f...
- 03:28 PM Backport #43772 (In Progress): nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
- 03:23 PM Backport #43731 (In Progress): nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending...
- 02:41 PM Backport #43630 (In Progress): mimic: segv in collect_sys_info
- 02:37 PM Backport #43631 (In Progress): nautilus: segv in collect_sys_info
- 01:26 PM Bug #43826 (Fix Under Review): osd: leak of from send_lease
- 12:57 PM Backport #43822: nautilus: Ceph assimilate-conf results in config entries which can not be removed
- https://github.com/ceph/ceph/pull/32856
- 12:55 PM Backport #43822 (In Progress): nautilus: Ceph assimilate-conf results in config entries which can...
- 12:50 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
- We also have a problem:
{
"os_version_id": "10",
"assert_condition": "z >= signedspan::zero()",
"...
- 11:58 AM Bug #43833 (Resolved): shaman on bionic/cromson: cmake error: undefined reference to `pthread_cre...
- I'm getting this with the current master on shaman:...
01/26/2020
- 05:20 PM Bug #43826 (Resolved): osd: leak of from send_lease
- ...
- 05:18 PM Bug #43807: osd-backfill-recovery-log.sh fails
- /a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703160
- 05:13 PM Bug #43825 (Need More Info): osd stuck down
- https://github.com/ceph/ceph/pull/32885 to debug
- 05:11 PM Bug #43825: osd stuck down
- huh, also /a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703159 osd.7
- 05:07 PM Bug #43825 (Resolved): osd stuck down
- osd stuck at epoch 99, cluster at 2000 or something.
monc fails to reconnect to the mon
/a/sage-2020-01-24_23:2...
- 04:33 PM Bug #43810: all/recovery_preemption.yaml hang with down pgs
- /a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4702992
- 10:40 AM Backport #43822 (Resolved): nautilus: Ceph assimilate-conf results in config entries which can no...
- https://github.com/ceph/ceph/pull/32856
- 10:40 AM Backport #43821 (Resolved): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_...
- https://github.com/ceph/ceph/pull/32908
- 03:54 AM Bug #43552 (Pending Backport): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
- 03:19 AM Bug #43653: test-crash.yaml produce cores
- ...