Activity

From 01/28/2020 to 02/26/2020

02/26/2020

11:57 PM Backport #43472: mimic: negative num_objects can set PG_STATE_DEGRADED
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33331
merged
Yuri Weinstein
11:56 PM Backport #43320: mimic: PeeringState::GoClean will call purge_strays unconditionally
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33329
merged
Yuri Weinstein
11:55 PM Backport #42998: mimic: acting_recovery_backfill won't catch all up peers
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33324
merged
Yuri Weinstein
11:55 PM Backport #42852: mimic: format error: ceph osd stat --format=json
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33322
merged
Yuri Weinstein
11:54 PM Backport #43881: mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33154
merged
Yuri Weinstein
11:52 PM Backport #43881: mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33154
merged
Yuri Weinstein
11:51 PM Backport #43987: mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33145
merged
Yuri Weinstein
11:51 PM Backport #43652: mimic: Improve upmap change reporting in logs
David Zafman wrote:
> https://github.com/ceph/ceph/pull/32717
merged
Yuri Weinstein
11:49 PM Backport #40890: mimic: Pool settings aren't populated to OSD after restart.
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32125
merged
Yuri Weinstein
11:47 PM Backport #42879: mimic: ceph_test_admin_socket_output fails in rados qa suite
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33323
merged
Yuri Weinstein
11:44 PM Backport #43630: mimic: segv in collect_sys_info
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32902
merged
Yuri Weinstein
11:08 PM Bug #44311: crash in Objecter and CRUSH map lookup
Mahati Chamarthy wrote:
> Neha Ojha wrote:
> > Which version is this on?
>
Current master. To reproduce set rbd_...
Mahati Chamarthy
10:13 PM Bug #44311: crash in Objecter and CRUSH map lookup
Neha Ojha wrote:
> Which version is this on?
Current master
Mahati Chamarthy
10:09 PM Bug #44311 (Need More Info): crash in Objecter and CRUSH map lookup
Which version is this on? Neha Ojha
05:45 PM Bug #44311 (Resolved): crash in Objecter and CRUSH map lookup
When Concurrent reads are issued with the below rbd command, it results in failure due to crash in Objecter and CRUSH... Mahati Chamarthy
09:54 PM Bug #44296 (In Progress): qa/standalone/mgr/balancer.sh fails due to test error and not waiting f...
David Zafman
09:50 PM Feature #44107: mon: produce stable election results when netsplits and other errors happen
Marking anything we need for octopus as "Urgent". Neha Ojha
08:51 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
... Sage Weil
08:48 PM Bug #44314 (Resolved): osd-backfill-stats.sh failing intermittently in TEST_backfill_sizeup_out()...
... Sage Weil
07:47 PM Bug #43914 (Fix Under Review): nautilus: ceph tell command times out
Sage Weil
06:48 PM Bug #43914: nautilus: ceph tell command times out
okay yeah, it's because the command wq uses osd_lock... Sage Weil
06:41 PM Bug #43914: nautilus: ceph tell command times out
so, this was fixed in nautilus, in the sense that https://github.com/ceph/ceph/pull/27696 went into nautilus.
Sage Weil
06:37 PM Bug #43914: nautilus: ceph tell command times out
The thread (or lock?) is busy with... Sage Weil
05:13 PM Bug #43914: nautilus: ceph tell command times out
This run has more relevant information: /a/nojha-2020-02-26_03:20:34-upgrade:mimic-x:stress-split-nautilus-distro-bas... Neha Ojha
07:33 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
follow-up fix: https://github.com/ceph/ceph/pull/33559 (typo in original commit) Sage Weil
05:25 PM Bug #41183: pg autoscale on EC pools
Looks like a fix is going in: https://github.com/ceph/ceph/pull/33170 Brian Koebbe
04:54 PM Bug #41183: pg autoscale on EC pools
Seem to have the same issue here.
158 OSDs with 1 main pool, an EC 5+2 pool with a 2048 pg_num, but the autoscaler...
Brian Koebbe
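For scale, the figures quoted in the comment above can be turned into a rough PG-shards-per-OSD estimate. This is a back-of-the-envelope sketch, not the autoscaler's actual logic: the function name is hypothetical, and it assumes the EC pool is spread evenly across all 158 OSDs and is the only pool that matters.

```python
def pg_shards_per_osd(pg_num: int, k: int, m: int, num_osds: int) -> float:
    """Average number of PG shards each OSD holds for one EC pool.

    Each placement group in a k+m erasure-coded pool has k+m shards,
    one per OSD in its acting set.
    """
    return pg_num * (k + m) / num_osds


# Figures from the comment above: 158 OSDs, EC 5+2, pg_num 2048.
estimate = pg_shards_per_osd(pg_num=2048, k=5, m=2, num_osds=158)
print(f"{estimate:.1f} PG shards per OSD")  # prints "90.7 PG shards per OSD"
```

Comparing an estimate like this against the cluster's per-OSD PG target is one way to sanity-check what the autoscaler reports.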
02:28 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
We're seeing this a couple times a day on debian 10.1, using croit's repo:
kernel 4.19.67-2+deb10u1
ceph version 14...
Edwin Pers
10:54 AM Cleanup #44309 (New): auth: remove deprecated 'auid' field from pool metadata
As per https://github.com/ceph/ceph/pull/23540#issuecomment-413589557, 'auid' field was deprecated but never removed ... Ernesto Puerta
12:09 AM Bug #44297 (Fix Under Review): mon/Monitor.cc: 3924: FAILED ceph_assert(!"send_message on anonymo...
Sage Weil
12:02 AM Bug #44297: mon/Monitor.cc: 3924: FAILED ceph_assert(!"send_message on anonymous connection")
The command is passed from a nautilus monitor:... Sage Weil

02/25/2020

11:51 PM Bug #44275 (Resolved): NameError: name 'retval' is not defined
Sage Weil
11:50 PM Bug #44248 (Pending Backport): Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can ...
Sage Weil
03:16 AM Bug #44248 (Fix Under Review): Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can ...
Neha Ojha
02:37 AM Bug #44248: Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause the osd to crash
The problem is that though osd.1 sent a RELEASE to osd.8, we still ended up de-queueing "4184 RemoteBackfillReserved"... Neha Ojha
12:49 AM Bug #44248: Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause the osd to crash
This is when 4184 RemoteBackfillReserved was enqueued... Neha Ojha
11:43 PM Bug #44297 (Resolved): mon/Monitor.cc: 3924: FAILED ceph_assert(!"send_message on anonymous conne...
on nautilus->octopus/master upgrade... Sage Weil
11:39 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
... Sage Weil
11:32 PM Bug #44296 (Resolved): qa/standalone/mgr/balancer.sh fails due to test error and not waiting for ...

http://pulpito.ceph.com/dzafman-2020-02-08_20:24:49-rados-wip-zafman-testing-distro-basic-smithi/4746333
With 2 ...
David Zafman
09:36 PM Bug #43914: nautilus: ceph tell command times out
First observation from teuthology.log for /a/nojha-2020-02-21_20:34:10-upgrade:mimic-x:stress-split-nautilus-distro-b... Neha Ojha
06:34 PM Bug #38219: rebuild-mondb hangs
Seen in nautilus: /a/yuriw-2020-02-15_16:49:25-rados-nautilus-distro-basic-smithi/4767419/ Neha Ojha
04:40 PM Backport #43650: nautilus: Improve upmap change reporting in logs
250a778fe8bd6eadf16fa1988403e0410c528543 will be in v14.2.8 Ken Dreyer
03:46 PM Backport #44289 (Resolved): nautilus: mon: update + monmap update triggers spawn loop
https://github.com/ceph/ceph/pull/34500 Nathan Cutler
02:28 PM Backport #44206 (In Progress): nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempM...
Nathan Cutler
02:23 PM Bug #44286 (New): Cache tiering shows unfound objects after OSD reboots
We've got a cluster with a 3/2 size/min_size replicated cache pool in front of an erasure coded pool used for RBD.
...
Paul Emmerich
01:11 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
Might be a stretch, but I just noticed that our bits are flipped nearby the 128k boundary, which is ?coincidentally? ... Dan van der Ster

02/24/2020

10:38 PM Bug #24835 (Can't reproduce): osd daemon spontaneous segfault
Brad Hubbard
07:42 PM Bug #44076 (Pending Backport): mon: update + monmap update triggers spawn loop
Sage Weil
07:36 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
https://github.com/ceph/ceph/pull/33470 - fixing the order of msgr2 vs nautilus install is the first step here. Neha Ojha
05:48 PM Bug #44248: Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause the osd to crash
... Neha Ojha
04:25 PM Bug #44275 (Fix Under Review): NameError: name 'retval' is not defined
Varsha Rao
04:17 PM Bug #44275 (Resolved): NameError: name 'retval' is not defined
... Varsha Rao
03:50 PM Bug #42830: problem returning mon to cluster
I noticed there is very little osdmap caching in the leader mon -- here we see only 1 single osdmap in the mempool.
...
Dan van der Ster
05:45 AM Backport #44259 (In Progress): nautilus: Slow Requests/OP's types not getting logged
Sridhar Seshasayee
05:03 AM Backport #44259 (Resolved): nautilus: Slow Requests/OP's types not getting logged
https://github.com/ceph/ceph/pull/33503 Sridhar Seshasayee
05:24 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
More ftr: the corruption occurs in the crush part of the osdmap:... Dan van der Ster
05:16 AM Bug #43975 (Pending Backport): Slow Requests/OP's types not getting logged
Sridhar Seshasayee

02/23/2020

10:08 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
Likely related.... Brad Hubbard
09:29 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
Adding crash signature (cf2864eb1281dffc3340730dc2caae163b4c0170132bcbd3dcbd6147d8f29fa8) for the crash described in ... Brad Hubbard
09:05 PM Bug #43861: ceph_test_rados_watch_notify hang
... Sage Weil
02:29 PM Bug #41313: PG distribution completely messed up since Nautilus
... Anonymous
12:13 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
A bit more about our incident ftr.
The cluster has 1301 osds in total: 752 filestore and 549 bluestore. The filest...
Dan van der Ster

02/22/2020

04:47 PM Bug #44248 (Resolved): Receiving RemoteBackfillReserved in WaitLocalBackfillReserved can cause th...
... Neha Ojha
01:25 PM Backport #44206: nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
Started a backport here https://github.com/ceph/ceph/pull/33483 Dan van der Ster
09:49 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
> o->decode(obl); <------ HERE
I have gdb working now on a coredump so can confirm that:...
Dan van der Ster
01:00 AM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
^^ Is a weird red-herring. The FFFFFFFF is because the osdmap contains the crc32c in the last 4 bytes, so that cancel... Dan van der Ster
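The comment above is truncated, so the following is only an illustration of a general CRC property that it appears to describe: when a buffer ends with its own checksum, a CRC taken over the whole buffer is the same constant for every payload. The sketch uses zlib's CRC-32 rather than Ceph's crc32c, so the constant it prints differs from the FFFFFFFF seen in the report; the function name is hypothetical.

```python
import struct
import zlib


def crc_over_data_and_trailer(payload: bytes) -> int:
    """CRC-32 of the payload followed by its own little-endian CRC-32."""
    trailer = struct.pack("<I", zlib.crc32(payload))
    return zlib.crc32(payload + trailer)


# Two unrelated payloads yield the same value once the trailer is included.
a = crc_over_data_and_trailer(b"first map")
b = crc_over_data_and_trailer(b"a completely different map")
print(hex(a), hex(b), a == b)
```

A checksum computed this way says nothing about whether two maps are identical, which is consistent with the "red herring" remark.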
01:23 AM Bug #43914: nautilus: ceph tell command times out
This is on nautilus: /a/nojha-2020-02-21_20:34:10-upgrade:mimic-x:stress-split-nautilus-distro-basic-smithi/4788575/
...
Neha Ojha
01:14 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
/a/sage-2020-02-21_21:08:33-rados-wip-sage3-testing-2020-02-21-1218-distro-basic-smithi/4788714... Sage Weil

02/21/2020

10:48 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
Found something. The crc32c for all my *good* maps is FFFFFFFF (and I assure you they are different maps.. gsutil out... Dan van der Ster
10:16 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
Just to provide the same update I gave to Dan van der Ster over email:
IIRC, we saw this 1-2 times more after the ...
Erik Lindahl
09:42 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
This is continuing to happen for us. Log file here.
ceph-post-file: 589aa7aa-7a80-49a2-ba55-376e467c4550
Troy Ablan
10:19 PM Bug #42830: problem returning mon to cluster
Seeing the same here in 13.2.8 starting a new empty mon. Leader's CPU goes to 100%, until an election is called then ... Dan van der Ster
09:03 PM Bug #44243 (Can't reproduce): memstore make check test fails
... Sage Weil
01:29 PM Bug #42328 (New): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
It looks like this is still occurring even with a branch that included 8182f52149: http://qa-proxy.ceph.com/teutholo... Jason Dillaman
01:21 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
Bastian Mäuser wrote:
> This is still an issue on 14.2.6 (at least the one shipped with proxmox)
It will appear i...
Sage Weil
12:49 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
FTR this looks identical to https://tracker.ceph.com/issues/39525#note-6 Dan van der Ster
12:25 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
So the timeout, as previously mentioned, was 10 seconds although osd_default_notify_timeout is 30 seconds by default.... Brad Hubbard

02/20/2020

07:02 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
ok, the first crash isn't because we just got bad data.. it's because we just read bad data off of disk. see:... Sage Weil
04:09 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
Notes from CERN incident:
- identical corruption, different OSDmaps on different OSDs:...
Sage Weil
05:40 PM Bug #44229 (New): monclient: _check_auth_rotating possible clock skew, rotating keys expired way ...
seems to affect cephadm bootstrap tests
first, the error message doesn't make sense, since the bound 2020-02-20T16...
Sage Weil
12:20 PM Bug #44184: Slow / Hanging Ops after pool creation
Neha Ojha wrote:
> Hi Wido,
>
> I did come across something like this while investigating https://tracker.ceph.co...
Wido den Hollander
12:42 AM Bug #44217 (Can't reproduce): Leaked connection (alloc from AsyncMessenger::add_accept)
... Brad Hubbard

02/19/2020

11:42 PM Bug #44076 (Fix Under Review): mon: update + monmap update triggers spawn loop
Sage Weil
10:45 PM Bug #44157 (Resolved): cli throws bad exceptoin on control-c
Sage Weil
10:11 PM Bug #44120 (Need More Info): NVMEDevice failed in certain NVMe Disk
Can you attach logs from the crash? Which version are you using? Neha Ojha
10:08 PM Bug #44184 (Need More Info): Slow / Hanging Ops after pool creation
Hi Wido,
I did come across something like this while investigating https://tracker.ceph.com/issues/43048. It was a...
Neha Ojha
07:18 PM Bug #44184: Slow / Hanging Ops after pool creation
On the Ceph users list there are multiple reports of people experiencing this:
- https://www.spinics.net/lists/cep...
Wido den Hollander
04:55 PM Bug #37656 (New): FileStore::_do_transaction() crashed with error 17 (merge collection vs osd res...
/a/teuthology-2020-02-11_02:30:03-upgrade:mimic-x-nautilus-distro-basic-smithi/4753470/
upgrade:mimic-x/stress-spl...
Neha Ojha
11:00 AM Bug #43151 (Resolved): ok-to-stop incorrect for some ec pgs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
11:00 AM Bug #43721 (Resolved): qa/standalone/misc/ok-to-stop.sh occasionally fails
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
10:59 AM Backport #44206 (Resolved): nautilus: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap:...
https://github.com/ceph/ceph/pull/33530 Nathan Cutler

02/18/2020

07:55 PM Bug #43903 (Pending Backport): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
Sage Weil
07:52 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
... Sage Weil
04:43 PM Bug #44184 (Need More Info): Slow / Hanging Ops after pool creation
On a cluster with 1405 OSDs I've run into a situation for the second time now where a pool creation resulted in mas... Wido den Hollander
10:28 AM Backport #44085 (Resolved): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33278
m...
Nathan Cutler
10:28 AM Backport #44082 (Resolved): nautilus: expected MON_CLOCK_SKEW but got none
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33276
m...
Nathan Cutler
10:27 AM Backport #43772 (Resolved): nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32844
m...
Nathan Cutler
10:27 AM Backport #43239 (Resolved): nautilus: ok-to-stop incorrect for some ec pgs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32844
m...
Nathan Cutler

02/17/2020

11:45 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/ceph/teuthology-archive/pdonnell-2020-02-15_16:51:06-fs-wip-pdonnell-testing-20200215.033325-distro-basic-smithi/476... Patrick Donnelly
07:08 PM Backport #42662 (In Progress): nautilus:Issue a HEALTH_WARN when a Pool is configured with [min_]...
Nathan Cutler
02:16 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
This is still an issue on 14.2.6 (at least the one shipped with proxmox) Bastian Mäuser

02/16/2020

03:30 PM Bug #44156 (Resolved): RenewLease sent to pre-octopus osds during upgrade
Sage Weil

02/15/2020

03:11 PM Bug #44157 (Fix Under Review): cli throws bad exceptoin on control-c
Sage Weil
02:37 PM Bug #44041 (Resolved): osd: MLease in stray state -> Crashed
Sage Weil
02:37 PM Bug #42328 (Resolved): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
Sage Weil
02:36 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
/a/sage-2020-02-15_04:59:38-rados-wip-sage3-testing-2020-02-14-1951-distro-basic-smithi/4765960 Sage Weil
02:56 AM Bug #43975: Slow Requests/OP's types not getting logged
Before and after logs to show the extra information relating to slow op/types:
Before:
--------...
Sridhar Seshasayee

02/14/2020

09:58 PM Bug #43975 (Fix Under Review): Slow Requests/OP's types not getting logged
Neha Ojha
08:22 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
Neha Ojha wrote:
> pg3.4, which is stuck in "peering" shows similar behavior as https://tracker.ceph.com/issues/4304...
Neha Ojha
03:28 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
pg3.4, which is stuck in "peering" shows similar behavior as https://tracker.ceph.com/issues/43048#note-15
osd.10 ...
Neha Ojha
07:24 PM Bug #44156 (Fix Under Review): RenewLease sent to pre-octopus osds during upgrade
Neha Ojha
05:20 PM Bug #44156 (Resolved): RenewLease sent to pre-octopus osds during upgrade
... Neha Ojha
05:35 PM Bug #44157 (Resolved): cli throws bad exceptoin on control-c
... Sage Weil
05:23 PM Backport #44085: nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33278
merged
Yuri Weinstein
05:23 PM Backport #44082: nautilus: expected MON_CLOCK_SKEW but got none
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33276
merged
Yuri Weinstein
05:22 PM Backport #43772: nautilus: qa/standalone/misc/ok-to-stop.sh occasionally fails
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32844
merged
Yuri Weinstein
05:22 PM Backport #43239: nautilus: ok-to-stop incorrect for some ec pgs
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32844
merged
Yuri Weinstein
02:48 PM Backport #43996 (Need More Info): mimic: Ceph tools utilizing "global_[pre_]init" no longer proce...
should be based on the nautilus backport Nathan Cutler
02:37 PM Backport #42662 (New): nautilus:Issue a HEALTH_WARN when a Pool is configured with [min_]size == 1
Changed status to re-attempt the backport. Sridhar Seshasayee
02:23 PM Backport #43621: luminous: pg: fastinfo incorrect when last_update moves backward in time
nautilus backport is marked non-trivial, so this one is also non-trivial Nathan Cutler
02:23 PM Backport #43622 (Need More Info): mimic: pg: fastinfo incorrect when last_update moves backward i...
nautilus backport is marked non-trivial, so this one is also non-trivial Nathan Cutler
02:21 PM Backport #43472 (In Progress): mimic: negative num_objects can set PG_STATE_DEGRADED
Nathan Cutler
02:19 PM Backport #43470 (In Progress): mimic: asynchronous recovery + backfill might spin pg undersized f...
Nathan Cutler
02:18 PM Backport #43320 (In Progress): mimic: PeeringState::GoClean will call purge_strays unconditionally
Nathan Cutler
12:56 PM Backport #43257 (In Progress): mimic: monitor config store: Deleting logging config settings does...
Nathan Cutler
12:50 PM Backport #42996 (In Progress): luminous: acting_recovery_backfill won't catch all up peers
Nathan Cutler
12:40 PM Backport #42998 (In Progress): mimic: acting_recovery_backfill won't catch all up peers
Nathan Cutler
12:28 PM Backport #42879 (In Progress): mimic: ceph_test_admin_socket_output fails in rados qa suite
Nathan Cutler
12:26 PM Backport #42852 (In Progress): mimic: format error: ceph osd stat --format=json
Nathan Cutler
09:29 AM Bug #43296 (Resolved): Ceph assimilate-conf results in config entries which can not be removed
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:29 AM Bug #43404 (Resolved): mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:28 AM Bug #43552 (Resolved): nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:28 AM Bug #43892 (Resolved): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o upg...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:26 AM Backport #43879 (Resolved): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33152
m...
Nathan Cutler
09:26 AM Backport #43821 (Resolved): nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32908
m...
Nathan Cutler
09:25 AM Backport #43916 (Resolved): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33155
m...
Nathan Cutler
09:25 AM Backport #43989 (Resolved): nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33147
m...
Nathan Cutler
09:24 AM Backport #43928 (Resolved): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/33007
m...
Nathan Cutler
09:23 AM Backport #43731 (Resolved): nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32905
m...
Nathan Cutler
09:23 AM Backport #43822 (Resolved): nautilus: Ceph assimilate-conf results in config entries which can no...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32856
m...
Nathan Cutler
06:16 AM Bug #44120: NVMEDevice failed in certain NVMe Disk
I tested and found that my NVMe card can create at most 6 queue pairs. Jun Su
02:48 AM Bug #43861: ceph_test_rados_watch_notify hang
Almost certainly the same issue as #44062 Brad Hubbard

02/13/2020

11:19 PM Bug #43124 (Resolved): Probably legal crush rules cause upmaps to be cleaned
David Zafman
11:05 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
Reproduced, and I see #43808 while doing so, so I'm going to treat them as related for now at least.
I think we can ...
Brad Hubbard
02:53 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
I can't reproduce this so far. If anyone can reproduce it reliably maybe we could try increasing the notify timeout i... Brad Hubbard
02:22 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
Ah, that's right, from memory these Warnings are related to valgrind. Valgrind is also notorious for slowing things d... Brad Hubbard
02:02 AM Bug #44062: LibRadosWatchNotify.WatchNotify failure
When trying to reproduce I am seeing a *lot* of these which may, or may not, be related.... Brad Hubbard
10:48 PM Feature #44131 (New): Add AAAA DNS record for drop.ceph.com
drop.ceph.com is only reachable through IPv4 because it lacks an IPv6 DNS record (AAAA). For IPv6 only clusters th... Stefan Kooman
08:15 PM Backport #43879: nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33152
merged
Yuri Weinstein
08:12 PM Backport #43821: nautilus: nautilus: OSDMonitor: SIGFPE in OSDMonitor::share_map_with_random_osd
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32908
merged
Yuri Weinstein
08:09 PM Backport #43916: nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33155
merged
Yuri Weinstein
08:08 PM Backport #43989: nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33147
merged
Yuri Weinstein
07:35 PM Backport #43928: nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/33007
merged
Yuri Weinstein
07:30 PM Backport #43731: nautilus: mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32905
merged
Yuri Weinstein
07:29 PM Backport #43822: nautilus: Ceph assimilate-conf results in config entries which can not be removed
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32856
merged
Yuri Weinstein
06:08 PM Feature #44025 (In Progress): Make it harder to set pool replica size to 1
Neha Ojha
03:43 PM Bug #43975: Slow Requests/OP's types not getting logged
The logging to cluster logs was removed as part of re-factoring effort in mimic. Here are the commits of interest:
...
Sridhar Seshasayee
02:21 PM Backport #44085 (In Progress): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump E...
Nathan Cutler
02:19 PM Bug #44120 (Need More Info): NVMEDevice failed in certain NVMe Disk
I got the following error:
nvme_ctrlr.c: 308:spdk_nvme_ctrlr_alloc_io_qpair: *ERROR*: No free I/O queue IDs
Th...
Jun Su
02:15 PM Backport #44082 (In Progress): nautilus: expected MON_CLOCK_SKEW but got none
Nathan Cutler
02:09 PM Backport #44081 (In Progress): nautilus: ceph -s does not show >32bit pg states
Nathan Cutler
02:06 PM Backport #43346: nautilus: short pg log + cache tier ceph_test_rados out of order reply
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32848
m...
Nathan Cutler
02:06 PM Backport #43346 (Resolved): nautilus: short pg log + cache tier ceph_test_rados out of order reply
Nathan Cutler
11:54 AM Backport #43346 (In Progress): nautilus: short pg log + cache tier ceph_test_rados out of order r...
Nathan Cutler
02:03 PM Backport #43852 (In Progress): nautilus: osd-scrub-snaps.sh fails
Nathan Cutler
11:31 AM Backport #43997 (In Progress): nautilus: Ceph tools utilizing "global_[pre_]init" no longer proce...
Nathan Cutler
10:27 AM Bug #44089 (Fix Under Review): mon: --format=json does not work for config get or show
Kefu Chai
08:37 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...
grep for checking ASCII-only names:... Aleksandr Rudenko
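The grep command itself is elided in the feed. As a rough Python equivalent of the same check (not the command from the comment), the sketch below separates object names that are pure ASCII from those containing multi-byte characters. The function name and the sample names are hypothetical.

```python
def split_by_ascii(names):
    """Partition object names into (ascii_only, has_multibyte)."""
    ascii_only, multibyte = [], []
    for name in names:
        # str.isascii() is True when every code point is below U+0080,
        # i.e. the name encodes to one byte per character in UTF-8.
        (ascii_only if name.isascii() else multibyte).append(name)
    return ascii_only, multibyte


# Hypothetical object names, for illustration only.
plain, wide = split_by_ascii(["photo_001.jpg", "отчёт.pdf", "résumé.doc"])
print(plain)
print(wide)
```

Feeding the names of the objects flagged by scrub through a check like this would answer the multi-byte question directly.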
08:35 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...
Hi, David
> Do all the objects with missing copies have names that included multi-byte characters?
yes, most of...
Aleksandr Rudenko
12:29 AM Bug #44072: Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_errors...

Two questions:
Do all the objects with missing copies have names that included multi-byte characters?
Are the...
David Zafman
07:09 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
I'd like to reopen this, since there are now reports about crashes on Centos (see possible duplicate linked to this i... Alex Walender
01:43 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
In the failure that sage observed on master, I looked at pg4.7, which is stuck in creating+peering.
osd.10(mimic) ...
Neha Ojha

02/12/2020

11:58 PM Feature #44108 (In Progress): mon: osd: handle 2-(main-)site stretch clusters explicitly, so no a...
People have hacked together stretch clusters on top of Ceph using 3 sites for years, or even using 2 sites and interv... Greg Farnum
11:56 PM Feature #44107 (Fix Under Review): mon: produce stable election results when netsplits and other ...
Greg Farnum
11:53 PM Feature #44107 (Resolved): mon: produce stable election results when netsplits and other errors h...
Right now, in netsplits and similar error conditions the monitors do not produce a stable quorum: whichever monitors ... Greg Farnum
10:42 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
Sure Neha Brad Hubbard
10:16 PM Bug #44062: LibRadosWatchNotify.WatchNotify failure
Brad, can you please take a look at this? Neha Ojha
12:18 AM Bug #44062 (Triaged): LibRadosWatchNotify.WatchNotify failure
/a/sage-2020-02-11_20:49:48-rados-wip-sage-testing-2020-02-11-1121-distro-basic-smithi/4755080 Sage Weil
10:35 PM Bug #44004 (Can't reproduce): "ceph" command crashes
Neha Ojha
10:34 PM Bug #44015: Cant compile src/tools/rados/rados.cc on 32 bit systems
Following is the explanation for why it was done.... Neha Ojha
04:33 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
Runs:
* http://pulpito.ceph.com/rzarzynski_bug43903,
* http://pulpito.ceph.com/rzarzynski_bug43903_more_pgnum_c...
Radoslaw Zarzynski
03:58 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
Saw something similar but on Centos 8: https://tracker.ceph.com/issues/44078.
Marking as related for now, possibly i...
Radoslaw Zarzynski
07:18 AM Feature #44025: Make it harder to set pool replica size to 1
Deepika Upadhyay wrote:
> I assume you are talking about:
>
> > To remove a pool the mon_allow_pool_delete flag ...
Greg Farnum
05:13 AM Feature #44025: Make it harder to set pool replica size to 1
Greg Farnum wrote:
> Pool deletion also requires a config option to be set on the monitor before it's allowed throug...
Deepika Upadhyay
03:44 AM Bug #44092 (Resolved): mon: config commands do not accept whitespace style config name
e.g.... Patrick Donnelly

02/11/2020

10:31 PM Backport #44070 (In Progress): luminous: Add builtin functionality in ceph-kvstore-tool to repair...
https://github.com/ceph/ceph/pull/33195 Prashant D
10:28 AM Backport #44070: luminous: Add builtin functionality in ceph-kvstore-tool to repair corrupted key...
We need to backport PR 16745 and subsequent PRs. Refer to the original tracker #17730 for adding support to repair leveld... Prashant D
03:07 AM Backport #44070 (New): luminous: Add builtin functionality in ceph-kvstore-tool to repair corrupt...
We seem to have it in ceph-kvstore-tool as the "destructive-repair" option. Does this option do leveldb/rocksdb repair?... Prashant D
03:01 AM Backport #44070 (Closed): luminous: Add builtin functionality in ceph-kvstore-tool to repair corr...
Prashant D
02:57 AM Backport #44070 (Resolved): luminous: Add builtin functionality in ceph-kvstore-tool to repair co...
In some cases, such as a ceph cluster upgrade or a filesystem issue, the leveldb/rocksdb gets corrupted, which can caus... Prashant D
10:20 PM Bug #44089 (Fix Under Review): mon: --format=json does not work for config get or show
In addition to the json output not working, when giving either these commands a specific key to fetch:... Patrick Donnelly
09:58 PM Bug #38358 (Resolved): short pg log + cache tier ceph_test_rados out of order reply
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:55 PM Feature #41647 (Resolved): pg_autoscaler should show a warning if pg_num isn't a power of two
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:54 PM Bug #42346 (Resolved): Nearfull warnings are incorrect
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:54 PM Bug #42411 (Resolved): nautilus:osd: network numa affinity not supporting subnet port
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:54 PM Bug #42566 (Resolved): mgr commands fail when using non-client auth
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:53 PM Bug #42780 (Resolved): recursive lock of OpTracker::lock (70)
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:53 PM Bug #42961 (Resolved): osd: increase priority in certain OSD perf counters
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:53 PM Backport #44088 (Rejected): mimic: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Nathan Cutler
09:53 PM Backport #44087 (Rejected): luminous: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Nathan Cutler
09:51 PM Backport #44086 (Rejected): mimic: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Nathan Cutler
09:51 PM Backport #44085 (Resolved): nautilus: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
https://github.com/ceph/ceph/pull/33278 Nathan Cutler
09:51 PM Backport #44084 (Rejected): luminous: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Nathan Cutler
09:51 PM Bug #43587 (Resolved): mon shutdown timeout (race with async compaction)
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:51 PM Bug #43592 (Resolved): osd-recovery-space.sh has a race
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:51 PM Backport #44083 (Resolved): mimic: expected MON_CLOCK_SKEW but got none
https://github.com/ceph/ceph/pull/34370 Nathan Cutler
09:50 PM Backport #44082 (Resolved): nautilus: expected MON_CLOCK_SKEW but got none
https://github.com/ceph/ceph/pull/33276 Nathan Cutler
09:50 PM Backport #44081 (Resolved): nautilus: ceph -s does not show >32bit pg states
https://github.com/ceph/ceph/pull/33275 Nathan Cutler
09:38 PM Backport #43256 (Resolved): nautilus: monitor config store: Deleting logging config settings does...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32846
m...
Nathan Cutler
09:37 PM Backport #43631 (Resolved): nautilus: segv in collect_sys_info
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32901
m...
Nathan Cutler
09:37 PM Backport #43473 (Resolved): nautilus: recursive lock of OpTracker::lock (70)
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32858
m...
Nathan Cutler
09:36 PM Backport #43245 (Resolved): nautilus: osd: increase priority in certain OSD perf counters
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32845
m...
Nathan Cutler
09:36 PM Backport #43726 (Resolved): nautilus: osd-recovery-space.sh has a race
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32774
m...
Nathan Cutler
04:26 PM Bug #43795: Ceph tools utilizing "global_[pre_]init" no longer process "early" environment options
Backport also requires -https://github.com/ceph/ceph/pull/33213- https://github.com/ceph/ceph/pull/33243 Jason Dillaman
04:03 PM Bug #43795: Ceph tools utilizing "global_[pre_]init" no longer process "early" environment options
There will be a second fix for this issue since CLI optionals are no longer overriding the environment. Jason Dillaman
04:03 PM Backport #43997: nautilus: Ceph tools utilizing "global_[pre_]init" no longer process "early" env...
There will be a second fix for this issue since CLI optionals are no longer overriding the environment. Jason Dillaman
04:02 PM Backport #43996: mimic: Ceph tools utilizing "global_[pre_]init" no longer process "early" enviro...
There will be a second fix for this issue since CLI optionals are no longer overriding the environment. Jason Dillaman
02:28 PM Bug #44067 (Resolved): cephtool/test.sh test fails to scrub all pools
Sage Weil
02:27 PM Bug #44076 (Resolved): mon: update + monmap update triggers spawn loop
- upgrade monitors from mimic to octopus
- quorum of 2/3 monitors
- enable msgr2
then
- third monitor probes...
Sage Weil
09:57 AM Bug #44072 (New): Add new Bluestore OSDs to Filestore cluster leads to scrub errors (union_shard_...
Hi,
I set severity=Critical to draw attention because I think this is a serious problem!
We have two different Lu...
Aleksandr Rudenko
06:18 AM Bug #43582 (Pending Backport): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Kefu Chai
02:57 AM Bug #44050 (Resolved): mon tell command args don't work
Sage Weil
02:34 AM Bug #43885 (Can't reproduce): failed to reach quorum size 9 before timeout expired
This hasn't shown up in master for a while and Sridhar has also not been able to reproduce this, hence reducing prior... Neha Ojha
12:50 AM Bug #44053 (Resolved): test_envlibrados_for_rocksdb.sh fails on master
Kefu Chai
12:50 AM Bug #44053 (Rejected): test_envlibrados_for_rocksdb.sh fails on master
Kefu Chai
12:49 AM Bug #43833 (Resolved): shaman on bionic/cromson: cmake error: undefined reference to `pthread_cre...
The error message is misleading. The root cause is... Kefu Chai

02/10/2020

11:14 PM Bug #43889 (Pending Backport): expected MON_CLOCK_SKEW but got none
Sage Weil
02:41 PM Bug #43889 (Fix Under Review): expected MON_CLOCK_SKEW but got none
Sage Weil
09:37 PM Backport #43256: nautilus: monitor config store: Deleting logging config settings does not decrea...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32846
merged
Yuri Weinstein
08:44 PM Backport #43631: nautilus: segv in collect_sys_info
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32901
merged
Yuri Weinstein
08:43 PM Backport #43473: nautilus: recursive lock of OpTracker::lock (70)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32858
merged
Yuri Weinstein
08:41 PM Backport #43245: nautilus: osd: increase priority in certain OSD perf counters
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32845
merged
Yuri Weinstein
08:38 PM Backport #43726: nautilus: osd-recovery-space.sh has a race
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32774
merged
Yuri Weinstein
06:50 PM Feature #44025: Make it harder to set pool replica size to 1
Pool deletion also requires a config option to be set on the monitor before it's allowed through.
I think we should ...
Greg Farnum
05:27 PM Bug #44067 (Fix Under Review): cephtool/test.sh test fails to scrub all pools
Sage Weil
05:14 PM Bug #44067 (Resolved): cephtool/test.sh test fails to scrub all pools
... Sage Weil
02:55 PM Bug #44052 (Pending Backport): ceph -s does not show >32bit pg states
Sage Weil
02:42 PM Bug #44062 (Resolved): LibRadosWatchNotify.WatchNotify failure
... Sage Weil
02:37 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
/a/sage-2020-02-09_21:18:03-rados-wip-sage2-testing-2020-02-09-1152-distro-basic-smithi/4749175... Sage Weil
10:37 AM Backport #42120 (Resolved): nautilus: pg_autoscaler should show a warning if pg_num isn't a power...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30689
m...
Nathan Cutler
10:37 AM Backport #43471 (Resolved): nautilus: negative num_objects can set PG_STATE_DEGRADED
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32857
m...
Nathan Cutler
10:37 AM Backport #43346 (Resolved): nautilus: short pg log + cache tier ceph_test_rados out of order reply
Nathan Cutler
10:36 AM Backport #43319 (Resolved): nautilus: PeeringState::GoClean will call purge_strays unconditionally
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32847
m...
Nathan Cutler
10:36 AM Backport #43099 (Resolved): nautilus: nautilus:osd: network numa affinity not supporting subnet port
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32843
m...
Nathan Cutler
10:36 AM Backport #43246 (Resolved): nautilus: Nearfull warnings are incorrect
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32773
m...
Nathan Cutler
10:00 AM Backport #43650 (Resolved): nautilus: Improve upmap change reporting in logs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32716
m...
Nathan Cutler
10:00 AM Backport #43620 (Resolved): nautilus: mon shutdown timeout (race with async compaction)
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32715
m...
Nathan Cutler
09:59 AM Backport #43783 (Resolved): nautilus: mgr commands fail when using non-client auth
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32769
m...
Nathan Cutler
09:41 AM Bug #43582 (Fix Under Review): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
Kefu Chai
05:19 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Nathan Cutler wrote:
> But if the issue was introduced in 2008, then we'd need to backport further than nautilus...
...
Greg Farnum

02/09/2020

05:40 PM Bug #43889 (In Progress): expected MON_CLOCK_SKEW but got none
Sage Weil
01:31 PM Bug #44053 (Fix Under Review): test_envlibrados_for_rocksdb.sh fails on master
Kefu Chai
01:30 PM Bug #44053 (Resolved): test_envlibrados_for_rocksdb.sh fails on master
see https://github.com/ceph/ceph/commit/c724369010a753bd44e11a534d1f42156c4fc12d
should be fixed by https://github...
Kefu Chai
12:45 AM Bug #42328 (Fix Under Review): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
Sage Weil
12:25 AM Bug #43903 (In Progress): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
Sage Weil

02/08/2020

09:55 PM Backport #43919 (In Progress): nautilus: osd stuck down
Shyukri Shyukriev
09:47 PM Backport #43916 (In Progress): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pendin...
Shyukri Shyukriev
09:43 PM Backport #43881 (In Progress): mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Shyukri Shyukriev
09:42 PM Backport #43880 (In Progress): luminous: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Shyukri Shyukriev
09:41 PM Backport #43879 (In Progress): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Shyukri Shyukriev
09:12 PM Backport #43989 (In Progress): nautilus: osd: Allow 64-char hostname to be added as the "host" in...
Shyukri Shyukriev
09:11 PM Backport #43988 (In Progress): luminous: osd: Allow 64-char hostname to be added as the "host" in...
Shyukri Shyukriev
09:10 PM Backport #43987 (In Progress): mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
Shyukri Shyukriev
09:08 PM Backport #43992 (In Progress): nautilus: objecter doesn't send osd_op
Shyukri Shyukriev
09:05 PM Backport #43991 (In Progress): mimic: objecter doesn't send osd_op
Shyukri Shyukriev
06:11 PM Bug #44052 (Fix Under Review): ceph -s does not show >32bit pg states
Sage Weil
06:07 PM Bug #44052 (Resolved): ceph -s does not show >32bit pg states
ceph -s does not show newer pg states, like repair_failed Sage Weil
03:26 PM Bug #44050 (Fix Under Review): mon tell command args don't work
Sage Weil
02:37 PM Bug #44050: mon tell command args don't work
'ceph tell mon.a help' works, but '-h' does not. Sage Weil
02:07 PM Bug #44050 (Resolved): mon tell command args don't work
Also, 'ceph tell mon.a force-sync --yes-i-really-mean-it' seems to be broken:... Sage Weil
02:11 PM Feature #42638 (Resolved): Allow specifying pg_autoscale_mode when creating a new pool
Sage Weil
01:53 PM Bug #43889: expected MON_CLOCK_SKEW but got none
/a/sage-2020-02-07_23:51:30-rados-wip-sage2-testing-2020-02-07-1439-distro-basic-smithi/4742672 Sage Weil
01:34 PM Bug #44024 (Resolved): change in utime_t rendering ('T' separator) conflicts with cache tiering h...
Sage Weil
08:18 AM Bug #43885: failed to reach quorum size 9 before timeout expired
Since I could not reproduce the issue, I analyzed logs from the original run:
/a/sage-2020-01-29_20:14:58-rados-wip-...
Sridhar Seshasayee
01:13 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
David Zafman wrote:
> For all log entries of all OSDs at 2020-01-28 03:18 with pg[ information and osd primary these...
Neha Ojha
12:27 AM Backport #42120: nautilus: pg_autoscaler should show a warning if pg_num isn't a power of two
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30689
merged
Yuri Weinstein

02/07/2020

10:31 PM Backport #43471: nautilus: negative num_objects can set PG_STATE_DEGRADED
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32857
merged
Yuri Weinstein
10:31 PM Backport #43346: nautilus: short pg log + cache tier ceph_test_rados out of order reply
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32848
merged
Yuri Weinstein
10:30 PM Backport #43319: nautilus: PeeringState::GoClean will call purge_strays unconditionally
Nathan Cutler wrote:Reviewed-by: Neha Ojha <nojha@redhat.com>
> https://github.com/ceph/ceph/pull/32847
merged
Yuri Weinstein
10:29 PM Backport #43099: nautilus: nautilus:osd: network numa affinity not supporting subnet port
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32843
merged
Yuri Weinstein
10:29 PM Backport #43246: nautilus: Nearfull warnings are incorrect
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32773
merged
Yuri Weinstein
10:11 PM Backport #43650: nautilus: Improve upmap change reporting in logs
David Zafman wrote:
> https://github.com/ceph/ceph/pull/32716
merged
Yuri Weinstein
10:09 PM Backport #43620: nautilus: mon shutdown timeout (race with async compaction)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32715
merged
Yuri Weinstein
10:03 PM Backport #43783: nautilus: mgr commands fail when using non-client auth
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32769
merged
Yuri Weinstein
08:34 PM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
For whatever reason we do not have complete osd logs for this, but from nojha-2020-02-06_01:27:32-upgrade:mimic-x:str... Neha Ojha
05:28 PM Bug #44022: mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd crash
... Neha Ojha
04:34 PM Bug #44041 (Fix Under Review): osd: MLease in stray state -> Crashed
Sage Weil
04:03 PM Bug #44041 (Resolved): osd: MLease in stray state -> Crashed
... Sage Weil

02/06/2020

11:55 PM Feature #44025: Make it harder to set pool replica size to 1
Neha Ojha wrote:
> Setting pool size to 1 is dangerous. Add an option like yes_i_really_really_mean_it, similar to w...
Neha Ojha
11:50 PM Feature #44025 (Resolved): Make it harder to set pool replica size to 1
Setting pool size to 1 is dangerous. Add an option like yes_i_really_really_mean_it, similar to what we have for pool... Neha Ojha
11:53 PM Bug #44024 (Fix Under Review): change in utime_t rendering ('T' separator) conflicts with cache t...
Sage Weil
11:26 PM Bug #44024 (Resolved): change in utime_t rendering ('T' separator) conflicts with cache tiering h...
crash like... Sage Weil
06:15 PM Bug #44022 (Resolved): mimic: Receiving MLogRec in Started/Primary/Peering/GetInfo causes an osd ...
The crash happens on a mimic OSD. Telemetry crash reports have been reporting similar crashes in 14.2.4(may or may no... Neha Ojha
12:50 PM Bug #44015 (New): Cant compile src/tools/rados/rados.cc on 32 bit systems
On my machine size_t is unsigned int. This causes an overflow in src/tools/rados/rados.cc:776: max_obj_len = 5ull * 1... Stefan Bischoff
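The narrowing described in this report can be illustrated with a small sketch. It models in Python what happens when a 64-bit constant is stored in a 32-bit unsigned `size_t`. The operand of 5 GiB is an assumption, since the expression in the entry above is truncated, and the helper names `narrow_to_unsigned` and `fits_in_unsigned` are hypothetical and not taken from the Ceph source.

```python
# Hypothetical illustration, not Ceph code: model C++ unsigned narrowing
# of a 64-bit constant into a 32-bit size_t.

# Assumed value (5 GiB). The tracker entry truncates the real expression,
# so this operand is a stand-in chosen only for illustration.
ASSUMED_MAX_OBJ_LEN = 5 * 1024 * 1024 * 1024


def narrow_to_unsigned(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits, as conversion to an unsigned type does."""
    return value & ((1 << bits) - 1)


def fits_in_unsigned(value: int, bits: int) -> bool:
    """Return True if value is representable in an unsigned type of that width."""
    return 0 <= value < (1 << bits)


if __name__ == "__main__":
    for bits in (32, 64):
        stored = narrow_to_unsigned(ASSUMED_MAX_OBJ_LEN, bits)
        ok = fits_in_unsigned(ASSUMED_MAX_OBJ_LEN, bits)
        print(f"{bits}-bit size_t: fits={ok}, stored value={stored}")
```

Under this assumption the constant survives on a 64-bit platform but silently wraps on a 32-bit one, which is the kind of overflow a compiler may reject or warn about.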
04:48 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired

For all log entries of all OSDs at 2020-01-28 03:18 with pg[ information and osd primary these are the log lines th...
David Zafman
03:26 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
From mgr.x's log after the last time pg_stats are received we see ... Neha Ojha
03:42 AM Bug #44004: "ceph" command crashes
not reproducible in my testbed. Kefu Chai
03:36 AM Bug #44004: "ceph" command crashes
Sometimes the "ceph" command fails with a segmentation fault, here is the core_backtrace. It seems that it has someth... Xuehan Xu
02:59 AM Bug #44004 (Can't reproduce): "ceph" command crashes
On the most recent master, after building, I ran the command "./bin/ceph -s --connect-timeout 1 -c /home/xuxuehan/cep... Xuehan Xu
01:37 AM Feature #42638 (Fix Under Review): Allow specifying pg_autoscale_mode when creating a new pool
Neha Ojha

02/05/2020

10:20 PM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
Hmm that prepare_failure() does look like it's behaving a little differently than some of the regular op flow; we mus... Greg Farnum
09:27 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
Logs: /a/dzafman-2020-01-27_22:00:09-upgrade:mimic-x-master-distro-basic-smithi/4712686
There is more than one bu...
Neha Ojha
07:43 PM Backport #43997 (Resolved): nautilus: Ceph tools utilizing "global_[pre_]init" no longer process ...
https://github.com/ceph/ceph/pull/33261 Nathan Cutler
07:43 PM Backport #43996 (Rejected): mimic: Ceph tools utilizing "global_[pre_]init" no longer process "ea...
Nathan Cutler
07:42 PM Backport #43992 (Rejected): nautilus: objecter doesn't send osd_op
Nathan Cutler
07:42 PM Backport #43991 (Rejected): mimic: objecter doesn't send osd_op
Nathan Cutler
07:42 PM Backport #43989 (Resolved): nautilus: osd: Allow 64-char hostname to be added as the "host" in CRUSH
https://github.com/ceph/ceph/pull/33147 Nathan Cutler
07:42 PM Backport #43988 (Rejected): luminous: osd: Allow 64-char hostname to be added as the "host" in CRUSH
https://github.com/ceph/ceph/pull/33146 Nathan Cutler
07:42 PM Backport #43987 (Resolved): mimic: osd: Allow 64-char hostname to be added as the "host" in CRUSH
https://github.com/ceph/ceph/pull/33145 Nathan Cutler
05:39 PM Bug #42347 (Won't Fix): nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_fligh...
we've backported the osd fast shutdown ( https://github.com/ceph/ceph/pull/32743 ), so this will effectively go away ... Sage Weil
01:39 PM Bug #43975: Slow Requests/OP's types not getting logged
- Types - src/osd/OpRequest.h... Vikhyat Umrao
12:54 PM Bug #43975 (Resolved): Slow Requests/OP's types not getting logged
- From ceph.log... Vikhyat Umrao

02/04/2020

11:46 AM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
-The problem is not only about heap corruption. Stacks are affected as well. Moreover, there is an interesting corrup... Radoslaw Zarzynski
03:28 AM Bug #43813 (Pending Backport): objecter doesn't send osd_op
Sage Weil

02/03/2020

09:49 PM Bug #43954 (New): Issue health warning or error if MON or OSD daemons are holding onto excessive ...
Brad Hubbard
08:35 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
`Thread 63 (Thread 0x7f2e36318700 (LWP 55988))` is poisoned as well.... Radoslaw Zarzynski
08:00 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
It looks like a freshly heap-allocated `OSDMap` instance got corrupted:... Radoslaw Zarzynski
02:35 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
It looks like the entire `PGTempMap::data` has been corrupted:... Radoslaw Zarzynski
01:12 PM Bug #43948 (New): Remapped PGs are sometimes not deleted from previous OSDs
I noticed on several clusters (all Nautilus 14.2.6) that on occasion, some OSDs may still hold data for some PGs long... Eric Petit

02/01/2020

04:53 PM Bug #43861: ceph_test_rados_watch_notify hang
same?
/a/sage-2020-02-01_03:27:35-rados-wip-sage-testing-2020-01-31-1746-distro-basic-smithi/4723146
ceph_test_wa...
Sage Weil

01/31/2020

11:31 PM Bug #43795 (Pending Backport): Ceph tools utilizing "global_[pre_]init" no longer process "early"...
Sage Weil
11:09 PM Bug #43185: ceph -s not showing client activity
Can you grab a wallclock profiler dump from the mgr process when its usage goes to 100%?
Learn more about how to use...
Neha Ojha
06:41 AM Bug #43185: ceph -s not showing client activity
strace for the hanging mgr thread... Anonymous
06:37 AM Bug #43185: ceph -s not showing client activity
There's almost no load apart from scrubbing, like this is pretty average io:
client: 20 MiB/s rd, 61 MiB/s w...
Anonymous
10:34 PM Bug #43365 (Closed): Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signe...
FWIW the two clusters reporting this crash via telemetry are both Ubuntu 18.04
closing this as not a ceph issue; l...
Sage Weil
06:02 PM Bug #43813 (Fix Under Review): objecter doesn't send osd_op
Sage Weil
03:50 AM Bug #43813 (In Progress): objecter doesn't send osd_op
Sage Weil
03:46 AM Bug #43813: objecter doesn't send osd_op
/a/sage-2020-01-30_22:27:29-rados-wip-sage-testing-2020-01-30-1230-distro-basic-smithi/4719487... Sage Weil
05:24 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
Only happens when upgrading from mimic to nautilus, see https://tracker.ceph.com/issues/43048#note-7. Neha Ojha
12:23 PM Bug #43885: failed to reach quorum size 9 before timeout expired
Update: Tried running the test a few times but haven't been able to reproduce it. I will continue my attempts. In the... Sridhar Seshasayee
06:37 AM Bug #43885: failed to reach quorum size 9 before timeout expired
There does not appear to be a crash in this case, but there is an election that seems to take a long time followed by... Brad Hubbard
10:24 AM Bug #43929 (Pending Backport): osd: Allow 64-char hostname to be added as the "host" in CRUSH
Kefu Chai
10:16 AM Bug #43929 (Resolved): osd: Allow 64-char hostname to be added as the "host" in CRUSH
On a Linux system it is possible to set a 64-character hostname when
HOST_NAME_MAX is set to 64. It means that if...
Michal Skalski
09:46 AM Backport #43928 (In Progress): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Nathan Cutler
09:43 AM Backport #43928 (Resolved): nautilus: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
https://github.com/ceph/ceph/pull/33007 Nathan Cutler
09:43 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
But if the issue was introduced in 2008, then we'd need to backport further than nautilus... Nathan Cutler
09:42 AM Bug #42977 (Pending Backport): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Adding nautilus backport per Greg's comment "looking at the nautilus code it is susceptible to this too." Nathan Cutler
03:56 AM Bug #42977 (Resolved): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Sage Weil
01:33 AM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
https://github.com/ceph/ceph/pull/19076 is a possible solution to this issue. Brad Hubbard

01/30/2020

11:03 PM Bug #43602 (Won't Fix): Core dumps not collected in standalone tests for distros using systemd-co...
The real fix done elsewhere is to configure the core location so that systemd-coredump is not used. It isn't worth t... David Zafman
04:43 PM Bug #43602 (Fix Under Review): Core dumps not collected in standalone tests for distros using sys...
Sage Weil
04:43 PM Bug #43602 (Resolved): Core dumps not collected in standalone tests for distros using systemd-cor...
Sage Weil
08:30 PM Backport #43651 (Resolved): luminous: Improve upmap change reporting in logs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32666
m...
Nathan Cutler
08:29 PM Backport #43651 (In Progress): luminous: Improve upmap change reporting in logs
Nathan Cutler
07:40 PM Backport #43919 (Resolved): nautilus: osd stuck down
https://github.com/ceph/ceph/pull/35024 Nathan Cutler
07:39 PM Backport #43916 (Resolved): nautilus: mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) ...
https://github.com/ceph/ceph/pull/33155 Nathan Cutler
04:46 PM Bug #43864 (Resolved): osd/repro_long_log.sh failure
Should have been fixed by https://github.com/ceph/ceph/pull/32945. Neha Ojha
04:17 PM Bug #43864: osd/repro_long_log.sh failure
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718221 Sage Weil
04:41 PM Bug #43889: expected MON_CLOCK_SKEW but got none
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718332
Sage Weil
04:16 PM Bug #43889: expected MON_CLOCK_SKEW but got none
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718133 Sage Weil
04:40 PM Bug #43915 (New): leaked Session (alloc from OSD::ms_handle_authentication)
... Sage Weil
04:37 PM Bug #43914 (Need More Info): nautilus: ceph tell command times out
see https://github.com/ceph/ceph/pull/32989 Sage Weil
04:35 PM Bug #43914 (Resolved): nautilus: ceph tell command times out
... Sage Weil
04:17 PM Bug #43885: failed to reach quorum size 9 before timeout expired
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718154
description: rados/...
Sage Weil
03:19 PM Feature #43910 (New): Utilize new Linux kernel v5.6 prctl PR_SET_IO_FLUSHER option
See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d19f1c8e1937baf74e1962aae9f90fa3ae... Jason Dillaman
02:51 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
the second time,... Sage Weil
02:50 PM Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
If I start the OSD manually, I can reproduce the same crash:... Sage Weil
02:48 PM Bug #43903 (Resolved): osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode)
... Sage Weil
12:37 PM Bug #42977 (Fix Under Review): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Hmm this not-returning issue seems to date from 2008 (3859475bbfafb8754841af41044cb41124e87fc7); I'm not sure why it'... Greg Farnum
10:42 AM Bug #42977 (In Progress): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Yep looks like something went horribly wrong in refactoring — we correctly call the new election on receiving an old ... Greg Farnum
09:57 AM Documentation #43896 (Resolved): nautilus upgrade should recommend ceph-osd restarts after enabli...
Following an upgrade to nautilus and `ceph mon enable-msgr2`, running nautilus osds will not yet bind to their v2 add... Dan van der Ster
07:22 AM Bug #43893: lingering osd_failure ops (due to failure_info holding references?)
> We can clear that slow op either by restarting mon.cepherin-mon-7cb9b591e1 or with `ceph osd fail osd.170`.
too ...
Dan van der Ster
07:21 AM Bug #43893 (Duplicate): lingering osd_failure ops (due to failure_info holding references?)
On Nautilus v14.2.6 we see osd_failure ops which linger:... Dan van der Ster
04:11 AM Bug #43892 (Pending Backport): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during ...
Sage Weil

01/29/2020

11:18 PM Bug #43892 (Fix Under Review): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during ...
Sage Weil
11:15 PM Bug #43892 (Resolved): mon/PaxosService.cc: 188: FAILED ceph_assert(have_pending) during n->o upg...
... Sage Weil
10:07 PM Bug #43885: failed to reach quorum size 9 before timeout expired
I wonder if this is somehow related to the election issue we saw in https://tracker.ceph.com/issues/42977. Seems to b... Neha Ojha
01:14 PM Bug #43885 (Can't reproduce): failed to reach quorum size 9 before timeout expired
This pops up occasionally. Here is a recent one:... Sage Weil
09:15 PM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
I think defer() is called by mon.e in receive_propose() because of the following... Neha Ojha
07:39 PM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
on mon.g (3), the epoch is 55 (or looks that way, it just sent these):... Sage Weil
12:41 AM Bug #42977: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
Let's see what happened in /a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/46... Neha Ojha
07:19 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired

I'm seeing a lot of this in a sample of log segments from osd.6 which is reporting the slow ops. The log for osd.6...
David Zafman
03:55 PM Bug #43882 (Need More Info): osd to mon connection lost, osd stuck down
adding debug: https://github.com/ceph/ceph/pull/32968 Sage Weil
01:06 PM Bug #43882 (Can't reproduce): osd to mon connection lost, osd stuck down
This is a similar symptom to #43825, but it does not appear to be related to split/merge.
OSD is marked down, but ...
Sage Weil
01:45 PM Bug #43889 (Resolved): expected MON_CLOCK_SKEW but got none
description: rados/multimon/{clusters/6.yaml msgr-failures/many.yaml msgr/async.yaml
no_pools.yaml objectstore...
Sage Weil
01:44 PM Bug #43888: osd/osd-bench.sh 'tell osd.N bench' hang
https://github.com/ceph/ceph/pull/32961 to debug Sage Weil
01:41 PM Bug #43888 (Resolved): osd/osd-bench.sh 'tell osd.N bench' hang
... Sage Weil
01:36 PM Bug #43887 (Resolved): ceph_test_rados_delete_pools_parallel failure
... Sage Weil
01:23 PM Bug #43825 (Pending Backport): osd stuck down
Sage Weil
10:03 AM Backport #43881 (Resolved): mimic: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
https://github.com/ceph/ceph/pull/33154 Nathan Cutler
10:03 AM Backport #43880 (Rejected): luminous: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
https://github.com/ceph/ceph/pull/33153 Nathan Cutler
10:03 AM Backport #43879 (Resolved): nautilus: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
https://github.com/ceph/ceph/pull/33152 Nathan Cutler

01/28/2020

11:22 PM Bug #43864 (In Progress): osd/repro_long_log.sh failure
David Zafman
08:03 PM Bug #43864 (Resolved): osd/repro_long_log.sh failure
... Sage Weil
08:44 PM Bug #43865: osd-scrub-test.sh fails date check
This looks like a case where the sleep time wasn't sufficient. The previous run had set 2 days and the next test swi... David Zafman
08:07 PM Bug #43865 (Resolved): osd-scrub-test.sh fails date check
... Sage Weil
08:08 PM Bug #38345 (Pending Backport): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Sage Weil
08:07 PM Bug #43826 (Resolved): osd: leak of from send_lease
Sage Weil
07:59 PM Bug #43862 (Can't reproduce): mkfs fsck found fatal error: (2) No such file or directory during c...
... Sage Weil
07:45 PM Bug #43861: ceph_test_rados_watch_notify hang
/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713217 Sage Weil
07:43 PM Bug #43861 (Resolved): ceph_test_rados_watch_notify hang
... Sage Weil
07:34 PM Bug #43825 (Fix Under Review): osd stuck down
Sage Weil
07:27 PM Bug #43825 (In Progress): osd stuck down
we are splitting:... Sage Weil
06:59 PM Bug #43825: osd stuck down
2020-01-28T14:56:26.155+0000 7fd3ba08d700 20 osd.6 285 identify_splits_and_merges 1.5 e245 to e285 pg_nums {76=28,89=... Sage Weil
06:39 PM Bug #43825: osd stuck down
... Sage Weil
07:24 PM Bug #43185: ceph -s not showing client activity
Are you observing any client activity in the cluster logs when "ceph -s" isn't reporting them?
It is sometimes poss...
Neha Ojha
06:27 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired

The master branch passed, but my nautilus run hit the same issue:
http://pulpito.ceph.com/dzafman-2020-01-27_21:...
David Zafman
10:42 AM Backport #43852 (Resolved): nautilus: osd-scrub-snaps.sh fails
https://github.com/ceph/ceph/pull/33274 Nathan Cutler
09:40 AM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
Just an update on my side:
After upgrading our monitor Ubuntu 18.04 packages (apt-get upgrade) with the 5.3.0-26-g...
Alex Walender
 
